Cultured Perl: Perl 5.6 for C and Java programmers

How the new Perl 5.6 features stack up against C/C++/Java

By Teodor Zlatanov - 2004-09-30

Ted Zlatanov explains some of the peculiarities of Perl 5.6 for C and Java programmers, who may be pleasantly surprised to find familiar features that originated outside Perl, such as operator ambiguity, multiple ways of doing the same thing, punctuation, regular expressions, and variable mechanisms. All of them put variety and power at your fingertips. The point is that Perl isn't far from anyone's familiar territory and may be useful even to C and Java programmers at some point. So here's your opportunity to enhance your Perl 5.6 skills.

Perl often bewilders even experienced programmers, primarily because it allegedly makes it too easy to write obfuscated code. But the confusion regarding Perl's structure, features, and philosophy is inevitable given that it's such a rich and powerful language, and that it was designed from the start to allow for more than one way to do the same thing.

Here we're going to look at some of the more confusing features of Perl 5.6, comparing and contrasting them to the corresponding C/C++/Java features. We'll concentrate on the principles in Larry Wall's paper "Natural Language Principles in Perl" (see the Resources later in this article), because they distinguish Perl from C, C++, and Java most readily. The exact mechanics of Perl's syntax are better learned from the "perldoc perlsyn" manual page and from Programming Perl, the best guide to Perl today (see Resources).

Interpreter mechanics

Novice Perl programmers notice right away that there seems to be no compilation. A Perl script is run immediately by the Perl interpreter ("perl" on UNIX systems, "perl.exe" on DOS/Windows systems; MacOS systems use neither name). You can try it yourself: type the name of your Perl interpreter (or, on a MacOS system, run it), and you can start giving it expressions to evaluate right away. On most systems, the end-of-file key sequence (Control-D on UNIX) has to be used to indicate the end of user input. So, on a UNIX system, the following will print the result of "5+6":


> perl
(Perl is waiting for user input here, because no script name is given)
print 5+6
(You press Control-D here)
11

Here you see that Perl ran through the one-line script, and evaluated the line with the effect of printing "11" to your screen.

The Perl interpreter has many options. The "-e" flag, for instance, tells it to execute its command-line argument as a script, so the command perl -e'print 5+6' (note the quotes around the print command) is equivalent to the small program above. The "-i" flag allows editing of files in place, sort of like running them through a filter. The "-n" and "-p" switches cause the interpreter to act mostly like an input-output filter, with actions specified by the programmer. The "-w" switch (highly recommended) turns warnings on and is similar to the C/C++ "-Wall" compiler switch, except that "-w" is also active during program execution.
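
To make the filter behavior of "-n" and "-p" concrete, here is a rough sketch of the loop those switches wrap around your code, written out as an ordinary script. It approximates what a one-liner such as perl -pe 's/hello/hi/' does. The sample input and the variable names are made up for illustration, and the in-memory filehandle is an assumption that requires Perl 5.8 or later; the real switches read from the files named on the command line or from standard input.

```perl
use strict;
use warnings;

# Sample input held in memory so the sketch needs no external file
# (assumption: in-memory filehandles need Perl 5.8 or later).
my $input = "hello world\nsay hello\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";

my $output = '';
while (<$in>) {       # -n and -p wrap your code in a loop like this one
    s/hello/hi/;      # the code you would pass with -e
    $output .= $_;    # -p prints $_ after each pass; -n leaves printing to you
}
close $in;

print $output;        # prints "hi world" and "say hi" on separate lines
```

With "-n" instead of "-p", nothing is printed unless your code calls print itself, which is handy for filters that only report matching lines.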

Speed and benchmarking

People often compare Perl to C or C++ and complain that Perl is not fast enough. This is sometimes true, but I recommend that you use the Benchmark module (perldoc Benchmark) before you decide that a C or C++ program will be faster, because that is not always the case. Also, Perl is very good at linking to C/C++ code and libraries, and built-in Perl functions such as sort and print are usually nearly as fast as equivalent C code. Again, benchmark before you decide what's better.
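
As a minimal sketch of what benchmarking looks like in practice, the script below uses the Benchmark module's timethese and cmpthese functions to compare the built-in sort against a hand-written insertion sort. The insertion_sort routine, the data size, and the labels are hypothetical choices made for this illustration; the timings you see will depend on your machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical hand-written sort, used only as something to compare against.
sub insertion_sort {
    my @a = @_;
    for my $i (1 .. $#a) {
        my $value = $a[$i];
        my $j     = $i - 1;
        while ($j >= 0 && $a[$j] > $value) {
            $a[$j + 1] = $a[$j];    # shift larger elements one slot right
            $j--;
        }
        $a[$j + 1] = $value;
    }
    return @a;
}

my @numbers = map { int rand 1000 } 1 .. 200;

# A negative count means "run each candidate for at least that many CPU seconds".
# The 'none' style keeps timethese quiet; cmpthese then prints a comparison chart.
my $results = timethese(-1, {
    builtin_sort   => sub { my @sorted = sort { $a <=> $b } @numbers },
    insertion_sort => sub { my @sorted = insertion_sort(@numbers) },
}, 'none');

cmpthese($results);
```

The chart shows the rate of each candidate and the percentage difference between them, which is usually a sounder basis for a rewrite decision than intuition.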

Remember that premature optimization is the root of all evil. If you write a working prototype in Perl, and then rewrite the program in another language, that's fine. Prototypes are meant to be quickly developed and easily thrown away.

Compared to Java, Perl does quite well, but benchmarks are still recommended. Java (unlike Perl) is very good at threading, so algorithms that benefit from threads are better written in Java. But Perl's Tk GUI toolkit compares favorably to Java's Swing GUI libraries, and Java code can always be linked to a Perl program, and vice versa. So sometimes, you can even get the best of both worlds!

Exceptions, compilation and documentation

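Perl has exceptions through CPAN modules or through the built-in eval() function. eval evaluates a code block or a string much as if it were running inside a try/catch block in C++ or Java: an error raised with die() inside the eval does not stop the program, and the error message is left in the $@ variable.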
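
The following sketch shows the block form of eval playing the role of try/catch. The safe_divide function and its error message are invented for the example; die acts as the "throw", and the $@ variable holds whatever was thrown.

```perl
use strict;
use warnings;

# Hypothetical function that "throws" by calling die.
sub safe_divide {
    my ($x, $y) = @_;
    die "cannot divide by zero\n" if $y == 0;
    return $x / $y;
}

# "try": the block runs normally and eval returns its value.
my $quotient = eval { safe_divide(10, 2) };
print "10 / 2 = $quotient\n";

# "try" again: die aborts the block, eval returns undef, and $@ holds the message.
my $failed = eval { safe_divide(10, 0) };
my $error  = $@;                  # copy $@ right away; later evals overwrite it
if ($error) {                     # "catch"
    chomp $error;
    print "caught: $error\n";
}
```

Copying $@ into a lexical immediately after the eval is a defensive habit, since any later eval resets it.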

Perl does compile scripts before they are run, but not in the way that a C/C++/Java programmer would think of it. It is closest to the Java byte-compilation process in design and effect. The "perldoc perlrun" and "perldoc perlcc" manual pages have more information on compilation.

Documentation can be embedded inside Perl programs with the POD format. This is more general than the Javadoc format, which is best suited to API documentation, but more specific than C/C++/Java comments, which are so general that no embedded markup or sections are allowed.
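
As a small illustration, here is a script with POD sections embedded between its statements. The script name greet.pl and the greet function are hypothetical. The interpreter skips everything from a POD directive such as =head1 up to the next =cut, so the documentation has no effect on how the code runs.

```perl
use strict;
use warnings;

=head1 NAME

greet.pl - hypothetical script showing documentation embedded as POD

=head1 SYNOPSIS

    perl greet.pl

=head1 FUNCTIONS

=head2 greet($name)

Returns a greeting for C<$name>, terminated by a newline.

=cut

sub greet {
    my ($name) = @_;
    return "Hello, $name!\n";
}

print greet('world');   # prints "Hello, world!"
```

Running perldoc on the saved file displays the formatted documentation, while running perl on it executes only the code.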

Perl imposes very little structure on programs, even compared to C, C++, or Java. BEGIN blocks, for instance, are executed first but can appear many times throughout the program. Namespaces can begin and end anywhere. Definitions, variables, and function bodies can occur anywhere, and Perl will do its best to accommodate such madness.
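
A short sketch of this looseness: the script below scatters two BEGIN blocks and a function definition among ordinary statements and records the order in which each piece actually runs. The names @order and greet are made up for the example.

```perl
use strict;
use warnings;

our @order;                              # package variable visible to every block

push @order, 'main code';                # runs at run time, after every BEGIN block
BEGIN { push @order, 'first BEGIN' }     # runs as soon as the compiler reaches it

greet();                                 # calling a sub defined further down is fine

BEGIN { push @order, 'second BEGIN' }    # BEGIN blocks run in the order they appear

sub greet { push @order, 'greet' }       # the definition can live anywhere in the file

print join(', ', @order), "\n";
# prints: first BEGIN, second BEGIN, main code, greet
```

Both BEGIN blocks run during compilation, before the first ordinary statement, even though one of them appears after it in the file.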

Because of the loose structure, embedded comments, and overall ambiguity of the language when it is convenient, writing Perl is more like writing a letter in English than writing in any other programming language.

Language ambiguity

Perl tolerates ambiguity better than C/C++/Java. Commas, for example, can separate function parameters or chain several expressions into a single statement:


print 'Hello', ' ', 'there.', "\n";     # prints "Hello there.\n"
foreach (1..10)
{
 my $i;
 $i = $_ * 2, print "$i\n";             # print evens from 2 to 20
}

Perl disambiguates as best it can, though sometimes it's hard to do (in this respect Perl is much like English).

Another common ambiguity in Perl is that a variable will often be implicitly used. For example, "print" by itself will print the contents of the $_ variable. This makes sense when you realize that the $_ variable is the default for most operations when they are ambiguous. For instance:


$_ = "hello";
s/hello/hi/;                            # $_ is "hi" now
print;                                  # prints "hi"

Note how straightforward the code becomes when default variables are used. Ambiguity makes it possible to shorten expressions, both in Perl and in English.
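
To show what the default variable saves you, this sketch performs the same cleanup twice: once relying on $_ and once naming every variable explicitly. The sample strings and array names are invented for the illustration.

```perl
use strict;
use warnings;

# Implicit style: foreach, chomp, and s/// all fall back to $_.
my @implicit = ("hello world\n", "hello again\n");
for (@implicit) {
    chomp;                    # same as chomp($_)
    s/hello/hi/;              # same as $_ =~ s/hello/hi/
}

# Explicit style: the same work with a named loop variable.
my @explicit = ("hello world\n", "hello again\n");
for my $line (@explicit) {
    chomp($line);
    $line =~ s/hello/hi/;
}

print "$_\n" for @implicit;   # prints "hi world" and "hi again"
```

Both loops modify the array elements in place, because the loop variable (named or not) is an alias for each element.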

There's more than one way to do it (TMTOWTDI)

Every language has its idioms. In C, a for() loop is the best way to iterate over a range of numbers. In Java, static methods should be invoked with the class name instead of an instance name.

Perl has at least two ways to do anything. The TMTOWTDI principle is fundamental to the language, and diversity is not only tolerated but actively encouraged in the Perl community.

Let's take a look at an example of printing an array. All of the statements below produce the same output (assuming Perl's default output separators).


print foreach @array;

foreach (@array) {print};

map {print} @array;

print @array;

The way to understand the code above is the way to understand all Perl code. Don't worry about the right way to do it -- there's more than one right way. Think about the different approaches, and what they teach you about the language.
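
If you want to convince yourself that the four forms really are equivalent, a sketch like the following captures what each one prints and compares the results. The capture helper is hypothetical, written for this example, and it assumes Perl 5.8 or later for the in-memory filehandle.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

# Hypothetical helper: runs a code block with print redirected into a string.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory file: $!";
    my $previous = select $fh;    # a bare print now writes to $fh
    $code->();
    select $previous;             # restore the original output handle
    close $fh;
    return $buffer;
}

my %output = (
    'statement modifier' => capture(sub { print foreach @array }),
    'foreach block'      => capture(sub { foreach (@array) { print } }),
    'map'                => capture(sub { map { print } @array }),
    'list print'         => capture(sub { print @array }),
);

# Each line ends with "abc", whichever form produced it.
print "$_: $output{$_}\n" for sort keys %output;
```

The equivalence holds only while the output separator variables $, and $\ are left at their defaults; setting either one makes the forms diverge, which is a good reminder that "the same thing" always comes with context.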

By the way, just because there's more than one way to do it doesn't mean there's no wrong way to do it :) There are always more ways to write bad code than there are to write good code. Make your code legible, use Perl's built-in functions instead of writing your own, and document obscure ways of doing obvious things.

First published by IBM developerWorks