Sorting with map: the Schwartzian and Guttman-Rosler transforms
The sort()
function in Perl is "sort of" procedural, in that it takes a code block
or a function name and uses it to sort all elements. The comparison
function has to be written as if it were only looking at two elements
-- it doesn't know which ones specifically out of the whole list. Like map()
and grep()
, sort()
deals with references to the values being compared, so modifying them
would modify the values being compared. Don't do this (for more
information on the sort()
function, see "perldoc -f sort").
Perl's sorting abilities are remarkably simple to use. In its simplest form, a sort can be done like this:
Listing 9. The default sort()
|
The default sort uses simple string comparisons on all the scalars in the list. This is fine if the list contains dictionary words to be sorted alphabetically, but not so great if the list contains numbers. Why? Because "1" comes before "010" in a string comparison, for example. Numbers have to be compared by value, not as strings.
Fortunately, this is easy to do and is in fact a common Perl idiom:
Listing 10. The numeric sort()
|
I quoted 010, because in Perl numbers that begin with 0 are interpreted as octal, so 010 octal would have been 8 decimal. Try it without the quotes and see for yourself. This also demonstrates how Perl scalars are automatically converted to numbers when necessary.
If you run the default sort from Listing 9 on the @old_list in Listing 10, you will see that 200 is before 5, for example. That's what we are trying to avoid.
To reverse the sorted list, you can either apply the reverse()
function to the list after it's sorted, or you can change the sorting
function. For example, the reverse sort of the one in Listing 10 would
have the comparison code block be { $b <=> $a }
.
See "perldoc perlop" for more information on the <=>
and cmp operators, which are essential to all sorting in Perl. The cmp
operators are what's used in the default search in Listing 9 behind the
scenes.
Well, fine, so we can sort scalars. That's not enough -- most sorting is done on data structures such as arrays and hashes. Perl supports almost any kind of sorting because of its flexible syntax. For instance, say we need to sort a bunch of hash references, where the 'name' key in the hash is the sorting field. We want a regular alphabetical sorting order, so the cmp operator should be used:
Listing 11. The sort by a hash member
|
Now we get into the
interesting stuff. What if it's expensive to obtain data from the
objects being sorted? Say we need to apply the split()
function to a string every time we need to obtain the value to sort by. It would be computationally expensive to run a split()
every time the comparison value is needed, and your co-workers would
laugh at you. You could build a temporary list of the comparison
values, but that's not so easy to do and can easily introduce bugs. You
are probably better off using the Schwartzian transform, named after
Randal Schwartz.
The Schwartzian transform looks like this:
Listing 12. The Schwartzian transform
|
Look at it from right to left. First, we apply a map()
to the @old_list list to create a new temporary list. Remember that map()
can transform a list. In this case, we rewrite @old_list to contain an array consisting of the first value from a split()
of the string (this is the comparison value) and the string itself, for
each string in @old_list. The result is a new list; no changes are made
to @old_list.
Next, we sort by the comparison value (first element of the array elements in @old_list). Note that @old_list is not actually modified in this whole process.
Then, we perform another map()
on the sorted list to reduce it back to just strings by mapping only
the second array element into the current variable. $_->[1] means
"rewrite $_
to be the value stored in the second object in the list that $_
refers to."
Right about now your eyes are probably glazed from looking at the Schwartzian transform. It really does look frightening at first, but deep down inside it's just a little pussycat. If you are still unclear on it, see the Resources below.
The Guttman-Rosler transform is fascinating in its own right, but discussion of it will only make your eyes glaze further. Look at the Resources for a paper on the GRT, which explains it best. The paper is a very nice introduction to sorting in general. I recommend taking a look at that paper if you do not have at least a little bit of background knowledge on sorting algorithms. The theory behind sorting is extremely useful to a programmer, and understanding O(n) notation can be an invaluable tool not just for sorting, but also for profiling, debugging, and writing good code.
View Road to better programming: Chapter 4. Functional programming Discussion
Page: 1 2 3 4 5 6 Next Page: When to use FP & Exercises