Monday, December 31, 2007

Some Perl Unicode Idioms

Q: I want to represent a Unicode code-point as a Perl character
A:
"\x{05d0}"
will represent the Hebrew letter א as a Perl character
pack("U0U*",0x05d0)
will do the same

Q: I want to get the list of bytes in UTF-8 encoding for a Unicode code-point
A:
my @bytes = unpack "C*", Encode::encode("utf8",pack("U0U",0x05d0));


Q: I want to see the list of bytes in UTF-8 encoding for a Unicode code-point represented in hex notation
A: For the Hebrew letter Alef (code point 0x05d0)
my $octets = Encode::encode("utf8","\x{05d0}");
my @bytes = unpack "C*", $octets;
my @chars = map { sprintf "%#x", $_ } @bytes;




Here's a function that returns a blessed hash with a few useful operations:

sub new_char {
my ($code_point,$encoding) = @_;

my $perl_char = pack "U0U", $code_point;
my $octets_str = Encode::encode($encoding,$perl_char);
my @bytes = unpack "C*", $octets_str;
my $hex_str = [map {sprintf "%#x", $_} @bytes];
my $length = @bytes;
bless {
code_point => $code_point,
encoding => $encoding,
perl_char => $perl_char,
octets_str => $octets_str,
bytes => [@bytes],
hex_str => $hex_str,
length => $length,
}, 'Char';
}

Sunday, December 30, 2007

A interview question: the next permutation

Write the code for the following question:

You get an array of digits that represent a number. The digits are unique, such that for every index i in the array there is not index j such that the digit in i is the same as the digit in j (unless j=i). You need to write a function that will give the next number (the smallest number that has this property with its digits and the same digits, that is bigger than the given number). If the given number is the largest, then your function should indicate that there is no next number.

Examples:
* if you get 123 you should return 132
* if you get 4312 you should return 4321
* if you get 12543 you should return 13245
* if you get 21 you cannot return the next number because there is no next number.

OK. This sounds like a task that requires writing an iterator for the next permutation.

For a number (string) with n distinct digits (symbols) there are n! permutation.

n! is recursively defined for every non-negative integer to be:

0! = 1
n! = n*(n-1)!
So, we have a way to see if we wrote down all the permutations of such a number with n digits, by making sure that we wrote n! different numbers with the same digits.

Here are some examples for 1, 2, 3, and 4 digit numbers:


























n=1n=12n=123n=1234
1 12 1231234

21 1321243


2131324


2311342


3121423


3211432



2134



2143



2314



2341



2413



2431



3124



3142



3214



3241



3412



3421



4123



4132



4213



4231



4312



4321

Observations:

  1. when all digits are sorted (from left to right) in a descending order then we have reached the largest number that has the listed property.
  2. in order to get the next permutation (the next number) we scan the digits from right to left and seek for the first pair of digits where the left digit in the pair is smaller than the right digit in the pair, replace the left digit of the pair with the smallest digit to its right that is larger than it, and then sort the digits to its right (after the swap) from smallest to largets.
Here's a Perl implementation of that function.

#!/usr/bin/perl
use strict;
use warnings;

print join('', @{get_next([@ARGV])}),"\n";

sub get_next {
my ($permutation) = @_;
for (my $i=@$permutation-1; $i>0; --$i) {
if ($permutation->[$i]>$permutation->[$i-1]) {
swap($permutation,$i-1,swap_with($permutation,$i-1));
@$permutation[$i .. @$permutation-1] =
sort @$permutation[$i .. @$permutation-1];
return $permutation;
}
}
return [];
}

sub swap_with {
my ($array_ref,$index) = @_;
my $candidate=$index+1;
for (my $i=$index; $i<@$array_ref; ++$i) {
$candidate=$i
if ($array_ref->[$i]<$array_ref->[$candidate] and
$array_ref->[$i]>$array_ref->[$index]);
}
return $candidate;
}

sub swap {
my ($array_ref,$this_index,$that_index) = @_;
($array_ref->[$this_index],$array_ref->[$that_index]) =
($array_ref->[$that_index],$array_ref->[$this_index]);
}

Sunday, December 23, 2007

כוס מרוקאי


מה מוכרים פה?
את התמונה צילם משה פונטש מתחת לביתו.

Saturday, December 15, 2007

A mention on Dr Dobb's Journal

I was doing a vanity search and found a mention in Dr Dobb's Journal. The mention is with regards to an ANSI C implementation of a Suffix Tree that Dotan Tsadok did under my supervision as his B.Sc. project at Haifa University.

Here is the link to the Dr. Dobb's Data Compression Newsletter Issue #46 - September 2003

Here is the quote (the link that was given there was indeed deprecated a few years later -- the code is available at the following URL http://mila.cs.technion.ac.il/~yona/suffix_tree/):

Another Suffix Tree
September 10, 2003

Just yesterday I ran a pointer to some Suffix Tree code, not a particularly common occurrence here. It just happens that yesterday's article gives much credit to a Suffix Tree implementation by Shlomo Yona, which is pointed to here.

Shlomo's code is designed as a library, and includes some sample code to drive it, as well as some Perl code to make it easier to exercise.

Get a copy while you can; these student Web sites on University servers come and go quickly.



Citation of my work on Hebrew language resources

I just read Characteristics of Prosodic Unit Boundaries in Spontaneous Spoken Hebrew: Perceptual and Acoustic Analysis, an M.A. thesis by Vered Silber Varod.

The website of the knowledge center for processing Hebrew is mentioned there with my name associated with its creation.

Friday, December 14, 2007

Alcatel Launches XML Security Box

I just read Alcatel Launches XML Security Box, Searches for Software Partners and got curios: ha! yet another big-fish is going into the XML market. The press release on Alcatel's website, surprisingly, doesn't say much about neither web services related functionality in general or about XML in particular. So if you see the focus in XML as the article on Network Computing suggests, then Alcatel is also becoming a player in the XML market and should probably be listed in the list of players in the XML/WS market that I made in another post a while ago. If this is indeed the case or not -- let's wait and see.

An advantage, according to Andy Dornan from Network Computing is by specialized XML Software that is combined with XML hardware. This makes me wonder if indeed this special purpose XML hardware will beat the multi-purpose multi-core processors hardware.

Another advantage, according to Dornan, it that the appliance also includes identity management, service provisioning and a UDDI registry. F5 has Firepass which can probably contribute identity management functionality. F5's ASM is already integrated into BIGIP and TMOS, so load balancing and acceleration are already available, along with other functionality and products that F5 has to offer on the same appliance.

It is an interesting announcement. I'll stay tuned to see if more XML related information becomes available.