In the world of Perl programming, we often encounter tasks that require parsing and manipulating strings. Today, we're going to explore my implementation of a string scanner in Perl, inspired by Ruby's StringScanner. This powerful tool allows us to traverse a string, find patterns, and keep track of our position - all with a clean, functional interface.
Let's dive into the code and break down its functionality:
#!/usr/bin/env perl
use strict;
use warnings;
package strscan;
sub create
{
    my ($string) = @_;
    @_ = ();    # Empty @_ (assigning undef would leave a one-element list)
    # start position
    my $pos = 0;
    # end of string position
    my $eos = length ( $string );
    return
    {
        pos       => sub { return $pos },                  # Closure to return current position
        mod_pos   => sub { $pos = shift },                 # Closure to modify position
        eos_check => sub { return $eos == $pos ? 0 : 1; }, # Check if not at end of string
        find      => sub                                   # Closure to find regex
        {
            my $regex = shift;
            # Match regex against substring from current position
            if ( my ($found) = substr ( $string, $pos ) =~ m~($regex)~ )
            {
                # $-[0] contains the start offset of the match within the substring
                my ( $start, $length ) = ( $-[0], length ( $found ) );
                # Update position: add start offset and length of match
                $pos += $start + $length;
                return # Return hash ref with match details
                {
                    pos    => $pos,
                    start  => $start,   # relative to the old position (key name assumed)
                    length => $length,  # key name assumed
                    match  => $found,
                }
            }
            else
            {
                $pos = $eos; # If no match, set position to end of string
                return undef;
            }
        }
    }
}
1;
__END__
This code defines a package called strscan with a single function, create. Let's break down what's happening:
The create function takes a string as input and initializes the scanner.
It sets up two important variables: $pos (the current position in the string) and $eos (the end-of-string position, which is simply the string's length).
The function returns a hash reference containing four closures:
pos: Returns the current position
mod_pos: Sets the current position to a new value
eos_check: Returns 1 while input remains and 0 once the end of the string has been reached
find: The core function that searches for a regex pattern
The find function is where the real magic of our string scanner happens:
It accepts a regex pattern as its input, allowing for flexible searching.
Using substr, it takes the part of the string from the current position onward, then attempts to match the provided regex against this substring.
If a match is found:
It uses $-[0], the first element of Perl's lesser-known @- array, to determine where the match starts. $-[0] contains the offset of the entire match within the string that was matched against. Here that string is the substring, so the offset is relative to the current position.
It calculates the length of the matched string.
It updates the scanner's position by adding both the start offset and the length of the match, effectively moving past the matched portion.
It returns a hash reference containing detailed information about the match: the new position, the match start (relative to the old position), the length, and the matched text itself.
If no match is found:
It moves the position to the end of the string, signaling that scanning is complete.
It returns undef to indicate failure to find a match.
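To make the position arithmetic concrete, here is a minimal standalone sketch of a single find step. The helper name next_match and the keys of the hash it returns are hypothetical and exist only for this illustration; the matching line itself mirrors the one in the scanner.

```perl
use strict;
use warnings;

# next_match is a hypothetical helper written for this illustration only.
sub next_match {
    my ( $string, $pos, $regex ) = @_;

    # Match against the remainder of the string, starting at $pos
    if ( my ($found) = substr( $string, $pos ) =~ m~($regex)~ ) {
        my $start  = $-[0];            # match offset inside the substring
        my $length = length $found;
        return {
            found    => $found,
            start    => $start,                     # relative to $pos
            absolute => $pos + $start,              # offset in the full string
            new_pos  => $pos + $start + $length,    # just past the match
        };
    }
    return undef;
}

# Pretend "This" (4 characters) has already been consumed.
my $m = next_match( 'This is just a test!', 4, '\w+' );
printf "found '%s': relative %d, absolute %d, new pos %d\n",
    @{$m}{qw(found start absolute new_pos)};
# prints: found 'is': relative 1, absolute 5, new pos 7
```

The relative offset is 1 because the substring begins with the space after "This". Adding the old position gives the absolute offset in the full string.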
Now, let's look at how we can use this scanner:
# Demo
package main;
use strscan;    # assumes the module above is saved as strscan.pm on @INC
use feature 'say';
my $scan = strscan::create ( 'This is just a test!' );
say 'start pos: ', $scan->{pos}(), "\n"; # return position
while ( $scan->{eos_check}() )
{
    if ( my $match = $scan->{find}('\w+') )
    {
        say 'match: ', $match->{match};
        say 'pos: ', $match->{pos};
        say '';
    }
}
say 'end pos: ', $scan->{pos}(); # return position
We create a new scanner with the string "This is just a test!".
We print the start position (which is 0).
We enter a loop that continues until we reach the end of the string.
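In each iteration, we search for one or more word characters ('\w+').
For each match, we print the matched word and the new position.
After "test" is matched, only "!" remains. The next find fails, moves the position to the end of the string, and the loop stops.
Finally, we print the end position (20, the length of the string).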
This scanner provides a flexible way to parse strings, allowing us to easily move through the string and find patterns. It's particularly useful for tasks like tokenizing input or parsing structured text.
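As a rough illustration of the tokenizing use case, the sketch below collects every match together with the position reported after it. So that it runs on its own, it carries a condensed copy of the scanner under the hypothetical name create_scanner. The tokenize helper is also an assumption of this example and not part of the module.

```perl
use strict;
use warnings;

# Condensed copy of the scanner so this example is self-contained.
# create_scanner is a hypothetical name used only in this sketch.
sub create_scanner {
    my ($string) = @_;
    my $pos = 0;
    my $eos = length $string;
    return {
        pos       => sub { return $pos },
        eos_check => sub { return $pos == $eos ? 0 : 1 },
        find      => sub {
            my $regex = shift;
            if ( my ($found) = substr( $string, $pos ) =~ m~($regex)~ ) {
                $pos += $-[0] + length $found;    # skip to just past the match
                return { pos => $pos, match => $found };
            }
            $pos = $eos;                          # nothing left to find
            return undef;
        },
    };
}

# tokenize is a hypothetical helper: returns [ token, position ] pairs.
sub tokenize {
    my ( $text, $regex ) = @_;
    my $scan = create_scanner($text);
    my @tokens;
    while ( $scan->{eos_check}() ) {
        my $match = $scan->{find}($regex) or last;
        push @tokens, [ $match->{match}, $match->{pos} ];
    }
    return @tokens;
}

print "$_->[0] (pos $_->[1])\n" for tokenize( 'This is just a test!', '\w+' );
# prints:
# This (pos 4)
# is (pos 7)
# just (pos 12)
# a (pos 14)
# test (pos 19)
```

One caveat: a pattern that can match the empty string would never advance the position, so this sketch assumes patterns such as '\w+' that always consume at least one character.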
The use of closures in this implementation is a powerful Perl technique. It allows us to maintain state (the position and string) without using global variables, providing a clean and encapsulated interface.
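The same closure pattern can be seen in isolation in a much smaller sketch. The make_counter function below is a hypothetical example, unrelated to the scanner itself. It shows that each call gets its own private state, reachable only through the returned closures. The // operator used here requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper, not part of strscan.
sub make_counter {
    my $count = shift // 0;    # private to this particular call
    return {
        value => sub { return $count },
        inc   => sub { return ++$count },
    };
}

my $first  = make_counter();      # starts at 0
my $second = make_counter(10);    # independent state, starts at 10

$first->{inc}() for 1 .. 3;
$second->{inc}();

print $first->{value}(), ' ', $second->{value}(), "\n";    # prints "3 11"
```

Neither counter can see or modify the other's $count, which is the same property that keeps each scanner's $pos and $string isolated.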
In conclusion, this Perl string scanner demonstrates how we can create powerful, Ruby-inspired tools using Perl's flexible syntax and functional programming capabilities. It's a testament to Perl's expressiveness and ability to handle complex string manipulation tasks with elegance.