From Faber Fedor
Answered By Ben Okopnik, Yann Vernier
From the chaos of creation
just the final form survives
-- The World Inside The Crystal, Steve Savitsky
We could have just posted the finished script in 2c Tips, but there are juicy Perl bits to learn from the crafting. Enjoy. -- Heather
Hey Gang,
I was playing with my new scanner last night (under a legacy OS unfortunately) when I realized a shortcoming: I wanted all of the scanned pages to be in one PDF file, not in separate ones. Well, to that end, I threw together this quick and dirty Perl script to do just that.
The script assumes you have Ghostscript and pdf2ps installed. It takes two arguments: the name of the output file and a directory name that contains all of the PDFs (which have .pdf extensions) to be combined, e.g.
./combine-pdf.pl test.pdf test/
I'm sure you can point out many flaws with the script (like how I grab the command line arguments and clean up after myself), but that's why it's "quick and dirty". If/when I clean it up, I'll repost it.
See attached combine-pdf-faber1.pl.txt
[Ben] If you don't mind, I'll toss in some ideas. See my version at the end.
#!/usr/bin/perl -w
use strict;
Good idea on both.
# n21pdf.pl: A quick and dirty little program to convert multiple PDFs
# to one PDF requires pdf2ps and Ghostscript
# written by Faber Fedor (faber@linuxnj.com) 2003-05-27
if (scalar(@ARGV) != 2 ) {
You don't need 'scalar'. The comparison operator already imposes scalar context, and an array in scalar context returns the number of its elements, so "if ( @ARGV != 2 )" works fine.
Okay. I was trying to get ptkdbi (my fave Perl debugger) to show me the scalar value of @ARGV and the only way was with scalar(). That's also what I found in the Perl Bookshelf.
[Ben] This is the same as "$foo = @foo". $foo is going to contain the number of elements in @foo.
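A minimal sketch of the scalar-context behavior described above. The helper name count_args and the sample list are made up for illustration; they are not part of Faber's script.

```perl
use strict;
use warnings;

# Hypothetical helper: report how many items it was given.
sub count_args {
    my @args  = @_;
    my $count = @args;    # scalar assignment: $count gets the element count
    return $count;
}

my @sample = ('test.pdf', 'test/');

# A numeric comparison puts the array in scalar context, so no scalar() is needed.
print "wrong number of arguments\n" if @sample != 2;

# print supplies list context, so here scalar() really is required.
print "count via scalar(): ", scalar(@sample), "\n";
print "count via helper:   ", count_args(@sample), "\n";
```

This is also why a debugger that displays @ARGV in list context shows the elements rather than the count unless scalar() is wrapped around it.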
my $PDFFILE = shift ;
my $PDFDIR = shift;
You could also just do
my ( $PDFFILE, $PDFDIR ) = @ARGV;
Combining declaration and assignment is perfectly valid.
Cute. I'll have to remember that.
[Ben]
chomp($PDFDIR);
No need; the "\n" isn't part of @ARGV.
$PDFDIR = $PDFDIR . '/' if substr($PDFDIR, length($PDFDIR)-1) ne '/';
Yikes! You could just say "$PDFDIR .= '/'"; an extra slash doesn't hurt anything (part of the POSIX standard, as it turns out).
I know, but I really don't like seeing "a_dir//a_file". I always expect it to fail (although it never does).
[Yann] I'm no Perlist myself, but my first choice would be:
$foo =~ s%/*$%/%;
Which simply ensures that the string ends with exactly one /.
[Ben] That's one of the ten most common "Perl newbie" mistakes that CLPM wizards listed: "Using s/// where tr/// is more appropriate." When you're substituting strings, think "s///"; for characters, go with "tr///".
tr#/##s
Better yet, just ignore it; multiple slashes work just fine.
[Yann] I did say I'm no Perlist. Tr to me would be the translation tool, for replacing characters, including deletion.
[Ben] Yep; that's exactly what it does. However, even the standard utils "tr" can compress strings - which is exactly what was needed here (note the "s"queeze modifier at the end.)
[Yann] Thank you. It's a modifier I had not learned but should have noticed in your mail. The script would have to tack a / onto the end of the string before doing that tr.
[Ben] You're welcome. Yep, either that or use the globbing mechanism the way I did; it eliminates all the hassle.
for ( <$dir/*pdf> ){
=head
Here's the beef, Granny! :) All you get here are the specified files as
returned by "sh". You could also use the actual "glob" keyword which is an
alias for the internal function that implements <shell_expansion> mechanism.
=cut
# Mung individual PDF to heart's content
...
}
[Yann] I don't know how to apply it to the end of the string, which is very easy given a regular expression as the substitute command uses. I'm more used to dealing with sed. Remember, the input data may well look like "/foo/bar/" and not just "bar/".
[Ben] You can't apply it to the end of the string, but then I'd imagine Faber would be just as unhappy with ////foo/////bar////. "tr", as above, will regularize all of that.
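For comparison, here is a small sketch of the two approaches discussed above, side by side. The helper names squeeze_slashes and one_trailing_slash are invented for this illustration; neither appears in the attached scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of slashes into a single slash.
sub squeeze_slashes {
    my ($path) = @_;
    $path =~ tr#/##s;     # empty replacement list plus the "s" modifier squeezes repeats
    return $path;
}

# Hypothetical helper: make the path end in exactly one slash.
sub one_trailing_slash {
    my ($path) = @_;
    $path =~ s%/*$%/%;    # swap zero or more trailing slashes for one
    return $path;
}

print squeeze_slashes('////foo/////bar////'), "\n";   # /foo/bar/
print one_trailing_slash('/foo/bar'), "\n";           # /foo/bar/
print squeeze_slashes('test' . '/' . '/'), "\n";      # test/
```

The tr version tidies the whole path but will not add a missing slash, so the slash has to be appended first; the s/// version only touches the end of the string.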
[Ben]
opendir(DIR, $PDFDIR) or die "Can't open directory $PDFDIR: $! \n" ;
Take a look at "perldoc -f glob" or read up on the globbing operator <*.whatever> in "I/O Operators" in perlop. "opendir" is a little clunky for things like this.
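A rough sketch of the globbing approach, assuming the PDFs sit directly in the named directory. The helper name pdf_files and the directory name "test" are placeholders, not taken from either attached script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the *.pdf files in a directory, sorted by name.
# Caveat: the built-in glob splits its pattern on whitespace, so a directory
# name containing spaces would need different handling.
sub pdf_files {
    my ($dir) = @_;
    my @files = sort { $a cmp $b } glob("$dir/*.pdf");
    return @files;
}

# Prints nothing if the directory is missing or holds no PDFs.
print "$_\n" for pdf_files('test');
```

Unlike readdir, glob hands back names with the directory already prepended, so there is no need to rebuild the path or filter out "." and "..".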
`$PDF2PS $file $outfile` ;
Don't use backticks unless you want the STDOUT output from the command you invoke. "system" is much better for stuff like this and lets you check the exit status.
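The difference can be sketched as follows. The helper name run_command is made up, and the child commands (a second perl that exits with status 3, and "echo") are just stand-ins for pdf2ps; a Unix-like shell is assumed for the backtick line.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and return its exit code,
# or -1 if the command could not be started at all.
sub run_command {
    my @cmd    = @_;
    my $status = system(@cmd);
    return -1 if $status == -1;
    return $status >> 8;    # the exit code lives in the high byte of the status
}

# $^X is the path of the running perl; the child simply exits with status 3.
my $code = run_command($^X, '-e', 'exit 3');
print "exit code: $code\n";

# Backticks are for when the command's STDOUT is what you are after.
my $output = `echo hello`;
chomp $output;
print "captured: $output\n";
```

Because system returns 0 on success, the "system ... and die" idiom used later in the thread fires only when the command fails.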
Note - the following is untested but should work.
See attached combine-pdf-ben1.pl.txt
Thanks, I've cleaned it up and attached it. There's one thing that I couldn't make work, but first...
(now looking inside Ben's version)
die "Usage: ", $0 =~ /([^\/]+)$/, " <outfile.pdf> <directory_of_pdf_files>\n" unless @ARGV == 2;
Uh, that regex there. Take $_, match one or more characters that aren't a / up to the end of line and remember it and place it in $0? Huh?
[Ben] Nope - it's exactly the behavior that Jason was talking about. "die", like "print", takes a list - that's why the members are separated by commas. The "match" operator, =~, says to look in whatever comes before it; "$_" doesn't require it.
print if /gzotz/; # Print $_ if $_ contains "gzotz"
print if $foo =~ /gzotz/; # Print $_ if $foo contains "gzotz"
print $foo if /gzotz/; # Print $foo if $_ contains "gzotz"
So, what I'm doing is looking at what's in "$0", and capturing/returning the part in the parens as per standard list behavior. It's a cute little trick.
I guess I will have to do this one soon in my One-Liner articles; it's a useful little idiom.
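A short sketch of that idiom in isolation. The helper name program_name and the sample path are invented for the example; the regex is the one from Ben's usage line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last path component, as the usage line does with $0.
sub program_name {
    my ($path) = @_;
    my ($name) = $path =~ /([^\/]+)$/;    # list context: the match returns its captures
    return $name;
}

my $path = '/usr/local/bin/combine-pdf.pl';

# The arguments to print (or die) are in list context, so the capture is what gets printed.
print "Usage: ", program_name($path), " <outfile.pdf> <directory_of_pdf_files>\n";

# In scalar context the same match only reports success or failure.
my $matched = $path =~ /([^\/]+)$/;
print "scalar context gives: $matched\n";
```

The parentheses around $name on the left-hand side are what force list context; without them, $name would receive the success flag instead of the captured text.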
I had to move a few things around to get it to work. I did have one problem, though:
#convert ps files to a pdf file
system $GS, $GS_ARGS, $filelist and die "Problem combining files!\n";
This did not work no way, no how. I kept getting "/undefinedfilename" from GS no matter how I quoted it (and I used every method I found in the Perl Bookshelf).
[Ben] Hm. I didn't try it, but -
perl -we'$a="ls"; $b="-l"; $c="Docs"; system $a, $b, $c and die "Fooey!\n"'
That works fine. I wonder what "gs"s hangup was. Oh, well - you got it going, anyway. I guess there's not much of a security issue in handing it to "sh -c" instead of execvp()ing it in this case: the perms will take care of all that.
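One possible explanation, which the thread does not confirm: in list form, system hands each list element to the program as a single argument, with no word splitting. If $GS_ARGS or $filelist held several space-separated items in one string, gs would have received each string as one odd-looking argument, whereas a single command string is split into words by the shell. The sketch below illustrates the difference; the helper child_arg_count and the option string are assumptions, since the real value of $GS_ARGS is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: start a child perl and return how many arguments it received.
sub child_arg_count {
    my @args = @_;
    # The child reports its argument count through its exit status.
    # "--" stops the child perl from treating our arguments as its own switches.
    system($^X, '-e', 'exit scalar(@ARGV)', '--', @args);
    return $? >> 8;
}

# Assumed stand-in for $GS_ARGS: several options packed into one string.
my $gs_args = '-q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite';

print "as one string:  ", child_arg_count($gs_args), "\n";              # 1
print "split on space: ", child_arg_count(split ' ', $gs_args), "\n";   # 4
```

Under that assumption, splitting the strings into lists before calling system would keep the list form usable without going through the shell.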
To get it to finally work, I did:
#convert ps files to a pdf file
my $cmd_string = $GS . $GS_ARGS . $filelist ;
system $cmd_string and die "Problem combining files!\n";
<shrug>
Anywho, here's the final (?) working copy:
See attached combine-pdf-faber2.pl.txt
[Ben] Cool! Glad I could help.