"Linux Gazette...making Linux just a little more lovable!"


Procmail Mini-Tutorial:

Automated Mail Handling

by Jim Dennis, Proprietor, Starshine Technical Services

Converted to HTML by Heather Stern


procmail is the mail processing utility language written by Stephen van den Berg of Germany. This article provides a bit of background for the intermediate Unix user on how to use procmail.

As a "little" language (to use the academic term) procmail lacks many of the features and constructs of traditional, general-purpose languages. It has no "while" or "for" loops. However it "knows" a lot about Unix mail delivery conventions and file/directory permissions -- and in particular about file locking.

Although it is possible to write a custom mail filtering script in any programming language using the facilities installed on most Unix systems -- we'll show that procmail is the tool of choice among sysadmins and advanced Unix users.

Unix mail systems consist of MTA's (mail transport agents like sendmail, smail, qmail mmdf etc), MDA's (delivery agents like sendmail, deliver, and procmail), and MUA's (user agents like elm, pine, /bin/mail, mh, Eudora, and Pegasus).

On most Unix systems on the Internet sendmail is used as an integrated transport and delivery agent. sendmail and compatible MTA's have the ability to dispatch mail *through* a custom filter or program through either of two mechanisms: aliases and .forwards.

The aliases mechanism uses a single file (usually /etc/aliases or /usr/lib/aliases) to redirect mail. This file is owned and maintained by the system administrator. Therefore you (as a user) can't modify it.

The ".forward" mechanism is decentralized. Each user on a system can create a file in their home directory named .forward and consisting of an address, a filename, or a program (filter). Usually the file *must* be owned by the user or root and *must not* be "writeable" by other users (good versions of sendmail check these factors for security reasons).

It's also possible, with some versions of sendmail, for you to specify multiple addresses, programs, or files, separated with commas. However we'll skip the details of that.

You could forward your mail through any arbitrary program with a .forward that consisted of a line like:

"|$HOME/bin/your.program -and some arguments"

Note the quotes and the "pipe" character. They are required.

"Your.program" could be a Bourne shell script, an awk or perl script, a compiled C program or any other sort of filter you wanted to write.

However "your.program" would have to be written to handle a plethora of details about how sendmail would pass the messages (headers and body) to it, how you would return values to sendmail, how you'd handle file locking (in case mail came in while "your.program" was still processing one, etc).

That's what procmail gives us.

What I've discussed so far is the general information that applies to all sendmail compatible MTA/MDA's.

So, to ensure that mail is passed to procmail for processing the first step is to create the .forward file. (This is safe to do before you do any configuration of procmail itself -- assuming that the package's binaries are installed). Here's the canonical example, pasted from the procmail man pages:

"|IFS=' '&&exec /usr/local/bin/procmail -f-||exit 75 #YOUR_USERNAME"

This seems awfully complicated compared to my earlier example. That's because my example was flawed for simplicity's sake.

What this mess means to sendmail (paraphrasing into English) is:

This complicated line can be just pasted into most .forward files, minimally edited and forgotten.

If you did this and nothing else your mail would basically be unaffected. procmail would just look for its default recipe file (.procmailrc) and finding none -- it would perform its default action on each messages. In other words it would append new messages into your normal spool file.

If your ISP uses procmail as its local delivery agent then you can skip the whole part of about using the .forward file -- or you can use it anyway.

In either event the next step to automating your mail handling is to create a .procmailrc file in your home directory. You could actually call this file anything you wanted -- but then you'd have to slip the name explicitly into the .forward file (right before the "||" operator). Almost everyone just uses the default.

Now we can get to a specific example. So far all we've talked about it how everything gets routed to procmail -- which mostly involves sendmail and the Bourne shell's syntax. Almost all sendmail's are configured to use /bin/sh (the Bourne shell) to interpret alias and .forward "pipes."

So, here's a very simple .procmailrc file:

:0c:
$HOME/mail.backup

This just appends an extra copy of all incoming mail to a file named "mail.backup" in your home directory.

Note that a bunch of environment variables are preset for you. It's been suggested that you should explicity set SHELL=/bin/sh (or the closest derivative to Bourne Shell available on your system). I've never had to worry about that since the shells I use on most systems are already Bourne compatible.

However, csh and other shell users should take note that all of the procmail recipe examples that I've ever seen use Bourne syntax.

The :0 line marks the beginning of a "recipe" (procedure, clause, whatever. :0 can be followed be any of a number of "flags." There is a literally dizzying number of ways to combine these flags. The one flag we're using in this example is 'c' for "copy."

You might ask why the recipe starts with a :0. Historically you used to use :x (where x was a number). This was a hint to procmail that the next x lines were conditions for this recipe. Later, the option was added to precede conditions with a leading asterisk -- so they didn't have to be manually counted. :0 then came to mean something like: "count them yourself."

The second colon on this line marks the end of the flags and the beginning of the name for a lockfile. Since no name is given procmail will pick one automatically.

This bit is a little complicated. Mail might arrive in bursts. If a new message arrives while your script is still busy processing the last message -- you'll have multiple sendmail processes. Each will be dealing with one message. This isn't a problem by itself. However -- if the two processes might try to write into one file at the same time they are likely to get jumbled in unpredictable ways (the result will not be a properly formatted mail folder).

So we hint to procmail that it will need the check for and create a lockfile. In this particular case we don't care what the name of the lock file would be (since we're not going to have *other* programs writing into the backup file). So we leave the last field (after the colon) blank. procmail will then select its own lockfile name.

If we leave the : off of the recipe header line (ommitting the last field entirely) then no lockfile is used.

This is appropriate whenever we intend to only read from the files in the recipe -- or in cases where we intend to only write short, single line entries to a file in no particular order (like log file entries).

The way procmail works is:

It receives a single message from sendmail (or some sendmail compatible MTA/MDA). There may be several procmail processing running currently since new messages may be coming in faster than they are being processed.

It opens its recipe file (.procmailrc by default or specified on its command line) and parses each recipe from the first to the last until a message has been "delivered" (or "disposed of" as the case may be).

Any recipe can be a "disposition" or "delivery" of the message. As soon as a message is "delivered" then procmail closes its files, removes its locks and exits.

If procmail reaches the end of it's rc file (and thus all of the INCLUDE'd files) without "disposing" of the message -- than the message is appended to your spool file (which looks like a normal delivery to you and all of your "mail user agents" like Eudora, elm, etc).

This explains why procmail is so forgiving if you have *no* .procmailrc. It simply delivers your message to the spool because it has reached the end of all its recipes (there were none).

The 'c' flag causes a recipe to work on a "copy" of the message -- meaning that any actions taken by that recipe are not considered to be "dispositions" of the message.

Without the 'c' flag this recipe would catch all incoming messages, and all your mail would end up in mail.backup. None of it would get into your spool file and none of the other recipes would be parsed.

The next line in this sample recipe is simply a filename. Like sendmail's aliases and .forward files -- procmail recognizes three sorts of disposition to any message. You can append it to a file, forward it to some other mail address, or filter it through a program.

Actually there is one special form of "delivery" or "disposition" that procmail handles. If you provide it with a directory name (rather than a filename) it will add the message to that directory as a separate file. The name of that file will be based on several rather complicated factors that you don't have to worry about unless you use the Rand MH system, or some other relatively obscure and "exotic" mail agent.

A procmail recipe generally consists of three parts -- a start line (:0 with some flags) some conditions (lines starting with a '*' -- asterisk -- character) and one "delivery" line which can be file/directory name or a line starting with a '!' -- bang -- character or a '|' -- pipe character.

Here's another example:

:0
* ^From.*someone.i.dont.like@somewhere.org
/dev/null

This is a simple one consisting of no flags, one condition and a simple file delivery. It simply throws away any mail from "someone I don't like." (/dev/null under Unix is a "bit bucket" -- a bottomless well for tossing unwanted output DOS has a similar concept but it's not nearly as handy).

Here's a more complex one:

:0
* !^FROM_DAEMON
* !^FROM_MAILER
* !^X-Loop: myaddress@myhost.mydomain.org
| $HOME/bin/my.script

This consists of a set of negative conditions (notice that the conditions all start with the '!' character). This means: for any mail that didn't come from a "daemon" (some automated process) and didn't come a "mailer" (some other automated process) and which doesn't contain any header line of the form: "X-Loop: myadd..." send it through the script in my bin directory.

I can put the script directly in the rc file (which is what most procmail users do most of the time). This script might do anything to the mail. In this case -- whatever it does had better be good because procmail way will consider any such mail to be delivered and any recipes after this will only be reached by mail from DAEMONs, MAILERs and any mail with that particular X-Loop: line in the header.

These two particular FROM_ conditions are actually "special." They are preset by procmail and actually refer to a couple of rather complicated regular expressions that are tailored to match the sorts of things that are found in the headers of most mail from daemons and mailers.

The X-Loop: line is a normal procmail condition. In the RFC822 document (which defines what e-mail headers should look like on the Internet) any line started with X- is a "custom" header. This means that any mail program that wants to can add pretty much any X- line it wants.

A common procmail idiom is to add an X-Loop: line to the header of any message that we send out -- and to check for our own X-Loop: line before sending out anything. This is to protect against "mail loops" -- situations where our mail gets forwarded or "bounced" back to us and we endlessly respond to it.

So, here's a detailed example of how to use procmail to automatically respond to mail from a particular person. We start with the recipe header.

:0

... then we add our one condition (that the mail appears to be from the person in question):

* ^FROMharasser@spamhome.com

FROM is a "magic" value for procmail -- it checks from, resent-by, and similar header lines. You could also use ^From: -- which would only match the header line(s) that start with the string "From:"

The ^ (hiccup or, more technically "caret") is a "regular expression anchor" (a techie phrase that means "it specifies *where* the pattern must be found in order to match." There is a whole book on regular expression (O'Reilly & Associates). "regexes" permeate many Unix utilities, scripting languages, and other programs. There are slight differences in "regex" syntax for each application. However the man page for 'grep' or 'egrep' is an excellent place to learn more.

In this case the hiccup means that the pattern must occur at the beginning of a line (which is its usual meaning in grep, ed/sed, awk, and other contexts).

... and we add a couple of conditions to avoid looping and to avoid responding to automated systems

* !^FROM_DAEMON
* !^FROM_MAILER

(These are a couple more "magic" values. The man pages show the exact regexes that are assigned to these keywords -- if you're curious or need to tweak a special condition that is similar to one or the other of these).

... and one more to prevent some tricky loop:

* !^X-Loop: myaddress@myhost.mydomain.org

(All of these patterns start with "bangs" (exclammation points) because the condition is that *no* line of the header start with any of these patterns. The 'bang' in this case (and most other regex contexts) "negates" or "reverses" the meaning of the pattern).

... now we add a "disposition" -- the autoresponse.

| (formail -rk \
-A "X-Loop: yourname@youraddress.com" \
-A "Precendence: junk"; \
echo "Please don't send me any more mail";\
echo "This is an automated response";\
echo "I'll never see your message";\
echo "So, GO AWAY" ) | $SENDMAIL -t -oi

This is pretty complicated -- but here's how it works:

Most of the difficulty in understanding procmail as nothing to do with procmail itself. The intricacies of regular expressions (those wierd things on the '*' -- conditional lines) and shell quoting and command syntax, and how to format a reply header that will be acceptable to sendmail (the 'formail' and 'sendmail' stuff) are the parts that require so much explanation.

The best info on mailbots that I've found used to be maintained by Nancy McGough (sp??) at the Infinite Ink web pages:

http://www.jazzie.com/ii/

More information about procmail can be found in Era Eriksson's "Mini-FAQ." at http://www.iki.fi/~era/procmail/mini-faq.html

I also have a few procmail and SmartList links off of my own web pages.


Copyright © 1997, James T. Dennis
Published in Issue 14 of the Linux Gazette


[ TABLE OF 
CONTENTS ] [ FRONT PAGE ]  Back  Next