Avi Freedman's Tech and Biz Topics

on internet plumbing, gadgets, nerd herding, and other related topics

Quick Mailman Search

In case it’s useful to anyone, I wrote a quick proof of concept for
searching Mailman list archives (code below).

The background: I subscribed to a mailing list, and shortly afterward
a call for feedback was posted. One of the points was:

“We’re looking specifically for systems that offer strong search and
archiving features, to help manage the wealth of knowledge that [redacted]
generates. What other specific functionality should we look for in a new
platform?”

It’s a closed list, so Google doesn’t crawl and index the archives.

There was a lot of discussion, and it seemed to me that most people
were fine with the status quo but wanted search added.

I was surprised that no one had done a quick-and-dirty search hack.
The search integrations that exist are relatively complex to set up,
though less cumbersome than porting a Unix utility from one variant
to another was in the 90s…

I posted the code, a demo of it, and an offer to integrate it
(I don’t run any Mailman sites myself), but…

It turns out the bigger issue was probably that the list/group runners
wanted to outsource the whole thing rather than have volunteers
maintain it.

So a few months later, everything is still the same (though they may
eventually move to Discourse, which looks awesome).

So in case it’s useful to anyone, the code is below.

Feel free to ping me; depending on my schedule, I’d be open to moving
it from a proof of concept to a nasty but integrated little hack.

Note: this just scans a set of archived message pages that were
retrieved with wget and saved into a directory (sub/ in the script).
Building an index wouldn’t be too difficult, but for low search volume,
especially on a closed list, scanning every file on each search
shouldn’t be a problem.

#!/usr/bin/perl

use strict;
use warnings;
use CGI qw(:standard);

# Search term from the query string (e.g. ?search=mailman)
my $searchstr = param('search') // '';
my $globalfound = 0;

print "Content-type: text/html\n\n";

# Keep only word characters and spaces, so the term is safe to echo into
# the HTML below (and multi-word phrases still work); trim the ends.
$searchstr =~ s/[^\w ]//g;
$searchstr =~ s/^ +| +$//g;

print "
<html>
<head>
<title>Quick Search Test</title>
</head>
<body>
";

if (length($searchstr) < 3)
{
  print "
<pre>
Sorry, your search request (<b>$searchstr</b>) is too short.
</pre>
</body>
</html>
  ";

  exit(0);
}

print "
<pre>
<b>Search Request:</b>

You asked to search for <b>$searchstr</b>

<b>Search Results:</b>

";

# Scan every archived message page fetched into the sub/ directory
foreach my $file (glob('sub/*'))
{
  my $inbody = 0;
  my $matches = 0;
  my $base = "";
  my $title = "";
  my $date = "";
  my $from = "";

  open(my $in, '<', $file) or next;   # skip files we can't read
  while (<$in>)
  {
    # Header fields from the pipermail message page
    if (/<BASE HREF="(.*)">/) { $base = $1; }
    if (/<TITLE> (.*)/) { $title = $1; }
    if (/<I>(.*20.*)<\/I>/) { $date = $1; }   # date line; assumes a 20xx year
    if (!/Messages so/ && /<B>(.*)<\/B>/) { $from = $1; }

    # Only search the message body, between the begin/end markers
    if (/^<!--beginarticle/)
    {
      $inbody = 1;
    }
    elsif ($inbody)
    {
      if (/^<!--endarticle/)
      {
        $inbody = 0;
      }
      elsif (index($_, $searchstr) > -1)   # case-sensitive substring match
      {
        $matches = 1;
      }
    }
  }
  close($in);

  if ($matches)
  {
    $globalfound++;

    print "[$date] <a href=\"$base\"><b>$title</b> ($from)</a>\n";
  }
}

if ($globalfound == 0)
{
  print "Sorry, no matching articles found.\n";
}

print "
</pre>
</body>
</html>
";

exit;
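On the indexing point in the note before the script, here is a rough sketch of what a one-off indexer could look like, assuming the same sub/ directory of wget'd pipermail pages. The script name build_index.pl, the index file sub-index.db, and the helper subs article_body, build_index and lookup are all hypothetical, not part of the original hack. Note that it swaps the script's case-sensitive substring match for case-insensitive whole-word lookups (words of 3+ characters). It only uses core Perl plus the core Storable module.

```perl
#!/usr/bin/perl
# build_index.pl (hypothetical): a sketch of a one-off indexer for the
# wget'd pipermail pages; not part of the original search script.
use strict;
use warnings;

# Return the message body of one archived page (passed as a list of
# lines): everything between the beginarticle/endarticle markers.
sub article_body {
    my @lines = @_;
    my ($inbody, @body) = (0);
    for (@lines) {
        if (/^<!--beginarticle/) { $inbody = 1; next; }
        if (/^<!--endarticle/)   { $inbody = 0; next; }
        push @body, $_ if $inbody;
    }
    return join '', @body;
}

# Build an inverted index: lowercased word (3+ word characters) =>
# { file name => 1 }. Markup inside the body gets indexed too, which
# is tolerable for a sketch.
sub build_index {
    my %pages = @_;                      # file name => full page text
    my %index;
    for my $file (keys %pages) {
        my $body = article_body(split /^/, $pages{$file});
        $index{lc $1}{$file} = 1 while $body =~ /(\w{3,})/g;
    }
    return \%index;
}

# Case-insensitive whole-word lookup; returns matching file names, sorted.
sub lookup {
    my ($index, $word) = @_;
    return sort keys %{ $index->{lc $word} || {} };
}

# Usage: perl build_index.pl sub/*
# Reads each page named on the command line and saves the index with
# Storable to sub-index.db (a made-up file name).
if (@ARGV) {
    require Storable;
    my %pages;
    for my $file (@ARGV) {
        open(my $in, '<', $file) or next;
        local $/;                        # slurp mode: read the whole file
        $pages{$file} = <$in> // '';
        close($in);
    }
    Storable::store(build_index(%pages), 'sub-index.db');
}
```

The search script could then load the index once per request with `Storable::retrieve('sub-index.db')` and call `lookup` instead of opening every file, only opening the matching pages to pull out the title, date, and sender. The catch is that the index has to be rebuilt every time the wget'd copy of the archive is refreshed.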