Need help searching a LISTSERV® archive

The Search-and-Ye-Shall-Find Tutorial
© 1997, 1998 Jared Weinberger
Last update: Sept. 9, 1998

  A few basics to keep in mind
  How to order posts
  Search examples with comments
  Summary of basic search patterns
  Valid date formats
  Web-based searching  
  Search tips and additional information
 

Catching up on missed mail

  FAQs

You don't need a PhD in computer science to use the database; au contraire, it's a pity not to take advantage of it. The new software is easier than ever to use, too! Here are some sample searches of the Opera-L archive. Find one that suits your need and copy & paste it into your mail program. Then edit it for your specific search, send and voilà! You can use your prowess to search any archive on any LISTSERV: just substitute your listname for Opera-L and change the LISTSERV address, of course.

A few basics to keep in mind

1. The examples below illustrate how to search an archive by e-mail using the SEARCH command and how to order posts of interest with the GETPOST command. Some lists, Opera-L included, can also be accessed and searched on the Web. While searching on the Web is much easier, the basic concepts and syntax are very similar. See Web-based searching for a summary.

2. Make certain your search is a message on a single line in a post with no subject header. (Actually the subject line is ignored in searches; if your mail program requires that you have one, you can type a single character like "/".) Send all searches to listserv@listserv.cuny.edu (or the address of the LISTSERV of interest) and never to the list itself! The same holds for your requests for back posts. Do not add your signature (or, if you have it automatically added, remove it from your search post). Posts to LISTSERV should contain valid LISTSERV commands only and nothing extraneous.

3. Capitalization generally does not matter. In the examples below, the words of the search language are capitalized for clarity only: lowercase works just as well.

4. A more detailed document (not without its own problems) explaining the database, including the more advanced features, is available. To request it, send LISTSERV a post with no subject header and the one-line message:
GET LISTDB MEMO

5. In trial searches you do to gain familiarity, you can help keep your LISTSERV system running smoothly (especially for archives of high-volume lists) by limiting the time period so that only a small portion of the archive is searched.
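If you would rather prepare your command mail with a script than by hand, point 2 above might look something like the following in Python. This is only a sketch: the function name build_listserv_mail and the address me@example.com are made up for illustration, and the message is built but not sent.

```python
from email.message import EmailMessage


def build_listserv_mail(command, sender, listserv="listserv@listserv.cuny.edu"):
    """Return a mail message whose body is exactly one LISTSERV command line.

    Hypothetical helper: the name and signature are illustrative only.
    """
    command = command.strip()
    if "\n" in command:
        raise ValueError("put the command on a single line")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = listserv  # the LISTSERV address, never the list itself
    # No Subject header is set (LISTSERV ignores it) and no signature is added.
    msg.set_content(command + "\n")
    return msg


msg = build_listserv_mail("SEARCH domingo IN opera-L", "me@example.com")
print(msg.get_content())
```

Handing the message to a mail server (for example with smtplib) is left out on purpose, since that part depends entirely on your own mail setup.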

How to order posts

LISTSERV will send you back the results of your search (assuming you keep reading and learn how to do one). You will see a list of the archived posts meeting your criteria, with their item # and subject lines, plus a bit of context showing the match. You request the full post(s) by sending a GETPOST command to LISTSERV, which must be a one-line message with no subject header and no signature, e.g.

GETPOST opera-L 4362 24605 1964

Note that you have to give the list name, there are no # signs or commas in this line, and posts do not have to be in numerical order. Each non-empty search comes back to you with a GETPOST line ready to edit and send, BUT a long string of post numbers will be truncated, and you will receive only some of the posts followed by an "Unknown command" message. Two solutions: (1) break it up into separate lines, each starting with GETPOST (the posts will arrive in separate e-mail messages from LISTSERV, one per GETPOST command).

GETPOST opera-L 1964 4362 4381 4458 12172 22496 24605 25905 27145
GETPOST opera-L 30905 32709 34739 36577 37978 41212

or (2) start the first line with "// " (slash, slash, space) and end each but the last line with " ," (space comma), e.g.

// GETPOST opera-L 1964 4362 4381 4458 12172 22496 24605 25905 27145 ,
30905 32709 34739 36577 37978 41212

You can also specify ranges, e.g. GETPOST opera-L 840 859 19464-19472

Note that these post numbers may well change in time: as the listowner deletes duplicate posts and posts of little interest, the archives are renumbered. Thus the same search may return a different set of numbers at a later date - posts may be added, deleted or renumbered.
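Solution (1) and the range notation can be combined in a few lines of Python. The sketch below is an illustration, not part of LISTSERV: the function names and the width of 72 characters are my own assumptions (pick whatever your mail program sends safely on one line).

```python
def compress(numbers):
    """Collapse runs of consecutive post numbers into 'first-last' ranges."""
    nums = sorted(set(numbers))
    tokens = []
    i = 0
    while i < len(nums):
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        tokens.append(str(nums[i]) if i == j else f"{nums[i]}-{nums[j]}")
        i = j + 1
    return tokens


def getpost_lines(listname, numbers, width=72):
    """Return one or more complete GETPOST commands, each at most `width` long."""
    prefix = f"GETPOST {listname}"
    lines = []
    current = prefix
    for tok in compress(numbers):
        # Start a new command when the next item would not fit on this line.
        if current != prefix and len(current) + 1 + len(tok) > width:
            lines.append(current)
            current = prefix
        current += " " + tok
    if current != prefix:
        lines.append(current)
    return lines


for line in getpost_lines("opera-L", [840, 859] + list(range(19464, 19473))):
    print(line)
```

Each line returned is a command in its own right, so the posts would arrive in separate messages, one per GETPOST line.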

Search examples (you can copy, paste and edit these)

SEARCH domingo IN opera-L

This is the basic form of the search command. Changes in the capitalization of any of the words in this example will not affect the results. Note that there are no quotation marks. No dates are specified, so the entire archive is searched. Each post is searched in its entirety, including the sender, subject line and message.

SEARCH tosca price IN opera-L
or
SEARCH tosca NEAR price IN opera-L

Searches for all posts that contain "tosca" near "price". NEAR is the "default operator", so the two examples above give identical results. (The manual, LISTDB MEMO, incorrectly states that the default operator is AND, when it is in fact NEAR.) For a NEAR search to consider a post to be a match, there must be five or fewer intervening words and the two words can be no more than one line apart (the order of the two words, however, is not important). Matches for this example would include cases like "priced", "caprices" and "Toscanini" -- the "keyword in context" in the search results should help you decide which posts might be of interest. Note that there are no quote marks. Since no dates are specified in this search, all posts in the archive are searched.
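To get a feel for the NEAR rule described above, here is a rough Python model of it. It is my own approximation, not LISTSERV's code: it assumes words are separated by whitespace, that a search word matches any word containing it (ignoring case), and that "one line distant" means the same or an adjacent line. The real matching may differ in the details.

```python
def near_match(text, a, b, max_between=5):
    """Rough model of NEAR: substring hits, few words between, adjacent lines."""
    a, b = a.lower(), b.lower()
    hits_a, hits_b = [], []
    pos = 0  # running word position across the whole post
    for line_no, line in enumerate(text.lower().splitlines()):
        for word in line.split():
            if a in word:
                hits_a.append((pos, line_no))
            if b in word:
                hits_b.append((pos, line_no))
            pos += 1
    # Order does not matter; only the distance in words and lines does.
    return any(
        pa != pb and abs(pa - pb) - 1 <= max_between and abs(la - lb) <= 1
        for pa, la in hits_a
        for pb, lb in hits_b
    )


print(near_match("Toscanini conducted the caprices", "tosca", "price"))
```

The sample call shows why the "keyword in context" lines are worth reading: neither opera nor ticket prices are mentioned, yet the post counts as a match.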

SEARCH tosca AND price IN opera-L

The AND operator must be used if you want posts that have the two words anywhere in the post.

SEARCH boehm OR bohm OR bo"hm IN opera-L FROM may 96 TO july 96

Searches for all posts with any of these spellings. The OR operator is used when any of your criteria will do. If you mistakenly use AND in this search, it will probably come back with no matches, since no single post will have all the variants. (Note that you wouldn't bother searching for the real spelling, böhm, since the archived posts contain no accented letters and not all e-mail programs handle them correctly anyway.) FROM...TO is always used as a pair, with two dates. Time frame includes all of May and all of July.

SEARCH 'light baritone' IN opera-L SINCE jan 97

The quote marks mean you want these words to be adjacent in the post. Note that single quote marks ensure a wide search net, catching both "Light baritone" and "light baritones". But a search without the quotes might be better here, as this example misses a post that contains "light-timbred baritones". A SINCE expression takes one date. The search will begin with 1 Jan 97 and end with the latest archived post.

SEARCH light NEAR bariton IN opera-L SINCE jan 97
or
SEARCH light bariton IN opera-L SINCE jan 97

Probably a better way of handling the previous search. The two forms are identical, since NEAR is the default operator in a search for separate words. Dropping the "e" in "baritone" caught one post that had "light" near the adjective "baritonal". In general use the root or base form of a word: ticket will find ticket, tickets, ticketing, etc. Take into account common misspellings and variants, e.g. traveling or travelling; center or centre.

SEARCH 'opera news' IN opera-L SINCE TODAY-15

Finds all posts containing the string "opera news" in the last 15 days (the archive is searched forward, not backward, in time). The single quotes ensure that the two words are adjacent yet will find any combination of capitalization: Opera News, opera news, OPERA NEWS, etc. (N.B. There can be no spaces on either side of the minus sign.)

SEARCH 'opera news' IN opera-L UNTIL feb 96

Finds all posts containing the string "opera news" from the earliest archived post through Feb. 1996. Only the first 100 matches (of over 200 in this time frame) are returned. Search again with a FROM...TO.. expression to list those after the first 100. An UNTIL expression takes one date.

SEARCH "White" IN opera-L FROM jan 96 TO dec 96
or
SEARCH "White" IN opera-L FROM 96 TO 96

Assume you're looking for a name, White, Whiteman, Whiteside, or something similar that you can't quite remember, in all of 1996. The double quote marks mean you want an exact match in capitalization. The 131 matches are more manageable than the 487 matches for the same search without quotes (only the first 100 are returned; change the time frame to see others). Note that all of 1996 is covered in these equivalent searches.

SEARCH 'ashoka''s dream' IN opera-l

If you need single quote mark(s) (i.e. apostrophes) inside single quotes, you must double the single-quote mark(s). N.B. before the 's' there are two apostrophes. BTW,
SEARCH ashoka IN opera-l
would really be sufficient for the search in this case. Double quotes inside double quotes must also be doubled. Single quote mark(s) inside double quotes (and vice versa) should not be doubled. Got that?
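If you build search phrases in a script, the doubling rule is easy to automate. A minimal Python sketch (the function name quote_phrase is my own invention):

```python
def quote_phrase(phrase, exact_case=False):
    """Wrap a phrase in quotes, doubling any quote mark of the same kind.

    Single quotes ignore capitalization; double quotes require an exact match.
    """
    q = '"' if exact_case else "'"
    return q + phrase.replace(q, q + q) + q


print("SEARCH " + quote_phrase("ashoka's dream") + " IN opera-l")
```

Only the quote mark that matches the enclosing pair is doubled; the other kind is left alone, as the rule above requires.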

SEARCH * IN opera-L SINCE jan 97 WHERE SENDER CONTAINS rkosovsk

Finds all posts that Bob Kosovsky sent since January 1, 1997. The asterisk (*) is used to represent "everything". N.B. the SENDER is the e-mail address of the sender only and does not include the "name" portion that you usually also see in the FROM line. If you need the current e-mail address of someone, you can use the SCAN command with all or part of the name. For example,
SCAN OPERA-L pete
returned a list of 21 names containing the substring "pete" along with their e-mail addresses. Note: some lists cannot be scanned; this decision rests with the listowner.

SEARCH arabella IN opera-L WHERE SENDER CONTAINS jared

Searches for all of Jared's posts containing "arabella" . (I've posted from more than one address, which is why I did not use my full, current mailing address.)

SEARCH arabella IN opera-L WHERE SENDER DOES NOT CONTAIN jared

All posts containing "arabella" except for Jared's. Only the first 100 posts are listed. Search with a FROM... TO... date expression to list more.

SEARCH * IN opera-L UNTIL may 96 WHERE SUBJECT CONTAINS tosca

Finds all posts from the earliest through May 96 where the subject line contains "tosca". Note that * is needed to represent "everything". While there are some 180 posts meeting these criteria, only the first 100 are returned (use a FROM...TO expression to list the others). Note that UNTIL may 96 is the same as UNTIL 31 may 96. UNTIL or SINCE with a month name and no day always includes the whole month in the search.

SEARCH * IN opera-L WHERE SUBJECT CONTAINS (tosca OR butterfly)

Finds all posts with either word in the subject line. Note that multiple words in the CONTAINS expression must be placed inside parentheses. This CONTAINS expression is equivalent to:
...WHERE SUBJECT CONTAINS tosca OR SUBJECT CONTAINS butterfly

SEARCH manon BUT NOT 'manon lescaut' IN opera-L

BUT NOT is the same as AND NOT. It is perhaps easier to understand. There are 380 posts compared to the 653 for a search of manon alone (only the first 100 are returned). Beware, however, that there is a price to pay with this technique: you will not catch posts that mention both operas!

// SEARCH * in opera-L since march 97 WHERE SUBJECT CONTAINS tosca and ,
sender IS johndoe@aol.com

Finds all posts from johndoe@aol.com with tosca in the subject line from March 1, 97 to the present. The IS (you can also use "=") means that you want an exact match; CONTAINS will generally suffice (see example above). This search command may well be longer than the width setting of your e-mail program, but LISTSERV requires it be on "one line". The solution is to start long commands with // SEARCH (note the obligatory space after the double slash) and end each but the last line with a space and a comma. This ensures that all the text is interpreted as one line.
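Splitting a long command by hand is error-prone, so here is a small Python sketch that does it. The function name, the regular expression and the default width of 72 are my own choices, not anything LISTSERV prescribes. Quoted strings are kept whole so that none of them spans two lines.

```python
import re

# A token is a single-quoted string, a double-quoted string, or a bare word.
# Doubled quote marks inside a quoted string are treated as part of it.
TOKEN = re.compile(r"""'(?:[^']|'')*'|"(?:[^"]|"")*"|\S+""")


def continuation_lines(command, width=72):
    """Split a long command into '// ... ,' continuation lines."""
    if len(command) <= width:
        return [command]  # short enough: send it as an ordinary one-liner
    lines, current = [], "//"
    for tok in TOKEN.findall(command):
        # Leave room for the trailing " ," on every line but the last.
        if current != "//" and len(current) + 1 + len(tok) + 2 > width:
            lines.append(current + " ,")
            current = tok
        else:
            current += " " + tok
    lines.append(current)
    return lines


cmd = "SEARCH 'four last songs' OR fls OR 'vier letzte lieder' OR vll IN Opera-L"
print("\n".join(continuation_lines(cmd, width=60)))
```

The first line starts with the double slash and a space, and every line but the last ends with a space and a comma, which is what the rule above asks for.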

SEARCH tosca IN opera-L.100-210
SEARCH tosca IN opera-L.-20000
SEARCH tosca IN opera-L.20000-

This form of SEARCH is useful when you want to search a range of posts by their item # (which you might have, for example, from the results of a previous search). The first example searches post #100 through #210; the second searches from #1 to #20000; and the third from #20000 to the most recent post.

* * *

The best way to learn is to experiment. Even when you make a mistake, the error message that LISTSERV sends back explains where you went wrong. For more complex searches and other advanced search features, see the above-mentioned file LISTDB MEMO available from LISTSERV.

Summary of some basic search patterns

SEARCH <w> IN opera-L
SEARCH <w> IN opera-L FROM <date1> TO <date2>
SEARCH <w> IN opera-L SINCE <date>
SEARCH <w> IN opera-L UNTIL <date>
SEARCH <w> IN opera-L SINCE TODAY-<number of days>
SEARCH * IN opera-L WHERE SUBJECT CONTAINS ...
SEARCH * IN opera-L WHERE SENDER CONTAINS ...
SEARCH <w> IN opera-L WHERE SENDER DOES NOT CONTAIN ...
SEARCH * IN opera-L SINCE <date> WHERE SENDER CONTAINS ...
SEARCH * IN opera-L SINCE <date> WHERE SUBJECT CONTAINS ...
// SEARCH * IN opera-L FROM <date1> TO <date2> WHERE ,
SUBJECT CONTAINS ... AND SENDER CONTAINS ...

<w> is your search word(s). Use * for "everything"  
Note that when you have both, the time expression goes before the WHERE expression.
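The patterns above all follow one shape, which the Python sketch below assembles for you. The function name build_search and its parameters are hypothetical; it covers only the CONTAINS form of WHERE and simply puts the pieces in the order the summary gives, with the time expression before WHERE.

```python
def build_search(words, listname, when="", subject=None, sender=None):
    """Assemble a SEARCH command; `when` is a ready-made time expression."""
    parts = [f"SEARCH {words} IN {listname}"]
    if when:
        parts.append(when)  # the time expression goes before WHERE
    conditions = []
    if subject:
        conditions.append(f"SUBJECT CONTAINS {subject}")
    if sender:
        conditions.append(f"SENDER CONTAINS {sender}")
    if conditions:
        parts.append("WHERE " + " AND ".join(conditions))
    return " ".join(parts)


print(build_search("*", "opera-L", when="SINCE jan 97", sender="rkosovsk"))
```

Use * for the words when you want everything and are limiting only by subject, sender or date.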
   
Valid date formats  
  15 october 95   In the first four examples, a four-digit year can be used, e.g. 1995
  october 95      If a month is used without a day, the entire month is included in the search
  oct 95          Any unambiguous month abbreviation is valid: d 95 is valid, ju 95 is not
  15 Oct 95       Capitalization is not important
  oct             If the year is omitted, the current year is assumed
  95/10/15        With slashes, a year/month/day format is obligatory; two-digit year only
  95-10-15        With hyphens, a year-month-day format is obligatory; two-digit year only
  7/21            With slash, a month/day format is obligatory (current year assumed)
  7-21            With hyphen, a month-day format is obligatory (current year assumed)
  today           The day of the search. Convenient for relative searching.
  today-20        n days back (20 in this case) from the search day; no space before or after "-"
  yesterday       The day before the search. Convenient for relative searching.
  yesterday-30    n days back (30 in this case) from yesterday

An example of a time expression: FROM yesterday-30 TO yesterday-15
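If your dates come from a program, two of the formats above are especially easy to generate: the year/month/day form with slashes and the relative form. A minimal Python sketch, with function names of my own choosing:

```python
from datetime import date


def listserv_date(d):
    """Format a date as yy/mm/dd (two-digit year, as the slash form requires)."""
    return d.strftime("%y/%m/%d")


def days_back(n, anchor="today"):
    """Relative date such as today-20; no spaces around the minus sign."""
    if anchor not in ("today", "yesterday"):
        raise ValueError("anchor must be 'today' or 'yesterday'")
    return f"{anchor}-{n}"


def time_expression(start=None, end=None):
    """FROM...TO needs two dates; SINCE and UNTIL take one each."""
    if start and end:
        return f"FROM {start} TO {end}"
    if start:
        return f"SINCE {start}"
    if end:
        return f"UNTIL {end}"
    return ""


print(time_expression(days_back(30, "yesterday"), days_back(15, "yesterday")))
print(time_expression(listserv_date(date(1995, 10, 15))))
```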

Searching with the Web Interface

Opera-L and some other lists can also be conveniently accessed and searched on the Web. The address for Opera-L is http://listserv.cuny.edu/archives/opera-l.html -- check with your list owner about your list. The Search Form uses the same basic syntax as in e-mail searches. Leave a field blank when you do not want to limit the search. N.B. You must check the Substring Search box to find embedded cases, e.g. if you want price to also find prices.

Search: name of the list (choose from those listed).
For: word or words to find anywhere in message.

See examples above for use of AND, NOT, BUT NOT, OR (implicit operator is NEAR) and use of single and double quotes. Briefly:

price [finds Price, price, caprices]
light baritone [finds light close to baritone, e.g. light, pleasant baritone; same as light near baritone]
light and baritone [finds posts that contain light and baritone anywhere]
'opera news' [finds Opera News, opera news, OPERA NEWS, but not opera in the news]
"Opera News" [finds Opera News, but not opera news or OPERA NEWS]
domingo or 'three tenors' [finds Domingo or three tenors]

Note that you cannot leave all the fields blank: the "everything wildcard" (*) is necessary in the For field if you leave all the other fields blank. This will let you see all posts (useful only for very small lists).

In messages where
the subject is or contains: word or words to find in the Subject of posts. Leave blank when you do not want to limit by subject. Note that a "For:" field search includes the subject line in the search.
  new season [finds New SF Season, news about next season in the Subject line]
author's address is or contains: letters to find in the Sender of posts; leave blank to include all senders.
  doe [finds janedoe@aol.com, pjdoe@big.net]
  pete aol [finds petedoe@aol.com, peter@aol.com]
  not peter@aol [excludes posts sent by peter@aol]
Since: date. Earliest day, month or year of interest; leave blank to start the search from the first archived post. See Valid date formats. SINCE and UNTIL are inclusive, so to view all messages posted on a single day, enter the same date in both fields.
Until: date. Last day, month or year of interest; leave blank to end the search at the most recent post. See Valid date formats.

After you click the "Start the search!" button, the results will appear, up to 50 matches at a time (when more matches are available you will see a convenient "more hits" link).

Note that with the Web interface, when you enter information in more than one field an implicit AND is used, i.e. you are searching for posts that meet all of your criteria. Some advanced searches -- such as an OR between fields or a phonetic search (SOUNDS LIKE ...) -- cannot be done with the Web interface and require the e-mail method. See also the LISTDB MEMO mentioned above. E-mail is probably more convenient when you want to store and edit the results of your searches, although search results can be saved on your drive as single web pages.

Search tips and additional information

1. Keep the initial search general, so as not to exclude posts of potential value. If you get back too many matches, you can narrow your search expression or your time frame.

2. Do not quote your search words unless you are looking for a string of words that must be in order. When you do quote, use single quotes: 'opera news', not "opera news".

3. Use double quotes only when you need an exact capitalization match: "BRAVO" will find "BRAVO", but not "Bravo" or "bravo".

4. Use the root or shortest form of a word to catch inflected forms, e.g. ticket will pick up ticket, tickets, ticketing, etc. Account for common misspellings and/or variants when searching, e.g.
SEARCH 'traveling show' OR 'travelling show' IN opera-L
Here, too, you might be better off using the NEAR operator:
SEARCH (traveling show) OR (travelling show) IN opera-L
turned up a post with "traveling minstrel show". Note that the latter is identical to
SEARCH (traveling NEAR show) OR (travelling NEAR show) IN opera-L

5. Note that the item # of a given post (especially recent ones) may change over time as the listowner deletes some posts of limited interest from the archive.

6. There is an implicit NEAR between search words, so
SEARCH tosca price IN opera-L

is the same as:

SEARCH tosca NEAR price IN opera-L

If you are content to have the words or strings anywhere in the post, use AND:
SEARCH tosca AND price IN opera-L
If any of your criteria will suffice, use OR:
SEARCH 'four last songs' OR fls OR 'vier letzte lieder' OR vll IN Opera-L
Note that as you add words with NEAR, as in
SEARCH red green blue IN opera-L
or
SEARCH red NEAR green NEAR blue IN opera-L
the first and last (red and blue in this case) may be farther apart than red and green or green and blue. This is because the NEAR relationship is guaranteed only between adjacent words.

7. A convenient place to keep the "Summary of Basic Search Patterns" above is in the address book of your e-mail program. In Eudora, for instance, you could keep them on the NOTES page for your LISTSERV entry. Or you can bookmark this page.

8. Long search lines: your mail program is probably set for a width of 80 characters or less, and even if your long search line looks OK on the screen, it may get lopped off when you send it (in this case you will probably get back an error message in which you will be able to see that the entire line was not received). To send long searches that span more than one line, you must start with "// search" and end each but the last line with " ," (space comma). There must be a space after the // and before the comma(s). Also, don't let a quoted string span more than one line. Here's an example:

// SEARCH 'four last songs' OR fls OR 'vier letzte lieder' ,
OR vll IN Opera-L

9. Operator precedence: according to the manual, the AND operator has a higher precedence than OR; however, evaluation is, in fact, left to right, i.e.

SEARCH w1 OR w2 AND w3
is the same as
SEARCH (w1 OR w2) AND w3

SEARCH w1 AND w2 OR w3
is the same as
SEARCH (w1 AND w2) OR w3

Keep in mind that parentheses never hurt, and may ensure that your intended search is also that which is understood by the parsing portion of the search program.

10. The default operator for multiple-word searches in the subject line is AND rather than NEAR.
SEARCH * IN opera-L WHERE SUBJECT CONTAINS (atlanta tosca)
is the same as
SEARCH * IN opera-L WHERE SUBJECT CONTAINS (atlanta AND tosca)
This is logical, given that subject lines are short to begin with. In fact, NEAR is not permitted in the CONTAINS expression and
SEARCH * IN opera-L WHERE SUBJECT CONTAINS (atlanta NEAR tosca)
returns an error message.

11. Catching up on missed mail: Catching up is easy if your list has a Web interface: just browse the Archives. Otherwise, if you missed a number of days of posts there are two ways to catch up via e-mail. The first is to request the relevant LOG file. Send listserv the message:

index opera-L

(or whatever your list name). You will get back a listing of all the files that the list owner has made available. These may be special files as well as the list's LOG files, which contain the list's posts. Here is an excerpt from the opera-L filelist I received:

...
OPERA-L LOG9702A ... Started on Sat, 1 Feb 1997 ...
OPERA-L LOG9702B ... Started on Fri, 7 Feb 1997 ...
OPERA-L LOG9702C ... Started on Fri, 14 Feb 1997 ...
...

So if I was NOMAIL and wanted to see all the posts in 1997 from Feb 9th through the 11th, I would send listserv the message:

GET OPERA-L LOG9702B

If the dates of interest are not all in one file then I would have to get the other file(s) too (you can put another GET command on the next line). Note that these log files can be very large -- so large that you may not be able to view them with your mail reader! If you look at the menu in your mail program, there is usually a "Save as" option that saves an e-mail message to a file. Then you can open this file with your word processor. (LOG9702B was close to 400 pages long when I saved it as a file and opened it in Word!)

The second way to catch up is to use GETPOST to order just the posts for the three days in question. However, to do this, we first need to know the numbers of the first post on the 9th and the last post on the 11th. So we do a search for the period of interest:

Search * in opera-L from 97/02/09 to 97/02/11

Since ours is a high-volume list, we get back the numbers for only the first 100 posts. But we now know from the results that the first post on the 9th is number 32813. Next we do a search for the last day:

Search * in opera-L from 97/02/11 to 97/02/11

so I can see the number of the last post (which turns out to be 33049). Now I can order all the posts for just these three days:

GETPOST opera-L 32813-33049

which is a much smaller message than the LOG file for the whole week. Note the use of the asterisk in the searches (it stands for "everything") and the use of the hyphen in the GETPOST command to indicate a range of posts.
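For the first method in tip 11, picking the right LOG file(s) from the filelist can be sketched in Python as follows. The function name is hypothetical, and the code assumes that each log runs until the day the next one starts (posts from that changeover day may sit in either file, so both are requested).

```python
from datetime import date


def get_commands(listname, log_starts, first_day, last_day):
    """Return GET commands for every log that may hold posts in the period.

    log_starts: (filename, start date) pairs in chronological order,
    typed in by hand from the filelist that INDEX returns.
    """
    commands = []
    for i, (name, start) in enumerate(log_starts):
        # Assumption: a log ends on the day the next one starts.
        end = log_starts[i + 1][1] if i + 1 < len(log_starts) else date.max
        if start <= last_day and first_day <= end:
            commands.append(f"GET {listname} {name}")
    return commands


logs = [
    ("LOG9702A", date(1997, 2, 1)),
    ("LOG9702B", date(1997, 2, 7)),
    ("LOG9702C", date(1997, 2, 14)),
]
print(get_commands("OPERA-L", logs, date(1997, 2, 9), date(1997, 2, 11)))
```

Each command goes on its own line of the message you send.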

FAQs

My results say there are more matches, but only the first 100 are listed. How do I list the others?
The new release of the search software gives you the neat "keyword in context" for the matches, but to keep file length, resource time, and your life manageable, there is a 100-match limit. Just send in your search again with a modified time frame (use SINCE or FROM... TO...).
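One way to work around the 100-match limit is to repeat the search month by month. The Python sketch below writes out the twelve commands for a year; the function name is my own, and it relies on the rule that a month given without a day covers the whole month. A busy month can still exceed 100 matches, in which case you would narrow the dates further by hand.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def monthly_searches(words, listname, year):
    """One SEARCH command per month of the given year."""
    yy = f"{year % 100:02d}"  # two-digit year, as in the examples above
    return [
        f"SEARCH {words} IN {listname} FROM {m} {yy} TO {m} {yy}"
        for m in MONTHS
    ]


for command in monthly_searches("'opera news'", "opera-L", 1996):
    print(command)
```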

I send back the GETPOST line that LISTSERV suggests, but I get only some of the posts followed by an error message.
A GETPOST command can be no longer than one line. See How to order posts for a way around this.

I'm searching for a set of isolated words. I get back an error message, but I can't figure out what's wrong -- everything looks OK.
Certain words like since are reserved words that belong to the search language itself. Try putting each word in its own set of single quote marks. Some reserved words are: FROM, IN, SINCE, TO, UNTIL, WHERE, WITH, NEAR.

I get back a list of matches, but no lines of "context".
You (logically) get no context back with the matches if you use the "everything" symbol (*) or if you have a search expression consisting of a single NOT phrase.

I want all the posts sent by my friend, Mieze Maier, but I get nothing when I use:
SEARCH * IN opera-L WHERE SENDER CONTAINS mieze
SENDER is the e-mail address only. Her e-mail address may be, e.g., MMAIER@AOL.COM and may not contain "Mieze".
SEARCH * IN opera-L WHERE SENDER CONTAINS maier
would work in this case.

The manual mentions that you can narrow matches from a search with a second search that omits the IN expression: the second search will be performed on the results of the first, as in:
SEARCH tosca OR butterfly IN opera-L
SEARCH summer

Don't believe everything you read. This was possible in earlier releases of the software, but does not appear to be true any longer. There is nothing you can do with this technique that you can't do in a single search. In this example, use
SEARCH (tosca OR butterfly) AND summer IN opera-L

To see the messages posted on July 8, 1997, I sent the following:
SEARCH * IN opera-L from 97/07/08 to 97/07/08
Why are there breaks in item numbers in this GETPOST command that came back?
GETPOST OPERA-L 42710-42738 42740-42744 42746-42748
Breaks in the numeric sequence occur because the dates are based not on when the list receives the message, but on whatever date the poster's e-mail software inserts. In this case, messages 42739 and 42745 were dated July 9. Similar "errors" can occur when limiting by dates, because some people have configured their e-mail software incorrectly.

When I do a search with * (everything), the list returned includes the item #, date and subject. Is there any way to list the sender also?
Unfortunately not at present. I was told that this may be included in a future release of the software.

I've searched the archive high and low, front to back, but can't find the answers to my question.
You may have stumbled upon the Fundamental Question, to which there is no answer. Then again, you might try posting to the list -- it's full of fundamentalists.

Enjoy the database and send me your comments about this tutorial.

LISTSERV is a registered trademark licensed to L-Soft international, Inc.