Welcome to the ``tip of the week'', a pithy kickshaw provided on a
sometimes weekly basis.

Bibliographies and other databases
----------------------------------

If you collect bibliographies, addresses, quotations, or other notes
that you need to search, then you should learn about some of the tools
available on the Mathematics Department workstations for searching
files and recognizing strings. Index cards are almost as outmoded for
personal use as they are in library catalogs.

The mother of all these tools is `grep', a utility written in the early
days of Unix by K. L. Thompson. (The letters g-r-e-p reportedly stand
for ``Globally look for Regular Expressions and Print.'' `grep' is
often used as a verb in conversation.)

The original regular expression tools (grep, egrep, and fgrep) are
oriented toward line-by-line searches, which can be awkward for files
such as bibliographies that may be formatted in paragraphs. The more
recent string-searching tool `agrep', developed at the University of
Arizona, offers a wide variety of options (see the manual page for
details) and is well adapted to the sort of record-keeping many of us
do.

Example 1: case-insensitive, line-based search for the string
`Gainesville' in the file `my-file':

    agrep -i Gainesville my-file

(In this case, each line containing the target string, in any
combination of upper- and lower-case letters, will be printed.)

Example 2: line-based search for any string beginning with `Ga' and
ending with `lle' in the file `my-file':

    agrep 'Ga.*lle' my-file

(In a regular expression, `.' matches any single character and `*'
allows zero or more repetitions, so `.*' matches any run of
characters.)

Example 3: paragraph-based search for the string `Gauss' in the file
`my-file':

    agrep -d '$$' Gauss my-file

(Agrep treats a file as a sequence of ``records'' which are checked for
string matches. The -d flag changes the ``record separator'' from a
newline character to another character or string: in this instance,
records are delimited by a double newline, i.e. by the break between
two paragraphs.)
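
On a machine where agrep is not installed, the same ``record'' idea can
be approximated with awk. The following is only a sketch, not a
replacement for agrep: it uses awk's paragraph mode (an empty RS) in
place of agrep's -d '$$', and the function name `para_search' and the
sample entries are made up for illustration.

```shell
#!/bin/sh
# Sketch: paragraph-based search with POSIX awk instead of agrep.
# The sample file and its two entries exist only for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Gauss, C. F.
Disquisitiones Arithmeticae. 1801.

Euler, L.
Introductio in analysin infinitorum. 1748.
EOF

# para_search PATTERN FILE
# RS = ""      puts awk in paragraph mode (records end at blank lines).
# ORS = "\n\n" prints a blank line after each matching record.
para_search() {
    awk -v pat="$1" 'BEGIN { RS = ""; ORS = "\n\n" } $0 ~ pat' "$2"
}

# Prints the whole Gauss entry (both lines), not just the matching line.
para_search Gauss "$sample"
rm -f "$sample"
```

Unlike Example 3, this sketch takes exactly one file and has no
case-insensitive option; it is meant only to show what ``records
delimited by a blank line'' means in practice.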

Example 4: paragraph-based search of every file in your
`bibliographies' subdirectory whose name ends in `.bib':

    agrep -d '$$' -h Gauss $HOME/bibliographies/*.bib

(This example is a model for BibTeX or Tib users who keep their
bibliographies in one or a few directories. When a search covers
multiple files, agrep usually displays the filename with each hit. The
`-h' flag in this example suppresses filenames.)

Example 5: You might want to define an alias or write a script to
search a personalized database of the sort suggested in Example 4. A
simple shell script, which you might save as a file named `search',
could consist of two lines:

    #!/bin/sh
    /math/text/agrep -d '$$' -h -i "$@" "$HOME"/bibliographies/*.bib

(Don't forget to make your new file executable by running
`chmod u+x search'. Invoke your new tool with command lines of the form
``search Euler''.)

Tricks and tweaks
-----------------

Agrep may be used to count the records in a file that contain (or do
not contain) a string, to handle approximate matches (up to an assigned
number of allowed errors), and to perform Boolean operations in the
course of a search. See the manual page for details, but we offer one
more example.

Example 6: if you create the `search' script above and you want to
search for all the records containing both ``Gauss'' and ``map'' (a
reasonable goal for a geometer), then run this command line:

    search "Gauss;map"

The semicolon serves as the Boolean `and' operator for agrep. (The
quotation marks keep the shell from treating the semicolon as a command
separator.)

Further developments
--------------------

Versions of agrep and grep are available for a wide variety of
computers, including PCs. We have tools to index large collections of
records for faster searching than agrep would provide. These can be
awkward to use, but may be appropriate if you want to offer full-text
searching of files on the World Wide Web.

More powerful tools for processing files to extract and process data
are also available, including the awk (or gawk) and perl programming
languages.
See the Info pages on these for more information.

----------------------

Past Tips of the Week are found in /depot/documentation/tips. They may
also be consulted through the Documentation/Online Help menu item of
the window system's main menu or through Info mode in the Emacs editor.

cws@math.ufl.edu