Efficient BibTeX

August 16, 2008

The setup I describe here consists of a single monolithic BibTeX database that holds all references. GNU Emacs, used together with various web-based utilities, makes it efficient to capture, tag, and format BibTeX entries. Finally, when authoring papers, BibTool can automatically extract the required entries from the primary database.

Capturing References

Emacs has a mode for editing BibTeX files that makes creating new entries very easy. Each entry type has a keybinding that inserts a “skeleton” entry of that type, complete with all required and optional fields. These commands all begin with C-c C-e (a prefix key, in Emacs parlance). The most common is probably C-c C-e C-a, where the final C-a stands for article. Other common entry insertion commands are C-c C-e C-t for technical reports (I use this for working papers) and C-c C-e b for books.

When the point is on an entry, pressing C-j moves to the next field. When you are finished editing the fields, pressing C-c C-c checks the entry, cleans up the unused fields, and automatically generates the reference key if it doesn’t already exist. Finally, C-c C-q formats the entry nicely.

It is possible to customize both the algorithm used to generate keys and the way C-c C-q formats the entry. I use an algorithm that in most cases generates a unique key that is still readable. It generates keys of the form authorYYtitle, where author is the last name of the first author, YY is the year of publication (omitting the century), and title is the first word of the title, skipping words like the, an, and, etc. As for formatting, I align the fields at the equals sign. I use the following in my ~/.emacs file to accomplish this:

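(setq bibtex-align-at-equal-sign t                ; C-c C-q lines fields up at "="
      bibtex-autokey-name-year-separator ""       ; nothing between author and year
      bibtex-autokey-year-title-separator ""      ; nothing between year and title word
      bibtex-autokey-titleword-first-ignore '("the" "a" "if" "and" "an") ; title words to skip
      bibtex-autokey-titleword-length 30          ; use up to 30 characters of the title word
      bibtex-autokey-titlewords 1)                ; use only one title word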
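To make the authorYYtitle scheme concrete, here is a rough shell approximation of it. This is only a sketch: `autokey` is a hypothetical helper of my own, not part of Emacs or BibTeX, it assumes plain ASCII input without LaTeX markup, and it mimics only the settings above. Emacs's real key generation handles many more cases, such as multiple authors, accented names, and braces.

```shell
#!/bin/sh
# Hypothetical helper approximating the authorYYtitle key scheme.
# Usage: autokey LASTNAME YEAR TITLE   (assumes plain ASCII, no LaTeX markup)
autokey() {
  last=$1 year=$2 title=$3

  # Lower-case the first author's last name and drop anything that is not a letter.
  name=$(printf '%s' "$last" | tr '[:upper:]' '[:lower:]' | tr -cd '[:lower:]')

  # Keep only the last two digits of the year, omitting the century.
  yy=$(printf '%s' "$year" | cut -c3-4)

  # Split the lower-cased title into words and take the first one that is
  # not in the ignore list, truncated to 30 characters.
  word=""
  for w in $(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' | tr -cs '[:lower:][:digit:]' ' '); do
    case $w in
      the|a|if|and|an) continue ;;
    esac
    word=$(printf '%s' "$w" | cut -c1-30)
    break
  done

  printf '%s%s%s\n' "$name" "$yy" "$word"
}
```

For example, autokey Smith 2008 "The Efficient BibTeX" prints smith08efficient: the leading "the" is skipped, so "efficient" becomes the title part of the key.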

It can also be useful to create a bookmark to your primary BibTeX database. To do so, open the file, press C-x r m, and type a bookmark name such as bib. In the future you can open the file quickly by pressing C-x r b and typing bib (or, using tab-completion, just b TAB).

Several web-based tools, such as Google Scholar and JSTOR, can export BibTeX entries for papers. In Google Scholar’s preferences you can choose to display a BibTeX export link, and JSTOR now provides this option without any configuration. That said, once you become accustomed to your workflow, it is very fast to open your BibTeX database and either fill in the fields manually or copy and paste from a website.

Automated Entry Extraction

BibTeX files for specific LaTeX documents can be created using BibTool, which can automatically extract the required entries from a master BibTeX database. Running LaTeX produces an .aux file that contains the keys of the BibTeX entries cited in the paper. If paper.tex is the name of the LaTeX document and /path/to/research.bib is the path of the master BibTeX database, then a paper.bib file can be created as follows:

% latex paper.tex
% bibtool -i /path/to/research.bib -x paper.aux > paper.bib

The commands above can be placed in a Makefile to automate the process. This setup removes the need to copy and paste BibTeX entries and ensures that paper.bib contains only the references the paper actually cites.
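LaTeX records each citation in the .aux file on a line of the form \citation{key}, with the keys from a single \cite separated by commas, and these are the keys BibTool's -x option extracts. As a quick way to inspect that list before running BibTool, here is a small sketch; cited_keys is a hypothetical helper, not part of BibTool or LaTeX.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct citation keys recorded in a LaTeX .aux file.
# Usage: cited_keys paper.aux
cited_keys() {
  # Print the argument of every \citation{...} line, split comma-separated
  # keys onto separate lines, strip stray spaces, and drop duplicates.
  sed -n 's/^\\citation{\(.*\)}$/\1/p' "$1" | tr ',' '\n' | tr -d ' ' | sort -u
}
```

Running cited_keys paper.aux after latex paper.tex prints one key per line, sorted. A \nocite{*} in the document shows up as the single key *.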

BibTool is also useful for “normalizing” databases, rewriting the keys according to certain rules such as the ones described above. My ~/.bibtoolrsc file looks like this:

ignored.word = "on"
ignored.word = "the"
ignored.word = "a"
ignored.word = "an"
ignored.word = "if"

key.generation = on
key.number.separator = {}
key.base = {digit}
key.format = {{%s(key) # %-1p(author) # %-1p(editor) # %-5.1W(institution) # %-5.1W(organization) } %2d(year)%-T(title)}