This post is outdated because I no longer use BibTeX files for my website, but its content remains valid.
I’ve moved to Pelican to generate my website. My home-brewed generator needed some updates in order to work with Python 3 and new versions of some packages, so I decided I had no more time for such games. One feature I miss in Pelican is the ability to generate my publication list from BibTeX files. Here is how I’ve rebuilt this feature with Pelican.
First, I store my publications as pairs of files content/Publications/foo.bib and content/Publications/foo.pdf (the PDF is optional), so they’ll become articles in a blog category Publications (see the menu bar above).
The goal is to treat .bib files as source files from which an article is generated.
Next, I’ve added a directory bibreader in my site’s directory, right alongside the content directory.
It’s a package consisting of two files:
__init__.py is the plugin itself
latex.py is an auxiliary module that parses (limited) LaTeX source and generates Markdown from it; it’s explained in this old post
Then, I’ve added one line in pelicanconf.py to load the plugin:
PLUGINS = ["bibreader"]
Let’s look at bibreader/__init__.py.
It first imports a bunch of objects that we’ll use later on.
Then, it defines function tex to process BibTeX text, interpret its LaTeX content, and return a Markdown string.
It also defines a function _get that tries several keys in turn on a dictionary and returns the value of the first one found.
And finally, dictionary _entries maps BibTeX entry types to plain English.
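As a quick illustration of how _get behaves, here is a small standalone sketch. The function body is the same as in the plugin; the sample dictionaries are made up for the example and do not come from a real .bib file.

```python
def _get(d, *keys):
    # Return the value of the first key found in d.
    # If no key matches, the function falls off the end and returns None.
    for k in keys:
        if k in d:
            return d[k]


# Hypothetical field dictionaries, for illustration only.
paper = {"title": "A paper", "booktitle": "Some proceedings"}
proceedings = {"booktitle": "Some proceedings"}

first = _get(paper, "title", "booktitle")         # "title" is tried first
second = _get(proceedings, "title", "booktitle")  # falls back to "booktitle"
missing = _get({}, "title", "booktitle")          # nothing matches
```

Since tex passes None through unchanged, an entry with neither key would simply end up with a None value rather than an error at this point.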
from pelican.readers import BaseReader
from pelican import signals
from datetime import datetime
from io import StringIO
from urllib.parse import urlparse
from pathlib import Path
from tempfile import NamedTemporaryFile
from pybtex.database import parse_file as parse_bibfile
from markdown import Markdown
from .latex import tex as _tex
def tex (txt) :
    if isinstance(txt, str) :
        return _tex(txt)
    elif txt is not None :
        return _tex(txt)

def _get (d, *keys) :
    for k in keys :
        if k in d :
            return d[k]

_entries = {"phdthesis" : "PhD thesis",
            "inproceedings" : "Conference paper",
            "proceedings" : "Conference proceedings",
            "techreport" : "Technical report",
            "inbook" : "Book chapter",
            "article" : "Journal paper",
            "book" : "Book"}
Then, class BibTexReader is defined to handle .bib source files, and it is registered as a Pelican reader.
class BibTexReader (BaseReader) :
    enabled = True
    file_extensions = ["bib"]
    def read (self, source_path) :
        # skipped, continued below

def add_reader (readers) :
    readers.reader_classes["bib"] = BibTexReader

def register () :
    signals.readers_init.connect(add_reader)
Method read has to read the given source_path, parse its content, and return some HTML together with a meta-data dictionary.
It starts with the latter, providing just the required information:
        # skipped, continued from above
        path = Path(source_path)
        bib = parse_bibfile(source_path)
        entry = list(bib.entries.values())[0]
        fields = entry.fields
        metadata = {"slug": path.stem,
                    "title": tex(_get(fields, "title", "booktitle")),
                    "date" : datetime(year=int(fields.get("year", 1)),
                                      month=int(fields.get("month", 1)),
                                      day=int(fields.get("day", 1)))}
Note that each .bib file is expected to contain only one BibTeX entry: only the first one is read.
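The date construction relies on defaults for missing fields, which may be easier to see in isolation. The sketch below reuses the same expression inside a helper named entry_date; that helper name and the sample fields are assumptions made for this example, not part of the plugin.

```python
from datetime import datetime


def entry_date(fields):
    # Hypothetical helper: missing year, month or day default to 1,
    # exactly as in the metadata dictionary built by read.
    return datetime(year=int(fields.get("year", 1)),
                    month=int(fields.get("month", 1)),
                    day=int(fields.get("day", 1)))


# Made-up fields; BibTeX values are strings, hence the int() conversion.
full = entry_date({"year": "2021", "month": "3", "day": "15"})
year_only = entry_date({"year": "2021"})  # becomes 1 January 2021
```

Note that this assumes numeric months: a value such as "mar" would make int() raise a ValueError.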
Then, we build the article content as Markdown that will be parsed at the end.
To start with, we handle the optional subtitle and the authors’ or editors’ names.
        content = StringIO()
        if txt := fields.get("subtitle", None) :
            content.write(f"> {tex(txt)}\n\n")
        if persons := _get(entry.persons, "author", "editor") :
            for i, who in enumerate(persons) :
                if i :
                    content.write(", ")
                names = who.first_names + who.middle_names + who.last_names
                content.write(" ".join(tex(n) for n in names))
            content.write("\n\n")
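To see what the name loop produces, here is a minimal sketch that runs without pybtex. The SimpleNamespace objects are stand-ins for pybtex Person objects and only model the three list attributes used above; the names are made up, and the tex() conversion is left out for brevity.

```python
from io import StringIO
from types import SimpleNamespace

# Stand-ins for pybtex Person objects (hypothetical data).
persons = [
    SimpleNamespace(first_names=["Ada"], middle_names=[], last_names=["Lovelace"]),
    SimpleNamespace(first_names=["Alan"], middle_names=["M."], last_names=["Turing"]),
]

content = StringIO()
for i, who in enumerate(persons):
    if i:
        content.write(", ")  # separator before every name except the first
    names = who.first_names + who.middle_names + who.last_names
    content.write(" ".join(names))
content.write("\n\n")

authors_line = content.getvalue()
```

As far as I know, pybtex also exposes particles such as "von" and suffixes such as "Jr." through separate attributes, so names using those would lose these parts with this approach.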
Then we handle the publication type, with the journal or conference name, and so on. I’ve used a simple method: each template string is tried in turn, and the first one whose fields are all available is used. If every template fails, an exception is raised so that Pelican reports it.
        _fields = {k : tex(v) for k, v in fields.items()}
        if "type" not in _fields and entry.type in _entries :
            _fields["type"] = _entries[entry.type]
        for info in ["**{type}:** {school}",
                     "**{type}:** {booktitle}, {series} {volume}",
                     "{booktitle}, {series} {volume}",
                     "**{type}:** {booktitle}",
                     "{booktitle}",
                     "**{type}:** {journal} {volume}",
                     "{journal} {volume}",
                     "**{type}:** {journal} {number}",
                     "{journal} {number}",
                     "**{type}:** {institution}",
                     "**{type}:** {publisher} (ISBN {isbn})",
                     "{publisher} (ISBN {isbn})"] :
            try :
                txt = info.format(**_fields)
            except KeyError :
                # a field required by this template is missing, try the next one
                continue
            content.write(f"_{txt}_\n\n")
            break
        else :
            raise ValueError("missing publication context")
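The fallback mechanism relies on str.format raising KeyError when a placeholder has no matching field. Here is a reduced standalone sketch of the idea; the helper name first_matching, the shortened template list, and the sample fields are assumptions made for this example.

```python
def first_matching(templates, fields):
    # Try each template in order; str.format raises KeyError as soon as
    # a placeholder has no matching key in fields.
    for template in templates:
        try:
            return template.format(**fields)
        except KeyError:
            continue
    raise ValueError("missing publication context")


templates = ["**{type}:** {journal} {volume}",
             "**{type}:** {journal} {number}",
             "{journal} {number}"]

# Hypothetical entry without a "volume" field: the first template fails
# and the second one is used.
fields = {"type": "Journal paper", "journal": "Some Journal", "number": "4"}
line = first_matching(templates, fields)
```

The order of the templates matters: the most specific ones come first, so an entry is always rendered with as much context as its fields allow.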
Next, we handle external links, such as DOI or HAL ids, as well as the PDF file. Finally come the abstract and a copy of the BibTeX source, to be copied and pasted by visitors.
        if url := fields.get("DOI", None) :
            doi = urlparse(url).path.lstrip("/")
            content.write(f" * DOI: [{doi}]({url})\n")
        if halid := fields.get("hal-id", None) :
            content.write(f" * HAL: [{halid}](https://hal.archives-ouvertes.fr/"
                          f"{halid})\n")
        if path.with_suffix(".pdf").exists() :
            content.write(f" * [get PDF]({{static}}{path.stem}.pdf)\n")
        if abstract := fields.get("abstract", None) :
            content.write("## Abstract\n\n"
                          f"{tex(abstract)}\n\n")
        content.write("## BibTeX\n\n")
        for line in path.open() :
            # four leading spaces make Markdown render the source as a code block
            content.write(f"    {line.rstrip()}\n")
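The DOI handling extracts the identifier from a full URL so that the link text is short while the target stays complete. A minimal sketch with a made-up DOI URL (not a real publication):

```python
from urllib.parse import urlparse

# Hypothetical DOI URL, used only for illustration.
url = "https://doi.org/10.1000/example.123"

# The path component is "/10.1000/example.123"; strip the leading slash.
doi = urlparse(url).path.lstrip("/")
item = f" * DOI: [{doi}]({url})\n"

# With a bare identifier (no scheme), urlparse puts everything in the path,
# so the text is the same but the link target would not be a full URL.
bare = urlparse("10.1000/example.123").path.lstrip("/")
```

In other words, this approach assumes that the DOI field of each .bib file stores a complete URL rather than a bare identifier.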
Finally, HTML content is rendered and we return it together with meta-data:
        return Markdown().convert(content.getvalue()), metadata