These procedures describe how to build an HTML or PDF version of the dictionary from the core source XML lexicon.
You will need the following command-line tools on your system:
xsltproc - XSLT processor.
The provided XSLT style-sheets are in XSLT 1.0 format and rely on a few EXSLT extensions. You may try another XSLT processor, but there is no guarantee it will work.
node - Node.js JavaScript runtime.
Any recent version should work. It is currently only used to clean up the HTML output from the XSLT style-sheets, using regular expressions to replace some patterns. You can do without it and skip that step, but the output will not be perfect.
Optional: bash - Shell command-line scripting.
It is needed in order to use the provided shell scripts described below. You can do without it, but you will then need to adapt the commands to your operating system.
sile - The SILE typesetter.
Version 0.12 (at the time of writing) was used to generate a PDF version of the lexicon, along with a set of specialized classes and packages from the Omikhleia SILE packages repository.
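Before going further, it may help to check which of these tools are actually available. The following is a minimal sketch in POSIX shell; the missing_tools helper is a hypothetical name introduced here for illustration and is not part of the provided scripts.

```shell
#!/bin/sh
# Print the name of each given tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of the repository.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# xsltproc and node are used for the HTML output; sile only for the PDF.
missing="$(missing_tools xsltproc node sile)"
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
else
  echo "All tools found."
fi
```

A missing sile only matters if you intend to build the PDF, and a missing node only means the HTML clean-up step has to be skipped.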
The core source lexicon in XML is src/dict-sd-fr-en.xml.
To just convert the core source lexicon to HTML, the steps are the following:
Apply the scripts/tei2html/tei-lite.xsl XSLT style-sheet to the core source lexicon, to generate a raw HTML file.
Run scripts/tei2html/post-process.js on that generated HTML to clean it. This step fixes a few issues (e.g. spacing, punctuation, blank lines) that would have been hard to manage at the XSLT level.
The provided convert.sh script applies these two steps in a row and puts the converted HTML in docs/dict-sd.html.
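As a rough sketch of the first of these two steps, the XSLT transformation can be run by hand with xsltproc, whose -o option names the output file. The run and to_raw_html helpers, the DRY_RUN variable and the build/raw.html path are assumptions made for this example; the actual commands are in convert.sh. The Node.js step is not shown because the arguments expected by post-process.js are not described here.

```shell
#!/bin/sh
# First step only: apply the XSLT style-sheet to the source lexicon.
# With DRY_RUN=1 the command is printed instead of executed.
# (run and to_raw_html are hypothetical helpers introduced for this sketch.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

to_raw_html() {
  # $1: source XML, $2: raw HTML file to write
  run xsltproc -o "$2" scripts/tei2html/tei-lite.xsl "$1"
}

# Only show the command that would be run; build/raw.html is an assumed path.
DRY_RUN=1
to_raw_html src/dict-sd-fr-en.xml build/raw.html
```

Unsetting DRY_RUN runs the transformation for real, provided the output directory already exists.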
To produce a much better version of the lexicon, several additional post-processing tasks are performed on the XML before it is converted.
Apply the scripts/tei2html/tei-lite.xsl XSLT style-sheet to the final (post-processed) XML, to generate a raw HTML file.
Run scripts/tei2html/post-process.js on that generated HTML to clean it.
The provided process.sh script applies all the necessary steps in a row and puts the converted HTML in docs/dict-sd.html.
The post-processing tasks may be updated and the list below may become outdated; please check the script in case of doubt.
For the record, the main steps are the following:
scripts/process/expand-xref-pass1.xsl: this task creates dictionary entries for all entry forms which are not main headwords, i.e. which appear in an entry as alternative variants or inflected forms.
scripts/process/expand-xref-pass2.xsl: this task creates dictionary entries for secondary entries (a.k.a. related sub-entries) which are not marked as already having an existing main entry.
scripts/process/sort.xsl: this task sorts the entries, since the previous tasks bluntly added new entries that are not ordered. (And the original core source XML is not guaranteed to be ordered correctly anyway.)
scripts/process/expand-renum.xsl
scripts/process/add-sections.xsl: this task inserts an alphabetical section milestone when the first letter of entries changes.
As an illustration, assuming an original entry:
form1 S. XX/xxx (variant1 S. YY/yyy, variant2 N. ZZ/zzz), pl. plural1 S. TT/ttt (plvariant N. UU/uuu) n. Some definitions - subform S. MM/mmm n. abst.
The expanded post-processed lexicon would yield something such as:
F
form1 S. XX/xxx (variant1 S. YY/yyy, variant2 N. ZZ/zzz), pl. plural1 S. TT/ttt (plvariant N. UU/uuu) n. Some definitions - subform S. MM/mmm n. abst.
P
plural1 S. TT/ttt pl. → form1
plvariant N. UU/uuu pl. → form1
S
subform S. MM/mmm n. abst. → form1
V
variant1 S. YY/yyy → form1
variant2 N. ZZ/zzz → form1
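The chain of post-processing tasks listed above could be sketched as a sequence of XSLT passes, each one reading the output of the previous one. This is only an illustration: the apply_passes helper and the XSLTPROC override are hypothetical names, the pass order is assumed from the order of the list, and the provided scripts remain the reference.

```shell
#!/bin/sh
# Apply XSLT style-sheets in sequence, each pass reading the previous output.
# Usage: apply_passes INPUT OUTPUT STYLESHEET...
# (apply_passes and the XSLTPROC override are hypothetical, for illustration.)
apply_passes() (
  input="$1"; output="$2"; shift 2
  workdir="$(mktemp -d)" || exit 1
  current="$input"
  n=0
  for xsl in "$@"; do
    n=$((n + 1))
    next="$workdir/pass-$n.xml"
    # Stop at the first failing pass and clean up intermediate files.
    if ! "${XSLTPROC:-xsltproc}" -o "$next" "$xsl" "$current"; then
      rm -rf "$workdir"
      exit 1
    fi
    current="$next"
  done
  cp "$current" "$output"
  status=$?
  rm -rf "$workdir"
  exit "$status"
)

# Intended use, with the pass order ASSUMED from the list above
# (expanded.xml is an arbitrary output name):
# apply_passes src/dict-sd-fr-en.xml expanded.xml \
#   scripts/process/expand-xref-pass1.xsl \
#   scripts/process/expand-xref-pass2.xsl \
#   scripts/process/sort.xsl \
#   scripts/process/expand-renum.xsl \
#   scripts/process/add-sections.xsl
```

Using intermediate files rather than a pipeline keeps each pass inspectable when something goes wrong, at the cost of a temporary directory.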
To generate a PDF, we first need the post-processed version of the lexicon in XML. The initial steps are therefore the same as for the HTML output, without the final (HTML-only) steps.
The provided process-to-xml-only.sh script applies all the necessary steps in a row and puts the converted (expanded) XML in docs/dict-sd.xml.
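As noted above, PDF generation requires the SILE typesetter and the specialized classes and packages from the Omikhleia SILE packages repository.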
Installing SILE and our custom classes and packages is an exercise left to the reader. When properly set up:
For the English PDF:
sile -I preambles/dict-sd-en-preamble.sil docs/dict-sd.xml
For the French PDF:
sile -I preambles/dict-sd-fr-preamble.sil docs/dict-sd.xml
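Both commands read the same input file, so unless an output name is given they should end up writing to the same PDF. A possible way to keep the two languages apart is sketched below, assuming that your SILE version accepts the -o option to set the output file (please check sile --help). The build_pdf helper, the DRY_RUN variable and the output file names are assumptions made for this example.

```shell
#!/bin/sh
# Build one PDF per language from the same expanded XML.
# With DRY_RUN=1 the commands are printed instead of executed.
# (build_pdf and the output file names are assumptions made for this sketch.)
build_pdf() {
  # $1: language code, en or fr
  set -- sile -I "preambles/dict-sd-$1-preamble.sil" \
    -o "docs/dict-sd-$1.pdf" docs/dict-sd.xml
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Only print the two commands; unset DRY_RUN to actually run SILE.
DRY_RUN=1
for code in en fr; do
  build_pdf "$code"
done
```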