How Preprocessing Works¶
Preprocessing consists in transforming source code before it is compiled. The goal of this document is to explain how this works in Dune.
Dune supports two separate ways of applying preprocessors, the “classic pipeline” (used
with (staged_pps)
), and the “fast pipeline” (used for all other
preprocessing specifications including
(pps)
).
The OCaml compilers provide options for specifying a preprocessing step. The
-pp
option is used to invoke a textual preprocessor (something that reads
text and returns text). The -ppx
option is used to invoke a ppx rewriter
(a function that takes an AST and outputs an AST).
This is the “classic pipeline”: preprocessing is part of the compilation
itself. This is simple, but has a problem: in order to compute the dependencies
of a module, it is necessary to pass the same -pp
or -ppx
option to
ocamldep
.
The classic pipeline has the following steps:
preprocessing (as part of
ocamldep
)dependency analysis
preprocessing (as part of compilation)
compilation
Dune supports a “fast pipeline” where the preprocessor is invoked separately from the compiler and its output is saved. Afterwards the preprocessed code is compiled directly.
The fast pipeline has the following steps:
preprocessing
dependency analysis
compilation
It has several advantages: it only invokes the preprocessor once per file, and
the preprocessed code is reused between dependency analysis and different kinds
of compilation. Also, when several preprocessors use ppxlib
, they can be
combined in a preprocessing program that traverses the AST only once.
However, some specific code generators or preprocessors require direct
access to the compilation artefacts of their dependencies. Therefore they
need to be used with the classic pipeline, even if it is slower. Note that a
PPX is able to know if it was called as part of ocamldep -ppx
or ocamlopt
-ppx
, so it can act differently in each phase.
Dune chooses which pipeline to use depending on the
provided Preprocessing Specification. It will select the fast pipeline,
unless (staged_pps)
is used. In that case, the classic pipeline is used.
In the case of the fast pipeline, a single executable is built and accepts arguments for all preprocessors.