Using Pandoc to Build a Technical Manual
Pandoc is a markup transformation tool that converts documents between formats. It's an invaluable tool for any software engineer who works with documentation.
In this guide, we'll build a "technical manual" using Pandoc. Our manual will be a collection of Markdown documents with PlantUML diagrams. We'll use Pandoc to combine these documents into a single HTML page, rendering PlantUML diagrams as SVG images. Along the way, we'll explore Pandoc filters for rendering diagrams and including source code from your project.
By the end of this guide, you'll have enough understanding of Pandoc to build your own documentation system.
Installation
You can install Pandoc on macOS with Homebrew:
brew install pandoc
On Fedora:
sudo dnf install pandoc
Or with GNU Guix:
guix install pandoc
Introduction to Pandoc
Let's say you have a simple HTML document:
<!DOCTYPE html>
<head>
  <meta charset="utf-8" />
  <meta name="description" content="Software Wanderings" />
</head>
<body>
  <h1>Hello</h1>
  <ul>
    <li>one item</li>
    <li>two item</li>
    <li>three item</li>
  </ul>
  <a href="/">Relative reference</a>
  <table>
    <tr>
      <th>Company</th>
      <th>Contact</th>
      <th>Country</th>
    </tr>
    <tr>
      <td>Alfreds Futterkiste</td>
      <td>Maria Anders</td>
      <td>Germany</td>
    </tr>
    <tr>
      <td>Centro comercial Moctezuma</td>
      <td>Francisco Chang</td>
      <td>Mexico</td>
    </tr>
  </table>
  <svg viewBox="0 0 100 100" preserveAspectRatio="xMidYMid slice" role="img">
    <title>A gradient</title>
    <linearGradient id="gradient">
      <stop class="begin" offset="0%" stop-color="red" />
      <stop class="end" offset="100%" stop-color="black" />
    </linearGradient>
    <rect x="0" y="0" width="100" height="100" style="fill:url(#gradient)" />
    <circle cx="50" cy="50" r="30" style="fill:url(#gradient)" />
  </svg>
</body>
You can convert this from HTML to Markdown with either of the following commands. The first reads the file directly; the second pipes it in on standard input:
pandoc -f html -t markdown public/about.html
cat public/about.html | pandoc -f html -t markdown
This produces the following output:
# Hello

- one item
- two item
- three item

[Relative reference](/)

  Company                      Contact           Country
  ---------------------------- ----------------- ---------
  Alfreds Futterkiste          Maria Anders      Germany
  Centro comercial Moctezuma   Francisco Chang   Mexico
Or, with -t org, Emacs Org Mode:
* Hello
  :PROPERTIES:
  :CUSTOM_ID: hello
  :END:
- one item
- two item
- three item

[[/][Relative reference]]

| Company                    | Contact         | Country |
|----------------------------+-----------------+---------|
| Alfreds Futterkiste        | Maria Anders    | Germany |
| Centro comercial Moctezuma | Francisco Chang | Mexico  |

[[]]
Pandoc also has native and JSON output formats. The JSON format returns the raw abstract syntax tree (AST) that Pandoc builds from your document. Let's take a quick peek at this AST:
pandoc code-examples/sample.html -f html -t json | jq . > code-examples/sample.json
      ]
    ]
  },
  {
    "t": "BulletList",
    "c": [
      [
        {
          "t": "Plain",
          "c": [
            {
              "t": "Str",
              "c": "one"
            },
            {
              "t": "Space"
            },
            {
              "t": "Str",
              "c": "item"
            }
Now we can see that Pandoc is a transformation pipeline. A reader parses the input document into Pandoc's native AST, optional filters transform that AST, and a writer renders it into the target format:
INPUT --reader--> AST --filter--> AST --writer--> OUTPUT
Diagram taken from the Pandoc website.
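To make the AST concrete, here is a minimal Python sketch that walks a hand-written fragment shaped like the JSON above. The names SAMPLE, node_types, and plain_text are illustrative and not part of Pandoc; the fragment is a trimmed stand-in for the real sample.json.

```python
import json

# A hand-written fragment in the shape of Pandoc's JSON AST: the bullet list
# from the sample document, trimmed to its first two items.
SAMPLE = json.loads("""
{
  "t": "BulletList",
  "c": [
    [{"t": "Plain", "c": [{"t": "Str", "c": "one"}, {"t": "Space"}, {"t": "Str", "c": "item"}]}],
    [{"t": "Plain", "c": [{"t": "Str", "c": "two"}, {"t": "Space"}, {"t": "Str", "c": "item"}]}]
  ]
}
""")


def node_types(node):
    """Yield the "t" tag of every AST node, depth first."""
    if isinstance(node, dict):
        if "t" in node:
            yield node["t"]
        yield from node_types(node.get("c"))
    elif isinstance(node, list):
        for child in node:
            yield from node_types(child)


def plain_text(node):
    """Concatenate the Str and Space nodes below `node` into a plain string."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return node["c"]
        if node.get("t") == "Space":
            return " "
        return plain_text(node.get("c"))
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    return ""


types = list(node_types(SAMPLE))
items = [plain_text(item) for item in SAMPLE["c"]]
print(types)
print(items)
```

Every node is a dictionary with a type tag "t" and, usually, contents "c". Readers produce this shape, and filters and writers consume it.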
Building Your Technical Document
Now that we understand the basics of how Pandoc works, let's use it to build a technical manual. We have three main goals:
- PlantUML-based diagrams - Render architecture and sequence diagrams
- Code inclusion - Include code fragments from your project
- Markdown-based content - Write in Markdown for simplicity
Project Setup
We'll use a separate Markdown file for each section and a justfile to build our document into a single HTML file. Just is a modern command runner that's simpler and more intuitive than Make. Here's our project structure:
- justfile
- metadata.yml
- sections/introduction.md
- sections/architecture.md
First, let's create a Pandoc metadata file. Pandoc uses YAML for metadata, which you can include inline or in a separate file.
Create a file called metadata.yml with the following content:
title: "Software Alchemist Docs"
author: "Ivan Willig"
email: "iwillig@gmail.com"
description: "Software Alchemist Docs"
lang: "en-US"
toc-title: "Table of Contents"
sections:
  - sections/introduction.md
  - sections/architecture.md
css:
  - css/pico.min.css
  - css/docs.css
  - css/highlighting.css
Now let's create a justfile to automate the build process. First, install Just:
# macOS
brew install just
# Or with cargo
cargo install just
Create a file called justfile with the following content:
# Build the documentation
default: docs

# Generate HTML documentation from Markdown sources.
# Just recipes are named commands, not file targets as in Make,
# so the recipe runs every time it is invoked.
docs:
    pandoc \
      --metadata-file metadata.yml \
      --toc --toc-depth=5 \
      --template templates/template.html \
      --highlight-style zenburn \
      -F pandoc-plantuml \
      -f markdown -t html \
      sections/introduction.md \
      sections/architecture.md \
      -o docs.html

# Clean generated files
clean:
    rm -f docs.html
    rm -f plantuml-images/*.svg
    rm -f plantuml-images/*.uml

# Watch for changes and rebuild
watch:
    watchexec -e md,yml just
Now you can build your documentation by running:
just
Or clean up generated files with:
just clean
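If you would rather drive the build from Python, the same Pandoc invocation can be sketched as an argument list. This is an illustration under assumptions, not part of the original setup: the function name pandoc_command is hypothetical, and it passes the metadata through Pandoc's --metadata-file option so that metadata.yml is read as metadata rather than as an input document.

```python
import shlex


def pandoc_command(sections, output="docs.html", metadata="metadata.yml",
                   template="templates/template.html"):
    """Return the pandoc argument list that builds a single HTML page."""
    return [
        "pandoc",
        "--metadata-file", metadata,
        "--toc", "--toc-depth=5",
        "--template", template,
        "--highlight-style", "zenburn",
        "-F", "pandoc-plantuml",
        "-f", "markdown", "-t", "html",
        *sections,  # input files are concatenated in the order given
        "-o", output,
    ]


command = pandoc_command(["sections/introduction.md", "sections/architecture.md"])
print(shlex.join(command))
# To actually run the build: subprocess.run(command, check=True)
```

Keeping the section list as data makes the page order explicit: Pandoc concatenates its input files in the order they appear on the command line.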
Using Pandoc Filters
Pandoc filters allow you to manipulate the AST between the reader and writer stages. This is where the real power of Pandoc comes in—you can extend it to handle custom markup or integrate with external tools.
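As a rough sketch of the mechanics, a JSON filter is simply a program that reads the AST on standard input and writes a modified AST to standard output. The example below uses only the Python standard library; shift_headings and the sample document are illustrative assumptions, not something this manual's build depends on.

```python
import json
import sys


def shift_headings(doc, by=1):
    """Increase the level of every top-level Header block by `by`, capped at 6."""
    for block in doc["blocks"]:
        if block["t"] == "Header":
            # Header contents are [level, attributes, inline elements].
            block["c"][0] = min(block["c"][0] + by, 6)
    return doc


def main():
    """Filter entry point: Pandoc sends JSON on stdin and reads JSON from stdout."""
    doc = json.load(sys.stdin)
    json.dump(shift_headings(doc), sys.stdout)


# Demonstration on a hand-written document instead of real Pandoc output.
sample = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["hello", [], []], [{"t": "Str", "c": "Hello"}]]},
        {"t": "Para", "c": [{"t": "Str", "c": "Text"}]},
    ],
}
shifted = shift_headings(sample)
print(shifted["blocks"][0]["c"][0])  # prints 2
```

To use this as a real filter, call main() under an if __name__ == "__main__": guard, make the script executable, and pass it to Pandoc with -F.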
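For our technical manual, we'll use the pandoc-plantuml filter to render PlantUML diagrams. It is distributed on PyPI as pandoc-plantuml-filter, which installs the pandoc-plantuml executable. Install it with:
pip install pandoc-plantuml-filter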
You'll also need PlantUML itself:
# macOS
brew install plantuml
# Or download the JAR file
# https://plantuml.com/download
Now you can include PlantUML diagrams directly in your Markdown:
## Architecture Diagram

```plantuml
@startuml
actor User
participant "Web Server" as Web
participant "Database" as DB
User -> Web: HTTP Request
Web -> DB: Query
DB --> Web: Result
Web --> User: HTTP Response
@enduml
```
The filter automatically converts these code blocks into SVG images when you build your documentation.
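You can also write custom filters in Python with the pandocfilters package. Here's a simple example that replaces emphasized text with its uppercase equivalent:
#!/usr/bin/env python3
from pandocfilters import toJSONFilter, walk, Str

def upper_str(key, value, format, meta):
    # Uppercase every plain string node.
    if key == 'Str':
        return Str(value.upper())

def caps(key, value, format, meta):
    # An Emph value is a list of inline elements: uppercase the strings
    # inside it and splice the result in place of the Emph node.
    if key == 'Emph':
        return walk(value, upper_str, format, meta)

if __name__ == "__main__":
    toJSONFilter(caps)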
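One of the goals listed earlier was including code fragments from your project. A minimal sketch of such a filter, using only the standard library, might look like the following. The include attribute name and the include_code function are assumptions made for this illustration, not an established Pandoc convention.

```python
import tempfile
from pathlib import Path


def include_code(node, root=Path(".")):
    """Fill CodeBlock nodes that carry an `include` attribute with file contents."""
    if isinstance(node, list):
        return [include_code(child, root) for child in node]
    if not isinstance(node, dict):
        return node
    if node.get("t") == "CodeBlock":
        # CodeBlock contents are [[identifier, classes, key-value pairs], text].
        (ident, classes, pairs), text = node["c"]
        path = dict(pairs).get("include")  # hypothetical attribute name
        if path is not None:
            text = (root / path).read_text(encoding="utf-8")
        return {"t": "CodeBlock", "c": [[ident, classes, pairs], text]}
    return {key: include_code(value, root) for key, value in node.items()}


# Demonstration: an empty code block that points at a temporary source file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "hello.py").write_text('print("hello")\n', encoding="utf-8")
    sample = {
        "blocks": [
            {"t": "CodeBlock", "c": [["", ["python"], [["include", "hello.py"]]], ""]}
        ]
    }
    result = include_code(sample, root=Path(tmp))

print(result["blocks"][0]["c"][1])
```

In Markdown, the matching source would be an empty fenced code block whose attributes read {.python include="src/app.py"}, where the path is a placeholder. To run the function as a filter, load the document with json.load(sys.stdin), pass it through include_code, and write it back with json.dump.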
Pandoc Templates
Each output format has a default template. You can view a format's template using Pandoc itself:
pandoc -D html
This outputs the default HTML template. You can read more about Pandoc's template syntax in the official documentation.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$lang$" xml:lang="$lang$"$if(dir)$ dir="$dir$"$endif$>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
$for(author-meta)$
  <meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
  <meta name="dcterms.date" content="$date-meta$" />
$endif$
<!-- ... rest of template ... -->
You can customize this template to add your own CSS, JavaScript, or HTML structure. Save your custom template and reference it with the --template flag as shown in the justfile above.
Getting Started Quickly
If you want to skip the manual setup, I've created a cookiecutter template with everything pre-configured.
This template includes:
- Pre-configured justfile with common tasks
- Sample metadata.yml
- Example markdown sections
- HTML template with responsive CSS
- PlantUML filter setup
- Code inclusion utilities
- Example diagrams and structure
You can use it to bootstrap a new documentation project:
# Install cookiecutter if you haven't already
pip install cookiecutter
# Generate a new project from the template
cookiecutter gh:iwillig/pandoc-documenation-tool
The template will prompt you for a project name, author, and other details, then generate a complete documentation project. Customize it to fit your needs.
Conclusion
Pandoc is a powerful tool for building technical documentation. By combining Markdown, PlantUML, and Pandoc filters, you can create maintainable, version-controlled documentation that lives alongside your code. Using Just as a command runner keeps the build process simple and reproducible.
Key takeaways:
- Pandoc converts between formats by parsing to an AST, then rendering
- Filters extend Pandoc's capabilities
- Templates customize the output format
- PlantUML integration enables version-controlled diagrams
- Just provides a simple, modern way to automate builds
- The cookiecutter template gives you a head start
You now have everything you need to build your own technical manual. Start with the cookiecutter template, or build your setup from scratch using this guide. Happy documenting!