Exposing Pandoc to Create PDFs using AWS Lambda
2024 November 26

I needed an endpoint to convert Markdown to a standard PDF, for reasons. This is how I built a simple, low-traffic Markdown-to-PDF converter using AWS Lambda and pandoc. When evaluating solutions, I found that everything required either WebKit or a native LaTeX installation, both of which are painful for users to maintain just to run a script I was building. Freezing the dependencies inside a Docker image means this solution should keep working into the future. AWS Lambda is really neat because it lets you put whatever you want inside a Docker image and expose it on the internet. The hard part is putting all the steps together and doing the Docker stuff. So here’s what I did to make it.

I decided to use a Debian image because I know their collection of packages is pretty great and generally stays up to date. I also know that their version of LaTeX is generally well built and favored by academics of all ages.

The important parts are below. I didn’t bother squashing the image or trimming much fat since this is meant to be quick and dirty. AWS actually has pretty decent docs on how to deploy a Docker image to Lambda.

FROM debian:latest
ARG FUNCTION_DIR="/function"

RUN apt update && apt upgrade -y
RUN apt install -y pandoc texlive python3 python3-pip

RUN mkdir -p ${FUNCTION_DIR} 

WORKDIR ${FUNCTION_DIR}

RUN pip install --target ${FUNCTION_DIR} awslambdaric
COPY . ${FUNCTION_DIR}

ENTRYPOINT [ "/usr/bin/python3", "-m", "awslambdaric" ]
CMD [ "function.handler" ]

There is one minor optimization here: the COPY command is the last step. Everything above it stays cached between builds, so only the small final layer changes and gets pushed to the repo, which shortens the dev-test cycle.

Honestly, GitHub Copilot wrote most of the app, because it was just: write a file, shell out, read a file, and return. Pretty basic stuff and common patterns.
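For a rough idea of what that write, shell out, read, return pattern could look like, here is a minimal sketch. It is not the actual app from the repo. The module and function names follow the `function.handler` entry in the Dockerfile's CMD; the helper names `decode_body` and `pdf_response`, the temp file names, and the error handling are my own assumptions. It also assumes the event and response use the Lambda function URL payload shape (`body`, `isBase64Encoded`, `statusCode`, `headers`).

```python
# function.py: a hypothetical sketch, not the code from the repo.
import base64
import subprocess
import tempfile
from pathlib import Path


def decode_body(event):
    """Return the request body as bytes, undoing base64 if Lambda applied it."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")


def pdf_response(pdf_bytes):
    """Wrap PDF bytes in a response dict; binary bodies must be base64 encoded."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/pdf"},
        "body": base64.b64encode(pdf_bytes).decode("ascii"),
        "isBase64Encoded": True,
    }


def handler(event, context):
    markdown = decode_body(event)
    # The temp directory lands under /tmp, the writable scratch space in Lambda.
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "input.md"
        out = Path(workdir) / "output.pdf"
        src.write_bytes(markdown)  # write file
        result = subprocess.run(   # shell out to pandoc
            ["pandoc", str(src), "-o", str(out)],
            capture_output=True,
            text=True,
            cwd=workdir,
        )
        if result.returncode != 0:
            # Surface pandoc's error text instead of a bare 500.
            return {
                "statusCode": 500,
                "headers": {"Content-Type": "text/plain"},
                "body": result.stderr,
            }
        return pdf_response(out.read_bytes())  # read file and return
```

Keeping the decode and response helpers separate from the pandoc call means they can be exercised locally without pandoc or LaTeX installed.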

I didn’t even bother writing any CloudFormation because it wasn’t necessary for a one-off. You can literally go into the console and play ClickOps for it. I did write a basic shell script to build the image and deploy it to the Lambda function.

There’s a neat trick to ensure that you wait for the update to complete before you try to deploy again.

aws lambda wait function-updated-v2 --function-name hello-pdf
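To show where that wait fits in a build-and-deploy loop, here is a sketch of the sequence. My actual script is shell; this is a Python stand-in that runs the same kind of CLI commands through `subprocess`. The function names `deploy_commands` and `deploy` are made up for illustration, the image URI is a placeholder you would supply, and it assumes Docker is already logged in to the ECR registry holding the image.

```python
# Hypothetical deploy helper, not the shell script from the repo.
import subprocess


def deploy_commands(image_uri, function_name="hello-pdf"):
    """Return the build, push, update, and wait commands as argv lists."""
    return [
        ["docker", "build", "-t", image_uri, "."],
        ["docker", "push", image_uri],
        ["aws", "lambda", "update-function-code",
         "--function-name", function_name, "--image-uri", image_uri],
        # Block until the update has finished so the next deploy
        # does not start while this one is still in progress.
        ["aws", "lambda", "wait", "function-updated-v2",
         "--function-name", function_name],
    ]


def deploy(image_uri, function_name="hello-pdf"):
    """Run each step in order, stopping at the first failure."""
    for argv in deploy_commands(image_uri, function_name):
        subprocess.run(argv, check=True)
```

Calling `deploy("<your-ecr-image-uri>")` would run the four steps in order; the wait at the end is what makes it safe to call it again right away.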

Calling the Function

You can call it with a basic curl command. It just expects a Markdown file to be POSTed to its endpoint.

url=$(aws lambda get-function-url-config --function-name hello-pdf --query FunctionUrl --output text)

curl -i -X POST "$url" -H "Content-Type: text/plain" --data-binary @TEST.md
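If you would rather call it from a script than from curl, the same request can be sketched with Python's standard library. The function names `build_request` and `markdown_to_pdf` are made up for illustration, and this assumes the function URL accepts unauthenticated requests and returns the PDF bytes as the response body, like the curl call above.

```python
# Hypothetical client for the function URL, standard library only.
import urllib.request


def build_request(url, markdown_bytes):
    """Build the same POST that the curl command sends."""
    return urllib.request.Request(
        url,
        data=markdown_bytes,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def markdown_to_pdf(url, markdown_path, pdf_path):
    """POST a Markdown file to the endpoint and save the returned PDF."""
    with open(markdown_path, "rb") as f:
        request = build_request(url, f.read())
    # A generous timeout, since a cold start plus a LaTeX run can be slow.
    with urllib.request.urlopen(request, timeout=60) as response:
        pdf = response.read()
    with open(pdf_path, "wb") as f:
        f.write(pdf)
    return len(pdf)
```

With the URL from `get-function-url-config` in hand, `markdown_to_pdf(url, "TEST.md", "TEST.pdf")` would write the converted file next to the source and return its size in bytes.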

Deploying it Yourself

You can go ahead and find the code for this application on my GitHub under a BSD 0-clause license. (Honestly any engineer who needed a similar thing packaged up would arrive at a similar solution to mine. It’s not that novel.) The README includes detailed deployment directions.

I hope you find this application useful in your endeavours. This solves a problem and may make your life easier in the long run, but who knows.


Written by Henry J Schmale on 2024 November 26