Comprehensive tips for generating PDFs in the browser

software-dev, javascript, cognizant

Recently, as an associate at Cognizant Australia, I have found myself diving much deeper into the world of “Print to PDF” than I ever thought necessary. This is a summary of what I learned, in the hope that it helps others.

WARNING: Work In Progress #

This post is a work in progress. There are still sections to complete, screenshots to add, and proofreading to do.

However, it may be some time before I am able to complete it, and I hope someone can find the information here useful in the meantime.

Summary #

Your web app has users, and sometimes they just want a PDF copy of what they are looking at on the screen.

Sure, they could copy and paste the URL, you could provide a “share” button, they could bookmark the page, etc. However, what happens when:

When building a PDF export feature for a webapp, you are typically faced with a few options:

There are plenty of great articles elaborating on these options.

Generating PDFs in the browser using window.print #

The rest of this article will focus on utilising the browser print dialog, and some gotchas that you should consider if doing so.

The primary reason to opt for this approach is that you can benefit from a single code base for both displaying information to users from the web browser as well as exporting a PDF.

The main pitfall is that you have much less control over exactly where page breaks will appear, and in general less control over the layout of the resulting PDF.

We will consider all of the following:

Overall user experience #

A naive approach would be: “Just ask the user to print the page”.

This experience is sub-par for many reasons, especially since browsers dropped the notion of a standard “File -> Print” menu in favour of weird little hamburger mystery menus some time ago, making it less likely that users actually know where the print functionality is found.

Instead, we can trivially invoke window.print() from JavaScript to trigger the same functionality. This should be done by a nice, noticeable button somewhere in your web app.
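A minimal sketch of such a button handler follows. The helper name attachPrintButton and the element id export-pdf are illustrative assumptions, not part of any library; only window.print() itself is the real browser API.

```javascript
// Hypothetical helper: makes a button open the browser's print dialog.
// `button` is any object with addEventListener; `win` defaults to the
// global object so a stand-in can be passed when testing.
function attachPrintButton(button, win = globalThis) {
  button.addEventListener('click', (event) => {
    event.preventDefault(); // avoid submitting a surrounding form
    win.print();            // same dialog as the browser's own Print menu
  });
}

// Browser wiring (assumes an element with id "export-pdf" exists).
if (typeof document !== 'undefined') {
  const button = document.getElementById('export-pdf');
  if (button) attachPrintButton(button);
}
```

Passing the window in as a parameter is only a convenience for testing; in a real page the default is enough.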

TODO: do and don’t images. Don’t is screenshot of hamburger menus and “Print” option. Do is a styled button in the webpage (with icon).

If, for reasons we will discuss below, we opt to render content into a new window for printing, then this button should:

The new window can be made almost transparent by sizing it such that the only thing the user sees is the print dialog. Without appropriate sizing, users may get confused as to why they are seeing two previews of the content (one from the underlying webpage DOM, and one from the print preview).
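As a rough sketch, the sizing can be requested through the third argument of window.open. The helper names popupFeatures and openPrintWindow, the window name print-window, and the 400 by 300 dimensions are all assumptions for illustration; the right size depends on how large the print dialog is in the browsers you target, and browsers may clamp or ignore the requested values.

```javascript
// Hypothetical helper: builds the features string for window.open().
// The default dimensions are a guess, chosen to be smaller than a
// typical print dialog so the dialog covers the popup.
function popupFeatures({ width = 400, height = 300, left = 0, top = 0 } = {}) {
  return `width=${width},height=${height},left=${left},top=${top}`;
}

// Opens a blank popup for print content.
// Returns null when a popup blocker refuses to open the window.
function openPrintWindow(win = globalThis, size = {}) {
  return win.open('', 'print-window', popupFeatures(size));
}
```

Calling openPrintWindow() from inside the button's click handler matters in practice, because popup blockers generally only allow windows opened in direct response to a user action.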

TODO: do and don’t images. Don’t is a window too large showing two previews, Do is a popup covered by the print dialog.

Render into a new window #

The promise of CSS media queries is that we can print a complex web app with only changes to CSS.

Sometimes, this just doesn’t cut it and we need much more control, such as:

These can be destructive operations that break the web app. For these reasons, it can be helpful to render your DOM for printing into a new window that can be disposed of once printed.

We still get to share rendering code, e.g. by rendering the same React components used by your web app into the new window, but we have the freedom to mangle it as we see fit so that it lays out appropriately on the page.
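The disposable-window idea can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: the helper name printInNewWindow is hypothetical, a plain HTML string stands in for whatever your rendering code produces, and a real page may need to wait for images and fonts to load before calling print().

```javascript
// Hypothetical helper: writes printable markup into a throwaway popup,
// opens the print dialog, and closes the popup once printing is done.
// Returns false if the popup could not be opened (e.g. popup blocker).
function printInNewWindow(html, win = globalThis) {
  const popup = win.open('', '_blank', 'width=400,height=300');
  if (!popup) return false;

  // Replace the blank document with the printable markup.
  popup.document.open();
  popup.document.write(html);
  popup.document.close();

  // Dispose of the window after the print dialog is dismissed.
  popup.addEventListener('afterprint', () => popup.close());

  popup.focus();
  popup.print();
  return true;
}
```

Because the popup owns its own DOM, any destructive changes made for print never touch the web app's page.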

Headers and footers #

This will be simple once CSS Paged Media Module Level 3 is ratified and made available in major browsers. Until then, we are stuck with ugly workarounds.

Sidenote: There are rendering engines which understand advanced CSS Paged Media declarations, such as PrinceXML. However, our goal is to utilise the user’s browser, which means that, as of writing, we don’t have access to these features.

Goals:

After searching around, the best approach I found is embodied in this blog post, and variations appear throughout Stack Overflow too.

Solution:

HTML

<table>
  <thead>
    <tr>
      <td class="header-space">
        <!-- Empty. Just to reserve space for the <header> -->
      </td>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        <!-- Actual page content goes here -->
      </td>
    </tr>
  </tbody>

  <tfoot>
    <tr>
      <td class="footer-space">
        <!-- Empty. Just to reserve space for the <footer> -->
      </td>
    </tr>
  </tfoot>
</table>

<header class="header-content">
  <!-- Actual header content goes here -->
</header>

<footer class="footer-content">
  <!-- Actual footer content goes here -->
</footer>

CSS

.header-content, .header-space,
.footer-content, .footer-space {
  height: 100px;
}

.header-content {
  position: fixed;
  top: 0;
}

.footer-content {
  position: fixed;
  bottom: 0;
}

Hopefully you can already see why it may be a good idea to render your printable pages in a new window. Although it is possible to use the above HTML in your regular website layout, it will quickly start to cause issues, and we are just getting started!

Fitting user generated content #

Many systems that require exporting PDFs are enterprise systems, often with user generated content. Such content is normally pasted into a WYSIWYG editor from MS Word.

Security Warning: We will leave the notion of protecting against XSS attacks for another day.

Goals:

Assumptions:

One potential solution is actually implemented by WebKit itself, called the “shrink factor”. It is transparent to web developers: the print dialog attempts to shrink the entire document until it fits, but gives up after a certain point and truncates the remaining content off the page.

We will build something similar in JavaScript, but instead of shrinking the entire document (unnecessarily making all text small and harder to read), we will just resize the problematic elements.

Solution:


Use word-break: break-all to prevent long strings of characters from flowing off the page.

Without this, large lines of text such as aaaaaaaaaaaaaaaaaaaaaaa... will end up off the page. By allowing the browser to wrap words midway through, we trade off the chance of breaking some words in inopportune places with the benefit of actually getting all our content on the page.

Find all tables which are wider than the PDF page, and shrink them.
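A minimal sketch of the table-shrinking step, assuming the printable width is known in CSS pixels. The helper names shrinkFactor and shrinkWideTables and the pageWidthPx parameter are hypothetical, and scaling with a CSS transform is just one possible way to resize an element. Note that a transform does not change the space the table occupies in the layout, so the surrounding spacing may need separate adjustment.

```javascript
// Hypothetical helper: factor (0 < f <= 1) needed to fit contentWidth
// into pageWidth. Returns 1 when the content already fits.
function shrinkFactor(contentWidth, pageWidth) {
  if (contentWidth <= 0 || contentWidth <= pageWidth) return 1;
  return pageWidth / contentWidth;
}

// Scales down every table under `root` that is wider than the page,
// leaving all other content at its normal size.
// Returns the number of tables that were shrunk.
function shrinkWideTables(root, pageWidthPx) {
  let shrunk = 0;
  for (const table of root.querySelectorAll('table')) {
    const factor = shrinkFactor(table.scrollWidth, pageWidthPx);
    if (factor < 1) {
      table.style.transformOrigin = 'top left';
      table.style.transform = `scale(${factor})`;
      shrunk += 1;
    }
  }
  return shrunk;
}
```

In the print window this might be called as shrinkWideTables(document.body, 700) just before print(), where 700 is a placeholder for your page width minus margins.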

Gotchas