Breaking a Large PDF Document into Chunks

Chunking, or dividing large documents into smaller parts, is not required or recommended at EPA. 

For a document above 50MB, consider its content and expected audience and decide whether chunking might be useful.

Whether to post a chunked document on the Internet or to provide the complete document via FTP or on DVD/CD should depend on what the local web team can manage well.

For chunked documents:

  • If a document is chunked, post only the chunks, not the entire document as well (posting both creates redundant content).
  • Include the required document information on each chunk.

Note: The recommendation to consider chunking applies only to files above 50MB. Chunking is not recommended in general, but you should consider whether your audience needs the document in smaller sections. Very few documents should be chunked.
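
As a rough illustration of the size check described above, the following Python sketch flags PDF files that are above the 50MB mark so that someone can then consider their content and audience. It assumes "50MB" means 50 × 1024 × 1024 bytes, which the guidance does not specify; the function and constant names are hypothetical.

```python
import os

# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes. The guidance does not
# say whether it means decimal or binary megabytes.
CHUNK_REVIEW_THRESHOLD_BYTES = 50 * 1024 * 1024


def exceeds_chunk_threshold(size_bytes):
    """Return True when a file is above the size at which chunking may be considered."""
    return size_bytes > CHUNK_REVIEW_THRESHOLD_BYTES


def pdfs_to_review(folder):
    """List the PDF files in `folder` that are above the threshold, sorted by name."""
    flagged = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.lower().endswith(".pdf") and os.path.isfile(path):
            if exceeds_chunk_threshold(os.path.getsize(path)):
                flagged.append(name)
    return flagged
```

A file that this check flags is only a candidate for review; the decision to chunk still depends on the audience.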

If you decide to break a PDF into sections, it is important to provide a description (abstract, executive summary, blurb) on the Web page that links to the document chunks. There are multiple ways to link to the chunked documents. Two are described below.

  • Create a table of contents on the HTML page that links to each section. Each link should be accompanied by some description of that section. (Navigation on the HTML page)
  • Link to the first chunk of the document that contains bookmarks linking to the other chunks. (Navigation contained in the PDF document.) 
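
The first option (navigation on the HTML page) could be generated with a short script. The Python sketch below is one possible approach, not a required method: it builds a list of links, each with a description of the section. The function name, the file names, and the choice of an h2 heading with a bulleted list are assumptions made for illustration.

```python
from html import escape


def build_toc_html(document_title, chunks):
    """Build an HTML list that links to each chunk of a document.

    `chunks` is a list of (href, section_title, description) tuples.
    """
    lines = ["<h2>{}</h2>".format(escape(document_title)), "<ul>"]
    for href, section_title, description in chunks:
        # Escape every value so characters such as & and < display correctly.
        lines.append(
            '  <li><a href="{}">{}</a> (PDF): {}</li>'.format(
                escape(href, quote=True), escape(section_title), escape(description)
            )
        )
    lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical file names and descriptions, for illustration only.
print(build_toc_html(
    "Sample Report",
    [
        ("sample-report-ch1.pdf", "Chapter 1", "Calculation methodology"),
        ("sample-report-ch2.pdf", "Chapter 2", "Results & discussion"),
    ],
))
```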

When you chunk a bookmarked PDF, create the bookmarks first, then break up the document using “Delete Pages”: delete the pages that do not belong in the chunk, then use Save As to save the result under that chunk’s name. This way you won’t lose bookmarks. (Using “Extract Pages” may cause you to lose bookmarks.) However, you will end up with superfluous bookmarks in the smaller chunks, and you will have to delete those by hand.
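
To plan the manual bookmark cleanup, it can help to list in advance which bookmarks will be left over in each chunk. The Python sketch below works on a plain list of bookmark titles and page numbers that you type in yourself; it does not read or modify a PDF. It assumes a bookmark is superfluous when its target page is outside the chunk's page range, so any bookmarks you deliberately keep as links to other chunks would need to be excluded from the cleanup. The function name is hypothetical.

```python
def split_bookmarks(bookmarks, first_page, last_page):
    """Separate bookmarks into those to keep and those that are superfluous.

    `bookmarks` is a list of (title, page_number) pairs that use the page
    numbers of the full document. The chunk covers first_page through
    last_page, inclusive.
    """
    keep, superfluous = [], []
    for title, page in bookmarks:
        if first_page <= page <= last_page:
            keep.append(title)
        else:
            superfluous.append(title)
    return keep, superfluous


# Hypothetical bookmarks for a chunk that covers pages 1 through 39.
keep, superfluous = split_bookmarks(
    [("Introduction", 1), ("Chapter 1", 5), ("Chapter 2", 40), ("Appendix", 90)],
    first_page=1,
    last_page=39,
)
print("Keep:", keep)
print("Delete by hand:", superfluous)
```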

Identify the chunks
This is a requirement for divided documents. See the EPA standard for each segment of a large document.

Suppose a chunk of one of your long documents turns up in Google search results, and someone opens it. How will you make sure they can find the other chunks of the document if they want them? Or at least know which original document the chunk came from? How will they be able to tell it’s an EPA document?

The goal is to make sure that anyone who reaches a section of a document knows which document it belongs to and can get the rest of the document if they want it.

What's the best way to present document source information?
There are three ways to present this information:

  • Attach a cover sheet. You can create a template for your program.
  • Add this information to the top of the first page of each chunk.
  • Add this information to the document as a header and/or footer. You can add this information in the original program (e.g., Word) or you can add it in Adobe Acrobat to a document that was produced in any program.

Document > Add Headers & Footers
Lets you add formatted text in three columns. You can add a header or footer or both. You can make the information appear on all pages or on just one page. 

Document > Add Watermark & Background
Lets you add formatted text or an image (jpg, bmp, or PDF) to a document. You can position this info in the margins and have it appear on all pages or on just one page. 
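
If you have many chunks, preparing the header or footer text consistently can be easier with a small helper. The Python sketch below composes text for the three columns of a header or footer, which you would then paste into Acrobat by hand. The particular fields shown (agency, document title, part number, and a link to the full set) are assumptions about what identifies a chunk; check the EPA standard for the information that is actually required. The function name and the URL are hypothetical.

```python
def chunk_footer_columns(agency, document_title, part_number, part_count, full_document_url):
    """Return (left, center, right) text for a three-column header or footer."""
    if not 1 <= part_number <= part_count:
        raise ValueError("part_number must be between 1 and part_count")
    left = "{}: {}".format(agency, document_title)
    center = "Part {} of {}".format(part_number, part_count)
    right = "Full document: {}".format(full_document_url)
    return left, center, right


# Hypothetical values, for illustration only.
for column in chunk_footer_columns(
    "EPA", "Sample Report", 2, 5, "https://www.example.gov/sample-report"
):
    print(column)
```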

In addition, be sure to fill in the Document Properties/Metadata fields for each file. For a chunk, you can use the same metadata as in the full document file, but add the specific information for that chunk (e.g., "Chapter 1, Calculation Methodology").
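
One way to keep chunk metadata consistent is to derive it from the full document's metadata. The Python sketch below only prepares the values; entering them into each file's Document Properties is still a separate step. It assumes the chunk-specific information is appended to the Title field, which is one reasonable reading of the guidance rather than a stated rule, and the function name and sample values are hypothetical.

```python
def chunk_metadata(full_metadata, chunk_label):
    """Return a copy of the full document's metadata with the chunk label added to the title."""
    metadata = dict(full_metadata)  # copy, so the full document's record is unchanged
    metadata["Title"] = "{}: {}".format(full_metadata["Title"], chunk_label)
    return metadata


# Hypothetical metadata for the full document.
full = {
    "Title": "Sample Emissions Report",
    "Author": "Sample Program Office",
    "Keywords": "emissions, methodology",
}
print(chunk_metadata(full, "Chapter 1, Calculation Methodology"))
```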
