I expect that almost everyone working in SEO knows that search engines can index PDFs. PDFs can also appear with an authorship rich snippet in Google SERPs. But just because a file format can be indexed doesn’t mean that it’s the ideal approach. Today, I’d like to explore the pros and cons of PDFs from an SEO perspective.
PDFs do have some pros. Besides being easy to create, they can help with indexing because they contain metadata, links, indexable content and authorship attributes.
1. Easy to Create
PDFs can be very helpful for marketers, especially those with smaller teams or limited resources. They’re easy to create: just save your document from Word, Illustrator, etc., as a PDF. Press releases, case studies, product data sheets and more can quickly be converted to an essentially web-ready format. For those without any HTML knowledge, PDFs can be a fast way to publish certain document types as web-based content.
2. Contain Metadata
PDFs also contain metadata, such as keywords and a description. You can find and edit this information under Properties in the File menu in Adobe Acrobat. While metadata doesn’t have a high impact on SEO anymore, I like to think of the description as your opportunity to craft just the right text to compel a searcher to choose your website in the SERPs, and I’d rather write my own description than have a search engine choose it for me.
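Before opening Acrobat, it can help to review the properties you plan to set. The following is a minimal sketch, not a definitive tool: the function name audit_pdf_properties is hypothetical, the field names follow the standard PDF document properties (Title, Author, Subject, Keywords), using Subject as the description is my assumption, and the 155-character limit is an assumed rule of thumb rather than a documented search engine limit.

```python
# Hypothetical helper: review the document properties you plan to set on a PDF.
# Field names follow the standard PDF properties (Title, Author, Subject,
# Keywords). Treating Subject as the description is an assumption.

MAX_DESCRIPTION_CHARS = 155  # assumed rule of thumb, not a documented limit


def audit_pdf_properties(properties):
    """Return a list of human-readable warnings for a dict of PDF properties."""
    warnings = []
    # Flag required-looking fields that are missing or blank.
    for field in ("Title", "Author", "Subject"):
        if not properties.get(field, "").strip():
            warnings.append(f"{field} is empty")
    # Flag a description that is longer than the assumed snippet length.
    subject = properties.get("Subject", "").strip()
    if len(subject) > MAX_DESCRIPTION_CHARS:
        warnings.append(
            f"Subject is {len(subject)} characters; it may be truncated in a snippet"
        )
    return warnings


example = {
    "Title": "Widget 3000 Product Data Sheet",
    "Author": "",
    "Subject": "Specifications, dimensions and ordering details for the Widget 3000.",
}
print(audit_pdf_properties(example))
```

The example values are made up; the point is only that a blank or overlong field is caught before the PDF is published.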
3. Contain Links
Like web pages, PDFs can also contain links, and those links can be followed by search engine bots. These links can contain anchor text, as well.
4. Indexable Content
Perhaps the most attractive pro of using PDFs is that the content within the PDF is generally readable and indexable by search engines. However, not all PDFs have readable content. To ensure that the text is readable, it should be created as text, not as an image, so it’s best to export the PDF directly from the originating program, like Word or Illustrator, rather than from a scan.
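One rough way to spot an image-only PDF is to check whether the file references any fonts at all, since text needs a font to be drawn. The sketch below is a heuristic under stated assumptions, not a real PDF parser: the function name looks_like_text_pdf is hypothetical, it only searches the raw bytes and any Flate-compressed streams for the marker /Font, and it can be wrong for encrypted files or files using other compression filters.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_text_pdf(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes reference at least one font."""
    if b"/Font" in data:
        return True
    # Newer PDFs may keep their dictionaries inside compressed streams,
    # so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example, a JPEG image)
        if b"/Font" in inflated:
            return True
    return False


# Tiny made-up byte strings standing in for real files.
exported = b"%PDF-1.4\n1 0 obj << /Type /Font /Subtype /Type1 >> endobj\n"
scanned = (
    b"%PDF-1.4\n1 0 obj << /Type /XObject /Subtype /Image >>\n"
    b"stream\n\xff\xd8\xff\nendstream\nendobj\n"
)
print(looks_like_text_pdf(exported), looks_like_text_pdf(scanned))
```

For a real file, you would read it in binary mode and pass the bytes to the function. A False result is a hint to open the document and try selecting the text, not a final verdict.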
5. Authorship Applied
As with HTML pages, Google can identify and infer authorship for PDFs. However, authorship will only show for the first author listed, so it’s important to be sure that the preferred author is listed first. Also, the site hosting the PDF must be listed as a “contributor” site in Google+ for that author.
PDFs also have a number of drawbacks: they lack navigation, and they give you little control over document length, page organization, code editing, structured markup and tracking.
1. Lack of Navigation
One of my greatest concerns about relying too heavily on PDFs for website content is that PDFs often lack site navigation. This means that when a visitor lands on a PDF, they have no simple way to reach other pages on the site. So if the PDF happens to rank well in organic search and a searcher clicks through to it, how can that visitor easily access other content on your site?
2. Length of Document
Because it’s so easy to save a document as a PDF file, it’s not common to break up a PDF into multiple, smaller documents. A whitepaper or report, for example, could range from a few pages to hundreds of pages. That isn’t ideal for SEO, because longer documents contain more text and often cover multiple topics. One PDF document, which equates to one URL, may contain a lot of content that would normally be broken up into multiple HTML pages.
3. Lack of Page Organization/Control
One of the greatest benefits of using a content management system for a website is page organization and control. PDFs, however, usually sit in a CMS as downloads rather than as pages within its organizational structure. So relying on PDFs for page content isn’t ideal from a page organization and control perspective.
4. Lack of Code Editing Capabilities
One of the benefits of HTML pages is the flexibility that authors have to edit the underlying code. For instance, images can be optimized for search through “alt” attributes and other options in HTML, but images cannot be optimized as easily in a PDF. This also makes PDFs harder to bring into Section 508 compliance, because alternate text for images has to be added through the PDF’s accessibility tags in a tool such as Acrobat rather than through a simple “alt” attribute in the markup.
5. Can’t Implement Structured Markup
Structured markup and the rich snippets it can generate have been shown through various studies to improve SERP visibility and click-through rate in organic search. But PDFs don’t work the same way that HTML does: authors cannot apply structured markup to the content because of the way the PDF file type works.
In my estimation, that’s a true disadvantage of PDFs. For instance, what if your PDF contains recipes? You won’t be able to use structured markup around those recipes, which excludes them from Google’s recipe view in organic search and prevents them from showing recipe rich snippets.
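To make the contrast concrete, here is a minimal sketch of the kind of recipe markup an HTML page can carry and a PDF cannot. It uses JSON-LD with the schema.org Recipe type, which is my choice of format for the illustration; the helper name recipe_json_ld and the sample recipe are hypothetical.

```python
import json


def recipe_json_ld(name, ingredients, steps):
    """Build a schema.org Recipe description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "recipeIngredient": list(ingredients),
        "recipeInstructions": [
            {"@type": "HowToStep", "text": step} for step in steps
        ],
    }
    return json.dumps(data, indent=2)


# Made-up sample recipe for illustration only.
markup = recipe_json_ld(
    "Simple Lemonade",
    ["4 lemons", "1 cup sugar", "6 cups water"],
    ["Juice the lemons.", "Stir the juice and sugar into the water."],
)

# On an HTML page, the JSON-LD would sit inside a script tag like this.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

An HTML version of the recipe could include that script tag in its source; a PDF has no equivalent place to put it.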
6. Lack of Tracking Mechanisms
I find the greatest disadvantage of using PDFs to be the lack of tracking mechanisms I can apply to PDF documents. Google Analytics can track PDF downloads through onclick event tracking on the links that point to them, but tracking what happens inside the PDF is not as simple. Additionally, your site may use other tracking mechanisms, such as a marketing automation system. The tracking code for these systems cannot be added to the PDF either.
Compared with HTML pages, PDFs make it much more difficult to fully understand how a visitor is progressing through your site, which is less than ideal.
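Onclick event tracking has to be attached to every link that points at a PDF, so a useful first step is simply finding those links. The following is a minimal sketch using only Python’s standard library; the class and function names are hypothetical, and the sample page and URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PdfLinkFinder(HTMLParser):
    """Collect href values of <a> tags that point at .pdf files."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare only the URL path so query strings do not hide the extension.
        if href and urlparse(href).path.lower().endswith(".pdf"):
            self.pdf_links.append(href)


def find_pdf_links(html):
    """Return the PDF link targets found in an HTML string, in page order."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return finder.pdf_links


page = """
<p><a href="/about.html">About us</a></p>
<p><a href="/files/datasheet.pdf">Data sheet</a></p>
<p><a href="https://example.com/docs/Whitepaper.PDF?v=2">Whitepaper</a></p>
"""
print(find_pdf_links(page))
```

Each link in the resulting list is a candidate for a download event handler. The sketch only reports the links; adding the handlers is left to your analytics setup.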
In the end, PDFs are clearly not the best option for SEO. This doesn’t mean they are bad for SEO; they simply don’t put control over SEO in the hands of the webmaster. To realize the greatest benefits from SEO, I recommend moving content from PDF to HTML site pages where applicable, which gives webmasters greater control and flexibility and the best opportunity for visibility and tracking.