AMP Validation

AMP’s strength is not just that it makes your pages fast, but that it makes them fast in a way that can be validated. As a result, third parties such as Twitter, Instagram, or Google Search can offer predictable performance characteristics when they serve valid AMP pages.

AMP pages can be validated in several ways: by appending #development=1 to the AMP URL and checking the DevTools console, by using the web interface at validator.ampproject.org, by using a browser extension, or programmatically by leveraging the AMP specification.

The AMP plugin validation approach consists of the following main pillars:

  1. Identifying validation errors (in sync with the AMP HTML Format specification)
  2. Determining the culprits of those errors (i.e. hooks, theme, plugin, core)
  3. Enabling WordPress developers to manage existing validation errors effectively by providing easily accessible information on (1) and (2)

The AMP Specification

AMP validation is assessed against a given version of the AMP HTML Format specification. The AMP plugin reads the validator’s Protocol Buffer (PB) specification and outputs the AMP validation structure in PHP, to be consumed by the validation functionality. Every time the AMP specification is updated, the AMP plugin is updated to reflect the changes. This update is triggered by the amphtml-update.sh Bash script, which in turn runs a Python script that reads the PB specification and generates a PHP version of it in the AMP_Allowed_Tags_Generated class. The plugin uses the AMP_Allowed_Tags_Generated class to validate the full content that WordPress generates for a given URL and to provide detailed information about all existing validation errors.

In a nutshell, the AMP plugin provides the following validation functionality:

  • Ensure that the minimal required markup is present (e.g. the meta viewport)
  • Identify tags and attributes in the content that are not allowed by the AMP spec
  • Remove offending elements, with user consent
  • Automatically enqueue the AMP component scripts required on a given page. For example, if an <amp-ad> tag is included in the markup of a given template, the plugin automatically adds the required wp_enqueue_script( 'amp-ad' ) without user intervention
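
The last item can be illustrated with a minimal Python sketch. It is not the plugin's PHP implementation: the function name, the class name, and the short list of built-in elements are assumptions made for illustration. It scans a document for amp-* elements and reports which component scripts would need to be enqueued.

```python
from html.parser import HTMLParser

# Elements that ship with the AMP runtime and need no extra component script.
# This is an illustrative subset, not the full list from the AMP specification.
BUILT_IN = {"amp-img", "amp-layout", "amp-pixel"}


class ComponentCollector(HTMLParser):
    """Collects the names of amp-* elements that appear in a document."""

    def __init__(self):
        super().__init__()
        self.components = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag.startswith("amp-") and tag not in BUILT_IN:
            self.components.add(tag)


def required_component_scripts(html):
    """Return the sorted component names whose scripts the page would need."""
    collector = ComponentCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.components)
```

Each returned name corresponds to one script handle, so a template containing <amp-ad> would yield amp-ad, matching the wp_enqueue_script call mentioned above.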

Identifying Validation Errors

The AMP plugin analyzes the validity of any given AMP content as part of its sanitization tasks. Specifically, the AMP_Tag_And_Attribute_Sanitizer class parses the content of a given URL and identifies the tags and attributes that are not allowed by the AMP specification. The AMP_Style_Sanitizer class processes the styles for a given URL and determines which parts of the CSS are not valid AMP (e.g. forbidden rules, CSS placement, and maximum total CSS size).
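
As a rough illustration of the tag and attribute check, here is a minimal Python sketch. The ALLOWED_TAGS table is a tiny hypothetical stand-in for the generated specification, and the error field names are assumptions; the plugin itself does this work in PHP against the full specification.

```python
from html.parser import HTMLParser

# Hypothetical, heavily reduced stand-in for the generated specification:
# tag name -> set of attribute names allowed on that tag.
ALLOWED_TAGS = {
    "html": {"amp", "lang"},
    "head": set(),
    "body": set(),
    "p": {"class"},
    "a": {"href", "class"},
    "amp-img": {"src", "width", "height", "layout", "alt"},
}


class TagAttributeChecker(HTMLParser):
    """Records an error for every disallowed tag or attribute it encounters."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.errors.append({"code": "invalid_element", "node_name": tag})
            return
        for name, _value in attrs:
            if name not in ALLOWED_TAGS[tag]:
                self.errors.append(
                    {"code": "invalid_attribute", "node_name": name, "parent_name": tag}
                )


def find_validation_errors(html):
    """Return the list of errors found in the given markup, in document order."""
    checker = TagAttributeChecker()
    checker.feed(html)
    checker.close()
    return checker.errors
```

Keeping the allowed tags in a plain data table mirrors the idea of generating the specification as data, so the checking logic does not change when the specification does.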

Determining the Sources of Errors

Once errors have been identified, the next step is to determine which parts of the WordPress stack are triggering them. For example, the plugin identifies any third-party JS script, whether inline or injected, and tells the developer whether the offending element was introduced by a plugin (and which one), the theme (and the theme name), the content (e.g. a Custom HTML block), or WordPress core. For each error, the plugin reports details such as:

  • Specific part in the markup hierarchy (i.e. meta/script tag and its parent node)
  • Type of error (e.g. HTML/JS/CSS), and its code (e.g. Invalid Element)
  • Node attributes and their content (i.e. attribute viewport, content: width=…)
  • The source of the offending element
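
The details listed above can be pictured as one record per error. The following Python sketch uses hypothetical class and field names, not the plugin's actual data structures, and shows how such records could be grouped by source to see which plugin or theme causes the most errors.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorSource:
    type: str  # assumed values: "plugin", "theme", "core", or "content"
    name: str  # e.g. the plugin or theme name


@dataclass
class ValidationError:
    code: str          # e.g. "invalid_element"
    node_name: str     # the offending tag or attribute
    parent_name: str   # its parent node in the markup hierarchy
    node_attributes: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


def summarize_sources(errors):
    """Count how many errors each (source type, source name) pair contributes to."""
    summary = {}
    for error in errors:
        for source in error.sources:
            key = (source.type, source.name)
            summary[key] = summary.get(key, 0) + 1
    return summary
```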

Armed with this information, the developer is better positioned to address the validation errors and work towards eliminating them.

Managing Validation Errors

At this point developers know which validation errors they need to deal with and which sources are triggering them. The next step is to address the issues one by one. The AMP Debugging Workflow [Link] section describes the functionality the AMP plugin provides to streamline the process of building AMP-compatible products (e.g. themes, plugins) or adapting existing implementations to become AMP compatible.

The error-handling functionality of the plugin is centered around the notion of taxonomies, which are commonly used in the WordPress platform for a variety of purposes. Four built-in taxonomies come with core: Category, Tag, Link Category, and Post Format. Theme and plugin makers can also define custom taxonomies. In essence, a “taxonomy” in WordPress is a grouping mechanism for posts (or links or custom post types).

AMP Validation Errors Taxonomy

The diagram below depicts the structures the AMP plugin uses to handle the AMP validation process in WordPress.

In any WordPress taxonomy, the names for the different groupings are called terms. For example, any given category or tag is associated with a term. The AMP plugin defines an AMP Validation Error Taxonomy, where each AMP validation error is associated with a Term class. This approach makes it easy to handle errors in a comprehensive way, as they can manifest anywhere in the WordPress stack. Each specific instance of an AMP validation error becomes an instance object of the corresponding Taxonomy class, and that object can then be manipulated, stored, and shared across occurrences of the error.

The AMP plugin also defines a Custom Post Type (CPT), AMP_Invalid_URL_Post_Type, to manage the errors associated with a given URL. This CPT contains all the validation information for the corresponding URL, including the IDs of all validation errors encountered, as well as a reference to the actual content object in WordPress (e.g. Post, Term, User, CPT). The diagram above depicts the relationship between the AMP validation taxonomy terms, the invalid URL CPT, and the content objects.
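
The relationship between error terms and invalid-URL records can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the ValidationStore class, its field names, and the choice to key each term by a hash of the error data are illustrative and are not taken from the plugin's code.

```python
import hashlib
import json


def term_slug(error_data):
    """Derive a stable identifier for an error from its data (an assumed scheme)."""
    # Sorting the keys means equal errors always serialize, and hash, identically.
    canonical = json.dumps(error_data, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class ValidationStore:
    """In-memory stand-in for the error taxonomy and the invalid-URL post type."""

    def __init__(self):
        self.terms = {}         # slug -> error data, one term per distinct error
        self.invalid_urls = {}  # url -> {"object": ..., "term_slugs": [...]}

    def record(self, url, queried_object, errors):
        """Store the errors found at a URL and return the slugs of their terms."""
        slugs = []
        for error in errors:
            slug = term_slug(error)
            self.terms.setdefault(slug, error)  # reuse the term if it already exists
            if slug not in slugs:
                slugs.append(slug)
        self.invalid_urls[url] = {"object": queried_object, "term_slugs": slugs}
        return slugs
```

Because the same error on two URLs maps to a single term, a decision about that error can be made once and applied to every URL where it occurs.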

This is the structure that supports the AMP Debugging Workflow enabled by the plugin, which allows developers to identify, source, and handle AMP validation errors in their products. The implementation of this functionality is mostly contained in the following classes:

CSS Tree Shaking

Because AMP limits CSS to a single custom style element and a maximum of 50KB per page, the AMP_Style_Sanitizer is key to rendering AMP pages that do not violate these constraints. The tasks it performs include:

  • Collect inline styles and output them in the amp-custom stylesheet
  • Fetch all external stylesheets (except for whitelisted fonts), collect all style elements in the document, and create style rules from inline style attributes
  • Parse and process the collected styles to ensure that:
    • There are no invalid at-rules and no disallowed CSS properties
    • Relative paths for background images are made absolute
    • Any !important qualifier is transformed into style rules with higher-specificity selectors
  • If the result is larger than AMP’s 50KB limit, apply a CSS tree-shaking algorithm
  • If the result is still larger than 50KB after tree-shaking, omit any stylesheet that takes the total over 50KB
  • Minify and serialize the parsed styles, and store the result in a transient. Transients are a simple and standardized way in WordPress of temporarily storing cached data in the database, each with a custom name and a timeframe after which it expires and is deleted
  • Construct the style[amp-custom] element to be added to the head
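
The size-budget steps above can be sketched as follows. This Python example is a deliberately simplified model, not the AMP_Style_Sanitizer algorithm: it assumes each stylesheet is already parsed into (selector, declarations) pairs, and it treats a rule as unused when its selector names a class that does not appear on the page. All function names are hypothetical.

```python
import re

MAX_BYTES = 50_000  # the 50KB limit described above


def rule_css(selector, declarations):
    """Serialize one rule in minified form."""
    return f"{selector}{{{declarations}}}"


def stylesheet_size(rules):
    """Size in bytes of a stylesheet once serialized."""
    return sum(len(rule_css(s, d).encode("utf-8")) for s, d in rules)


def shake(rules, used_classes):
    """Drop rules whose selector mentions a class that the page never uses."""
    kept = []
    for selector, declarations in rules:
        classes = set(re.findall(r"\.([A-Za-z_][\w-]*)", selector))
        if classes <= used_classes:
            kept.append((selector, declarations))
    return kept


def build_amp_custom(stylesheets, used_classes, max_bytes=MAX_BYTES):
    """Return the CSS for the single custom style element, within the budget."""
    total = sum(stylesheet_size(rules) for rules in stylesheets)
    if total > max_bytes:
        # Tree-shake only when the budget is exceeded.
        stylesheets = [shake(rules, used_classes) for rules in stylesheets]
    included, running = [], 0
    for rules in stylesheets:
        size = stylesheet_size(rules)
        if running + size > max_bytes:
            continue  # omit any stylesheet that would take the total over the limit
        included.append(rules)
        running += size
    return "".join(rule_css(s, d) for rules in included for s, d in rules)
```

Passing a small max_bytes value makes the behavior easy to observe: rules for unused classes disappear first, and whole stylesheets are dropped only if the shaken result still does not fit.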