Creating a parallel AMP site with Jekyll

Published on: June 10, 2017

Update: September 4th, 2018: I decided to remove AMP from this site as it wasn’t a quick and easy win. I didn’t want to style tables in amp or figure how to import gists without errors. The amp jekyll plugin is ok, but can print errors and is slow to be released. Thus, I only recommend jekyll readers continue if they are have relatively simplistic blogs (only images and text).

Google AMP (Accelerated Mobile Pages) is a strict subset of html that essentially only allows for content and a subset of supported inline css. Nearly all of the case studies are news sites. Google will give preference to valid AMP sites for mobile searches, if only because google can ensure the page is lightweight and fast. Though I attribute the project to google as google is leading the evangelizing effort, bing has joined with cloudflare hosting AMP pages too.

While I’ve known about AMP since last year, I’ve largely ignored it (I prefer to think I was waiting for the project to mature). Last weekend, I was chatting with a friend who works at a large travel corporation. She mentioned how they were working on creating an AMP enabled site because they’ve noticed their site falling in search rankings. Their site is a highly dynamic react app – as one might imagine – but how could they use AMP when first party javascript is disabled? She told me the homepage will be server side rendered, so it’s the only page that will be AMP enabled. It’ll have plenty of links to their main site where users can actually do searching and bookings. It’s a little cheeky, but I hear SEO is cutthroat and if large part of your business comes through search engines, I think it’s reasonable for one try new tech to and see if that increases the rank.

On a side note, I visited the travel site to find they have 250 elements in their head element. As an unfair comparison, this page has 5 at the time of this writing.

But this discussion got me thinking about how easy it would be for me to “AMPlify” this site which is a Jekyll site. The history of this site has basically been a testing grounds for new technologies:

2012: move from homegrown static site solution to jekyll
2013: incorporate LESS CSS, RequireJS, minification using Gruntfile.js. I can laugh at it now as the title is “The Perfect Blogging Workflow”. Heh.
2014: migrate from GoDaddy/IIS/WCF/C#/MySQL to DigitalOcean/Apache/Flask/Python/SQLite.
2015: Move frontend dependencies to bower. Still using grunt but starting to use webpack to generate sourcemaps and minify. Use skeleton.css for theme.
2016: Theme update, migrate to cssnext, remove Grunt, and use digested assets
2017: AMP!

AMPlifying with amp-jekyll

Jekyll is one of the most popular static site generators. As a result, there is a rich list of plugins and amp-jekyll is one of them.

Amp-jekyll is still maturing and the currently released version (1.0.1) does not support jekyll permalinks, so I had to directly reference the repo in the Gemfile.

gem 'amp-jekyll', :git => 'https://github.com/juusaw/amp-jekyll'

One of the first steps is to copy their amp.html for your own use. Since I have a stylesheet that is preprocessed and minified, I needed to include it in my amp.html. Typically, built pages use a link to reference the stylesheet, but since AMP requires only inline css, I had to go with an include directive.

<style amp-custom>
  {% include styles.css %}
</style>

This did not work out of the box for me, so I created the following script to execute on every build to make sure the included css fit AMP requirements.

# Copy the built css into the includes directory so that it can be used in the
# amp pages
cp _assets/build/styles.css _includes/.

# Amp doesn't like the !important css attribute that pure css uses, so we
# remove it!
sed -i -e 's/!important//g' _includes/styles.css

# pure contains IE7 specific hacks to get a style to reset using a syntax error.
# This is not needed for AMP, so we'll remove it from the css. See stackoverflow
# for more information https://stackoverflow.com/q/1690642/433785
sed -i -e 's/*display:inline;//g' _includes/styles.css

For regular posts, I updated their layout to include an amphtml link so that webcrawlers know that I support AMP and where the AMP pages are loaded.

{% if page.path contains '_posts' %}
  <link rel="amphtml" href="{{ page.id | prepend: '/amp' | prepend: site.baseurl | prepend: site.url }}">
{% endif %}

Now when I build the site, all blog posts have a sibling, AMP-enabled page that lives at the same url except it’s prefixed with /amp. So the urls look like:

/blog/introduction-to-journald-and-structured-logging
/amp/blog/introduction-to-journald-and-structured-logging

Unanticipated Benefits

I had three posts where I forgot the /blog prefix in the url. Amp-jekyll made these obvious and so I fixed these urls. Yes, I realize changing urls after the fact breaks the internet, but these weren’t popular articles – not popular enough for me to setup redirects.

I also had broken image links in older posts because I was using raw figure html elements instead of the img directive.

Criticisms for AMP

The top articles on Hacker News for AMP are all criticisms and I’d be remiss if this section was omitted. The top articles are:

Kill Google AMP before it KILLS the web
The Problem With AMP
Google AMP is Not a Good Thing
And Please Make Google AMP Optional, which was just posted a few hours ago

The articles are relatively succinct, so I recommend reading them – they’re decent, but I believe they’re overreacting:

Google doesn’t actually own AMP. AMP has an open governance model. Only when there is no one working on the project does Google give itself the power to appoint a leader, in which case, since no one is working on the project does it even matter?
Content is not just loaded from Google’s servers. Cloudflare hosts AMP content as well. Speaking of cloudflare, I’m not sure why having external sites cache content on edge server rubs people the wrong way. It alleviates concerns of DDOS attacks and site maintenance.
You can create your own AMP cache
Links work as normal. If a user clicks on an amp link, they’re taken to a feather-light page which has all the content they want, else if a user clicks on a non-amp link they get all the features of the full app.
The subset of CSS allowed is annoying if only for the fact that one has to do small tweaks to get the css compliant. Without this subset, AMP would not be able to assume the layout was fast and sensible. It’s a tough sell, as no one wants to create custom css for AMP, but I understand the reasoning. Thankfully, this blog does not use a lot of CSS – only a few sed statements were needed to become AMP compliant.
I’ll agree that removing AMP content from google’s AMP cache isn’t the most user friendly. Sending an “update-ping” to get immediate removal is tedious at best, but “cached content that no longer exists will eventually get removed from the cache” so one can just wait for the expiration. Since cache busting is a tough problem, this is a fine solution, though an optimal solution would be like Cloudflare’s where they have a nice UI for cache busting.
One isn’t giving up anything when having a parallel AMP site. AMP content is still served from the origin, and if there happens to be an AMP cache server present, the content is served from there. It’s not like whoever hosts the AMP server owns the content. It’s simply an optimization for those who seek the content in a nice layout.
Some lament that AMP would be good for fake news sites. I’m not sure where this idea sprouted from. I believe that satirical newspapers like the onion should be able to participate in AMP and reap any potential benefits.

One thing that I find very interesting is the topic of Subresource Integrity, which is where a client can specify the hash of the incoming file. If the hashes do not match then the file is rejected. The problem is that AMP defines itself as an evergreen library, so a single URL will always point to the lastest version of the library. This means that someone could compromise the javascript payload and clients would have no way of knowing. See the following Github issue for more information. I do have to chuckle though, as there as has been a recent trend towards digesting all assets, and now google is pushing a platform that is fundamentally opposed to identifying payloads based on hash (which one would normally think it could increase security and speed).

With all the back and forth, why did I use AMP? It was a combination of two reasons. First, this site is a testbed and I wanted to answer the question of what effort is required. Since I only spent a couple hours adding this feature, the effort was low. Second, will tons of traffic pour in from the increase in search rank? I won’t know until if I don’t try! Before we part, I have to embarrassingly admit, given my stance on the importance of metrics, this site employs zero analytics and I will have a hard time determining improvements unless there is an order of magnitude increase. AMP does have a whole section on analytics, so this may be an area worth exploring.

Comments

If you'd like to leave a comment, please email [email protected]

2017-10-04 - Anonymous

hello good job

2018-02-14 - Anonymous

hey man, your website has link on amp version pointing on development host (0.0.0.0:4000)

2018-02-14 - Nick

Thank you, great catch!

2018-08-24 - ArtiPedia

nice share.. Thank you