
Canonical URL redirects for static sites

A canonical URL gives your website visitors consistency for bookmarks and copied links. Hosting your site on a static host like GitHub Pages or GitLab Pages does not mean giving up redirects to a canonical URL, even though these hosts lack server-side redirects. This post shows how to do it client-side with JavaScript.

April 21, 2017·4 min read

This blog can be accessed from the four URLs http(s)://(www.)jpap.org/, though it’s preferable for users to only see the canonical URL https://www.jpap.org/ in their browser address bar. That way any bookmarks and/or links they pass on via copy-and-paste are going to be consistent.

The traditional way of handling this is with server-side redirects so that, wherever a user may land, the server issues an HTTP 301 Moved Permanently redirect to the canonical URL.

You can see this in action on Google when requesting http://google.com and looking at the HTTP response headers below, where the browser is redirected to http://www.google.com/. Observe that Google doesn’t redirect to SSL! (A missed opportunity?)

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Tue, 18 Apr 2017 18:29:33 GMT
Expires: Thu, 18 May 2017 18:29:33 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN

With the proliferation of static site generators like Hugo and Jekyll, free static webhosts like GitLab Pages and GitHub Pages have also become popular. In fact, this blog is generated with Hugo and hosted on GitLab Pages, which supports custom domains and SSL. These static webhosts, however, generally don’t support anything more exotic than serving files off disk.

JavaScript redirects

So how do we handle the redirects to the canonical URL without server-side support? Client-side JavaScript, of course!

Start with a function that, given a canonical URL, will redirect the browser based on the current location given by location.href:

function redirectPageIfNeeded(canonicalURL) {
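  // Extract protocol and host from canonical URL
  var regexp = new RegExp("^(https?:)//([^/]+)");
  var matches = regexp.exec(canonicalURL);
  if (!matches) {
    // Not an absolute http(s) URL: nothing to compare against
    return;
  }
  var canonicalProto = matches[1];
  var canonicalHost = matches[2].toLowerCase();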

  // Current browser URL
  var href = location.href;

  // Track whether we need to redirect the browser
  var hrefRedirect = false;

  // Perform protocol redirect?
  if (location.protocol.toLowerCase() !== canonicalProto) {
    href =
      canonicalProto + // new protocol
      href.substring(location.protocol.length); // host + path
    hrefRedirect = true;
  }

  // Perform hostname redirect?
  if (location.host.toLowerCase() !== canonicalHost) {
    var pos = href.indexOf(location.host);
    href =
      href.substring(0, pos) + // protocol
      canonicalHost + // new host
      href.substring(pos + location.host.length); // path
    hrefRedirect = true;
  }

  // Perform protocol and/or host redirect as required
  if (hrefRedirect) {
    location.href = href;
  }
}
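
The same decision can also be expressed with the built-in URL parser instead of string slicing, which makes it easy to try outside a browser. The following is a minimal sketch rather than the code this blog uses: the helper name canonicalRedirectTarget is an assumption made for illustration, it takes the current URL as a parameter instead of reading location directly, and it assumes a browser (or Node) that provides the URL constructor.

```javascript
// Hypothetical helper (not from the original post): returns the URL the
// browser should be sent to, or null if the current URL is already canonical.
function canonicalRedirectTarget(canonicalURL, currentHref) {
  var canonical = new URL(canonicalURL);
  var current = new URL(currentHref);

  // Nothing to do when protocol and host (including any port) already match
  if (current.protocol === canonical.protocol &&
      current.host === canonical.host) {
    return null;
  }

  // Swap in the canonical protocol, host, and port; path, query string,
  // and fragment are carried over unchanged
  current.protocol = canonical.protocol;
  current.host = canonical.host;
  current.port = canonical.port;
  return current.href;
}

// In a page, the result would be handed to the browser, for example:
//   var target = canonicalRedirectTarget("https://www.jpap.org/", location.href);
//   if (target) { location.replace(target); }
```

Using location.replace() here, rather than assigning to location.href, is a design choice: it keeps the non-canonical URL out of the session history, so the back button does not bounce the visitor through the redirect again.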

Ensure the above function is called from every page on your site, passing the canonical site URL as the argument:

<script type="text/javascript">
  function redirectPageIfNeeded(canonicalURL){
    // ... as above
  }
  redirectPageIfNeeded("https://www.jpap.org/");
</script>

The function definition and its call can also be folded into a single immediately invoked function expression (IIFE), as the Hugo partial later in this post does.

This script block is best placed at the top of the <head> section of each page to ensure that the redirect is processed immediately on page load. If the script is instead placed at the bottom of the <body>, the browser may load additional resources (e.g. CSS, JavaScript, images) before the redirect is executed. If those resources are linked via a relative URL, the browser will attempt to download them again following the redirect because they now appear on a different domain, wasting bandwidth and polluting the browser cache.
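
To see why those resources are fetched twice, it helps to resolve one relative reference against both the landing URL and the canonical URL. The snippet below is only an illustration: the stylesheet path css/site.css and the /post/ page are hypothetical, and the URL constructor stands in for the resolution the browser performs on its own.

```javascript
// "css/site.css" is a hypothetical stylesheet path used only for illustration.
var relative = "css/site.css";

// The same relative reference, resolved against the landing page and against
// the canonical page, yields two different absolute URLs.
var beforeRedirect = new URL(relative, "http://jpap.org/post/").href;
var afterRedirect = new URL(relative, "https://www.jpap.org/post/").href;

console.log(beforeRedirect); // http://jpap.org/post/css/site.css
console.log(afterRedirect);  // https://www.jpap.org/post/css/site.css
```

Because the two absolute URLs differ, the browser treats them as separate resources: anything it fetched before the redirect is not reused afterwards.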

Hugo tips and tricks

For Hugo, calling the redirect function becomes easy through a partial, where we can call the redirect with "{{ .Site.BaseURL }}" to provide the canonical URL. Nice!

Unfortunately this also breaks the local server preview, or “watch”, feature that we get with hugo -w.

The solution is to make the call to redirectPageIfNeeded conditional on an environment variable, and to set that variable in an npm run command that generates the final site for upload to the static webhost. When the variable is not set (as when running hugo -w for local preview), the redirect is left out of the generated pages; it is included only when the final site is built.

The package.json run command is defined as,

{ ...
  "scripts": {
    "final": "rm -rf public && NODE_ENV=production hugo",
    ...
  }
}

where if you build on Windows, you should additionally use cross-env.
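
As an alternative to cross-env, the same run command could be written as a small Node script, since Node can delete the output directory and set the environment variable in a portable way. This is a sketch under stated assumptions, not what this blog uses: the file name build-final.js and both function names are hypothetical, hugo is assumed to be on the PATH, and fs.rmSync requires Node 14.14 or later.

```javascript
// build-final.js: hypothetical file name, not part of the original setup.
var fs = require("fs");
var childProcess = require("child_process");

// Copy the given environment and mark it as a production build
function productionEnv(baseEnv) {
  return Object.assign({}, baseEnv, { NODE_ENV: "production" });
}

// Equivalent of "rm -rf public && NODE_ENV=production hugo"
function buildFinal() {
  // Remove the previous output; force: true ignores a missing directory
  fs.rmSync("public", { recursive: true, force: true });

  // Run hugo with NODE_ENV=production, sharing this process's terminal
  var result = childProcess.spawnSync("hugo", [], {
    stdio: "inherit",
    env: productionEnv(process.env)
  });
  return result.status;
}
```

Adding process.exitCode = buildFinal(); as the last line of the file, and pointing the final script at node build-final.js, would then give the same build on Windows, macOS, and Linux.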

The partial becomes:

<script type="text/javascript">
  {{ $environment := (getenv "NODE_ENV") }}
  {{ if (eq $environment "production") }}
    (function(canonicalURL){
      // redirectPageIfNeeded() code as above
    })("{{ .Site.BaseURL }}");
  {{ end }}
</script>

Now we can run hugo -w and the redirects are disabled, but when we run npm run final to generate the static files for our site, the redirect is baked in.

Many thanks to Tristan Ludowyk for his discussions on browser caching, which led to the observation (with Wireshark) that the JavaScript redirect is best placed in the <head>.

Questions? Drop a comment below.

If you enjoyed this post do share it with your friends and colleagues below and give me a shout on Twitter.

