Hugo


I'm switching my blog to Hugo (0.15). Like Jekyll, Hugo is a static site generator, which is a very good thing (pages are served fast, security issues are reduced to a minimum…). But I had some issues with Jekyll:

  • it's written in Ruby, so it's painfully slow
  • it's installed with RubyGems, which messes with the operating system (and is painfully slow)
  • it's hard to maintain (something breaks somewhere with each new version)

Hugo is written in Go and is much faster: on my server, Jekyll takes around 21 minutes to build my blog, while Hugo takes about 7 seconds.

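Here is what I want from the tools powering my blog:

  • static website
  • static comments
  • syntax highlighting
  • search engine
  • everything stored in git
  • content minification (html, xml, css, js, json)
  • content pre-compression (with zopfli)
  • run on:
    • Mac OS
    • Debian GNU/Linux
  • fast to build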

Installation

Mac OS X

With brew already installed:

$ brew install golang hugo
$ sudo easy_install pip
$ sudo pip install Pygments

Pygments is a Python library used for syntax highlighting. I was already using it with Jekyll.

Debian

$ sudo apt-get install golang python-pygments
$ wget https://github.com/spf13/hugo/releases/download/v0.15/hugo_0.15_amd64.deb
$ sudo dpkg -i hugo_0.15_amd64.deb

Workflow

There are several ways to work with Hugo. I kept the workflow I was using with Jekyll:

  • write posts on my computer
  • commit them in git
  • push the commits to the git repository on my server
  • a git hook on the server (hooks/post-receive) checks out the changes and builds the blog
  • the blog is served by Apache

Migration

There are several migration scripts, but none of them worked for me. Anyway, migrating from Jekyll to Hugo is not too complicated.

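Of course I had to fix some of the content of the blog posts; some of these tasks were easily done with find and sed. For example, here are the commands I used to fix the date format and the highlight tags (they use the BSD sed that ships with Mac OS X, where -i takes a separate suffix argument; with GNU sed, drop the '' after -i):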

$ find . -name "*.md" -exec sed -i '' 's/^date: \([0-9\-]*\) \([0-9:]*\) \([+0-9:]*\)/date: "\1T\2\3"/' '{}' \;
$ find . -name "*.md" -exec sed -i '' -e 's/^{% highlight \([a-zA-Z]*\)\([ a-zA-Z]*\)%}/{{< highlight \1\2>}}/' -e 's/^{% endhighlight %}/{{< \/highlight >}}/' '{}' \;
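It can help to check these two substitutions on a sample string before running sed over every post. Here is a sketch in Node.js. The patterns are transcribed from the sed commands above, and the function names are made up for the example.

```javascript
const fs = require("fs");

// Jekyll "date: 2008-04-20 18:41:00 +02:00" -> Hugo date: "2008-04-20T18:41:00+02:00"
function fixDate(text) {
  return text.replace(/^date: ([0-9-]*) ([0-9:]*) ([+0-9:]*)/gm, 'date: "$1T$2$3"');
}

// Liquid {% highlight lang ... %} / {% endhighlight %} -> Hugo highlight shortcode
function fixHighlight(text) {
  return text
    .replace(/^\{% highlight ([a-zA-Z]*)([ a-zA-Z]*)%\}/gm, "{{< highlight $1$2>}}")
    .replace(/^\{% endhighlight %\}/gm, "{{< /highlight >}}");
}

// Applies both rewrites to one post in place, like sed -i does.
function migratePost(file) {
  const text = fs.readFileSync(file, "utf8");
  fs.writeFileSync(file, fixHighlight(fixDate(text)));
}
```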

Misc

Static comments

Comments are user-generated data, so a static website can't handle them on its own. The idea behind static comments is that readers send their comment by email, and the comment is then added to the blog post manually (this can be partly automated). Obviously it's not a viable solution for high-traffic websites, but it's better than relying on a third party (like Disqus) to host the comments. Another solution is to host a comment system yourself (like Discourse).

There is an interesting thread about static comments on the Hugo website, but no good solution yet.

I decided to go with Hugo Data Files: I created a comments folder inside the data folder, with one YAML file for each post that has comments.

For a given post, I create a file named post-name.yaml like this:

Post: /permalink/to/post-with-comments.html
Comments:
  - Id: 1
    Date: "2008-04-20T18:41:00+02:00"
    Name: "Anonymous"
    Content: |
      This is a multiline comment.
      
      You can put `markdown` syntax in it.      
  - Id: 2
    Date: "2008-04-22T14:41:00+02:00"
    Name: "Somebody"
    Content: "That's very interesting."
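The manual step of adding an emailed comment to this file can be partly automated. Here is a sketch of a hypothetical helper that turns one comment into a YAML list item with the same keys. Picking the next Id and appending the result under Comments: is left to the caller.

```javascript
// Hypothetical helper: formats one emailed comment as a list item for
// data/comments/post-name.yaml, using the same keys as the file above.
function commentToYaml({ id, date, name, content }) {
  const lines = [
    `  - Id: ${id}`,
    // JSON string syntax is valid for YAML double-quoted scalars, so quotes
    // and backslashes in the values are escaped correctly.
    `    Date: ${JSON.stringify(date)}`,
    `    Name: ${JSON.stringify(name)}`,
    // A literal block scalar (|) keeps the Markdown line breaks as they are.
    "    Content: |",
  ];
  // Trailing blank lines are dropped; every other line is indented under Content.
  for (const line of content.replace(/\s+$/, "").split("\n")) {
    lines.push(line.trim() === "" ? "" : `      ${line}`);
  }
  return lines.join("\n") + "\n";
}
```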

Then, in the post layout, I retrieve the comments for the current post:

{{ $url := .RelPermalink }}
  {{ range .Site.Data.comments }}
    {{ if eq .Post $url }}
      <ul>
        {{ range .Comments }}
          <li id="comment-{{ .Id }}">
            <div class="comment-stamp">
              <span class="comment-author">From {{ .Name }}</span>
              <span class="separator">&middot;</span>
              <time datetime="{{ dateFormat "2006-01-02T15:04:05Z07:00" .Date | safeHTML }}">{{ dateFormat $.Site.Params.DateFormat .Date }}</time>
            </div>
            <div class="comment-content">{{ index . "Content" | markdownify }}</div>
          </li>
        {{ end }}
      </ul>
    {{ end }}
  {{ end }}

Unfortunately, I haven't found a way to use Hugo shortcodes inside comments, so there is no syntax highlighting in comments 😟.

Static search engine

When I was using Jekyll, I used YaCy to index my blog. But running a big Java application for that was a bit overkill.

Again, there is a good discussion about how to implement site search with Hugo. As with comments, the common solution is a third-party service: people usually suggest Google Custom Search or a DuckDuckGo search box.

I was thinking about something along the lines of generating a Lucene index on the server and using some JavaScript library to search it… and somebody gave exactly that answer: generate a JSON index on the server and use lunr.js in the browser to parse and search that index.

At build time, generate a JSON file containing the content of all posts. To do so, create a new content type (called json, for instance) and a template for that type:

[
  {{ range $index, $page := where .Site.RegularPages "Type" "!=" "json" }}
    {{ if $index }},{{ end }}
    {
      "href": "{{ $page.RelPermalink }}",
      "title": "{{ $page.Title }}",
      "tags": [{{ range $tindex, $tag := $page.Params.tags }}{{ if $tindex }}, {{ end }}"{{ $tag }}"{{ end }}],
      "content": "{{ range $page.PlainWords }}{{ replace . "\\" "" }} {{ end }}"
    }
  {{ end }}
]

In the browser, fetch the JSON file and load it into lunr.js; see https://gist.github.com/sebz/efddfc8fdcb6b480f567

There is one big problem: parsing the index file is slow. On my computer it freezes the browser interface for a couple of seconds (I have ≈ 250 posts and the index is about 500 kB). I made a simple change to parse the index only when the user clicks on the search box; that way the browser freezes only when the user is about to search, not on every page load.
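Here is a sketch of that lazy loading. It assumes lunr.js 2.x, whose builder API adds documents inside the lunr() callback. It also assumes an index served at /index.json and a search input with the id search-box; the URL and the id are hypothetical. The index is built once, the first time the box gets focus (by a click or with Tab), and later calls reuse the same promise.

```javascript
// Runs `load` once, on the first call, and returns the same promise on every
// later call, so nothing is fetched or parsed at page load.
function lazyOnce(load) {
  let promise = null;
  return () => {
    if (promise === null) {
      promise = Promise.resolve().then(load);
    }
    return promise;
  };
}

// Fetches the generated JSON and indexes it. Assumes the lunr.js 2.x API
// and a hypothetical "/index.json" URL.
async function loadSearchIndex() {
  const response = await fetch("/index.json");
  const docs = await response.json();
  const index = lunr(function () {
    this.ref("href");
    this.field("title");
    this.field("tags");
    this.field("content");
    docs.forEach((doc) => this.add(doc));
  });
  return { index, docs };
}

// Browser wiring: "search-box" is a hypothetical input id.
if (typeof document !== "undefined") {
  const getIndex = lazyOnce(loadSearchIndex);
  document.getElementById("search-box").addEventListener("focus", getIndex);
}
```

A search handler can then await the same getIndex() and call index.search(query). lunr returns results whose ref is the post's href, which can be looked up in docs to display titles.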

I should check whether the index could be built on the server and loaded into lunr.js directly, which could improve performance.

Minification

Minification reduces the size of files by removing unnecessary content; the goal is faster delivery through lower bandwidth consumption.

Unfortunately, Hugo does not provide anything for minification.

I found out that there were Node.js minification libraries for everything I wanted, and that all the tasks could be automated with Grunt. I knew nothing about Node.js, Grunt or npm before that. The starting point is installing the JavaScript package manager, npm (with brew on Mac OS, apt on Debian).

My package.json ended up like this:

{
  "name": "blog-desgrange-net",
  "version": "0.1.0",
  "dependencies": {
    "grunt": "^0.4.5",
    "grunt-cli": "^0.1.13",
    "grunt-contrib-cssmin": "^0.14.0",
    "grunt-contrib-htmlmin": "^0.6.0",
    "grunt-contrib-uglify": "^0.10.1",
    "grunt-json-minify": "^0.4.0",
    "grunt-xmlmin": "^0.1.7"
  }
}

And my Gruntfile.js:

var path = require('path');

module.exports = function(grunt) {
  grunt.initConfig({
    pkg: grunt.file.readJSON("package.json"),
    cssmin: {
      options: {
        
      },
      build: {
        files: [{
          expand: true,
          cwd: "public",
          src: "**/*.css",
          dest: "public"
        }]
      }
    },
    htmlmin: {
      build: {
        options: {
          removeComments: true,
          collapseWhitespace: true,
          collapseBooleanAttributes: true,
          removeAttributeQuotes: true,
          removeRedundantAttributes: true,
          useShortDoctype: true,
          removeEmptyAttributes: true,
          removeIgnored: true
        },
        files: [{
          expand: true,
          cwd: "public",
          src: "**/*.html",
          dest: "public"
        }]
      }
    },
    uglify: {
      options: {
        mangle: true,
        compress: true,
        preserveComments: false
      },
      build: {
        files: [{
          expand: true,
          cwd: "public",
          src: "**/*.js",
          dest: "public"
        }]
      }
    },
    "json-minify": {
      build: {
        files: "public/**/*.json"
      }
    },
    xmlmin: {
      build: {
        files: [{
          expand: true,
          cwd: "public",
          src: "**/*.xml",
          dest: "public"
        }]
      }
    }
  });

  grunt.loadNpmTasks("grunt-contrib-cssmin");
  grunt.loadNpmTasks("grunt-contrib-htmlmin");
  grunt.loadNpmTasks("grunt-contrib-uglify");
  grunt.loadNpmTasks('grunt-json-minify');
  grunt.loadNpmTasks('grunt-xmlmin');

  grunt.registerTask("default", ["cssmin", "htmlmin", "uglify", "json-minify", "xmlmin"]);
};

So running the grunt command goes through the public directory generated by Hugo and minifies the CSS, HTML, JS, JSON and XML files in place.

The biggest gain is on JavaScript files, since UglifyJS can rewrite parts of the code (for example, mangle: true shortens local variable names). For the other file types, minification is less interesting: it's mostly about removing whitespace.

But JavaScript is slow, and needing npm somewhat defeats the goal of getting rid of RubyGems (it's yet another package system to keep up to date on my system). It would be nice to have all of this available as standard Linux commands, organized in a bash script (I hate bash, but it's good enough for that kind of thing).

Even better would be to have all those tasks built into Hugo, which would be faster and easier to maintain.

Pre-compression

With HTTP, content can be compressed in gzip format to reduce bandwidth. The issue is that compressing data on each request takes CPU time. Pre-compression means compressing static files beforehand, storing the compressed versions, and serving them directly instead of compressing on the fly.

Data is usually compressed with gzip (a good trade-off between compression ratio and CPU time), but since we are pre-compressing, we can use Zopfli instead. Zopfli produces gzip-compatible output that is 3 to 8 % smaller, but it takes about 80 times longer.

First install Zopfli:

$ # Mac OS:
$ brew install zopfli
$ # Debian:
$ sudo apt-get install zopfli

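There is also a Grunt plugin for it. Add "grunt-zopfli": "^0.3.2" to package.json, load it with grunt.loadNpmTasks("grunt-zopfli"), add "zopfli" to the default task list, and add a task like this to the grunt.initConfig object in Gruntfile.js: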

zopfli: {
  options: {
    report: false
  },
  build: {
    files: [{
      expand: true,
      cwd: "public",
      src: "**/*.+(css|html|js|json|otf|svg|ttf|xml)",
      dest: "public",
      rename: function(dest, matchedSrcPath, options) {
        return path.join(dest, matchedSrcPath + ".gz");
      }
    }]
  }
}

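Otherwise, you can do the same in bash. Note that find -E is the BSD/Mac OS X syntax; with GNU find on Debian, use find public -regextype posix-extended -regex with the same pattern: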

$ find -E public -regex ".*\.(css|html|js|json|otf|svg|ttf|xml)" -exec zopfli '{}' \;

GNU parallel speeds it up a bit (here the -j4 option runs 4 jobs at once):

$ find -E public -regex ".*\.(css|html|js|json|otf|svg|ttf|xml)" | parallel -j4 zopfli {}

Not all file types benefit from compression. Files like JPEG, PNG, WOFF… are already compressed, so there is little space to save by compressing them again.

The next step is to tell Apache not to compress on the fly and to serve the .gz file instead of the uncompressed one when the web browser accepts gzip. There are many examples of how to do that online. Here is the solution I chose, placed in the document root Directory section of my blog's Apache configuration (it relies on mod_rewrite, mod_headers and mod_env):

SetEnv no-gzip
RewriteEngine on
RewriteCond %{HTTP:Accept-Encoding} \b(x-)?gzip\b
RewriteCond %{REQUEST_URI} .*\.(css|html|js|json|otf|svg|ttf|xml)
RewriteCond %{REQUEST_FILENAME}.gz -s
RewriteRule ^(.+) $1.gz [L]
<FilesMatch \.css\.gz$>
  ForceType text/css
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.html\.gz$>
  ForceType text/html
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.js\.gz$>
  ForceType application/javascript
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.json\.gz$>
  ForceType application/json
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.otf\.gz$>
  ForceType font/opentype
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.svg\.gz$>
  ForceType image/svg+xml
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.ttf\.gz$>
  ForceType application/x-font-ttf
  Header set Content-Encoding gzip
</FilesMatch>
<FilesMatch \.xml\.gz$>
  ForceType application/xml
  Header set Content-Encoding gzip
</FilesMatch>
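The decision these directives encode can be written out as a simplified model in JavaScript (an illustration, not something Apache runs). A .gz twin is served only when all three RewriteCond lines hold, and it gets the matching FilesMatch type and encoding. A Set of existing non-empty .gz paths stands in for the -s filesystem test.

```javascript
// Content types forced by the FilesMatch blocks, per original extension.
const GZIP_TYPES = {
  css: "text/css",
  html: "text/html",
  js: "application/javascript",
  json: "application/json",
  otf: "font/opentype",
  svg: "image/svg+xml",
  ttf: "application/x-font-ttf",
  xml: "application/xml",
};

// Mirrors the three RewriteCond lines: the client accepts gzip, the extension
// is in the list, and a non-empty "<uri>.gz" exists.
function negotiate(uri, acceptEncoding, nonEmptyGzFiles) {
  const match = /\.([a-z]+)$/.exec(uri);
  const ext = match ? match[1] : "";
  const listed = Object.prototype.hasOwnProperty.call(GZIP_TYPES, ext);
  const acceptsGzip = /\b(x-)?gzip\b/.test(acceptEncoding || "");
  if (listed && acceptsGzip && nonEmptyGzFiles.has(uri + ".gz")) {
    return {
      file: uri + ".gz",
      headers: { "Content-Type": GZIP_TYPES[ext], "Content-Encoding": "gzip" },
    };
  }
  // Otherwise the original file is served as is (SetEnv no-gzip keeps
  // mod_deflate from compressing it on the fly).
  return { file: uri, headers: {} };
}
```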

No synchronization of static files

When Hugo builds the site, it copies the content of the static directory into the public directory; in my case, that takes 3 minutes. I renamed the static directory so that Hugo no longer copies it, and did the copy with rsync instead. The first run is a bit slow (though still under 3 minutes), but once the directories are synchronized, later runs are much faster (a few milliseconds when there are no new files).

But why copy the files at all? Apache can simply merge the content of both directories (the one with the static files and the public directory). Here is the result in my blog's Apache configuration:

RewriteEngine on
RewriteCond "/path/to/static-resources%{REQUEST_URI}" -f
RewriteRule ^/?(.*)$ /path/to/static-resources/$1 [L]
<Directory /path/to/static-resources>
  Options -Indexes
  AllowOverride None
  Require all granted
</Directory>
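The lookup these rewrite rules perform can be modelled as follows. This is a sketch for illustration only, with hypothetical directory names. One addition: the sketch normalizes the request path so that ../ cannot climb out of the two directories.

```javascript
const fs = require("fs");
const path = require("path");

// True only for regular files, like the "-f" test in RewriteCond.
function isFile(p) {
  try {
    return fs.statSync(p).isFile();
  } catch {
    return false;
  }
}

// Same decision as the RewriteCond/RewriteRule pair: a regular file under the
// static directory wins; anything else is served from Hugo's output.
function resolveRequest(staticRoot, publicRoot, uri) {
  // Normalizing against "/" drops any "../" that would climb out of the roots.
  const relative = path.posix.normalize("/" + uri).slice(1);
  const candidate = path.join(staticRoot, relative);
  return isFile(candidate) ? candidate : path.join(publicRoot, relative);
}
```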

HTTP/2

While I was playing with the Apache configuration, I also enabled HTTP/2, which improves performance. Apache supports HTTP/2 since version 2.4.17.

Configuring HTTP/2 in Apache is pretty easy: add Protocols h2 http/1.1 to the VirtualHost. There is also h2c for non-HTTPS connections, but web browsers only support HTTP/2 over TLS, so they won't use it.

Then enable the HTTP/2 module and restart Apache:

$ sudo a2enmod http2
$ sudo service apache2 restart

Note: mod_http2 is still experimental. I'm having some issues with it, so I've disabled it for now.

TLS 1.3

What about TLS 1.3, which will probably make establishing TLS connections faster? Well, that's for another time, as the specification isn't finished yet 😉.

Conclusion

From my requirements:

  • static website: ✓
  • static comments: ✓ (but not ideal)
  • syntax highlighting: ✓ (but not in comments)
  • search engine: ✓ (but not ideal)
  • everything stored in git: ✓
  • content minification (html, xml, css, js, json): ✓ (but with external tools)
  • content pre-compression (with zopfli): ✓
  • run on:
    • Mac OS: ✓
    • Debian GNU/Linux: ✓
  • fast to build: ✓ (except for the minification/pre-compression part)

So in the end Hugo is pretty cool and fast, but there is room for improvement.

Comments

Add one by sending me an email.

  • From Jean-Philippe Caruana

    Hi Laurent,

    For my part, on my Jekyll blog, I avoid the crap that comes with gem by using Docker:

    $ alias jekyll
    alias jekyll='docker run --rm --volume=$(pwd):/srv/jekyll -it -p 127.0.0.1:4000:4000 jekyll/jekyll:pages jekyll'

    That way I have Jekyll (barreverte takes a few seconds; how do you manage to get to 21 minutes?) without the RubyGems pollution.

    PS: switch to nginx, it will be faster and simpler to configure!

  • From Jean-Philippe Caruana

    Also, you can host your comments with isso:

    https://posativ.org/isso/

    (faster for updating the comments, but slower to serve than static content)

  • From Laurent

    Hi Jean-Philippe,

    Yes, I had seen that you talked about using Docker to avoid messing up the system with gem, but I haven't really looked into it.

    As for the 21-minute build, it's easy: you just need an anemic server like mine ;-) (based on a Via Nano U2250 at 1.6 GHz).

    As for nginx, since some things on my server only work with Apache, I decided to keep it simple and avoid proxying between nginx and Apache depending on the needs. But I clearly agree that in this case the nginx configuration would have been simpler.

    I didn't know isso. It's the kind of thing that would suit me, since my static comment system is quite limited… but in that case I'd rather have it in PHP (even though I hate PHP), so as not to have a server running all the time and eating RAM, just scripts executed by Apache when a request comes in.