Blog Analytics

Let’s get some metrics!

Before even publishing my new Hexo blog on the real internet, I decided that the first requirement was to get some analytics in place so I can see how much it gets used. I suspect that injecting JavaScript that calls home tends to freak people out, and it is probably prone to a cat-and-mouse game of evasion (and therefore more ops time to keep it working), so I have opted for log analytics instead. I will do this with a Docker “on-premise” instance of Matomo (formerly Piwik).

Implementation details

I will use 2 Docker containers:
- Matomo
- MariaDB

First I spun up the MariaDB container:

sudo docker run --name matomo-mariadb -v /my-location-for/matomo-mariadb:/var/lib/mysql -e MARIADB_ROOT_PASSWORD=supersectetpw -d mariadb:10.6

Then I connected to MariaDB and set it up (I was not born knowing this; rather, I followed this guide).

Connected to container:

sudo docker exec -it matomo-mariadb /bin/bash

Connected to MariaDB as root:

root@338bfaed46da:/# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.6.11-MariaDB-1:10.6.11+maria~ubu2004 mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Created a new database:

MariaDB [(none)]> CREATE DATABASE matomo;
Query OK, 1 row affected (0.001 sec)

Created a user:

MariaDB [(none)]> CREATE USER 'matomo'@'localhost' IDENTIFIED by 'anothersecretpassword';
Query OK, 0 rows affected (0.007 sec)

Gave the user power:

MariaDB [(none)]>  GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, INDEX, DROP, ALTER, CREATE TEMPORARY TABLES, LOCK TABLES ON matomo.* TO 'matomo'@'localhost';
Query OK, 0 rows affected (0.020 sec)

And gave the user an extra boost with the FILE privilege (as far as I understand, it lets Matomo use LOAD DATA INFILE, which is supposed to be faster, and faster is always better. Wait, my mom always said haste makes waste. Now I am not so sure this was a good idea, but it is already done):

MariaDB [(none)]> GRANT FILE ON *.* TO 'matomo'@'localhost';
Query OK, 0 rows affected (0.004 sec)

OK! Now that I have a DB, let’s fire up an on-premise Matomo:

sudo docker run -d -p 8080:80 --link matomo-mariadb:db -v /my-location-for/matomo:/var/www/html --name matomo matomo

Then cross your fingers and connect to it (port 8080 on the Docker host, per the -p 8080:80 above) to configure it via the web interface:

You need to introduce Matomo to the MariaDB that was configured (with the --link alias above, the database server is reachable as db):

And it will connect and configure the DB:

Then you give some details about the site you want to generate analytics data about:

And it is ready to roll:

Now that we have Matomo running and ready, we need to send it the web server log files.

First, create a token to use for uploading the files under config -> Personal -> Security:

Then grab import_logs.py, point it at the log files and your Matomo, and let it do its parsing magic:

./import_logs.py --token-auth=YourtokenGoesHere --idsite 1 --url=http://MatomoIPgoesHere/ --enable-http-errors --enable-http-redirects --enable-static --enable-bots /path/to/log/access.log

If everything goes well, the importer will print a summary of the results:

0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /sd/pv/blog/log/access.log...
84 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
84 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)

Logs import summary
-------------------

80 requests imported successfully
0 requests were downloads
4 requests ignored:
0 HTTP errors
0 HTTP redirects
4 invalid log lines
0 filtered log lines
0 requests did not match any known site
0 requests did not match any --hostname
0 requests done by bots, search engines...
0 requests to static resources (css, js, images, ico, ttf...)
0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

80 requests imported to 1 sites
1 sites already existed
0 sites were created:

0 distinct hostnames did not match any existing site:



Performance summary
-------------------

Total time: 2 seconds
Requests imported per second: 33.35 requests per second
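
The "4 invalid log lines" in the summary are lines the importer could not parse. A rough way to hunt for such lines is to test each one against a pattern for the log format. The sketch below assumes the web server writes the common or combined log format (the post does not say which), and its regular expression is a simplified stand-in, not the pattern import_logs.py actually uses, so the counts may differ from the importer's.

```python
import re

# Simplified pattern for the NCSA common/combined log format (an assumption).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)


def find_invalid_lines(lines):
    """Return (number of matching lines, list of non-matching lines)."""
    valid = 0
    invalid = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if LOG_LINE.match(line):
            valid += 1
        else:
            invalid.append(line.rstrip("\n"))
    return valid, invalid


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real access.log.
    sample = [
        '203.0.113.7 - - [10/Dec/2022:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"\n',
        '203.0.113.9 - - [10/Dec/2022:13:56:01 +0000] "\\x16\\x03\\x01" 400 157 "-" "-"\n',
    ]
    valid, invalid = find_invalid_lines(sample)
    print(valid, "valid,", len(invalid), "invalid")
    for line in invalid:
        print("  ", line)
```

To check a real file, pass an open file object instead of the sample list, for example find_invalid_lines(open("/path/to/log/access.log")).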

And I have some data in Matomo! Disclaimer: I have never used Matomo before, so I will have to spend some time investigating what all this is:

I then set up cron to run import_logs.py at 00:00 every night and then run logrotate at 12:01, and now I can sit back and peruse my statistics. They are quite sparse at this point, as I am the only one using this blog, but that was the whole idea: get this in place before I actually have any visitors so I can appreciate the growth!
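
For the nightly cron run, the import command can be wrapped in a small script so that the flags live in one place. This is only a sketch: the function names are hypothetical, every value is a placeholder, and the flags are just the ones from the import_logs.py command shown earlier.

```python
import subprocess
import sys


def build_import_command(script, token, site_id, matomo_url, log_path):
    """Build the import_logs.py argument list using the flags from this post."""
    return [
        sys.executable, script,
        f"--token-auth={token}",
        "--idsite", str(site_id),
        f"--url={matomo_url}",
        "--enable-http-errors",
        "--enable-http-redirects",
        "--enable-static",
        "--enable-bots",
        log_path,
    ]


def run_import(command):
    """Run the importer and return its exit code (0 means success)."""
    return subprocess.run(command).returncode


if __name__ == "__main__":
    # All values below are placeholders, as in the command above.
    command = build_import_command(
        "./import_logs.py", "YourtokenGoesHere", 1,
        "http://MatomoIPgoesHere/", "/path/to/log/access.log",
    )
    print(" ".join(command))
    # sys.exit(run_import(command))  # uncomment once the values are real
```

Passing the arguments as a list avoids shell quoting problems with the token or the path. A crontab entry such as `0 0 * * * /usr/bin/python3 /path/to/nightly_import.py` (the script name and paths are hypothetical) would then run it at 00:00 every night.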

Thanks for reading and feel free to give feedback or comments via email (andrew@jupiterstation.net).