Server-side Kubernetes nginx-ingress log analysis using GoAccess

I manage a single-node Kubernetes cluster to run some of my fun side projects, and I am slowly removing Google Analytics from them. So I went looking for a server-side log analyser, and GoAccess seems to have most of what I need.

GoAccess is a very fast open source web log analyser and interactive viewer that runs in a terminal in *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly.

GoAccess Dashboard

Parsing nginx-ingress container log

GoAccess recognises the most common log formats generated by IIS, Apache and nginx. However, the logs generated by the nginx-ingress container that sits as a gateway in front of the services running on my Kubernetes cluster are quite different. Here is an example log line:

{"log":"123.123.223.23 - [123.123.223.27] - - [30/Jul/2019:04:45:57 +0000] \"GET /about-me HTTP/1.1\" 500 117356 \"-\" \"Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)\" 458 13.047 [mustakim-site-service-80] 172.17.0.17:5000 117349 13.048 500 830f5081a7740db11eb2ffbce625dafc\n","stream":"stdout","time":"2019-07-30T04:45:57.689166775Z"}

So I had to tell GoAccess how to parse these logs by providing values for the --log-format, --date-format and --time-format command line arguments.

Argument         Value
--log-format     %^ %^ [%h] - - [%d:%t] %~ %~ %m %U %^ %s %b %R %u %^ %^ %^ %^ %^ %T %^
--date-format    %d/%b/%Y
--time-format    %H:%M:%S +0000
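
To make the field positions easier to see, here is a rough Python sketch that decodes the example line above and pulls out a few of the fields the format string targets. The regular expression and its group names are made up for illustration and are not how GoAccess parses internally. GoAccess's date and time specifiers are strftime-style, which is why the same codes work with Python's strptime here.

```python
import json
import re
from datetime import datetime

# The example line from above: the container log holds one JSON object per line.
RAW = r'''{"log":"123.123.223.23 - [123.123.223.27] - - [30/Jul/2019:04:45:57 +0000] \"GET /about-me HTTP/1.1\" 500 117356 \"-\" \"Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)\" 458 13.047 [mustakim-site-service-80] 172.17.0.17:5000 117349 13.048 500 830f5081a7740db11eb2ffbce625dafc\n","stream":"stdout","time":"2019-07-30T04:45:57.689166775Z"}'''

# Illustrative pattern only (not GoAccess's parser); the group names are hypothetical.
ACCESS_RE = re.compile(
    r'^(?P<remote>\S+) - \[(?P<client>[^\]]+)\] - (?P<user>\S+) '
    r'\[(?P<date>[^:]+):(?P<time>\S+) (?P<tz>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<req_len>\d+) (?P<req_time>[\d.]+) '
    r'\[(?P<upstream>[^\]]*)\]'
)

entry = json.loads(RAW)            # the access-log text lives in the "log" property
m = ACCESS_RE.match(entry["log"])

# [%h] is the bracketed client address and [%d:%t] the bracketed timestamp.
when = datetime.strptime(m.group("date") + " " + m.group("time"), "%d/%b/%Y %H:%M:%S")

print(m.group("client"), m.group("method"), m.group("path"), m.group("status"))
print(m.group("upstream"), when.isoformat())
```

The bracketed upstream name near the end of the line is the part that identifies which service handled the request.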

Keep in mind that the log JSON property of the container logs (usually found in /var/log/containers/) does not always contain access logs similar to the example above. It also contains other miscellaneous logs generated by the container. Also, all the logs are combined into one or many files (depending on the number of replicas available for the ingress deployment). So I decided that whatever I do:

  • I need to grep for the service name to extract the logs for a particular service.
  • If I need to generate a dashboard for all requests, I'll simply grep for Mozilla!
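
The same filtering idea can be sketched in Python instead of grep. This is only an illustration under assumptions: the function name lines_for_service and the sample lines (including demo-service and the controller message) are made up. Unlike grep on the raw line, it matches only against the decoded log text and silently skips anything that is not a JSON object.

```python
import json

def lines_for_service(lines, needle):
    """Return the raw container-log lines whose decoded "log" text contains needle."""
    kept = []
    for line in lines:
        try:
            text = json.loads(line).get("log", "")
        except (ValueError, AttributeError):
            continue  # not a JSON object, so it cannot be an access-log entry
        if isinstance(text, str) and needle in text:
            kept.append(line)
    return kept

# Hypothetical sample input: one access-log entry, one controller message, one junk line.
sample = [
    json.dumps({
        "log": '1.2.3.4 - [1.2.3.4] - - [30/Jul/2019:04:45:57 +0000] "GET / HTTP/1.1" '
               '200 12 "-" "Mozilla/5.0" 1 0.1 [demo-service-80] 10.0.0.1:80 12 0.1 200 abc\n',
        "stream": "stdout",
    }),
    json.dumps({"log": "reloading configuration\n", "stream": "stderr"}),
    "not json at all",
]

print(len(lines_for_service(sample, "demo-service")))  # per-service dashboard input
print(len(lines_for_service(sample, "Mozilla")))       # "all requests" dashboard input
```

Because the original lines are returned unchanged, the result can still be parsed with the log format shown earlier.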

How it's done

  1. Get the names of all Kubernetes services.
  2. Generate an index.html file that contains links to each static html file generated by GoAccess, so the dashboards are easy to navigate. This goes to /storage/goaccess/out/, which is the root of a static web server that is already running.
  3. Combine all logs generated by the Kubernetes nginx-ingress from /var/log/containers/
  4. grep the logs to extract the logs of each service (this makes sure other logs are skipped)
  5. Copy the extracted log to a temporary location (in my case: /storage/goaccess/imported-logs/imported-log.log)
  6. Run GoAccess and pass it the log as well as instructions on how to parse it.
  7. The generated static html (that renders the nice dashboard) goes to /storage/goaccess/out/{svc_name}.html
  8. Repeat steps 3 - 7 for each of the services
  9. Repeat the above, but grep for Mozilla and save as all.html, so we have another dashboard for all requests to the server.
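
Steps 6 and 7 can also be sketched with subprocess and an argument list, which avoids having to quote the format strings for a shell. This is a minimal sketch, not a definitive implementation: the helper names goaccess_command and render_dashboard are hypothetical, while the flags and values are the ones listed above. It assumes goaccess is on the PATH and writes its HTML report to standard output when that is redirected to a file.

```python
import subprocess

LOG_FORMAT = "%^ %^ [%h] - - [%d:%t] %~ %~ %m %U %^ %s %b %R %u %^ %^ %^ %^ %^ %T %^"

def goaccess_command(log_path):
    """Build the goaccess argument list; each element is passed through verbatim."""
    return [
        "goaccess", "-f", log_path, "--real-os",
        "--log-format=" + LOG_FORMAT,
        "--date-format=%d/%b/%Y",
        "--time-format=%H:%M:%S +0000",
    ]

def render_dashboard(log_path, html_path):
    """Run goaccess and redirect its standard output into html_path."""
    with open(html_path, "w") as out:
        subprocess.run(goaccess_command(log_path), stdout=out, check=True)

# Only print the command here; render_dashboard needs goaccess and a real log file.
print(goaccess_command("/storage/goaccess/imported-logs/imported-log.log"))
```

With check=True a failing goaccess run raises an exception instead of leaving a silently empty dashboard file behind.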

The script

goaccess-kubernetes-nginx-ingress.py
#!/usr/bin/python

import os

def process_log_for_svc(svc, out):
  print('Processing ' + svc)

  # Combine every nginx-ingress container log and keep only the lines that mention svc
  os.system('find /var/log/containers/ | grep nginx-ingress | xargs sudo cat | grep ' + svc + ' > /storage/goaccess/imported-logs/imported-log.log')

  print("Parsing...")
  # Render the extracted log as a static html dashboard
  os.system('goaccess -f /storage/goaccess/imported-logs/imported-log.log --real-os --log-format="%^ %^ [%h] - - [%d:%t] %~ %~ %m %U %^ %s %b %R %u %^ %^ %^ %^ %^ %T %^" --date-format="%d/%b/%Y" --time-format="%H:%M:%S +0000" > /storage/goaccess/out/' + out + '.html')

  print("Cleaning...")
  os.system("rm /storage/goaccess/imported-logs/imported-log.log")

print('Getting all services')
# Skip the header row of the kubectl output and keep the first column (the service name)
all_svc = os.popen('kubectl get svc | tail -n +2 | awk \'{print $1}\'').read()

# split() without arguments drops the empty entry left by the trailing newline
all_svc_arr = all_svc.split()

print('Creating index.html')
index_html = ''
for svc in all_svc_arr:
  index_html += '<a href="' + svc + '.html">' + svc + '</a><br/>'

index_html = '<a href="all.html">-ALL-</a><br/>' + index_html

with open("/storage/goaccess/out/index.html", "w") as text_file:
  text_file.write(index_html)

print('Processing All Logs')
process_log_for_svc('Mozilla', 'all')

for svc in all_svc_arr:
  process_log_for_svc(svc, svc)
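
The service-discovery step (tail -n +2 | awk '{print $1}') can be done in Python as well. The sketch below is an assumption-laden illustration: service_names is a hypothetical helper and the sample table is made up, following the usual column layout of kubectl get svc. It skips the header row and any blank lines, so no empty service name is produced.

```python
def service_names(kubectl_output):
    """Return the first column of every non-empty row after the header line."""
    rows = kubectl_output.splitlines()[1:]
    return [row.split()[0] for row in rows if row.strip()]

# Hypothetical output of `kubectl get svc`, used only to exercise the helper.
sample_output = """NAME           TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kubernetes     ClusterIP   10.96.0.1     <none>        443/TCP   10d
demo-service   ClusterIP   10.96.0.20    <none>        80/TCP    3d

"""

print(service_names(sample_output))
```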

This is nowhere near perfect, but it's a good start. I will keep the gist updated as I improve it.