Docker issues solved, Prometheus/Grafana setup, happy.

Yesterday I managed to make the selector work, but today it's not working. It appears I didn't save the dashboard or something.

I'm trying to recreate it and I think I did the same thing I did yesterday, but now the stream selector is not returning any results. I'm pretty pissed off.

I restarted Docker and now it works again. I'm really confused.

Okay, I'll just move on. I can see logs per container on a nice UI, great.

What I want to achieve next:

  • monitor high CPU/network usage per container
  • monitor error rates per container
  • set up alerting for high CPU/network usage

So the next step is setting up Prometheus.

Okay, I added a bunch of stuff:

  • Node Exporter for monitoring general system metrics
  • cAdvisor for collecting Docker container metrics
  • Prometheus for collecting it all

I messed around with the configuration a bit, and it works! I can query for metrics from the Prometheus web interface. Now, I need to hook it up to my Grafana and create some useful dashboards.
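For reference, here is roughly what the same kind of query looks like outside the web interface, as a minimal Python sketch against the Prometheus HTTP API. The address `http://localhost:9090` is an assumption (the default port, not necessarily my setup), the helper names are made up for illustration, and the `name` label is what cAdvisor usually attaches to Docker containers, so it may differ on other setups.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is reachable on its default port on this machine.
PROMETHEUS_URL = "http://localhost:9090"

# CPU cores used per container, averaged over the last 5 minutes.
# The metric comes from cAdvisor; name!="" skips series without a container name.
CPU_QUERY = 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))'


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Build the URL for an instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Turn an instant-query JSON response into {container name: value}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for series in payload["data"]["result"]:
        name = series["metric"].get("name", "<unknown>")
        # Each sample is [unix timestamp, "value as string"].
        values[name] = float(series["value"][1])
    return values


def query_prometheus(promql, base_url=PROMETHEUS_URL):
    """Run the query against a live Prometheus (hypothetical helper)."""
    with urlopen(build_query_url(promql, base_url), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

Calling `query_prometheus(CPU_QUERY)` should return one number per container, which is the same data the web interface shows in its table view.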

Done! I can now track logs, CPU/memory/disk usage, network in/out, and disk read/write per container from a single Grafana dashboard.
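The panels boil down to a handful of PromQL expressions over cAdvisor metrics. The sketch below is an approximation of that idea, not a copy of my dashboard: the metric names are standard cAdvisor ones, but the panel keys, the `panel_queries` helper, and the 5 minute window are assumptions, and some filesystem metrics may be missing depending on the cAdvisor version and storage driver.

```python
def panel_queries(container_regex=".+", window="5m"):
    """Return one PromQL expression per dashboard panel (hypothetical layout)."""
    # Match Docker containers by the `name` label that cAdvisor sets.
    selector = f'{{name=~"{container_regex}"}}'

    def rate_of(metric):
        # Counters only make sense as a per-second rate over a time window.
        return f"sum by (name) (rate({metric}{selector}[{window}]))"

    def gauge_of(metric):
        # Gauges can be summed directly.
        return f"sum by (name) ({metric}{selector})"

    return {
        "cpu_cores": rate_of("container_cpu_usage_seconds_total"),
        "memory_bytes": gauge_of("container_memory_usage_bytes"),
        "disk_usage_bytes": gauge_of("container_fs_usage_bytes"),
        "network_in_bytes_per_s": rate_of("container_network_receive_bytes_total"),
        "network_out_bytes_per_s": rate_of("container_network_transmit_bytes_total"),
        "disk_read_bytes_per_s": rate_of("container_fs_reads_bytes_total"),
        "disk_write_bytes_per_s": rate_of("container_fs_writes_bytes_total"),
    }
```

Each returned string can be pasted into a Grafana panel or the Prometheus expression box as-is.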

I'm really happy with the result, and I'm ending it here on a positive note.

Next up, I would like to add a second dashboard that displays the general status of the whole machine. And lastly, I would like to set up alerts for when certain thresholds are exceeded. I'll add any other dashboards as necessary.
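I haven't built the alerts yet, and the real thing would live in Grafana or Prometheus alert rules rather than in a script. Still, the logic I have in mind is simple enough to sketch in Python: fire only when a container stays above a threshold for several consecutive samples, so a single spike doesn't trigger anything. The function name, the threshold, and the sample count here are all placeholders.

```python
def firing_alerts(history, threshold, for_samples=3):
    """Return containers whose last `for_samples` values all exceed `threshold`.

    history maps a container name to its values, oldest first.
    """
    firing = []
    for name, values in history.items():
        recent = values[-for_samples:]
        # Require a full window so a container with too few samples never fires.
        if len(recent) == for_samples and all(v > threshold for v in recent):
            firing.append(name)
    return sorted(firing)


# Made-up CPU readings (fraction of one core), one list per container.
example_history = {
    "web": [0.20, 0.90, 0.95, 0.97],   # above 0.8 for the last three samples
    "db": [0.90, 0.10, 0.90, 0.90],    # dipped below, so not sustained
    "cache": [0.99],                   # not enough samples yet
}
print(firing_alerts(example_history, threshold=0.8))  # ['web']
```

This is the same idea as the "for" duration in alerting rules: the condition has to hold for a while before anyone gets notified.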

I also need to figure out how to back this up and keep it in source control.

Something to do for the following week or so.