Docker issues solved, Prometheus/Grafana setup, happy.
Yesterday I managed to get the stream selector working, but today it isn't. It looks like I didn't save the dashboard or something.
I'm trying to recreate it and I think I did the same thing I did yesterday, but now the stream selector is not returning any results. I'm pretty pissed off.
I restarted Docker and now it works again. I'm really confused.
Okay, I'll just move on. I can see logs per container on a nice UI, great.
What I want to achieve next:
- monitor high CPU/network usage per container
- monitor error rates per container
- set up alerting for high CPU/network usage
So the next step is setting up Prometheus.
Ok, I added a bunch of stuff:
- Node Exporter for monitoring general system metrics
- cAdvisor for collecting docker container metrics
- Prometheus for collecting it all
I messed around with the configuration a bit, and it works! I can query metrics from the Prometheus web interface. Now I need to hook it up to Grafana and create some useful dashboards.
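For reference, the same kind of query can also be run outside the web interface, through the Prometheus HTTP API. This is only a minimal sketch, not my actual setup: it assumes Prometheus listens on the default localhost:9090, that cAdvisor exposes container_cpu_usage_seconds_total with a name label per container, and the helper names (build_query_url, parse_vector, fetch_cpu_per_container) are made up for illustration. The example at the bottom runs on a hand-written sample payload, so it needs no running server.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable on its default port on this machine.
PROMETHEUS_URL = "http://localhost:9090"

# Assumption: cAdvisor labels Docker containers with `name`.
# CPU cores used per container, averaged over the last 5 minutes.
CPU_QUERY = 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))'


def build_query_url(base_url, promql):
    """Build the instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector API response into {container_name: value}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = {}
    for sample in payload["data"]["result"]:
        name = sample["metric"].get("name", "<unknown>")
        # Each sample value is a [timestamp, "value-as-string"] pair.
        result[name] = float(sample["value"][1])
    return result


def fetch_cpu_per_container(base_url=PROMETHEUS_URL):
    """Run CPU_QUERY against a live Prometheus (needs the stack to be up)."""
    with urllib.request.urlopen(build_query_url(base_url, CPU_QUERY), timeout=5) as resp:
        return parse_vector(json.load(resp))


# Hand-written sample response, so the parsing can be tried without a server.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"name": "web"}, "value": [1700000000, "0.25"]},
            {"metric": {"name": "db"}, "value": [1700000000, "0.5"]},
        ],
    },
}
print(parse_vector(sample_payload))
```

With the stack running, calling fetch_cpu_per_container() should return the same kind of dictionary from live data.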
Done! I can now track logs, CPU/memory/disk usage, network in/out, and disk read/write per container from a single Grafana dashboard.
I'm really happy with the result, and I'm ending it here on a positive note.
Next up, I would like to add a second dashboard that displays the general status of the whole machine. And lastly, I would like to set up alerts for when certain thresholds are exceeded. I'll add any other dashboards as necessary.
I also need to figure out how to back this up and keep it in source control.
Something to do for the following week or so.
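To make the threshold idea from above a bit more concrete, here is a rough sketch of the check I have in mind. In practice this would most likely live in Prometheus alerting rules or Grafana alerts rather than in a script; the find_breaches helper, the container names, and all the numbers are hypothetical and only there to illustrate the logic.

```python
def find_breaches(usage, thresholds, default_limit=None):
    """Return (container, value, limit) for every container over its limit.

    usage:         {container_name: current value, e.g. CPU cores in use}
    thresholds:    {container_name: limit} for containers with a specific limit
    default_limit: limit for containers not listed in thresholds (None = skip)
    """
    breaches = []
    for container, value in sorted(usage.items()):
        limit = thresholds.get(container, default_limit)
        if limit is not None and value > limit:
            breaches.append((container, value, limit))
    return breaches


# Hypothetical numbers, not real measurements.
usage = {"web": 0.92, "db": 0.40, "worker": 0.75}
thresholds = {"web": 0.80, "db": 0.80}

for container, value, limit in find_breaches(usage, thresholds, default_limit=0.70):
    print(f"ALERT {container}: {value:.2f} CPU cores > {limit:.2f}")
```

Having a default limit means a newly added container still gets checked even if I forget to give it its own threshold.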