Observability

📢   All about Observability (o11y), Prometheus, Grafana, et al:

Questions
Tips & tricks
Success stories
Failures
Use-cases / Showing off
Ideas / Design discussions

🗨   Join the chatter in

⛔   Hate speech, bigotry and NSFW content will not be tolerated.

Created

2 yr. ago

Observability @lemmy.ml
bahmanm @lemmy.ml
1y ago
ANN

lemmy-synapse v1.0.0

github.com Release v1.0.0 · bahmanm/lemmy-synapse
The very first stable version of lemmy-synapse 🎉 What is lemmy-synapse A humble bundle of observability and monitoring for your Lemmy cluster. Read more ... What's included Docker stats dashboard ...
cross-posted from: https://lemmy.ml/post/7353834
lemmy-synapse is a light-weight observability and monitoring stack for Lemmy servers.
Using Prometheus and Grafana, it allows the admins to visualise and query the stats of their instance. v1.0.0 comes out of the box with 3 detailed dashboards:
Host stats (CPU, RAM, disk, network, ...)
PostgreSQL stats (connections, locks, transactions, queries, ...)
Docker stats (container CPU, RAM, disk, network, OOM signals, ...)
It runs as a Docker Compose cluster alongside the Lemmy cluster and, in most cases, does not require any changes to it. Uninstalling lemmy-synapse is as easy as tearing down its cluster and deleting its installation directory.
Got questions/feedback? Pray drop a line:
#lemmy-synapse:matrix.org
[email protected]
Observability @lemmy.ml
bahmanm @lemmy.ml
2y ago
HELP

Grafana - Manage changes between multiple environments
I'm using Grafana for one of my hobby projects which is also deployed to a public-facing server.
I am the only user of Grafana as it is supposed to be read-only for anonymous access.
My current workflow is:
1. Run Grafana locally.
2. Make changes to local dashboards, data-sources, ...
3. Stop local Grafana.
4. Stop remote Grafana.
5. Copy local grafana.db to the remote machine.
6. Start remote Grafana.
7. Go to (1)
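Steps 3 to 6 can be sketched as a small script. This is only a rough sketch under assumptions that the post does not state: both Grafana instances run as a systemd unit named `grafana-server`, the remote machine is reachable over SSH, and the paths and host name are placeholders. It only builds the commands and prints them; nothing is executed.

```python
import shlex


def sync_commands(local_db, remote_host, remote_db, service="grafana-server"):
    """Return the shell commands for steps 3-6, in order, without running them.

    Assumptions (not from the original post): Grafana is managed by systemd
    under the unit name `service`, and `remote_host` accepts SSH connections.
    """
    target = f"{remote_host}:{remote_db}"
    return [
        ["systemctl", "stop", service],                       # 3. stop local Grafana
        ["ssh", remote_host, "systemctl", "stop", service],   # 4. stop remote Grafana
        ["scp", local_db, target],                            # 5. copy grafana.db over
        ["ssh", remote_host, "systemctl", "start", service],  # 6. start remote Grafana
    ]


if __name__ == "__main__":
    # Placeholder paths and host name; adjust to the actual setup.
    commands = sync_commands(
        "/var/lib/grafana/grafana.db",
        "example-host",
        "/var/lib/grafana/grafana.db",
    )
    for command in commands:
        print(shlex.join(command))
```

Printing the commands first makes it easy to review them before wiring them into `subprocess.run` or a Makefile target.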
However this feels terribly inefficient and stupid to my mind 😅
To automate parts of this process, I tried gdg and grafana-backup-tool.
I couldn't get the former to work w/ my workflow (local storage) as it barfed at the very start w/ the infamous "invalid cross-device link" Go error.
The latter seems to work but only partially; for example organisations are not exported.
❓ Given I may switch to PostgreSQL as Grafana's DB in the near future, my question is, what is
Observability @lemmy.ml
bahmanm @lemmy.ml
2y ago

andydote.co.uk Tracing: structured logging, but better in every way
It is no secret that I am not a fan of logs; I’ve baited (rapala in work lingo. Rapala is a Finnish brand of fishing lure, and used to mean baiting in this context) discussion in our work chat with things like: If you’re writing log statements, you’re doing it wrong. This is a pretty incendiary stat...

cross-posted from: https://lemmy.ml/post/5287125
TL;DR: The author argues that free-form logging is close to useless and expensive to use. They also argue that structured logging is less effective than tracing, mainly b/c of the difficulty of inferring timelines and causality.
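To illustrate the causality point, here is a toy sketch (not taken from the linked article): each span carries the id of its parent, so the call tree and the timings can be rebuilt mechanically. The span names, ids and timings are made up for illustration.

```python
from collections import defaultdict

# Hypothetical spans as a tracer might record them; times are in ms.
SPANS = [
    {"id": "a", "parent": None, "name": "GET /checkout", "start": 0, "end": 120},
    {"id": "c", "parent": "a", "name": "charge card", "start": 45, "end": 110},
    {"id": "b", "parent": "a", "name": "load cart", "start": 5, "end": 40},
    {"id": "d", "parent": "c", "name": "call payment API", "start": 50, "end": 105},
]


def render(spans):
    """Rebuild the call tree from parent ids and return it as indented lines."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Siblings are ordered by start time, which gives the timeline.
        for span in sorted(children[parent_id], key=lambda s: s["start"]):
            duration = span["end"] - span["start"]
            lines.append(f"{'  ' * depth}{span['name']} ({duration} ms)")
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(SPANS)))
```

With plain log lines there is no parent id to follow, so the same tree would have to be guessed from timestamps and message text.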
I find the arguments very plausible.
In fact, I very rarely use logs produced by several services b/c most of the time they just confuse me. The only time I use logs heavily is when troubleshooting a single service and looking at its stdout (or kubectl logs).
However I have very little experience w/ tracing (I've used it in my hobby projects but, obviously, they never represent the reality of complex distributed systems.)
Have you got real world experience w/ tracing in larger systems? Care to share your take on the topic?

Observability @lemmy.ml
bahmanm @lemmy.ml
2y ago

SOLVED HELP

Prometheus - Convert series to gauge
Update
Turned out I didn't need to convert any series to gauges at all!
The problem was that I had botched my Prometheus configuration and it wasn't ingesting the probe results properly 🤦‍♂️ Once I fixed that, I got all the details I needed.
For posterity, you can view lemmy-meter's configuration on GitHub.
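For anyone landing here with the same question: blackbox_exporter exposes a `probe_success` gauge per probe (1 on success, 0 on failure), so an up/down value is available once the probe results are ingested. Below is a minimal sketch of reading it through the Prometheus HTTP API. The base URL is an assumption, and this is not necessarily how lemmy-meter does it.

```python
import json
import urllib.parse
import urllib.request


def parse_probe_success(payload):
    """Map each probed instance to True (up) or False (down).

    `payload` is the decoded JSON body of an instant query against
    /api/v1/query for the `probe_success` metric.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    status = {}
    for sample in payload["data"]["result"]:
        instance = sample["metric"].get("instance", "unknown")
        _timestamp, value = sample["value"]  # the value arrives as a string
        status[instance] = float(value) == 1.0
    return status


def fetch_probe_success(base_url="http://localhost:9090"):
    """Query Prometheus; the default base_url is an assumption."""
    query = urllib.parse.urlencode({"query": "probe_success"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}") as response:
        return parse_probe_success(json.load(response))
```

In Grafana the same metric can be plotted directly with the query `probe_success`, without any conversion.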
Original post
I'm using blackbox_exporter to monitor the performance of a dozen websites, and that is working just fine for measuring RTT and error rates.
I'm thinking about creating a single gauge for each website indicating whether it is up or down.
I haven't been able to find any convincing resource on whether it is mathematically correct to convert such series to gauges/counters, let alone how to do it.
So my questions are
- Have I missed a relevant option in blackbox_exporter configurations?
- Do you rec