Search - "grafana"
-
Go home grafana, you're drunk....
Maybe I shouldn't run 50 containers on a system with 2 cores and 6 GB RAM.
-
Pro tip:
Make sure you can RECOVER from your backups.
It's all well and good backing this and that up, but make sure that when the shit really hits the fan you can recover.
I'm now 4 days into recovering a Raspberry Pi that ran:
Pi-hole
Snort
DHCP
VSFTP
Logwatch
Splunk forwarder
Grafana
And several other things... I've learnt my lesson.
-
Things you don't want to see at night
Ripped out of Netflix-Mode by a Warning notification and currently monitoring further development
Green line is temperature, blue is humidity. Temperature is rising at ~1°C per 10 min, but seems to be flattening just now. ~0.6°C to go and I'll have to head out. I'm thinking one of the ACs failed, but the reported states are fine. Never trust a single information source for critical infrastructure, guys.
-
Just released version 1 of my first API! For this project I did everything the way I wanted to, no shortcuts! I documented the shit out of every endpoint and parameter. Everything is thoroughly tested and it’s dockerized. I also have metrics for each endpoint (with Grafana in the frontend, which I love) as well as alerts in case it goes down for some reason.
I prepared all of this before deploying it out into the wild and damn, it feels so good. Probably no one will use it but I don’t care. It’s one of those projects where you have to force yourself to go to bed at 2 AM.
Just some thoughts. Don’t really have any techie friends so figured maybe someone here recognizes that feeling. Also I wrote it in Python, such a pleasant language.
-
This begs for a rant... [too bad I can't post actual screenshots :/ ]
Me: Hey k8s team! We're having trouble with our k8s cluster. After scaling up and running h/c and sanity tests, the environment was confirmed as healthy and stable. But once we started our load tests the k8s cluster went out for a walk: most of the replicas got stopped and restarted, and I cannot find in the events log WHY that happened. Could you please have a look?
k8s team [india]: Hello, thank you for reaching out to k8s support. We will check and let you know.
Me: Oh, you're welcome! I'll be just sitting here quietly and eagerly waiting for your reply. TIA! :slightly_smiling_face:
<5 minutes later>
k8s team India: Hi. Could you give me a list of replicas that were failing?
Me: I gave you a Grafana link with a timeframe filter. Look there -- almost all apps show instability at k8s layer. For instance APP_1 and APP_2 were OK. But APP_3, APP_4 and APP_5 were crashing all over the place
k8s team India: ok I will check.
<My shift has ended. k8s team works in different timezone. I've opened up Slack this morning>
k8s team India: HI. APP_1 and APP_2 are fine. I don't even see any errors from logs, no restarts. All response codes are 200.
Me: 🤦♂️ .... Man, isn't that what I've said? ... 🤦♂️
-
online coding exams.
Ask me how to do a rest api, ask me how to do a certain visual in the website, ask me how to setup a docker service running grafana, please just ask me something about the actual job.
Don't ask me to solve some mind game, ambiguously phrased in a timed HackerRank question, that expects me to write runnable solutions that pass all test cases.
I have way too much work to play around with HackerRank for weeks just so I can prepare for your useless test.
-
All these super expensive and fancy enterprise tools. CloudWatch, AppDynamics, Grafana, Splunk and whatnot. Spent a month trying to figure out why the fuck the app does not perform well.
Took 1 day with tcpdump, awk and gnu utils to figure out why.
Should anyone need a tcpdump analyzer -- try my awk script. Shows response times of each network call w/o impacting app performance :)
https://gist.github.com/netikras/...
-
Attention guys and gals! If you are using Grafana in your home setup, update it ASAP to 4.6.4 or 5.2.3. Versions before those two are affected by an authentication bypass vulnerability, CVE-2018-15727.
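For anyone with more than one instance to check, the version rule in this post can be written down as a small test. This is only a sketch of the rule as stated here (fixed in 4.6.4 and 5.2.3, anything older affected); the function names `parse_version` and `is_affected` are made up for illustration, and how you obtain the running version is left out.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.2.2' into (5, 2, 2); a suffix such as '-beta1' is dropped."""
    core = text.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version: str) -> bool:
    """True if the version predates the fixed releases named in the post.

    Assumption: 5.x is compared against 5.2.3, everything older against 4.6.4,
    and anything from 6.0 onwards is treated as fixed.
    """
    v = parse_version(version)
    if v[0] >= 5:
        return v < (5, 2, 3)
    return v < (4, 6, 4)
```

Calling `is_affected("5.2.2")` returns `True`, while `is_affected("4.6.4")` returns `False`.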
In the meantime, my nginx config is blocking everything but the LAN IPs :)
-
From now on I am administrating multiple servers in our company, and monitoring is one thing our infrastructure lacks... almost completely. At least, useful monitoring.
Installing netdata or Grafana and integrating it with chat is definitely a solution, but what happens if the whole server just shuts down (very stupid scenario, I know)? Well, it is easy: there will be no alert about the failure.
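For the whole-server-down case, the check has to run on a different machine than the one being watched. A minimal sketch of such an external probe, assuming a plain TCP connect is a good enough liveness signal; the names `is_reachable` and `check_all` and the target list are hypothetical, and wiring the result into chat is left out.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_all(targets, probe=is_reachable):
    """Return the (host, port) pairs that did not answer the probe."""
    return [(host, port) for host, port in targets if not probe(host, port)]
```

Run from cron on a second box, a non-empty result from `check_all([("server-a.example", 22), ("server-b.example", 443)])` would be the trigger for an alert; the hostnames here are placeholders.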
So, that's where I was wondering if there is a tool, or even better a plugin for netdata or Grafana, that enables remote monitoring from another server? I surely can write a simple script to check the server availability, but having the whole monitoring tool on a single server instead of 5+ would also be easier to maintain and set up.
-
Time to get going properly with ansible, consul and docker swarm.
Idea is first to convert tinc to a container, which automatically sets itself up based on previous consul announced tinc nodes.
Consul to keep track of all the nodes with prometheus too and hopefully auto attach to grafana.
Ansible to set up new nodes right with DO API, announce to consul, pull docker images and join the docker swarm master.
-
When IT is like : hey our new grafana is at this place "some URL"
I submit a bug ticket: "I can't see metrics about this server that has been running for a while"
Their comment on the ticket: the URL to the old grafana
-
Using grafana together with tinc+prometheus has been a blast.
Initially I wanted to get into ELK with Kibana and all that, but that required 8G of RAM, the instructions to get it running in the open source "mode" were nearly non-existent, and all the ready docker compose stacks out there were simply not working or had broken images.
I'm sure I could've managed around most of those issues, but the fact it is as hungry as gitlab, made it a literal no-go for the usual server resources my clients host or my own scaled down server recently.
Thankfully I remembered that there's grafana and me having experimented some time ago with tinc, so I can have very lightweight beat'esque prometheus agents deployed listening on tinc local net only, with the typical nginx auth and some whitelists to all of the servers I host and all those of my clients.
The dashboard creation was especially great in grafana (tbf prometheus actually does most of it), literally what I always wanted out of those "complicated" solutions that do it all but have no proper query language, complex documentation, heavy collectors with no properly named data points, expensive resource runtimes, ..
With grafana I can just easily put dashboards into folders, create users to look only at certain stats or even dashboards (opened up some interesting contracts actually, because now I can also offer proper monitoring for all things delivered), easily drag and drop stuff around to fit more information (most others fix you to a small 3x2 grid, a grid too big for a TV, or simply non-resizable tiles, making that one counter take up an entire row) and resize to my heart's desire.
tinc of course allows me to easily create private networks that are resistant to failure across any region and the routing is done for me, so I don't have to run around it all that much either
P.S: a damn tiny fly went into one of my now 4 monitors and died right in the middle, because I thought it's just some dirt and I pressed it in while trying to wipe it off, so that monitor now serves as the topmost on a VESA mount
-
*Frustrated user noises* Whyyyy, Grafana, why don't you implement any actual query forgery checks?!
So long as a user has access to the Grafana frontend, they can happily forge the requests going off to the backend, and modify them to return *whatever* data they want from the datasource.
No matter that they're a read-only user. That only stops them from modifying the dashboard definitions on the frontend, but doesn't enforce any sort of immutability on the BE...
If anyone has any tips on how to further secure it, I'm curious...
-
Just built an OPC UA data logger in half a day with Grafana, InfluxDB and Telegraf. The same functionality was developed in house over like 4 years. Mostly because no one here is from IT but from OT (operational technology).
-
Thoughts on the idea of including links/query starters for debugging, or where the fucking logs are in AWS, grafana etc, in repository READMEs?
-
Storytime.
The Prometheus tales
Part IV - A new FUBAR.
A new and very fascinating problem emerged a few days after feeding some node definitions to the new titan instance.
It's a storage fuck-up. A major one.
If I'm informed correctly, the latest prometheus should have the same (or even better) log compression algorithms for metrics, as the old one - because these fuckers are so damn good at what they are doing: compress some fucking logs.
The new instance is aggregating metrics as planned. Grafana works like a fucking charm.
Nevertheless, for very fascinating but unknown reasons, the new instance creates 50GB of metrics in under 4 fucking hours.
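A quick back-of-the-envelope check puts that number in perspective. The sketch below only does the arithmetic on the figures from this post (50 GB, 4 hours); the 2 bytes per sample used for the implied ingest rate is an assumption, taken from the 1 to 2 bytes per sample that the Prometheus docs quote as a typical average, and may not hold for this instance. Function names are made up.

```python
GB = 10**9  # decimal gigabytes; an assumption about how the 50GB was measured


def growth_rates(total_bytes: float, hours: float) -> dict:
    """Convert 'X bytes written in Y hours' into hourly, daily and per-second rates."""
    per_hour = total_bytes / hours
    return {
        "gb_per_hour": per_hour / GB,
        "gb_per_day": per_hour * 24 / GB,
        "bytes_per_second": per_hour / 3600,
    }


def implied_samples_per_second(bytes_per_second: float, bytes_per_sample: float = 2.0) -> float:
    """Ingest rate that would explain the growth at the assumed bytes per sample."""
    return bytes_per_second / bytes_per_sample


rates = growth_rates(50 * GB, 4)
samples = implied_samples_per_second(rates["bytes_per_second"])
```

That works out to 12.5 GB per hour, so roughly 300 GB per day. At the assumed 2 bytes per sample it would mean well over a million samples per second, which hints that either far more series are being scraped than planned or the space is going somewhere other than compressed blocks.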
Am I missing something here? Some magic parameter that has to be passed to the titan, that enables the hardcore compress-them-fuckers-feature?
Debugging session is tomorrow.
To be continued.
-
Half a day wasted. FUCK!
I use grafana loki and mimir/prometheus for telemetry. A few days ago I queried loki to see if logging is still working. Yesterday I changed the datasource to mimir, changed the query parameters to get metrics from another env, ran the query, and... Querier [mimir] crashed.
Wtf.
Error says it got too much data to chew on.
So I spent 4 hours playing with the querier and grpc limits, balancing between limit errors and OOMKills [2G RAM].
I got suspicious about oomk. Why would it...
Then I tried to shrink the timeframe to 15min. Still oomk. Down to 5min -- now it worked. But the number of different metrics returned was over 1k
then I look once again at the query. And ofc it is ´{env="prod"}´
turns out, forgetting that you're querying metrics with a logs' query is an expensive and frustrating mistake. Esp. at 3am.
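As for why it returned anything at all: in PromQL a selector with only label matchers and no metric name is valid, and it matches every series carrying that label, which fits the 1k+ metrics and the OOM kills. A deliberately naive guard that could sit in front of ad-hoc queries is sketched below; `is_bare_selector` is a made-up helper, not part of Grafana or Mimir, and the regex does not handle braces inside label values.

```python
import re

# Matches a query that is nothing but one {...} selector.
_BARE_SELECTOR = re.compile(r"^\s*\{(?P<matchers>[^{}]*)\}\s*$")


def is_bare_selector(query: str) -> bool:
    """True if the query is only a label selector with no metric name.

    Fine as a log stream selector, but as a metrics query it fans out to
    every series that carries the given labels.
    """
    match = _BARE_SELECTOR.match(query)
    if match is None:
        return False
    # An explicit __name__ matcher counts as naming a metric.
    return "__name__" not in match.group("matchers")
```

With this, `is_bare_selector('{env="prod"}')` is `True` and `is_bare_selector('up{env="prod"}')` is `False`, so the 3am version of the query could be rejected before it reaches the querier.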
idk why it even returned me anything...
-
FOMO on technology is very frustrating.
i have a few freelance and hobby projects i maintain. mostly small laravel websites, go apis, etc ..
i used to get a 24$/ month droplet from digital ocean that has 4vCPUs and 8GB RAM
it was more than enough for everything i did.
but from time to time i get a few potential clients that want huge infrastructure work on kubernetes with monitoring stacks etc...
and i dont feel capable because i am not using this on the daily, i haven't managed a full platform with monitoring and everything on k8s.
sure u can practice on minikube but u wont get to be exposed to the tiny details that come when deploying actual websites and trying to setup workflows and all that. from managing secrets to grafana and loki and Prometheus and all those.
so i ended up getting a k8s cluster on DO, and im paying 100$ a month for it and moving everything to it.
but what i hate is im paying out of pocket, and everything just requires so much resources!!!!
-
Grafana managed alerts are so fucking over complicated it surprises me that it is a professional product.
And if you need help, the grafana community forum is a fucking ghost town
And the docs suck too
-
Honest question. When do you consider yourself a "Big data engineer"?
Today I managed to create a system that collects historical metrics from monitoring tools every 5 minutes and does all sorts of crazy transformations to make them ingestible by Grafana Mimir over OTLP. Doing 600 GB a day, millions of active time series, ... And I still feel it's "small".
Thoughts?
-
I had a discussion - no, it was more a lobotomy - with one of our "experts"
I was kinda confused, as he had several grafana tabs open and a query editor...
He explained to me that he debugs and optimizes his query based on the grafana data....
Elasticsearch cluster with several hundred, different indices, > 20 TB data
I explained to him the scrape interval of 5secs, that he cannot distinguish his query from other queries, that there is far too much of an interference... Let alone that a 5 sec scrape interval is a very loooong time.....
Nope. It makes perfect sense to him and he'll continue to work like this.
-
What do you use for performance monitoring on your infrastructure?
My company uses zabbix, OpenNMS and Nagios to monitor different parts of our infrastructure (from shared web hosting to OCCAS to IPTV to FutureVoice to Atlassian servers) but has no real-time performance checks.
I’ve set up a netdata master with prometheus backlog and grafana dashboards to monitor different metrics, but I am not sure whether there is a better approach. Any suggestions?
-
After the conversation, the real good way was already provided:
Prometheus exporter: https://github.com/prometheus/... (https://blog.opstree.com/2018/12/... for more details)
Overview: https://devconnected.com/complete-m...
-
i just took a ride with my brand new bitch (3rd one in the roster) for our 2nd date. we kissed on first and now on second date she loves me.
while kissing, she cant stop kissing me. she said she never fucked a guy as friends w benefits but im pushing her to do it for the first time.
I'm the motherfucking slut maker.
I'm the creator of sluts now.
I take control.
I turn good girls into sluts rn cause my aura is beyond the universe and all these bitches can feel it by saying "i have that something" whatever it is
Now, i cheated on my blonde ex whore, and on my brand new gf, with this 3rd girl (I'll cheat on the 3rd girl too)
I will break as many females psychologically as possible and that is the price they have to pay for the psychological damage caused to me by my blonde ex whore.
I'm turning into a player rn and I'll fuckk all of them
They are all obsessed w me cause I'm different from the rest
They cant resist to let me fuck them
My aura attracts them
Because my behavior is nonchalant
I am on a great arc
2025 looks promising as Fuck
Also my current job offered me to work on another projects as a senior DevOps engineer which finally includes rancher kubernetes grafana prometheus harbor splunk etc, which pays me 4-8k euros a month
life is finally starting to become better but i went through Fucking hell to get here!
I got whores and i got money.
Im almost stress free.
The only thing left is to get more whores (3 arent enough, i need a roster of at least 10+ to be on the safe side), and i need to become a millionaire from theft in crypto
Then i fk 100s of whores all day and drive fast bmws
Btw i was driving my new hoe in my bmw late at night rn and a c63 coupe raced me. That mf gapped me! So i put sport + nitro mode on and gapped that mf so he quit
My bitch was holding my hand and said he gave up (but he actually let me win lol cause he saw i was with a bitch) i cant race a c63 coupe with a base model bmw bruh🤣
while we were kissing in the car (3rd bitch) i was leaking so much fucking precum (i fucked and cummed 3 times my blonde ex whore prior to this on the same day), and i was still horny af. this bitch got my dick rock solid hard
so then i came back at my blonde ex whore to grab my laptop and i kissed her, literally 2 minutes after tongue licking the 3rd side link
my ego is so fucking high and it will only get higher from here
it feels so good having aura, beast car and a roster of whores.
my day today was so fucking wild and random
my life is finally starting to make sense and become worth living
whores, money and fast cars is all i need in my life
(my new gf whos in love with me was the least important and she had to wait for hours for my reply until i get finished fucking my blonde ex whore and taking my 3rd link on a date and kissing (next date needs to be fuckijg w my new side link))
time to search for the 4th side link
I LOVE THIS💯
-
If anyone is looking for a great tutorial on getting started with a docker cluster check out https://dockerswarm.rocks/
I had a 4 node cluster up on Digital Ocean with Traefik + Lets Encrypt, Prometheus, Portainer, Grafana, all that good stuff, in under 2 hours. Not much longer to test a basic WP and Next Cloud container with full SSL. Neat stuff. Just burning through $100 credit for testing but it's been fun
-
We've got a new TV for monitoring. Which auto-rotating meme page do you like? Cats, dogs, dank (SFW), dev, testing. Gimme yours!!! :)
-
Help is needed on observability tools to use.
I’m in the trenches trying to sort out tools for observability.
Did a bit of Googling and ran into Metoro and Groundcover. Both seem pretty slick, but I’m not sure which one to roll with.
Do any of you have experience with these? How do they hold up in real-world scenarios? Would love to hear any war stories or insights.
I've been looking at Grafana as well, but it doesn't fit my budget at all.
-
I swear, there will come a day when I stop confusing Grafana and Kibana. The two things sound too similar for their own good.
-
I'm tired and stressed and it's Friday.
All my work that is required for Monday is done. I should do testing and code cleanup, but I'm burned out, so instead I'm gonna play with grafana and see what I can do with it. Seems cool and more interesting than code cleanup and wanting to cry.
-
Just discovered wizzy ... Wow, freaking sweet!
https://github.com/utkarshcmu/wizzy
I like it for many reasons. I've just started playing with it, so the #1 reason so far is saving dashboards and having them in git version control, yay!!!
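The dashboards-in-git idea works best when the exported JSON is written the same way every time, so diffs only show real changes. The sketch below is not how wizzy does it, just an illustration of the idea; `save_dashboard` is a made-up helper, and it assumes the dashboard dict has a `uid` or at least a `title` field.

```python
import json
from pathlib import Path


def save_dashboard(dashboard: dict, directory: Path) -> Path:
    """Write one dashboard as stable, diff-friendly JSON and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: name the file after the uid, falling back to a slug of the title.
    name = dashboard.get("uid") or dashboard["title"].lower().replace(" ", "-")
    path = directory / f"{name}.json"
    # Sorted keys and fixed indentation keep the output byte-identical for
    # identical content, whatever order the keys arrived in.
    path.write_text(json.dumps(dashboard, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```

After writing the files, a plain `git add` and `git commit` in that directory gives the version history.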
Also, if you're not familiar with Grafana, let me blow your mind: http://grafana.org
-
Some really motivated guy.
He apparently wants to monitor his open source application in his spare time.
His application is likely to have no users though.
But well, that guy looks kinda motivated.
For professional purposes, the guy already did monitoring with newrelic.
Seems like he was not satisfied and switched to datadog 3 years ago.
But, liking to dig in the dirt, he migrated to self hosted telegraf/influx/grafana (which he likes to about)
Today that guy is not at his company but on his potato machine in the cloud. So he wants to be minimalistic, datadog should do.
Now you got it, random ff*** is me, on a weekend, a shiny Saturday for that matter.
Actually now it is night.
Now let's start the fight.
I have datadog scripts!
But datadog be sneaky as well. datadog upgraded to v6 8=)
-> scripts ain't working. outdated.
I check the logs. Too bad!
-> datadog removed dogstatsD.log in v6!
Well I have nothing to do in my life, it is too cold outside as they say. I read the (sluggy) datadoc and try some shell command (given in the doc) to upload some events to dogstatsd (via UDP).
-> Nothing happens, neither in local nor in remote.
ok maybe command not up to date, so let me try some official library. datadog from python. Feels like a nice try!
-> only available for python >= 3.5. 3.4 on my good ol' jessie. Upgrading os for datadog not acceptable.
Maybe dogstatsD not started... doc says it is by default, but well, not the first time the doc is wrong... I set datadog logging to verbose. Guess what: as per standard: shitload of errors.
Digging... kubexx, docker and whatsoever apparently preventing the collector from doing its normal stuff
np, I am gonna check that on github! Good, people have the same errors. They seem to fix it by trying some settings, with or without luck
-> I am not that warrior to check every stuff
Ok, let's stop the datadog events, it works. It does not anymore. You know that sentence. We all know it.
Still not enough!
How about testing that uber super nice feature of v6. The logs. After all I want to make events out of my applicative logs.
How about reading the log again. Configure the yaml log as they say. Done. Make some pattern. Read the best practice. Done. Configure the yaml. Done. Now testing.
-> remote datadog interface be like: no logs for you dude you need to pay
ff***f*f*f
Fuck datadog, fuck that v6 version, good old tail -Fxx | someaggreate.js|sendmail will do...