
timeShift(GrafanaBuzz, 1w) Issue 80

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/02/15/timeshiftgrafanabuzz-1w-issue-80/

Welcome to TimeShift

We only have a handful of general admission tickets to GrafanaCon LA before angel tickets go on sale, but you still have time to grab one of the last remaining GA tickets.

Day 2 is going to be filled with morning tracks on:

  • Real-time analytics in IoT
  • Cloud Native observability
  • SQL and business analytics

Also, we’ll have hands-on workshops and talks:

  • Become a contributor – Get started developing Grafana
  • Writing React plugins
  • Grafana feature deep dives
  • Grafana plugin demos and showcases

We’re really excited about how the schedule has shaped up and hope you can join us!


Latest Beta Release: Grafana v6.0 Beta2

Beta2 is coming along nicely and stabilizing. Lots of minor fixes and enhancements added to this beta release. For a full list of changes, be sure to read through the release notes and get more in-depth info about the features in the documentation.

Download Grafana v6.0 Beta2 Now


From the Blogosphere

Visualizing the Future with Grafana: Steffen Knott and Max von Roden from Energy Weather gave a really interesting talk at GrafanaCon AMS 2018 on how they use Grafana for forecasting, quantification of risk, and how weather affects energy markets.

A Performance Dashboard for Apache Spark: This article from CERN dives into the steps for deploying and using a performance dashboard to gain insights for troubleshooting and monitoring Apache Spark workloads.

How to monitor your Kubernetes cluster with Prometheus and Grafana (The short story): See how easy it is to use Prometheus and Grafana to monitor just about any metric in your Kubernetes cluster. If you’re looking for even more info, you can take a deeper dive on the topic in the whole, long story.

[VIDEO] Node-Red, InfluxDB, and Grafana Tutorial on a Raspberry Pi: Check out all the components you’ll need and steps to take to set up your own weather tracking station.


Only a handful of GA tickets left!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects like Grafana Loki, Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register Now!


Grafana Plugin Update

We have an update to the Plotly panel plugin to share as well as a new version of the Zabbix App. To update any of your plugins in your on-prem Grafana, use the grafana-cli tool, or, for Grafana Cloud, update with one click.
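With the grafana-cli tool, those updates look like the following (plugin IDs as listed on grafana.com/plugins):

```bash
# Update a single installed plugin by its ID
grafana-cli plugins update alexanderzobnin-zabbix-app
grafana-cli plugins update natel-plotly-panel

# Or update everything that's installed
grafana-cli plugins update-all

# Restart Grafana afterwards so the new versions load, e.g.:
# sudo systemctl restart grafana-server
```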

UPDATED PLUGIN

Zabbix App – This release contains a lot of improvements and bug fixes. The most notable are a fully updated design for the Problems panel (formerly Triggers) and support for InfluxDB as a Direct DB Connection data source.

Install

UPDATED PLUGIN

Plotly Panel – The Plotly panel was updated with some minor enhancements and fixes, and tested for compatibility with Grafana v6. Updates include:

  • Fix axis range configuration bug #49
  • Add basic annotations support #57 (tchernobog)
  • Improve loading times for plotly.js and support loading from CDN
  • Assume date x-axis when ‘auto’ and the mapping has ‘time’
  • Support Fixed-Ratio Axes

Install


We’re Hiring

We’re looking for passionate people from every corner of the world who want to solve interesting and challenging problems in a fun, supportive environment. Join us! Check out all of our open positions.

View All our Open Positions


How are we doing?

We’re always looking to make TimeShift better. If you have feedback, please let us know! Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Visualizing the Future with Grafana

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/02/14/visualizing-the-future-with-grafana/


Max von Roden [left] and Steffen Knott – Energy Weather

You’ve used Grafana to visualize what’s happened in the past, or what’s currently happening. But what about what might happen in the future? Steffen Knott and Max von Roden gave a talk at GrafanaCon 2018 about how they’re doing just that at Energy Weather.

Though the German company does employ meteorologists, “it isn’t exactly a weather service,” said Knott. “You can think of us as a translator between weather and the energy business. Our job is to reduce complexity and provide specific information, such as wind power forecast for a specific region, or a quantification of risk, like a sudden drop of temperature that affects energy markets.”

The energy business is complex, and there are many different ways that weather can impact it. “Nearly every weather condition can cause reactions on both the supply and demand side of the energy market,” said von Roden. Energy Weather uses Grafana visualization “to make it easy to understand,” said Knott. “Our meteorologist has a creative mind, and his ideas challenge us to use many of the exciting features of Grafana.”

Working with Weather Forecast Data

Energy Weather’s backend system, where all the data processing and computing are done, is written in C#. The team uses several forecasts from weather models, typically looking ahead 16 days, with a resolution of 1 to 12 hours. The information is in the form of gridded forecast data sets that contain values for each grid point around the globe, with the spatial resolution between each grid point about 4 kilometers.

“These datasets are unpacked, imported, maybe interpolated, and we then use them for our internal forecast processes, and these processes output time series,” said Knott. “We use these time series to create customer specific output files mostly in some more or less weird CSV format.”
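As a toy illustration of that last export step (the real backend is written in C#, and the file layout here is invented for the example), a forecast time series could be dumped to a customer-specific CSV like so:

```python
import csv
from datetime import datetime, timedelta

# Toy forecast time series: hourly values looking ahead 16 days.
start = datetime(2019, 2, 15)
series = [(start + timedelta(hours=h), 20.0 + 0.1 * h) for h in range(16 * 24)]

# Write a customer-specific CSV export: one timestamped value per row.
with open("forecast_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "wind_power_mw"])
    for ts, value in series:
        writer.writerow([ts.isoformat(), round(value, 2)])
```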

Graphs before Grafana


That is, until 2015, when Knott first discovered Grafana at a Chaos Communication Camp near Berlin. “I immediately thought, ‘Well, that’s it,’” said Knott. “We had been looking for a visualization solution for in-house analysis of forecasts for quite a long time.”

Before the camp ended, Knott’s team had installed Grafana along with InfluxDB as a database backend. “When Robin, our weather guy, saw Grafana running, it kept him smiling for days,” he said.

The first Grafana dashboard!

The first Grafana dashboard used at Energy Weather showed the company’s internal analysis of forecasts for photovoltaic power plants. “It shows the forecast along with measurements we received afterwards,” said Knott. “You can see the clouds passing by if you look at the blue line which shows the measurement. For plants with next-to-real-time measurement data feeds, this allows even for real-time monitoring or next-to-real-time monitoring of the forecast. You can evaluate it, and you have instant feedback, which is really great.”

The Challenges of Data Volume

Though Grafana provided a vast improvement over their previous solution, the team still had to investigate whether there would be any performance changes as more and more data was added into InfluxDB. “There is a significant difference between how time series are used in monitoring, and how we use them in forecasting,” Knott explained. “For monitoring, if you look at CPU usage for a specific core, you have one long time series that spans over a long, long time, with values that might have a resolution of five minutes, one minute, or one second or so.”

But for forecasting, for example, looking at the temperature for a specific location requires a time series that spans over the next 14 days, in 15-minute or 60-minute resolution. New forecasts are calculated every hour, and each time, that data has to be added to the database as a new time series. The old ones aren’t overwritten, because some applications need to look at the difference from one forecast to the next, for the same point in the future. This adds up to 10,000 time series added to the database an hour, 240,000 a day, and about 7 million a month.
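Assuming forecast runs around the clock, the arithmetic behind that growth is simple:

```python
# Each hourly forecast run writes a fresh batch of time series
# instead of overwriting the old ones.
forecasts_per_hour = 10_000

per_day = forecasts_per_hour * 24    # new series per day
per_month = per_day * 30             # new series per month

print(f"{per_day:,} per day, {per_month:,} per month")
# → 240,000 per day, 7,200,000 per month
```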

At the beginning, “we had strict retention policies in place that would keep the database light. Every now and then we dropped time series that we didn’t need anymore by hand to keep the system responsive,” said Knott. But the team has found that performance has “improved a lot over the years, so thanks to the Influx guys, that saved us some time.”
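In InfluxDB, the retention policies Knott describes are a one-line setup; something like the following (database name, measurement, tag, and durations are illustrative, not Energy Weather’s actual values):

```sql
-- Automatically expire forecast series after 30 days.
CREATE RETENTION POLICY "forecasts_30d" ON "energyweather"
  DURATION 30d REPLICATION 1 DEFAULT

-- The manual cleanup of stale series would have looked like:
DROP SERIES FROM "temperature_forecast" WHERE "forecast_run" = '2015-08-13T06:00'
```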

At the same time, the Energy Weather team has found that every update to Grafana offers new possibilities. “It became more and more clear that it could be used for so much more than just internal analysis of forecasts and comparing it to measurements and so on,” said Knott.

All the Pretty Dashboards

Over the past few years, the company has begun offering Grafana as a frontend solution to customers, with about 130 easy-to-understand dashboards using the new light theme.

The first dashboard the speakers shared is used by producers of power, or grid operators, and shows the current and expected renewable energy production for the German grid.

Renewable energy forecast graphs

“On the top left, the dark line shows the current wind power forecast,” said von Roden. “As forecasts aren’t always spot-on, we provide a yellow colored range for the most likely outcome of the forecast. The bar chart in pink shows the measurements that are provided in near-real time, and sometimes real time, usually by the grid operators. You see it’s very close to the forecast, which was good.”

The team also added other information to the dashboard:

  1. In the text panel on the right, an animated bitmap from windy.com
  2. On the bottom left, the photovoltaic forecast seen in the same format
  3. On the bottom right, the difference from the previous forecast

“The combined wind and photovoltaic generation was nearly 20 gigawatts on average on this day; that is nearly one third of the usual demand in a winter month in Germany,” said von Roden. “On a sunny and windy day in February, maybe every third power plant might not be needed. So a dashboard like this helps suppliers optimize generation planning, or at least shows risks for possible plant downtimes. In short, there is an impact on the supply side.”

The second dashboard showed the temperature forecast for France using four common weather models. Temperature is the most important parameter for the demand side of the energy market.

Temperature forecast graphs


“The black line in this chart shows a long-term average temperature, and this is important because demand expectations usually are based on long-term averages,” said von Roden. “As you can see, the temperature is much colder than expected in the first days. If this information is not enough for proper planning, our backend translates it into how the power demand will change.” This information is shown in the table on the bottom left of the dashboard. On the bottom right, a chart shows the power demand forecast in daily and hourly resolutions.

Applying the Data

With all the weather and energy data in its database, Energy Weather produces customized market overviews and price forecasts.

Using Grafana table panels, the team created a dashboard that shows an hourly price forecast and the models used. The forecast can be seen in green, and the final price provided by the exchange is in red.

Renewables Analogy Model


Another dashboard shows the price changes using Grafana tables.

Weather Analogy vs. Best Mix


“We use the sparklines to visualize the development along with an average price for the upcoming weeks,” said von Roden. “We like these sparklines, which are an amazing simplification of visualizing something. And it’s all together in one aggregated dashboard: There’s a change of weather, translated into a change of power; and demand and supply combined, transformed into an estimated price change in the next weeks.”

It’s an unusual use case for Grafana, von Roden said, but “Grafana can indeed be used to look into the future!”

Check out the full video of Max and Steffen’s talk below and download their presentation slides.

Video: Weather, Power, and Market Forecasts with Grafana

timeShift(GrafanaBuzz, 1w) Issue 79

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/02/08/timeshiftgrafanabuzz-1w-issue-79/

Welcome to TimeShift

Time is running out to get your ticket to GrafanaCon LA, but you can still grab one of the last remaining seats.

Day 2 is going to be filled with morning tracks on:

  • Real-time analytics in IoT
  • Cloud Native observability
  • SQL and business analytics

Also, we’ll have hands-on workshops and talks:

  • Become a contributor – Get started developing Grafana
  • Writing React plugins
  • Grafana feature deep dives
  • Grafana plugin demos and showcases

We’re really excited about how the schedule has shaped up and hope you can join us!


Latest Beta Release: Grafana v6.0 Beta1

New Features
  • Explore: A whole new way to do ad-hoc metric queries and exploration. Split view in half and compare metrics & logs and much much more. Read more here
  • Alerting: Adds support for Google Hangouts Chat notifications #11221, thx @PatrickSchuster
  • Elasticsearch: Support bucket script pipeline aggregations #5968
  • Influxdb: Add support for time zone (tz) clause #10322, thx @cykl
  • Snapshots: Enable deletion of public snapshot #14109
  • Provisioning: Provisioning support for alert notifiers #10487, thx @pbakulev
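One of those items, the InfluxDB tz clause, lets you group results into time buckets in a local time zone rather than UTC; a query using it might look like this (measurement and zone names illustrative):

```sql
SELECT mean("value")
FROM "temperature"
WHERE time >= now() - 7d
GROUP BY time(1d) tz('Europe/Berlin')
```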

For a full list of changes, be sure to read through the release notes and get more in-depth info about the features in the documentation.

Download Grafana v6.0 Beta1 Now


From the Blogosphere

[VIDEO] Grafana Loki: Like Prometheus, but for logs: Loki author and Grafana Labs VP, Product Tom Wilkie spoke at FOSDEM last week on his latest stop on the Loki road show. The organizers of FOSDEM were in top form and had the video of his talk up the next day! That’s a high bar to set as we prep for GrafanaCon LA and get videos published in a timely manner.

FOSDEM Public Dashboards: The organizers of FOSDEM track tons of different metrics and visualize them in Grafana – from the amount of conference swag remaining, to room capacity, and even the noise level of rooms.

Prometheus with Grafana Using Ansible: This walkthrough shows you how to set up Grafana using Ansible. After Grafana is set up, Mitesh shows you how to add a data source and create a dashboard with Prometheus metrics.

Visualizing my heart rate on stage using a Hue lamp and a heart rate sensor and .NET: At the Swetugg conference in Stockholm this week, Daniel tracked his heart rate in real time using a sensor and the ANT+ SDK, publishing the data via NATS. He then visualised this data in two ways: first on a Grafana dashboard, and second via a smart lightbulb that changed hue depending on his stress level. This article walks through how he did it, and the reactions he received after the talk.

Couchbase Monitoring Integration with Prometheus and Grafana: Using the new Couchbase exporter for Prometheus, you can visualize the performance of your Couchbase clusters in Grafana. This tutorial shows you how to install and configure the components and import your first dashboard.


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register Now!


We’re Hiring

We’re looking for passionate people from every corner of the world who want to solve interesting and challenging problems in a fun, supportive environment. Join us! Check out all of our open positions.

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

Those displays look great! Nice shirt too!


How are we doing?

We’re always looking to make TimeShift better. If you have feedback, please let us know! Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

timeShift(GrafanaBuzz, 1w) Issue 78

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/02/01/timeshiftgrafanabuzz-1w-issue-78/

Welcome to TimeShift

Grafana v6.0 Beta1 was released this week, and it’s packed with new features! This is one of the biggest updates to Grafana and introduces a new way to explore your data, support for log data, and includes tons of other enhancements. We hope you’ll give v6.0 Beta1 a try and let us know what you think. You can see a few of the highlights below, but check out all of the updates in the What’s new in Grafana v6 documentation.

Also, there’s only 3 weeks until GrafanaCon LA! Be sure to visit the GrafanaCon LA website to see the latest talks that have been added. There’s still time to register, so grab a ticket and join us! We have some really interesting talks and hands-on workshops planned, and a lot of fun in store.

Know of an article we missed? Contact us.


Latest Beta Release: Grafana v6.0 Beta1

New Features
  • Alerting: Adds support for Google Hangouts Chat notifications #11221, thx @PatrickSchuster
  • Elasticsearch: Support bucket script pipeline aggregations #5968
  • Influxdb: Add support for time zone (tz) clause #10322, thx @cykl
  • Snapshots: Enable deletion of public snapshot #14109
  • Provisioning: Provisioning support for alert notifiers #10487, thx @pbakulev
  • Explore: A whole new way to do ad-hoc metric queries and exploration. Split view in half and compare metrics & logs and much much more. Read more here

For a full list of changes, be sure to read through the release notes and get more in-depth info about the features in the documentation.

Download Grafana v6.0 Beta1 Now


From the Blogosphere

Grafana Loki: Like Prometheus, but for logs: Loki author and Grafana Labs VP, Platform Tom Wilkie gave a talk recently at the Cloud Native Computing Paris Meetup. If you want to catch one of his talks on Loki in person, he’ll be giving a similar talk in Belgium at FOSDEM, Feb 2-3, and of course, at GrafanaCon LA Feb 25-26.

Automated Monitoring With Grafana and Prometheus: While Grafana does have dashboard provisioning, it can be difficult to keep dashboards synchronized between environments. Fabio shows how he solves the issue with a Docker image he created.

How to display Octopus Deploy deployment history on Grafana dashboards: Tomasz shows how he uses a SQL data source in Grafana to display data on deployments that he manages with the Octopus Deploy platform.

Monitoring Micro-Service Applications across Hybrid Clouds using Istio service mesh multi-clusters, Kiali observability, Zipkin tracing, Prometheus events and Grafana visualisations: The title kind of explains it all, but this article discusses multi-cloud environments and trying to maintain the benefits of a centralised monitoring platform.

[VIDEO] World’s Fastest Internet – 1.6 TERABITS per Second: An oldie, but a goodie – Take a look at the world’s fastest Internet connection and how they built the network. Also note, Grafana spotted in the wild, around 5:30.


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register Now!


Grafana Plugin Update

If you’re using plugins in Grafana, be sure to keep them up to date to take advantage of new features and bug fixes. To update or install any plugin on your on-prem Grafana, use the grafana-cli tool, or for Hosted Grafana, update with one-click.

UPDATED PLUGIN

Kentik Connect Pro App – Version 1.3.3 of the Kentik app was released 2/1/2019 and includes support for selecting US and EU API endpoints, and several bug fixes.

Install


We’re Hiring

We’re looking for passionate people from every corner of the world who want to solve interesting and challenging problems in a fun, supportive environment. Join us! Check out all of our open positions.

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

Great looking dashboard – be sure to check out the new color picker in v6.0 beta1, which will help you maintain consistent colors across dashboards.


How are we doing?

We’re always looking to make TimeShift better. If you have feedback, please let us know! Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

timeShift(GrafanaBuzz, 1w) Issue 77

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/01/25/timeshiftgrafanabuzz-1w-issue-77/

Welcome to TimeShift

We’ve added more info on the upcoming talks at GrafanaCon LA and are excited to see the schedule shaping up. Grab your ticket while they last and join us February 25-26 for two days of great talks and hands-on workshops.

This week we’re happy to share articles on how to view Azure Monitor Log Analytics data in Grafana, a proof of concept to visualize Fitbit data, how to quickly get up and running with Prometheus, and more.

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.3

Tech Highlights
  • Docker: Build and publish Docker images for ARMv7 and ARM64 #14617, thx @johanneswuerbach
  • Backend: Upgrade to golang 1.11.4 #14580
  • MySQL: Only update session in MySQL database when required #14540
Bug Fixes
  • Alerting: Invalid frequency causes division by zero in alert scheduler #14810
  • Dashboard: Dashboard links do not update when time range changes #14493
  • Limits: Support more than 1000 datasources per org #13883
  • Backend: Fix signed in user for orgId=0 result should return active org id #14574
  • Provisioning: Adds orgId to user dto for provisioned dashboards #14678

Download Grafana v5.4.3 Now


From the Blogosphere

Azure Monitor logs in Grafana – now in public preview: The latest update to the Azure Monitor Data Source plugin allows you to view Azure Monitor Log Analytics data in Grafana. See what’s new, learn how to get started, and dive into the documentation for more information.

[VIDEO] Setting up Prometheus and Grafana for monitoring your servers: In this video walkthrough, learn how to install and configure Prometheus and import a ready-made Grafana dashboard to quickly visualize your server metrics.

Monitoring Java applications with Prometheus and Grafana: Part 2: Part two of monitoring Java applications shows you how to connect Prometheus to Grafana and build your first dashboard. Check out part one to learn how to get your metrics into Prometheus.

Analyzing Fitbit Data with Telegraf & Grafana: Michael built a proof of concept in order to collect his Fitbit Blaze data and visualize it in Grafana. This article walks you through the script and configs so you can try it yourself.

Hyperledger Sawtooth Blockchain Performance Metrics with Grafana: Learn how to set up Grafana to display Sawtooth and system statistics, and see a list of all the available metrics.

Grafana Dashboards for SCCM: A multi-part series that covers the installation and configuration of Grafana dashboards for Microsoft System Center Configuration Manager (SCCM).


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register for GrafanaCon

GrafanaCon LA is coming up Feb 25-26, 2019 and day 2 is going to be packed with TSDB-focused tracks and hands-on workshops. Learn how to get the most out of Grafana, how to extend Grafana’s visualization capabilities and get instruction from the experts. We’re also putting together an IoT session where you can get hands-on visualizing sensor data. It’s going to be a blast, so grab your ticket before they’re sold out!






We’re Hiring

We’re kicking off our 2019 hiring with some new opportunities to join the team! If you work in Technical Customer Support or want to check out all of our open positions, check our careers section.

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

I think you’ve turned Grafana into a Spirograph


How are we doing?

We’re always looking to make TimeShift better. If you have feedback, please let us know! Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

timeShift(GrafanaBuzz, 1w) Issue 76

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/01/18/timeshiftgrafanabuzz-1w-issue-76/

Welcome to TimeShift

This week we share news about the Azure Data Explorer plugin for Grafana, updates to the GrafanaCon LA speaker list, new functionality added to the polystat panel plugin and more.

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.3

Tech Highlights
  • Docker: Build and publish Docker images for ARMv7 and ARM64 #14617, thx @johanneswuerbach
  • Backend: Upgrade to golang 1.11.4 #14580
  • MySQL: Only update session in MySQL database when required #14540
Bug Fixes
  • Alerting: Invalid frequency causes division by zero in alert scheduler #14810
  • Dashboard: Dashboard links do not update when time range changes #14493
  • Limits: Support more than 1000 datasources per org #13883
  • Backend: Fix signed in user for orgId=0 result should return active org id #14574
  • Provisioning: Adds orgId to user dto for provisioned dashboards #14678

Download Grafana v5.4.3 Now


From the Blogosphere

Azure Data Explorer plugin for Grafana dashboards: The Grafana and Azure Data Explorer teams have created a dedicated plugin which enables you to connect to and visualize data from Azure Data Explorer using its intuitive and powerful Kusto Query Language. This article will walk you through the steps to get you started and have you up and running in a few minutes.

Monitor Apache Kafka Using Grafana and Prometheus: Learn about JMX, how to use Prometheus to store Kafka JMX metrics, and how to visualize those metrics using Grafana to monitor your Kafka broker.

Building My Own Telemetry System for F1 2017 (Game) Using Golang, InfluxDB and Grafana: Rafael noticed a UDP Telemetry option in the settings for F1 2017 on his PlayStation 4, which got him thinking about how he could better track this data. The following post chronicles his journey to set up and configure a monitoring stack using Grafana to visualize the speed of his F1 car.

Leveraging OpenShift or Kubernetes for automated performance tests (part 3): In the third article in this series, dive into the details for building an environment for automating performance testing with JMeter and Jenkins. Also, check out part 1 and part 2.


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register for GrafanaCon

GrafanaCon LA is coming up Feb 25-26, 2019 and day 2 is going to be packed with TSDB-focused tracks and hands-on workshops. Learn how to get the most out of Grafana, how to extend Grafana’s visualization capabilities and get instruction from the experts. We’re also putting together an IoT session where you can get hands-on visualizing sensor data. It’s going to be a blast, so grab your ticket before they’re sold out!






Grafana Plugin Update

This week we added drilldown link functionality to the Polystat panel plugin. To install this update (or any plugin) on your on-prem Grafana, use the grafana-cli tool, or for Hosted Grafana, update with one-click.

UPDATED PLUGIN

Polystat Panel – Version 1.0.15 of the Polystat panel plugin includes new clickthrough template variable evaluation and “cell”-based name and value referencing.
See https://github.com/grafana/grafana-polystat-panel#templating for new templating options and examples.

Install


We’re Hiring

We’re kicking off our 2019 hiring with some new opportunities to join the team! If you work in Technical Customer Support or want to check out all of our open positions, check our careers section.

View All our Open Positions


How are we doing?

What would you like to see here in 2019? Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

timeShift(GrafanaBuzz, 1w) Issue 75

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/01/11/timeshiftgrafanabuzz-1w-issue-75/

Welcome to TimeShift

We’ve been busy updating the GrafanaCon LA website with additional speakers, and we’re adding more every day, so please stay tuned. Don’t miss your chance to get your ticket. We also have tons of plugin updates to share this week and 2 brand new plugins to check out.

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.2

Release Highlights
  • Datasource admin: Fix for issue creating new data source when same name exists #14467
  • OAuth: Fix for oauth auto login setting, can now be set using env variable #14435
  • Dashboard search: Fix for searching tags in tags filter dropdown.

Download Grafana v5.4.2 Now


From the Blogosphere

Western Digital HDD Simulation at Cloud Scale – 2.5 Million HPC Tasks, 40K EC2 Spot Instances: Lots of Grafana graphs and dashboards in this article on how Western Digital built a cloud-scale HPC cluster on AWS and used it to simulate crucial elements of upcoming head designs for their next-generation HDDs.

Connect Grafana to Azure Log Analytics: This is the first post in an upcoming series on connecting Grafana to Azure Log Analytics using the Azure Monitor data source plugin. The next post will cover creating the dashboard in Grafana and querying the data.

Monitoring Java Applications with Prometheus and Grafana – Part 1: A step by step guide to monitoring Java applications with Prometheus and Grafana. Part 1 covers collecting and storing the metrics in Prometheus, and part 2 will dive into configuring Grafana and building your first dashboard.

Outages in the Cloud. Whom to blame and how to prove it?: Claudio describes how he used data visualization to help settle the blame game when he was experiencing periodic outages in his hybrid infrastructure.


GrafanaCon LA is coming up Feb 25-26, 2019 and day 2 is going to be packed with TSDB-focused tracks and hands-on workshops. Learn how to get the most out of Grafana, how to extend Grafana’s visualization capabilities and get instruction from the experts. We’re also putting together an IoT session where you can get hands-on visualizing sensor data. It’s going to be a blast, so grab your ticket before they’re sold out!


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register for GrafanaCon


Grafana Plugin Update

This week we have 2 brand new plugins to share and a load of updates! To update or install any plugin on your on-prem Grafana, use the grafana-cli tool; on Hosted Grafana, update with one click.

NEW PLUGIN

Sensu App – This new app plugin provides a datasource and summary dashboard for Sensu Core. An upcoming release will include custom panels and compatibility with Sensu Go.

Install

NEW PLUGIN

SCADAvis Synoptic Panel – This new panel plugin lets you add SCADA-like graphics to Grafana and combine them with metrics. First, create an SVG image using the SCADAvis.io editor. The image can contain tags that are matched with a metric query alias in Grafana.

Install

UPDATED PLUGIN

Statusmap Panel – The Statusmap panel got a small update with a fix for the display of multi-value buckets when there is an empty cell.

Install

UPDATED PLUGIN

JSON datasource – The JSON datasource plugin is a fork of the SimpleJSON datasource plugin which adds some more advanced features. The plugin has just been updated with a small bug fix.

Install

UPDATED PLUGIN

Parity Report Panel – The latest version of this panel adds an option that makes the alias key configurable. This makes it compatible with more types of datasources.

Install

UPDATED PLUGIN

SVG Panel – The SVG Panel is a tool for creating or importing SVGs in Grafana and connecting them with time series data using JavaScript. The latest release includes the following changes:

  • Implemented support for data in docs format (e.g., Elasticsearch Raw Document).
  • The data passed to the panel is now stored in the ctrl.data property. The alias property ctrl.series is deprecated.

Install

UPDATED PLUGIN

Peak Report Panel – A new release of the Peak Report Panel includes a fix for a scrolling bug that occurs in the latest versions of Grafana.

Install

UPDATED PLUGIN

Thruk Datasource – Thruk is a web interface for Nagios, Icinga, Shinken and Naemon. A new formatting option for datetime columns in the table panel was included in the latest release of the Thruk datasource plugin.

Install

UPDATED PLUGIN

ePict Panel – The ePict panel lets you select an image and display live metrics over it. In the latest release, the decimal separator is now correctly localized.

Install

UPDATED PLUGIN

Discrete Panel – The Discrete Panel shows discrete values in a horizontal graph and is especially useful for visualizing state transitions for string or boolean data. The latest release was a large release with a lot of technical updates. It also included some other fixes and features:

  • Configurable duration resolution option
  • Bug fix – don’t hide series names on hover

Install


We’re Hiring

We’re kicking off our 2019 hiring with some new opportunities to join the team! Whether you work in Technical Customer Support or want to browse all of our open positions, check out our careers section.

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

For some reason I want to go play Excitebike.


How are we doing?

What would you like to see here in 2019? Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Moving to packages.grafana.com

Post Syndicated from Blogs on Grafana Labs Blog original http://blog.grafana.com.s3-website-us-west-2.amazonaws.com/2019/01/05/moving-to-packages.grafana.com/

Moving to packages.grafana.com

Introduction

To make it even easier for you to get the Debian and RPM packages you need, we’re moving to our own repository.

The previous repository over at packagecloud will stop working on January 7, 2019, and you will have to update your configuration for updates to keep working.

Packages for arm

A lot of you run Grafana on Raspberry Pi or other ARM-based devices; now you can finally get the packages directly from the repository just like everyone else. Both armv7 and arm64 builds are available. Just add the repository!

Usage

You will find everything you need to know in the documentation.


timeShift(GrafanaBuzz, 1w) Issue 74

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/01/04/timeshiftgrafanabuzz-1w-issue-74/

Welcome to TimeShift

Happy New Year! We hope you had a relaxing and safe holiday season, but now it’s time to get back to work! This week we share articles on the UX of Loki, visualizing pull request data from BitBucket, monitoring and observability predictions for 2019 and more! Also, we’ll be making some exciting announcements for GrafanaCon in the coming days, so stay tuned and get your ticket now!

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.2

Release Highlights
  • Datasource admin: Fix for issue creating new data source when same name exists #14467
  • OAuth: Fix for oauth auto login setting, can now be set using env variable #14435
  • Dashboard search: Fix for searching tags in tags filter dropdown.

Download Grafana v5.4.2 Now


From the Blogosphere

Closer look at Grafana’s user interface for Loki: Grafana Labs Director of UX David Kaltschmidt wrote an article on why we built Loki, our new open source Prometheus-inspired logging project, and outlines the UX goals we had for delivering logs simpler and faster.

Meet bitbucket-exporter: Jonathan wanted to get a sense of the state of his open source projects in BitBucket, so he wrote an exporter to get metrics about pull requests from BitBucket Server into Prometheus to visualize the data in Grafana dashboards.

Monitoring & Observability 2019 Predictions: Our friend Mike at Monitoring Weekly shares his thoughts on upcoming trends and mentions Loki as leading the way of trying to solve the problem of high-volume, easy-to-maintain, scalable, on-prem logging.

Grafana Cloud Dropwizard Metrics Reporter: The folks at StubbornJava describe how they’ve used a custom HTTP sender to get metrics into their public Grafana Cloud installation. You can check out their overview dashboard here.

[VIDEO] Monitoring resiliency behavior with MicroProfile: Sebastian shares a video on how to easily get technical metrics of resiliency mechanisms with MicroProfile Fault Tolerance 1.1 and visualize the data with Prometheus and Grafana.


GrafanaCon LA is coming up Feb 25-26, 2019 and day 2 is going to be packed with TSDB-focused tracks and hands-on workshops. Learn how to get the most out of Grafana, how to extend Grafana’s visualization capabilities and get instruction from the experts. We’re also putting together an IoT session where you can get hands-on visualizing sensor data. It’s going to be a blast, so grab your ticket before they’re sold out!


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register for GrafanaCon


We’re Hiring

Do you love open source software? Do you thrive on tackling complex challenges to build the future? Want to work with awesome people? Be the next to join our team!

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

This is beautiful, and congrats on the progress!


How are we doing?

What would you like to see here in 2019? Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.


Closer look at Grafana’s user interface for Loki

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2019/01/02/closer-look-at-grafanas-user-interface-for-loki/

Introduction

At Grafana Labs we run a plethora of microservices for our hosted offerings.
We’ve been very happy with our monitoring with Graphite and Prometheus, and our distributed tracing using Jaeger.
But we have had a difficult time finding a log aggregation system that fits our needs.
Existing solutions offered lots of features, but those same features somehow got in the way of finding logs quickly.
We decided to build our own simplified log aggregation: Loki.
It is designed to be lean in the backend.
In addition, we tried to pair this with a clean and intuitive way to read your logs inside Grafana.
This post details some of the UX goals we had to deliver logs simpler and faster.

Loki and Grafana.

No pagination

All results are shown on a single page

Pagination splits up content across multiple pages.
The result is that you see only as many log lines as fit on a page, whether that is 20, 50, or 100.
To see more lines, you need to go to the next page, and wait for those results to load.

There are three problems with this approach.
The first one is that it assumes that log lines are read individually, and that after you read the 20 or so lines, you move to the next set.
This neglects the fact that we as humans are great at pattern recognition.
Log outputs have rhythm and structure.
A healthy log output may look like a healthy electrocardiogram.
Pagination interrupts multi-line patterns and makes it harder to spot the hiccups you are looking for.
Just like you don’t look at every c-h-a-r-a-c-t-e-r of a word to identify it, you don’t need to look at every log line to spot a pattern.

The second problem is delay.
Every time I advance a page, a request is sent to load a new set.
And since that request returns a similarly small set of lines, chances are that I might have to request the next page again.
The worst problem we found, however, was that pagination takes control away from the user.
Your built-in browser search becomes much less powerful if the rendered set of lines is small.

So what do we do instead?
We return and render a decent number of log lines that is big enough to spot patterns as you scroll down.
Since these logs are already loaded in the browser, they can be searched further using your browser’s search function.
The default limit is currently 1000 lines, but is configurable.

Filtering log lines

Without paging, you need to make sure what you are looking for is in the result set of the 1000 lines.
You do this by filtering.
The combination of time, labels, and a regular expression should be enough to return a decent result set to scroll through.
A graph above the log results shows how many log lines, coloured by log level, the given query returns over time.
If you see a spike, you can use the graph to zoom in further based on the patterns you see.

Filter for log lines in the line histogram.

The query field can take a proper regular expression, and its matches are highlighted as you type.
This is useful when you are testing various expressions to further filter the existing results.
When the query is run again, only matches are returned to get even more meaningful results.
Ultimately, we would like the user to be able to select multiple log streams (just like they can graph data from multiple datasources in Grafana), multiplex them into a single stream, and then allow chained filtering, similar to the classic logging grep use case: ... | grep foo | grep -v bar.
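As a minimal sketch of that chained-filtering idea (an illustration only, not Loki’s actual implementation), the grep-style pipeline can be modelled as include and exclude patterns applied to the already-loaded lines:

```python
import re

def chain_filter(lines, include=(), exclude=()):
    """Keep lines matching every `include` pattern and no `exclude` pattern,
    mirroring `... | grep foo | grep -v bar`."""
    for line in lines:
        if all(re.search(p, line) for p in include) and \
                not any(re.search(p, line) for p in exclude):
            yield line

logs = ["foo request ok", "foo request bar", "baz request ok"]
print(list(chain_filter(logs, include=["foo"], exclude=["bar"])))
# ['foo request ok']
```

Because the filtering runs over lines that are already in memory, adding another link to the chain costs no extra round trip.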

Ad-hoc statistics

Doing statistics across log lines is great for spotting unhealthy behavior.
For example, if your database cluster has 3 read-only nodes but only 2 are serving requests, something is up.
In busy log lines, that pattern is hard to spot.

We believe that 1000 lines may just be enough of a data set to run such an analysis on.
We have implemented a handful of parsers to extract fields from log lines, e.g., instance=foo_1 status=400 ....
This allows us to build dynamic field matchers on demand to gather the distribution of field values across the set of lines, e.g., instance is foo_1 on 55% of the lines that have the field instance.
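A toy version of this field extraction and tallying might look as follows (a sketch for illustration; the key=value pattern and function name are assumptions, not Loki’s code):

```python
import re
from collections import Counter

# Naive logfmt-style key=value matcher (assumed format for this sketch).
FIELD_RE = re.compile(r"(\w+)=(\S+)")

def field_distribution(lines, field):
    """Share of each value of `field`, over the lines that contain the field."""
    counts = Counter()
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        if field in fields:
            counts[fields[field]] += 1
    if not counts:
        return {}
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

logs = [
    "instance=foo_1 status=400",
    "instance=foo_1 status=200",
    "instance=foo_2 status=200",
    "a line without fields is ignored",
]
print(field_distribution(logs, "instance"))
# foo_1 appears on two thirds of the lines that have the field
```

Note that lines missing the field are excluded from the denominator, which is why good filtering (discussed below) matters for meaningful percentages.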

Ad-hoc statistics on a field.

Since we are calculating the statistics on the result set that is already in the browser, the results show up instantly.
It is worth noting that good filtering is important to maximise the number of lines that contain the field you want to match against.
We’re extending the parse and match logic to also allow drawing graphs based on numeric values, e.g., if your log lines contain something like duration=11.3ms, this should be graphable as an ad-hoc time series.

Performance

Rendering 1000 lines is not easy on the browser.
It takes roughly 500 ms during which the main thread is blocked and the browser can feel frozen.
To keep the page interactive, we’re using a staged approach.
First, the graph is rendered to give a quick overview of the log line distribution over time.
Then the first 100 lines are rendered, which should cover the complete screen a few times over.
This is to ensure that we quickly render meaningful logs “above the fold”.
After a further delay, the remaining lines are rendered, to allow scrolling and in-browser search on all the results.
Note that the line limit is configurable in the Loki datasource, in case you want to adjust the number according to your browser’s performance.

Explore and Split view

Grafana’s user interface for Loki is part of a new Grafana feature called Explore.
Explore got started with a more query-oriented workflow around troubleshooting with Prometheus metrics.

Metrics and logs side by side

Once you have refined your Prometheus queries to show the unhealthy behavior, you will likely have identified a service and/or an instance that is acting up.
This is where Loki comes in.
Loki uses the same selector logic as Prometheus.
Consider a Prometheus query like http_requests_total{job="app-server",instance="app1",route="/foo"} that you may have used to identify a faulty instance.
When switching to Loki, it reuses the relevant selector labels for which logs exist, e.g., {job="app-server",instance="app1"}.
Notice how it automatically dropped the route label and the metric name.
Thus, using Explore’s Split view, you can have Prometheus and Loki, showing you related metrics and logs side-by-side.
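To make the label reuse concrete, here is a toy sketch (the stream-label whitelist is an assumption for illustration; Grafana’s actual logic is more general):

```python
import re

# Label keys assumed to identify a log stream in this sketch; `route` and
# the metric name are metric-query-specific, so they are dropped.
LOG_STREAM_LABELS = {"job", "instance"}

def prometheus_to_loki_selector(query):
    """Strip the metric name and keep only stream-identifying labels."""
    match = re.match(r"\w+\{(.*)\}", query)
    if not match:
        return "{}"
    pairs = re.findall(r'(\w+)="([^"]*)"', match.group(1))
    kept = [f'{k}="{v}"' for k, v in pairs if k in LOG_STREAM_LABELS]
    return "{" + ",".join(kept) + "}"

q = 'http_requests_total{job="app-server",instance="app1",route="/foo"}'
print(prometheus_to_loki_selector(q))
# {job="app-server",instance="app1"}
```

The output is a valid Loki stream selector, which is what makes the side-by-side Split view workflow seamless.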

Conclusion

Loki and Grafana are a perfect match.
The backend is kept lean and space-efficient, while the user interface allows ad-hoc field parsing and simple statistics.
For now we will not be able to answer questions like “Top n of X” across a long time range, but we still believe that the tradeoff is a very useful one for troubleshooting.
It’s still early days, so please give it a try, provide feedback on GitHub, or contribute so we can make this even better for everyone.

Loki can be run on-prem or as a free demo on Grafana Cloud. Visit the Loki homepage to get started today.


timeShift(GrafanaBuzz, 1w) Issue 73

Post Syndicated from Blogs on Grafana Labs Blog original http://blog.grafana.com.s3-website-us-west-2.amazonaws.com/2018/12/20/timeshiftgrafanabuzz-1w-issue-73/

Welcome to TimeShift

As 2018 draws to a close, we’d like to thank all of our readers for their support and feedback. We look forward to returning in 2019 and sharing more articles about Grafana and the open source monitoring community.

Also, GrafanaCon LA is coming up Feb 25-26, 2019 and we’re excited to announce day 2 is going to be packed with TSDB-focused tracks and hands-on workshops. Learn how to get the most out of Grafana, how to extend Grafana’s visualization capabilities and get instruction from the experts. We’re also putting together an IoT session where you can get hands-on visualizing sensor data. It’s going to be a blast, so grab your ticket before they’re sold out!


Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register for GrafanaCon

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.2

Release Highlights
  • Datasource admin: Fix for issue creating new data source when same name exists #14467
  • OAuth: Fix for oauth auto login setting, can now be set using env variable #14435
  • Dashboard search: Fix for searching tags in tags filter dropdown.

Download Grafana v5.4.2 Now


From the Blogosphere

[VIDEO] On the path to full observability with OSS (and launch of Loki): The video from David Kaltschmidt’s KubeCon 2018 presentation is now available. Learn how to instrument an app with Prometheus and Jaeger, how to debug your app, and get a sneak peek at Grafana Loki, our new Prometheus-inspired open source logging project.

Cortex: a multi-tenant, horizontally scalable Prometheus-as-a-Service: Dive into Cortex, the open source Prometheus-as-a-Service platform that powers Grafana Cloud. Learn about the history of the project, architecture, the gaps it helps fill, and common use-cases.

Marvel at Grafana Loki: The Prometheus of open source log backends: Jane Elizabeth distills much of the information we’ve released on Loki into a recap of what Loki is, what it does, and how to try it out today. She also slips in a few references to the mischievous Norse god and the Marvel Cinematic Universe.

How to graph IDRAC temperature, power usage and fan speed measurements in Grafana: Learn how to get your iDRAC sensor metrics into Grafana to visualize metrics for temperature, power, and fan speed in 20 minutes.

[Cosmos] How to Set Up Your Own Network Monitoring Dashboard: A quick walkthrough on how to install and configure Grafana and Prometheus to monitor and visualize data on the Cosmos network.


Grafana Plugin Update

The Kentik Connect Pro app got an update this week to better display API errors. Read more and install the update below. To update any of your plugins on your on-prem Grafana, use the grafana-cli tool; on Grafana Cloud, update with one click.

UPDATED PLUGIN

Kentik Connect Pro – Version 1.3.2 of the Kentik Connect Pro App was released and updates:

  • Easier troubleshooting when configuring the app: API errors will now be displayed in an alert-box.
  • API query errors encountered by a panel will now be displayed in the dialog box available from the diagnostic panel in the top-left corner.

Install


We’re Hiring

Do you love open source software? Do you thrive on tackling complex challenges to build the future? Want to work with awesome people? Be the next to join our team!

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

We’re really excited at all the new enhancements and UX tweaks we’re making in Grafana v6.0 – all focused on making Grafana more intuitive and easy to use. Grafana v6.0 stable won’t be ready until February, but you can give the new functionality a try in the nightly builds.


How are we doing?

That wraps up TimeShift for 2018. We hope you’ve enjoyed reading our weekly roundups. What would you like to see here in 2019? Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

timeShift(GrafanaBuzz, 1w) Issue 73

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/12/20/timeshiftgrafanabuzz-1w-issue-73/

Welcome to TimeShift

As 2018 draws to a close, we’d like to thank all of our readers for their support and feedback. We look forward to returning in 2019 and sharing more articles about Grafana and the open source monitoring community.

Also, GrafanaCon LA is coming up Feb 25-26, 2019 and we’re excited to announce day 2 is going to be packed with TSDB-focused tracks and hands-on workshops. Learn how to get the most out of Grafana, how to extend Grafana’s visualization capabilities and get instruction from the experts. We’re also putting together an IoT session where you can get hands-on visualizing sensor data. It’s going to be a blast, so grab your ticket before they’re sold out!

Class will be in session for topics like:




Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects (*cough* Grafana Loki *cough*) and the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register for GrafanaCon

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.2

Release Highlights
  • Datasource admin: Fix for issue creating new data source when same name exists #14467
  • OAuth: Fix for oauth auto login setting, can now be set using env variable #14435
  • Dashboard search: Fix for searching tags in tags filter dropdown.

Download Grafana v5.4.2 Now


From the Blogosphere

[VIDEO] On the path to full observability with OSS (and launch of Loki): The video from David Kaltschmidt’s KubeCon 2018 presentation is now available. Learn how to instrument an app with Prometheus and Jaeger, how to debug your app, and get a sneak peek at Grafana Loki, our new Prometheus-inspired open source logging project.

Cortex: a multi-tenant, horizontally scalable Prometheus-as-a-Service: Dive into Cortex, the open source Prometheus-as-a-Service platform that powers Grafana Cloud. Learn about the history of the project, architecture, the gaps it helps fill, and common use-cases.

Marvel at Grafana Loki: The Prometheus of open source log backends: Jane Elizabeth distills much of the information we’ve released on Loki into a recap of what Loki is, what it does, and how to try it out today. She also slips in a few references to the mischievous Norse god and the Marvel Cinematic Universe.

How to graph IDRAC temperature, power usage and fan speed measurements in Grafana: Learn how to get your iDRAC sensor metrics into Grafana to visualize temperature, power, and fan speed in 20 minutes.

[Cosmos] How to Set Up Your Own Network Monitoring Dashboard: A quick walkthrough on how to install and configure Grafana and Prometheus to monitor and visualize data on the Cosmos network.


Grafana Plugin Update

The Kentik Connect Pro app got an update this week to better display API errors. Read more and install the update below. To update any of your plugins in your on-prem Grafana, use the grafana-cli tool, or for Grafana Cloud update with one-click.

UPDATED PLUGIN

Kentik Connect Pro – Version 1.3.2 of the Kentik Connect Pro App was released and updates:

  • Easier troubleshooting when configuring the app: API errors will now be displayed in an alert-box.
  • API query errors encountered by a panel will now be displayed in the dialog box available on the top-left corner diagnostic panel.

Install


We’re Hiring

Do you love open source software? Do you thrive on tackling complex challenges to build the future? Want to work with awesome people? Be the next to join our team!

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

We’re really excited about all the new enhancements and UX tweaks we’re making in Grafana v6.0, all focused on making Grafana more intuitive and easy to use. Grafana v6.0 stable won’t be ready until February, but you can give the new functionality a try in the nightly builds.


How are we doing?

That wraps up TimeShift for 2018. We hope you’ve enjoyed reading our weekly roundups. What would you like to see here in 2019? Email or send us a tweet, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

timeShift(GrafanaBuzz, 1w) Issue 72

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/12/14/timeshiftgrafanabuzz-1w-issue-72/

Welcome to TimeShift

The Grafana Labs team converged on Seattle this week for KubeCon + CloudNativeCon NA 2018 where we announced a new Prometheus-inspired, open source logging project we’ve been working on named Loki. We’ve been overwhelmed by the positive response and conversations it’s sparked over the past few days. Please give it a try on-prem or in the cloud and give us your feedback. You can read more about the project, our motivations, and check out the presentation in the blog section below.

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4.2

Release Highlights
  • Datasource admin: Fix for issue creating new data source when same name exists #14467
  • OAuth: Fix for oauth auto login setting, can now be set using env variable #14435
  • Dashboard search: Fix for searching tags in tags filter dropdown.

Download Grafana v5.4.2 Now


From the Blogosphere

Loki: Prometheus-inspired, open source logging for cloud natives: An in-depth article on the motivations, architecture of Grafana Loki, and the future of logging in Grafana.

[PRESENTATION SLIDES] On the path to full observability with OSS (and launch of Loki): David Kaltschmidt’s KubeCon 2018 presentation on how to instrument an app with Prometheus and Jaeger, how to debug an app, and Grafana’s new log aggregation solution: Loki. We’ll share the video once it’s available.

TNS Context: Grafana Loki and KubeCon Takeaways: Tom Wilkie, VP, Product at Grafana Labs sat down for an interview with The New Stack at KubeCon + CloudNativeCon NA 2018 in Seattle to provide some additional information about Grafana Loki, answer questions about Prometheus, and share his takeaways from the sold-out event.

Monitoring CrateDB on Kubernetes with Prometheus and Grafana: Learn how to set up Prometheus and Grafana with CrateDB so that you can monitor CPU, memory, and disk usage, as well as CrateDB metrics.

Time-series Analysis with TimescaleDB, Grafana and Plotly: In this post, the folks at CorpGlory describe how they do time-series analysis with TimescaleDB and Grafana, and the challenges they’ve overcome working with real-world time-series data.

Time Series at ShiftLeft: Preetam Jinka from ShiftLeft describes their requirements, their preferred TSDB, and the tooling they’ve developed to manage their infrastructure.


GrafanaCon – We’re shaking things up on day 2!

You spoke and we listened. Based on your feedback, we’re adding TSDB focused tracks and hands-on workshops to our second day program to give you a chance to dive into the nitty gritty of the most popular open source monitoring tools from the experts who build and maintain them.

Class will be in session for topics like:




Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects in the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register Now


Grafana Plugin Update

This week we share an update to a panel plugin that adds a number of new visualization options. To update any of your plugins in your on-prem Grafana, use the grafana-cli tool, or for Hosted Grafana update with one-click.

UPDATED PLUGIN

Statusmap Panel – The Statusmap panel got a new update which adds lots of new visualization options:

  • Removes all buttons for discrete color mode.
  • Solarized preset for discrete color mode.
  • Fixes display null values as zero.
  • Separate vertical and horizontal spacing for cards.
  • Three sort modes for y axis labels (by metric, ascending and descending sort by name).

Install


We’re Hiring

Do you love open source software? Do you thrive on tackling complex challenges to build the future? Want to work with awesome people? Be the next to join our team!

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

Beautiful!


How are we doing?

That’s a wrap for another issue of TimeShift. What do you think? Are there other types of content you’d like to see here? Submit a comment on this issue below, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.


Loki: Prometheus-inspired, open source logging for cloud natives

Post Syndicated from Blogs on Grafana Labs Blog original http://blog.grafana.com.s3-website-us-west-2.amazonaws.com/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/

Loki

Introduction

This blog post is a companion piece for my talk at https://devopsdaysindia.org. I will discuss the motivations, architecture, and the future of logging in Grafana! Let’s get right down to it. You can see the slides for the talk here: https://speakerdeck.com/gouthamve/devopsdaysindia-2018-loki-prometheus-but-for-logs

Motivation

Grafana is the de facto dashboarding solution for time-series data. It supports over 40 datasources (as of this writing), and the dashboarding story has matured considerably with new features, including the addition of teams and folders. We now want to move beyond being a dashboarding solution to being an observability platform: the go-to place when you need to debug systems on fire.

Full Observability

Observability. There are a lot of definitions out there as to what that means. Observability to me is visibility into your systems and how they are behaving and performing. I quite like the model where observability is split into 3 parts (or pillars): metrics, logs, and traces; each complementing the others to help you figure out what’s wrong quickly.

The following example illustrates how I tackle incidents at my job:
how I tackle incidents

Prometheus sends me an alert that something is wrong and I open the relevant dashboard for the service. If I find a panel or graph anomalous, I’ll open the query in Grafana’s new Explore UI for a deeper dive. For example, if I find that one of the services is throwing 500 errors, I’ll try to figure out if a particular handler/route is throwing that error or if all instances are throwing the error, etc.

Next up, once I have a vague mental model of what is going wrong or where, I’ll look at logs. Pre-Loki, I used kubectl to get the relevant logs to see what the error was and whether I could do something about it. This works great for errors, but sometimes I get paged due to high latency. In that situation I get more info from traces about what is slow and which method/operation/function is slow. We use Jaeger to get the traces.

While these didn’t always directly tell me what is wrong, they usually got me close enough to look at the code and figure out what is going wrong. Then I can either scale up the service (if the service is overloaded) or deploy the fix.

Logging

Prometheus works great, Jaeger is getting there, and kubectl was decent. The label model was powerful enough for me to get to the bottom of erroring services. If I found that the ingester service was erroring, I’d do: kubectl --namespace prod logs -l name=ingester | grep XXX to get the relevant logs and grep through them.

If I found a particular instance was erroring or if I wanted to tail the logs of a service, I’d have to use the individual pod for tailing as kubectl doesn’t let you tail based on label selectors. This is not ideal, but works for most use-cases.

This worked, as long as the pod wasn’t crashing or wasn’t being replaced. If the pod or node is terminated, the logs are lost forever. Also, kubectl only stores recent logs, so we’re blind when we want logs from the day before or earlier. Further, having to jump from Grafana to CLI and back again wasn’t ideal. We needed a solution that reduced context switching, and many of the solutions we explored were super pricey or didn’t scale very well.

This was expected as they do waaaay more than select + grep, which is essentially what we needed. After looking at existing solutions, we decided to build our own.

Loki

Not happy with any of the open-source solutions, we started speaking to people and noticed that A LOT of people had the same issues. In fact, I’ve come to realise that lots of developers still SSH and grep/tail the logs on machines even today! The solutions they were using were either too pricey or not stable enough. In fact, people were being asked to log less which we think is an anti-pattern for logs. We thought we could build something that we internally, and the wider open-source community could use. We had one main goal:

  • Keep it simple. Just support grep!

Keep it simple. Just support grep!

This tweet from @alicegoldfuss is not an endorsement and only serves to illustrate the problem Loki is attempting to solve

We also aimed for other things:
  • Logs should be cheap. Nobody should be asked to log less.
  • Easy to operate and scale.
  • Metrics, logs (and traces later) need to work together.

The final point was important. We were already collecting metadata from Prometheus for the metrics and we wanted to use that for log correlation. For example, Prometheus tags each metric with the namespace, service name, instance ip, etc. When I get an alert, I use the metadata to figure out where to look for logs. If we manage to tag the logs with the same metadata, we can seamlessly switch between metrics and logs. You can see the internal design doc we wrote here. See a demo video of Loki in action below:

Video: Loki – Prometheus-inspired, open source logging for cloud natives.
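To make that correlation concrete, here is a minimal sketch of turning a firing alert’s labels into a log stream selector. The `log_selector` helper and the alert dict are hypothetical illustrations, not Loki’s actual API:

```python
# Hypothetical sketch: because metrics and logs share the same label set,
# the labels on a firing alert can be reused verbatim as a log query selector.

def log_selector(alert_labels: dict) -> str:
    """Build a label-matcher string from a firing alert's labels."""
    matchers = ", ".join(f'{k}="{v}"' for k, v in sorted(alert_labels.items()))
    return "{" + matchers + "}"

alert = {"namespace": "prod", "job": "ingester", "instance": "10.0.0.7"}
print(log_selector(alert))
# {instance="10.0.0.7", job="ingester", namespace="prod"}
```

The point is that no translation layer is needed: the selector that scopes your metrics query is the same selector that scopes your log query.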

Architecture

With our experience building and running Cortex (the horizontally scalable, distributed version of Prometheus we run as a service), we came up with the following architecture:

Logging architecture!

Metadata matching between metrics and logs is critical for us, and we initially decided to target just Kubernetes. The idea is to run a log-collection agent on each node, use it to collect logs, talk to the Kubernetes API to figure out the right metadata for the logs, and send them to a central service where we can show the collected logs inside Grafana.

The agent supports the same configuration (relabelling rules) as Prometheus to make sure the metadata matches. We called this agent promtail.

Enter Loki, the scalable log collection engine.
Logging architecture!

The write path and read path (query) are fairly decoupled from each other, and it helps to talk about each separately.
Loki: Architecture!

Distributor

Once promtail collects and sends the logs to Loki, the distributor is the first component to receive them. We could be receiving millions of writes per second, and we wouldn’t want to write them to a database as they come in; that would kill any database out there. We need to batch and compress the data as it comes in.

We do this by building compressed chunks of the data, gzipping logs as they come in. The ingester component is a stateful component in charge of building and later flushing the chunks. We have multiple ingesters, and the logs belonging to each stream should always end up in the same ingester so that all the relevant entries end up in the same chunk. We do this by building a ring of ingesters and using consistent hashing: when an entry comes in, the distributor hashes the labels of the logs and looks up which ingester to send the entry to based on the hash value.
Loki: Distributor

Further, for redundancy and resilience, we replicate it n (3, by default) times.
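The routing step above can be sketched as a toy consistent-hash ring. All names here (`Ring`, `lookup`, token counts) are illustrative; Loki’s real ring adds heartbeats, zones, and a proper token scheme:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring mapping a stream's label set to ingesters."""

    def __init__(self, ingesters, tokens_per_node=32):
        # Each ingester owns several points on the ring for smoother balance.
        self._ring = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_node)
        )

    def lookup(self, stream_labels: str, replicas: int = 3):
        """Walk clockwise from the stream's hash, collecting distinct ingesters."""
        start = bisect.bisect(self._ring, (_hash(stream_labels),))
        owners, i = [], start
        while len(owners) < replicas:
            name = self._ring[i % len(self._ring)][1]
            if name not in owners:
                owners.append(name)
            i += 1
        return owners

ring = Ring(["ingester-0", "ingester-1", "ingester-2", "ingester-3"])
owners = ring.lookup('{job="ingester", namespace="prod"}')
# The same label set always maps to the same owners, so a stream's
# entries keep landing in the same chunks.
```

Because the hash depends only on the labels, every entry of a given stream deterministically reaches the same replica set.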

Ingester

Now the ingester will receive the entries and start building chunks.
Loki: Ingester

This is basically gzipping the logs and appending them. Once the chunk “fills up”, we flush it to the database. We use separate databases for the chunks (ObjectStorage) and the index, as the type of data they store is different.

Once the chunk “fills up”, we flush it to the database

After flushing a chunk, the ingester then creates a new empty chunk and adds the new entries into that chunk.
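A minimal sketch of that chunk lifecycle: append entries, track size, gzip and flush when full. The class name and the size threshold are assumptions for illustration, not Loki’s real defaults:

```python
import gzip
import time

class Chunk:
    """Toy chunk: append log lines, gzip on flush."""

    MAX_BYTES = 256 * 1024  # flush threshold (assumption, not Loki's default)

    def __init__(self):
        self.lines = []
        self.size = 0

    def append(self, ts: float, line: str) -> bool:
        """Add an entry; return True when the chunk is full and should be flushed."""
        entry = f"{ts} {line}\n"
        self.lines.append(entry)
        self.size += len(entry)
        return self.size >= self.MAX_BYTES

    def flush(self) -> bytes:
        """Compress the chunk for the object store; caller starts a fresh chunk."""
        return gzip.compress("".join(self.lines).encode())

chunk = Chunk()
chunk.append(time.time(), 'level=error msg="oops"')
blob = chunk.flush()  # what would be written to an object store like S3
```

The compressed blob goes to object storage while the index records which label set and time range it covers.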

Querier

The read path is quite simple and has the querier doing most of the heavy lifting. Given a time-range and label selectors, it looks at the index to figure out which chunks match, and greps through them to give you the results. It also talks to the ingesters to get the recent data that has not been flushed yet.

Note that, right now, for each query, a single querier greps through all the relevant logs for you. We’ve implemented query parallelisation in Cortex using a frontend, and the same can be extended to Loki to give a distributed grep that will make even large queries snappy.

Loki: A look at the Querier
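The read path above can be sketched as follows. The `index` and `object_store` shapes and the `query` function are hypothetical stand-ins for Loki’s actual interfaces:

```python
import gzip
import re

def query(index, object_store, matchers: dict, start: float, end: float, pattern: str):
    """Toy read path: index lookup -> fetch matching chunks -> grep.

    `index` maps frozen label sets to lists of (chunk_key, min_ts, max_ts).
    """
    stream = frozenset(matchers.items())
    rx = re.compile(pattern)
    results = []
    for chunk_key, min_ts, max_ts in index.get(stream, []):
        if max_ts < start or min_ts > end:
            continue  # chunk entirely outside the requested time range
        raw = gzip.decompress(object_store[chunk_key]).decode()
        for line in raw.splitlines():
            ts, _, msg = line.partition(" ")
            if start <= float(ts) <= end and rx.search(msg):
                results.append((float(ts), msg))
    return sorted(results)
```

A real querier would also merge in unflushed data from the ingesters; this sketch only covers the object-store side.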

Scalability

Now let’s see if this scales.

  1. We’re putting the chunks into an object store and that scales.
  2. We put the index into Cassandra/Bigtable/DynamoDB which again scales.
  3. The distributors and queriers are stateless components that you can horizontally scale.

As for the ingester, it is a stateful component, but we’ve built the full sharding and resharding lifecycle into it. When a rollout happens or when ingesters are scaled up or down, the ring topology changes and the ingesters redistribute their chunks to match the new topology. This is mostly code taken from Cortex, which has been running in production for more than 2 years.

Caveats

While all of this works conceptually, we expect to hit new issues and limitations as we grow. It should be super cheap to run, given that all the data will be sitting in an object store like S3. But you would only be able to grep through the data; this might not be suitable for other use-cases like alerting or building dashboards, which you’re better off doing with metrics.

Conclusion

Loki is very much alpha software and should not be used in production environments. We wanted to announce and release Loki as soon as possible to get feedback and contributions from the community and find out what’s working and what needs improvement. We believe this will help us deliver a higher quality and more on-point production release next year.

Loki can be run on-prem or as a free demo on Grafana Cloud. We urge you to give it a try and drop us a line and let us know what you think. Visit the Loki homepage to get started today.

Loki: Prometheus-inspired, open source logging for cloud natives

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/

Loki

Introduction

This blog post is a companion piece for my talk at https://devopsdaysindia.org. I will discuss the motivations, architecture, and the future of logging in Grafana! Let’s get right down to it. You can see the slides for the talk here: https://speakerdeck.com/gouthamve/devopsdaysindia-2018-loki-prometheus-but-for-logs

Motivation

Grafana is the defacto dashboarding solution for time-series data. It supports over 40 datasources (as of this writing), and the dashboarding story has matured considerably with new features, including the addition of teams and folders. We now want to move on from being a dashboarding solution to being an observability platform, to be the go-to place when you need to debug systems on fire.

Full Observability

Observability. There are a lot of definitions out there as to what that means. Observability to me is visibility into your systems and how they are behaving and performing. I quite like the model where observability can be split into 3 parts (or pillars): metrics, logs and traces; each complimenting each other to help you figure out what’s wrong quickly.

The following example illustrates how I tackle incidents at my job:
how I tackle incidents

Prometheus sends me an alert that something is wrong and I open the relevant dashboard for the service. If I find a panel or graph anomalous, I’ll open the query in Grafana’s new Explore UI for a deeper dive. For example, if I find that one of the services is throwing 500 errors, I’ll try to figure out if a particular handler/route is throwing that error or if all instances are throwing the error, etc.

Next up, once I have a vague mental model as to what is going wrong or where it is going wrong, I’ll look at logs. Pre Loki, I used to use kubectl to get the relevant logs to see what the error is and if I could do something about it. This works great for errors, but sometimes I get paged due to high latency. In this situation I get more info from traces regarding what is slow and which method/operation/function is slow. We use Jaeger to get the traces.

While these didn’t always directly tell me what is wrong, they usually got me close enough to look at the code and figure out what is going wrong. Then I can either scale up the service (if the service is overloaded) or deploy the fix.

Logging

Prometheus works great, Jaeger is getting there, and kubectl was decent. The label model was powerful enough for me to get to the bottom of erroring services. If I found that the ingester service was erroring, I’d do: kubectl --namespace prod logs -l name=ingester | grep XXX to get the relevant logs and grep through them.

If I found a particular instance was erroring or if I wanted to tail the logs of a service, I’d have to use the individual pod for tailing as kubectl doesn’t let you tail based on label selectors. This is not ideal, but works for most use-cases.

This worked, as long as the pod wasn’t crashing or wasn’t being replaced. If the pod or node is terminated, the logs are lost forever. Also, kubectl only stores recent logs, so we’re blind when we want logs from the day before or earlier. Further, having to jump from Grafana to CLI and back again wasn’t ideal. We needed a solution that reduced context switching, and many of the solutions we explored were super pricey or didn’t scale very well.

This was expected as they do waaaay more than select + grep, which is essentially what we needed. After looking at existing solutions, we decided to build our own.

Loki

Not happy with any of the open-source solutions, we started speaking to people and noticed that A LOT of people had the same issues. In fact, I’ve come to realise that lots of developers still SSH and grep/tail the logs on machines even today! The solutions they were using were either too pricey or not stable enough. In fact, people were being asked to log less which we think is an anti-pattern for logs. We thought we could build something that we internally, and the wider open-source community could use. We had one main goal:

  • Keep it simple. Just support grep!

Keep it simple. Just support grep!

This tweet from @alicegoldfuss is not an endorsement and only serves to illustrate the problem Loki is attempting to solve

  • We also aimed for other things:
    • Logs should be cheap. Nobody should be asked to log less.
    • Easy to operate and scale
    • Metrics, logs (and traces later) need to work together

The final point was important. We were already collecting metadata from Prometheus for the metrics and we wanted to use that for log correlation. For example, Prometheus tags each metric with the namespace, service name, instance ip, etc. When I get an alert, I use the metadata to figure out where to look for logs. If we manage to tag the logs with the same metadata, we can seamlessly switch between metrics and logs. You can see the internal design doc we wrote here. See a demo video of Loki in action below:

Video: Loki – Prometheus-inspired, open source logging for cloud natives.

Architecture

With our experience building and running Cortex– the horizontally scalable, distributed version of Prometheus we run as a service– we came up with the following architecture:

Logging architecture!

Metadata between metrics and logs matching is critical for us and we initially decided to just target Kubernetes. The idea is to run a log-collection agent on each node, collect logs using that, talk to the kubernetes API to figure out the right metadata for the logs, and send them to a central service which we can use to show the logs collected inside Grafana.

The agent supports the same configuration (relabelling rules) as Prometheus to make sure the metadata matches. We called this agent promtail.

Enter Loki, the scalable log collection engine.
Logging architecture!

The write path and read path (query) are pretty decoupled from each other and it helps to talk about it each separately.
Loki: Architecture!

Distributor

Once promtail collects and sends the logs to Loki, the distributor is the first component to receive them. Now we could be receiving millions of writes per second and we wouldn’t want to write them to a database as they come in. That would kill any database out there. We would need batch and compress the data as it comes in.

We do this via building compressed chunks of the data, by gzipping logs as they come in. The ingester component is a stateful component in charge of building and then later flushing the chunks. We have multiple ingesters, and the logs belonging to each stream should always end up in the same ingester for all the relevant entries to end up in the same chunk. We do this by building a ring of ingesters and using consistent hashing. When an entry comes in, the distributor hashes the labels of the logs and then looks up which ingester to send the entry to based on the hash value.
Loki: Distributor

Further, for redundancy and resilience, we replicate it n (3, by default) times.

Ingester

Now the ingester will receive the entries and start building chunks.
Loki: Ingester

This is basically gzipping the logs and appending them. Once the chunk “fills up”, we flush it to the database. We use separate databases for the chunks (ObjectStorage) and the index, as the type of data they store is different.

Once the chunk “fills up”, we flush it to the database

After flushing a chunk, the ingester then creates a new empty chunk and adds the new entries into that chunk.

Querier

The read path is quite simple and has the querier doing most of the heavy lifting. Given a time-range and label selectors, it looks at the index to figure out which chunks match, and greps through them to give you the results. It also talks to the ingesters to get the recent data that has not been flushed yet.

Note that, right now, for each query, a single querier greps through all the relevant logs for you. We’ve implemented query parallelisation in Cortex using a frontend and the same can be extended to Loki to give distributed grep which will make even large queries snappy enough.

Loki: A look at the Querier

Scalability

Now let’s see if this scales.

  1. We’re putting the chunks into an object store and that scales.
  2. We put the index into Cassandra/Bigtable/DynamoDB which again scales.
  3. The distributors and queriers are stateless components that you can horizontally scale.

Coming to the ingester, it is a stateful component but we’ve built the full sharding and resharding lifecycle into them. When a rollout is done or when ingesters are scaled up or down, the ring topology changes and the ingesters redistribute their chunks to match the new topology. This is mostly code taken from Cortex which has been running in production for more than 2 years.

Caveats

While all of this works conceptually, we expect to hit new issues and limitations as we grow. It should be super cheap to run, given that all the data will sit in an object store like S3. But you will only be able to grep through the data. This might not be suitable for use cases like alerting or building dashboards, which you’re better off doing with metrics.

Conclusion

Loki is very much alpha software and should not be used in production environments. We wanted to announce and release Loki as soon as possible to get feedback and contributions from the community and find out what’s working and what needs improvement. We believe this will help us deliver a higher quality and more on-point production release next year.

Loki can be run on-prem or as a free demo on Grafana Cloud. We urge you to give it a try and drop us a line and let us know what you think. Visit the Loki homepage to get started today.

timeShift(GrafanaBuzz, 1w) Issue 71

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/12/07/timeshiftgrafanabuzz-1w-issue-71/

Welcome to TimeShift

We’re excited to be speaking at and sponsoring KubeCon + CloudNativeCon NA 2018 in Seattle next week and hope we get a chance to hang out. Swing by our booth to check out a new open source project we’ve been working on, and give us some feedback on Grafana and features you’d like to see.

As always, we hope you enjoy this week’s TimeShift, and don’t be shy about telling us how we can make it better. Hope to see you in Seattle!

See an article we missed? Contact us.


Latest Stable Release: Grafana v5.4

Release Highlights

You can learn more about Grafana v5.4 in the release blog post.

Download Grafana v5.4 Now


From the Blogosphere

Grafana v5.4 Stable released!: This release blog post highlights the major enhancements included in Grafana v5.4 and how to use the new features.

Step By Step Monitoring Cassandra With Prometheus And Grafana: Check out this step-by-step guide on how to monitor a Cassandra cluster with Prometheus and Grafana on a VM.

Cool Features in Grafana that You Might Missed: In this article, Fairuz discusses how he used Grafana’s rendering API to send png snapshots of graph panels to his various services.

How to export alerts from Prometheus to Grafana: David dives into alerting in this article and explores the method he’s found most successful in creating an alerting dashboard in Grafana with Prometheus.

Introduction to Kubernetes Monitoring: This article explores what you should be monitoring in Kubernetes and how to go about it with Rancher, Prometheus, and Grafana.

Monitoring Server Power Usage and Cost with Grafana | and How to create the graphs: Building an effective dashboard can be as much art as science. This article shows you how to monitor the server power usage and cost of your homelab, and how to build your own Grafana dashboard to track it.


GrafanaCon – We’re shaking things up on day 2!

You spoke and we listened. Based on your feedback, we’re adding TSDB-focused tracks and hands-on workshops to our second day program to give you a chance to dive into the nitty-gritty of the most popular open source monitoring tools from the experts who build and maintain them.

Class will be in session for topics like:




Get your tickets while they last!

Join us in Los Angeles, California February 25-26, 2019 for 2 days of talks and in-depth workshops on Grafana and the open source monitoring ecosystem. Learn about Grafana and new/upcoming features and projects in the broader ecosystem like Prometheus, Graphite, InfluxDB, Kubernetes, and more.

Register Now


Upcoming Events

In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We also like to make sure we mention other Grafana-related events happening all over the world. If you’re putting on just such an event, let us know and we’ll list it here.

KubeCon + CloudNativeCon North America 2018 | Seattle, WA – December 10-13, 2018:
David Kaltschmidt: On the OSS Path to Full Observability with Grafana – Grafana is coming "off the wall". To make it more useful for interactive debugging, David and his team have already integrated two pillars of observability – metrics and logs. They are currently adding tracing to complete the incident response experience. All to minimize the cost of context switching during those crucial minutes after getting paged.

This talk will demonstrate the various methods we've used to link the data together. Prometheus is providing the metrics. Via its histograms, request latencies can be extracted to inform each tracing span from Jaeger. Grafana also ensures that lines from your log aggregation system are annotated with span and trace IDs, as well as the other way around: associating logged values with spans.

David will show how these OSS parts should be deployed to achieve full observability in an engaging user experience that saves valuable minutes.

We are also a proud sponsor of the Cloud Native Computing Foundation’s flagship conference. Join Kubernetes, Prometheus, Cortex, OpenTracing, Fluentd, gRPC, containerd, rkt, CNI, Envoy, Jaeger, Notary, TUF, Vitess, CoreDNS, NATS, Linkerd and Helm as the community gathers for four days to further the education and advancement of cloud native computing.

Register Now


Featured Job

As Grafana continues to grow we’re building our European sales team and are hiring Business Development Representatives based in our Stockholm office. This is a rare opportunity to join an early stage startup and take an instrumental role in helping to build the sales function. Apply now.

View All our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard or monitoring related tweet and show it off! #monitoringLove

Thanks Peter! This was a popular feature request, and we’re excited that it made it into this release.


How are we doing?

That’s a wrap for another issue of TimeShift. What do you think? Are there other types of content you’d like to see here? Submit a comment on this issue below, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.