Interview with ForgeRock

Post Syndicated from Simon Pasquier original https://prometheus.io/blog/2019/06/18/interview-with-forgerock/

Continuing our series of interviews with users of Prometheus, Ludovic Poitou
from ForgeRock talks about their monitoring journey.

Can you tell us about yourself and what ForgeRock does?

I’m Ludovic Poitou, Director of Product Management at
ForgeRock, based near Grenoble, France. ForgeRock
is an international identity and access management software company with more
than 500 employees, founded in Norway in 2010, now headquartered in San
Francisco, USA. We provide solutions to secure every online interaction with
customers, employees, devices and things. We have more than 800 customers from
finance companies to government services.

What was your pre-Prometheus monitoring experience?

The ForgeRock Identity Platform has always offered monitoring interfaces. But
the platform is composed of 4 main products, each of them had different
options. For example, the Directory Services product offered monitoring
information through SNMP, JMX or LDAP, or even a RESTful API over HTTP in the
most recent versions. Other products only had REST or JMX. As a result,
monitoring the whole platform was complex and required tools that were able to
integrate those protocols.

Why did you decide to look at Prometheus?

We needed to have a single and common interface for monitoring all our
products, but while keeping the existing ones for backward compatibility.

We started to use DropWizard to collect the metrics in all products. At the
same time, we were starting to move these products to the cloud and run them in
Docker and Kubernetes. So, Prometheus became evident because of its integration
with Kubernetes, its simplicity for deployments, and the integration of
Grafana. We also looked at Graphite and while we also added support for it in
our products, it’s hardly being used by our customers.

How did you transition?

Some of our products were already using the DropWizard library and we had decided
to use a common library in all products, so DropWizard was an obvious choice to
code the instrumentation. But very quickly, we faced an issue with the data
model. Prometheus interface uses dimensions, while we tend to have a
hierarchical model for metrics. We also started to use Micrometer and quickly
hit some constraints. So we ended up building a custom implementation to collect
our metrics using the Micrometer interface. We adapted DropWizard Metrics to
meet our requirements and made the adjustments to the DropWizard Prometheus
exporter. Now with a single instrumentation we can expose the metrics with
dimensions or hierarchically. Then we’ve started building sample Grafana
dashboards that our customer can install and customise to have their own
monitoring views and alerts.

Access Management ForgeRock's Grafana dashboard

We do continue to offer the previous interfaces, but we strongly encourage our
customers to use Prometheus and Grafana.

What improvements have you seen since switching?

The first benefits came from our Quality Engineering team. As they started to
test our Prometheus support and the different metrics, they started to enable
it by default on all stress and performance tests. They started to customise
the Grafana dashboards for the specific tests. Soon after, they started to
highlight and point at various metrics to explain some performance issues.

When reproducing the problems in order to understand and fix them, our
engineering team used Prometheus as well and extended some dashboards. The
whole process gave us a better product and a much better understanding of
which metrics are important to monitor and visualise for customers.

What do you think the future holds for ForgeRock and Prometheus?

ForgeRock has started an effort to offer its products and solutions as a
service. With that move, monitoring and alerting are becoming even more
critical, and of course, our monitoring infrastructure is based on Prometheus.
We currently have two levels of monitoring, one per tenant, where we use
Prometheus to collect data about one customer environment, and we can expose a
set of metrics for that customer. But we have also built a central Prometheus
service where metrics from all deployed tenants is pushed, so that our SRE team
can have a really good understanding of what and how all customers environments
are running. Overall I would say that Prometheus has become our main monitoring
service and it serves both our on-premise customers, and ourselves running our
solutions as a Service.