Open-Sourcing Panoptes, Oath’s distributed network telemetry collector

Post Syndicated from amberwilsonla-blog original https://yahooeng.tumblr.com/post/178738044351

yahoodevelopers:

By Ian Flint, Network Automation Architect and Varun Varma, Senior Principal Engineer

The Oath network automation team is proud to announce that we are open-sourcing Panoptes, a distributed system for collecting, enriching and distributing network telemetry.  

We developed Panoptes to address several issues inherent in legacy polling systems, including overpolling due to multiple point solutions for metrics, a lack of data normalization, consistent data enrichment and integration with infrastructure discovery systems.  

Panoptes is a pluggable, distributed, high-performance data collection system which supports multiple polling formats, including SNMP and vendor-specific APIs. It is also extensible to support emerging streaming telemetry standards including gNMI.

Architecture

The following block diagram shows the major components of Panoptes:

Panoptes is written primarily in Python, and leverages multiple open-source technologies to provide the most value for the least development effort. At the center of Panoptes is a metrics bus implemented on Kafka. All data plane transactions flow across this bus; discovery publishes devices to the bus, polling publishes metrics to the bus, and numerous clients read the data off of the bus for additional processing and forwarding. This architecture enables easy data distribution and integration with other systems. For example, in preparing for open-source, we identified a need for a generally available time series datastore. We developed, tested and released a plugin to push metrics into InfluxDB in under a week. This flexibility allows Panoptes to evolve with industry standards.

Check scheduling is accomplished using Celery, a horizontally scalable, open-source scheduler utilizing a Redis data store. Celery’s scalable nature combined with Panoptes’ distributed nature yields excellent scalability. Across Oath, Panoptes currently runs hundreds of thousands of checks per second, and the infrastructure has been tested to more than one million checks per second.

Panoptes ships with a simple, CSV-based discovery system. Integrating Panoptes with a CMDB is as simple as writing an adapter to emit a CSV, and importing that CSV into Panoptes. From there, Panoptes will manage the task of scheduling polling for the desired devices. Users can also develop custom discovery plugins to integrate with their CMDB and other device inventory data sources.

Finally, any metrics gathering system needs a place to send the metrics. Panoptes’ initial release includes an integration with InfluxDB, an industry-standard time series store. Combined with Grafana and the InfluxData ecosystem, this gives teams the ability to quickly set up a fully-featured monitoring environment.

Deployment at Oath

At Oath, we anticipate significant benefits from building Panoptes. We will consolidate four siloed polling solutions into one, reducing overpolling and the associated risk of service interruption. As vendors move toward streaming telemetry, Panoptes’ flexible architecture will minimize the effort required to adopt these new protocols.

There is another, less obvious benefit to a system like Panoptes. As is the case with most large enterprises, a massive ecosystem of downstream applications has evolved around our existing polling solutions. Panoptes allows us to continue to populate legacy datastores without continuing to run the polling layers of those systems. This is because Panoptes’ data bus enables multiple metrics consumers, so we can send metrics to both current and legacy datastores.

At Oath, we have deployed Panoptes in a tiered, federated model. We install the software in each of our major data centers and proxy checks out to smaller installations such as edge sites.  All metrics are polled from an instance close to the devices, and metrics are forwarded to a centralized time series datastore. We have also developed numerous custom applications on the platform, including a load balancer monitor, a BGP session monitor, and a topology discovery application. The availability of a flexible, extensible platform has greatly reduced the cost of producing robust network data systems.

Easy Setup

Panoptes’ open-source release is packaged for easy deployment into any Linux-based environment. Deployment is straightforward, so you can have a working system up in hours, not days.

We are excited to share our internal polling solution and welcome engineers to contribute to the codebase, including contributing device adapters, metrics forwarders, discovery plugins, and any other relevant data consumers.  

Panoptes is available at https://github.com/yahoo/panoptes, and you can connect with our team at [email protected].