Team of professionals


Leveraging OpenTelemetry for Fault-Tolerant Prometheus Metrics with Envoy Mirroring

There are many use cases in which metrics collected from applications or services must be forwarded from the local environment to centralized, long-term remote storage such as Thanos or Mimir.

This article outlines a fault-tolerant, highly available solution for collecting metrics from applications and services running in Kubernetes and forwarding them to remote, Prometheus-compatible long-term TSDB storage. It assumes working knowledge of the components involved: the OpenTelemetry Collector, Prometheus in agent mode, and Envoy proxy request mirroring. Detailed configuration is outside the scope of this article.

The Design

The OpenTelemetry (OTel) Collector collects metrics from the desired resources. Its pipeline, built from receivers, processors, and exporters, processes the collected metrics and sends them to the Envoy proxy endpoint.
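To make the pipeline more concrete, here is a minimal sketch in Python that builds an OpenTelemetry Collector configuration as a dictionary and checks that every component referenced in the pipeline is declared. It assumes the `prometheus` receiver for scraping, the `batch` processor, and the `prometheusremotewrite` exporter pointed at Envoy. The article does not name the components it uses, and the Envoy URL, job name, and scrape target below are hypothetical.

```python
import json

# Hypothetical Envoy listener URL; the article does not name one.
ENVOY_WRITE_URL = "http://envoy-metrics.monitoring.svc:8080/api/v1/write"


def build_collector_config(envoy_url: str) -> dict:
    """Return an OTel Collector config that scrapes metrics and
    remote-writes them to the Envoy proxy (assumed component choice)."""
    return {
        "receivers": {
            # The prometheus receiver embeds a Prometheus scrape configuration.
            "prometheus": {
                "config": {
                    "scrape_configs": [
                        {
                            "job_name": "example-app",  # hypothetical job
                            "scrape_interval": "30s",
                            "static_configs": [
                                {"targets": ["example-app.default.svc:8080"]}
                            ],
                        }
                    ]
                }
            }
        },
        # Batch samples before export to reduce the number of requests.
        "processors": {"batch": {}},
        # Send data to Envoy using the Prometheus remote-write protocol.
        "exporters": {"prometheusremotewrite": {"endpoint": envoy_url}},
        "service": {
            "pipelines": {
                "metrics": {
                    "receivers": ["prometheus"],
                    "processors": ["batch"],
                    "exporters": ["prometheusremotewrite"],
                }
            }
        },
    }


def undefined_components(config: dict) -> list:
    """List pipeline entries that are not declared in their top-level section."""
    missing = []
    for name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    missing.append(f"{name}/{kind}/{component}")
    return missing


if __name__ == "__main__":
    cfg = build_collector_config(ENVOY_WRITE_URL)
    assert undefined_components(cfg) == []
    print(json.dumps(cfg, indent=2))
```

Because JSON is a subset of YAML 1.2, the printed output can be saved as the collector's configuration file. The `undefined_components` helper is a hypothetical convenience check, not part of the collector itself.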

The Envoy proxy is configured with a static route that uses a request mirror policy, and its upstream clusters point at the Prometheus pods. Envoy therefore connects directly to each Kubernetes pod rather than to the Kubernetes Service in front of the pods. Each Prometheus pod is represented by its own Envoy upstream cluster: data is routed primarily to one of the two Prometheus replicas and mirrored to the other.
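The routing and mirroring described above can be sketched as a static Envoy configuration. The Python below is a minimal sketch that generates it, assuming Prometheus runs as a StatefulSet named `prometheus` behind a headless Service `prometheus-headless` in the `monitoring` namespace (all hypothetical names). With that setup each pod has its own DNS name and becomes its own `STRICT_DNS` upstream cluster. The route sends remote-write requests to `prometheus-0` and mirrors them to `prometheus-1` through `request_mirror_policies`.

```python
import json

# Hypothetical names: StatefulSet "prometheus", headless Service
# "prometheus-headless", namespace "monitoring". Each pod gets a stable DNS
# name resolving to the pod IP, so Envoy bypasses the regular Service.
NAMESPACE = "monitoring"
HEADLESS_SERVICE = "prometheus-headless"
PROMETHEUS_PORT = 9090


def pod_cluster(pod: str) -> dict:
    """One Envoy upstream cluster that targets a single Prometheus pod."""
    host = f"{pod}.{HEADLESS_SERVICE}.{NAMESPACE}.svc.cluster.local"
    return {
        "name": pod,
        "type": "STRICT_DNS",
        "connect_timeout": "1s",
        "load_assignment": {
            "cluster_name": pod,
            "endpoints": [{"lb_endpoints": [{"endpoint": {"address": {
                "socket_address": {"address": host, "port_value": PROMETHEUS_PORT}
            }}}]}],
        },
    }


def build_envoy_config(primary: str, mirror: str, listen_port: int = 8080) -> dict:
    """Static config: route remote-write traffic to `primary`, mirror it to `mirror`."""
    route = {
        "match": {"prefix": "/api/v1/write"},
        "route": {
            "cluster": primary,
            # Envoy sends a copy of each request here and ignores the response.
            "request_mirror_policies": [{"cluster": mirror}],
        },
    }
    hcm = {
        "@type": "type.googleapis.com/envoy.extensions.filters.network."
                 "http_connection_manager.v3.HttpConnectionManager",
        "stat_prefix": "prometheus_write",
        "route_config": {
            "name": "prometheus_route",
            "virtual_hosts": [{"name": "prometheus", "domains": ["*"], "routes": [route]}],
        },
        "http_filters": [{
            "name": "envoy.filters.http.router",
            "typed_config": {
                "@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
            },
        }],
    }
    listener = {
        "name": "prometheus_write_listener",
        "address": {"socket_address": {"address": "0.0.0.0", "port_value": listen_port}},
        "filter_chains": [{"filters": [{
            "name": "envoy.filters.network.http_connection_manager",
            "typed_config": hcm,
        }]}],
    }
    return {"static_resources": {
        "listeners": [listener],
        "clusters": [pod_cluster(primary), pod_cluster(mirror)],
    }}


if __name__ == "__main__":
    print(json.dumps(build_envoy_config("prometheus-0", "prometheus-1"), indent=2))
```

Envoy accepts JSON bootstrap files, so the printed output can be passed with `-c`. Mirroring in Envoy is fire-and-forget: the client receives only the primary cluster's response, so a failing mirror does not fail the write.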

Prometheus is deployed in the Kubernetes cluster as two replicas running in agent mode with the remote-write receiver enabled. Each instance also carries an external label, prometheus_replica, which Thanos uses to deduplicate the series sent by the high-availability Prometheus pair.
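Why the external label enables deduplication can be illustrated with a simplified model. The sketch below groups series that differ only in `prometheus_replica` and, for each timestamp, keeps one replica's sample (the lexicographically smallest replica name), so a gap in one replica is filled by the other. This is not Thanos's actual algorithm; Thanos Query's own deduplication, configured via `--query.replica-label`, is more sophisticated. The sketch only shows why a shared replica label lets the mirrored series be merged.

```python
from collections import defaultdict

REPLICA_LABEL = "prometheus_replica"


def deduplicate(series: list) -> dict:
    """Merge series that differ only in the replica label (simplified model).

    `series` is a list of (labels, samples) pairs: labels is a dict and
    samples is a list of (timestamp_ms, value) tuples. For each timestamp the
    replica with the smallest name wins, so a gap in one replica is filled
    from the other. Returns {label_key: sorted [(timestamp_ms, value), ...]}.
    """
    # label key (labels without the replica label) -> {timestamp: (replica, value)}
    merged = defaultdict(dict)
    for labels, samples in series:
        replica = labels.get(REPLICA_LABEL, "")
        key = tuple(sorted((k, v) for k, v in labels.items() if k != REPLICA_LABEL))
        for ts, value in samples:
            current = merged[key].get(ts)
            if current is None or replica < current[0]:
                merged[key][ts] = (replica, value)
    return {
        key: [(ts, rv[1]) for ts, rv in sorted(points.items())]
        for key, points in merged.items()
    }


if __name__ == "__main__":
    r0 = ({"__name__": "up", "job": "app", REPLICA_LABEL: "prometheus-0"},
          [(0, 1.0), (15000, 1.0), (45000, 1.0)])  # missing the 30000 sample
    r1 = ({"__name__": "up", "job": "app", REPLICA_LABEL: "prometheus-1"},
          [(0, 1.0), (15000, 1.0), (30000, 1.0), (45000, 1.0)])
    print(deduplicate([r0, r1]))
```

Because Envoy mirrors identical remote-write payloads, both replicas normally hold samples with the same timestamps, which is what makes this kind of merge straightforward.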

Conclusion

This design made monitoring more resilient and reduced gaps in time series data on Grafana dashboards.

Author

Gabriel Illés
Senior DevOps Engineer

Dedicated professional with experience in managing cloud infrastructure and system administration, integrating cloud-based infrastructure components, and developing automation and data engineering solutions. Good at troubleshooting problems and building successful solutions. Excellent verbal and written communicator with a strong background in cultivating positive relationships and exceeding goals.

The entire Grow2FIT consulting team: Our team
