About the job: Analytics Observability Engineer
As a remote Analytics Observability Engineer, you will design, implement, and maintain end-to-end observability solutions that provide visibility into analytics systems, data pipelines, and application performance. You will work cross-functionally with engineering, data science, DevOps, and SRE teams to enable proactive monitoring, alerting, logging, and tracing across the data infrastructure.
This role is critical to data reliability, system uptime, and actionable insights: you will build the tools and dashboards that improve system transparency and performance.
Key Responsibilities:
Architect and implement observability solutions for analytics platforms and data pipelines (ETL/ELT, streaming, batch)
Integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic) into analytics environments (Spark, Airflow, dbt, etc.)
Design real-time dashboards and alerts that capture system health, job failures, data anomalies, and latency issues
Analyze telemetry data to identify performance degradation, failures, or capacity bottlenecks
Enable distributed tracing for data flows using OpenTelemetry, Jaeger, or similar technologies
Collaborate with Data Engineering and Site Reliability Engineering (SRE) teams to build scalable and fault-tolerant observability stacks
Drive observability best practices and help teams adopt instrumentation standards
Write infrastructure-as-code to deploy monitoring systems (Terraform, Helm, Kubernetes, etc.)
Required Qualifications:
Bachelor's degree in Computer Science, Data Engineering, or a related field
2+ years of experience in observability, SRE, DevOps, or DataOps
Deep knowledge of observability tools such as Grafana, Prometheus, Datadog, Splunk, New Relic, or Honeycomb
Familiarity with monitoring cloud-based data systems (AWS/GCP/Azure) and platforms such as Snowflake, BigQuery, Redshift, or Databricks
Proficiency in scripting and automation (e.g., Python, Bash)
Experience with infrastructure management and orchestration tools (Kubernetes, Terraform, Helm)
Strong analytical and debugging skills, particularly in working with telemetry data