Network Automation Engineer Private Cloud Datacenter

Location: Bangalore

Years of Experience: 8 to 15 Years

Prerequisite skills: 

Primary: Ansible / Python

Secondary: GoLang

Mandatory: Private Cloud & Networking skills

Job Description:

We are looking for an experienced Network Automation Engineer to design, implement, and optimize automation solutions for our Private Cloud datacenter network, which underpins large-scale AI/ML GPU and TPU workloads. The role covers automating the configuration, provisioning, and monitoring of high-performance networking devices, as well as OS-level network configuration on servers, to ensure low latency, high throughput, and reliability in a mission-critical environment. Expertise in Ansible and Python is essential; experience with GoLang is a strong plus.

Key Responsibilities:

  • Develop and maintain network automation frameworks for large-scale datacenter environments supporting AI/ML workloads.
  • Build Ansible playbooks, roles, and modules to automate device configurations, software upgrades, and compliance checks across multi-vendor environments.
  • Design and implement Python-based automation scripts and tools to integrate with APIs, orchestration platforms, and monitoring systems.
  • Automate OS core networking configurations on servers (Linux / Windows / Hypervisor) including bonding, VLANs, routing tables, kernel network parameters, MTU tuning, and NIC performance optimization.
  • Collaborate with cloud infrastructure, network engineering, and DevOps teams to deliver seamless provisioning and scaling of GPU/TPU clusters.
  • Ensure network automation solutions meet high-performance computing (HPC) requirements such as low latency, high throughput, and fault tolerance.
  • Participate in network architecture reviews to provide automation insights and recommendations.
  • Document automation processes, workflows, and operational guidelines for the datacenter network.
  • Stay updated on emerging technologies in network automation, SDN, and private cloud networking.

Required Skills & Experience:

  • Expertise in Ansible (playbook development, dynamic inventory, custom modules) for large-scale network automation.
  • Strong proficiency in Python for scripting, API integrations (REST, NETCONF, gNMI), and device interaction (e.g., NAPALM, Netmiko, Paramiko).
  • Hands-on experience with high-performance datacenter networking devices (Cisco Nexus, Arista, Juniper, Mellanox/NVIDIA Networking).
  • Knowledge of Linux / Windows / Hypervisor OS core networking, including:
    • Network stack configuration (sysctl tuning, TCP/UDP parameters).
    • NIC bonding, SR-IOV, DPDK, and other kernel bypass techniques.
    • VLANs, routing tables, MTU adjustments, jumbo frames.
    • Performance tuning for HPC/AI workloads.
  • Deep understanding of networking concepts including BGP, EVPN-VXLAN, MPLS, QoS, and leaf-spine architectures.
  • Experience in Private Cloud environments with a focus on supporting HPC/AI workloads.
  • Familiarity with CI/CD pipelines (GitLab, Jenkins) for deploying automation at scale.
  • Knowledge of network observability and telemetry, including protocols (gRPC, sFlow, SNMP) and time-series monitoring platforms (InfluxDB, Prometheus).
  • Strong problem-solving skills and ability to operate in a high-availability, mission-critical datacenter environment.

Good to Have:

  • GoLang experience for building scalable and high-performance automation tools.
  • Familiarity with Infrastructure-as-Code (IaC) tools like Terraform or Pulumi.
  • Exposure to Kubernetes networking (CNI plugins) and containerized workloads.
  • Understanding of AI/ML workload characteristics and their impact on network design and performance.
  • Experience with SDN solutions (e.g., Cisco ACI, VMware NSX, NVIDIA Cumulus).