Job Title: Network Automation Engineer, Private Cloud Datacenter
Location: Bangalore
Years of Experience: 8 to 15 Years
Prerequisite skills:
Primary: Ansible / Python
Secondary: GoLang
Mandatory: Private Cloud & Networking skills
Job Description:
We are looking for an experienced Network Automation Engineer to design, implement, and optimize automation solutions for our Private Cloud datacenter network, which underpins large-scale AI/ML workloads running on GPUs and TPUs. The role covers automating the configuration, provisioning, and monitoring of high-performance network devices, as well as OS-level network configuration on servers, to ensure low latency, high throughput, and reliability in a mission-critical environment. Expertise in Ansible and Python is essential, and experience with GoLang is a strong plus.
Key Responsibilities:
- Develop and maintain network automation frameworks for large-scale datacenter environments supporting AI/ML workloads.
- Build Ansible playbooks, roles, and modules to automate device configurations, software upgrades, and compliance checks across multi-vendor environments.
- Design and implement Python-based automation scripts and tools to integrate with APIs, orchestration platforms, and monitoring systems.
- Automate OS core networking configurations on servers (Linux / Windows / Hypervisor) including bonding, VLANs, routing tables, kernel network parameters, MTU tuning, and NIC performance optimization.
- Collaborate with cloud infrastructure, network engineering, and DevOps teams to deliver seamless provisioning and scaling of GPU/TPU clusters.
- Ensure network automation solutions meet high-performance computing (HPC) requirements such as low latency, high throughput, and fault tolerance.
- Participate in network architecture reviews to provide automation insights and recommendations.
- Document automation processes, workflows, and operational guidelines for the datacenter network.
- Stay updated on emerging technologies in network automation, SDN, and private cloud networking.
Required Skills & Experience:
- Expertise in Ansible (playbook development, dynamic inventory, custom modules) for large-scale network automation.
- Strong proficiency in Python for scripting, API integrations (REST, NETCONF, gNMI), and device interaction (e.g., NAPALM, Netmiko, Paramiko).
- Hands-on experience with high-performance datacenter networking devices (Cisco Nexus, Arista, Juniper, Mellanox/NVIDIA Networking).
- Knowledge of Linux / Windows / Hypervisor OS core networking, including:
  - Network stack configuration (sysctl tuning, TCP/UDP parameters).
  - NIC bonding, SR-IOV, DPDK, and kernel bypass techniques.
  - VLANs, routing tables, MTU adjustments, jumbo frames.
  - Performance tuning for HPC/AI workloads.
- Deep understanding of networking concepts including BGP, EVPN-VXLAN, MPLS, QoS, and leaf-spine architectures.
- Experience in Private Cloud environments with a focus on supporting HPC/AI workloads.
- Familiarity with CI/CD pipelines (GitLab, Jenkins) for deploying automation at scale.
- Knowledge of network observability and telemetry, including protocols (gRPC, sFlow, SNMP) and time-series monitoring tools (InfluxDB, Prometheus).
- Strong problem-solving skills and ability to operate in a high-availability, mission-critical datacenter environment.
Good to Have:
- GoLang experience for building scalable and high-performance automation tools.
- Familiarity with Infrastructure-as-Code (IaC) tools like Terraform or Pulumi.
- Exposure to Kubernetes networking (CNI plugins) and containerized workloads.
- Understanding of AI/ML workload characteristics and their impact on network design and performance.
- Experience with SDN solutions (e.g., Cisco ACI, VMware NSX, NVIDIA Cumulus).