Senior Linux & Cloud Infrastructure Engineer - Hybrid Lisbon (1 day office)

Lisbon, Portugal

ABOUT THE OPPORTUNITY

Join a technology company as a Senior Linux & Cloud Infrastructure Engineer and take ownership of the operation, optimization, and automation of Linux-based systems and cloud infrastructure.

You'll be responsible for building and maintaining high-performance, secure infrastructure across Linux servers, Docker containers, Kubernetes clusters, and Azure cloud environments. With your solid expertise in Linux system administration, containerization, orchestration, and monitoring tools like Checkmk, you'll ensure infrastructure reliability, optimize system performance, and actively contribute to the modernization of the technology landscape. This role combines hands-on technical work with strategic infrastructure planning.

Your work will span the full infrastructure stack, from Linux server administration and virtualization, through container orchestration with Kubernetes and AKS, to comprehensive monitoring and automation. You'll have the opportunity to implement best practices, automate repetitive tasks using Bash and Ansible, and work with modern cloud-native technologies while maintaining critical production systems.

Critical Requirements: This is a senior-level position requiring strong knowledge of Linux distributions (Debian/Ubuntu, RHEL), solid experience with Docker and Kubernetes, hands-on experience with Azure Kubernetes Service (AKS), expertise with the Checkmk monitoring tool, and proven automation skills using Bash and/or Ansible. B2-level English is essential for technical communication and documentation.

PROJECT & CONTEXT

You'll be responsible for the administration, maintenance, and optimization of Linux servers running Debian/Ubuntu and RHEL-based systems, ensuring high availability, performance, and security across the infrastructure. Your core responsibilities include operating virtualization and server environments, managing system configurations, performing performance analysis and troubleshooting, and handling incident management to maintain infrastructure stability.

Containerization and orchestration are fundamental to your role - you'll create, manage, and optimize Docker containers, including writing Dockerfiles, managing container images, working with Docker registries, and deploying applications using Docker Compose. You'll deploy and operate Kubernetes clusters, including creating deployments and services, managing Helm charts for application packaging, configuring ingress controllers for traffic routing, and implementing scaling strategies for production workloads. Hands-on experience with Azure Kubernetes Service (AKS) is essential, as you'll manage AKS clusters, integrate with Azure services, and leverage cloud-native features.

Monitoring infrastructure with Checkmk is a critical responsibility - you'll handle setup, configuration, and expansion of Checkmk monitoring across the environment including installing and configuring monitoring agents, creating monitoring rules and automations, building dashboards for visibility, and establishing alerting mechanisms. Your expertise ensures comprehensive monitoring of systems, services, and applications, enabling proactive identification and resolution of issues before they impact users.

Working with Microsoft Azure, you'll manage cloud infrastructure including networking configuration, AKS cluster management, virtual machine administration, and identity and access management. Understanding Azure services and how they integrate with on-premises infrastructure enables you to build hybrid solutions and leverage cloud capabilities effectively.

Automation drives efficiency and reliability - you'll automate repetitive tasks, deployment processes, and configuration management using Bash scripting and/or Ansible playbooks. Creating automation reduces manual effort, ensures consistency across environments, and enables faster recovery from incidents. Your analytical thinking and solution-oriented mindset help you identify automation opportunities and implement effective solutions.

Documentation is essential for knowledge sharing and operational continuity - you'll document system architectures, processes, procedures, and configurations, ensuring team members can understand and maintain infrastructure effectively. Your independent, structured, and reliable working style enables you to manage complex systems while maintaining clear communication with team members and stakeholders.

Core Tech Stack: Linux (Debian/Ubuntu, RHEL), Docker, Kubernetes, Azure Kubernetes Service (AKS), Checkmk monitoring, Microsoft Azure

Infrastructure Focus: Linux server administration, containerization, orchestration, cloud infrastructure, monitoring and alerting, automation

Tools: Checkmk, Docker, Kubernetes, Helm, Bash, Ansible, Azure CLI

Scale: Enterprise production infrastructure supporting critical business systems and applications

WHAT WE'RE LOOKING FOR (Required)

Linux System Administration: Strong knowledge of Linux distributions including Debian/Ubuntu and RHEL-based systems with deep understanding of Linux fundamentals, system administration, package management (DEB/RPM), and server operations

Virtualization & Server Operations: Experience operating virtualization platforms and server environments including hypervisors, virtual machine management, and physical server infrastructure

Checkmk Monitoring Expertise: Expertise with the Checkmk monitoring tool, including installation, configuration, and operations across both the Raw and Enterprise editions, plus the ability to create monitoring rules, configure notifications, build dashboards, and manage monitoring infrastructure

Monitoring Operations: Experience monitoring systems, services, and applications including setting up alerting mechanisms, analyzing metrics, and maintaining comprehensive visibility across infrastructure

Docker Experience: Solid hands-on experience with Docker, including creating and managing container images, writing Dockerfiles, working with Docker Compose for multi-container applications, and managing container registries

Kubernetes Knowledge: Working knowledge of Kubernetes including creating deployments and services, managing Helm charts for application packaging, configuring ingress controllers, implementing scaling strategies, and understanding cluster operations

Azure Kubernetes Service: Hands-on experience with Azure Kubernetes Service (AKS) including deploying and managing AKS clusters, integrating with Azure services, and leveraging AKS-specific features

Microsoft Azure Basics: Basic knowledge of Microsoft Azure including networking concepts and configuration, AKS management, virtual machine administration, and identity and access management

Automation Skills: Proficiency in automation using Bash scripting and/or Ansible for configuration management, deployment automation, and task orchestration

Performance Analysis: Ability to perform performance analysis, identify bottlenecks, optimize system resources, and ensure infrastructure operates efficiently

Troubleshooting: Strong troubleshooting skills for diagnosing and resolving complex infrastructure issues across Linux, containers, Kubernetes, and cloud environments

Incident Management: Experience with incident management including responding to alerts, investigating root causes, implementing fixes, and documenting resolutions

System Documentation: Ability to create and maintain clear documentation of system architectures, processes, procedures, and configurations

Analytical Thinking: Analytical thinking and solution-oriented mindset for approaching infrastructure challenges systematically

Independent Working Style: Independent, structured, and reliable working style with the ability to manage tasks and priorities effectively

Communication & Teamwork: Strong communication and teamwork skills for collaborating with technical teams and stakeholders

English Proficiency: B2 level (Upper Intermediate) or higher in English for technical documentation, communication, and collaboration

Work Authorization: Eligibility to work from Portugal and availability for a hybrid work model (1 day per week in the Lisbon office)

NICE TO HAVE (Preferred)

Extended Monitoring Tools: Experience with additional monitoring tools including Prometheus for metrics collection and Grafana for visualization and dashboarding

IT Security & Compliance: Knowledge of IT security best practices, hardening procedures, security monitoring, and compliance requirements

Python Scripting: Proficiency in Python for automation, scripting, and tool development beyond Bash

CI/CD Pipelines: Experience with CI/CD tools and practices including Jenkins, GitLab CI, Azure DevOps, or similar platforms

Infrastructure as Code: Advanced infrastructure as code experience with Terraform, Pulumi, or ARM templates for Azure resource provisioning

Additional Cloud Platforms: Experience with other cloud platforms like AWS or GCP beyond Microsoft Azure

Kubernetes Advanced: Deep Kubernetes expertise including operators, custom resource definitions (CRDs), StatefulSets, advanced networking, and security policies

Container Security: Knowledge of container security best practices, image scanning, vulnerability management, and secure container configurations

Service Mesh: Experience with service mesh technologies like Istio or Linkerd for microservices communication

Observability: Advanced observability practices including distributed tracing, logging aggregation (ELK Stack, Loki), and metrics correlation

Configuration Management: Experience with additional configuration management tools like Puppet, Chef, or SaltStack

Load Balancing: Knowledge of load balancing solutions including HAProxy, NGINX, or cloud-native load balancers

Database Administration: Experience with database administration for PostgreSQL, MySQL, MongoDB, or other databases in Linux environments

Backup & Recovery: Expertise in backup and disaster recovery solutions, implementing backup strategies, and testing recovery procedures

Networking Advanced: Deep networking knowledge including TCP/IP, routing, firewalls, VPNs, and network troubleshooting

Storage Systems: Experience with storage systems including SAN, NAS, distributed storage (Ceph, GlusterFS), and cloud storage

High Availability: Experience designing and implementing high availability architectures including clustering, failover mechanisms, and redundancy

Disaster Recovery: Experience planning and implementing disaster recovery strategies and business continuity solutions

Performance Tuning: Advanced performance tuning skills for Linux systems, applications, and databases

Scripting Languages: Additional scripting capabilities in languages like Go, Ruby, or PowerShell

GitOps Practices: Experience with GitOps workflows using tools like ArgoCD or Flux for Kubernetes deployments

Logging Solutions: Hands-on experience with centralized logging solutions like ELK Stack (Elasticsearch, Logstash, Kibana) or Loki

LDAP/Active Directory: Experience integrating Linux systems with LDAP or Active Directory for authentication and authorization

DevOps Culture: Understanding of DevOps principles and practices including collaboration, continuous improvement, and an automation-first mindset

Capacity Planning: Skills in capacity planning, resource forecasting, and infrastructure scaling strategies

Cost Optimization: Experience optimizing cloud costs through resource rightsizing, reserved instances, and efficient architecture

Agile Methodologies: Familiarity with Agile practices and working in Agile teams with sprints and iterative delivery

On-Call Experience: Experience with on-call rotations, incident response, and maintaining SLAs for production systems

Vendor Management: Experience working with technology vendors, managing support contracts, and coordinating external resources

Location: Lisbon, Portugal (Hybrid - 1 day per week in office)
