Senior Linux & Cloud Infrastructure Engineer - Hybrid, Lisbon (1 day per week in office)
ABOUT THE OPPORTUNITY
Join a technology company as a Senior Linux & Cloud Infrastructure Engineer and take ownership of the operation, optimization, and automation of Linux-based systems and cloud infrastructure.
You'll be responsible for building and maintaining high-performance, secure infrastructure across Linux servers, Docker containers, Kubernetes clusters, and Azure cloud environments. With your solid expertise in Linux system administration, containerization, orchestration, and monitoring tools like Checkmk, you'll ensure infrastructure reliability, optimize system performance, and actively contribute to the modernization of the technology landscape. The role combines hands-on technical work with strategic infrastructure planning.
Your work will span the full infrastructure stack, from Linux server administration and virtualization, through container orchestration with Kubernetes and Azure Kubernetes Service (AKS), to comprehensive monitoring and automation. You'll have the opportunity to implement best practices, automate repetitive tasks using Bash and Ansible, and work with modern cloud-native technologies while maintaining critical production systems.
Critical Requirements: This is a senior-level position requiring strong knowledge of Linux distributions (Debian/Ubuntu, RHEL), solid experience with Docker and Kubernetes, hands-on experience with Azure Kubernetes Service (AKS), expertise with the Checkmk monitoring tool, and proven automation skills using Bash and/or Ansible. A B2 level of English is essential for technical communication and documentation.
PROJECT & CONTEXT
You'll be responsible for the administration, maintenance, and optimization of Linux servers running Debian/Ubuntu and RHEL-based systems, ensuring high availability, performance, and security across the infrastructure. Your core responsibilities include operating virtualization and server environments, managing system configurations, performing performance analysis and troubleshooting, and handling incident management to maintain infrastructure stability.
Containerization and orchestration are fundamental to your role - you'll create, manage, and optimize Docker containers, including writing Dockerfiles, managing container images, working with Docker registries, and deploying applications using Docker Compose. You'll deploy and operate Kubernetes clusters, including creating deployments and services, managing Helm charts for application packaging, configuring ingress controllers for traffic routing, and implementing scaling strategies for production workloads. Hands-on experience with Azure Kubernetes Service (AKS) is essential, as you'll manage AKS clusters, integrate them with Azure services, and leverage cloud-native features.
Monitoring infrastructure with Checkmk is a critical responsibility - you'll handle setup, configuration, and expansion of Checkmk monitoring across the environment including installing and configuring monitoring agents, creating monitoring rules and automations, building dashboards for visibility, and establishing alerting mechanisms. Your expertise ensures comprehensive monitoring of systems, services, and applications, enabling proactive identification and resolution of issues before they impact users.
Working with Microsoft Azure, you'll manage cloud infrastructure including networking configuration, AKS cluster management, virtual machine administration, and identity and access management. Understanding Azure services and how they integrate with on-premises infrastructure enables you to build hybrid solutions and leverage cloud capabilities effectively.
Automation drives efficiency and reliability - you'll automate repetitive tasks, deployment processes, and configuration management using Bash scripting and/or Ansible playbooks. This automation reduces manual effort, ensures consistency across environments, and enables faster recovery from incidents. Your analytical thinking and solution-oriented mindset help you identify automation opportunities and implement effective solutions.
Documentation is essential for knowledge sharing and operational continuity - you'll document system architectures, processes, procedures, and configurations, ensuring team members can understand and maintain infrastructure effectively. Your independent, structured, and reliable working style enables you to manage complex systems while maintaining clear communication with team members and stakeholders.
Core Tech Stack: Linux (Debian/Ubuntu, RHEL), Docker, Kubernetes, Azure Kubernetes Service (AKS), Checkmk monitoring, Microsoft Azure
Infrastructure Focus: Linux server administration, containerization, orchestration, cloud infrastructure, monitoring and alerting, automation
Tools: Checkmk, Docker, Kubernetes, Helm, Bash, Ansible, Azure CLI
Scale: Enterprise production infrastructure supporting critical business systems and applications
WHAT WE'RE LOOKING FOR (Required)
Linux System Administration: Strong knowledge of Linux distributions including Debian/Ubuntu and RHEL-based systems with deep understanding of Linux fundamentals, system administration, package management (DEB/RPM), and server operations
Virtualization & Server Operations: Experience operating virtualization platforms and server environments including hypervisors, virtual machine management, and physical server infrastructure
Checkmk Monitoring Expertise: Expertise with the Checkmk monitoring tool, including installation, configuration, and operations across both Raw and Enterprise editions, with the ability to create monitoring rules, configure notifications, build dashboards, and manage monitoring infrastructure
Monitoring Operations: Experience monitoring systems, services, and applications including setting up alerting mechanisms, analyzing metrics, and maintaining comprehensive visibility across infrastructure
Docker Experience: Solid hands-on experience with Docker including creating and managing container images, writing Dockerfiles, working with Docker Compose for multi-container applications, and managing container registries
Kubernetes Knowledge: Working knowledge of Kubernetes including creating deployments and services, managing Helm charts for application packaging, configuring ingress controllers, implementing scaling strategies, and understanding cluster operations
Azure Kubernetes Service: Hands-on experience with Azure Kubernetes Service (AKS) including deploying and managing AKS clusters, integrating with Azure services, and leveraging AKS-specific features
Microsoft Azure Basics: Basic knowledge of Microsoft Azure including networking concepts and configuration, AKS management, virtual machine administration, and identity and access management
Automation Skills: Proficiency in automation using Bash scripting and/or Ansible for configuration management, deployment automation, and task orchestration
Performance Analysis: Ability to perform performance analysis, identify bottlenecks, optimize system resources, and ensure infrastructure operates efficiently
Troubleshooting: Strong troubleshooting skills for diagnosing and resolving complex infrastructure issues across Linux, containers, Kubernetes, and cloud environments
Incident Management: Experience with incident management including responding to alerts, investigating root causes, implementing fixes, and documenting resolutions
System Documentation: Ability to create and maintain clear documentation of system architectures, processes, procedures, and configurations
Analytical Thinking: Analytical thinking and solution-oriented mindset for approaching infrastructure challenges systematically
Independent Working Style: Independent, structured, and reliable working style with the ability to manage tasks and priorities effectively
Communication & Teamwork: Strong communication and teamwork skills for collaborating with technical teams and stakeholders
English Proficiency: B2 level (Upper Intermediate) or higher in English for technical documentation, communication, and collaboration
Work Authorization: Eligibility to work from Portugal and availability for a hybrid work model (1 day per week in the Lisbon office)
NICE TO HAVE (Preferred)
Extended Monitoring Tools: Experience with additional monitoring tools including Prometheus for metrics collection and Grafana for visualization and dashboarding
IT Security & Compliance: Knowledge in IT security best practices, hardening procedures, security monitoring, and compliance requirements
Python Scripting: Proficiency in Python for automation, scripting, and tool development beyond Bash
CI/CD Pipelines: Experience with CI/CD tools and practices including Jenkins, GitLab CI, Azure DevOps, or similar platforms
Infrastructure as Code: Advanced infrastructure as code experience with Terraform, Pulumi, or ARM templates for Azure resource provisioning
Additional Cloud Platforms: Experience with other cloud platforms like AWS or GCP beyond Microsoft Azure
Kubernetes Advanced: Deep Kubernetes expertise including operators, custom resource definitions (CRDs), StatefulSets, advanced networking, and security policies
Container Security: Knowledge of container security best practices, image scanning, vulnerability management, and secure container configurations
Service Mesh: Experience with service mesh technologies like Istio or Linkerd for microservices communication
Observability: Advanced observability practices including distributed tracing, log aggregation (ELK Stack, Loki), and metrics correlation
Configuration Management: Experience with additional configuration management tools like Puppet, Chef, or SaltStack
Load Balancing: Knowledge of load balancing solutions including HAProxy, NGINX, or cloud-native load balancers
Database Administration: Experience with database administration for PostgreSQL, MySQL, MongoDB, or other databases in Linux environments
Backup & Recovery: Expertise in backup and disaster recovery solutions, implementing backup strategies, and testing recovery procedures
Networking Advanced: Deep networking knowledge including TCP/IP, routing, firewalls, VPNs, and network troubleshooting
Storage Systems: Experience with storage systems including SAN, NAS, distributed storage (Ceph, GlusterFS), and cloud storage
High Availability: Experience designing and implementing high availability architectures including clustering, failover mechanisms, and redundancy
Disaster Recovery: Experience planning and implementing disaster recovery strategies and business continuity solutions
Performance Tuning: Advanced performance tuning skills for Linux systems, applications, and databases
Scripting Languages: Additional scripting capabilities in languages like Go, Ruby, or PowerShell
GitOps Practices: Experience with GitOps workflows using tools like ArgoCD or Flux for Kubernetes deployments
Logging Solutions: Hands-on experience with centralized logging solutions like ELK Stack (Elasticsearch, Logstash, Kibana) or Loki
LDAP/Active Directory: Experience integrating Linux systems with LDAP or Active Directory for authentication and authorization
DevOps Culture: Understanding of DevOps principles and practices including collaboration, continuous improvement, and an automation-first mindset
Capacity Planning: Skills in capacity planning, resource forecasting, and infrastructure scaling strategies
Cost Optimization: Experience optimizing cloud costs through resource rightsizing, reserved instances, and efficient architecture
Agile Methodologies: Familiarity with Agile practices and working in Agile teams with sprints and iterative delivery
On-Call Experience: Experience with on-call rotations, incident response, and maintaining SLAs for production systems
Vendor Management: Experience working with technology vendors, managing support contracts, and coordinating external resources
Location: Lisbon, Portugal (Hybrid - 1 day per week in office)