Guganesan Selvaraj

Lead DevOps Engineer
Semenyih, Selangor

Summary

CloudOps, DevOps, and Site Reliability Engineering professional with over 15 years of experience in designing, building, and operating highly available and secure infrastructure across hybrid cloud and on-premises environments. Established a strong foundation in operating systems, networking, automation, troubleshooting, and managing production services as a Linux System Administrator. Progressed into CloudOps roles, implementing infrastructure as code, designing Kubernetes-based platforms, and establishing operational standards that support 24/7 high-transaction systems. Committed to cross-functional collaboration in architecture, security, governance, and delivery while producing clear documentation, including product catalogs, SOPs, roadmaps, and runbooks to drive consistent adoption. Passion for emerging technologies fuels a willingness to experiment and apply innovative ideas that deliver tangible value.

Overview

15 years of professional experience
3 Certifications
3 Languages

Work History

Lead DevOps Engineer

Ørsted Malaysia
09.2024 - Current

Designed and architected a proof-of-concept (PoC) air-gapped Kubernetes container platform for OT edge sites, based on VMware vSphere with Tanzu and aligned to the Purdue network model, enabling secure application deployment under strict IT/OT segregation.

  • Architected end-to-end OT edge container platform using Supervisor + TKG clusters for restricted, air-gapped environments.
  • Built and validated PoC covering cluster lifecycle, workload onboarding, and operational runbooks for OT deployment constraints.
  • Designed external authentication and access control using Pinniped to integrate Kubernetes access with enterprise identity.
  • Implemented Kubernetes governance and baseline security controls using Kyverno (policy-as-code) with auditable exception/break-glass approach.
  • Integrated container/Kubernetes security monitoring and runtime protection using Prisma Cloud agents.
  • Designed backup/restore approach for platform components and persistent workloads using Cohesity (where applicable).
  • Implemented observability with Prometheus/Grafana and centralized logging/audit visibility via Splunk forwarding.
  • Established offline image and artifact promotion workflow using JFrog private registry for controlled releases into OT.
  • Authored platform deliverables: product catalog, governance model (RBAC/onboarding/tenancy), architecture designs, roadmap, SOPs, and runbooks.
  • Delivered using Agile practices (sprint planning, backlog grooming, demos) across IT, OT, Network, and Security stakeholders.
  • Conducted hands-on Kubernetes/Tanzu training workshops for system administrators, network engineers, and product owners.

Senior Site Reliability Engineer

Cheetah Digital
11.2018 - 08.2024

Accountable for the availability, reliability, and performance of services across cloud and on-premises platforms, including the design and architecture of the systems and services provided by Cheetah Digital. Provided engineering, configuration, maintenance, and support for the Messaging platform in a highly transactional 24x7 environment, ensuring a great experience for customers globally.

  • Led migration of on-prem resources to AWS and improved repeatability through automation and IaC.
  • Built CI/CD pipelines using Jenkins to execute Terraform for AWS account and infrastructure provisioning.
  • Created Packer templates to automate custom OS image builds and standardize server baselines.
  • Implemented and operated multi-cluster Kubernetes environments; performed version upgrades and storage integration (EBS/EFS/Ceph Rook).
  • Provisioned and managed AWS services using Terraform (VPC, ELB, EKS, EC2, RDS, EBS, CloudFront, ES, S3, Route53, ACM, MSK, Redis).
  • Coordinated connectivity initiatives (e.g., DirectConnect) with network teams to integrate on-prem and cloud access.
  • Deployed and maintained observability tooling (Prometheus/Grafana, ELK/Elasticsearch, AWS ES) and controlled AWS access using SSO/IAM.
  • Managed on-prem platforms: Linux (VM/physical), Foreman/Puppet builds, Rancher-based Kubernetes, OpenLDAP, Phabricator Git, DNS (TinyDNS/BIND), HAProxy, and common infra services (Kafka/ZooKeeper/Redis/NFS/FTP/Mail).

Application Support Engineer – Tier 3

Cheetah Digital
08.2016 - 11.2018

Supported the CheetahMail application, Experian’s email marketing software running on Linux. CheetahMail is a data-driven email technology that enables clients to build relevant relationships and personalized communication with customers.

Responsibilities included:

  • Acted as first tier in alert response and incident resolution.
  • Acted as first tier in identifying production-related incidents and issues, including detection of system, application, or performance degradation or unavailability.
  • Tracked system and application problems through to resolution.
  • Escalated outages and other production interruptions to the appropriate individuals and groups.
  • Used production monitoring tools to keep all systems and applications running as close to 100% availability as possible.
  • Provided intermediate troubleshooting to identify the infrastructure element related to an incident or service request.
  • Communicated with internal customers and managed incidents and service requests to resolution.
  • Monitored and managed ticket queues across operational teams to ensure SLAs were met.
  • Worked with the second tier to develop Bash, Perl, and Python tools that automate support tasks.
  • Worked with the development team to scope and design products for supportability, reporting, and capacity management; supplemented QA as needed for large-scale releases.
  • Special assignment: Travelled to Experian’s office in Costa Rica to receive knowledge transfer.

Senior System Administrator - Linux

CSC Malaysia SDN BHD
10.2015 - 08.2016
  • Led Linux system administration supporting multiple enterprise accounts (Manulife, AIA, Dnata).
  • Built and maintained RHEL/SUSE servers and managed Linux VMs on VMware.
  • Performed health checks, security patching, documentation, and ITIL-aligned change/incident management.
  • Project work: Account transition (CSC India → Kuala Lumpur) and involvement in AWS migration activities.

System Administrator - Linux

IBM GDC Malaysia
02.2013 - 10.2015
  • Supported IBM Denmark Linux infrastructure (RHEL) with 24x7 on-call responsibilities.
  • Managed DNS/mail/web/FTP services, scheduling (CRON/AT), Kerberos admin, and VMware vSphere environments.
  • Tuned monitoring agents (IBM Tivoli Endpoint), performed backup/restore via IBM TSM, and wrote shell automation.
  • Delivered SOPs and supported major upgrade/migration initiatives including RHEL3/4 → RHEL5/6.
  • Knowledge transfer: Travelled to IBM Denmark; supported account transition to IBM GDC Malaysia.

Network Support Engineer

JW NETWORKS SDN BHD
06.2012 - 02.2013
  • NOC operations and backend support including LAMP stack administration and managed services platforms.
  • Performed OS/app migrations (Windows Server 2008 → CentOS Linux) and general system administration (Linux/Windows).
  • Supported solution design discussions, budgeting for equipment, and BAU infrastructure operations.
  • Special projects: Technical supervisor initiatives for Maxis (Traffic Monitoring, Remote Surveillance, Personal Security System).

Technician

Western Digital Malaysia
01.2011 - 06.2012
  • Executed firmware tests, maintained lab equipment, and supported validation activities across multiple OS environments.
  • Owned internal department networking as first-level escalation and delivered daily test status reporting.
  • Built failing scenarios for bench reproduction and provided training/presentations to technicians and other departments.
  • Special projects: PXE server setup (DOS/Windows boot), web design support for internal divisions.

Education

BSc (Hons) Computing

Segi University College
11.2015

Diploma in Information Technology - Systems administration

New Era College
Malaysia
01.2010

Foundation in Computing - IT

University Tunku Abdul Rahman (UTAR)
Malaysia
04.2007

Core Competencies

•  Linux and Operating Systems: Hands-on administration, troubleshooting, patching, and hardening across CentOS, Red Hat Enterprise Linux, SUSE, Ubuntu, and Amazon Linux.
•  Cloud Platforms: Delivery and operations across AWS, Google Cloud Platform, and DigitalOcean, supporting hybrid and on-premises integration.
•  Virtualization: VMware-based infrastructure for enterprise workloads, with Oracle VirtualBox for lab and development environments.
•  Containers and Orchestration: Docker containerization and Kubernetes platforms including upstream Kubernetes, Rancher, Amazon EKS, and VMware Tanzu.
•  Infrastructure as Code and Configuration Management: Terraform and Packer for repeatable builds and provisioning, supported by Ansible, Puppet, and Foreman for configuration management and lifecycle automation.
•  CI/CD and Automation: Jenkins pipeline design and implementation to standardize builds, deployments, and environment consistency.
•  Scripting and Operational Tooling: Bash, PHP, and Perl for automation, diagnostics, and repeatable operational workflows.
•  Databases: Operational support and administration exposure across MySQL, MariaDB, MongoDB, and PostgreSQL.
•  Monitoring and Observability: Shinken, Prometheus and Grafana for monitoring and alerting, with Splunk for centralized logging, auditing, and operational visibility.
•  Collaboration and ITSM Delivery Tools: Jira for agile delivery and tracking, Confluence for documentation, Opsgenie for incident workflows, and ServiceNow or Bugzilla for service management and ticketing.
•  Version Control: Git-based workflows using GitHub, GitLab, and Bitbucket.

Certification

Certified AWS Solution Architect – Associate (2016)

Attended Trainings

  • vSphere with Tanzu: Deploy, Configure, Manage [V8] (2024)
  • vSphere: Install, Configure, Manage [V6] (attendance certificate, 2015)
  • CCNA (attendance certificate, 2012)
  • SUSE 11 System Administration (2016)

Timeline

Lead DevOps Engineer

Ørsted Malaysia
09.2024 - Current

Senior Site Reliability Engineer

Cheetah Digital
11.2018 - 08.2024

Application Support Engineer – Tier 3

Cheetah Digital
08.2016 - 11.2018
Certified AWS Solution Architect – Associate (2016)
03-2016

Senior System Administrator - Linux

CSC Malaysia SDN BHD
10.2015 - 08.2016
ITIL V3 Foundation (2014)
06-2014

System Administrator - Linux

IBM GDC Malaysia
02.2013 - 10.2015

Network Support Engineer

JW NETWORKS SDN BHD
06.2012 - 02.2013

Technician

Western Digital Malaysia
01.2011 - 06.2012
Certified Ethical Hacker (CEH) (2010)
12-2010

Diploma in Information Technology - Systems administration

New Era College

Foundation in Computing - IT

University Tunku Abdul Rahman (UTAR)

BSc (Hons) Computing

Segi University College