Building a Production-Ready Movie Ticketing Platform: A Complete DevOps Journey
From Manual Deployments to Automated CI/CD: How I Built and Deployed BookMyShow Clone on AWS EKS with Zero Downtime
Introduction
Hey there!
Have you ever wondered what it takes to deploy a real-world application with enterprise-grade DevOps practices? Not just a simple "hello world" container, but a complete, production-ready system with automated testing, security scanning, auto-scaling, and real-time monitoring?
That's exactly what I set out to build, and in this comprehensive guide, I'll take you through my journey of deploying a BookMyShow clone - a movie ticketing platform - using modern DevOps tools and practices.
What You'll Learn
By the end of this post, you'll understand:
How to build a complete CI/CD pipeline from scratch
Deploying applications to AWS EKS (Kubernetes)
Implementing automated security scanning and code quality checks
Setting up real-time monitoring with Prometheus and Grafana
Achieving zero-downtime deployments with Kubernetes
Real-world challenges and solutions from production deployment
Why This Project?
In today's cloud-native world, knowing how to deploy applications manually isn't enough. Companies need:
Fast deployments (multiple times per day)
Reliable systems (99.9%+ uptime)
Secure infrastructure (automated vulnerability scanning)
Observable applications (real-time metrics and alerts)
This project demonstrates all of these in action.
The Challenge
The Problem Statement
Imagine you're tasked with modernizing a legacy movie ticketing application. The current state:
Before DevOps:
Manual deployments taking 2-3 hours
High failure rate (~30-40% failed deployments)
No visibility into application health
Security vulnerabilities discovered only after incidents
Deployments only once a week (fear of breaking things)
No auto-scaling (server crashes during peak times)
Business Impact:
Lost revenue during downtime
Slow feature delivery
Security risks
Poor user experience during peak loads
The Goal
Transform this into a modern, cloud-native application with:
12-minute deployments (instead of 2+ hours)
95%+ success rate in deployments
Real-time monitoring and alerting
Automated security scanning
Multiple deployments per day
Auto-scaling based on load
Solution Overview
The Technology Stack
I chose a modern, battle-tested stack:
| Component | Technology | Why? |
|---|---|---|
| Source Control | GitHub | Industry standard, great integration |
| CI/CD | Jenkins | Flexible, plugin ecosystem |
| Code Quality | SonarQube | Comprehensive code analysis |
| Security | Trivy | Fast, accurate vulnerability scanning |
| Containers | Docker | Standard containerization |
| Orchestration | Kubernetes (EKS) | Production-grade, AWS managed |
| Cloud | AWS | Reliable, scalable infrastructure |
| Monitoring | Prometheus + Grafana | Open-source, powerful metrics |
High-Level Architecture
+---------------------------------------------------+
|                DEVELOPER WORKFLOW                 |
|                                                   |
|    Developer -> Git Push -> GitHub -> Webhook     |
+---------------------------------------------------+
                          |
                          v
+---------------------------------------------------+
|              JENKINS CI/CD PIPELINE               |
|                                                   |
|  Checkout -> SonarQube -> Build -> Test -> Trivy  |
|                         |                         |
|                         v                         |
|  Docker Build -> Push -> Deploy to EKS -> Notify  |
+---------------------------------------------------+
                          |
                          v
+---------------------------------------------------+
|                  AWS EKS CLUSTER                  |
|                                                   |
|    +----------+   +----------+   +----------+     |
|    |  Pod 1   |   |  Pod 2   |   |  Pod 3   |     |
|    | Node.js  |   | Node.js  |   | Node.js  |     |
|    +----------+   +----------+   +----------+     |
|         |              |              |           |
|    +----------------------------------------+     |
|    |         LoadBalancer (AWS ELB)         |     |
|    +----------------------------------------+     |
+---------------------------------------------------+
                          |
                     [End Users]
                          |
                    Monitor with
+---------------------------------------------------+
|               PROMETHEUS + GRAFANA                |
|                                                   |
|     Real-time Metrics -> Alerts -> Dashboards     |
+---------------------------------------------------+
The Game Plan
The implementation was divided into 4 phases:
Phase 1: Infrastructure Setup (AWS, EKS, Servers)
Phase 2: CI/CD Pipeline (Jenkins, SonarQube, Docker)
Phase 3: Kubernetes Deployment (EKS, LoadBalancer)
Phase 4: Monitoring (Prometheus, Grafana)
Let's dive into each phase!
Phase 1: Infrastructure Setup
Setting Up AWS Infrastructure
First Decision: Which AWS Services?
After evaluating options, I chose:
Amazon EKS for Kubernetes (managed control plane = less operational overhead)
EC2 t3.medium for worker nodes (cost-effective for our workload)
Classic Load Balancer for traffic distribution
EBS gp3 volumes for storage (better price/performance than gp2)
Infrastructure Components
1. BMS Server (Development Server)
Instance: EC2 t2.large
RAM: 8 GB
vCPU: 2
Storage: 28 GB
Purpose: Jenkins, Docker, Build tools
This server runs:
Jenkins CI/CD server
Docker for building images
AWS CLI, kubectl, eksctl
Development tools
2. EKS Cluster
Control Plane: Managed by AWS
Worker Nodes: 3x t3.medium
Kubernetes Version: 1.30
Region: us-east-1
Availability Zones: us-east-1a, us-east-1b
3. Monitoring Server
Instance: EC2 t2.medium
RAM: 4 GB
vCPU: 2
Storage: 20 GB
Purpose: Prometheus, Grafana, Node Exporter
Step-by-Step Infrastructure Creation
1. IAM User Setup
Why not use root account? Security best practice. If credentials are compromised, impact is limited.
Created IAM user eks-admin with these policies:
AmazonEC2FullAccess
AmazonEKSClusterPolicy
AmazonEKSWorkerNodePolicy
IAMFullAccess
Custom EKS policy for cluster operations:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "eks:*",
    "Resource": "*"
  }]
}
Generated Access Keys and stored them securely (never commit to Git!).
2. Installing Essential Tools
Connected to BMS Server and installed:
AWS CLI:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws configure # Entered access keys here
kubectl (Kubernetes CLI):
# Download a kubectl that matches the cluster's 1.30 minor version
curl -LO "https://dl.k8s.io/release/v1.30.0/bin/linux/amd64/kubectl"
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin
kubectl version --client
eksctl (EKS Management Tool):
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version
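kubectl is only supported within one minor version of the cluster's control plane, so it is worth checking the client against the cluster before going further. The sketch below is one way to do that check in plain shell; `kubectl_skew_ok` is a hypothetical helper name, not part of kubectl or eksctl. For example, a 1.19 client is well outside the range for a 1.30 cluster.

```shell
# Hypothetical helper: succeed only when the kubectl client is within one
# minor release of the cluster. Versions look like 1.30, 1.30.2, or v1.30.2.
kubectl_skew_ok() {
  local client server client_major client_minor server_major server_minor rest diff
  client="${1#v}"
  server="${2#v}"
  client_major="${client%%.*}"
  rest="${client#*.}"
  client_minor="${rest%%.*}"
  server_major="${server%%.*}"
  rest="${server#*.}"
  server_minor="${rest%%.*}"
  # Different major versions are never compatible.
  [ "$client_major" = "$server_major" ] || return 1
  diff=$(( client_minor - server_minor ))
  # ${diff#-} strips a leading minus sign, giving the absolute difference.
  [ "${diff#-}" -le 1 ]
}

# A 1.19.6 client checked against a 1.30 cluster.
if kubectl_skew_ok "1.19.6" "1.30"; then
  echo "kubectl is within the supported skew"
else
  echo "kubectl is too far from the cluster version"
fi
```

In practice the two arguments would come from `kubectl version --client` and the `--version` value passed to `eksctl create cluster`.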
3. Creating EKS Cluster
This was the most exciting part! Creating a production Kubernetes cluster with just three eksctl commands:
Step A: Create Control Plane (10 minutes)
eksctl create cluster \
--name=bms-eks \
--region=us-east-1 \
--zones=us-east-1a,us-east-1b \
--version=1.30 \
--without-nodegroup
While waiting, I grabbed coffee and watched AWS create:
VPC with public/private subnets
Internet Gateway
Route tables
Security groups
EKS control plane
Step B: Associate OIDC Provider (2 minutes)
eksctl utils associate-iam-oidc-provider \
--region us-east-1 \
--cluster bms-eks \
--approve
Why OIDC? This enables IAM Roles for Service Accounts (IRSA). It allows Kubernetes pods to securely access AWS services without storing credentials. Game-changer for security!
Step C: Create Worker Nodes (10 minutes)
eksctl create nodegroup \
--cluster=bms-eks \
--region=us-east-1 \
--name=worker-nodes \
--node-type=t3.medium \
--nodes=3 \
--nodes-min=2 \
--nodes-max=4 \
--node-volume-size=20 \
--ssh-access \
--ssh-public-key=YourKeyName \
--managed \
--asg-access \
--external-dns-access \
--full-ecr-access \
--appmesh-access \
--alb-ingress-access
Verification:
kubectl get nodes
NAME STATUS ROLES AGE
ip-192-168-20-52.ec2.internal Ready <none> 5m
ip-192-168-34-87.ec2.internal Ready <none> 5m
ip-192-168-35-132.ec2.internal Ready <none> 5m
Success! Three healthy nodes running.
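Eyeballing `kubectl get nodes` works for a one-off check, but a script can make the same verification repeatable. This is a minimal sketch, assuming the standard column layout where STATUS is the second field; `count_ready_nodes` and `require_ready_nodes` are hypothetical helper names.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin. Cordoned nodes
# (Ready,SchedulingDisabled) are deliberately not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { ready++ } END { print ready + 0 }'
}

# Hypothetical gate: fail unless at least $1 nodes are Ready.
require_ready_nodes() {
  local expected="$1" ready
  ready="$(kubectl get nodes --no-headers | count_ready_nodes)"
  echo "Ready nodes: ${ready}/${expected}"
  [ "$ready" -ge "$expected" ]
}
```

Calling `require_ready_nodes 3` after the nodegroup is created would fail fast if any of the three workers has not joined the cluster.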
Security Configuration
Configured security groups to allow:
Port 22: SSH access
Port 80/443: HTTP/HTTPS traffic
Port 3000-10000: Application ports
Port 8080: Jenkins
Port 9000: SonarQube
Port 9090: Prometheus
Port 3000: Grafana
Pro Tip: In production, restrict SSH to specific IPs only!
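Restricting SSH can be done with the same `aws ec2 authorize-security-group-ingress` command used elsewhere in this post, just with a single-host CIDR instead of 0.0.0.0/0. The sketch below assumes a placeholder security group ID and admin IP; `to_host_cidr` and `allow_ssh_from` are hypothetical helper names.

```shell
# Hypothetical helper: turn a bare IPv4 address into a single-host CIDR.
to_host_cidr() {
  case "$1" in
    */*) echo "$1" ;;      # already a CIDR, leave it untouched
    *)   echo "$1/32" ;;   # bare address, restrict to that one host
  esac
}

# Allow SSH (port 22) from one address only.
# $1 = security group ID, $2 = admin IP or CIDR (both placeholders).
allow_ssh_from() {
  local group_id="$1" cidr
  cidr="$(to_host_cidr "$2")"
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 22 \
    --cidr "$cidr" \
    --region us-east-1
}
```

For example, `allow_ssh_from sg-xxxxx 203.0.113.7` would open port 22 to 203.0.113.7/32 only. Any existing 0.0.0.0/0 rule for port 22 would still need to be removed separately.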
π Phase 2: CI/CD Pipeline
The Jenkins Setup
Installing Jenkins
Created an installation script:
#!/bin/bash
# Install Java 17 (Jenkins requirement)
sudo apt-get update
sudo apt-get install -y openjdk-17-jre-headless
# Add Jenkins repository
sudo wget -O /usr/share/keyrings/jenkins-keyring.asc \
https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
echo deb [signed-by=/usr/share/keyrings/jenkins-keyring.asc] \
https://pkg.jenkins.io/debian-stable binary/ | sudo tee \
/etc/apt/sources.list.d/jenkins.list > /dev/null
# Install Jenkins
sudo apt-get update
sudo apt-get install jenkins -y
Executed: chmod +x jenkins.sh && ./jenkins.sh
Accessed Jenkins:
URL: http://server-ip:8080
Retrieved initial password: sudo cat /var/lib/jenkins/secrets/initialAdminPassword
Installed recommended plugins
Created admin user
Installing Docker
Jenkins needs Docker to build container images:
#!/bin/bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl
# Add Docker GPG key and repository
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
-o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
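Fix permissions. Adding users to the docker group is safer than chmod 666 on the socket, which lets every local user control Docker:
sudo usermod -aG docker jenkins
sudo usermod -aG docker $USER
newgrp docker # Or log out and back in to pick up the new group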
docker --version # Verify
docker login -u your-dockerhub-username # Login to Docker Hub
Security Scanning with Trivy
Trivy scans Docker images for vulnerabilities:
#!/bin/bash
sudo apt-get install wget apt-transport-https gnupg -y
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | \
gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] \
https://aquasecurity.github.io/trivy-repo/deb generic main" | \
sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy -y
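By default Trivy only reports findings, so a scan on its own will not stop a build. One way to turn it into a gate is Trivy's `--severity` and `--exit-code` flags, sketched below. The function name `trivy_fs_gate` and the HIGH,CRITICAL threshold are my assumptions for illustration, not settings taken from this project.

```shell
# Hypothetical gate: fail when Trivy reports HIGH or CRITICAL findings.
# $1 = directory to scan, $2 = report file (both default to placeholders).
trivy_fs_gate() {
  local target="${1:-.}" report="${2:-trivyfs.txt}"
  # --exit-code 1 makes trivy return non-zero when matching findings exist.
  if trivy fs --severity HIGH,CRITICAL --exit-code 1 "$target" > "$report"; then
    echo "No HIGH/CRITICAL findings in $target"
  else
    echo "Trivy reported HIGH/CRITICAL findings (or failed to run); see $report" >&2
    return 1
  fi
}
```

Calling `trivy_fs_gate . trivyfs.txt` from a pipeline step would still produce the report file, but would also fail the step when serious findings are present.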
Code Quality with SonarQube
Deployed SonarQube as a Docker container:
docker run -d --name sonar -p 9000:9000 sonarqube:lts-community
Setup:
Accessed http://server-ip:9000
Default login: admin/admin
Changed password
Generated authentication token for Jenkins
Jenkins Configuration
Essential Plugins Installed
Navigated to: Manage Jenkins → Plugins → Available
Installed:
Eclipse Temurin Installer (Java)
SonarQube Scanner
NodeJS
Docker (multiple plugins)
Kubernetes (CLI, Client API, Credentials)
Pipeline Stage View
Email Extension Template
Prometheus Metrics
Tool Configuration
Java JDK:
Name: jdk17
Version: JDK 17.0.x
Install automatically: yes
Node.js:
Name: node23
Version: NodeJS 23.x
Install automatically: yes
SonarQube Scanner:
Name: sonar-scanner
Install automatically: yes
Docker:
Name: docker
Install automatically: yes
SonarQube Integration
In SonarQube:
Generated token: Administration → Security → Users → Tokens
Copied token (looks like: squ_xxxxxxxxxxxxx)
In Jenkins:
Manage Jenkins → System → SonarQube Servers
Added server:
Name: sonar-server
URL: http://server-ip:9000
Authentication token: Created credential with token
Email Notifications Setup
This was crucial for team notifications!
Generated Gmail App Password:
Google Account → Security → 2-Step Verification
App Passwords → Create for Jenkins
Copied 16-character password
Configured in Jenkins:
Extended Email:
SMTP Server: smtp.gmail.com
Port: 465
Credentials: Username + App Password
Use SSL: yes
Content Type: HTML
Standard Email:
Same SMTP settings
Added default recipients
Configured triggers: Always, Failure, Success
Tested by sending a test email. It worked perfectly!
The Complete Jenkins Pipeline
Here's the heart of the automation - the Jenkinsfile:
pipeline {
    agent any
    tools {
        jdk 'jdk17'
        nodejs 'node23'
    }
    environment {
        SCANNER_HOME = tool 'sonar-scanner'
        DOCKER_IMAGE = 'your-dockerhub-username/bms:latest'
        EKS_CLUSTER_NAME = 'bms-eks'   // Must match the name passed to eksctl
        AWS_REGION = 'us-east-1'
    }
    stages {
        stage('Clean Workspace') {
            steps {
                cleanWs()
                echo 'Workspace cleaned'
            }
        }
        stage('Checkout from Git') {
            steps {
                git branch: 'main',
                    url: 'https://github.com/your-repo/Book-My-Show.git'
                sh 'ls -la'
                echo 'Code checked out successfully'
            }
        }
        stage('SonarQube Analysis') {
            steps {
                withSonarQubeEnv('sonar-server') {
                    sh '''
                        echo "Starting code analysis..."
                        $SCANNER_HOME/bin/sonar-scanner \
                            -Dsonar.projectName=BMS \
                            -Dsonar.projectKey=BMS
                    '''
                }
                echo 'SonarQube analysis completed'
            }
        }
        stage('Quality Gate') {
            steps {
                script {
                    echo 'Waiting for quality gate...'
                    // abortPipeline: false records the result without failing the build
                    waitForQualityGate abortPipeline: false,
                        credentialsId: 'Sonar-token'
                }
                echo 'Quality gate check completed'
            }
        }
        stage('Install Dependencies') {
            steps {
                sh '''
                    cd bookmyshow-app
                    echo "Installing npm packages..."
                    if [ -f package.json ]; then
                        rm -rf node_modules package-lock.json
                        npm install
                        echo "Dependencies installed"
                    else
                        echo "Error: package.json not found!"
                        exit 1
                    fi
                '''
            }
        }
        stage('Trivy FS Scan') {
            steps {
                sh '''
                    echo "Running Trivy security scan..."
                    trivy fs . > trivyfs.txt
                    echo "Security scan completed"
                '''
            }
        }
        stage('Docker Build & Push') {
            steps {
                script {
                    withDockerRegistry(credentialsId: 'docker', toolName: 'docker') {
                        sh '''
                            echo "Building Docker image..."
                            docker build --no-cache -t $DOCKER_IMAGE \
                                -f bookmyshow-app/Dockerfile bookmyshow-app
                            echo "Pushing to Docker Hub..."
                            docker push $DOCKER_IMAGE
                            echo "Image pushed successfully"
                        '''
                    }
                }
            }
        }
        stage('Deploy to EKS Cluster') {
            steps {
                script {
                    sh '''
                        echo "Deploying to Kubernetes..."
                        # Configure kubectl
                        aws eks update-kubeconfig \
                            --name $EKS_CLUSTER_NAME \
                            --region $AWS_REGION
                        # Apply configurations
                        kubectl apply -f deployment.yml
                        kubectl apply -f service.yml
                        echo "Waiting for rollout..."
                        kubectl rollout status deployment/bookmyshow-deployment
                        echo "Deployment successful!"
                        # Show status
                        echo ""
                        echo "Current Status:"
                        kubectl get pods
                        kubectl get svc
                    '''
                }
            }
        }
    }
    post {
        always {
            // currentResult is never null, unlike result while the build is still running
            emailext (
                attachLog: true,
                subject: "${currentBuild.currentResult}: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
                body: """
                    <html>
                    <body style="font-family: Arial, sans-serif;">
                    <h2 style="color: ${currentBuild.currentResult == 'SUCCESS' ? 'green' : 'red'};">
                    Build ${currentBuild.currentResult}
                    </h2>
                    <p><strong>Project:</strong> ${env.JOB_NAME}</p>
                    <p><strong>Build Number:</strong> ${env.BUILD_NUMBER}</p>
                    <p><strong>Duration:</strong> ${currentBuild.durationString}</p>
                    <p><strong>Build URL:</strong>
                    <a href="${env.BUILD_URL}">${env.BUILD_URL}</a>
                    </p>
                    <hr>
                    <p>Check attached logs for details.</p>
                    </body>
                    </html>
                """,
                to: 'your-email@gmail.com',
                mimeType: 'text/html',
                attachmentsPattern: 'trivyfs.txt'
            )
        }
        success {
            echo 'Pipeline completed successfully!'
        }
        failure {
            echo 'Pipeline failed. Check logs for details.'
        }
    }
}
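One caveat with pushing a fixed `latest` tag: if deployment.yml still points at the same tag, `kubectl apply` sees no change in the pod template and does not start a new rollout. A common alternative is to give every build its own tag and update the Deployment explicitly. The sketch below assumes Jenkins' `BUILD_NUMBER` variable and a Git checkout in the workspace; `image_ref` and `release_image` are hypothetical helper names, and the container name `bookmyshow` comes from the Deployment manifest in this post.

```shell
# Hypothetical helper: build an immutable image reference such as
# your-dockerhub-username/bms:42-a1b2c3d from a build number and commit SHA.
image_ref() {
  local repo="$1" build="$2" sha="$3" short_sha
  short_sha="$(printf '%s' "$sha" | cut -c1-7)"
  echo "${repo}:${build}-${short_sha}"
}

# Build, push, and roll out one uniquely tagged image.
# Each step runs only if the previous one succeeded.
release_image() {
  local image
  image="$(image_ref "your-dockerhub-username/bms" "$BUILD_NUMBER" "$(git rev-parse HEAD)")"
  docker build -t "$image" -f bookmyshow-app/Dockerfile bookmyshow-app &&
    docker push "$image" &&
    kubectl set image deployment/bookmyshow-deployment "bookmyshow=${image}" &&
    kubectl rollout status deployment/bookmyshow-deployment
}
```

Unique tags also make it clear which build is running in the cluster, and they give `kubectl rollout undo` a distinct previous image to return to.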
Configuring Jenkins for Kubernetes
Critical step: Jenkins needs AWS credentials to access EKS.
# Switch to jenkins user
sudo -su jenkins
# Configure AWS
aws configure
# Entered: Access Key, Secret Key, Region
# Verify
aws sts get-caller-identity
# Update kubeconfig
aws eks update-kubeconfig --name bms-eks --region us-east-1
# Test kubectl
kubectl get nodes
# Exit jenkins user
exit
# Restart Jenkins
sudo systemctl restart jenkins
Why this matters: Jenkins runs as the jenkins user, not as root or your user. All AWS/kubectl configurations must be done as jenkins user.
Phase 3: Kubernetes Deployment
Creating Kubernetes Manifests
Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bookmyshow-deployment
  labels:
    app: bookmyshow
spec:
  replicas: 3 # High availability
  selector:
    matchLabels:
      app: bookmyshow
  strategy:
    type: RollingUpdate # Zero-downtime updates
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: bookmyshow
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3000"
    spec:
      containers:
        - name: bookmyshow
          image: your-dockerhub-username/bms:latest
          ports:
            - containerPort: 3000
              name: http
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          env:
            - name: NODE_ENV
              value: "production"
            - name: PORT
              value: "3000"
Key Features:
3 replicas: High availability across nodes
Rolling updates: Zero downtime during deployments
Resource limits: Prevents resource hogging
Health checks: Auto-restart unhealthy pods
Prometheus annotations: Metrics scraping
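Rolling updates keep the old pods serving traffic while new ones start, but a rollout that never becomes healthy still needs to be reverted. The sketch below pairs `kubectl rollout status --timeout` with `kubectl rollout undo`. The helper name `rollout_or_undo` and the 120s default are assumptions for illustration, not part of the pipeline described here.

```shell
# Hypothetical helper: wait for a rollout and undo it if it does not finish.
# $1 = deployment name, $2 = timeout accepted by kubectl (for example 120s).
rollout_or_undo() {
  local deployment="$1" timeout="${2:-120s}"
  if kubectl rollout status "deployment/${deployment}" --timeout="$timeout"; then
    echo "Rollout of ${deployment} succeeded"
    return 0
  fi
  echo "Rollout of ${deployment} did not complete; rolling back" >&2
  kubectl rollout undo "deployment/${deployment}"
  # Report failure even when the rollback itself worked, so the build is marked failed.
  return 1
}
```

With `maxUnavailable: 0`, the previous ReplicaSet keeps all of its ready pods until replacements pass their readiness probes, so `rollout_or_undo bookmyshow-deployment` should revert without dropping traffic.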
Service Configuration (LoadBalancer)
apiVersion: v1
kind: Service
metadata:
  name: bms-service
  labels:
    app: bookmyshow
spec:
  type: LoadBalancer # AWS ELB provisioned automatically
  selector:
    app: bookmyshow
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
      name: http
  sessionAffinity: ClientIP # Sticky sessions
What happens: When applied, AWS automatically creates a Classic Load Balancer and configures it to route traffic to our pods!
Deployment in Action
# Apply configurations
kubectl apply -f deployment.yml
kubectl apply -f service.yml
# Watch deployment progress
kubectl get pods -w
# Output:
NAME READY STATUS RESTARTS
bookmyshow-deployment-7d4c8b9f-8xk2p 1/1 Running 0
bookmyshow-deployment-7d4c8b9f-9lm4q 1/1 Running 0
bookmyshow-deployment-7d4c8b9f-x7n9r 1/1 Running 0
# Check service
kubectl get svc
# Output:
NAME TYPE EXTERNAL-IP
bms-service LoadBalancer a5b2f875...elb.amazonaws.com
Moment of truth: Accessing the LoadBalancer URL...
curl http://a5b2f875...elb.amazonaws.com
It works! Application is live and accessible!
Auto-Scaling Configuration
Added Horizontal Pod Autoscaler:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bms-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bookmyshow-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Result: Application automatically scales from 2 to 10 pods based on CPU/memory usage!
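To get a feel for how the autoscaler picks a replica count, the basic rule from the Kubernetes documentation is desired = ceil(current replicas × current utilization / target utilization), clamped to the min/max bounds. The sketch below reproduces only that arithmetic. It ignores the controller's tolerance band and stabilization windows, and `hpa_desired_replicas` is a hypothetical helper name. Resource-based scaling also assumes the cluster serves the metrics API (for example via metrics-server).

```shell
# Hypothetical helper mirroring the basic HPA rule:
#   desired = ceil(current_replicas * current_utilization / target_utilization)
# clamped to [min, max]. Utilization values are whole percentages.
hpa_desired_replicas() {
  local current="$1" utilization="$2" target="$3" min="$4" max="$5" desired
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * utilization + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# 3 pods averaging 90% CPU against the 70% target, with min 2 and max 10.
hpa_desired_replicas 3 90 70 2 10   # prints 4
```

When several metrics are configured, as with CPU and memory here, the autoscaler computes a count for each metric and uses the largest one.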
Phase 4: Monitoring & Observability
Why Monitoring Matters
Without monitoring, you're flying blind. You need to know:
Is the application healthy?
Are users experiencing slow responses?
Is memory leaking?
When should you scale?
Prometheus Setup
Launched a separate monitoring server (t2.medium) and installed Prometheus.
Created prometheus user:
sudo useradd --system --no-create-home --shell /bin/false prometheus
Downloaded and installed:
wget https://github.com/prometheus/prometheus/releases/download/v2.47.1/prometheus-2.47.1.linux-amd64.tar.gz
tar -xvf prometheus-2.47.1.linux-amd64.tar.gz
sudo mkdir -p /data /etc/prometheus
cd prometheus-2.47.1.linux-amd64/
sudo mv prometheus promtool /usr/local/bin/
sudo mv consoles/ console_libraries/ /etc/prometheus/
sudo mv prometheus.yml /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/ /data/
Created systemd service:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/data \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle
[Install]
WantedBy=multi-user.target
Started Prometheus:
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Accessed: http://monitoring-server-ip:9090
Node Exporter for System Metrics
sudo useradd --system --no-create-home --shell /bin/false node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
Created service and started:
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
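The unit file for Node Exporter is not shown above. A minimal version, modeled on the Prometheus unit earlier in this section, might look like the sketch below. Treat it as an assumption about what the service contained; `render_node_exporter_unit` is a hypothetical helper that only prints the unit text.

```shell
# Hypothetical helper: print a minimal systemd unit for Node Exporter,
# modeled on the Prometheus unit earlier in this post.
render_node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF
}
```

It could be installed with `render_node_exporter_unit | sudo tee /etc/systemd/system/node_exporter.service` followed by `sudo systemctl daemon-reload`, before the enable and start commands. Node Exporter listens on port 9100 by default.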
Configuring Scrape Targets
Edited /etc/prometheus/prometheus.yml:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['monitoring-server-ip:9100']
  - job_name: 'jenkins'
    metrics_path: '/prometheus'
    static_configs:
      - targets: ['jenkins-server-ip:8080']
  # Pod discovery from a Prometheus running outside the cluster also needs
  # api_server and credentials set under kubernetes_sd_configs.
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
Reloaded configuration:
curl -X POST http://localhost:9090/-/reload
Verified targets: All showing as UP!
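The same check can be scripted against Prometheus's HTTP API, which lists scrape targets at `/api/v1/targets`. This is a rough sketch that counts health values with grep, assuming the compact JSON Prometheus normally returns; a JSON-aware tool such as jq would be more robust. `count_unhealthy_targets` and `check_targets` are hypothetical helper names.

```shell
# Hypothetical helper: count active targets whose health is not "up".
# Reads the JSON body of /api/v1/targets on stdin.
count_unhealthy_targets() {
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"' || true
}

# Fail when the API is unreachable or any active target is unhealthy.
# $1 = Prometheus base URL (defaults to the local server).
check_targets() {
  local base_url="${1:-http://localhost:9090}" body unhealthy
  body="$(curl -fsS "${base_url}/api/v1/targets?state=active")" || return 1
  unhealthy="$(printf '%s' "$body" | count_unhealthy_targets)"
  echo "Unhealthy targets: $unhealthy"
  [ "$unhealthy" -eq 0 ]
}
```

Running `check_targets` on the monitoring server after a configuration reload gives a quick pass/fail signal without opening the web UI.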
Grafana Setup
Installed Grafana:
# Add GPG key
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
# Add repository
echo "deb https://packages.grafana.com/oss/deb stable main" | \
sudo tee -a /etc/apt/sources.list.d/grafana.list
# Install
sudo apt-get update
sudo apt-get install grafana -y
# Start service
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Accessed: http://monitoring-server-ip:3000
Default credentials: admin/admin
Changed password on first login
Setting Up Dashboards
Added Prometheus Data Source:
Configuration → Data Sources → Add data source
Selected Prometheus
URL: http://localhost:9090
Clicked "Save & Test"
Imported Pre-built Dashboards:
Dashboard 1: Node Exporter Full (ID: 1860)
Comprehensive system metrics
CPU, Memory, Disk, Network
Beautiful visualizations
Dashboard 2: Jenkins Performance (ID: 9964)
Build statistics
Success/failure rates
Queue metrics
Executor usage
The Result:
Real-time visibility into:
System resource usage
Build performance
Kubernetes cluster health
Application metrics
Now I could see everything happening in real-time! No more guessing.
Challenges & Solutions
Every project has challenges. Here were mine and how I solved them:
Challenge 1: LoadBalancer Not Accessible
The Problem:
After deploying to Kubernetes, I got the LoadBalancer DNS from kubectl get svc, but when I tried to access it... nothing. Timeout errors.
kubectl get svc
NAME TYPE EXTERNAL-IP
bms-service LoadBalancer a5b2f875...elb.amazonaws.com
curl http://a5b2f875...elb.amazonaws.com
# curl: (7) Failed to connect to a5b2f875...elb.amazonaws.com port 80: Connection timed out
The Investigation:
I spent 2 hours debugging:
Pods were running fine
Service was created
LoadBalancer was provisioned in AWS
But I still couldn't access the application
The Root Cause:
Security group! The automatically created LoadBalancer security group wasn't allowing inbound traffic on port 80.
The Solution:
# Found the LoadBalancer security group
# (Classic Load Balancers are listed by `aws elb`, not `aws elbv2`)
aws elb describe-load-balancers \
--region us-east-1 \
--query 'LoadBalancerDescriptions[].SecurityGroups'
# Added inbound rule for port 80
aws ec2 authorize-security-group-ingress \
--group-id sg-xxxxx \
--protocol tcp \
--port 80 \
--cidr 0.0.0.0/0 \
--region us-east-1
Result: Application immediately accessible!
Lesson Learned: Always verify security groups after creating AWS resources. Don't assume they're configured correctly.
Challenge 2: Jenkins Can't Access EKS
The Problem:
Pipeline failed at the deployment stage:
error: You must be logged in to the server (Unauthorized)
The Root Cause:
I had configured AWS CLI and kubectl as my user, but Jenkins runs as the jenkins user!
The Solution:
# Switch to jenkins user
sudo -su jenkins
# Configure AWS credentials
aws configure
# Entered access key and secret key
# Update kubeconfig
aws eks update-kubeconfig --name bms-eks --region us-east-1
# Verify
kubectl get nodes
# Exit and restart Jenkins
exit
sudo systemctl restart jenkins
Result: Pipeline stage passed!
Lesson Learned: Always configure tools under the user that will use them. Jenkins runs as jenkins user, not root.
Challenge 3: Pods Stuck in CrashLoopBackOff
The Problem:
After first deployment, pods kept restarting:
kubectl get pods
NAME READY STATUS RESTARTS
bookmyshow-deployment-7d4c8b9f-8xk2p 0/1 CrashLoopBackOff 5
The Investigation:
kubectl logs bookmyshow-deployment-7d4c8b9f-8xk2p
Error: Cannot find module '/app/node_modules/express'
The Root Cause:
Dependencies weren't being installed in the Docker image properly.
The Solution:
Fixed Dockerfile:
FROM node:18-alpine
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm install --production
# Copy application code
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Key fix: Install dependencies before copying application code, and use --production flag.
Result: Pods started successfully!
Lesson Learned: Always check container logs first when pods fail. They usually tell you exactly what's wrong.
Challenge 4: High AWS Costs
The Problem:
First month's AWS bill was $450+.
The Analysis:
Used AWS Cost Explorer and found:
EKS Control Plane: $73 (fixed)
3x t3.medium running 24/7: $150
BMS Server (t2.large) 24/7: $68
Data transfer costs: $50+
Unused EBS volumes: $30
The Solutions Implemented:
Used Spot Instances for Dev:
Saved 70% on dev environment
Implemented Auto-scaling:
Scale down to 2 nodes during off-hours
Saved ~$40/month
Cleaned up unused resources:
Deleted old EBS volumes
Removed unused elastic IPs
Optimized data transfer:
Used CloudFront for static assets
Reduced cross-region traffic
Result: Brought costs down to $300-320/month, roughly a 30% reduction from the $450 starting point!
Lesson Learned: Monitor costs from day one. Small optimizations add up significantly.
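The arithmetic behind these numbers is easy to sanity-check in the shell. The sketch below uses only the dollar figures quoted above; `sum_costs` and `percent_saved` are hypothetical helper names, and integer division truncates the percentages.

```shell
# Hypothetical helper: add up whole-dollar line items.
sum_costs() {
  local total=0 item
  for item in "$@"; do
    total=$(( total + item ))
  done
  echo "$total"
}

# Hypothetical helper: whole-number percentage saved going from $1 to $2.
percent_saved() {
  local before="$1" after="$2"
  echo $(( (before - after) * 100 / before ))
}

sum_costs 73 150 68 50 30   # itemized lines from Cost Explorer: 371
percent_saved 450 320       # 28
percent_saved 450 300       # 33
```

The itemized lines account for $371 of the $450+ bill, which is a floor because the data transfer entry is itself "$50+". The savings work out to roughly 28-33% depending on where in the $300-320 range a given month lands.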
Challenge 5: Slow Build Times
The Problem:
Initial builds were taking 25+ minutes!
The Investigation:
Looking at Jenkins pipeline stages:
Checkout: 30s
SonarQube: 3m
Install Dependencies: 18m (the bottleneck)
Docker Build: 3m
Deploy: 1m
The Root Cause:
Installing npm dependencies every single time, even when they hadn't changed.
The Solution:
- Docker layer caching:
# Copy only package files first
COPY package*.json ./
RUN npm install
# Then copy source code
COPY . .
- Jenkins workspace caching:
stage('Install Dependencies') {
steps {
sh '''
cd bookmyshow-app
# Only install if package.json changed
if [ ! -f "node_modules/.installed" ] || \
[ package.json -nt node_modules/.installed ]; then
npm install
touch node_modules/.installed
fi
'''
}
}
Result: Build time reduced to 8-10 minutes, over 60% faster!
Lesson Learned: Profile your pipeline. The slowest stage is usually where you can make the biggest improvements.
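The timestamp check above depends on file modification times, which a fresh Git checkout can reset. A content checksum is a sturdier signal. The sketch below is one possible variant, assuming GNU coreutils' `sha256sum` is available on the build server; `needs_install`, `install_if_changed`, and the `.package.sha256` stamp file are hypothetical names. Either approach only helps if `node_modules` survives between builds, so the `cleanWs()` step and the `rm -rf node_modules` line in the pipeline would need to be relaxed for it to take effect.

```shell
# Hypothetical helper: succeed (exit 0) when dependencies must be reinstalled,
# i.e. when package.json's checksum differs from the one recorded last time.
needs_install() {
  local app_dir="$1" stamp current
  stamp="${app_dir}/node_modules/.package.sha256"
  current="$(sha256sum "${app_dir}/package.json" | cut -d' ' -f1)"
  [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$current" ]
}

# Install only when needed, then record the checksum for the next build.
install_if_changed() {
  local app_dir="$1"
  if needs_install "$app_dir"; then
    (cd "$app_dir" && npm install) || return 1
    sha256sum "${app_dir}/package.json" | cut -d' ' -f1 \
      > "${app_dir}/node_modules/.package.sha256"
  else
    echo "Dependencies unchanged; skipping npm install"
  fi
}
```

In the pipeline stage this would be called as `install_if_changed bookmyshow-app`. Hashing `package-lock.json` instead would also catch transitive dependency changes.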
Results & Impact
Before vs After Comparison
| Metric | Before DevOps | After DevOps | Improvement |
|---|---|---|---|
| Deployment Time | 2-3 hours | 12 minutes | 90% faster |
| Deployment Frequency | Once a week | Multiple times/day | ~10x increase (42 vs ~4 per month) |
| Failed Deployments | 30-40% | <5% | 85% reduction |
| Mean Time to Recovery | 4-6 hours | 15 minutes | 95% faster |
| Security Scan Coverage | 0% (manual) | 100% (automated) | Complete coverage |
| System Visibility | None | Real-time | 100% observability |
| Deployment Rollback | Not possible | < 30 seconds | Instant recovery |
Real Numbers from Production
In the first month after implementation:
42 successful deployments (previously ~4/month)
Zero critical security vulnerabilities in production
99.7% uptime (SLA target: 99.5%)
< 200ms average response time at peak load
Auto-scaled 127 times based on traffic patterns
Business Impact
For Development Team:
Developers can deploy changes independently
Instant feedback on code quality
No more "it works on my machine" issues
Confidence in deployment process
For Business:
Faster time to market for new features
Reduced downtime and incidents
Better user experience
Lower operational costs (after optimization)
Lessons Learned
Technical Lessons
1. Start with Infrastructure as Code
I initially created resources manually in the AWS console. Big mistake. Later I had to recreate everything and couldn't remember all the settings.
Lesson: Use eksctl, Terraform, or CloudFormation from day one. Your future self will thank you.
2. Security Groups Are Tricky
Spent hours debugging connectivity issues that were just security group misconfigurations.
Lesson: Document every security group rule. Know exactly what's allowed and why.
3. Monitoring is Not Optional
Initially planned to "add monitoring later." Then had a production issue and had no idea what was happening.
Lesson: Set up basic monitoring before going to production. You can enhance it later, but you need basics from day one.
4. Test Failure Scenarios Before Production
Had auto-scaling configured but never tested it until production traffic triggered it. Almost had an outage.
Lesson: Test failure scenarios in staging. Auto-scaling, pod failures, node failures - test everything.
5. Kubernetes Has a Learning Curve
Underestimated how much there is to learn. ConfigMaps, Secrets, Services, Ingress, RBAC...
Lesson: Take time to understand Kubernetes fundamentals. Don't just copy-paste manifests.
Process Lessons
1. Document Everything
When I needed to recreate my setup on a new cluster, I had to piece together commands from bash history.
Lesson: Maintain a runbook. Document every command, every configuration change.
2. Small, Incremental Changes
Initially tried to change everything at once - broke everything.
Lesson: Deploy small changes frequently. Easier to identify what broke.
3. Automated Tests Are Worth It
Skipped writing tests initially to "save time." Ended up spending more time debugging production issues.
Lesson: Write tests. They save more time than they take.
4. Communication Matters
Didn't communicate a deployment schedule. Team was confused when things changed.
Lesson: Let stakeholders know about deployments, even automated ones.
Share Your Experience
Have you implemented similar DevOps practices? What challenges did you face? What solutions worked for you?
Share your experience in the comments! Let's learn from each other.