
Why Python When You Already Know Shell Scripting? A DevOps Engineer's Honest Answer

Hemanth Kumar Motukuri
DevOps & MLOps | Kubernetes | AWS
Python Zero to MLOps - This article is part of a series.
Part 1: This Article

Who this is for: DevOps engineers who are comfortable with Bash and Shell scripting but are not sure whether Python is worth learning. This post does not assume any Python knowledge.


The Honest Question
#

You can already write a Bash script that:

  • Restarts a failed service
  • Parses a log file for errors
  • Deploys code to a server
  • Sends an alert to Slack

So why would you spend weeks learning Python?

The short answer: Shell scripting tells the computer what to do. Python tells the computer how to think.

Let us break that down with real examples.


What Shell Scripting Is Great At
#

Shell scripting is not going anywhere. It is the right tool for many things:

  • Quick one-liners — grep "ERROR" app.log | wc -l
  • File operations — moving, renaming, compressing files
  • Running commands — starting services, running executables
  • Simple automation — cron jobs, startup scripts
  • Gluing tools together — piping the output of one command into another

If your task is “run these commands in order”, Bash is perfect. It is already on every Linux server. No installation needed. No dependencies.

Shell scripting is the glue of Linux systems.


Where Shell Scripting Starts to Struggle
#

Bash becomes difficult when your task involves:

Complex logic — deeply nested if statements become unreadable fast.

# Bash — checking if a JSON API response has an error
response=$(curl -s "https://api.example.com/health")
status=$(echo "$response" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
# You already needed Python to parse the JSON

Working with data — Bash has no real data structures. Parsing CSV files or JSON with awk, sed, and grep works, but the code becomes long and fragile.
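
For contrast, Python's standard library loads JSON straight into native lists and dicts. A minimal sketch, using a made-up health-check payload:

```python
import json

# A health-check response as it might arrive from an API (made-up payload)
raw = '{"status": "degraded", "checks": [{"name": "db", "healthy": true}, {"name": "cache", "healthy": false}]}'

data = json.loads(raw)  # now an ordinary Python dict

# Real lists and dicts, so filtering is one readable line
failing = [check["name"] for check in data["checks"] if not check["healthy"]]
print(data["status"], failing)  # → degraded ['cache']
```

Doing the same filter in Bash means reaching for jq and working with string output; here the result stays structured.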

Error handling — in Bash, a failed command often silently moves to the next line unless you add set -e or || exit 1 everywhere.

Reusability — copying functions between Bash scripts is awkward. There is no proper module system or package manager.

Testing — writing automated tests for Bash scripts is possible but painful. Almost nobody does it consistently.

This is not a criticism of Bash. Bash was designed to run commands, not to be a general-purpose programming language. Python was designed to be a general-purpose programming language that can also run commands.
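
To make the reusability point concrete: in Python, a shared helper lives in an ordinary module that any script can import. A small sketch (the module and function names are illustrative):

```python
# ops_utils.py — a hypothetical shared module
def parse_log_line(line: str) -> dict:
    """Split 'date time service LEVEL message...' into a dict,
    keeping spaces inside the message intact."""
    date, time, service, level, message = line.split(maxsplit=4)
    return {"date": date, "time": time, "service": service,
            "level": level, "message": message}

# Any other script can now do:
#   from ops_utils import parse_log_line
entry = parse_log_line("2026-04-20 08:12:01 api ERROR Connection timeout")
print(entry["service"], entry["message"])
```

Compare that with sourcing a Bash file and hoping no function names collide.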


The Core Difference — A Side by Side Comparison
#

Task: Count errors per service from a log file
#

Imagine your log file looks like this:

2026-04-20 08:12:01 api ERROR Connection timeout
2026-04-20 08:12:05 db  INFO  Query completed
2026-04-20 08:12:09 api ERROR Rate limit exceeded
2026-04-20 08:12:14 api INFO  Request completed
2026-04-20 08:12:18 db  ERROR Deadlock detected

In Bash:

#!/bin/bash
echo "Errors by service:"
awk '$4 == "ERROR" {print $3}' app.log | sort | uniq -c | sort -rn

This works. It is fast to write. But now the requirement changes — you also need to:

  • Calculate the average time between errors
  • Send the result as a JSON POST to a monitoring API
  • Only count errors from the last 2 hours

The Bash script grows into something only the original author can understand.

In Python:

import pandas as pd
from datetime import datetime, timedelta
import requests

# Load the log file: split each line into at most 5 fields so that
# spaces inside the message column survive
rows = [line.split(maxsplit=4) for line in open("app.log")]
df = pd.DataFrame(rows, columns=["date", "time", "service", "level", "message"])

# Parse timestamps
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])

# Filter to last 2 hours only
two_hours_ago = datetime.now() - timedelta(hours=2)
recent = df[df["timestamp"] > two_hours_ago]

# Count errors per service
error_counts = recent[recent["level"] == "ERROR"].groupby("service").size()

# Build the JSON report
report = {
    "generated_at": datetime.now().isoformat(),
    "errors": error_counts.to_dict()
}

# Send to monitoring API
requests.post("https://monitoring.internal/api/report", json=report)

print("Report sent successfully")

Each line does exactly one thing. You can read it like a sentence. When the requirement changes again next month, any team member can update it.


What Makes Python Special for DevOps Engineers
#

1. You Already Think Like a Programmer
#

DevOps engineers write logic every day:

  • “If the health check fails three times, restart the service”
  • “For each server in the inventory, run this command”
  • “While the deployment is not complete, check every 10 seconds”

That is exactly how Python reads. The translation from your DevOps thinking to Python code is more natural than you expect.

# This reads almost like the sentence above
for server in inventory:
    if not health_check(server):
        restart_service(server)

2. Libraries — Someone Else Already Did the Hard Part
#

This is the biggest advantage of Python over Bash.

When you write Bash and need to talk to AWS, you call the aws CLI and parse text output. When you write Python, you use the boto3 library — which gives you a full Python interface to every AWS service, with proper error handling and data structures built in.

Here is a real comparison:

Bash — list all stopped EC2 instances:

aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=stopped" \
  --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0]]' \
  --output text

Python — same task:

import boto3

ec2 = boto3.client("ec2")
response = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
)

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        name = next((t["Value"] for t in instance.get("Tags", [])
                     if t["Key"] == "Name"), "unnamed")
        print(f"{instance['InstanceId']}  {name}")

The Python version is longer — but it is structured data you can filter, sort, and send to another function. The Bash version gives you text you have to parse again.

3. Python Can Call Shell Commands Too
#

You do not have to choose between Python and Bash. Python has a built-in module called subprocess that lets you run any shell command from inside Python.

import subprocess

# Run a shell command and capture its output
result = subprocess.run(
    ["kubectl", "get", "pods", "-n", "production"],
    capture_output=True,
    text=True
)

if result.returncode == 0:
    print(result.stdout)
else:
    print(f"Command failed: {result.stderr}")

This means you can:

  • Use Bash for the parts Bash is fast at (running commands)
  • Use Python for the parts Python is good at (logic, data, APIs)
  • Combine them in the same script
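
That combination can be sketched as a small retry wrapper: Python supplies the retry logic, the shell command does the work. The command, attempt count, and delay here are examples, not a prescribed pattern:

```python
import subprocess
import time

def run_with_retries(cmd, attempts=3, delay=2.0):
    """Run a shell command, retrying on non-zero exit with a fixed delay."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        print(f"Attempt {attempt}/{attempts} failed: {result.stderr.strip()}")
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts")

# Example: retry a flaky command (swap in kubectl, aws, etc.)
output = run_with_retries(["echo", "pods listed"], attempts=2, delay=0)
print(output.strip())
```

The same behaviour in Bash needs a loop, a counter, and careful exit-code checks in every script that wants it.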

4. Error Handling That Actually Works
#

In Bash, set -e makes the script stop on any error. But it is all-or-nothing — you cannot easily say “if this specific command fails, try this alternative, then log it, then continue.”

In Python, you handle each error type separately and respond intelligently:

def deploy_to_server(server, package):
    try:
        upload(package, server)
        restart_service(server)
        verify_health(server)
        print(f"Deployment to {server} successful")

    except ConnectionError:
        # Network issue — add to retry queue, keep going
        retry_queue.append(server)
        print(f"Network error on {server} — will retry")

    except HealthCheckError:
        # Deployed but unhealthy — rollback immediately
        rollback(server)
        alert_team(f"Health check failed on {server} — rolled back")

    except Exception as e:
        # Unknown error — log it and skip this server
        log_error(server, e)

Each failure mode gets its own response. Your pipeline keeps running even when individual servers have problems.

5. Testing Your Automation
#

In Bash, testing a script usually means running it manually and checking the output.

In Python, you write automated tests that run in your CI/CD pipeline on every commit:

import pytest
from your_script import calculate_error_rate

def test_error_rate_is_zero_when_no_errors():
    logs = [{"level": "INFO"}, {"level": "INFO"}]
    assert calculate_error_rate(logs) == 0.0

def test_error_rate_is_correct():
    logs = [{"level": "ERROR"}, {"level": "INFO"}, {"level": "ERROR"}]
    assert calculate_error_rate(logs) == 66.7

Now every time someone changes your automation script, the tests run automatically. If a change breaks something, the pipeline catches it before it reaches production.
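
For completeness, the function under test might look like this. It is a minimal sketch written to match the assertions above; note that it rounds to one decimal place, so 2 errors out of 3 entries becomes 66.7:

```python
def calculate_error_rate(logs):
    """Percentage of entries whose level is ERROR, rounded to one decimal."""
    if not logs:
        return 0.0
    errors = sum(1 for entry in logs if entry["level"] == "ERROR")
    return round(errors / len(logs) * 100, 1)
```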


Real DevOps Scenarios Where Python Wins
#

Scenario 1 — Automated Infrastructure Audit
#

The task: Every morning, check all your AWS resources, find anything that violates company policy (like S3 buckets with public access, or EC2 instances without tags), and send a Slack report.

Why Python: You need to call AWS APIs, compare results against policy rules, format a structured report, and POST it to Slack. This is three API calls, conditional logic, and data formatting — exactly what Python is built for.

import boto3
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # replace with your webhook URL

def audit_s3_buckets():
    s3 = boto3.client("s3")
    violations = []

    buckets = s3.list_buckets()["Buckets"]

    for bucket in buckets:
        name = bucket["Name"]

        # Check for public access
        try:
            acl = s3.get_bucket_acl(Bucket=name)
            for grant in acl["Grants"]:
                if "AllUsers" in str(grant):
                    violations.append(f"PUBLIC BUCKET: {name}")
        except Exception:
            # Could not read this bucket's ACL; skip it rather than crash the audit
            pass

        # Check for missing tags
        try:
            tags = s3.get_bucket_tagging(Bucket=name)
        except s3.exceptions.ClientError:
            violations.append(f"NO TAGS: {name}")

    return violations

violations = audit_s3_buckets()

if violations:
    message = "Infrastructure violations found:\n" + "\n".join(violations)
    requests.post(SLACK_WEBHOOK, json={"text": message})

Scenario 2 — Kubernetes Pod Health Monitor
#

The task: Watch your Kubernetes cluster. If any pod has been in CrashLoopBackOff for more than 5 minutes, automatically grab its logs, parse them for the root cause, and create a Jira ticket with the details.

Why Python: The official Kubernetes Python client (kubernetes library) gives you a full Python interface to the cluster. Parsing logs and calling the Jira API are both natural Python tasks.

from kubernetes import client, config
import requests
from datetime import datetime, timezone

config.load_kube_config()
v1 = client.CoreV1Api()

def find_crashing_pods():
    crashing = []
    pods = v1.list_pod_for_all_namespaces()

    for pod in pods.items:
        for status in pod.status.container_statuses or []:
            waiting = status.state.waiting
            if waiting and waiting.reason == "CrashLoopBackOff":
                crashing.append({
                    "name": pod.metadata.name,
                    "namespace": pod.metadata.namespace,
                    "restarts": status.restart_count
                })

    return crashing

crashing_pods = find_crashing_pods()

for pod in crashing_pods:
    # Get the logs
    logs = v1.read_namespaced_pod_log(
        name=pod["name"],
        namespace=pod["namespace"],
        tail_lines=50
    )
    print(f"Pod {pod['name']} is crashing. Last 50 log lines captured.")
    # Then send to Jira, Slack, PagerDuty, etc.
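
The Jira step could be sketched against Jira's REST create-issue endpoint (POST /rest/api/2/issue). The base URL, credentials, and project key below are placeholders you would replace:

```python
import requests

JIRA_URL = "https://your-company.atlassian.net"   # placeholder base URL
JIRA_AUTH = ("bot@example.com", "api-token")      # placeholder credentials

def build_jira_issue(pod, logs):
    """Build the JSON body for Jira's create-issue endpoint."""
    return {
        "fields": {
            "project": {"key": "OPS"},            # placeholder project key
            "summary": f"CrashLoopBackOff: {pod['name']} in {pod['namespace']}",
            "description": f"Restarts: {pod['restarts']}\n\nLast 50 log lines:\n{logs}",
            "issuetype": {"name": "Bug"},
        }
    }

def create_jira_ticket(pod, logs):
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue",
                         json=build_jira_issue(pod, logs), auth=JIRA_AUTH)
    resp.raise_for_status()
    return resp.json()["key"]
```

Separating the payload builder from the HTTP call keeps the interesting logic testable without a live Jira instance.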

Scenario 3 — CI/CD Pipeline Data Collector
#

The task: Pull build data from your Jenkins or GitHub Actions API, calculate success rates per team, identify which services have the most failing builds, and generate a weekly dashboard.

Why Python: Jenkins has a REST API. GitHub has a REST API. Combining data from two APIs, calculating stats, and generating a report is a short Python script. The equivalent Bash script would be 150 lines of fragile curl and jq parsing.

import requests
import pandas as pd
from datetime import datetime, timedelta

GITHUB_TOKEN = "your_token"
ORG = "your_org"

headers = {"Authorization": f"token {GITHUB_TOKEN}"}

def get_workflow_runs(repo):
    url = f"https://api.github.com/repos/{ORG}/{repo}/actions/runs"
    response = requests.get(url, headers=headers)
    return response.json()["workflow_runs"]

repos = ["api-service", "db-service", "cache-service"]
all_runs = []

for repo in repos:
    runs = get_workflow_runs(repo)
    for run in runs:
        all_runs.append({
            "repo": repo,
            "status": run["conclusion"],
            "created": run["created_at"]
        })

df = pd.DataFrame(all_runs)
success_rate = (df["status"].eq("success")
                .groupby(df["repo"]).mean() * 100).round(1)

print("Build success rate by service:")
print(success_rate)

Scenario 4 — Automated Runbook Execution
#

The task: When PagerDuty sends an alert, automatically run a diagnostic runbook — check the database connections, check the cache hit rate, check the API error rate — and reply to the alert with all findings before a human even looks at it.

Why Python: This involves receiving a webhook, calling three different APIs, collecting structured data, and sending a formatted response. Python handles all of this in a clean, readable way. FastAPI makes receiving the webhook a 10-line server.

from fastapi import FastAPI
import asyncio
import httpx

app = FastAPI()

@app.post("/webhook/pagerduty")
async def handle_alert(payload: dict):
    service = payload["messages"][0]["incident"]["service"]["name"]

    # Run the three diagnostic calls concurrently
    async with httpx.AsyncClient() as client:
        db_health, cache_stats, api_errors = await asyncio.gather(
            client.get("http://db-service/health"),
            client.get("http://cache-service/stats"),
            client.get("http://api-service/metrics/errors"),
        )

    # Compile findings
    findings = {
        "service": service,
        "db_connected": db_health.json()["connected"],
        "cache_hit_rate": cache_stats.json()["hit_rate"],
        "error_rate_5m": api_errors.json()["rate_5m"]
    }

    # Post findings back to PagerDuty
    # ...

    return {"status": "diagnostics complete", "findings": findings}

When to Use Shell vs Python — The Simple Rule
#

Use this as your guide:

  • Run a few commands in sequence — Bash
  • Quick one-liner on the terminal — Bash
  • Startup scripts, cron jobs — Bash
  • Parse JSON or YAML — Python
  • Call an API — Python
  • Work with CSV or log files — Python
  • Complex conditional logic — Python
  • Send data to multiple systems — Python
  • Anything that needs testing — Python
  • Anything another person will maintain — Python
  • Kubernetes / AWS / cloud automation — Python
  • Building an internal tool or dashboard — Python

The honest answer is: most real DevOps work today falls in the Python column.


What Changes When You Add Python to Your Skills
#

Here is what actually shifts in your daily work once you are comfortable in Python:

You stop avoiding complex tasks. That audit script you kept putting off because the Bash would be too messy? Now you write it in a morning.

Your automation becomes maintainable. Six months later, you can read your own scripts. Your teammates can read them too. They can add features without breaking everything.

You can talk to data scientists. MLOps is a collaboration between DevOps and data science. When a data scientist hands you a Python training script, you can read it, run it, containerize it, and build a pipeline around it — instead of treating it as a black box.

You become harder to replace. Bash is a commodity skill on a DevOps team. Python + infrastructure knowledge + ML pipeline experience is a combination that is in very high demand in 2026.

You can build internal tools. That dashboard someone always wanted? The Slack bot that answers infrastructure questions? The automated runbook system? Python makes these achievable by one engineer in a few days.


The Practical Starting Point
#

You do not need to rewrite your Bash scripts in Python. You do not need to stop using Bash.

Start here: the next time you reach for Bash and the task involves a JSON API, a CSV file, or more than two nested if statements — write it in Python instead.

That is how the switch happens naturally. One script at a time.

The rest of this series will give you everything you need to make that switch confidently.


What Comes Next
#

Now that you understand why Python, the next post covers how to set it up properly — installing Python the right way, setting up your environment, and writing your first script in a terminal.

After setup, we go straight into Phase 0: the core Python concepts every DevOps engineer needs to know — explained without data science jargon, with examples from real infrastructure work.

One thing to do right now: Think of one Bash script you wrote in the last month that was painful to write or hard to read. Keep it in mind as you go through this series. By Phase 2, you will rewrite it in Python — and it will be half the length and twice as clear.
