Jan 20, 2025
4 min read

Automating Vulnerability Reporting with Python and Pandas

Manual security reporting is slow, error-prone, and nobody's favourite Friday afternoon task. Here's how I replaced it with a Python pipeline at Citi.

Every week, security teams produce reports. Which systems have open vulnerabilities? Which are past SLA? Which business units are the worst offenders? These reports matter — they drive remediation prioritisation and land in front of senior leadership.

They’re also, in most organisations, still produced manually.

Someone exports a CSV from a scanning tool, opens it in Excel, applies some filters, pastes numbers into a slide deck, and sends it around. The process takes hours. It introduces errors. And when the stakeholder asks “can you filter this by region?” the whole thing starts again.

When I joined Citi’s Vulnerability Threat Management team, I built a Python pipeline to replace that process. Here’s the approach.

The core problem: multiple sources, inconsistent formats

Vulnerability data at an enterprise scale comes from multiple scanning tools. Each tool exports data in its own format — different column names, different severity classifications, different date formats. Before you can report on anything, you have to normalise.

Pandas is the right tool for this. The workflow:

import pandas as pd

# Load from multiple sources
tool_a = pd.read_csv("scanner_a_export.csv")
tool_b = pd.read_excel("scanner_b_export.xlsx")

# Normalise column names
tool_a = tool_a.rename(columns={
    "vuln_name": "vulnerability",
    "sev": "severity",
    "disc_date": "discovered_date"
})

# Normalise severity levels (different tools use different scales)
severity_map = {"CRIT": "Critical", "HIGH": "High", "MED": "Medium", "LOW": "Low"}
tool_a["severity"] = tool_a["severity"].map(severity_map)

# Combine (tool_b gets its own rename and severity mapping first, same pattern)
df = pd.concat([tool_a, tool_b], ignore_index=True)

Tedious, but once it’s written it’s consistent. No manual normalisation, no human errors in the mapping.
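One caveat worth guarding against: `Series.map` silently turns any label missing from the mapping into `NaN`, so a scanner that starts emitting a new severity string would quietly drop out of the counts. A minimal sketch of a fail-loud wrapper (the `normalise_severity` helper is my own illustration, not part of the pipeline above):

```python
import pandas as pd

severity_map = {"CRIT": "Critical", "HIGH": "High", "MED": "Medium", "LOW": "Low"}

def normalise_severity(series: pd.Series) -> pd.Series:
    """Map raw severity labels, raising on anything unrecognised
    instead of silently producing NaN."""
    mapped = series.map(severity_map)
    unknown = series[mapped.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped severity labels: {list(unknown)}")
    return mapped

raw = pd.Series(["CRIT", "MED", "LOW"])
print(normalise_severity(raw).tolist())  # ['Critical', 'Medium', 'Low']
```

Cheap insurance: the pipeline fails at load time with a clear message rather than under-reporting criticals for weeks.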

Calculating SLA compliance

Most vulnerability management programmes define remediation SLAs by severity: Critical within 15 days, High within 30, Medium within 90. Tracking compliance is a core reporting requirement.

from datetime import datetime

sla_days = {"Critical": 15, "High": 30, "Medium": 90, "Low": 180}

df["discovered_date"] = pd.to_datetime(df["discovered_date"])
# Vectorised: map each severity to its SLA window (default 180 days)
df["sla_deadline"] = df["discovered_date"] + pd.to_timedelta(
    df["severity"].map(sla_days).fillna(180), unit="D"
)
df["days_overdue"] = (datetime.today() - df["sla_deadline"]).dt.days.clip(lower=0)
df["breached_sla"] = df["days_overdue"] > 0

Now every vulnerability record knows whether it’s within SLA and by how many days it’s overdue. Aggregating this by business unit, asset type, or geography is a single groupby.
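That single groupby looks something like this (the `business_unit` column and the sample rows are illustrative, assuming the combined frame built above):

```python
import pandas as pd

# Hypothetical combined frame; column names match the pipeline above
df = pd.DataFrame({
    "business_unit": ["Retail", "Retail", "Markets", "Markets"],
    "severity": ["Critical", "High", "Critical", "Medium"],
    "breached_sla": [True, False, True, True],
})

# Open count, breach count, and breach rate per business unit in one call
summary = df.groupby("business_unit")["breached_sla"].agg(
    open_vulns="count", breached="sum", breach_rate="mean"
)
print(summary)
```

Swap `business_unit` for an asset-type or geography column and the same line produces the other cuts of the report.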

From pipeline to dashboard

The pipeline feeds a Streamlit dashboard. Rather than producing a static report, stakeholders get an interactive view: filter by severity, business unit, date range. Drill down from a summary count to the specific vulnerabilities driving it.

The dashboard refreshes from the latest data on load. There’s no “report production” step. When leadership asks a question, the answer is already there.
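The filtering behind those dashboard controls is just pandas; a sketch of how the widget values might feed a filter function (my own illustration of the pattern, not the actual Citi code — the function and column names are assumptions):

```python
import pandas as pd

def filter_vulns(df: pd.DataFrame, severity=None, business_unit=None,
                 breached_only=False) -> pd.DataFrame:
    """Apply dashboard-style filters to the combined vulnerability frame."""
    out = df
    if severity is not None:
        out = out[out["severity"].isin(severity)]
    if business_unit is not None:
        out = out[out["business_unit"].isin(business_unit)]
    if breached_only:
        out = out[out["breached_sla"]]
    return out

# A Streamlit page would wire widgets straight into this, e.g.:
#   sev = st.multiselect("Severity", options=sorted(df["severity"].unique()))
#   st.dataframe(filter_vulns(df, severity=sev))
```

Keeping the filter logic in a plain function like this also makes it trivially unit-testable, independent of the UI layer.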

What this replaced

The old process: ~3 hours of manual work per weekly report cycle, error-prone copy-paste, no drill-down capability, stale by the time it was read.

The new process: run one script, open a browser tab.

The interesting thing is that the Python and Pandas skills involved aren’t advanced. This is read_csv, rename, concat, groupby, to_datetime. Standard toolkit. The value isn’t in the complexity of the code — it’s in having code at all, instead of a manual process that degrades every time someone is in a hurry.

If you’re on a security team still producing reports manually, this is low-hanging fruit. The tools are free, the skills are learnable, and the time savings compound every single week.