Jan 20, 2025
4 min read

Automating Vulnerability Reporting with Python and Pandas

Manual security reporting is slow, error-prone, and nobody's favourite Friday afternoon task. Here's how I replaced it with a Python pipeline at Citi.

Every week, security teams produce reports. Which systems have open vulnerabilities? Which are past SLA? Which business units are the worst offenders? These reports matter — they drive remediation prioritisation and land in front of senior leadership.

They’re also, in most organisations, still produced manually.

Someone exports a CSV from a scanning tool, opens it in Excel, applies some filters, pastes numbers into a slide deck, and sends it around. The process takes hours. It introduces errors. And when the stakeholder asks “can you filter this by region?” the whole thing starts again.

When I joined Citi’s Vulnerability Threat Management team, I built a Python pipeline to replace that process. Here’s the approach.

The core problem: multiple sources, inconsistent formats

Vulnerability data at an enterprise scale comes from multiple scanning tools. Each tool exports data in its own format — different column names, different severity classifications, different date formats. Before you can report on anything, you have to normalise.

Pandas is the right tool for this. The workflow:

import pandas as pd

# Load from multiple sources
tool_a = pd.read_csv("scanner_a_export.csv")
tool_b = pd.read_excel("scanner_b_export.xlsx")

# Normalise column names
tool_a = tool_a.rename(columns={
    "vuln_name": "vulnerability",
    "sev": "severity",
    "disc_date": "discovered_date"
})

# Normalise severity levels (different tools use different scales)
severity_map = {"CRIT": "Critical", "HIGH": "High", "MED": "Medium", "LOW": "Low"}
tool_a["severity"] = tool_a["severity"].map(severity_map)

# Combine (tool_b gets its own rename and severity mapping first, same pattern)
df = pd.concat([tool_a, tool_b], ignore_index=True)

Tedious, but once it’s written it’s consistent. No manual normalisation, no human errors in the mapping.
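One caveat worth guarding against: `Series.map` silently turns any label missing from the mapping into `NaN`, so a scanner that starts emitting a new severity string would quietly drop out of the counts. A minimal sketch of a fail-loud wrapper (the `normalise_severity` helper is my own illustration, not part of the pipeline above):

```python
import pandas as pd

severity_map = {"CRIT": "Critical", "HIGH": "High", "MED": "Medium", "LOW": "Low"}

def normalise_severity(series: pd.Series) -> pd.Series:
    """Map raw severity labels, raising on anything unrecognised
    instead of silently producing NaN."""
    mapped = series.map(severity_map)
    unknown = series[mapped.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped severity labels: {list(unknown)}")
    return mapped

raw = pd.Series(["CRIT", "MED", "LOW"])
print(normalise_severity(raw).tolist())  # ['Critical', 'Medium', 'Low']
```

Cheap insurance: the pipeline fails at load time with a clear message rather than under-reporting criticals for weeks.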

Calculating SLA compliance

Most vulnerability management programmes define remediation SLAs by severity: Critical within 15 days, High within 30, Medium within 90. Tracking compliance is a core reporting requirement.

from datetime import datetime

sla_days = {"Critical": 15, "High": 30, "Medium": 90, "Low": 180}

df["discovered_date"] = pd.to_datetime(df["discovered_date"])
# Vectorised: map each severity to its SLA window (default 180 days)
df["sla_deadline"] = df["discovered_date"] + pd.to_timedelta(
    df["severity"].map(sla_days).fillna(180), unit="D"
)
df["days_overdue"] = (datetime.today() - df["sla_deadline"]).dt.days.clip(lower=0)
df["breached_sla"] = df["days_overdue"] > 0

Now every vulnerability record knows whether it’s within SLA and by how many days it’s overdue. Aggregating this by business unit, asset type, or geography is a single groupby.
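That single groupby looks something like this (the `business_unit` column and the sample rows are illustrative, assuming the combined frame built above):

```python
import pandas as pd

# Hypothetical combined frame; column names match the pipeline above
df = pd.DataFrame({
    "business_unit": ["Retail", "Retail", "Markets", "Markets"],
    "severity": ["Critical", "High", "Critical", "Medium"],
    "breached_sla": [True, False, True, True],
})

# Open count, breach count, and breach rate per business unit in one call
summary = df.groupby("business_unit")["breached_sla"].agg(
    open_vulns="count", breached="sum", breach_rate="mean"
)
print(summary)
```

Swap `business_unit` for an asset-type or geography column and the same line produces the other cuts of the report.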

From pipeline to dashboard

The pipeline feeds a Streamlit dashboard. Rather than producing a static report, stakeholders get an interactive view: filter by severity, business unit, date range. Drill down from a summary count to the specific vulnerabilities driving it.

The dashboard refreshes from the latest data on load. There’s no “report production” step. When leadership asks a question, the answer is already there.
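The filtering behind those dashboard controls is just pandas; a sketch of how the widget values might feed a filter function (my own illustration of the pattern, not the actual Citi code — the function and column names are assumptions):

```python
import pandas as pd

def filter_vulns(df: pd.DataFrame, severity=None, business_unit=None,
                 breached_only=False) -> pd.DataFrame:
    """Apply dashboard-style filters to the combined vulnerability frame."""
    out = df
    if severity is not None:
        out = out[out["severity"].isin(severity)]
    if business_unit is not None:
        out = out[out["business_unit"].isin(business_unit)]
    if breached_only:
        out = out[out["breached_sla"]]
    return out

# A Streamlit page would wire widgets straight into this, e.g.:
#   sev = st.multiselect("Severity", options=sorted(df["severity"].unique()))
#   st.dataframe(filter_vulns(df, severity=sev))
```

Keeping the filter logic in a plain function like this also makes it trivially unit-testable, independent of the UI layer.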

What this replaced

The old process: ~3 hours of manual work per weekly report cycle, error-prone copy-paste, no drill-down capability, stale by the time it was read.

The new process: run one script, open a browser tab.

The interesting thing is that the Python and Pandas skills involved aren’t advanced. This is read_csv, rename, concat, groupby, to_datetime. Standard toolkit. The value isn’t in the complexity of the code — it’s in having code at all, instead of a manual process that degrades every time someone is in a hurry.

If you’re on a security team still producing reports manually, this is low-hanging fruit. The tools are free, the skills are learnable, and the time savings compound every single week.