Every week, security teams produce reports. Which systems have open vulnerabilities? Which are past SLA? Which business units are the worst offenders? These reports matter — they drive remediation prioritisation and land in front of senior leadership.
They’re also, in most organisations, still produced manually.
Someone exports a CSV from a scanning tool, opens it in Excel, applies some filters, pastes numbers into a slide deck, and sends it around. The process takes hours. It introduces errors. And when the stakeholder asks “can you filter this by region?” the whole thing starts again.
When I joined Citi’s Vulnerability Threat Management team, I built a Python pipeline to replace that process. Here’s the approach.
The core problem: multiple sources, inconsistent formats
Vulnerability data at enterprise scale comes from multiple scanning tools. Each tool exports data in its own format — different column names, different severity classifications, different date formats. Before you can report on anything, you have to normalise.
Pandas is the right tool for this. The workflow:
import pandas as pd
# Load from multiple sources
tool_a = pd.read_csv("scanner_a_export.csv")
tool_b = pd.read_excel("scanner_b_export.xlsx")
# Normalise column names
tool_a = tool_a.rename(columns={
"vuln_name": "vulnerability",
"sev": "severity",
"disc_date": "discovered_date"
})
# Normalise severity levels (different tools use different scales;
# tool_b would need its own rename and severity map, elided here)
severity_map = {"CRIT": "Critical", "HIGH": "High", "MED": "Medium", "LOW": "Low"}
tool_a["severity"] = tool_a["severity"].map(severity_map)
# Combine into one frame once both sources share a schema
df = pd.concat([tool_a, tool_b], ignore_index=True)
Tedious, but once it’s written it’s consistent. No manual normalisation, no human errors in the mapping.
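One caveat worth guarding against: .map() silently turns any label it doesn't recognise into NaN, so a scanner update that introduces a new severity string can quietly drop rows from the report. A defensive version of the mapping step, as a sketch:

mapped = tool_a["severity"].map(severity_map)
unknown = tool_a.loc[mapped.isna(), "severity"].unique()
if len(unknown) > 0:
    # Fail loudly rather than letting unmapped rows vanish from the report
    raise ValueError(f"Unmapped severity labels: {list(unknown)}")
tool_a["severity"] = mapped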
Calculating SLA compliance
Most vulnerability management programmes define remediation SLAs by severity: Critical within 15 days, High within 30, Medium within 90. Tracking compliance is a core reporting requirement.
from datetime import datetime

sla_days = {"Critical": 15, "High": 30, "Medium": 90, "Low": 180}

df["discovered_date"] = pd.to_datetime(df["discovered_date"])

# Map each severity to its SLA window; anything unrecognised falls back to 180 days
df["sla_deadline"] = df["discovered_date"] + pd.to_timedelta(
    df["severity"].map(sla_days).fillna(180), unit="D"
)

df["days_overdue"] = (datetime.today() - df["sla_deadline"]).dt.days.clip(lower=0)
df["breached_sla"] = df["days_overdue"] > 0
Now every vulnerability record knows whether it’s within SLA and by how many days it’s overdue. Aggregating this by business unit, asset type, or geography is a single groupby.
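As a sketch of that aggregation, assuming the combined frame carries a business_unit column (the column name here is illustrative, not the actual schema):

summary = (
    df.groupby(["business_unit", "severity"])
      .agg(
          open_vulns=("vulnerability", "count"),
          past_sla=("breached_sla", "sum"),
      )
      .reset_index()
)
summary["breach_rate"] = summary["past_sla"] / summary["open_vulns"]

Each row is a business unit and severity pair with its open count and breach rate, which is exactly the shape the weekly report needs.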
From pipeline to dashboard
The pipeline feeds a Streamlit dashboard. Instead of a static report, stakeholders get an interactive view: filter by severity, business unit, or date range, then drill down from a summary count to the specific vulnerabilities driving it.
The dashboard refreshes from the latest data on load. There’s no “report production” step. When leadership asks a question, the answer is already there.
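To give a sense of how thin the dashboard layer is, here's a minimal sketch. The file name, column names, and widget labels are illustrative, not the production code:

import pandas as pd
import streamlit as st

@st.cache_data(ttl=600)
def load_data() -> pd.DataFrame:
    # Illustrative: in practice this would re-run the normalisation and
    # SLA steps above against the latest scanner exports. The short TTL
    # keeps interactions snappy while the data stays fresh.
    return pd.read_csv(
        "combined_vulns.csv", parse_dates=["discovered_date", "sla_deadline"]
    )

df = load_data()

severities = st.sidebar.multiselect("Severity", sorted(df["severity"].dropna().unique()))
if severities:
    df = df[df["severity"].isin(severities)]

st.metric("Open vulnerabilities", len(df))
st.metric("Past SLA", int(df["breached_sla"].sum()))
st.dataframe(df.sort_values("days_overdue", ascending=False))

Launching it is a single command (streamlit run dashboard.py), and Streamlit re-runs the script on every interaction, which is what makes the filtering feel instant.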
What this replaced
The old process: ~3 hours of manual work per weekly report cycle, error-prone copy-paste, no drill-down capability, stale by the time it was read.
The new process: run one script, open a browser tab.
The interesting thing is that the Python and Pandas skills involved aren't advanced. This is read_csv, rename, concat, groupby, to_datetime. Standard toolkit. The value isn't in the complexity of the code — it's in having code at all, instead of a manual process that degrades every time someone is in a hurry.
If you’re on a security team still producing reports manually, this is low-hanging fruit. The tools are free, the skills are learnable, and the time savings compound every single week.