Hey, ever tried cleaning up messy license strings — like CC BY-NC-ND 4.0 or MIT License — and getting them into something tidy and machine-readable?
That's exactly what licence-normaliser does, and honestly, it's a lifesaver for anyone dealing with open-source compliance or metadata.
Check out the repo here: licence-normaliser.
It's a lightweight Python library that turns chaos into order using a neat three-level system:
family → licence → version. Think cc → cc-by-nc-nd → cc-by-nc-nd-4.0.
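One way to picture that hierarchy is as three linked records, where each level points to its parent. The sketch below is a hypothetical model: the `Family`, `Licence` and `Version` classes are illustrative names, not the library's own types. It only shows how a fully versioned key rolls up to its licence and family.

```python
from dataclasses import dataclass


# Hypothetical three-level model; class names are illustrative only.
@dataclass(frozen=True)
class Family:
    key: str  # broadest level, e.g. "cc"


@dataclass(frozen=True)
class Licence:
    key: str  # licence without a version, e.g. "cc-by-nc-nd"
    family: Family


@dataclass(frozen=True)
class Version:
    key: str  # most specific level, e.g. "cc-by-nc-nd-4.0"
    licence: Licence


cc = Family("cc")
by_nc_nd = Licence("cc-by-nc-nd", cc)
v3 = Version("cc-by-nc-nd-3.0", by_nc_nd)
v4 = Version("cc-by-nc-nd-4.0", by_nc_nd)

# Rolling up: both versions collapse to the same licence and family.
print(v4.key)                 # cc-by-nc-nd-4.0
print(v4.licence.key)         # cc-by-nc-nd
print(v4.licence.family.key)  # cc
print({v.licence.key for v in (v3, v4)})  # {'cc-by-nc-nd'}
```

The payoff of the hierarchy is that you can compare licences at whatever granularity you need: exact version for compliance, licence or family for grouping.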
Here's a quick demo — imagine you're scraping papers or repos and licenses come in every flavor:
```python
from licence_normaliser import normalise_licence

result = normalise_licence("CC BY-NC-ND 4.0")
print(result.key)             # → cc-by-nc-nd-4.0
print(result.licence)         # → cc-by-nc-nd
print(result.licence.family)  # → cc
```
Super clean, right? It handles SPDX identifiers (Apache-2.0), full license URLs, and even sloppy prose like
"creative commons attribution non-commercial no derivatives". And for Creative Commons fans: yes,
it knows all the variants, including the weird IGO (intergovernmental organisation) ones.
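To get a feel for how one function can accept such different input shapes, here's a toy sketch that covers only a handful of cases. The tables `SPDX_IDS` and `CC_PHRASES` and the function `toy_normalise` are hypothetical; the real library keeps its aliases, patterns and URLs in JSON data and handles far more (IGO variants included), so treat this as an illustration of the idea, not its implementation.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup tables; the real library loads its data from JSON.
SPDX_IDS = {"apache-2.0": "apache-2.0", "mit": "mit"}
CC_PHRASES = {  # squashed prose -> CC element code
    "attribution": "by",
    "noncommercial": "nc",
    "noderivatives": "nd",
    "noderivs": "nd",
    "sharealike": "sa",
}
CC_ORDER = ("by", "nc", "nd", "sa")


def toy_normalise(raw: str) -> Optional[str]:
    """Map an SPDX id, CC URL, CC short name or CC prose to a key (toy)."""
    text = raw.strip().lower()

    # 1. SPDX-style identifiers: case-insensitive exact match.
    if text in SPDX_IDS:
        return SPDX_IDS[text]

    # 2. CC deed URLs, e.g. https://creativecommons.org/licenses/by-nc-nd/4.0/
    if text.startswith(("http://", "https://")):
        parts = [p for p in urlparse(text).path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "licenses":
            return "-".join(["cc", *parts[1:3]])
        return None

    # 3. CC short names ("CC BY-NC-ND 4.0") or prose ("creative commons ...").
    words = re.findall(r"[a-z]+", text)
    squashed = "".join(words)  # "non-commercial" -> "noncommercial"
    if not words or not (words[0] == "cc" or squashed.startswith("creativecommons")):
        return None
    elements = {w for w in words if w in CC_ORDER}
    elements |= {code for phrase, code in CC_PHRASES.items() if phrase in squashed}
    if not elements:
        return None
    key = "cc-" + "-".join(c for c in CC_ORDER if c in elements)
    version = re.search(r"\d+\.\d+", text)
    return f"{key}-{version.group()}" if version else key


for sample in ("CC BY-NC-ND 4.0",
               "https://creativecommons.org/licenses/by-nc-nd/4.0/",
               "creative commons attribution non-commercial no derivatives",
               "Apache-2.0"):
    print(sample, "->", toy_normalise(sample))
```

Run as-is, the short name and the URL both come out as cc-by-nc-nd-4.0, the versionless prose as cc-by-nc-nd, and the SPDX id as apache-2.0. Anything the toy doesn't recognise (say, "MIT License") returns None.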
Look at these badges to see what it normalizes:
Or the compatibility chart if you're remixing stuff:
What makes it robust? Everything's file-driven: aliases, patterns and URLs live in JSON, so you can add new
synonyms without touching code. Want strictness? Pass strict=True and it'll raise an error if it
can't find a match. Debugging? Use --explain on the CLI or trace=True in Python to see the whole resolution path.
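Here's a minimal sketch of why a file-driven design plus strict and trace switches is handy. The `strict` and `trace` flags mirror the option names mentioned above, but the `resolve` function, its return shape, the `UnknownLicenceError` class and the JSON layout are all hypothetical, not licence-normaliser's actual API.

```python
import json
from typing import Dict, List, Optional, Tuple

# Hypothetical alias data; in a file-driven design this would be its own .json
# file, so adding a synonym means editing data, not code.
ALIASES_JSON = """
{
  "mit": "mit",
  "mit license": "mit",
  "the mit license": "mit",
  "apache-2.0": "apache-2.0",
  "apache license 2.0": "apache-2.0"
}
"""


class UnknownLicenceError(ValueError):
    """Hypothetical error raised in strict mode when no alias matches."""


def resolve(raw: str, aliases: Dict[str, str], strict: bool = False,
            trace: bool = False) -> Tuple[Optional[str], List[str]]:
    """Look up a licence string; return (key or None, trace steps)."""
    steps: List[str] = []

    def note(message: str) -> None:
        if trace:  # only record the resolution path when asked
            steps.append(message)

    cleaned = " ".join(raw.lower().split())  # lower-case, collapse whitespace
    note(f"cleaned {raw!r} -> {cleaned!r}")
    key = aliases.get(cleaned)
    if key is not None:
        note(f"alias hit -> {key!r}")
        return key, steps
    note("no alias matched")
    if strict:
        raise UnknownLicenceError(f"cannot normalise {raw!r}")
    return None, steps


aliases = json.loads(ALIASES_JSON)
key, steps = resolve("  The MIT   License ", aliases, trace=True)
print(key)  # mit
print("\n".join(steps))
```

Teaching the sketch a new synonym is one more line in the JSON; `resolve` itself never changes. In non-strict mode a miss quietly returns None, while `strict=True` turns it into an exception you can't ignore.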
Install's dead simple:
```bash
pip install licence-normaliser
```
(or uv pip install if you're fancy).
CLI's handy too:
```bash
licence-normaliser normalise "MIT"  # → mit
licence-normaliser batch "Apache-2.0" "CC BY 4.0"
```
It's got only two stars right now, probably because it's niche, but if you're building anything
with license detection (think ScanCode integration, repo crawlers, academic tools), this quietly
solves a headache. Its data is refreshed through CLI pulls from SPDX, OSI, Creative Commons and more,
so there's no manual hassle.
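As a rough illustration of what turning one of those sources into aliases might involve, the sketch below reads the SPDX License List in its JSON form (entries carry `licenseId`, `name` and `isDeprecatedLicenseId` fields) and builds a lower-cased lookup table. The function names, the URL constant and the choice to skip deprecated IDs are assumptions for this sketch; it is not licence-normaliser's updater, and the OSI and Creative Commons pulls aren't shown.

```python
import json
import urllib.request
from typing import Dict

# Assumed location of the official SPDX licence list in JSON form.
SPDX_URL = "https://spdx.org/licenses/licenses.json"


def fetch_spdx_list(url: str = SPDX_URL) -> str:
    """Download the SPDX licence list (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def aliases_from_spdx(payload: str) -> Dict[str, str]:
    """Build lower-cased alias -> key entries from SPDX licence-list JSON."""
    aliases: Dict[str, str] = {}
    for entry in json.loads(payload)["licenses"]:
        if entry.get("isDeprecatedLicenseId"):
            continue  # design choice in this sketch: skip deprecated IDs
        key = entry["licenseId"].lower()
        aliases[key] = key                    # "Apache-2.0" -> "apache-2.0"
        aliases[entry["name"].lower()] = key  # "Apache License 2.0" -> "apache-2.0"
    return aliases


# Hand-written excerpt in the same shape as the SPDX file, for offline use.
SAMPLE = json.dumps({"licenses": [
    {"licenseId": "MIT", "name": "MIT License", "isDeprecatedLicenseId": False},
    {"licenseId": "Apache-2.0", "name": "Apache License 2.0",
     "isDeprecatedLicenseId": False},
    {"licenseId": "GPL-2.0", "name": "GNU General Public License v2.0 only",
     "isDeprecatedLicenseId": True},
]})

print(aliases_from_spdx(SAMPLE))
# With network access: aliases_from_spdx(fetch_spdx_list())
```

Writing the resulting dict out as JSON would give you exactly the kind of data file the tool is built around, regenerated by a script instead of edited by hand.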
Bottom line: if licenses are your mess, normalise them. This tool just works.
Originally published as GitHub Gist #93cafc05616758479eed6377f6593246