licence-normaliser: Taming licence chaos in Python

Hey, ever tried cleaning up messy license strings — like CC BY-NC-ND 4.0 or MIT License — and getting them into something tidy and machine-readable? That's exactly what licence-normaliser does, and honestly, it's a lifesaver for anyone dealing with open-source compliance or metadata.


Check out the repo here: licence-normaliser. It's a lightweight Python library that turns chaos into order using a neat three-level system: family → licence → version. Think cc → cc-by-nc-nd → cc-by-nc-nd-4.0.
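If the three levels feel abstract, here's a toy model of the idea. To be clear, this is just a sketch: the class names (Family, Licence, Version) are made up for illustration and are not the library's own classes.

```python
from dataclasses import dataclass


# Hypothetical classes, only meant to illustrate the three-level idea.
@dataclass(frozen=True)
class Family:
    key: str  # broadest level, e.g. "cc"


@dataclass(frozen=True)
class Licence:
    key: str  # middle level, e.g. "cc-by-nc-nd"
    family: Family


@dataclass(frozen=True)
class Version:
    key: str  # most specific level, e.g. "cc-by-nc-nd-4.0"
    licence: Licence


cc = Family("cc")
by_nc_nd = Licence("cc-by-nc-nd", cc)
v4 = Version("cc-by-nc-nd-4.0", by_nc_nd)

# Walk up from the most specific level to the broadest one.
print(v4.key)
print(v4.licence.key)
print(v4.licence.family.key)
```

Each level points at its parent, so from a versioned key you can always climb back up to the family.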

Here's a quick demo — imagine you're scraping papers or repos and licenses come in every flavor:

from licence_normaliser import normalise_licence

result = normalise_licence("CC BY-NC-ND 4.0")
print(result.key)           # → cc-by-nc-nd-4.0
print(result.licence)       # → cc-by-nc-nd
print(result.licence.family)  # → cc

Super clean, right? It handles SPDX codes (Apache-2.0), full URLs, even sloppy prose like "creative commons attribution non-commercial no derivatives". And for Creative Commons fans — yes, it knows all the variants, including the weird IGO ones.
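For the scraping scenario, one thing you might do with the results is count how many strings land in each family. Here's a rough sketch of that. The helper tally_families and the stand-in fake_normalise are my own inventions, not part of the library; the only thing assumed about a real result is the result.licence.family attribute from the demo above.

```python
from collections import Counter
from types import SimpleNamespace


def tally_families(raw_strings, normalise):
    """Count how many raw strings fall into each licence family.

    `normalise` is any callable returning an object that exposes
    `.licence.family`, as the demo result does.
    """
    counts = Counter()
    for text in raw_strings:
        result = normalise(text)
        counts[str(result.licence.family)] += 1
    return counts


def fake_normalise(text):
    """Crude stand-in so this sketch runs without the library installed."""
    is_cc = text.lower().startswith(("cc", "creative commons"))
    family = "cc" if is_cc else "other"
    return SimpleNamespace(licence=SimpleNamespace(family=family))


scraped = ["CC BY 4.0", "creative commons attribution", "MIT"]
print(tally_families(scraped, fake_normalise))
```

With the library installed, you'd pass normalise_licence in place of fake_normalise and get real family names back.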

Look at these badges to see what it normalizes:

[Image: Creative Commons license badges showing the normalized family and variant icons]

Or the compatibility chart if you're remixing stuff:

[Image: License compatibility chart with checkmarks and crosses for remixing different licenses]

What makes it robust? Everything's file-driven: aliases, patterns, and URLs live in JSON, so you add new synonyms without touching code. Want strictness? Pass strict=True and it'll raise an error if it can't match. Debugging? Use --explain on the command line or trace=True in Python to see the whole resolution path.
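To give a feel for why a file-driven design is nice, here's a minimal sketch of an alias lookup with a strict switch and a trace switch. This is not the library's code: the resolve function, the alias data, the "unknown" fallback, and the choice of ValueError are all assumptions made for the example.

```python
import json

# Hypothetical alias data. In a file-driven design this would be read from
# a JSON file on disk, so adding a synonym means editing data, not code.
ALIASES_JSON = """
{
  "mit": "mit",
  "mit license": "mit",
  "apache-2.0": "apache-2.0",
  "apache license 2.0": "apache-2.0",
  "cc by-nc-nd 4.0": "cc-by-nc-nd-4.0"
}
"""
ALIASES = json.loads(ALIASES_JSON)


def resolve(raw, strict=False, trace=False):
    """Toy resolver: clean the input, look it up, optionally explain itself."""
    steps = []

    # Lowercase and collapse runs of whitespace before the lookup.
    cleaned = " ".join(raw.lower().split())
    steps.append(f"cleaned input: {cleaned!r}")

    key = ALIASES.get(cleaned)
    if key is None:
        steps.append("no alias matched")
        if strict:
            # Assumed behaviour for this sketch: strict mode refuses to guess.
            raise ValueError(f"unrecognised licence: {raw!r}")
        key = "unknown"  # assumed fallback for non-strict mode
    else:
        steps.append(f"alias matched: {key!r}")

    if trace:
        for step in steps:
            print(step)
    return key


print(resolve("MIT License", trace=True))
```

Supporting a new synonym here means adding one line to the JSON, which is the whole appeal of keeping the data out of the code.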

Install's dead simple:

pip install licence-normaliser

(or uv pip install if you're fancy).

CLI's handy too:

licence-normaliser normalise "MIT"          # → mit
licence-normaliser batch "Apache-2.0" "CC BY 4.0"

It's got only two stars right now, probably because it's niche, but if you're building anything with license detection (think ScanCode integration, repo crawlers, academic tools), this quietly solves a headache. Its data gets refreshed through CLI pulls from sources such as SPDX, OSI, and Creative Commons, so there's no manual upkeep.

Bottom line: if licenses are your mess, normalise them. This tool just works.

Originally published as GitHub Gist #93cafc05616758479eed6377f6593246
