Really cool to see all the hard work on Trusted Publishing and Sigstore pay off here. As a reminder, these tools were never meant to prevent attacks like this, only to make them easier to detect, harder to hide, and easier to recover from.
theteapot 128 days ago [-]
Just getting around to looking at this. There is a certificate in Sigstore for 8.3.41 claiming the package is a build of cb260c243ffa3e0cc84820095cd88be2f5db86ca -- https://search.sigstore.dev/?logIndex=153415340. But it isn't: the package contents differ from the contents of that commit. This doesn't seem like something that's working all that well.
ronjouch 140 days ago [-]
Good recommendations, including a neat tool to audit your GHAs: https://github.com/woodruffw/zizmor , “A static analysis tool for GitHub Actions”.
clbrmbr 139 days ago [-]
As a user of PyPI, what’s a best practice to protect against compromised libraries?
I fear that freezing the version number is inadequate because attackers (who, don't forget, control the dependency) could change the git tag and redeploy a commonly used version with different code.
Is it really viable to lock requirements.txt with hashes?
woodruffw 139 days ago [-]
Release files on PyPI are immutable: an attacker can’t overwrite a pre-existing file for a version. So if you pin to an exact version, you are (in principle) protected from downloading a new malicious one.
The main caveat to the above is that files are immutable on PyPI, but releases are not. So an attacker can’t overwrite an existing file (or delete and replace one), but they can always add a more specific distribution to a release if one doesn’t already exist. In practice, this means that a release that doesn’t have an arm64 wheel (for example) could have one uploaded to it.
TL;DR: pinning to a version is suitable for most settings; pinning to the exact set of hashes for that version’s file will prevent new files from being added to that version without you knowing.
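As a rough sketch of what hash pinning buys you (this illustrates the idea behind `pip install --require-hashes`, not pip's actual implementation): the installer computes the SHA-256 of each downloaded artifact and refuses anything whose digest isn't in the lock file, so a newly uploaded file for an already-pinned version fails the install. All names and byte strings below are hypothetical.

```python
import hashlib

def matches_pinned_hash(artifact: bytes, pinned_digests: set[str]) -> bool:
    """Illustrative version of the hash-checking step: the artifact's
    SHA-256 must equal one of the digests pinned in requirements.txt."""
    return hashlib.sha256(artifact).hexdigest() in pinned_digests

# Hypothetical wheel bytes standing in for a downloaded .whl file.
wheel = b"pretend wheel contents"
pinned = {hashlib.sha256(wheel).hexdigest()}

assert matches_pinned_hash(wheel, pinned)            # pinned file passes
assert not matches_pinned_hash(b"tampered", pinned)  # new/replaced file fails
```

In practice tools like pip-tools (`pip-compile --generate-hashes`) produce the pinned digest set for you, covering every file of every transitive dependency.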
TZubiri 139 days ago [-]
The best practice is to reduce your dependencies.
Trim your requirements.txt
HeatrayEnjoyer 139 days ago [-]
Your software should execute as little code written outside your offices as possible.
onei 139 days ago [-]
That seems like short-sighted advice. My company probably isn't paying me to write crypto, web frameworks, database drivers, etc. If it's not where I'm adding business value, I would generally try to use a third-party solution, assuming there's no stdlib equivalent. That likely means my code is an overwhelming minority of what gets executed.
If C dominates your codebase or you're squeezing out every inch of performance, then sure, you may well have written everything libc is missing. In Python, or another language that has a thriving ecosystem of third-party packages, it seems wasteful to write it all in-house.
hansvm 139 days ago [-]
They aren't paying you to integrate a bunch of third-party dependencies either, especially not when you could be using the time to generate actual business value.
The specific examples you listed are usually fine for generic SaaS companies (I'd usually object to a "full" web framework), but advice of the flavor "most code should be your own" is advocating for a transitive dependency list you can actually understand.
Anecdotally, by far the worst bugs I've ever had to triage were all in 3rd-party frameworks or in the mess created by adapting the code the business cares about into the shape a library demands (impedance mismatches). They're also the nastiest to fix, since you don't own the code and are faced with a slow update schedule, forking, writing it yourself _anyway_ (and now probably in the impedance-mismatched API you used to talk to the last version, instead of what your application actually wants), or adding an extra layer of hacks to insulate yourself from the problem.
That, combined with just how easy it is to write most software a business needs, pushes me to avoid most dependencies. It's really freeing to own enough of the code that when somebody asks for a new feature you can immediately put the right code in the right spot and generate business value instead of fighting with this or that framework.
TZubiri 139 days ago [-]
"They aren't paying you to integrate a bunch of third-party dependencies either, especially not when you could be using the time to generate actual business value."
They might, but in my experience, it's bottom-of-the-barrel clients playing out of their league. For example, a single store that uses Shopify and wants to migrate to its own website because the fees are too high might pay $500-1000 for you to build something with WordPress and WooCommerce, or worse, a MySQL + React website.
TZubiri 139 days ago [-]
It's a fine balance.
You win most of the time, until you get log4jed or left-padded. Then my company survives you.
Also, I might win even without vulns. I don't write frameworks, I just write the service or website directly. And fewer abstractions and less 3rd-party code can mean more quality.
TZubiri 139 days ago [-]
Especially those without a commercial contract. I'm fine paying for an API, but what's unprofessional is installing random stuff from github.com/GuyProgrammer/Project78 with an anime girl as a profile pic.
luismedel 139 days ago [-]
It surprises me how much companies rely on those kinds of projects without 1) making a proper assessment and 2) cloning the project to ensure it isn't tampered with in the future.
TZubiri 139 days ago [-]
Not only do they not clone projects or freeze their dependencies, but they are pressured to constantly update to the latest version to avoid vulnerabilities (while introducing the risk of new ones).
pabs3 137 days ago [-]
Download the libraries' real source repos, apply static analysis tools, audit the source code manually, then build wheels from source instead of using prebuilt stuff from PyPI. Repeat for every update of every library. Publish your audits using crev, so others can benefit from them. Push the Python community to think about Reproducible Builds and Bootstrappable Builds.
https://github.com/crev-dev/ https://reproducible-builds.org/ https://bootstrappable.org/
This is where tools like poetry and uv with lock files shine. The lock files contain all transitive dependencies (like pip freeze), but they are generated automatically.
d0mine 139 days ago [-]
Are you sure PyPI allows modifying an already-published package?
Lock files may contain hashes.
koromak 140 days ago [-]
Anyone know of a tool like zizmor for GitLab CI/CD? Pretty confident my setup is unsafe after reading through this.
Honestly, safety in CI/CD seems near impossible anyway. GitLab does have a built-in CI YAML linter: https://docs.gitlab.com/ee/ci/yaml/lint.html
Personally I'd move as much logic out of the YAML as possible, into either pure shell scripts or scripts in other languages, then use shellcheck or other appropriate linters on those scripts.
Maybe one day someone will write a proper linter for the shell-wrapped-in-yaml insanity that these CI systems are, but it seems unlikely.
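To illustrate the extraction (the file name and build step here are hypothetical): once the logic lives in a real script file, shellcheck can catch the quoting bugs and unset-variable errors that hide inside YAML-embedded shell, and the CI job body shrinks to a one-line invocation.

```shell
#!/usr/bin/env bash
# ci/build.sh -- hypothetical CI step pulled out of the workflow YAML.
# shellcheck can now lint this file directly; the YAML job body becomes
# a single line: ./ci/build.sh <pkg>
set -euo pipefail

build_step() {
    local pkg="${1:?package name required}"  # fail loudly on a missing arg
    echo "building ${pkg}"
}

build_step "${1:-example-pkg}"
```

The `set -euo pipefail` line alone prevents a class of silent-failure bugs that inline YAML scripts routinely get wrong.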
romanows 139 days ago [-]
So the Python package `ultralytics` had its GitHub CI/CD pipeline compromised, which allowed malicious code to be inserted and then published on PyPI?
thangngoc89 139 days ago [-]
The attacker sent a PR to the ultralytics repository that triggered GitHub CI. As a result:
1) the attacker triggered a new version publication from the CI itself
2) the attacker was able to obtain the secret token used to publish to PyPI
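The general shape of this class of vulnerability (a sketch of the well-known "pwn request" pattern, not the actual ultralytics workflow) is a privileged trigger combined with a checkout of attacker-controlled code:

```yaml
# DANGEROUS pattern (illustrative only):
# pull_request_target runs with access to repository secrets,
# but then checks out and executes the PR author's code.
on: pull_request_target

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}  # attacker-controlled
      - run: ./build.sh  # attacker's script runs alongside the secrets
```

Running untrusted PRs under the plain `pull_request` trigger (which gets no secrets) and keeping publishing in a separate, gated workflow avoids this shape entirely.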
Hilift 140 days ago [-]
Sadly, popular open source projects are vulnerable to this vector. A popular package that is adopted by a large vendor (Red Hat/Microsoft) may see a PR from months or a year ago materialize in their product update pipeline. That is too easy to weaponize so that it doesn't manifest until needed, or only in a different environment.
amelius 139 days ago [-]
Question. Are there white-hat hackers out there who pen-test the Python ecosystem on a regular basis?
ashishbijlani 139 days ago [-]
We scan PyPI packages regularly for malware to provide a private registry of vetted packages.
The tech is open-sourced: Packj [1]. It uses static+dynamic code/behavioral analysis to scan for indicators of compromise (e.g., spawning of a shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks several metadata attributes to detect impersonating packages (typosquatting).
1. https://github.com/ossillate-inc/packj
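As a toy illustration of one such static indicator (my own sketch, not Packj's actual implementation): a scanner can walk a package's AST and flag the common decode-then-eval obfuscation pattern, where a payload is base64-decoded and fed straight into `eval`/`exec`.

```python
import ast

SUSPICIOUS_CALLS = {"eval", "exec"}
DECODER_NAMES = {"b64decode", "decode", "unhexlify"}

def flags_decode_eval(source: str) -> bool:
    """Flag eval()/exec() calls whose arguments contain a decoding call,
    a common obfuscation pattern in malicious packages."""
    for node in ast.walk(ast.parse(source)):
        is_eval = (isinstance(node, ast.Call)
                   and isinstance(node.func, ast.Name)
                   and node.func.id in SUSPICIOUS_CALLS)
        if not is_eval:
            continue
        # Look for a decoder method call anywhere inside eval's arguments.
        for arg in node.args:
            for inner in ast.walk(arg):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Attribute)
                        and inner.func.attr in DECODER_NAMES):
                    return True
    return False

assert flags_decode_eval("import base64; eval(base64.b64decode(blob))")
assert not flags_decode_eval("print('hello')")
```

Real scanners combine many signals like this with dynamic sandboxing, since attackers can trivially restructure any single syntactic pattern.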
If the tech is open-sourced, then an attacker can keep trying in private until they find an exploit, and then use it.
Also, you only know if your security measures work if you test them. I'd feel much safer if there was regular pen-testing by security researchers. We're talking about potential threats from nation state actors here.
ashishbijlani 139 days ago [-]
> If the tech is open-sourced, then an attacker can keep trying in private until they find an exploit, and then use it.
So you'd rather assume that if something is obscure, it is secure?
amelius 139 days ago [-]
I'm just pointing out a huge downside of the approach and that more measures such as pen testing are really needed. I don't want to be right, I want a secure PyPI <3
orf 139 days ago [-]
I maintain a project that mirrors all the code published to PyPI into a series of GitHub repositories, allowing automated scanning and analysis: https://github.com/pypi-data
> What can you do as a publisher to the Python Package Index?
Does PyPI rate publishers based on how well they comply to these rules? Can users see which publishers are more reliable than others?
JimmyWilliams1 140 days ago [-]
I appreciate PyPI's transparency and the proactive measures to mitigate future risks. Are there plans to further educate developers on secure workflow practices to prevent similar incidents? This seems like a vital area for community collaboration and awareness.