Really cool to see all the hard work on Trusted Publishing and Sigstore pay off here. As a reminder, these tools were never meant to prevent attacks like this, only to make them easier to detect, harder to hide, and easier to recover from.
theteapot 128 days ago [-]
Just getting around to looking at this. There is a certificate in Sigstore for 8.3.41 claiming the package is a build of cb260c243ffa3e0cc84820095cd88be2f5db86ca -- https://search.sigstore.dev/?logIndex=153415340. But it isn't: the package contents differ from the contents of that commit. This doesn't seem like something that's working all that well.
ronjouch 140 days ago [-]
Good recommendations, including a neat tool to audit your GHAs: https://github.com/woodruffw/zizmor , “A static analysis tool for GitHub Actions”.
clbrmbr 139 days ago [-]
As a user of PyPI, what’s a best practice to protect against compromised libraries?
I fear that freezing the version number is inadequate because attackers (who, don't forget, control the dependency) could change the git tag and redeploy a commonly used version with different code.
Is it really viable to lock requirements.txt with hashes?
woodruffw 139 days ago [-]
Release files on PyPI are immutable: an attacker can’t overwrite a pre-existing file for a version. So if you pin to an exact version, you are (in principle) protected from downloading a new malicious one.
The main caveat to the above is that files are immutable on PyPI, but releases are not. So an attacker can’t overwrite an existing file (or delete and replace one), but they can always add a more specific distribution to a release if one doesn’t already exist. In practice, this means that a release that doesn’t have an arm64 wheel (for example) could have one uploaded to it.
TL;DR: pinning to a version is suitable for most settings; pinning to the exact set of hashes for that version’s file will prevent new files from being added to that version without you knowing.
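As a rough sketch of what hash pinning buys you (this illustrates the idea behind `pip install --require-hashes`, not pip's actual implementation): the installer computes the SHA-256 of each downloaded artifact and refuses anything whose digest isn't in the lock file, so a newly uploaded file for an already-pinned version fails the install. All names and byte strings below are hypothetical.

```python
import hashlib

def matches_pinned_hash(artifact: bytes, pinned_digests: set[str]) -> bool:
    """Illustrative version of the hash-checking step: the artifact's
    SHA-256 must equal one of the digests pinned in requirements.txt."""
    return hashlib.sha256(artifact).hexdigest() in pinned_digests

# Hypothetical wheel bytes standing in for a downloaded .whl file.
wheel = b"pretend wheel contents"
pinned = {hashlib.sha256(wheel).hexdigest()}

assert matches_pinned_hash(wheel, pinned)            # pinned file passes
assert not matches_pinned_hash(b"tampered", pinned)  # new/replaced file fails
```

In practice tools like pip-tools (`pip-compile --generate-hashes`) produce the pinned digest set for you, covering every file of every transitive dependency.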
TZubiri 139 days ago [-]
The best practice is to reduce your dependencies.
Trim your requirements.txt
HeatrayEnjoyer 139 days ago [-]
Your software should execute as little code written outside your offices as possible.
onei 139 days ago [-]
That seems like short-sighted advice. My company probably isn't paying me to write crypto, web frameworks, database drivers, etc. If it's not where I'm adding business value, I would generally try to use a third-party solution, assuming there's no stdlib equivalent. That likely means my code is an overwhelming minority of what gets executed.
If C dominates your codebase or you're squeezing out every inch of performance, then sure, you may well have written everything libc is missing. In Python, or another language that has a thriving ecosystem of third-party packages, it seems wasteful to write it all in-house.
hansvm 139 days ago [-]
They aren't paying you to integrate a bunch of third-party dependencies either, especially not when you could be using the time to generate actual business value.
The specific examples you listed are usually fine for generic SaaS companies (I'd usually object to a "full" web framework), but advice of the flavor "most code should be your own" is advocating for a transitive dependency list you can actually understand.
Anecdotally, by far the worst bugs I've ever had to triage were all in 3rd-party frameworks or in the mess created by adapting the code the business cares about into the shape a library demands (impedance mismatches). They're also the nastiest to fix, since you don't own the code and are faced with a slow update schedule, forking, writing it yourself _anyway_ (and now probably in the impedance-mismatched API you used to talk to the last version, instead of what your application actually wants), or adding an extra layer of hacks to insulate yourself from the problem.
That, combined with just how easy it is to write most software a business needs, pushes me to avoid most dependencies. It's really freeing to own enough of the code that when somebody asks for a new feature you can immediately put the right code in the right spot and generate business value instead of fighting with this or that framework.
TZubiri 139 days ago [-]
"They aren't paying you to integrate a bunch of third-party dependencies either, especially not when you could be using the time to generate actual business value."
They might, but in my experience, it's bottom-of-the-barrel clients playing out of their league. For example, a single store that uses Shopify and wants to migrate to its own website because the fees are too high might pay $500-1000 for you to build something with WordPress and WooCommerce, or worse, a MySQL + React website.
TZubiri 139 days ago [-]
It's a fine balance.
You win most of the time, until you get log4jed or left-padded. Then my company survives you.
Also, I might win even without vulns. I don't write frameworks, I just write the service or website directly. And fewer abstractions and less 3rd-party code can mean more quality.
TZubiri 139 days ago [-]
Especially those without a commercial contract. I'm fine paying for an API, but what's unprofessional is installing random stuff from github.com/GuyProgrammer/Project78 with an anime girl as a profile pic.
luismedel 139 days ago [-]
It surprises me how much companies rely on those kinds of projects without 1) making a proper assessment and 2) cloning the project to ensure it isn't tampered with in the future.
TZubiri 139 days ago [-]
Not only do they not clone projects or freeze their dependencies, but they are pressured to constantly update to the latest version to avoid vulnerabilities (while introducing the risk of new ones).
pabs3 137 days ago [-]
Download the libraries' real source repos, apply static analysis tools, audit the source code manually, then build wheels from source instead of using prebuilt stuff from PyPI. Repeat for every update of every library. Publish your audits using crev, so others can benefit from them. Push the Python community to think about Reproducible Builds and Bootstrappable Builds.
https://github.com/crev-dev/ https://reproducible-builds.org/ https://bootstrappable.org/
This is where tools like poetry and uv with lock files shine. The lock files contain all transitive dependencies (like pip freeze), but they are generated automatically.
d0mine 139 days ago [-]
Are you sure PyPI allows modifying an already-published package?
Lock files may contain hashes.
koromak 140 days ago [-]
Anyone know of a tool like zizmor for GitLab CI/CD? Pretty confident my setup is unsafe after reading through this.
Honestly, safety in CI/CD seems near impossible anyway. GitLab does have a built-in CI YAML linter: https://docs.gitlab.com/ee/ci/yaml/lint.html
Personally I'd move as much logic out of the YAML as possible, into either pure shell scripts or scripts in other languages, then use shellcheck or other appropriate linters on those scripts.
Maybe one day someone will write a proper linter for the shell-wrapped-in-yaml insanity that these CI systems are, but it seems unlikely.
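To illustrate the extraction (the file name and build step here are hypothetical): once the logic lives in a real script file, shellcheck can catch the quoting bugs and unset-variable errors that hide inside YAML-embedded shell, and the CI job body shrinks to a one-line invocation.

```shell
#!/usr/bin/env bash
# ci/build.sh -- hypothetical CI step pulled out of the workflow YAML.
# shellcheck can now lint this file directly; the YAML job body becomes
# a single line: ./ci/build.sh <pkg>
set -euo pipefail

build_step() {
    local pkg="${1:?package name required}"  # fail loudly on a missing arg
    echo "building ${pkg}"
}

build_step "${1:-example-pkg}"
```

The `set -euo pipefail` line alone prevents a class of silent-failure bugs that inline YAML scripts routinely get wrong.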
romanows 139 days ago [-]
So the Python package `ultralytics` had its GitHub CI/CD pipeline compromised, which allowed malicious code to be inserted and then published on PyPI?
thangngoc89 139 days ago [-]
The attacker sent a PR to the ultralytics repository that triggered GitHub CI. As a result:
1) the attacker triggered a new version publication from the CI itself
2) the attacker was able to obtain the secret token used to publish to PyPI
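The general shape of this class of vulnerability (a sketch of the well-known "pwn request" pattern, not the actual ultralytics workflow) is a privileged trigger combined with a checkout of attacker-controlled code:

```yaml
# DANGEROUS pattern (illustrative only):
# pull_request_target runs with access to repository secrets,
# but then checks out and executes the PR author's code.
on: pull_request_target

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}  # attacker-controlled
      - run: ./build.sh  # attacker's script runs alongside the secrets
```

Running untrusted PRs under the plain `pull_request` trigger (which gets no secrets) and keeping publishing in a separate, gated workflow avoids this shape entirely.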
Hilift 140 days ago [-]
Sadly, popular open source projects are vulnerable to this vector. A popular package that is adopted by a large vendor (Red Hat/Microsoft) may see a PR from months or a year ago materialize in their product update pipeline. That is too easy to weaponize so that it doesn't manifest until needed, or only in a different environment.
amelius 139 days ago [-]
Question. Are there white-hat hackers out there who pen-test the Python ecosystem on a regular basis?
ashishbijlani 139 days ago [-]
We scan PyPI packages regularly for malware to provide a private registry of vetted packages.
The tech is open-sourced: Packj [1]. It uses static+dynamic code/behavioral analysis to scan for indicators of compromise (e.g., spawning of a shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks several metadata attributes to detect impersonating packages (typosquatting).
1. https://github.com/ossillate-inc/packj
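As a toy illustration of one such static indicator (my own sketch, not Packj's actual implementation): a scanner can walk a package's AST and flag the common decode-then-eval obfuscation pattern, where a payload is base64-decoded and fed straight into `eval`/`exec`.

```python
import ast

SUSPICIOUS_CALLS = {"eval", "exec"}
DECODER_NAMES = {"b64decode", "decode", "unhexlify"}

def flags_decode_eval(source: str) -> bool:
    """Flag eval()/exec() calls whose arguments contain a decoding call,
    a common obfuscation pattern in malicious packages."""
    for node in ast.walk(ast.parse(source)):
        is_eval = (isinstance(node, ast.Call)
                   and isinstance(node.func, ast.Name)
                   and node.func.id in SUSPICIOUS_CALLS)
        if not is_eval:
            continue
        # Look for a decoder method call anywhere inside eval's arguments.
        for arg in node.args:
            for inner in ast.walk(arg):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Attribute)
                        and inner.func.attr in DECODER_NAMES):
                    return True
    return False

assert flags_decode_eval("import base64; eval(base64.b64decode(blob))")
assert not flags_decode_eval("print('hello')")
```

Real scanners combine many signals like this with dynamic sandboxing, since attackers can trivially restructure any single syntactic pattern.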
If the tech is open-sourced, then an attacker can keep trying in private until they find an exploit, and then use it.
Also, you only know if your security measures work if you test them. I'd feel much safer if there was regular pen-testing by security researchers. We're talking about potential threats from nation state actors here.
ashishbijlani 139 days ago [-]
> If the tech is open-sourced, then an attacker can keep trying in private until they find an exploit, and then use it.
So you'd rather assume that if something is obscure, it is secure?
amelius 139 days ago [-]
I'm just pointing out a huge downside of the approach and that more measures such as pen testing are really needed. I don't want to be right, I want a secure PyPI <3
orf 139 days ago [-]
I maintain a project that mirrors all the code published to PyPI into a series of GitHub repositories, allowing automated scanning and analysis: https://github.com/pypi-data
> What can you do as a publisher to the Python Package Index?
Does PyPI rate publishers based on how well they comply to these rules? Can users see which publishers are more reliable than others?
JimmyWilliams1 140 days ago [-]
I appreciate PyPI's transparency and the proactive measures to mitigate future risks. Are there plans to further educate developers on secure workflow practices to prevent similar incidents? This seems like a vital area for community collaboration and awareness.