So, we have a package, VDTK which we’re planning to release in a new major version -
there’s only one problem… We depend on several packages which do not publish builds to the PyPi repository. In our
pyproject.toml
file, they are specified as:
dependencies = [
"...",
"clip @ git+https://github.com/openai/CLIP.git",
"mauve-text @ git+https://github.com/krishnap25/mauve.git",
"en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.4.1/en_core_web_lg-3.4.1-py3-none-any.whl",
]
Unfortunately, when you upload a package built with dependencies like this, you get the error:
ERROR HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
Invalid value for requires_dist. Error: Can't have direct dependency: 'clip @ git+https://github.com/openai/CLIP.git'
This is rather unfortunate, but there are several possible options to get around this.
Option 1: Vendor the code
License permitting, we could copy the code into our own repository, update the pointers in our repo, and import the code that way. Unfortunately, this is time consuming, and difficult to maintain and update, since we become responsible for merging any upstream changes directly into our codebase and pushing a new release for any patches released in the upstream. Additionally, it means that we take on the security burden of any upstream code, which is usually untenable for small-scale projects.
Option 2: Break the UX
Another option is to remove the ability for users to install our package from pypi, and force users to install the package
with pip install vdtk @ git+https://github.com/cannylab/vdtk
. This isn’t a bad option, but it makes the package less
discoverable for other users, and forces any packages which are built on our code to have the same breaking change
long-term.
Option 3: Publish new pypi packages with the code
The final option is, license permitting, to publish our own pypi versions of the packages. This is almost as bad as option 1, but it makes life a little bit easier, since we’re mostly just running the build scripts on behalf of the package maintainers. It’s a bit of a PITA, but is probably one of the cleanst options. That being said, it’s not exactly polite to publish the code of something that you don’t maintain - but hey, if the package maintainer won’t do it themselves, then it would be better if somebody does it.