The Python Package Problem Python packages can give your software superpowers, but managing them is a barrier to AI development.

Published

Feb 28, 2024

Reading time

2 min read

Dear friends,

I think the complexity of Python package management holds down AI application development more than is widely appreciated. AI faces multiple bottlenecks — we need more GPUs, better algorithms, cleaner data in large quantities. But when I look at the day-to-day work of application builders, there’s one additional bottleneck that I think is underappreciated: The time spent wrestling with version management is an inefficiency I hope we can reduce.

A lot of AI software is written in the Python language, and so our field has adopted Python’s philosophy of letting anyone publish any package online. The resulting rich collection of freely available packages means that, with just one “pip install” command, you now can install a package and give your software new superpowers! The community’s parallel exploration of lots of ideas and open-sourcing of innovations has been fantastic for developing and spreading not just technical ideas but also usable tools.

But we pay a price for this highly distributed development of AI components: Building on top of open source can mean hours wrestling with package dependencies, or sometimes even juggling multiple virtual environments or using multiple versions of Python in one application. This is annoying but manageable for experienced developers, but creates a lot of friction for new AI developers entering our field without a background in computer science or software engineering.

I don’t know of any easy solution. Hopefully, as the ecosystem of tools matures, package management will become simpler and easier. Better tools for testing compatibility might be useful, though I’m not sure we need yet another Python package manager (we already have pip, conda, poetry, and more) or virtual environment framework.

As a step toward making package management easier, maybe if all of us who develop tools pay a little more attention to compatibility — for example, testing in multiple environments, specifying dependencies carefully, carrying out more careful regression testing, and engaging with the community to quickly spot and fix issues — we can make all of this wonderful open source work easier for new developers to adopt.

Keep coding!

Andrew

P.S. Built in collaboration with Meta: “Prompt Engineering with Llama 2,” taught by Amit Sangani, is now available! Meta’s Llama 2 has been a game changer: Building with open source lets you control your own data, scrutinize errors, update models (or not) as you please, and work alongside the global community to advance open models. In this course, you’ll learn how to prompt Llama chat models using advanced techniques like few-shot for classification and chain-of-thought for logic problems. You’ll also learn how to use specialized models like Code Llama for software development and Llama Guard to check for harmful content. The course also touches on how to run Llama 2 on your own machine. I hope you’ll take this course and try out these powerful, open models! Sign up here

Subscribe to The Batch