How to manage a Python Project


Python 101

(for Distributed Systems)

Module 2

A.Y. 2024/2025

Matteo Magnini (based on the material made by Giovanni Ciatto)


Compiled on: 2024-10-19


Software Organization

Modules

  • A module is a file containing Python definitions and statements.
  • You can import a module with the import keyword.
  • You can import a specific function or class from a module with the from keyword.
  • You can also import a module with an alias with the as keyword.
# You can import a module (here under an alias) or a single function from it
import math as m
from math import sqrt

# You can use the imported names
x = m.sqrt(25)  # 5.0
y = sqrt(36)  # 6.0
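
To make the "a module is a file" idea concrete, here is a minimal sketch that writes a module to a temporary folder and imports it. The module name geometry_demo and its function circle_area are hypothetical, chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_module_import() -> float:
    """Write a tiny module to disk, import it, and call one of its functions."""
    with tempfile.TemporaryDirectory() as tmp:
        # A module is just a .py file ('geometry_demo' is a hypothetical name)
        Path(tmp, "geometry_demo.py").write_text(
            "import math\n"
            "\n"
            "def circle_area(radius):\n"
            "    return math.pi * radius ** 2\n"
        )
        sys.path.insert(0, tmp)  # make the folder visible to the import system
        try:
            importlib.invalidate_caches()
            # Same effect as writing: import geometry_demo as geometry
            geometry = importlib.import_module("geometry_demo")
            return geometry.circle_area(1.0)
        finally:
            # Clean up so the demo leaves no trace in the running interpreter
            sys.path.remove(tmp)
            sys.modules.pop("geometry_demo", None)


if __name__ == "__main__":
    print(demo_module_import())
```

In a real project the file would simply sit next to your script, and a plain import statement would be enough.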

Packages

  • A package is a collection of modules.
  • A package is a directory containing a file named __init__.py.
  • You can import a package with the import keyword.
  • You can import a specific module from a package with the from keyword.
# You can import a package
import numpy

# You can import a module from a package
from numpy import random

# You can use the imported package
y = random.randint(1, 10)  # random integer between 1 and 9 (the upper bound, 10, is excluded)
  • numpy is a third-party package in Python!
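
The same mechanism works for your own packages. The sketch below builds a throwaway package in a temporary folder and imports a module from it. The names shapes_demo, square, and describe are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_package_import() -> str:
    """Create a minimal package on disk and import one of its modules."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp, "shapes_demo")  # hypothetical package name
        package_dir.mkdir()
        # The (possibly empty) __init__.py marks the directory as a package
        (package_dir / "__init__.py").write_text("")
        (package_dir / "square.py").write_text(
            "def describe(side):\n"
            "    return f'square with area {side * side}'\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Same effect as writing: from shapes_demo import square
            square = importlib.import_module("shapes_demo.square")
            return square.describe(3)
        finally:
            # Clean up the import system after the demo
            sys.path.remove(tmp)
            for name in ("shapes_demo.square", "shapes_demo"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(demo_package_import())  # square with area 9
```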

Dependency Management

In the package example we used numpy, but because it is a third-party package, we need to install it first. In general, there could be many dependencies in a project, and it is hard to manage them manually. The need for build systems in Python emerged with more complex use cases.

  • Python originally had no standard build system, so several tools now cover build-related jobs:
    Anaconda, Conda, Miniconda, pip, Poetry, PyBuilder, PyEnv, virtualenv
  • The feature set varies wildly
  • They are meant to solve different problems!
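
Before choosing a tool, it helps to see what "installed" means from Python's point of view. A minimal sketch, using only the standard library, that asks the current interpreter whether a distribution is installed and in which version. The helper name installed_version is an assumption made for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The result depends on the interpreter that runs this script
    for name in ("numpy", "pip"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

Running this with two different interpreters on the same machine may give two different answers, which is exactly the situation the tools in this section try to manage.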

Python’s conflicting standards

xkcd

Since there were no standard management systems originally, multiple tools proliferated

  • The Python Packaging Authority (PyPA) is inconsistent in its suggestions

  • Many Python developers also exploit PyEnv

  • Many data scientists use Anaconda

What are all these tools? (pt. 1)

  1. By default, Python is installed system-wide

    • i.e. there should be one (and only one) Python interpreter on the system
  2. Most Python installations come with pip, the package installer for Python

  3. So, one may install Python packages system-wide with pip install PACKAGE_NAME

One problem, many implications: a Python installation can hold only one version of any given package

  • (a) what if two projects on the same system require different versions of the same package as dependencies?
    • say, project A requires Kivy==2.3 and project B requires Kivy==1.4
  • (b) what if two projects on the same system require different versions of Python?
    • say, project A requires Python 3.8 and project B requires Python 3.10
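
Both problems can be illustrated with a toy model. In the sketch below, each project's requirements are a plain dictionary of exact version pins. The numpy pin is a hypothetical addition to the Kivy example above, and the function names are invented for illustration.

```python
import sys


def conflicting_pins(project_a: dict, project_b: dict) -> dict:
    """Return the packages that the two projects pin to different versions."""
    return {
        name: (project_a[name], project_b[name])
        for name in project_a.keys() & project_b.keys()
        if project_a[name] != project_b[name]
    }


def running_python_is(major: int, minor: int) -> bool:
    """Check whether the current interpreter is exactly Python major.minor."""
    return sys.version_info[:2] == (major, minor)


# Problem (a): a single installation cannot satisfy both Kivy pins
project_a = {"Kivy": "2.3", "numpy": "1.26"}
project_b = {"Kivy": "1.4", "numpy": "1.26"}
print(conflicting_pins(project_a, project_b))  # {'Kivy': ('2.3', '1.4')}

# Problem (b): one interpreter has exactly one version,
# so at most one of these two checks can succeed
print(running_python_is(3, 8), running_python_is(3, 10))
```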

What are all these tools? (pt. 2)

(consider reading this page for further details https://stackoverflow.com/a/41573588)

  1. virtualenv and venv are tools to create virtual Python installations on the same system

    • virtualenv is a third-party tool, venv has been built into Python since version 3.3
    • let’s say you have Python v. XXX installed on your system…
      • … these tools let you create other lightweight copies of Python v. XXX in other folders
        • the copies are fresh, i.e. they have no packages installed
        • but one may install different packages in each copy, via pip
    • now you can solve problem (a)
  2. PyEnv is a tool to manage multiple Python installations on the same system

    • each installation may use a different version of Python
    • now you can solve problem (b)
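
Virtual environments are usually created from the command line (python -m venv FOLDER), but the venv module can also be driven from Python. A minimal sketch, assuming a temporary folder named demo-env and skipping pip to keep the example fast:

```python
import tempfile
import venv
from pathlib import Path


def create_fresh_environment() -> bool:
    """Create a throwaway virtual environment and check that it exists."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp, "demo-env")  # hypothetical folder name
        # with_pip=False keeps the sketch fast; a real environment usually wants pip
        venv.create(env_dir, with_pip=False)
        # Every environment created by venv has a pyvenv.cfg file at its root
        return (env_dir / "pyvenv.cfg").is_file()


if __name__ == "__main__":
    print("environment created:", create_fresh_environment())
```

The new environment reuses the Python version that created it, which is why venv alone addresses problem (a) but not problem (b).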

New problem: many Python installations on the same system,
each one with a different version of Python, and different packages installed

recall all the issues we had in previous lectures?

What are all these tools? (pt. 3)

  1. Smart and adequate convention to work with Python projects: 1-project-1-Python-env

    • each Python project has its own Python environment …
      • be it virtual or not, as long as it uses the Python version required by that project
    • … the environment only contains the packages required by that project
  2. Achieving this requires developers to be disciplined and meticulous

    • in addition to being proficient with the tools above
  3. Poetry is a tool that aims to automate this process
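
A small check that supports the discipline described above: a script can tell whether it is running inside a virtual environment by comparing sys.prefix with sys.base_prefix. The function name is an assumption made for this sketch.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points to the environment folder,
    # while sys.base_prefix still points to the Python installation it came from
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"running inside the environment at {sys.prefix}")
    else:
        print("running on a system-wide Python installation")
```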

From now on, let’s use Poetry

Poetry is a declarative tool for dependency management, packaging, and release in Python

  • It handles both dependencies and dev-dependencies

    • replacing requirements.txt and requirements-dev.txt
  • It automates the 1-project-1-Python-env convention

  • It simplifies the packaging process for the project

  • It simplifies the publication process on PyPI (or other software repositories)

Getting started with Poetry

  • Poetry is a modern dependency management tool for Python.
  • You can install Poetry with the following command:
curl -sSL https://install.python-poetry.org | python3 -

# Create a new project, then move into its folder
poetry new my_project
cd my_project

# Add a dependency
poetry add numpy

# Install dependencies
poetry install

# Run a script
poetry run python my_script.py
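
As an illustration of what my_script.py could contain, here is a hypothetical script that reports which interpreter is running it and whether numpy can be imported. Launched through poetry run, it should point at the environment Poetry manages for the project rather than at the system-wide Python.

```python
import importlib.util
import sys


def environment_report() -> dict:
    """Collect a few facts about the interpreter running this script."""
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # find_spec returns None when a top-level package cannot be found
        "numpy_available": importlib.util.find_spec("numpy") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Comparing the output of python my_script.py with that of poetry run python my_script.py is a quick way to see the 1-project-1-Python-env convention at work.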

(Poetry’s canonical) Project Structure

  • A Python project typically has the following structure:
<root directory>  # main directory of the project
├── <package>/  # main package of the project (there can be multiple packages)
│   ├── __init__.py  # python package marker
│   └── <module>.py  # module in the package (there can be multiple modules)
├── test/
│   └── test_<module>.py  # test module for the module
├── .github/
│   └── workflows/
│       └── check.yml  # some github action
├── .gitignore  # git ignore file
├── .gitattributes  # git attributes file (e.g., for line endings in different OS)
├── __main__.py  # application entry point
├── LICENSE  # license file (e.g., Apache 2.0)
├── pyproject.toml  # project metadata and dependencies
├── poetry.toml  # Poetry settings (we will see Poetry for dependency management)
└── README.md  # don't forget to write a README file
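
The layout above can be reproduced with a few lines of pathlib. This is only a sketch that creates empty placeholder files. In practice, poetry new generates a similar skeleton for you. The names my_package and my_module are hypothetical.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path, package: str = "my_package", module: str = "my_module") -> list:
    """Create an empty project skeleton under root and return the files created."""
    relative_paths = [
        f"{package}/__init__.py",
        f"{package}/{module}.py",
        f"test/test_{module}.py",
        ".github/workflows/check.yml",
        ".gitignore",
        ".gitattributes",
        "__main__.py",
        "LICENSE",
        "pyproject.toml",
        "poetry.toml",
        "README.md",
    ]
    for relative in relative_paths:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)  # create folders as needed
        target.touch()  # empty placeholder, to be filled in later
    return sorted(p for p in relative_paths if (root / p).is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold(Path(tmp)):
            print(created)
```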

