Preliminary notions for Software Development

Python 101

(for Distributed Systems)

Module 2

A.Y. 2024/2025

Matteo Magnini (based on the material made by Giovanni Ciatto)

Compiled on: 2024-10-19

Motivation and goals

In order to put everybody on the same page…
… we are going to recall some basic notions and technical aspects of Python and of how to handle a Python project, namely:
- the command line
- the anatomy of a software project in Python
- the role of modelling in SE
- ubiquitous (w.r.t. SE) notions such as interfaces and runtimes

The command-line

(a.k.a. the shell, the terminal, the console)

First contact with the terminal

Open your terminal application
- Either “PowerShell”, “Command Prompt”, “Terminal” on Windows
- “iTerm” or “Terminal” on macOS
- “Konsole” or “Terminal” on Linux

The terminal is a text-based interface to the operating system (OS). Each terminal application executes a shell program.

The shell is a program that has a simple job: REPL (Read, Evaluate, Print, Loop)

1. it reads a command from the user
2. it evaluates the command
3. it prints the result
4. it loops back to step 1, unless the user explicitly asks to exit
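The loop described above can be sketched in a few lines of Python. This only illustrates the REPL idea and is not how a real shell is implemented. The names `run_repl` and `evaluate` are hypothetical, and commands come from a list rather than from the keyboard, so that the example is reproducible.

```python
def run_repl(commands, evaluate):
    """Toy REPL: read each command, evaluate it, print and collect the result."""
    printed = []
    for command in commands:          # Read: take the next command
        if command == "exit":         # the user explicitly asks to exit
            break
        result = evaluate(command)    # Evaluate
        print(result)                 # Print
        printed.append(result)
    return printed                    # the loop ends on "exit" or when input ends


def evaluate(command):
    # Hypothetical evaluator: it only understands `echo TEXT`
    name, _, argument = command.partition(" ")
    if name == "echo":
        return argument
    return f"unknown command: {name}"


outputs = run_repl(["echo Hello World", "foo", "exit", "echo never reached"], evaluate)
```

Anything listed after `exit` is never read, which mirrors what happens when you close an interactive shell.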

Write a few simple commands and observe the effect
- ls (or dir on Windows): should list the files in the current directory
- echo "Hello World": should print “Hello World” on the screen
- exit: should close the shell (and the terminal application, if it’s the only shell)

Why the terminal?

The terminal is a powerful tool for software development
It allows developers to interact with the OS in a precise, minimal, and efficient way
Most operations in software development can, and often should, be performed from the terminal

The terminal may accept commands:

- interactively, from the user
- from a script, which is a file containing commands to be executed by a shell

Developers are inherently lazy

Developers’ mindset:

If a person can do it manually via the shell, then a script can do it automatically

If a script can do it automatically, then that’s the way of doing it

these are the basic principles behind automation
apart from time-saving and precision in execution, automation enables reproducibility of the process
1. experts can distill their operational knowledge into scripts
2. scripts can be called (i.e. invoked, executed) by non-experts

To automate or not to automate?

Beware, because scripts are software too, and they require engineering:

A comic making fun of the excess of automation

https://xkcd.com/1319/

There is an implicit trade-off between
1. the time spent to automate a task (i.e. coding)
2. and the time saved w.r.t. doing the task manually

This is not really the case of everyday programming tasks, but let’s keep this in mind
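The trade-off above can be made concrete with a bit of arithmetic. The sketch below is an assumption-laden illustration: the function name `break_even_runs` and the numbers (minutes) are made up, and it ignores the cost of maintaining the script.

```python
import math


def break_even_runs(minutes_to_automate, minutes_saved_per_run):
    """Number of runs after which the time spent automating is paid back."""
    if minutes_saved_per_run <= 0:
        raise ValueError("automation must save some time per run")
    return math.ceil(minutes_to_automate / minutes_saved_per_run)


# Hypothetical figures: 2 hours to write the script, 5 minutes saved per run
runs_needed = break_even_runs(minutes_to_automate=120, minutes_saved_per_run=5)
```

If the task is going to be repeated fewer times than `runs_needed`, doing it by hand is probably cheaper.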

Why the terminal in this course?

We are going to use the terminal for:
- running Python scripts
- managing Python projects (version tracking, tests, releases, etc.)
Understanding how to do stuff via the terminal is a reusable skill
- while IDEs and GUIs evolve quickly, and every team has its own preferences…
- command line tools are very stable and widely adopted
We encourage you to read the lectures from MIT’s “The Missing Semester of Your CS Education”

Stuff you need to know about the shell (pt. 1)

There exist essentially two sorts of shells:
1. the Unix-like shells (e.g., bash, zsh, fish, ksh, csh, tcsh, etc.) used by Linux and macOS
  - and, to some extent, by Windows
2. the Windows shells (e.g., cmd, PowerShell) which are different from the Unix-like ones, and from each other

Whenever working with the terminal, the first thing to do is to understand which shell you are using:
- if you’re on Linux, you’re probably using bash or zsh
- if you’re on macOS, you’re probably using zsh or bash
- if you’re on Windows, you’re probably using
  - cmd if you opened the Command Prompt application
  - PowerShell if you opened the PowerShell application
  - bash if you are using the Windows Subsystem for Linux (WSL) or Git Bash

Stuff you need to know about the shell (pt. 2)

Whenever you open a shell, the shell is “in” a directory, which is called the current working directory (CWD)
- by default, commands operate on the CWD (i.e. they read and write files in the CWD)
- that directory should be shown somewhere in the shell’s prompt
- if not shown, you can always ask the shell to show it, via some command
If one wants to operate on a file in a different directory…
- … they have to change the CWD
  - this is done via the cd command (change directory)
- … without changing the CWD, they have to specify the path to the file
  - this is done via the absolute or relative path to the file
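The two options above (changing the CWD, or spelling out a path) exist inside Python programs too. The following is a minimal sketch using only the standard library. The file name `notes.txt` is hypothetical, and a temporary directory is used so that nothing is left behind.

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()                     # like `pwd`

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp).resolve()
    (tmp_path / "notes.txt").write_text("hello")

    # Option 1: change the CWD, then refer to the file by its bare name
    os.chdir(tmp_path)                        # like `cd`
    read_via_cwd = Path("notes.txt").read_text()
    os.chdir(original_cwd)                    # go back where we started

    # Option 2: leave the CWD alone and specify the path to the file
    read_via_path = (tmp_path / "notes.txt").read_text()
```

Both reads return the same content. The second option is usually less error-prone, because it does not change state that other code may rely upon.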

Stuff you need to know about the shell (pt. 3)

A path is a string that represents the location of a file or a directory in the file system

Beware: the path separator differs between Windows (\) and other OSes (/), and we only use / in the slides

A relative path is a path that is relative to the CWD
- e.g., ./file.txt refers to a file named file.txt in the CWD
- e.g., ../file.txt refers to a file named file.txt in the parent directory of the CWD
- e.g., ./dir/file.txt refers to a file named file.txt in a sub-directory of CWD, named dir
An absolute path is a path that starts from the root of the file system
- on Unix-like systems, the root is /
  - e.g. /home/giovanni/file.txt refers to a file named file.txt in giovanni’s home directory on Linux
  - e.g. /Users/giovanni/file.txt refers to a file named file.txt in giovanni’s home directory on macOS
- on Windows, there are several roots, one per drive (e.g., C:, D:, etc.)
  - e.g. C:\Users\giovanni\file.txt refers to the file file.txt in giovanni’s home directory, on drive C:
  - e.g. D:\Data\Photos\profile.jpg refers to the file profile.jpg in the Data\Photos directory, on drive D:
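Relative and absolute paths can be explored without touching the file system, via the `pathlib` module of the standard library. In the sketch below, the CWD `/home/giovanni/project` is a made-up value chosen to match the examples above. The "pure" path classes are used so that the result is the same on every OS.

```python
from pathlib import PurePosixPath, PureWindowsPath

cwd = PurePosixPath("/home/giovanni/project")    # hypothetical CWD

relative = PurePosixPath("dir/file.txt")
absolute = PurePosixPath("/home/giovanni/file.txt")
windows = PureWindowsPath(r"C:\Users\giovanni\file.txt")

relative_is_absolute = relative.is_absolute()    # relative paths depend on the CWD
absolute_is_absolute = absolute.is_absolute()    # absolute paths start from the root
joined = cwd / relative                          # what ./dir/file.txt points to
parent_file = cwd.parent / "file.txt"            # what ../file.txt points to
windows_drive = windows.drive                    # the root of a Windows path includes the drive
```

Using `/` as the joining operator works for Windows paths as well, so Python code rarely needs to hard-code the path separator.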

Terminal cheat sheet

| Operation | *nix | win |
|---|---|---|
| Print the current directory location | `pwd` | `echo %cd%` |
| Remove the file `foo` (does not work with directories) | `rm foo` | `del foo` |
| Remove directory `bar` | `rm -r bar` | `rmdir /s bar` |
| Change disk (e.g., switch to `D:`) | n.a., single root (`/`) | `D:` |
| Move to the subdirectory `baz` | `cd baz` | `cd baz` |
| Move to the parent directory | `cd ..` | `cd..` |
| Move (rename) file `foo` to `baz` | `mv foo baz` | `move foo baz` |
| Copy file `foo` to `baz` | `cp foo baz` | `copy foo baz` |
| Create a directory named `bar` | `mkdir bar` | `md bar` |
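For comparison, most operations in the cheat sheet have a portable counterpart in Python's standard library. The sketch below runs them inside a temporary directory. The names `foo`, `bar`, `baz`, and `qux` are placeholders, and the copy and the move use different targets so that they do not overwrite each other.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    bar = base / "bar"
    bar.mkdir()                                  # mkdir bar
    foo = base / "foo"
    foo.write_text("some content")               # create a file to play with
    shutil.copy(foo, base / "baz")               # cp foo baz
    foo.rename(base / "qux")                     # mv foo qux
    (base / "baz").unlink()                      # rm baz
    shutil.rmtree(bar)                           # rm -r bar
    remaining = sorted(p.name for p in base.iterdir())   # ls
```

Only the renamed file is left at the end, and the same script behaves identically on Windows, macOS, and Linux.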

Stuff you need to know about the shell (pt. 4)

Most commands have arguments
- roughly, whatever you can write after the command name is an argument
If you think of commands as functions, then arguments are the parameters of the function
- and the command is the function name
- and the printed output is the return value
- simply, no parentheses are required by default for “invoking” the function
Consider the ls command
- ls lists the files in the CWD, as an inline list
- ls -l lists the files in the CWD, as a detailed list
- ls -l /path/to/dir lists the files in the /path/to/dir directory, as a detailed list
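The analogy between commands and functions becomes tangible when writing a command in Python. The sketch below declares the arguments of a toy `ls`-like command with `argparse` from the standard library. The program name `myls` and the option names are hypothetical, and the example only parses the arguments, without listing any file.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="myls", description="List files (toy example)")
    # an optional flag, like the -l in `ls -l`
    parser.add_argument("-l", "--long", action="store_true", help="use a detailed listing")
    # an optional positional argument, defaulting to the CWD
    parser.add_argument("directory", nargs="?", default=".", help="directory to list")
    return parser


parser = build_parser()
no_args = parser.parse_args([])                        # like `myls`
long_only = parser.parse_args(["-l"])                  # like `myls -l`
long_with_dir = parser.parse_args(["-l", "/path/to/dir"])
```

Each parsed result is an object whose attributes (`long`, `directory`) play the role of the function's parameters.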

How the hell can I memorise all these commands?

You should not.

Just try to grasp the basic idea of how shells work
Just memorise that there exists a way to do X via the shell
- and all relevant Xs
You will eventually memorise the syntax of most frequent commands
For the rest, you can always look up the documentation
- or the Web, or ask someone, including ChatGPT or StackOverflow
  - but please, do not copy&paste code that you do not understand
Most commands support asking for help when one does not remember the syntax
- e.g. COMMAND_NAME --help or COMMAND_NAME -h mostly used on Unix-like systems
- e.g. man COMMAND_NAME mostly used on Unix-like systems (man is for “manual”)
- e.g. COMMAND_NAME /? mostly used on Windows
- e.g. Get-Help COMMAND_NAME mostly used in PowerShell

Do not waste your memory, learn how to look up the documentation instead

About interactive commands (pt. 1)

Some commands are interactive
- when you start them, they do not terminate immediately
- instead, they wait for user input
In this case we say that the command is just starting a process
- “process” is a technical term for a program that is running
- an app is commonly a process attached to a graphical user interface (GUI) shown on the screen
There is no difference between interactive and non-interactive processes, for the shell
1. a command is used to start the process
2. the process will stay alive,
  - and possibly consume the user’s input
  - and possibly produce some output
3. for some reason, the process may eventually terminate
  - e.g., because of some input from the user, some error, or some condition met
4. when that is the case, control is returned to the shell
  - which will ask for more commands, as usual

About interactive commands (pt. 2)

Upon termination, each command returns an exit code (i.e. a non-negative integer)
- 0 means “everything went fine”
- any other number means “something went wrong” (each number is a different error code)
- so the shell always knows if the last command was successful or not
When using the shell interactively:
- pay attention to whether the last command you wrote was successful or not
When programming the shell in a script:
- you can check the exit code of the last command via the special variable $? (in Unix-like shells; PowerShell uses $LASTEXITCODE and cmd uses %ERRORLEVEL%)
- you can use the if statement to check the exit code and act accordingly
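Exit codes can also be inspected from Python, which is handy when a Python script drives other commands. The sketch below starts two child Python processes via `subprocess.run` from the standard library. The exit code 3 is an arbitrary value chosen for illustration.

```python
import subprocess
import sys

# sys.executable is the path of the interpreter running this script
ok = subprocess.run([sys.executable, "-c", "print('all good')"],
                    capture_output=True, text=True)
failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# Same logic as an `if` on $? in a shell script
if failed.returncode != 0:
    message = f"the command failed with exit code {failed.returncode}"
else:
    message = "the command succeeded"
```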

About interactive processes (pt. 1)

In the eyes of the OS, any process can be modelled as follows:

(figure: a process drawn as a black box, with stdin as input and stdout and stderr as outputs)

i.e. a black box
- consuming a stream of input data from the standard input channel (stdin)
- producing a stream of output data to the standard output channel (stdout)
- and, possibly, producing a stream of error data to the standard error stream (stderr)
more channels may be opened by the process, e.g. when reading / writing files

a stream is an unlimited sequence of bytes (or characters)

this may represent anything having a digital representation (e.g. a file, a network connection, a device, etc.)

About interactive processes (pt. 2)

Most commonly, for interactive processes, the situation is as follows:

(figure: the same black box, with stdin, stdout, and stderr all connected to the terminal)

all three streams are connected to the terminal by default
- so the process reads input from the keyboard
- and writes output to the terminal
- and writes errors to the terminal (sometimes, errors are colored differently)
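The three channels can be observed by starting a process whose streams are connected to our program instead of the terminal. In the sketch below, the child process is a tiny Python program (its code is made up for the example) that reads stdin, writes the upper-cased text to stdout, and writes a message to stderr.

```python
import subprocess
import sys

child_code = (
    "import sys\n"
    "text = sys.stdin.read()\n"               # consume the input stream
    "sys.stdout.write(text.upper())\n"        # produce the output stream
    "sys.stderr.write('this goes to stderr')\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello",            # what the child receives on stdin
    capture_output=True,      # collect stdout and stderr instead of printing them
    text=True,                # treat the streams as characters rather than bytes
)
```

After the call, `result.stdout` and `result.stderr` hold the two output streams separately, although on a terminal they would have been interleaved on the same screen.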

About interactive processes (pt. 3)

Example: the `nano` command

nano is a simple, interactive, text editor for the terminal

Open a shell, and run the following command
```
nano myfile.txt
```
This should transform the terminal into a text editor, editing the file myfile.txt
- you can write some text in there, e.g. Hello World
Then, press Ctrl+O to save the file
- you will be asked to confirm the file name, press Enter
Then, press Ctrl+X to exit the editor
You should be back to the shell, and the file myfile.txt should have been created
- you can verify that, via the ls command

The Python command

Python is a programming language, namely, the reference programming language we use in the course
We will operate Python stuff via the terminal, using the python command
The command’s behaviour is very different depending on which and how many arguments are passed:
- python with no arguments starts an interactive Python shell
  - i.e. yet another shell, but using the Python syntax
- python FILENAME starts a Python process that executes the Python code in the file FILENAME
- python -m MODULE starts a Python process that executes the module named MODULE
  - we’ll see what a module is, later in the course
- python -c "CODE" starts a Python process that executes the Python code in the string "CODE"
  - e.g. python -c "print('Hello World')"

Use python --help to inspect the help of the python command, and see all the options
When using Python, always remember to check the version of the Python interpreter you are using
- python --version or python -V
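The interpreter version can also be checked from inside a program, which is useful when a script must refuse to run on an interpreter that is too old. This is a minimal sketch. The minimum version `(3, 8)` is an arbitrary assumption, not a requirement of the course.

```python
import platform
import sys

REQUIRED = (3, 8)                              # hypothetical minimum version

version_text = platform.python_version()       # e.g. "3.11.7"
is_supported = sys.version_info >= REQUIRED    # tuple-like comparison

if not is_supported:
    print(f"Python {version_text} is too old for this script")
```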

Code and code organization

Running example (pt. 1)

Let’s say we are going to build a simple calculator app, in Python

Using Kivy for the GUI, we may easily build the following app:

Running example (pt. 2)

The source code for such application is available here:

https://github.com/unibo-dtm-se/compact-calculator

TO-DO list

Download the code by clicking on the green “Code” button, and then “Download ZIP”
Unpack the archive in a directory of your choice
Open the directory in some lightweight IDE like VS Code
- possibly exploit the integrated terminal
Inspect the content of the directory (e.g. via the terminal, using the ls -la command)
- you should notice 4 files:
  - .python-version (hidden on Unix) textual declaration of the Python version required by the application
  - calculator.py: the source code of the application
  - requirements.txt: a file containing the dependencies of the application
  - README.md: a file containing some notes about the application

Running example (pt. 3)

Notice that the calculator.py file is a Python script that contains only 46 lines of code
Have a look at the source code of the calculator.py file
- do you recognize any structure in the code?
- do you have any clue about what’s going on in the code?

Let’s try to run the application

open a terminal in the directory of the application (the VS Code integrated terminal is fine)
run the following command
```
python calculator.py
```

you may observe the following output in the terminal:

Traceback (most recent call last):
  File "/path/to/your/directory/calculator.py", line 1, in <module>
    from kivy.app import App
ModuleNotFoundError: No module named 'kivy'

The issue here is that our application depends on some third-party library, namely Kivy
- third-party $\approx$ not written by us + not included in Python by default
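Whether a module is available can be checked before importing it, which avoids the traceback above. The sketch below relies on `importlib.util.find_spec` from the standard library. The helper name `is_installed` and the name of the missing module are made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """True if `import module_name` would find the module in the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


has_math = is_installed("math")    # part of the standard library
has_fake = is_installed("surely_not_an_installed_module_xyz")    # hypothetical name
```

Calling `is_installed("kivy")` on a fresh Python installation would most likely return `False`, which is exactly the situation reported by the error message.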

Running example (pt. 4)

The solution is pretty simple: let’s install the missing dependency
- we can do that via the pip command, which is the Python package manager
```
pip install kivy
```
After the installation, we can try to run the application again
```
python calculator.py
```
this time, the application should start, and you should see the calculator window
play a bit with the application
- ensure it works as expected
- finally, close the window, and notice that control is returned to the terminal
either in the terminal (ls -la), or in the GUI, you may notice a new sub-directory named __pycache__
- with a file named calculator.cpython-3XXX.pyc (or similar) in it
  - that is a compilation cache file, generated by the Python interpreter, for our application

Many hidden concepts in this example

the notion of library
- i.e. a collection of pre-cooked software to be reused in different applications
the notion of dependency
- i.e. a piece of software that is required by another piece of software to run
the notion of runtime
- i.e. the environment in which a piece of software is executed
the notion of package manager
- i.e. a tool that automates the installation of dependencies into a runtime
the notion of compilation
- i.e. the process of translating source code into machine code that can be executed

Libraries (pt. 1)

Basically no programmer ever writes an entire application from scratch
- virtually all programmers re-use someone else’s code to do their job
One key principle in SE has always been:

Don’t reinvent the wheel

SE is essentially about how to write good code, which works well, and can be reused in the future
1. let’s design software to be general
2. let’s write the code to work well
3. let’s give it a name and clearly document how it works (input, output, etc.)
4. let’s make it reusable, i.e. referenceable (callable) by other software
Collections of reusable code are called libraries

Libraries (pt. 2)

All programming languages have a standard library…
- i.e. a collection of reusable code that comes with the language, by default
  - e.g. the math module in Python, the java.util package in Java, etc.
  - BTW, Python has one of the richest standard libraries among programming languages
- most commonly, any two versions of a programming language would have a different standard library
  - the syntax of the language may not even change
… plus some mechanism to install and import third-party libraries
- e.g., in Python, the pip command is used to install third-party libraries
- e.g., in Python, the import statement is used to import libraries in the script
  - upon import, the difference among standard and third-party libraries is irrelevant
The consequences of this “library” idea are manifold
- what libraries are available for a Python program to use?
- what third-party libraries can one install?
- what’s the impact of using a third-party library for my application?
- how can one write a library that can be reused by others?
  - why should one do that?

Runtime (pt. 1)

The runtime is the environment in which a piece of software is executed
- not to be confused with run-time, which is the time when the software is executed

Runtime of a program $\approx$ jargon for “the set of libraries actually available for that program at run-time”

this commonly includes:
1. the standard library of the interpreter executing the program
  - e.g. Python 3.11’s standard library for our calculator application
2. any third-party library installed onto that interpreter
  - e.g. Kivy for our calculator application

Dependencies (pt. 1)

Developers exploit libraries produced by others to avoid wasting time reinventing the wheel
The reasoning is more or less as follows:
1. one needs to realise some software for functionality $F$
2. writing the code for $F$ requires some effort $E_{scratch} > 0$
3. there exists some library $L$ which reduces the effort to $0 < E_{use} < E_{scratch}$
4. installing the library requires some effort $E_{install} > 0$
5. learning how to use the library requires some effort $E_{learn} > 0$
6. in the likely case that $E_{install} + E_{learn} + E_{use} < E_{scratch}$, one should use the library
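The inequality in step 6 can be turned into a one-line decision rule. In the sketch below, the function name `should_use_library` is hypothetical and the efforts are made-up numbers (say, hours), chosen only to show one case on each side of the inequality.

```python
def should_use_library(e_scratch, e_install, e_learn, e_use):
    """Apply the rule E_install + E_learn + E_use < E_scratch."""
    return e_install + e_learn + e_use < e_scratch


# Hypothetical efforts: a GUI toolkit saves a lot of work...
decision_gui = should_use_library(e_scratch=200, e_install=1, e_learn=16, e_use=8)
# ... whereas a library for a two-hour task may not be worth learning
decision_tiny = should_use_library(e_scratch=2, e_install=1, e_learn=4, e_use=1)
```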

A dependency between some software $S$ and some other software $L$

occurs when $S$ requires $L$ to work

requires $\approx$ $S$ needs $L$ to be part of its runtime to be executed
this is commonly the result of the reasoning above
for instance, the calculator application depends on the Kivy library, and on the Python 3.11 standard library

Dependencies (pt. 2)

Some definitions related to the notion of dependency:

Transitive dependency: if $S$ depends on $L$, and $L$ depends on $M$, then $S$ transitively depends on $M$

non-transitive dependencies are called direct dependencies

Dependencies (pt. 3)

Dependency graph of a software $S$: the graph spanned by all the (direct or transitive) dependencies of $S$

e.g. the dependency graph of our calculator application includes Kivy and the Python 3.11 standard library, and all transitive dependencies of Kivy:

calculator.py
├── Python 3.11.7
└── kivy 2.3.0
    ├── docutils *
    ├── kivy-deps-angle >=0.4.0,<0.5.0
    ├── kivy-deps-glew >=0.3.1,<0.4.0
    ├── kivy-deps-sdl2 >=0.7.0,<0.8.0
    ├── kivy-garden >=0.1.4
    │   └── requests *
    │       ├── certifi >=2017.4.17
    │       ├── charset-normalizer >=2,<4
    │       ├── idna >=2.5,<4
    │       └── urllib3 >=1.21.1,<3
    ├── pygments *
    └── pypiwin32 *
        └── pywin32 >=223
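The difference between direct and transitive dependencies can be computed from the tree above. The sketch below encodes only the direct edges of that tree in a dictionary (version constraints are omitted for brevity) and visits the graph to collect everything reachable. The function name `all_dependencies` is hypothetical.

```python
# Direct dependencies only, copied from the tree above (versions omitted)
DIRECT_DEPENDENCIES = {
    "calculator.py": ["Python", "kivy"],
    "kivy": ["docutils", "kivy-deps-angle", "kivy-deps-glew", "kivy-deps-sdl2",
             "kivy-garden", "pygments", "pypiwin32"],
    "kivy-garden": ["requests"],
    "requests": ["certifi", "charset-normalizer", "idna", "urllib3"],
    "pypiwin32": ["pywin32"],
}


def all_dependencies(package, graph):
    """Every package reachable from `package`: direct plus transitive dependencies."""
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))   # follow the dependencies of the dependency
    return seen


everything = all_dependencies("calculator.py", DIRECT_DEPENDENCIES)
direct = set(DIRECT_DEPENDENCIES["calculator.py"])
transitive_only = everything - direct
```

Although the application declares just two direct dependencies, the visit shows that it ends up relying on many more packages, such as `urllib3`.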

Package managers and package repositories

To support the extension of runtimes, and therefore the addition of dependencies…
… most programming languages come with 2 related tools:
1. a package manager, which is a (command-line) tool to install and manage dependencies, semi-automatically
2. package repositories, which are collections of software and metadata about that software, commonly accessible via the Web
Package $\approx$ a piece of software with a name and a version, and a fixed structure that eases installation and reuse
- each package manager/repository implies a package format, and some installation or publication procedures
Package managers commonly support specifying from which repository a dependency should be installed
- yet, each programming language may have its own default package repository
In the Python world:
- the Python Package Index (PyPI) is the default software repository, full of open-source Python software
- the pip command is the default package manager, and it is tightly integrated with PyPI
  - by default pip install NAME installs the latest version of the package NAME from PyPI

Dependency declaration

It is a good practice to document which dependencies a software relies upon
- names and versions, possibly
It is even a better practice to automate the installation of dependencies
- so that they can be restored automatically in any new development / usage environment
Other than package managers and repositories, automation requires dependency declaration
- each package manager supports some file format for this purpose
In the Python world, there are several conventions for dependency declaration
- the most common is the requirements.txt file
  - which contains a list of dependencies in the form NAME==VERSION
  - the pip install -r requirements.txt command installs all dependencies in the file
- another common convention is to declare the Python version in a .python-version file
  - the pyenv install command can install the corresponding version of Python
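To get a feeling for the NAME==VERSION format, the sketch below parses the text of a requirements file into a dictionary. It is a deliberately simplified reader: the function name `parse_requirements` is hypothetical, the file content is an example, and real requirements files accept more forms (e.g. version ranges) than this code supports.

```python
def parse_requirements(text):
    """Parse lines of the form NAME==VERSION into a {name: version} dictionary."""
    pinned = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue                               # skip empty lines
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError(f"unsupported requirement: {line!r}")
        pinned[name.strip()] = version.strip()
    return pinned


example = """
# hypothetical content of a requirements.txt file
kivy==2.3.0
"""
requirements = parse_requirements(example)
```

In practice there is no need to parse the file by hand, since `pip install -r requirements.txt` does the job. The point here is only that the declaration is plain, machine-readable text.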

Package managers in the wild

The Python world is not the only one where package managers and package repositories are used
Most programming languages have their own package manager and package repository
- e.g. Java has Maven and Gradle, and Maven Central
- JavaScript has npm and npmjs.com
- Rust has Cargo and crates.io
- Go has go and pkg.go.dev
- Ruby has gem and rubygems.org
- C++ has vcpkg and conan.io
- etc.
In the Linux world, package managers/repositories are used at the OS level too
- e.g. Debian and Ubuntu distributions use apt and Debian repositories
- Red Hat and CentOS distributions use yum and Red Hat repositories
- Arch Linux and its derived distributions use pacman and Arch Linux repositories
- etc.
On macOS, the Homebrew package manager is widely used (not shipped with the OS)
On Windows, one can use Chocolatey or Scoop as package managers (not shipped with the OS)

What about the actual code?

Let’s delve into the actual code of the calculator application

(focus on the comments)

# Import a bunch of stuff from the Kivy library, used below
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.button import Button
from kivy.uix.label import Label

# Matrix of button names and their layout in the GUI
BUTTONS_NAMES = [
    ['7', '8', '9', '/'],
    ['4', '5', '6', '*'],
    ['1', '2', '3', '-'],
    ['.', '0', '=', '+'],
]

# Calculator *class*: template for all sorts of calculators. This is a particular case of App (i.e. a window, in Kivy)
class CalculatorApp(App):
    # Method to build the GUI of the calculator, according to Kivy's conventions
    def build(self):
        # Definition & initialisation of the "expression" field of the calculator.
        # This field stores a string, representing the expression to be evaluated when "=" is pressed
        self.expression = ""

        # Let's create a layout, i.e. a virtual container of the visual components of the GUI.
        # The grid shall dispose components vertically (top to bottom), i.e. it contains *rows* of components
        grid = BoxLayout(orientation='vertical')

        # Let's create a label, which will serve as the display of the calculator
        self._display = Label(text='0', font_size=24, size_hint=(1, 0.75))
        # Let's add the label to the grid, as the first row
        grid.add_widget(self._display)

        # For each *list of* button names in the matrix of button names...
        for button_names_row in BUTTONS_NAMES:
            # ... let's create another virtual container for a *row* of components
            grid_row = BoxLayout()
            # ... then, for each button name in the list of button names...
            for button_name in button_names_row:
                # ... let's create a button, having the button name as text
                # (the button is configured to call method on_button_press when pressed)
                button = Button(text=button_name, font_size=24, on_press=self.on_button_press)
                # ... and let's add the button to the row
                grid_row.add_widget(button)
            # ... and let's add the row to the grid
            grid.add_widget(grid_row)

        # Finally, let's return the grid, which will be shown in the window
        return grid

    # Method to be called when a button is pressed
    def on_button_press(self, button):
        # If the button is the "=" button
        if button.text == '=':
            # Try to...
            try:
                # ... evaluate the expression *as a Python expression*, convert the result to a string,
                # and show that string on the calculator display
                self._display.text = str(eval(self.expression))
            # If an error occurs in doing the above (e.g. wrong expression)
            except SyntaxError:
                # ... set the display to "Error"
                self._display.text = 'Error'
            # Reset the calculator's expression
            self.expression = ""
        # If the button is any other button
        else:
            # Append the button's text to the calculator's expression
            self.expression += button.text
            # Show the calculator's expression on the display
            self._display.text = self.expression


# If the script is executed as a standalone program (i.e. not imported as a module)
if __name__ == '__main__':
    # Let's create a new calculator application, and run it
    CalculatorApp().run()

the whole application is contained in a single file, calculator.py, with a single class, CalculatorApp
the class mixes UI and business logic in a single place

The issue with the current version of the code (pt. 1)

The system is what the user sees (i.e. the view)
The code is not modular: view, controller, and model are mixed together
- the view $\approx$ what is shown to the user
- the model $\approx$ what the application does or can do
- the controller $\approx$ the glue between the view and the model
  - dictating how changes in the view are reflected in the model

The issue with the current version of the code (pt. 2)

Requirements may change, e.g.: customers may ask for:
- novel operations (e.g. square root, power, etc.) to be supported
  - implies changes in the model, the view, and the controller
- a completely different view (e.g. a web interface)
  - implies rewriting the controller and the model, to just rewrite the view
- a completely different model (e.g. a programmer calculator, supporting bin, oct, hex, dec)
  - implies rewriting the view and the controller, to just rewrite the model
It may be hard to change the model, without breaking the view or the controller
- the same applies to any other permutation of the three
It may be hard to test the model, without testing the view or the controller
- the same applies to any other permutation of the three
The application is, and will always only be, a desktop application
- unless complete rewrites are performed
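To make the issue tangible, here is a sketch of what a model separated from the view might look like. It is not the course's solution: the class name `CalculatorModel` and its `press` method are hypothetical, and, unlike the original code, the sketch also treats division by zero as an error. The point is that this class does not import Kivy, so it can be tested or reused behind a different view.

```python
class CalculatorModel:
    """Hypothetical model: holds the expression and knows nothing about the GUI."""

    def __init__(self):
        self.expression = ""

    def press(self, key):
        """Handle one key press and return the text that a display should show."""
        if key == "=":
            try:
                # same evaluation strategy as the original code
                shown = str(eval(self.expression))
            except (SyntaxError, ZeroDivisionError):
                shown = "Error"
            self.expression = ""      # reset, as the original code does
            return shown
        self.expression += key
        return self.expression


model = CalculatorModel()
displayed = [model.press(key) for key in "7*6="]
```

A Kivy view would then only need to forward `button.text` to `press` and copy the returned string into the label, which is the job of the controller.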
