After the conference, a conversation started in the PyCon Africa Slack group about solutions to problems common in our communities and projects we could collaborate on. The biggest problem that came to my mind was the state of Internet connectivity in Africa.
Internet access is poor in many parts of the continent and the costs are prohibitive in the countries that have good connections. This makes it difficult for developers on the continent to get access to all the resources they need when they need them.
This isn’t a problem I can solve and I’m sure it will take some time before Internet access is evenly distributed and made affordable for everyone.
One approach to overcoming this problem is to make web content available for offline use. To this end, I thought of the web content that I and many other software developers consume on a regular basis. I wanted to find a way to make the content of websites such as Stack Overflow, Wikipedia and PyPI (Python Package Index) available offline.
I did research on this and found that it is possible. At the time of writing, I have successfully cloned Stack Overflow and PyPI. In this post I will discuss how to clone the PyPI repository to a Raspberry Pi and serve up the content so that connected devices can pip install packages without an active Internet connection.
Goals
There are a number of ways to go about creating your own PyPI mirror and the way I did it might not work for everyone. My goals for this project were:
- The hardware to do this must be affordable (<=US$100)
- There must be little or no setup required on the client computers.
I decided to use a Raspberry Pi 4 running Raspbian with a 200GB SD card for storage. I used minirepo to clone PyPI, pypiserver to serve up the packages and nginx to create a reverse proxy.
There are four steps involved in creating a local PyPI server:
- Download and install Operating System and system utilities
- Configure Raspberry Pi to act as a WiFi hotspot, DHCP and DNS server
- Clone/Download PyPI packages
- Configure a webserver to deliver the downloaded packages to connected clients
I’ll explain the steps above in more detail below.
1. Download and install Operating System and system utilities
The Raspberry Pi is a small, credit-card-sized computer that sells for US$35 and up. For this project I used the Raspberry Pi 4 but other models should also work. Raspberry Pi models 3 and 4 have built-in WiFi adapters, which make setting up the Pi as a WiFi hotspot or access point simpler than when using an external wireless adapter.
To get started, download and install Raspbian. Raspbian is a light-weight Debian based OS that is optimised for the Raspberry Pi. In order to work as an access point, the Raspberry Pi will need to have access point software installed, along with DHCP server software to provide connecting devices with a network address.
Next, download all the utilities and packages you will need before configuring the Raspberry Pi. I learned this the hard way after I messed up the network configurations on the Pi and ended up not being able to download anything afterwards.
You need the following software packages:
- dnsmasq — DNS and DHCP Server software
- hostapd — Access Point software
- minirepo — Used to clone PyPI for offline use
- pypiserver — Creates an index from cloned PyPI packages
- nginx — A web server
To install these packages, run these two commands:
$ sudo apt install dnsmasq hostapd nginx
$ pip install minirepo pypiserver
2. Configure Raspberry Pi for WiFi hotspot, DHCP and DNS
The goal in this step is to configure a standalone network with the Raspberry Pi acting as the server, so the Pi needs a static IP address assigned to its wireless interface. To configure the static IP, edit the dhcpcd configuration file:
$ sudo nano /etc/dhcpcd.conf
Add the following:
interface wlan0
    static ip_address=192.168.4.1/24
    nohook wpa_supplicant
Configure DHCP
Most of the default settings in the dnsmasq configuration file are not needed here, so move the original out of the way and create a new configuration file:
$ sudo mv /etc/dnsmasq.conf /etc/dnsmasq.conf.orig
$ sudo nano /etc/dnsmasq.conf
Add the following configuration:
interface=wlan0
listen-address=192.168.4.1
dhcp-range=192.168.4.2,192.168.4.30,255.255.255.0,24h
address=/raspberrypi.local/192.168.4.1
This sets up DHCP for clients connecting through the wireless interface wlan0. The second line tells dnsmasq (the DHCP and DNS server) to listen on the static IP address you set up in the previous step. The third line tells it to hand out IP addresses from 192.168.4.2 to 192.168.4.30 with a lease time of 24 hours. The last line makes the name raspberrypi.local resolve to the Pi's address.
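As a quick sanity check on the DHCP range above, the short Python sketch below uses the standard-library ipaddress module to confirm that the pool sits inside the 192.168.4.0/24 network and to count how many clients can hold a lease at once. The function name dhcp_pool_size is made up for this illustration; it is not part of dnsmasq.

```python
import ipaddress


def dhcp_pool_size(first, last, network):
    """Return the number of addresses in an inclusive DHCP range.

    Raises ValueError if the range is reversed or leaves the network.
    """
    net = ipaddress.ip_network(network)
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    if start not in net or end not in net:
        raise ValueError("DHCP range falls outside the network")
    if end < start:
        raise ValueError("range end is before range start")
    # Both ends of the range are handed out, hence the +1.
    return int(end) - int(start) + 1


# Values taken from the dnsmasq configuration above.
pool = dhcp_pool_size("192.168.4.2", "192.168.4.30", "192.168.4.0/24")
print(pool)
```

If you expect more simultaneous clients than the pool allows, widen the dhcp-range line accordingly.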
Create an Access Point
Next, configure the access point software (hostapd):
$ sudo nano /etc/hostapd/hostapd.conf
Add the following:
# /etc/hostapd/hostapd.conf
interface=wlan0
driver=nl80211
ssid=NameOfNetwork
hw_mode=g
channel=7
wmm_enabled=0
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP
wpa_passphrase=YourNetworkPassword
Add your own network name and network password where it says ssid and wpa_passphrase, respectively.
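Before rebooting, it can help to check the two values you just chose. The sketch below assumes the usual WPA2-PSK limits (an SSID of at most 32 bytes and a passphrase of 8 to 63 printable ASCII characters). The helper check_hostapd_credentials is hypothetical and only mirrors those limits; it does not call hostapd.

```python
def check_hostapd_credentials(ssid, passphrase):
    """Return a list of problems found; an empty list means both look fine."""
    problems = []
    # The SSID limit is counted in bytes, not characters.
    if not 1 <= len(ssid.encode("utf-8")) <= 32:
        problems.append("ssid must be between 1 and 32 bytes long")
    if not 8 <= len(passphrase) <= 63:
        problems.append("wpa_passphrase must be between 8 and 63 characters long")
    # Printable ASCII runs from the space character (32) to the tilde (126).
    if any(not 32 <= ord(ch) <= 126 for ch in passphrase):
        problems.append("wpa_passphrase must contain only printable ASCII characters")
    return problems
```

An empty list means the values should be acceptable; anything else tells you which setting to change before restarting the access point.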
Next, tell the system where to find the hostapd.conf file you just created. Open the hostapd defaults file:
sudo nano /etc/default/hostapd
Find the line with #DAEMON_CONF, and replace it with this:
DAEMON_CONF="/etc/hostapd/hostapd.conf"
Add routing and masquerade
Edit /etc/sysctl.conf and uncomment the line that says:
net.ipv4.ip_forward=1
Add a masquerade for outbound traffic on eth0:
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
Save the new rule:
sudo sh -c "iptables-save > /etc/iptables.ipv4.nat"
Edit /etc/rc.local and add the following above “exit 0” to install the rules at boot:
iptables-restore < /etc/iptables.ipv4.nat
This is important if you decide to share an Internet connection or set up a bridge on the Raspberry Pi later.
The Raspberry Pi should be ready to work as an access point. If you're connected to it directly, now would be a good time to enable SSH. Reboot the Raspberry Pi and test if everything works.
Using a different WiFi enabled device like a phone or laptop, scan for new wireless networks. If everything went smoothly, you should see the WiFi network you created above. Try connecting to it.
3. Clone PyPI
In this section, you will see how to clone PyPI and configure the following packages:
- minirepo
- pypiserver
- nginx
Minirepo
Minirepo is a command-line program that downloads packages from PyPI.org so you can use pip without Internet access. The easiest way to install it is to use pip:
$ pip install minirepo
The first time it’s executed, minirepo will ask you for the local repository path (where it should save downloaded packages), which defaults to ~/minirepo on Linux. A JSON configuration file is created and saved as ~/.minirepo, which you can edit to suit your preferences.
There are a number of alternatives out there for cloning PyPI, but I used minirepo because it allows you to download a selective mirror: only the sources for Python 3, for example. At the time of writing this post, the entire PyPI repository is somewhere in the neighbourhood of 1TB but, by using a selective download, I was able to get it down to 120GB or so. Here's the configuration I used for this project:
{
    "processes": 10,
    "package_types": [
        "bdist_egg",
        "bdist_wheel",
        "sdist"
    ],
    "extensions": [
        "bz2",
        "egg",
        "gz",
        "tgz",
        "whl",
        "zip"
    ],
    "python_versions": [
        "3.0",
        "3.1",
        "3.2",
        "3.3",
        "3.4.10",
        "3.5.7",
        "3.6.9",
        "3.7.2",
        "3.7.3",
        "3.7.4",
        "any",
        "cp27",
        "py2",
        "py2.py3",
        "py27",
        "source"
    ],
    "repository": "/home/pi/minirepo"
}
The configuration above downloads sources for Python 3 and limits the package types to sdist, bdist_wheel and bdist_egg packages. The downside of using this approach is that some packages that don't meet the filter criteria will not get downloaded.
Cloning PyPI takes a long time, so you'll want to leave it running in the background, while you watch a movie or ten depending on your Internet connection speed.
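While the clone is running, you may want to keep an eye on how much of the SD card it has used. The following is a minimal sketch that walks the repository directory and totals the file sizes. It assumes the mirror lives in /home/pi/minirepo as in the configuration above; repo_usage and human_size are illustrative helper names, not part of minirepo.

```python
import os


def repo_usage(repo_dir):
    """Return (file_count, total_bytes) for everything under repo_dir."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(repo_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # The file may have been moved or removed mid-download; skip it.
                continue
            file_count += 1
    return file_count, total_bytes


def human_size(num_bytes):
    """Format a byte count using decimal units (1 GB = 1000**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"
```

On the Pi you could call repo_usage("/home/pi/minirepo") from a Python shell every so often and pass the second value to human_size to see how close you are to filling the card.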
Pypiserver
At this point, you should have PyPI mirrored to your computer. My local PyPI mirror has 200000+ packages.
Before we get to the next step, it is important to take a step back to understand what pip is and how it works.
Pip is the most popular tool for installing Python packages, and the one included with modern versions of Python. It provides the essential core features for finding, downloading, and installing packages from PyPI and other Python package indexes, and can be incorporated into a wide range of development workflows via its command-line interface (CLI).
Pip supports installing packages from:
- PyPI (and other indexes) using requirement specifiers
- VCS project URLs
- Local project directories
- Local or remote source archives
Since you have cloned the PyPI packages to a local repository, pip can install those packages directly from the local PyPI mirror you just downloaded. That is not the purpose of this article, however. The goal here is to allow remote clients to connect to the Raspberry Pi and download packages over the network. This is where pypiserver comes in.
pypiserver will serve up the local package index, which allows pip to find packages in your repository over the network.
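To make "package index" a little more concrete: a simple index is an HTML page with one link per project, and pip follows those links to find files. The sketch below parses such a page with the standard-library HTMLParser and returns the project names. The class and function names are illustrative only, and the sample HTML in the usage note is hand-written rather than copied from pypiserver.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect the link text of every <a> element on an index page."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text that appears inside a link counts as a project name.
        if self._in_anchor:
            self.names[-1] += data


def project_names(index_html):
    """Return the project names listed on a simple-index HTML page."""
    parser = SimpleIndexParser()
    parser.feed(index_html)
    parser.close()
    return [name.strip() for name in parser.names if name.strip()]
```

For example, project_names('<a href="django/">django</a><a href="pytz/">pytz</a>') returns the two names as a list. Once the server is running, you could feed it the page fetched from the /simple/ URL with urllib.request.urlopen to see what your mirror offers.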
First, test to see if it works:
$ pypi-server -p 8080 ~/minirepo & # Will listen to all IPs.
Notice that the command to run it is pypi-server and not pypiserver.
Here, you're starting pypiserver and running it on port 8080. It will find packages in the minirepo folder. This process will keep running in the background until you either kill it or shut down the Raspberry Pi. I will show you how to start it at boot later.
If you visit the static IP you set for the Raspberry Pi at port 8080 in your browser, you should see a message similar to the one below:
You can install from the local packages repository now:
pip install --index-url http://localhost:8080/simple/ PACKAGE
OR, from a client computer:
pip install --index-url http://192.168.4.1:8080/ PACKAGE
If pypiserver is running on a remote host without HTTPS, you will receive an “untrusted” warning from pip, urging you to append the --trusted-host option:
pip --trusted-host 192.168.4.1 install --index-url http://192.168.4.1:8080/ PACKAGE
An even shorter way:
pip --trusted-host 192.168.4.1 install -i http://192.168.4.1:8080/ PACKAGE
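If you script your installs, the two flags above can be derived from a single index URL. Here is a small sketch that builds the argument list in the same order as the commands above; pip_install_args is a hypothetical helper, and it only assembles the command without running it.

```python
from urllib.parse import urlsplit


def pip_install_args(index_url, *packages):
    """Build a pip command that installs packages from a plain-HTTP index."""
    host = urlsplit(index_url).hostname
    if host is None:
        raise ValueError(f"no host name in index URL: {index_url!r}")
    if not packages:
        raise ValueError("at least one package name is required")
    # --trusted-host takes the bare host name, without scheme or path.
    return [
        "pip", "--trusted-host", host,
        "install", "--index-url", index_url,
        *packages,
    ]
```

The returned list can be handed to subprocess.run(args, check=True) on a client that is connected to the Pi's network.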
Always specifying the local PyPI URL and the trusted host flag on the command line can be cumbersome.
If you want to always install packages from your own mirror, create this pip config file in your home directory or in a virtual environment:
[global]
trusted-host = 192.168.4.1

[install]
index-url = http://192.168.4.1:8080
In your home directory:
- On Unix and macOS the file is: $HOME/.pip/pip.conf
- On Windows the file is: %HOME%\pip\pip.ini
In a virtual environment:
- On Unix and macOS the file is: $VIRTUAL_ENV/pip.conf
- On Windows the file is: %VIRTUAL_ENV%\pip.ini
I recommend placing this config file in a virtual environment.
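If you create virtual environments often, writing this file by hand gets repetitive. The sketch below generates the same two sections with the standard-library configparser module and saves them into a virtual environment directory. The function name write_pip_config is an assumption for illustration; the host and URL are the ones used throughout this post.

```python
import configparser
import os


def write_pip_config(env_dir, host, index_url):
    """Write a pip config file into env_dir and return its path."""
    config = configparser.ConfigParser()
    config["global"] = {"trusted-host": host}
    config["install"] = {"index-url": index_url}
    # pip looks for pip.ini on Windows and pip.conf everywhere else.
    filename = "pip.ini" if os.name == "nt" else "pip.conf"
    path = os.path.join(env_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        config.write(handle)
    return path
```

Inside an activated environment you could call write_pip_config(os.environ["VIRTUAL_ENV"], "192.168.4.1", "http://192.168.4.1:8080") so that every pip install in that environment uses the mirror.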
4. Set up a web server to deliver the packages
By default, pypiserver scans the entire packages directory each time an incoming HTTP request occurs. This can cause significant slowdowns when serving a large number of packages, as we are in this instance.
One way to serve the files up faster is to put pypiserver behind a reverse proxy and enable your web server's built-in caching functionality. I'll use nginx in this article but you're free to use any web server you prefer.
Set up a new virtual host in nginx. For the purposes of this article I'll refer to the new virtual host as cheeseshop.com.
Create the file /etc/nginx/sites-available/cheeseshop.com by opening it in an editor:
$ sudo nano /etc/nginx/sites-available/cheeseshop.com
Add the following content:
proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=pypiserver_cache:10m max_size=10g inactive=120m use_temp_path=off;

upstream pypi {
    server 127.0.0.1:8080;
}

server {
    listen 80;
    server_name cheeseshop.com;
    autoindex on;

    location / {
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_cache pypiserver_cache;
        proxy_pass http://pypi;
    }
}
The first part of the config instructs nginx to create a cache of up to 10GB, from which items that have not been requested for 2 hours are removed.
The upstream pypi section points nginx at the pypiserver instance running on port 8080, which is where the content actually comes from. CheeseShop is the secret code name for the Python Package Index, so that's why I named the server that. You can use any name or IP address you like.
The server section specifies that port 80 will be used for incoming HTTP connections and that those requests get forwarded to the pypi server.
I don't own the cheeseshop.com domain but I can use it since we are creating a stand alone network without access to the internet. In order for client computers to be able to connect to cheeseshop.com, you'll want to tell the DNS server how to resolve it. More on that in a bit.
To enable this new virtual host, create a symbolic link to the config file you just created in the /etc/nginx/sites-enabled/ folder:
$ sudo ln -s /etc/nginx/sites-available/cheeseshop.com /etc/nginx/sites-enabled/
Doing this will enable the new virtual host. Check that everything works by running:
sudo nginx -t
If everything checks out, great! Next you want to make a small DNS change to map the cheeseshop.com domain to an IP address.
Open /etc/hosts and add an entry for the newly created cheeseshop.com domain:
192.168.4.1 cheeseshop.com
The hosts file contains domain to IP address mappings that help the computer serve you the right content. Dnsmasq will check this file whenever it starts up so it is a good idea to restart it:
sudo service dnsmasq restart
Restart nginx too for good measure:
sudo service nginx restart
Assuming everything went smoothly, you should be able to install Python packages from client computers using a hostname as opposed to using an IP now.
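If pip later complains that it cannot find the host, it helps to confirm from a client that the name resolves to the Pi. A minimal sketch using the standard-library socket module follows; resolves_to is an illustrative name, and the expected address assumes the static IP configured earlier.

```python
import socket


def resolves_to(hostname, expected_ip):
    """Return True if hostname resolves to expected_ip (IPv4 only)."""
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except socket.gaierror:
        # The name could not be resolved at all.
        return False
```

On a laptop connected to the Pi's WiFi network, resolves_to("cheeseshop.com", "192.168.4.1") should return True. If it returns False, check the /etc/hosts entry on the Pi and restart dnsmasq.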
Using the server
To test this out, connect to the Raspberry Pi's WiFi network, create a new virtual environment on a client computer and run the following command inside it:
pip --trusted-host cheeseshop.com install -i http://cheeseshop.com django
Running that command produces the following output:
$ pip --trusted-host cheeseshop.com install -i http://cheeseshop.com django
Looking in indexes: http://cheeseshop.com
Collecting django
  Downloading http://cheeseshop.com:80/packages/Django-3.0.1.tar.gz (9.0 MB)
     |████████████████████████████████| 9.0 MB 1.1 MB/s
Collecting pytz
  Downloading http://cheeseshop.com:80/packages/pytz-2019.3-py2.py3-none-any.whl (509 kB)
     |████████████████████████████████| 509 kB 1.3 MB/s
Collecting sqlparse>=0.2.2
  Downloading http://cheeseshop.com:80/packages/sqlparse-0.3.0-py2.py3-none-any.whl (39 kB)
Collecting asgiref~=3.2
  Downloading http://cheeseshop.com:80/packages/asgiref-3.2.3-py2.py3-none-any.whl (18 kB)
Building wheels for collected packages: django
  Building wheel for django (setup.py) ... done
  Created wheel for django: filename=Django-3.0.1-py3-none-any.whl size=7428296 sha256=b31336b1249afbdbb2374912f6983179f4715127d7e6b842a8455a94a1518ce5
  Stored in directory: /home/terra/.cache/pip/wheels/6f/55/5c/aca7917f1899fbb7430677d9d6ef7c6be748c412dec3e63c04
Successfully built django
Installing collected packages: pytz, sqlparse, asgiref, django
Successfully installed asgiref-3.2.3 django-3.0.1 pytz-2019.3 sqlparse-0.3.0
Starting pypiserver at boot (Optional)
To ensure that pypiserver starts up automatically at boot, create a new Linux service and use systemd to manage it.
1. Create a startup script that the service will manage and call it start-pypi-server.sh. Add the following content to it:
#! /bin/bash
/home/pi/.local/bin/pypi-server -p 8080 /home/pi/minirepo/ &
2. Copy the script to /usr/bin and make it executable:
sudo cp start-pypi-server.sh /usr/bin/start-pypi-server.sh
sudo chmod +x /usr/bin/start-pypi-server.sh
3. Create a unit file to define a systemd service. Name it pypiserver.service:
[Unit]
Description=A minimal PyPI server for use with pip/easy_install.

[Service]
Type=forking
ExecStart=/bin/bash /usr/bin/start-pypi-server.sh
User=pi

[Install]
WantedBy=multi-user.target
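This defines a basic service. The ExecStart directive specifies the command that will be run to start the service. Type=forking tells systemd that the start command puts the server in the background and then exits, which is what the trailing & in the script does.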
4. Copy the unit file to /etc/systemd/system and set its permissions:
sudo cp pypiserver.service /etc/systemd/system/pypiserver.service
sudo chmod 644 /etc/systemd/system/pypiserver.service
Start and Enable the Service
1. Once you have created the unit file, start the service to test it:
sudo systemctl start pypiserver
2. Check the status of the pypiserver service:
sudo systemctl status pypiserver
This will produce output similar to this:
$ sudo systemctl status pypiserver
● pypiserver.service - A minimal PyPI server for use with pip/easy_install.
   Loaded: loaded (/etc/systemd/system/pypiserver.service; enabled; vendor prese
   Active: active (running) since Fri 2020-02-07 19:17:05 CAT; 2h 19min ago
  Process: 420 ExecStart=/bin/bash /usr/bin/start-pypi-server.sh (code=exited, s
 Main PID: 441 (pypi-server)
    Tasks: 4 (limit: 4915)
   Memory: 408.0M
   CGroup: /system.slice/pypiserver.service
           └─441 /usr/bin/python /home/pi/.local/bin/pypi-server -p 8080 /home/p
Feb 07 19:17:05 raspberrypi systemd[1]: Starting A minimal PyPI server for use w
Feb 07 19:17:05 raspberrypi systemd[1]: Started A minimal PyPI server for use wi
3. To stop or restart the service:
sudo systemctl stop pypiserver
sudo systemctl restart pypiserver
4. Finally, use the enable command to ensure that the service starts whenever the system boots:
sudo systemctl enable pypiserver
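After a reboot, a quick way to confirm that the service really came back up is to check whether something is accepting connections on port 8080. The sketch below does that with a plain TCP connection. The helper name is_listening is illustrative, and it only shows that the port is open, not that pypiserver is the program behind it.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run is_listening("127.0.0.1", 8080) on the Pi itself, or is_listening("192.168.4.1", 8080) from a connected client. If it returns False, sudo systemctl status pypiserver is the next place to look.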
Conclusion
You have seen how to create your own local PyPI clone on a Raspberry Pi. You learned how to:
- Set up a Raspberry Pi as an access point
- Set up the Raspberry Pi as a DHCP and DNS server
- Clone PyPI
- Use a web server to serve up the cloned packages
I did this as a proof of concept to show that it is possible to run something like PyPI offline. I am sure there are better or more efficient ways I could have done this. Please leave a comment below with any suggestions or criticism. Thanks for reading.
References:
- https://www.linode.com/docs/applications/project-management/how-to-create-a-private-python-package-repository/
- https://www.linode.com/docs/quick-answers/linux/start-service-at-boot/
- https://pypi.org/project/pypiserver/
- https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md
- https://www.digitalocean.com/community/tutorials/how-to-set-up-nginx-server-blocks-virtual-hosts-on-ubuntu-16-04
- https://serverfault.com/questions/136332/setting-up-dnsmasq-for-a-local-network