How to Install Airflow with Docker on Ubuntu
In this tutorial, we will demonstrate how to install Airflow using Docker. Docker is an open platform for developing and running applications. It lets us separate our applications from our infrastructure so we can deliver software faster, and it lets us manage that infrastructure the same way we manage our applications.
Now, let's look at Airflow. Apache Airflow is a brilliant tool used by many companies to define and schedule their complex data pipelines. It lets us programmatically schedule and monitor the workflows for our different jobs, and it is widely used by data scientists, data engineers, software engineers, and many more.
We'll use a step-by-step process to show how to carry out this installation.
Step 1: Install Docker Engine
The first stage of our installation is installing Docker itself on our machine. We can check whether Docker is already installed by running docker --version. If it is not, the steps below walk through how to install it.
Based on Docker's official website, these are the steps for installing Docker:
Install using the repository
sudo apt-get update

sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
Add Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Set up the stable repository. You can use the nightly or test repository instead by replacing the word stable with nightly or test in the command below:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Now, we can install the latest version of Docker Engine, containerd, and Docker Compose by using the command below.
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
We can verify that Docker has been installed correctly on our machine by running the hello-world image:
sudo docker run hello-world
So, we've installed Docker on our machine. We must also note that we need Docker Compose before proceeding to Airflow; conveniently, the Compose plugin was installed along with the Engine in the previous step.
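As a quick sanity check, you can confirm that Compose is available. Note that the plugin is invoked as docker compose (with a space); if the docker-compose form used later in this tutorial is not found on your machine, substituting docker compose should work the same way.

# confirm the Compose plugin was installed (the version output will differ)
docker compose version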
Step 2: Working with VS Code
Let's open Visual Studio Code, or any other IDE we have, and create a new folder for our project. We can name the folder airflow-docker.
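If you prefer the terminal, the same project folder can be created and entered like this:

# create the project folder and move into it
mkdir airflow-docker
cd airflow-docker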
Now we need to download a docker-compose file that describes all the services required by Airflow. This Docker Compose file has already been made for us by the Airflow community, and we can run the command below to download it into our working directory.
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.3.0/docker-compose.yaml'
Now we should have the docker-compose YAML in our working directory. Let's also create new folders for dags, plugins, and logs. The dags folder is where our Python files will reside. Your project directory should then contain the docker-compose.yaml file alongside these three folders.
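From inside the project folder, the three directories can be created in one go:

# create the folders Airflow will mount into its containers
mkdir -p ./dags ./logs ./plugins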
Whenever you open the docker-compose file, you will see the services Airflow needs, such as postgres, redis, the webserver, the scheduler, and the worker.
Under the services, you can see that the Postgres user and password are both named airflow, and the airflow-init service creates a web account with the same defaults. We'll use airflow as both the username and password when logging into the webserver in the web browser.
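For reference, here is an abridged excerpt of those settings as they appear in the official file (your downloaded copy may differ slightly):

# abridged excerpt from the downloaded docker-compose.yaml
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
  airflow-init:
    environment:
      # web UI login defaults to airflow / airflow unless overridden
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}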
Step 3: Export Environment Variables
Here, we also need to export our environment variables to ensure the user and group permissions are the same for the folders on the host and the folders in our container. Run the command below in the terminal in VS Code:
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
After you run the command, you should have a .env file in your directory.
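You can quickly confirm the file's contents (the UID value will differ per user):

# inspect the generated .env file
cat .env
# expected output, with your own user id:
# AIRFLOW_UID=1000
# AIRFLOW_GID=0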
Step 4: Initialize Airflow Instance
Now that we are done with all our settings, we can initialize our Airflow instance using the command below. This will create a user with both username and password set to airflow, based on the settings in the docker-compose file.
docker-compose up airflow-init
Once initialization finishes, the output should report that the airflow admin account was created and end with airflow-init exiting with code 0.
The next thing is to run all the services we specified in the docker-compose file (redis, scheduler, worker, webserver, etc.) so that our containers come up and start running.
We will use the command below;
docker-compose up
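Optionally, you can add the -d flag to run the services in the background, then list the running containers:

# run all services detached, then confirm the containers are up
docker-compose up -d
docker ps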
If you get any error relating to permissions, prefix each command with sudo. This happens because your user has not been added to the docker group.
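Alternatively, you can add your user to the docker group once, so sudo is no longer needed (log out and back in for the change to take effect):

# add the current user to the docker group
sudo usermod -aG docker $USER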
Now, we can check that our Airflow instance is running by navigating to localhost:8080 in the web browser.
We should see the Airflow sign-in page; log in with airflow as both the username and password.
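You can also check the webserver from the terminal; Airflow exposes a /health endpoint that reports the status of the metadatabase and scheduler:

# query the webserver health endpoint (assumes the default 8080 port mapping)
curl -s http://localhost:8080/health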