Apache Airflow - installation, customization and basic settings
- bdata3
- Jan 4, 2022
- 1 min read
To install Airflow, use docker-compose (see the previous post for how to install it on EC2). Download the official docker-compose.yaml with:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'
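On Linux you may also need to create the mounted folders and set the host user id before the first run, as recommended by the official quick-start (a standard preparation step, shown here for convenience):
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env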
To add packages such as awswrangler, or to add Airflow capabilities such as Amazon support, create a Dockerfile (in the same directory as the docker-compose.yaml):
FROM apache/airflow:2.2.3-python3.8
RUN pip install awswrangler && pip install apache-airflow-providers-amazon && pip install apache-airflow-providers-papermill && pip install ipykernel
Build your Airflow image:
docker build . -f Dockerfile --tag my-airflow:0.0.1
Replace the image reference in the docker-compose.yaml with your image:
image: ${AIRFLOW_IMAGE_NAME:-my-airflow:0.0.1}
Initialize and start Airflow:
docker-compose up airflow-init
docker-compose up
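Once the containers are up you can verify that everything started correctly; with the unmodified official docker-compose.yaml the webserver is exposed on port 8080 and airflow-init creates a default airflow/airflow user (unless you overrode those values):
docker-compose ps        # the containers should report (healthy)
# then open http://localhost:8080 and log in (default: airflow / airflow)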
To import your own Python code from within a DAG, you may need to add (besides an __init__.py file) the following lines:
import sys, os
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
# example of a function to be used
from code.test_1 import hello
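For instance, a minimal DAG that calls the imported hello() function could look like the sketch below (the dag_id, the schedule and the code/test_1.py module layout are assumptions for illustration, not taken from the original post):
# minimal sketch: a DAG file in /opt/airflow/dags that imports hello()
# from a local code/test_1.py module (hypothetical layout)
import sys, os
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
from code.test_1 import hello  # resolved thanks to the sys.path insert above

with DAG(dag_id="hello_example",
         start_date=datetime(2022, 1, 1),
         schedule_interval=None,
         catchup=False) as dag:
    say_hello = PythonOperator(task_id="say_hello", python_callable=hello)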
If you want to add a list of tasks in a loop (note the pool, used to limit concurrency):
from airflow.operators.bash import BashOperator

# store and process are tasks defined earlier in the DAG
l_sleep = [BashOperator(task_id=f"sleep_{si}",
                        bash_command=f"/opt/airflow/dags/tst_sleep.py -t {si}",
                        pool='my_sleep_pool', task_concurrency=3)
           for si in range(10)]
store >> process
for myt in l_sleep:
    process >> myt
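The tst_sleep.py script referenced by the BashOperator is not shown here; a hypothetical sketch of such a script (parsing -t and sleeping that many seconds) could be:
#!/usr/bin/env python
# hypothetical /opt/airflow/dags/tst_sleep.py - sleeps for -t seconds
# (the file must be executable, e.g. chmod +x, since the BashOperator calls it directly)
import argparse
import time

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", type=int, default=0, help="seconds to sleep")
    args = parser.parse_args()
    time.sleep(args.t)
    print(f"slept for {args.t} seconds")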
To limit concurrent execution, create a pool with a limited number of slots in the Admin UI (Admin > Pools), as used above with my_sleep_pool.
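If you prefer not to use the UI, the same pool can also be created with the Airflow CLI from inside one of the running containers (the service name airflow-scheduler, the slot count and the description are just examples):
docker-compose exec airflow-scheduler airflow pools set my_sleep_pool 3 "pool for the sleep tasks"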
You can see some examples of DAGs in: