
Apache Airflow - installation, changes and basic settings

To install, use docker-compose (see the previous post for how to install it on EC2). Download the docker-compose.yaml from the official Airflow documentation:

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'

To add packages such as awswrangler, or extra Airflow capabilities such as Amazon support:

  • Create a Dockerfile (in the same directory as the docker-compose.yaml):

FROM apache/airflow:2.2.3-python3.8
RUN pip install awswrangler apache-airflow-providers-amazon apache-airflow-providers-papermill ipykernel
  • Build your Airflow image:

docker build . -f Dockerfile --tag my-airflow:0.0.1
  • Replace the image reference in the docker-compose.yaml with your image:

 image: ${AIRFLOW_IMAGE_NAME:-my-airflow:0.0.1}
  • Initialize and start Airflow:

docker-compose up airflow-init
docker-compose up
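
Once the stack is up, one quick way to check that the extra packages are actually available to the workers is a small test DAG dropped into the dags folder. This is only a sketch: the dag_id, task_id and bucket path below are placeholders, and it assumes AWS credentials are reachable from the containers.

import awswrangler as wr
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def list_s3():
    # placeholder bucket - replace with one your credentials can read
    print(wr.s3.list_objects("s3://my-example-bucket/"))

with DAG(dag_id="check_extra_packages", start_date=datetime(2022, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    PythonOperator(task_id="list_s3", python_callable=list_s3)

If the task runs and prints the object list, both awswrangler and the AWS credentials are working inside the image.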


To import your own Python code from a DAG, you may need to add (besides an __init__.py file) the following lines:

import sys, os
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
# example of a function to be used
from code.test_1 import hello
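
For example, assuming dags/code/test_1.py defines hello(), a DAG file in the dags folder could call it like this (the dag_id and task_id here are just illustrative):

import sys, os
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# make the dags folder importable, as above
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
from code.test_1 import hello

with DAG(dag_id="use_my_code", start_date=datetime(2022, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    PythonOperator(task_id="say_hello", python_callable=hello)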

If you want to add a list of tasks in a loop (note the pool, used to limit concurrency):


l_sleep = []
for si in range(10):
    l_sleep.append(BashOperator(task_id=f"sleep_{si}",
                                bash_command=f"/opt/airflow/dags/tst_sleep.py -t {si}",
                                pool='my_sleep_pool', task_concurrency=3))

store >> process
for myt in l_sleep:
    process >> myt
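
The tst_sleep.py script itself is not shown in the post; a minimal sketch of what it might look like is below (the -t flag matches the bash_command above, and the file must be executable since it is called directly):

#!/usr/bin/env python
# tst_sleep.py - sleep for the requested number of seconds
import argparse, time

parser = argparse.ArgumentParser()
parser.add_argument("-t", type=int, default=0, help="seconds to sleep")
args = parser.parse_args()
time.sleep(args.t)
print(f"slept for {args.t} seconds")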

To limit concurrent execution, create a pool with a limited number of slots in the Airflow UI (Admin -> Pools), such as the 'my_sleep_pool' used above.
You can see some examples of DAGs in: