From bbcc9e330cf00ccd0c4d0e96af1b6a61c2603209 Mon Sep 17 00:00:00 2001
From: Pankaj Koti
Date: Wed, 19 Feb 2025 16:15:07 +0530
Subject: [PATCH] Document async_py_requirements added in ExecutionConfig for
 ExecutionMode.AIRFLOW_ASYNC (#1545)

related: #1533
related: #1544
---
 docs/configuration/execution-config.rst  | 1 +
 docs/getting_started/execution-modes.rst | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/docs/configuration/execution-config.rst b/docs/configuration/execution-config.rst
index 249a5bbc3..dce093367 100644
--- a/docs/configuration/execution-config.rst
+++ b/docs/configuration/execution-config.rst
@@ -12,3 +12,4 @@ The ``ExecutionConfig`` class takes the following arguments:
 - ``dbt_executable_path``: The path to the dbt executable for dag generation. Defaults to dbt if available on the path.
 - ``dbt_project_path``: Configures the dbt project location accessible at runtime for dag execution. This is the project path in a docker container for ``ExecutionMode.DOCKER`` or ``ExecutionMode.KUBERNETES``. Mutually exclusive with ``ProjectConfig.dbt_project_path``.
 - ``virtualenv_dir`` (new in v1.6): Directory path to locate the (cached) virtual env that should be used for execution when execution mode is set to ``ExecutionMode.VIRTUALENV``.
+- ``async_py_requirements`` (new in v1.9): A list of Python packages to install when ``ExecutionMode.AIRFLOW_ASYNC`` (Experimental) is used. This parameter is required only when ``enable_setup_async_task`` and ``enable_teardown_async_task`` are set to ``True``. Example value: ``["dbt-postgres==1.9.0"]``.
diff --git a/docs/getting_started/execution-modes.rst b/docs/getting_started/execution-modes.rst
index 835f7b7a7..0910dcb34 100644
--- a/docs/getting_started/execution-modes.rst
+++ b/docs/getting_started/execution-modes.rst
@@ -304,7 +304,11 @@ You may observe that the compile task takes a bit longer to run due to the laten
 remotely (e.g. for the classic ``jaffle_shop`` dbt project, upon compiling it produces about 31 files measuring about
 124KB in total, but on a local machine it took approximately 25 seconds for the task to compile & upload the compiled
 SQLs to the remote path). However, it is still a win as it is a one-time overhead and the subsequent tasks run
 asynchronously, utilising Airflow's
-deferrable operators and supplying to them those compiled SQLs.
+deferrable operators and supplying to them those compiled SQLs. With this setup task, model tasks no longer require dbt
+to be available or installed, eliminating the need to install dbt adapters in the same environment as the Airflow
+installation. However, the virtual environment created during execution of the ``SetupAsyncOperator`` must install
+the necessary dbt adapter for the setup task to function correctly. This can be achieved by specifying the required
+dbt adapter in the ``async_py_requirements`` parameter within the ``ExecutionConfig`` of your ``DbtDag`` or ``DbtTaskGroup``.
 
 Note that currently, the ``airflow_async`` execution mode has the following limitations and is released as **Experimental**:
@@ -312,7 +316,8 @@ Note that currently, the ``airflow_async`` execution mode has the following limi
 2. **Limited to dbt models**: Only dbt resource type models are run asynchronously using Airflow deferrable operators. Other resource types are executed synchronously, similar to the local execution mode.
 3. **BigQuery support only**: This mode only supports BigQuery as the target database. If a different target is specified, Cosmos will throw an error indicating the target database is unsupported in this mode.
 4. **ProfileMapping parameter required**: You need to specify the ``ProfileMapping`` parameter in the ``ProfileConfig`` for your DAG. Refer to the example DAG below for details on setting this parameter.
-6. **location parameter required**: You must specify the location of the BigQuery dataset in the ``operator_args`` of the ``DbtDag`` or ``DbtTaskGroup``. The example DAG below provides guidance on this.
+5. **location parameter required**: You must specify the location of the BigQuery dataset in the ``operator_args`` of the ``DbtDag`` or ``DbtTaskGroup``. The example DAG below provides guidance on this.
+6. **async_py_requirements parameter required**: If you're using the default approach of having a setup task, you must specify the dbt adapter Python requirements for your profile type in the ``ExecutionConfig`` of your ``DbtDag`` or ``DbtTaskGroup``. The example DAG below provides guidance on this.
 
 To start leveraging the async execution mode, which is currently supported for BigQuery profile type targets, you need
 to install Cosmos with the below additional dependencies:
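For context, the snippet below is a minimal sketch of how the documented ``async_py_requirements`` parameter fits together with the other requirements listed above (ProfileMapping, ``location``). The project path, connection ID, GCP project, dataset, location, dag_id, and the pinned adapter version are illustrative placeholders, not values taken from this patch:

# Illustrative sketch only: "my_bigquery_conn", the project/dataset, and the
# dbt-bigquery pin are placeholders; adjust them to your own environment.
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ExecutionMode, ProfileConfig, ProjectConfig
from cosmos.profiles import GoogleCloudServiceAccountDictProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    # A ProfileMapping is required for ExecutionMode.AIRFLOW_ASYNC (limitation 4).
    profile_mapping=GoogleCloudServiceAccountDictProfileMapping(
        conn_id="my_bigquery_conn",
        profile_args={"project": "my-gcp-project", "dataset": "my_dataset"},
    ),
)

simple_dag_async = DbtDag(
    project_config=ProjectConfig("/usr/local/airflow/dbt/jaffle_shop"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.AIRFLOW_ASYNC,
        # dbt adapter installed into the setup task's virtualenv (limitation 6);
        # the exact package and pin should match your profile type.
        async_py_requirements=["dbt-bigquery==1.9.0"],
    ),
    # The BigQuery dataset location must be passed via operator_args (limitation 5).
    operator_args={"location": "US"},
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    dag_id="simple_dag_async",
)

Since this mode currently supports only BigQuery targets, the adapter passed via ``async_py_requirements`` would typically be ``dbt-bigquery``, matching the profile type used in the ``ProfileConfig``.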