Regression tests can be run either in Docker, using docker-compose to orchestrate the tests, or locally.
It is recommended to clean the `regtests/output` directory before running tests. This can be done by running:

```shell
rm -rf ./regtests/output && mkdir -p ./regtests/output && chmod -R 777 ./regtests/output
```
Tests can be run with docker-compose using the provided `./regtests/docker-compose.yml` file, as follows:

```shell
./gradlew :polaris-quarkus-server:assemble -Dquarkus.container-image.build=true
docker compose -f ./regtests/docker-compose.yml up --build --exit-code-from regtest
```
In this setup, a Polaris container will be started in a docker-compose group, using the image previously built by the Gradle build. Then another container, including a Spark SQL shell, will run the tests. The exit code will be the same as the exit code of the Spark container.
This is the flow used in CI and should be done locally before pushing to GitHub to ensure that no environmental factors contribute to the outcome of the tests.
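Since the exit code is propagated from the regtest container, you can script around it; a minimal sketch (the explicit teardown step is an assumption, not something the harness requires):

```shell
# Run the suite; --exit-code-from makes `docker compose` exit with the
# regtest container's exit code
docker compose -f ./regtests/docker-compose.yml up --build --exit-code-from regtest
status=$?

# Tear down the compose group afterwards (optional)
docker compose -f ./regtests/docker-compose.yml down

exit $status
```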
Important: if you are also using minikube, for example to test the Helm chart, you may need to unset the Docker environment that was pointing to the Minikube Docker daemon; otherwise, the image will be built by the Minikube Docker daemon and will not be available to the local Docker daemon. To do so, run the following before building the image and running the tests:

```shell
eval $(minikube -p minikube docker-env --unset)
```
Regression tests can also be run locally, using the test harness. In this setup, a Polaris server must be running on localhost:8181 before running the tests. The simplest way to do this is to run the Polaris server in a separate terminal window:

```shell
./gradlew run
```
Note: the regression tests expect Polaris to run with certain options, e.g. with support for `FILE` storage, default realm `POLARIS`, and root credentials `root:secret`; if you run the above command, this will be the case. If you run Polaris in a different way, make sure that Polaris is configured appropriately.
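Before launching the harness, you can verify that the server is up and accepting the expected credentials. A minimal smoke test, assuming the defaults above and the standard Iceberg REST OAuth endpoint that Polaris serves under `/api/catalog` (adjust the path if your deployment differs):

```shell
# Request a token using the root credentials; a JSON response containing
# "access_token" means the server is up and configured as expected
curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials' \
  -d 'client_id=root' \
  -d 'client_secret=secret' \
  -d 'scope=PRINCIPAL_ROLE:ALL'
```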
Running the test harness will automatically run the idempotent setup script. From the root of the project, just run:

```shell
env POLARIS_HOST=localhost ./regtests/run.sh
```
To run the tests in verbose mode, with test stdout printing to the console, set the `VERBOSE` environment variable to `1`. You can also choose to run only a subset of tests by specifying the test directories as arguments to `run.sh`. For example, to run only the `t_spark_sql` tests in verbose mode:

```shell
env VERBOSE=1 POLARIS_HOST=localhost ./regtests/run.sh t_spark_sql/src/spark_sql_basic.sh
```
Several tests require access to cloud resources, such as S3, GCS, or Azure storage. To run these tests, you must export the appropriate environment variables prior to running the tests. Each cloud can be enabled independently. Create a `.env` file that contains the following variables:
```shell
# AWS variables
AWS_TEST_ENABLED=true
AWS_ACCESS_KEY_ID=<your_access_key>
AWS_SECRET_ACCESS_KEY=<your_secret_key>
AWS_STORAGE_BUCKET=<your_s3_bucket>
AWS_ROLE_ARN=<iam_role_with_access_to_bucket>
AWS_TEST_BASE=s3://<your_s3_bucket>/<any_path>

# GCP variables
GCS_TEST_ENABLED=true
GCS_TEST_BASE=gs://<your_gcs_bucket>
GOOGLE_APPLICATION_CREDENTIALS=/tmp/credentials/<your_credentials.json>

# Azure variables
AZURE_TEST_ENABLED=true
AZURE_TENANT_ID=<your_tenant_id>
AZURE_DFS_TEST_BASE=abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<any_path>
AZURE_BLOB_TEST_BASE=abfss://<container-name>@<storage-account-name>.blob.core.windows.net/<any_path>
```
The file referenced by `GOOGLE_APPLICATION_CREDENTIALS` must be mounted to the container volumes. Copy your credentials file into the `credentials` folder, then specify the name of the file in your `.env` file. Do not change the path: `/tmp/credentials` is the folder on the container where the credentials file will be mounted.
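docker-compose picks up the `.env` file automatically; when running the harness locally instead, you can export the same variables into your shell first. A minimal sketch, assuming the file lives at `./regtests/.env`:

```shell
# Export every variable defined in the .env file, then run the harness
set -a
source ./regtests/.env
set +a
env POLARIS_HOST=localhost ./regtests/run.sh
```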
If a test fails due to incorrect expected output, the test harness will generate a script to help you compare the actual output with the expected output. The script will be located in the output directory and will have the same name as the test, with the extension `.fixdiffs.sh`. For example, if the test `t_hello_world` fails, the script to compare the actual and expected output will be located at `output/t_hello_world/hello_world.sh.fixdiffs.sh`:
```
Tue Apr 23 06:32:23 UTC 2024: Running all tests
Tue Apr 23 06:32:23 UTC 2024: Starting test t_hello_world:hello_world.sh
Tue Apr 23 06:32:23 UTC 2024: Test run concluded for t_hello_world:hello_world.sh
Tue Apr 23 06:32:23 UTC 2024: Test FAILED: t_hello_world:hello_world.sh
Tue Apr 23 06:32:23 UTC 2024: To compare and fix diffs: /tmp/polaris-regtests/t_hello_world/hello_world.sh.fixdiffs.sh
Tue Apr 23 06:32:23 UTC 2024: Starting test t_spark_sql:spark_sql_basic.sh
Tue Apr 23 06:32:32 UTC 2024: Test run concluded for t_spark_sql:spark_sql_basic.sh
Tue Apr 23 06:32:32 UTC 2024: Test SUCCEEDED: t_spark_sql:spark_sql_basic.sh
```
Simply execute the specified `fixdiffs.sh` file, which will in turn run `meld` and fix the ref file:

```shell
/tmp/polaris-regtests/t_hello_world/hello_world.sh.fixdiffs.sh
```

Then commit the changes to the ref file.
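For example, after reconciling the diff in `meld`, the updated expected output is committed like any other change; a sketch, assuming the ref file for this test lives at `regtests/t_hello_world/ref/hello_world.ref` (check the actual path in your checkout):

```shell
# Stage and commit the regenerated expected output (path is an assumption)
git add regtests/t_hello_world/ref/hello_world.ref
git commit -m "Update expected output for t_hello_world"
```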
With a Polaris server running, you can run a spark-sql interactive shell to test. From the root of the project:

```shell
env POLARIS_HOST=localhost ./regtests/run_spark_sql.sh
```
Some SQL commands that you can try:

```sql
create database db1;
show databases;
create table db1.table1 (id int, name string);
insert into db1.table1 values (1, 'a');
select * from db1.table1;
```
Other commands are available in the `regtests/t_spark_sql/src` directory.
Python tests are based on `pytest`. They rely on a Python Polaris client, which is generated from the OpenAPI spec. The client can be generated using two commands:
```shell
# generate the management api client
docker run --rm \
  -v ${PWD}:/local openapitools/openapi-generator-cli generate \
  -i /local/spec/polaris-management-service.yml \
  -g python \
  -o /local/regtests/client/python --additional-properties=packageName=polaris.management --additional-properties=apiNamePrefix=polaris

# generate the iceberg rest client
docker run --rm \
  -v ${PWD}:/local openapitools/openapi-generator-cli generate \
  -i /local/spec/rest-catalog-open-api.yaml \
  -g python \
  -o /local/regtests/client/python --additional-properties=packageName=polaris.catalog --additional-properties=apiNameSuffix="" --additional-properties=apiNamePrefix=Iceberg
```
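To sanity-check the generation step, you can confirm that the expected package directories exist; a minimal sketch, assuming the output paths implied by the `-o` and `packageName` options above:

```shell
# The packageName options above should produce these two package trees
ls regtests/client/python/polaris/management
ls regtests/client/python/polaris/catalog
```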
Tests rely on Python 3.8 or higher. `pyenv` can be used to install a current version and map it to the local directory:

```shell
pyenv install 3.8
pyenv local 3.8
```
Once you've done that, you can run `setup.sh` to generate a Python virtual environment (installed at `~/polaris-venv`) and download all of the test dependencies into it. From here, `run.sh` will be able to execute any pytest present.
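If you prefer to invoke a pytest suite directly rather than through `run.sh`, a sketch, assuming the `~/polaris-venv` virtual environment created by `setup.sh` (the test directory below is a hypothetical example):

```shell
# Activate the virtual environment created by setup.sh, then run one suite
source ~/polaris-venv/bin/activate
python -m pytest regtests/t_cli/src -v   # hypothetical test directory
```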
To debug, set up IntelliJ to point at your virtual environment to find your test dependencies (see https://www.jetbrains.com/help/idea/configuring-python-sdk.html), then run the test in your IDE.
The above is handled automatically when running the regression tests from the Docker image.