Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distance metric small fixes: uninitialized value in IVF_PQ and pass setting during consolidate_updates() #508

Merged
merged 6 commits into from
Sep 3, 2024

Conversation

jparismorgan
Copy link
Contributor

@jparismorgan jparismorgan commented Sep 2, 2024

What

Makes a few distance metric fixes:

  1. Initializes distance_metric_ in IVF_PQ so that we don't get undefined behavior if you don't set it.
  2. Passes the distance_metric to the new index when we consolidate_updates(). When we do this, we start crashing with:
[create_array] size 0 dimensions 4 vector_type float32 array_name normalized_vectors
FE

========================================= ERRORS ==========================================
________________________ ERROR at teardown of test_ivf_flat_index _________________________

capfd = <_pytest.capture.CaptureFixture object at 0x14fef0820>

    @pytest.fixture(scope="function", autouse=True)
    def no_output(capfd):
        # Wait for the test to finish.
        yield
    
        # Flush stdout.
        libc = ctypes.CDLL(None)
        libc.fflush(None)
    
        # Fail if there is any output.
        out, err = capfd.readouterr()
        if out or err:
>           pytest.fail(
                f"Test failed because output was captured. out:\n{out}\nerr:\n{err}"
            )
E           Failed: Test failed because output was captured. out:
E           [create_array] size 0 dimensions 4 vector_type float32 array_name normalized_vectors
E           
E           err:

apis/python/test/conftest.py:24: Failed
======================================== FAILURES =========================================
___________________________________ test_ivf_flat_index ___________________________________

capfd = <_pytest.capture.CaptureFixture object at 0x14fef0820>
tmp_path = PosixPath('/private/var/folders/jb/5gq49wh97wn0j7hj6zfn9pzh0000gn/T/pytest-of-parismorgan/pytest-50/test_ivf_flat_index0')

    def test_ivf_flat_index(capfd, tmp_path):
        partitions = 10
        uri = os.path.join(tmp_path, "array")
        vector_type = np.dtype(np.float32)
        index = ivf_flat_index.create(
            uri=uri,
            dimensions=4,
            vector_type=vector_type,
            partitions=partitions,
            distance_metric=vspy.DistanceMetric.COSINE,
        )
    
        ingestion_timestamps, base_sizes = load_metadata(uri)
        assert base_sizes == [0]
        assert ingestion_timestamps == [0]
    
        query_and_check(
            index,
            np.array([[2, 2, 2, 2]], dtype=np.float32),
            3,
            {MAX_UINT64},
            nprobe=partitions,
        )
    
        update_vectors = np.empty([5], dtype=object)
        update_vectors[0] = np.array([1, 0, 0, 0], dtype=vector_type)
        update_vectors[1] = np.array([1, 1, 0, 0], dtype=vector_type)
        update_vectors[2] = np.array([32, 41, 30, 0], dtype=vector_type)
        update_vectors[3] = np.array([1, 5, 3, 0], dtype=vector_type)
        update_vectors[4] = np.array([4, 4, 4, 0], dtype=vector_type)
        index.update_batch(vectors=update_vectors, external_ids=np.array([0, 1, 2, 3, 4]))
    
        expected_distances = np.array([0.500000, 0.292893, 0.142262, 0.239361, 0.133975])
    
        query_and_check(
            index,
            np.array([[2, 2, 2, 2]], dtype=np.float32),
            3,
            {2, 3, 4},
            expected_distances=expected_distances,
            nprobe=partitions,
        )
    
>       index = index.consolidate_updates()

apis/python/test/test_distance_metrics.py:172: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/homebrew/anaconda3/envs/TileDB-Vector-Search-2/lib/python3.9/site-packages/tiledb/vector_search/index.py:535: in consolidate_updates
    new_index = ingest(
/opt/homebrew/anaconda3/envs/TileDB-Vector-Search-2/lib/python3.9/site-packages/tiledb/vector_search/ingestion.py:3086: in ingest
    d = create_ingestion_dag(
/opt/homebrew/anaconda3/envs/TileDB-Vector-Search-2/lib/python3.9/site-packages/tiledb/vector_search/ingestion.py:2406: in create_ingestion_dag
    normalized_uri = create_array(
/opt/homebrew/anaconda3/envs/TileDB-Vector-Search-2/lib/python3.9/site-packages/tiledb/vector_search/ingestion.py:503: in create_array
    input_vectors_array_cols_dim = tiledb.Dim(
/opt/homebrew/anaconda3/envs/TileDB-Vector-Search-2/lib/python3.9/site-packages/tiledb/dimension.py:79: in __init__
    super().__init__(ctx, name, tiledb_type, domain, tile)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[AttributeError("'Dim' object has no attribute '_ctx'") raised in repr()] Dim object at 0x14ff3c3b0>
ctx = tiledb.Ctx() [see Ctx.config() for configuration], _pass_ctx_to_super = True
args = ('cols', <DataType.INT32: 0>, array([ 0, -1], dtype=int32), array(0, dtype=int32))

    def __init__(self, ctx, *args, _pass_ctx_to_super=True):
        if not ctx:
            ctx = default_ctx()
    
        if _pass_ctx_to_super:
>           super().__init__(ctx, *args)
E           tiledb.cc.TileDBError: [TileDB::Dimension] Error: Domain check failed; Upper domain bound should not be smaller than the lower one

To fix this, we don't create the normalized nodes taskgraph if the size is 0. In this case we just use the original input URI.

Testing

  • Tests pass.

@jparismorgan jparismorgan marked this pull request as ready for review September 2, 2024 21:00
@jparismorgan jparismorgan merged commit ff7d04e into main Sep 3, 2024
6 checks passed
@jparismorgan jparismorgan deleted the jparismorgan/distance-metric-fixes branch September 3, 2024 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants