BigQuery spuriously returns no rows during list_bundles #6923

Closed
dsotirho-ucsc opened this issue Feb 21, 2025 · 2 comments
Labels
orange [process] Done by the Azul team

Comments

@dsotirho-ucsc
Contributor

dsotirho-ucsc commented Feb 21, 2025

While gathering snapshot details for #6876, I found that the following project was indexed in dcp45 but not in dcp46.

https://explore.data.humancellatlas.org/projects/1ffa2223-28a6-4133-a5a4-badd00faf4bc?catalog=dcp45

Found in dcp45:

$ curl 'https://service.azul.data.humancellatlas.org/index/projects/1ffa2223-28a6-4133-a5a4-badd00faf4bc?catalog=dcp45'
{"protocols":[{"libraryConstructionApproach":["10x 3' v2","10x 3' v3","Visium 10x GE"],...

Not found in dcp46:

$ curl 'https://service.azul.data.humancellatlas.org/index/projects/1ffa2223-28a6-4133-a5a4-badd00faf4bc?catalog=dcp46'
{"Code":"NotFoundError","Message":"Can't find an entity in projects with an uuid, 1ffa2223-28a6-4133-a5a4-badd00faf4bc."}
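
When several projects need the same comparison, the two curl checks above can be scripted. The following is a minimal sketch, not part of Azul: the helper names `project_url` and `is_missing` are hypothetical, and it only builds the request URL and classifies a response body, leaving the HTTP call itself to the caller.

```python
import json
from urllib.parse import urlencode

# Service endpoint taken from the curl commands above
SERVICE = 'https://service.azul.data.humancellatlas.org'


def project_url(project_id: str, catalog: str) -> str:
    """Build the /index/projects URL for one project in one catalog."""
    query = urlencode({'catalog': catalog})
    return f'{SERVICE}/index/projects/{project_id}?{query}'


def is_missing(body: str) -> bool:
    """Return True if the response body is a NotFoundError like the one above."""
    return json.loads(body).get('Code') == 'NotFoundError'
```

A project that is present in dcp45 but for which `is_missing` returns True in dcp46 would be a candidate for the problem described here.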

The snapshot for this project was originally added in dcp33 and has not been removed or replaced since.

mksrc('bigquery', 'datarepo-b64e953d', 'hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33'),


The CloudWatch logs from the reindexing of dcp46 show that the snapshot returned zero bundles.

CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-indexer-prod-contribute, /aws/lambda/azul-indexer-prod-contribute_retry, /aws/lambda/azul-indexer-prod-aggregate_retry, /aws/lambda/azul-indexer-prod-aggregate
start-time: 2025-02-13T16:37:34.313Z
end-time: 2025-02-14T16:27:27.028Z
query-string:

fields @timestamp, @message
| filter @requestId = 'b3b329c4-33d0-5d94-b72b-49ecf8332577'
| sort @timestamp asc
| limit 10000
@timestamp @message
2025-02-14 04:42:11.993 START RequestId: b3b329c4-33d0-5d94-b72b-49ecf8332577 Version: $LATEST
2025-02-14 04:42:11.994 [INFO] 2025-02-14T04:42:11.994Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.indexer.index_controller Worker handling message {'action': 'reindex', 'catalog': 'dcp46', 'source': {'id': 'f264f514-ff73-4985-9c50-d7d23190e123', 'spec': 'tdr:bigquery:gcp:datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33:/0'}, 'prefix': ''}, attempt #'1' (approx).
2025-02-14 04:42:11.994 [INFO] 2025-02-14T04:42:11.994Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.azulclient Listing bundles with prefix '' in source TDRSourceRef(id='f264f514-ff73-4985-9c50-d7d23190e123', spec=TDRSourceSpec(prefix=Prefix(common='', partition=0), type=<Type.bigquery: 'bigquery'>, domain=<Domain.gcp: 'gcp'>, subdomain='datarepo-b64e953d', name='hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33')).
2025-02-14 04:42:11.995 [DEBUG] 2025-02-14T04:42:11.995Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.terra Query (205 characters total): "\n SELECT links_id, version\n FROM datarepo-b64e953d.hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33.links\n WHERE STARTS_WITH(links_id, '')\n "
2025-02-14 04:43:03.613 [DEBUG] 2025-02-14T04:43:03.613Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.terra Job info: {"job_id": "9bcd61bc-c9b4-4371-bf45-483dd875de56", "total_rows": 0, "stats": {"searchStatistics": {"indexUnusedReasons": [{"code": "NOT_SUPPORTED_IN_STANDARD_EDITION", "message": "Index can not be used for query with Standard edition reservation. See https://cloud.google.com/bigquery/docs/editions-intro for more information."}]}}, "query": "\n SELECT links_id, version\n FROM datarepo-b64e953d.hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33.links\n WHERE STARTS_WITH(links_id, '')\n "}
2025-02-14 04:43:03.613 [INFO] 2025-02-14T04:43:03.613Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.azulclient There are 0 bundle(s) with prefix '' in source TDRSourceRef(id='f264f514-ff73-4985-9c50-d7d23190e123', spec=TDRSourceSpec(prefix=Prefix(common='', partition=0), type=<Type.bigquery: 'bigquery'>, domain=<Domain.gcp: 'gcp'>, subdomain='datarepo-b64e953d', name='hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33')).
2025-02-14 04:43:03.613 [INFO] 2025-02-14T04:43:03.613Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.azulclient After filtering obsolete versions, 0 bundles remain in prefix '' of source 'tdr:bigquery:gcp:datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33:/0' in catalog 'dcp46'
2025-02-14 04:43:03.613 [INFO] 2025-02-14T04:43:03.613Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.azulclient Successfully queued 0 notification(s) for prefix of source TDRSourceRef(id='f264f514-ff73-4985-9c50-d7d23190e123', spec=TDRSourceSpec(prefix=Prefix(common='', partition=0), type=<Type.bigquery: 'bigquery'>, domain=<Domain.gcp: 'gcp'>, subdomain='datarepo-b64e953d', name='hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33'))
2025-02-14 04:43:03.613 [INFO] 2025-02-14T04:43:03.613Z b3b329c4-33d0-5d94-b72b-49ecf8332577 azul.indexer.index_controller Worker successfully handled message {'action': 'reindex', 'catalog': 'dcp46', 'source': {'id': 'f264f514-ff73-4985-9c50-d7d23190e123', 'spec': 'tdr:bigquery:gcp:datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33:/0'}, 'prefix': ''} in 51.619s.
2025-02-14 04:43:03.614 END RequestId: b3b329c4-33d0-5d94-b72b-49ecf8332577
2025-02-14 04:43:03.614 REPORT RequestId: b3b329c4-33d0-5d94-b72b-49ecf8332577 Duration: 51621.34 ms Billed Duration: 51622 ms Memory Size: 256 MB Max Memory Used: 175 MB
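
Nearly all of the time the worker spent on this message was spent waiting for the BigQuery job. A small sketch of that arithmetic, using the ISO timestamps embedded in the `azul.terra` "Query" and "Job info" lines above; the helper name `elapsed_seconds` is hypothetical.

```python
from datetime import datetime

# Format of the ISO timestamps embedded in the log messages above
LOG_TIME = '%Y-%m-%dT%H:%M:%S.%fZ'


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps of the form 2025-02-14T04:42:11.995Z."""
    delta = datetime.strptime(end, LOG_TIME) - datetime.strptime(start, LOG_TIME)
    return delta.total_seconds()


# Query logged, then job info logged, in request b3b329c4-33d0-5d94-b72b-49ecf8332577
waited = elapsed_seconds('2025-02-14T04:42:11.995Z', '2025-02-14T04:43:03.613Z')
```

For this request `waited` is 51.618 s, out of the 51.619 s the worker reports for the whole message. The same calculation on the later, successful request (20:28:22.341Z to 20:28:23.394Z) gives 1.053 s.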

Retrying the BigQuery query manually, however, returns 590 results...


... and these bundles were found when a reindex of the source into dcp46 was performed.

CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-indexer-prod-contribute, /aws/lambda/azul-indexer-prod-contribute_retry, /aws/lambda/azul-indexer-prod-aggregate_retry, /aws/lambda/azul-indexer-prod-aggregate
start-time: -10800s
end-time: 0s
query-string:

fields @timestamp, @message
| filter @requestId = '9091d880-c0d1-5184-89b9-b8b224c80944'
| sort @timestamp asc
| limit 10000
@timestamp @message
2025-02-21 20:28:21.441 START RequestId: 9091d880-c0d1-5184-89b9-b8b224c80944 Version: $LATEST
2025-02-21 20:28:21.441 [INFO] 2025-02-21T20:28:21.441Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.indexer.index_controller Worker handling message {'action': 'reindex', 'catalog': 'dcp46', 'source': {'id': 'f264f514-ff73-4985-9c50-d7d23190e123', 'spec': 'tdr:bigquery:gcp:datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33:/0'}, 'prefix': ''}, attempt #'1' (approx).
2025-02-21 20:28:21.462 [INFO] 2025-02-21T20:28:21.461Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.azulclient Listing bundles with prefix '' in source TDRSourceRef(id='f264f514-ff73-4985-9c50-d7d23190e123', spec=TDRSourceSpec(prefix=Prefix(common='', partition=0), type=<Type.bigquery: 'bigquery'>, domain=<Domain.gcp: 'gcp'>, subdomain='datarepo-b64e953d', name='hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33')).
2025-02-21 20:28:21.583 [INFO] 2025-02-21T20:28:21.583Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.deployment Allocated new Boto3 client for 'secretsmanager' with ID 139813878296800
2025-02-21 20:28:22.342 [DEBUG] 2025-02-21T20:28:22.341Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.terra Query (205 characters total): "\n SELECT links_id, version\n FROM datarepo-b64e953d.hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33.links\n WHERE STARTS_WITH(links_id, '')\n "
2025-02-21 20:28:23.394 [DEBUG] 2025-02-21T20:28:23.394Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.terra Job info: {"job_id": "74cffb4e-d4c5-4432-a8c4-42476b0e62ce", "total_rows": 590, "stats": {"estimatedBytesProcessed": "343224", "timeline": [{"elapsedMs": "247", "totalSlotMs": "150", "pendingUnits": "0", "completedUnits": "1", "activeUnits": "0", "estimatedRunnableUnits": "0"}, {"elapsedMs": "259", "totalSlotMs": "150", "pendingUnits": "0", "completedUnits": "2", "estimatedRunnableUnits": "0"}], "totalPartitionsProcessed": "1", "totalBytesProcessed": "343224", "totalBytesBilled": "20971520", "billingTier": 1, "totalSlotMs": "150", "cacheHit": false, "searchStatistics": {"indexUsageMode": "UNUSED", "indexUnusedReasons": [{"code": "INDEX_CONFIG_NOT_AVAILABLE", "message": "There is no index configuration for the base table datarepo-3185ebf2:datarepo_hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2.datarepo_raw_links_fbb50a72_1c1a_4f3c_b4ee_6db573843203.", "baseTable": {"projectId": "datarepo-3185ebf2", "datasetId": "datarepo_hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2", "tableId": "datarepo_raw_links_fbb50a72_1c1a_4f3c_b4ee_6db573843203"}}, {"code": "INDEX_CONFIG_NOT_AVAILABLE", "message": "There is no index configuration for the base table datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33.datarepo_row_ids.", "baseTable": {"projectId": "datarepo-b64e953d", "datasetId": "hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33", "tableId": "datarepo_row_ids"}}]}, "transferredBytes": "0", "metadataCacheStatistics": {"tableMetadataCacheUsage": [{"tableReference": {"projectId": "datarepo-3185ebf2", "datasetId": "datarepo_hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2", "tableId": "datarepo_raw_links_fbb50a72_1c1a_4f3c_b4ee_6db573843203"}, "unusedReason": "OTHER_REASON", "explanation": "Table does not have CMETA."}, {"tableReference": {"projectId": "datarepo-b64e953d", "datasetId": 
"hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33", "tableId": "datarepo_row_ids"}, "unusedReason": "OTHER_REASON", "explanation": "Table does not have CMETA."}]}}, "query": "\n SELECT links_id, version\n FROM datarepo-b64e953d.hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33.links\n WHERE STARTS_WITH(links_id, '')\n "}
2025-02-21 20:28:23.620 [INFO] 2025-02-21T20:28:23.604Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.azulclient There are 590 bundle(s) with prefix '' in source TDRSourceRef(id='f264f514-ff73-4985-9c50-d7d23190e123', spec=TDRSourceSpec(prefix=Prefix(common='', partition=0), type=<Type.bigquery: 'bigquery'>, domain=<Domain.gcp: 'gcp'>, subdomain='datarepo-b64e953d', name='hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33')).
2025-02-21 20:28:23.621 [INFO] 2025-02-21T20:28:23.621Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.azulclient After filtering obsolete versions, 590 bundles remain in prefix '' of source 'tdr:bigquery:gcp:datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33:/0' in catalog 'dcp46'
2025-02-21 20:28:25.427 [INFO] 2025-02-21T20:28:25.427Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.azulclient Successfully queued 590 notification(s) for prefix of source TDRSourceRef(id='f264f514-ff73-4985-9c50-d7d23190e123', spec=TDRSourceSpec(prefix=Prefix(common='', partition=0), type=<Type.bigquery: 'bigquery'>, domain=<Domain.gcp: 'gcp'>, subdomain='datarepo-b64e953d', name='hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33'))
2025-02-21 20:28:25.428 [INFO] 2025-02-21T20:28:25.428Z 9091d880-c0d1-5184-89b9-b8b224c80944 azul.indexer.index_controller Worker successfully handled message {'action': 'reindex', 'catalog': 'dcp46', 'source': {'id': 'f264f514-ff73-4985-9c50-d7d23190e123', 'spec': 'tdr:bigquery:gcp:datarepo-b64e953d:hca_prod_1ffa222328a64133a5a4badd00faf4bc__20231101_dcp2_20231102_dcp33:/0'}, 'prefix': ''} in 3.986s.
2025-02-21 20:28:25.430 END RequestId: 9091d880-c0d1-5184-89b9-b8b224c80944
2025-02-21 20:28:25.430 REPORT RequestId: 9091d880-c0d1-5184-89b9-b8b224c80944 Duration: 3989.91 ms Billed Duration: 3990 ms Memory Size: 256 MB Max Memory Used: 151 MB Init Duration: 2763.78 ms
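
The two "Job info" lines differ in more than `total_rows`: the empty result carries only `searchStatistics` in its `stats`, while the successful one also reports `totalBytesProcessed` and related fields. The sketch below extracts the job info from a log message and flags an empty result that lacks those statistics. It is a heuristic drawn only from the two excerpts in this issue. Whether a missing `totalBytesProcessed` reliably indicates a spurious result is an assumption, and the function names are hypothetical rather than part of Azul.

```python
import json

# Text that precedes the JSON payload in the azul.terra log lines above
MARKER = 'Job info: '


def job_info(message: str) -> dict:
    """Parse the JSON payload that follows 'Job info: ' in a log message."""
    _, found, payload = message.partition(MARKER)
    if not found:
        raise ValueError('not a job info message')
    return json.loads(payload)


def looks_spurious(info: dict) -> bool:
    """Heuristic (assumption): zero rows and no processing statistics."""
    stats = info.get('stats', {})
    return info['total_rows'] == 0 and 'totalBytesProcessed' not in stats
```

Under that assumption, the job from 2025-02-14 would be flagged and the job from 2025-02-21 would not.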
dsotirho-ucsc added the orange [process] Done by the Azul team label Feb 21, 2025
@nadove-ucsc
Contributor

Possibly related, possibly duplicate: #6667

dsotirho-ucsc self-assigned this Feb 24, 2025
@dsotirho-ucsc
Contributor Author

Closing as dup of #6667
