Write metadata cache data to mappings _meta with refresh time update #805

Merged (25 commits) on Oct 29, 2024

seankao-az (Collaborator) commented Oct 24, 2024

Description

Metadata Cache Writer

For the most part, same as

In addition to the regular metadata storage using FlintIndexMetadataService, we're dual-writing additional fields, defined by FlintMetadataCache, to the index mappings _meta field. It's intended for frontend users to access some crucial metadata for an index quickly without invoking another backend API call.

This PR adds such fields for all indexes, if the spark config spark.flint.metadataCacheWrite.enabled is set to true.

  • _meta.properties.metadataCacheVersion: "1.0"
  • _meta.properties.refreshInterval: Integer. Refresh interval of the index, in seconds. Added only if the index uses auto refresh and refresh_interval is set.
  • _meta.properties.sourceTables: Array of Strings. Mocked data for now; the real values will come in a later PR.
  • _meta.properties.lastRefreshTime: Long. Timestamp, in milliseconds, of the last refresh. Added only if the index has been refreshed at least once.
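The conditions in the list above can be sketched as follows. This is a minimal illustration in Python, not the PR's Scala code: the helper names `parse_interval_seconds` and `build_cache_properties` are hypothetical, the interval parser only handles simple "<number> <unit>" strings, and treating a timestamp of 0 as "never refreshed" is an assumption based on the sample log entries further down.

```python
import re
from typing import Any, Dict, List

# Hypothetical unit table; only the field names and conditions come from the PR text.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_interval_seconds(interval: str) -> int:
    """Convert a simple interval such as "5 Minutes" into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s+([A-Za-z]+?)s?\s*", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported unit in interval: {interval!r}")
    return amount * _UNIT_SECONDS[unit]


def build_cache_properties(
    options: Dict[str, str],
    source_tables: List[str],
    last_refresh_complete_time: int,
) -> Dict[str, Any]:
    """Build the extra _meta.properties fields for one index."""
    props: Dict[str, Any] = {
        "metadataCacheVersion": "1.0",
        "sourceTables": list(source_tables),
    }
    auto_refresh = options.get("auto_refresh", "false").lower() == "true"
    interval = options.get("refresh_interval")
    # refreshInterval: only for auto refresh with refresh_interval set.
    if auto_refresh and interval:
        props["refreshInterval"] = parse_interval_seconds(interval)
    # lastRefreshTime: only once a refresh has happened (assumption: 0 means never).
    if last_refresh_complete_time > 0:
        props["lastRefreshTime"] = last_refresh_complete_time
    return props
```

With the options from the external scheduler test below (`auto_refresh` "true", `refresh_interval` "5 Minutes"), this sketch yields `refreshInterval` 300, matching the value in the test output.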

Last Refresh Time

Added two new fields to FlintMetadataLogEntry and bumped the version of its JSON doc from 1.0 to 1.1 (new fields are added, but existing fields are unchanged)

  • lastRefreshStartTime: Long. Timestamp when last refresh started
  • lastRefreshCompleteTime: Long. Timestamp when last refresh completed

These are accurate only for manual refresh (full, incremental) and for auto refresh with the external scheduler.
For the internal scheduler, jobStartTime (createTime in FlintMetadataLogEntry) is used to track the streaming job start time.

I'm not reusing createTime because the two are updated at different times:
createTime (for the internal scheduler) is updated during refreshIndex, recoverIndex, and updateIndexManualToAuto.
lastRefreshStartTime and lastRefreshCompleteTime (for manual refresh and the external scheduler) are updated only in refreshIndex.
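The update rules described above can be sketched as a small state model. This is an illustration in Python rather than the PR's Scala code; the `LogEntry` class and the `on_*` function names are hypothetical, and the rule that a manual or external refresh also sets jobStartTime is inferred from the sample log entries in the test section, where jobStartTime equals lastRefreshStartTime.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical stand-in for the timestamp fields of FlintMetadataLogEntry."""
    job_start_time: int = 0               # jobStartTime (createTime)
    last_refresh_start_time: int = 0      # lastRefreshStartTime
    last_refresh_complete_time: int = 0   # lastRefreshCompleteTime


def on_refresh_start(entry: LogEntry, now_ms: int, internal_scheduler: bool) -> LogEntry:
    """refreshIndex begins: jobStartTime always moves; the refresh start time
    moves only for manual refresh and the external scheduler."""
    entry = replace(entry, job_start_time=now_ms)
    if not internal_scheduler:
        entry = replace(entry, last_refresh_start_time=now_ms)
    return entry


def on_refresh_complete(entry: LogEntry, now_ms: int) -> LogEntry:
    """refreshIndex finishes (manual refresh or external scheduler only)."""
    return replace(entry, last_refresh_complete_time=now_ms)


def on_recover(entry: LogEntry, now_ms: int) -> LogEntry:
    """recoverIndex restarts the streaming job: only jobStartTime moves."""
    return replace(entry, job_start_time=now_ms)
```

Keeping the refresh timestamps separate from jobStartTime means recovering a streaming job does not make an index look freshly refreshed.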

End-to-End Test

Tests performed in my test cluster

index (full refresh mode) created without `spark.flint.metadataCacheWrite.enabled`

Checking the _meta.properties field: no metadata cache fields are added

GET flint_myglue_test_default_http_logs_cv_full_no_cache_index/_mappings

{
  "flint_myglue_test_default_http_logs_cv_full_no_cache_index": {
    "mappings": {
      "_meta": {
        "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF9ub19jYWNoZV9pbmRleA==",
        "kind": "covering",
        "indexedColumns": [
          {
            "columnType": "timestamp",
            "columnName": "@timestamp"
          },
          {
            "columnType": "string",
            "columnName": "request"
          },
          {
            "columnType": "int",
            "columnName": "size"
          },
          {
            "columnType": "string",
            "columnName": "clientip"
          },
          {
            "columnType": "int",
            "columnName": "status"
          }
        ],
        "name": "cv_full_no_cache",
        "options": {
          "auto_refresh": "false",
          "incremental_refresh": "false"
        },
        "source": "myglue_test.default.http_logs",
        "version": "0.6.0",
        "properties": {
          "filterCondition": "status = 403",
          "env": {
            "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID": "****",
            "SERVERLESS_EMR_JOB_ID": "****"
          }
        }
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time_nanos"
        },
        "clientip": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "size": {
          "type": "integer"
        },
        "status": {
          "type": "integer"
        }
      }
    }
  }
}
index (full refresh mode) created with `spark.flint.metadataCacheWrite.enabled` set to true

Checking the _meta.properties field:

  • metadataCacheVersion is set to 1.0
  • sourceTables is set to mocked array data
GET flint_myglue_test_default_http_logs_cv_full_with_cache_index/_mappings

{
  "flint_myglue_test_default_http_logs_cv_full_with_cache_index": {
    "mappings": {
      "_meta": {
        "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF93aXRoX2NhY2hlX2luZGV4",
        "kind": "covering",
        "indexedColumns": [
          {
            "columnType": "timestamp",
            "columnName": "@timestamp"
          },
          {
            "columnType": "string",
            "columnName": "request"
          },
          {
            "columnType": "int",
            "columnName": "size"
          },
          {
            "columnType": "string",
            "columnName": "clientip"
          },
          {
            "columnType": "int",
            "columnName": "status"
          }
        ],
        "name": "cv_full_with_cache",
        "options": {
          "auto_refresh": "false",
          "incremental_refresh": "false"
        },
        "source": "myglue_test.default.http_logs",
        "version": "0.6.0",
        "properties": {
          "sourceTables": [
            "dataSourceName.default.logGroups(logGroupIdentifier:['arn:aws:logs:us-east-1:123456:test-llt-xa', 'arn:aws:logs:us-east-1:123456:sample-lg-1'])"
          ],
          "filterCondition": "status = 403",
          "metadataCacheVersion": "1.0",
          "env": {
            "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID": "****",
            "SERVERLESS_EMR_JOB_ID": "****"
          }
        }
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time_nanos"
        },
        "clientip": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "size": {
          "type": "integer"
        },
        "status": {
          "type": "integer"
        }
      }
    }
  }
}
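A frontend consumer would need to handle both responses shown so far: one without cache fields and one with them. The sketch below shows one way to do that in Python, given the already-parsed JSON body of a `GET <index>/_mappings` call. The function name `read_metadata_cache` is hypothetical, and using the presence of `metadataCacheVersion` as the "cache was written" marker is an assumption, not something the PR specifies.

```python
from typing import Any, Dict, Optional

# Field names taken from the PR description.
CACHE_FIELDS = ("metadataCacheVersion", "refreshInterval", "sourceTables", "lastRefreshTime")


def read_metadata_cache(
    mappings_response: Dict[str, Any], index_name: str
) -> Optional[Dict[str, Any]]:
    """Return the metadata cache fields for an index, or None if none were written."""
    props = (
        mappings_response.get(index_name, {})
        .get("mappings", {})
        .get("_meta", {})
        .get("properties", {})
    )
    # Assumption: indexes created without the cache write config lack this key.
    if "metadataCacheVersion" not in props:
        return None
    # Optional fields (refreshInterval, lastRefreshTime) are simply omitted when absent.
    return {key: props[key] for key in CACHE_FIELDS if key in props}
```

Unrelated entries in `_meta.properties`, such as `filterCondition` and `env`, are left out of the result.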
Triggering a full refresh

Check that lastRefreshStartTime is updated when index (full refresh mode) enters refreshing

GET .query_execution_request_myglue_test/_search

      {
        "_index": ".query_execution_request_myglue_test",
        "_id": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF93aXRoX2NhY2hlX2luZGV4",
        "_score": 1.0,
        "_source": {
          "version": "1.1",
          "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF93aXRoX2NhY2hlX2luZGV4",
          "type": "flintindexstate",
          "state": "refreshing",
          "applicationId": "****",
          "jobId": "****",
          "dataSourceName": "myglue_test",
          "jobStartTime": 1730241864568,
          "lastRefreshStartTime": 1730241864568,
          "lastRefreshCompleteTime": 0,
          "lastUpdateTime": 1730241864691,
          "error": ""
        }
      }

And lastRefreshCompleteTime updated when refresh is done

GET .query_execution_request_myglue_test/_search

      {
        "_index": ".query_execution_request_myglue_test",
        "_id": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF93aXRoX2NhY2hlX2luZGV4",
        "_score": 1.0,
        "_source": {
          "version": "1.1",
          "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF93aXRoX2NhY2hlX2luZGV4",
          "type": "flintindexstate",
          "state": "active",
          "applicationId": "****",
          "jobId": "****",
          "dataSourceName": "myglue_test",
          "jobStartTime": 1730241864568,
          "lastRefreshStartTime": 1730241864568,
          "lastRefreshCompleteTime": 1730241882050,
          "lastUpdateTime": 1730241882107,
          "error": ""
        }
      }
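As a worked example, the two log entries above give the duration of this full refresh: 1730241882050 - 1730241864568 = 17482 ms, roughly 17.5 seconds. A minimal Python sketch of that calculation follows; the function name is hypothetical, and returning None while a refresh is still in flight is an assumption about how a caller might want to treat it.

```python
from typing import Any, Dict, Optional


def refresh_duration_ms(source: Dict[str, Any]) -> Optional[int]:
    """Duration of the last completed refresh, from a log entry's _source."""
    start = source.get("lastRefreshStartTime", 0)
    done = source.get("lastRefreshCompleteTime", 0)
    # No refresh yet, or a newer refresh has started but not completed.
    if start == 0 or done < start:
        return None
    return done - start


refreshing = {"lastRefreshStartTime": 1730241864568, "lastRefreshCompleteTime": 0}
active = {"lastRefreshStartTime": 1730241864568, "lastRefreshCompleteTime": 1730241882050}
print(refresh_duration_ms(refreshing))  # None
print(refresh_duration_ms(active))      # 17482
```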

The _meta.properties.lastRefreshTime is added as well when spark.flint.metadataCacheWrite.enabled is set to true


{
  "flint_myglue_test_default_http_logs_cv_full_with_cache_index": {
    "mappings": {
      "_meta": {
        "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfZnVsbF93aXRoX2NhY2hlX2luZGV4",
        "kind": "covering",
        "indexedColumns": [
          {
            "columnType": "timestamp",
            "columnName": "@timestamp"
          },
          {
            "columnType": "string",
            "columnName": "request"
          },
          {
            "columnType": "int",
            "columnName": "size"
          },
          {
            "columnType": "string",
            "columnName": "clientip"
          },
          {
            "columnType": "int",
            "columnName": "status"
          }
        ],
        "name": "cv_full_with_cache",
        "options": {
          "auto_refresh": "false",
          "incremental_refresh": "false"
        },
        "source": "myglue_test.default.http_logs",
        "version": "0.6.0",
        "properties": {
          "sourceTables": [
            "dataSourceName.default.logGroups(logGroupIdentifier:['arn:aws:logs:us-east-1:123456:test-llt-xa', 'arn:aws:logs:us-east-1:123456:sample-lg-1'])"
          ],
          "filterCondition": "status = 403",
          "metadataCacheVersion": "1.0",
          "lastRefreshTime": 1730241882050,
          "env": {
            "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID": "****",
            "SERVERLESS_EMR_JOB_ID": "****"
          }
        }
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time_nanos"
        },
        "clientip": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "size": {
          "type": "integer"
        },
        "status": {
          "type": "integer"
        }
      }
    }
  }
}
Test with auto refresh with external scheduler with cache write enabled

_meta.properties.refreshInterval is populated with a number (in seconds), as expected

{
  "flint_myglue_test_default_http_logs_cv_auto_with_cache_index": {
    "mappings": {
      "_meta": {
        "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfYXV0b193aXRoX2NhY2hlX2luZGV4",
        "kind": "covering",
        "indexedColumns": [
          {
            "columnType": "timestamp",
            "columnName": "@timestamp"
          },
          {
            "columnType": "string",
            "columnName": "request"
          },
          {
            "columnType": "int",
            "columnName": "size"
          },
          {
            "columnType": "string",
            "columnName": "clientip"
          },
          {
            "columnType": "int",
            "columnName": "status"
          }
        ],
        "name": "cv_auto_with_cache",
        "options": {
          "auto_refresh": "true",
          "refresh_interval": "5 Minutes",
          "scheduler_mode": "external",
          "incremental_refresh": "false",
          "checkpoint_location": "s3://flint-dev-seankao/checkpoints/metadata_cache_test_auto_2/"
        },
        "source": "myglue_test.default.http_logs",
        "version": "0.6.0",
        "properties": {
          "refreshInterval": 300,
          "sourceTables": [
            "dataSourceName.default.logGroups(logGroupIdentifier:['arn:aws:logs:us-east-1:123456:test-llt-xa', 'arn:aws:logs:us-east-1:123456:sample-lg-1'])"
          ],
          "filterCondition": "status = 403",
          "metadataCacheVersion": "1.0",
          "env": {
            "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID": "****",
            "SERVERLESS_EMR_JOB_ID": "****"
          }
        }
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time_nanos"
        },
        "clientip": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "size": {
          "type": "integer"
        },
        "status": {
          "type": "integer"
        }
      }
    }
  }
}

lastRefreshStartTime and lastRefreshCompleteTime are updated accordingly

      {
        "_index": ".query_execution_request_myglue_test",
        "_id": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfYXV0b193aXRoX2NhY2hlX2luZGV4",
        "_score": 0.7801585,
        "_source": {
          "version": "1.1",
          "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfYXV0b193aXRoX2NhY2hlX2luZGV4",
          "type": "flintindexstate",
          "state": "active",
          "applicationId": "****",
          "jobId": "****",
          "dataSourceName": "myglue_test",
          "jobStartTime": 1730244879973,
          "lastRefreshStartTime": 1730244879973,
          "lastRefreshCompleteTime": 1730244901683,
          "lastUpdateTime": 1730244901696,
          "error": ""
        }
      }

lastRefreshTime also added to _meta.properties

{
  "flint_myglue_test_default_http_logs_cv_auto_with_cache_index": {
    "mappings": {
      "_meta": {
        "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfYXV0b193aXRoX2NhY2hlX2luZGV4",
        "kind": "covering",
        "indexedColumns": [
          {
            "columnType": "timestamp",
            "columnName": "@timestamp"
          },
          {
            "columnType": "string",
            "columnName": "request"
          },
          {
            "columnType": "int",
            "columnName": "size"
          },
          {
            "columnType": "string",
            "columnName": "clientip"
          },
          {
            "columnType": "int",
            "columnName": "status"
          }
        ],
        "name": "cv_auto_with_cache",
        "options": {
          "auto_refresh": "true",
          "refresh_interval": "5 Minutes",
          "scheduler_mode": "external",
          "incremental_refresh": "false",
          "checkpoint_location": "s3://flint-dev-seankao/checkpoints/metadata_cache_test_auto_2/"
        },
        "source": "myglue_test.default.http_logs",
        "version": "0.6.0",
        "properties": {
          "refreshInterval": 300,
          "sourceTables": [
            "dataSourceName.default.logGroups(logGroupIdentifier:['arn:aws:logs:us-east-1:123456:test-llt-xa', 'arn:aws:logs:us-east-1:123456:sample-lg-1'])"
          ],
          "filterCondition": "status = 403",
          "metadataCacheVersion": "1.0",
          "lastRefreshTime": 1730245374777,
          "env": {
            "SERVERLESS_EMR_VIRTUAL_CLUSTER_ID": "****",
            "SERVERLESS_EMR_JOB_ID": "****"
          }
        }
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "strict_date_optional_time_nanos"
        },
        "clientip": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "size": {
          "type": "integer"
        },
        "status": {
          "type": "integer"
        }
      }
    }
  }
}
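One possible frontend use of these two fields together is a staleness check. Note the unit mismatch: refreshInterval is in seconds while lastRefreshTime is in milliseconds. The Python sketch below is only an illustration of that arithmetic; the function name and the `grace_factor` tolerance are hypothetical and not part of the PR.

```python
from typing import Any, Dict, Optional


def is_refresh_overdue(
    props: Dict[str, Any], now_ms: int, grace_factor: float = 2.0
) -> Optional[bool]:
    """True if more than grace_factor intervals have passed since the last refresh.

    Returns None when either field is missing (for example, a manual refresh
    index has no refreshInterval).
    """
    interval_s = props.get("refreshInterval")
    last_ms = props.get("lastRefreshTime")
    if interval_s is None or last_ms is None:
        return None
    # Convert the interval from seconds to milliseconds before comparing.
    return now_ms - last_ms > interval_s * 1000 * grace_factor
```

For the index above (interval 300 s, last refresh at 1730245374777), the check with the default tolerance turns true once more than 600000 ms have elapsed since the last refresh.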
Test for auto refresh with internal scheduler

jobStartTime is updated for the streaming job, but lastRefreshStartTime and lastRefreshCompleteTime aren't, as expected

GET /.query_execution_request_myglue_test/_search

      {
        "_index": ".query_execution_request_myglue_test",
        "_id": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfYXV0b19pbnRlcm5hbF93aXRoX2NhY2hlX2luZGV4",
        "_score": 0.90223897,
        "_source": {
          "version": "1.1",
          "latestId": "ZmxpbnRfbXlnbHVlX3Rlc3RfZGVmYXVsdF9odHRwX2xvZ3NfY3ZfYXV0b19pbnRlcm5hbF93aXRoX2NhY2hlX2luZGV4",
          "type": "flintindexstate",
          "state": "refreshing",
          "applicationId": "****",
          "jobId": "****",
          "dataSourceName": "myglue_test",
          "jobStartTime": 1730244344917,
          "lastRefreshStartTime": 0,
          "lastRefreshCompleteTime": 0,
          "lastUpdateTime": 1730244726126,
          "error": ""
        }
      }

Related Issues

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…rch-project#744)

* write mock metadata cache data to mappings _meta
* Enable write to cache by default
* bugfix: _meta.latestId missing when create index
* set and unset config in test suite
* fix: use member flintSparkConf

Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az seankao-az marked this pull request as ready for review October 24, 2024 18:16
@seankao-az seankao-az self-assigned this Oct 24, 2024
@seankao-az seankao-az added the enhancement New feature or request label Oct 24, 2024
seankao-az (Collaborator, Author) commented:

Added the label to backport to the nexus branch.
To be clear, this shouldn't be backported to 0.5.
The "0.5-" part of the name 0.5-nexus is obsolete.

@seankao-az seankao-az force-pushed the write-metadata-cache branch from 5f3af3b to 7a8e1f3 Compare October 25, 2024 18:29
* Handles refresh for refresh mode AUTO, which is used exclusively by auto refresh index with
* internal scheduler.
*/
private def refreshIndexAuto(
Review comment (Collaborator):

Why don't we update for auto refresh?

seankao-az (Collaborator, Author) replied Oct 25, 2024:

For now, lastRefreshStartTime and lastRefreshCompleteTime are tracked only for manual refresh and for auto refresh with the external scheduler.

For a streaming job, we use createTime to track the streaming job start time.
There's no mechanism yet for tracking the start and end time of each micro-batch update, so updating the two timestamps in the refresh could be misleading.

seankao-az (Collaborator, Author):

I'll add a comment.

seankao-az (Collaborator, Author) commented:

Note to reviewers, in case you're curious: the force push only amended commit 2f58f56 and changed nothing else.

ykmr1224 (Collaborator) left a comment:

Why do we call it a metadata cache? I wasn't quite sure what "cache" indicates here.

seankao-az (Collaborator, Author) commented Oct 25, 2024:

I do welcome a better name. I was struggling to come up with one, and I'm not convinced that MetadataCache is the best choice.
The main use case is this: with custom index metadata and metadata log storage, this metadata isn't available in the OpenSearch index, and some frontend use cases need access to it without making a query to the backend that is executed in Spark. I think of it as a read cache for such users, kept up to date by our dual-writing.
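The "read cache kept up to date by dual-writing" idea, gated by the Spark config, can be sketched as a small strategy interface. This is an illustrative Python sketch, not the PR's implementation: every class and function name is hypothetical, and the in-memory writer only stands in for a writer that would update the index mappings `_meta` field in OpenSearch. The disabled-by-default behavior follows the commit titled "default metadata cache write disabled".

```python
from typing import Any, Dict, Protocol

CONF_KEY = "spark.flint.metadataCacheWrite.enabled"  # config name from the PR description


class MetadataCacheWriter(Protocol):
    """Hypothetical interface: push cache fields for one index."""

    def update(self, index_name: str, cache_fields: Dict[str, Any]) -> None: ...


class NoOpCacheWriter:
    """Used when cache writing is disabled: the regular metadata write is unaffected."""

    def update(self, index_name: str, cache_fields: Dict[str, Any]) -> None:
        return None


class InMemoryMappingsCacheWriter:
    """Stand-in for a writer that merges fields into the index mappings _meta."""

    def __init__(self) -> None:
        self.mappings: Dict[str, Dict[str, Any]] = {}

    def update(self, index_name: str, cache_fields: Dict[str, Any]) -> None:
        meta = self.mappings.setdefault(index_name, {}).setdefault("_meta", {})
        # Merge, so existing properties such as filterCondition are preserved.
        meta.setdefault("properties", {}).update(cache_fields)


def make_cache_writer(conf: Dict[str, str]) -> MetadataCacheWriter:
    """Pick the writer from the config; disabled unless explicitly set to true."""
    enabled = conf.get(CONF_KEY, "false").lower() == "true"
    return InMemoryMappingsCacheWriter() if enabled else NoOpCacheWriter()
```

Callers can invoke `update` unconditionally after each metadata change; whether anything is written is decided once, when the writer is created.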

seankao-az (Collaborator, Author) commented:

Updated description with test results

@seankao-az seankao-az merged commit a07f88f into opensearch-project:main Oct 29, 2024
4 checks passed
opensearch-trigger-bot commented:

The backport to 0.5-nexus failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-spark/backport-0.5-nexus 0.5-nexus
# Navigate to the new working tree
pushd ../.worktrees/opensearch-spark/backport-0.5-nexus
# Create a new branch
git switch --create backport/backport-805-to-0.5-nexus
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a07f88f86fa384d94e535f99397e8d0d0402bba0
# Push it to GitHub
git push --set-upstream origin backport/backport-805-to-0.5-nexus
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-spark/backport-0.5-nexus

Then, create a pull request where the base branch is 0.5-nexus and the compare/head branch is backport/backport-805-to-0.5-nexus.

seankao-az (Collaborator, Author) commented:

Backport blocked by #836.
Retrigger the backport once the revert PR is merged.

opensearch-trigger-bot commented:

The backport to 0.5-nexus failed:

The process '/usr/bin/git' failed with exit code 128


seankao-az added a commit to seankao-az/opensearch-spark that referenced this pull request Oct 30, 2024
…pensearch-project#805)

* [0.5-nexus] Write mock metadata cache data to mappings _meta (opensearch-project#744)
* write mock metadata cache data to mappings _meta
* Enable write to cache by default
* bugfix: _meta.latestId missing when create index
* set and unset config in test suite
* fix: use member flintSparkConf
* default metadata cache write disabled
* remove string literal "external" in index builder
* track refreshInterval and lastRefreshTime
* add last refresh timestamps to metadata log entry
* update metadata cache test case: should pass
* move to spark package; get refresh interval
* parse refresh interval
* minor syntax fix on FlintSpark.createIndex
* strategize cache writer interface
* update refresh timestamps in FlintSpark
* add test cases
* IT test for refresh timestamp update
* add doc for spark conf
* change mock table name
* add IT test at FlintSpark level
* test with external scheduler
* refactor refreshIndex method; add test for modes
* fix typo
* fix failed test caused by refactoring
* rename method; add comment

Signed-off-by: Sean Kao <seankao@amazon.com>
(cherry picked from commit a07f88f)
seankao-az added a commit that referenced this pull request Oct 30, 2024
…805) (#840)

kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
…pensearch-project#805)
