Support multiple Spark versions #3637
+1 to support multiple versions
What's the motivation for this idea? To speed up the build?
Yes, speeding up the build is one benefit. And in Iceberg, when we did that, the main benefit was to let all developers know which version new features should target, because in the IDE only one version's module would resolve. That avoided a few issues we encountered when we first started doing multi-version builds, where people frequently added new features to older versions by mistake, and it was hard to track which version had which features patched. Selective version builds don't 100% solve these issues, but they definitely improved the situation a lot.
To speed up the build, suppose we have a project structure layout like the following (module names are illustrative):
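```
lance-spark/                 # parent (aggregator) POM
├── lance-spark-base/        # shared code, compiled against one default Spark version
├── lance-spark-3.4/         # Spark 3.4-specific module
└── lance-spark-3.5/         # Spark 3.5-specific module
```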
We can use Maven's `-pl` (`--projects`) option to build only a selected module:
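```bash
# Build only the Spark 3.5 module (module name from the illustrative layout above)
mvn clean install -pl lance-spark-3.5
```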
If you want to also build the dependent modules at the same time, add `-amd` (`--also-make-dependents`):
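```bash
# -amd also builds the modules that depend on the selected one,
# e.g. the base module plus every version-specific module
mvn clean install -pl lance-spark-base -amd
```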
Also, if you want to build the parent module (and any other required projects) at the same time, use `-am` (`--also-make`):
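```bash
# -am also builds the projects the selected module requires,
# which includes the parent POM
mvn clean install -pl lance-spark-3.5 -am
```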
Does this feature match your requirements?
What does this mean? Suppose we use Maven to manage the above project layout; we can then build any subset of the modules selectively.
Can it solve your problem?
I see. In your example, which Spark version does the base module compile against? I guess if we need to check that, we would do something like the following to rebuild against a specific version:
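```bash
# A sketch, assuming the build exposes a spark.version property that the base
# module compiles against (the property name and value are assumptions)
mvn clean install -pl lance-spark-base -am -Dspark.version=3.4.3
```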
This is probably also how the CI should be configured to test each version.
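As a sketch, the CI could simply loop over the supported versions (module names and versions assumed from the illustrative layout above):

```bash
# Build and test each supported Spark module in turn; -am pulls in the
# base module and parent POM that each version module requires
for v in 3.4 3.5; do
  mvn clean verify -pl "lance-spark-${v}" -am
done
```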
Good question. I think the Spark version in the base module just serves as a default compile target, and the version-specific modules supply the actual version. There is a similar reference in flink-connector-elasticsearch: in its base module, it chooses es 7.10.x.
Okay, sounds good. I think I get the general idea here; I will reorganize the codebase, publish it to lance-spark, and ping you for review!
Port Spark connector from https://github.com/lancedb/lance/tree/main/java/spark to the new repository. Refactor the codebase structure and fix a few methods in LanceArrowUtils to support both Spark 3.4 and 3.5. Closes: lancedb/lance#3636 Closes: lancedb/lance#3637
Spark 4.0 is in preview and will be out soon. So far, many projects have adopted a multi-version approach to supporting the Spark 3.x line because of its backwards-incompatibility issues; examples include Iceberg, Gravitino, and Hudi.
Currently the Lance Spark connector only supports a single version, and it feels like we should move to a similar multi-version model to prepare for 4.0.
However, that might also mean we need to switch build systems, because at the moment I don't see a good way to do selective versioned builds with Maven; only Gradle seems to support this, as in the Iceberg and Gravitino examples. For Maven-based projects like Hudi, it seems the user either has to build the specific version manually, or all supported Spark versions get built, which is not really ideal. But I am less familiar with Maven than with Gradle, so let me know if there is a good way to support that natively in Maven.
This came up while I was working on #3636. If we agree on the versioning strategy, I can make the corresponding changes during the port process.
Thoughts? @yanghua @SaintBacchus