-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit 08bc3c9
Showing
61 changed files
with
4,385 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# This workflow uses actions that are not certified by GitHub. | ||
# They are provided by a third-party and are governed by | ||
# separate terms of service, privacy policy, and support | ||
# documentation. | ||
# This workflow will build a Java project with Gradle and cache/restore any dependencies to improve the workflow execution time | ||
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-java-with-gradle | ||
|
||
name: Java CI with Gradle | ||
|
||
on: | ||
push: | ||
branches: [ "main" ] | ||
pull_request: | ||
branches: [ "main" ] | ||
|
||
permissions: | ||
contents: read | ||
|
||
jobs: | ||
build: | ||
|
||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- uses: actions/checkout@v3 | ||
- name: Set up JDK 11 | ||
uses: actions/setup-java@v3 | ||
with: | ||
java-version: '11' | ||
distribution: 'temurin' | ||
- name: Build and test | ||
uses: gradle/gradle-build-action@v2.5.0 | ||
with: | ||
arguments: build | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
.gradle | ||
build | ||
.idea | ||
**/.DS_Store |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2023 vinted | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
# Flink BigQuery Connector ![Build](https://github.com/vinted/flink-big-query-connector/actions/workflows/gradle.yml/badge.svg) [![](https://jitpack.io/v/com.vinted/flink-big-query-connector.svg)](https://jitpack.io/#com.vinted/flink-big-query-connector) | ||
|
||
|
||
This project provides a BigQuery sink that allows writing data with exactly-once or at-least guarantees. | ||
|
||
## Usage | ||
|
||
There are builder classes to simplify constructing a BigQuery sink. The code snippet below shows an example of building a BigQuery sink in Java: | ||
|
||
```java | ||
var credentials = new JsonCredentialsProvider("key"); | ||
|
||
var clientProvider = new BigQueryProtoClientProvider(credentials, | ||
WriterSettings | ||
.newBuilder() | ||
.build() | ||
); | ||
|
||
var bigQuerySink = BigQueryStreamSink | ||
.<String>newProto() | ||
.withClientProvider(clientProvider) | ||
.withDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE) | ||
.withRowValueSerializer(new NoOpRowSerializer<>()) | ||
.build(); | ||
``` | ||
|
||
The sink takes in a batch of records. Batching happens outside the sink by opening a window. Batched records need to implement the BigQueryRecord interface. | ||
|
||
```java | ||
var trigger = BatchTrigger.<Record, GlobalWindow>builder() | ||
.withCount(100) | ||
.withTimeout(Duration.ofSeconds(1)) | ||
.withSizeInMb(1) | ||
.withResetTimerOnNewRecord(true) | ||
.build(); | ||
|
||
var processor = new BigQueryStreamProcessor().withDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE).build(); | ||
|
||
source | ||
.key(s -> s) | ||
.window(GlobalWindows.create()) | ||
.trigger(trigger) | ||
.process(processor) | ||
``` | ||
|
||
|
||
To write to BigQuery, you need to: | ||
|
||
* Define credentials | ||
* Create a client provider | ||
* Batch records | ||
* Create a value serializer | ||
* Sink to BigQuery | ||
|
||
# Credentials | ||
|
||
There are two types of credentials: | ||
|
||
* Loading from a file | ||
```java | ||
new FileCredentialsProvider("/path/to/file") | ||
``` | ||
* Passing as a JSON string | ||
```java | ||
new JsonCredentialsProvider("key") | ||
``` | ||
|
||
# Types of Streams | ||
|
||
BigQuery supports two types of data formats: json and proto. When creating a stream, you can choose these types by creating the appropriate client and using the builder methods. | ||
|
||
* JSON | ||
```java | ||
var clientProvider = new BigQueryJsonClientProvider(credentials, | ||
WriterSettings | ||
.newBuilder() | ||
.build() | ||
); | ||
|
||
var bigQuerySink = BigQueryStreamSink | ||
.<String>newJson() | ||
``` | ||
* Proto | ||
```java | ||
var clientProvider = new BigQueryProtoClientProvider(credentials, | ||
WriterSettings | ||
.newBuilder() | ||
.build() | ||
); | ||
|
||
var bigQuerySink = BigQueryStreamSink | ||
.<String>newProto() | ||
``` | ||
|
||
# Exactly once | ||
It utilizes a [buffered stream](https://cloud.google.com/bigquery/docs/write-api#buffered_type), managed by the BigQueryStreamProcessor, to assign and process data batches. If a stream is inactive or closed, a new stream is created automatically. The BigQuery sink writer appends and flushes data to the latest offset upon checkpoint commit. | ||
# At least once | ||
Data is written to the [default stream](https://cloud.google.com/bigquery/docs/write-api#default_stream) and handled by the BigQueryStreamProcessor, which batches and sends rows to the sink for processing. | ||
# Serializers | ||
|
||
For the proto stream, you need to implement `ProtoValueSerializer`, and for the JSON stream, you need to implement `JsonRowValueSerializer`. | ||
|
||
# Metrics | ||
<table class="table table-bordered"> | ||
<thead> | ||
<tr> | ||
<th class="text-left" style="width: 15%">Scope</th> | ||
<th class="text-left" style="width: 18%">Metrics</th> | ||
<th class="text-left" style="width: 39%">Description</th> | ||
<th class="text-left" style="width: 10%">Type</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<th rowspan="8">Stream</th> | ||
<td>stream_offset</td> | ||
<td>Current offset for the stream. When using at least once, the offset is always 0</td> | ||
<td>Gauge</td> | ||
</tr> | ||
<tr> | ||
<td>batch_count</td> | ||
<td>Number of records in the appended batch</td> | ||
<td>Gauge</td> | ||
</tr> | ||
<tr> | ||
<td>batch_size_mb</td> | ||
<td>Appended batch size in mb</td> | ||
<td>Gauge</td> | ||
</tr> | ||
<tr> | ||
<td>split_batch_count</td> | ||
<td>Number of times the batch hit the BigQuery limit and was split into two parts</td> | ||
<td>Gauge</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
plugins { | ||
id 'com.github.johnrengelman.shadow' version '7.1.2' | ||
id 'maven-publish' | ||
id 'java-library' | ||
id 'me.qoomon.git-versioning' version '4.3.0' | ||
} | ||
|
||
sourceCompatibility = JavaVersion.VERSION_11 | ||
targetCompatibility = JavaVersion.VERSION_11 | ||
|
||
repositories { | ||
mavenCentral() | ||
mavenLocal() | ||
maven { url 'https://packages.confluent.io/maven/' } | ||
maven { url 'https://nexus.vinted.net/repository/maven-proxy-oom/' } | ||
} | ||
|
||
group = "com.github.vinted" | ||
|
||
gitVersioning.apply { | ||
tag { | ||
pattern = '(?<tagVersion>\\d+\\.\\d+\\.\\d+$)' | ||
versionFormat = '${tagVersion}' | ||
} | ||
branch { | ||
pattern = '.+' | ||
versionFormat = '${branch}-SNAPSHOT' | ||
} | ||
preferTags = false | ||
} | ||
|
||
ext { | ||
flinkVersion = '1.17.0' | ||
bigqueryVersion = '2.27.0' | ||
bigqueryStorageVersion = '2.37.2' | ||
json4sVersion = '4.0.3' | ||
} | ||
|
||
dependencies { | ||
// Flink provided dependencies | ||
compileOnly "org.apache.flink:flink-connector-base:$flinkVersion" | ||
compileOnly "org.apache.flink:flink-streaming-java:$flinkVersion" | ||
compileOnly "org.apache.flink:flink-core:$flinkVersion" | ||
|
||
implementation "com.google.cloud:google-cloud-bigquery:$bigqueryVersion" | ||
implementation "com.google.cloud:google-cloud-bigquerystorage:$bigqueryStorageVersion" | ||
|
||
testImplementation platform('org.junit:junit-bom:5.9.1') | ||
testRuntimeOnly "org.junit.jupiter:junit-jupiter-engine" | ||
testImplementation 'org.mockito:mockito-junit-jupiter:4.11.0' | ||
testImplementation "org.assertj:assertj-core:3.24.0" | ||
testImplementation "org.apache.flink:flink-connector-base:$flinkVersion" | ||
testImplementation "org.mockito:mockito-inline:4.7.0" | ||
testImplementation "org.apache.flink:flink-test-utils:$flinkVersion" | ||
testImplementation 'commons-io:commons-io:2.11.0' | ||
} | ||
|
||
test { | ||
useJUnitPlatform() | ||
testLogging { | ||
events "passed", "skipped", "failed" | ||
} | ||
} | ||
|
||
shadowJar { | ||
classifier = null | ||
relocate 'io.grpc', 'com.vinted.flink.bigquery.shaded.io.grpc' | ||
relocate 'io.netty', 'com.vinted.flink.bigquery.shaded.io.netty' | ||
mergeServiceFiles() | ||
} | ||
|
||
publishing { | ||
publications { | ||
java(MavenPublication) { | ||
artifactId = project.name | ||
artifact shadowJar | ||
} | ||
} | ||
} | ||
|
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
distributionBase=GRADLE_USER_HOME | ||
distributionPath=wrapper/dists | ||
distributionUrl=https\://services.gradle.org/distributions/gradle-7.3.3-bin.zip | ||
zipStoreBase=GRADLE_USER_HOME | ||
zipStorePath=wrapper/dists |
Oops, something went wrong.