Hi @zeroshade, I came across the closed issue #38616 and can still reproduce it when writing Arrow data to a Parquet file using pqarrow.
Here is the code that writes the Parquet file; it is based on one of your examples:
arrChan := make(chan arrow.Record, 10)
go func(ch <-chan arrow.Record) {
	first_rec := <-ch
	f, err := os.OpenFile("./test.parquet", os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	// ...
	// we'll use the default writer properties, but you could easily pass
	// properties to customize the writer
	props := parquet.NewWriterProperties()
	writer, err := pqarrow.NewFileWriter(first_rec.Schema(), f, props,
		pqarrow.DefaultWriterProps())
	if err != nil {
		panic(err)
	}
	defer writer.Close()
	fmt.Println("here")
	if err := writer.Write(first_rec); err != nil {
		fmt.Println(err)
		panic(err)
	}
	// first_rec.Release()
	for rec := range ch {
		if err := writer.Write(rec); err != nil {
			panic(err)
		}
		// rec.Release()
	}
}(arrChan)
The Arrow records are released outside this function.
This code writes out a test.parquet file, and when I read it using DuckDB I get this error:
Error: Invalid Input Error: Failed to cast value: Type UINT32 with value 4294967295 can't be cast because the value is out of range for the destination type UINT16
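To make the DuckDB message concrete: 4294967295 is the largest 32-bit unsigned value (0xFFFFFFFF), while a UINT16 column can hold at most 65535. The following is only an illustrative sketch of a range-checked narrowing cast. `toUint16` is a hypothetical helper written for this example; it is not DuckDB's or Arrow's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// toUint16 is a hypothetical helper: it narrows a uint32 to uint16 and
// returns an error instead of silently truncating out-of-range values.
func toUint16(v uint32) (uint16, error) {
	if v > math.MaxUint16 {
		return 0, fmt.Errorf("value %d is out of range for UINT16 (max %d)", v, math.MaxUint16)
	}
	return uint16(v), nil
}

func main() {
	// 4294967295 is the value named in the DuckDB error message.
	for _, v := range []uint32{0, 65535, 4294967295} {
		if n, err := toUint16(v); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("ok:", n)
		}
	}
}
```

Any reader that applies a check of this kind to the stored value has to reject it, which is consistent with the error shown.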
Here is the output from the parquet-cli tool, similar to what's in #38616:
$ parquet pages test.parquet
Column: id
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  _ _  1       4.00 B     4 B
  0-1    data  _ R  1       3.00 B     3 B                 0       "0" / "0"
Column: resource.id
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  _ _  1       4.00 B     4 B
  0-1    data  _ R  1       9.00 B     9 B                 0       "4294967295" / "0"
Column: resource.schema_url
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  _ _  1       43.00 B    43 B
  0-1    data  _ R  1       9.00 B     9 B                 0       "https://opentelemetry.io/..." / "https://opentelemetry.io/..."
Column: scope.id
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  _ _  1       4.00 B     4 B
  0-1    data  _ R  1       9.00 B     9 B                 0       "4294967295" / "0"
Column: metric_type
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  _ _  1       4.00 B     4 B
  0-1    data  _ R  1       3.00 B     3 B                 0       "1" / "1"
Column: name
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  _ _  1       7.00 B     7 B
  0-1    data  _ R  1       3.00 B     3 B                         "gen" / "gen"
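The resource.id and scope.id columns have incorrect min values: 4294967295 is larger than the reported max of 0 and is outside the range of a 16-bit unsigned column.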
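The inconsistency in these statistics can also be checked mechanically. Below is a minimal sketch under stated assumptions: `checkUnsignedStats` is a hypothetical helper written for this report (it is not part of pqarrow or parquet-cli), and the min/max values and bit widths are copied by hand from the parquet-cli output in this issue.

```go
package main

import "fmt"

// checkUnsignedStats is a hypothetical validator for the min/max statistics
// of an unsigned integer column with the given logical bit width.
func checkUnsignedStats(lo, hi uint64, bitWidth uint) error {
	// Largest value representable in bitWidth bits (65535 for 16 bits).
	limit := (uint64(1) << bitWidth) - 1
	if lo > limit || hi > limit {
		return fmt.Errorf("min %d / max %d exceeds the %d-bit limit %d", lo, hi, bitWidth, limit)
	}
	if lo > hi {
		return fmt.Errorf("min %d is greater than max %d", lo, hi)
	}
	return nil
}

func main() {
	// Values taken from the parquet-cli output for three of the columns.
	cols := []struct {
		name     string
		lo, hi   uint64
		bitWidth uint
	}{
		{"id", 0, 0, 16},
		{"resource.id", 4294967295, 0, 16},
		{"metric_type", 1, 1, 8},
	}
	for _, c := range cols {
		if err := checkUnsignedStats(c.lo, c.hi, c.bitWidth); err != nil {
			fmt.Printf("%s: invalid stats: %v\n", c.name, err)
		} else {
			fmt.Printf("%s: stats look consistent\n", c.name)
		}
	}
}
```

With these inputs only resource.id is flagged (scope.id carries the same values and would be flagged too); id and metric_type pass.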
$ parquet meta test.parquet
File path: test.parquet
Created by: parquet-go version 18.0.0-SNAPSHOT
Properties: (none)
Schema:
message schema {
  required int32 id (INTEGER(16,false));
  required group resource {
    optional int32 id (INTEGER(16,false));
    optional binary schema_url (STRING);
  }
  required group scope {
    optional int32 id (INTEGER(16,false));
  }
  required int32 metric_type (INTEGER(8,false));
  required binary name (STRING);
}
Row group 0: count: 1 464.00 B records start: 4 total(compressed): 464 B total(uncompressed):464 B
--------------------------------------------------------------------------------
                     type    encodings  count  avg size   nulls   min / max
id                   INT32   _ _ R      1      56.00 B    0       "0" / "0"
resource.id          INT32   _ _ R      1      62.00 B    0       "4294967295" / "0"
resource.schema_url  BINARY  _ _ R      1      171.00 B   0       "https://opentelemetry.io/..." / "https://opentelemetry.io/..."
scope.id             INT32   _ _ R      1      62.00 B    0       "4294967295" / "0"
metric_type          INT32   _ _ R      1      56.00 B    0       "1" / "1"
name                 BINARY  _ _ R      1      57.00 B            "gen" / "gen"
I hope these reproduction details are sufficient. If any details are missing, please let me know and I will provide them as soon as possible. Thank you!
GOARCH='amd64'
GOOS='linux'
GOVERSION='go1.23.4'
Component(s)
Parquet