Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

correctly support writing pd.Series and pd.DataFrame #31

Conversation

briancappello
Copy link
Contributor

@umitanuki This PR replaces #27, fixes #30

I still don't understand why, but on 2.7, it appears arrays need to coerced to be contiguous. What's bizarre is that if you print(entire_dataset.flags) it says it is contiguous on both 2.x and 3.x. If instead you do print(entire_dataset[colname].flags) it says it isn't contiguous on both 2.x and 3.x. Meanwhile memoryview on 3.x doesn't complain, but buffer on 2.x blows up. Anyway, tests pass now.

@briancappello
Copy link
Contributor Author

I'm happy to add tests for this code too, but it will have to be tomorrow - feeling like it's bed time for me tonight

@umitanuki
Copy link
Contributor

Thanks for the work. Yes I want to see tests and README update as well, and don't rush

@@ -5,7 +5,7 @@ Build Status: ![build status](https://circleci.com/gh/alpacahq/pymarketstore/tre

Pymarketstore can query and write financial timeseries data from [MarketStore](https://github.com/alpacahq/marketstore)

Tested with 2.7, 3.3+
Tested with Python 2.7, 3.5+
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas is 3.5+

@@ -68,43 +68,48 @@ Construct a client object with endpoint.

## Query

`pymkts.Client#query(symbols, timeframe, attrgroup, start=None, end=None, limit=None, limit_from_start=False)`
`pymkts.Client.query(symbols, timeframe, attrgroup, start=None, end=None, limit=None, limit_from_start=False)`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed all of these to dots, feels more "Pythonic" to me, but I'm not especially attached to it either.

@briancappello
Copy link
Contributor Author

@umitanuki Added tests and updated the README, so this branch should be ready for review

@bynr
Copy link
Contributor

bynr commented Aug 1, 2019

Any update on this?

@briancappello
Copy link
Contributor Author

Finally got some time to play around with this some more. I'm not sure yet exactly where things are going wrong but it seems like marketstore@master might have bugs with writing or reading float64 values.

As it is right now, the client needs to know which types the backend is using to persist data on disk. (eg if the first write for a symbol used float32 values, then the client needs to know to submit their write requests using float32 as opposed to float64).

Updating writer.WriteCSM to coerce types appears to "work" (or at least, it doesn't crash), but upon querying for the data, some of the numbers comes back garbled (overflows i think? ... somewhere in marketstore?)

		// Check if the previously-written data schema matches the input
		columnMismatchError := "unable to match data columns (%v) to bucket columns (%v)"
		dbDSV := tbi.GetDataShapesWithEpoch()
 		csDSV := cs.GetDataShapes()
 		if len(dbDSV) != len(csDSV) {
 			return fmt.Errorf(columnMismatchError, csDSV, dbDSV)
 		}
 		missing, coercion := GetMissingAndTypeCoercionColumns(dbDSV, csDSV)
-		if missing != nil || coercion != nil {
+		if missing != nil {
 			return fmt.Errorf(columnMismatchError, csDSV, dbDSV)
 		}
+		if coercion != nil {
+			for _, shape := range coercion {
+				cs.CoerceColumnType(shape)
+			}
+		}

If you write new float64 data to marketstore (eg a symbol or attribute group it doesn't know about), then it will persist as float64 instead of the default float32 as used by the background workers. But again, upon querying for it, some of it comes back wrong.

What I've done locally is just to use pandas to convert types to float32 (or int32 for volume), before submitting the write request. But it doesn't feel like the "correct" way to fix this. I did spend some time digging through the code paths in marketstore, but wasn't able to identify exactly where things are breaking down (if I'm even right that the problem is indeed somewhere in that codebase)

@briancappello
Copy link
Contributor Author

Replaced by #55

@briancappello briancappello deleted the really-support-pandas-datatypes-in-client-write branch August 15, 2020 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants