
Using an SQL data source (Design Philosophy?) #679

Answered by MaxHalford
kristopher-wood asked this question in Q&A

I think there is no established way to do things. Whatever works for you is fine.

In my opinion, what really matters is having a proper dataset to train and evaluate on. I say train and evaluate because, as you may know, in online learning a single dataset can serve both purposes: the model makes a prediction for each sample before learning from it. This is called progressive validation. The ideal setup is therefore a dataset in which you know when each label arrived. That way, you can simulate a production scenario by showing the features (x) and labels (y) to the model in exactly the same order as they occurred in production. I wrote a blog post on this here.
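To make the idea concrete, here is a minimal, dependency-free sketch of progressive validation with delayed labels. It assumes each sample carries two timestamps: `t_x`, when the features became available, and `t_y`, when the label arrived. The names `MeanRegressor` and `progressive_validation` are hypothetical, and the toy model just predicts the running mean of the labels it has seen. River provides `evaluate.progressive_val_score` (with `moment` and `delay` parameters) for this purpose; the code below is an illustration of the technique, not River's implementation.

```python
class MeanRegressor:
    """Hypothetical toy online model: predicts the running mean of labels seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_validation(samples, model):
    """Replay samples in arrival order and return the mean absolute error.

    samples: iterable of (x, y, t_x, t_y) with t_y >= t_x (assumed format).
    """
    events = []
    for i, (x, y, t_x, t_y) in enumerate(samples):
        events.append((t_x, 0, i, x, None))  # kind 0: features arrive -> predict
        events.append((t_y, 1, i, x, y))     # kind 1: label arrives -> score, then learn
    # Sort by time; on ties, predictions (kind 0) come before learning (kind 1).
    events.sort(key=lambda e: (e[0], e[1], e[2]))

    pending = {}  # predictions waiting for their label
    abs_errors = []
    for _, kind, i, x, y in events:
        if kind == 0:
            pending[i] = model.predict_one(x)
        else:
            abs_errors.append(abs(y - pending.pop(i)))
            model.learn_one(x, y)
    return sum(abs_errors) / len(abs_errors)


# Illustrative data only: (features, label, features_time, label_time)
delayed = [({"a": 1}, 10.0, 0, 5), ({"a": 2}, 20.0, 1, 2), ({"a": 3}, 30.0, 3, 4)]
instant = [(x, y, t_x, t_x) for x, y, t_x, _ in delayed]  # pretend labels arrive immediately

print(round(progressive_validation(delayed, MeanRegressor()), 3))  # 13.333
print(round(progressive_validation(instant, MeanRegressor()), 3))  # 11.667
```

In this toy example, pretending labels arrive instantly yields a lower (better-looking) error than replaying the true label delays, which is why knowing the arrival times matters for an honest evaluation.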

Now, the way you obtain this dataset is entirely up to you. Using a database is fin…
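Since the question is about an SQL data source, here is one possible way to stream such a dataset out of a database, using Python's built-in `sqlite3` module. The `samples` table, its columns (`x1`, `x2`, `label`, `features_at`, `label_at`) and the `iter_samples` / `demo_connection` helpers are hypothetical; adapt the query to your own schema. Iterating over the cursor lets rows be fetched as needed rather than building a full list up front, and ordering by the feature timestamp keeps the stream in arrival order.

```python
import sqlite3


def iter_samples(conn):
    """Yield (x, y, t_x, t_y) tuples from a hypothetical `samples` table, oldest first."""
    query = """
        SELECT x1, x2, label, features_at, label_at
        FROM samples
        ORDER BY features_at
    """
    for x1, x2, label, features_at, label_at in conn.execute(query):
        yield {"x1": x1, "x2": x2}, label, features_at, label_at


def demo_connection():
    """Build an in-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "x1 REAL, x2 REAL, label REAL, features_at INTEGER, label_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
        [(0.5, 1.0, 20.0, 1, 2), (0.1, 0.2, 10.0, 0, 5)],
    )
    conn.commit()
    return conn


for sample in iter_samples(demo_connection()):
    print(sample)
# ({'x1': 0.1, 'x2': 0.2}, 10.0, 0, 5)
# ({'x1': 0.5, 'x2': 1.0}, 20.0, 1, 2)
```

The resulting tuples carry both timestamps, so they can be replayed in the order events actually happened.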

Replies: 1 comment 4 replies

Answer selected by MaxHalford