This repository has been archived by the owner on Jan 10, 2025. It is now read-only.

Investigate different models for describing an analysis flow using a DAG or similar structure. #83

Open
stuartlynn opened this issue Oct 26, 2020 · 0 comments


We currently have only two types of operation in smooshr:

  1. Combine columns together
  2. Create a taxonomy for a given column

In the future we would like to support more steps, for example:

  • Extract part of a column as a new column. For example, an address like "23 Some Street, Some City, US, 11221" -> "Some City"
  • Standardize a time column
  • Merge the contents of two columns together to form a new column
  • Do entity matching on a given column
  • etc

Some of these steps will depend on previous steps in ways that are hard to predict at run time. It would be great to define each individual transform as a node in a graph, with dependencies represented as edges. Essentially a DAG.
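
To make the idea concrete, here is a rough sketch of one possible model, not something smooshr implements today. Each transform names the columns it reads and the column it writes, and the edges are derived from those names. The `Transform` class, the `build_graph` helper, and the step and column names are all hypothetical; `graphlib.TopologicalSorter` is part of the Python 3.9+ standard library.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Transform:
    """Hypothetical node: one transform step in the analysis flow."""
    name: str
    inputs: tuple  # columns this step reads
    output: str    # column this step produces


def build_graph(transforms):
    """Map each step name to the set of step names it depends on.

    A column that no step produces is treated as raw input data,
    so it adds no edge.
    """
    producers = {t.output: t.name for t in transforms}
    return {
        t.name: {producers[c] for c in t.inputs if c in producers}
        for t in transforms
    }


# Illustrative steps only, loosely based on the wish list above.
steps = [
    Transform("match_entities", ("city_date",), "entity_id"),
    Transform("merge_city_date", ("city", "date"), "city_date"),
    Transform("standardize_date", ("raw_date",), "date"),
    Transform("extract_city", ("address",), "city"),
]

graph = build_graph(steps)

# static_order() yields every step after all of its dependencies,
# and raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Deriving edges from column names means the user never has to declare dependencies by hand, and a cycle (two steps that each need the other's output) surfaces as an error instead of a silently wrong ordering.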

This would inform both the UI and the Python code that the tool ultimately generates.
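
One way a graph could drive the generated Python is to walk it in dependency order and emit one line per node. The following is only a sketch under assumptions: the step names, the per-step code templates, and `generate_script` are hypothetical, and the emitted script assumes a pandas DataFrame named `df` has already been loaded.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: step name -> names of the steps it depends on.
dependencies = {
    "merge_city_date": {"extract_city", "standardize_date"},
    "extract_city": set(),
    "standardize_date": set(),
}

# Hypothetical code template for each step. A real tool would build
# these from the choices the user makes in the UI.
templates = {
    "extract_city": "df['city'] = df['address'].str.split(', ').str[1]",
    "standardize_date": "df['date'] = pd.to_datetime(df['raw_date'])",
    "merge_city_date": (
        "df['city_date'] = df['city'] + ' ' + df['date'].astype(str)"
    ),
}


def generate_script(dependencies, templates):
    """Emit one line of code per step, dependencies first."""
    order = TopologicalSorter(dependencies).static_order()
    lines = ["import pandas as pd", ""]
    lines += [templates[step] for step in order]
    return "\n".join(lines)


script = generate_script(dependencies, templates)
print(script)
```

The same ordering could also tell the UI which steps become available once their inputs exist.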

Some links to projects that might be worth looking at
