Overview of Functionality
The data from the "Fitbit" trial are not immediately accessible in Microsoft Excel. The data are formatted in JavaScript Object Notation (JSON), a generalized example of which is presented below.
{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}
As the example above shows, data are presented in "key, value" pairs. A key may itself hold further keys rather than a single value, which gives rise to nested values: "glossary", for instance, does not sit at the bottom of the hierarchy. By contrast, "GlossSeeAlso" is a key with exactly one value, the list ["GML", "XML"].
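A minimal sketch of how such nested keys can be read in Python, using only the standard `json` module. The input is a shortened copy of the glossary example, not Fitbit data, and the variable names are illustrative.

```python
import json

# Shortened copy of the glossary example (illustrative, not trial data).
raw = """
{"glossary": {"title": "example glossary",
  "GlossDiv": {"title": "S",
    "GlossList": {"GlossEntry": {
      "ID": "SGML",
      "GlossDef": {"para": "A meta-markup language.",
                   "GlossSeeAlso": ["GML", "XML"]},
      "GlossSee": "markup"}}}}}
"""

data = json.loads(raw)

# "glossary" holds further keys, so its value is itself a dict.
print(type(data["glossary"]).__name__)

# Follow the chain of keys down to the entry, then to a leaf value.
entry = data["glossary"]["GlossDiv"]["GlossList"]["GlossEntry"]
see_also = entry["GlossDef"]["GlossSeeAlso"]
print(see_also)
```

Each nested JSON object becomes a Python `dict` and each JSON array a `list`, so reaching a value is a matter of following the keys in order.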
It is important to note that while the number of "key, value" pairs is not fixed, the schema, or structure, of the keys is. In the case of the Fitbit study, the number of surveys completed may differ between individuals, but the way the data are reported will not. We expect the structure of the data to remain constant and the number of values to change.
This is why a parser is needed. Because the number of "key, value" pairs may change between individuals, the output must be standardized in another way so that it remains human readable and fits in a reasonable number of rows. The data are therefore extracted to the following columns:
- value: the survey answer value
- corp: an alphanumeric ID
- study: an alphanumeric ID
- ID: an anonymous patient identifier used for longitudinal analysis
- survey: the survey and survey question
This allows any JSON file to be queried either by another Python file or, if desired, by another query language.
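The extraction to these five columns could be sketched as follows. The nested input layout (`participants`, `surveys`, the `q1`/`q2` question keys), the sample IDs, and the `flatten` helper are all hypothetical, chosen only to illustrate the idea; the actual trial schema and parser may differ.

```python
import csv
import io

COLUMNS = ["value", "corp", "study", "ID", "survey"]

def flatten(record):
    """Yield one row per survey answer from a hypothetical nested record."""
    for participant in record["participants"]:
        for survey_name, answers in participant["surveys"].items():
            for question, value in answers.items():
                yield {
                    "value": value,
                    "corp": record["corp"],
                    "study": record["study"],
                    "ID": participant["ID"],
                    # Survey and question are combined into one column.
                    "survey": f"{survey_name}.{question}",
                }

# Hypothetical input: same structure for everyone, different answer counts.
record = {
    "corp": "C01",
    "study": "S01",
    "participants": [
        {"ID": "P001", "surveys": {"sleep": {"q1": 3, "q2": 1}}},
        {"ID": "P002", "surveys": {"sleep": {"q1": 2}}},
    ],
}

rows = list(flatten(record))

# Write to an in-memory buffer; a real run would open a .csv file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because every answer becomes one row with the same five columns, the output has a fixed shape no matter how many surveys each participant completed.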
For further analyses, data are structured in trees in which every node is at most two levels from the root. Several functions on these trees are available to aggregate data, ranging from scoring surveys to finding omissions in completion dates.
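One way to picture such a tree is a root with participants one level down and their surveys two levels down. The sketch below assumes that layout. The `Node` class, its field names, the scoring rule (a plain sum of answer values), and the sample data are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    values: list = field(default_factory=list)    # answer values (survey nodes)
    completed: Optional[str] = None               # completion date, if recorded
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it for chaining."""
        self.children.append(child)
        return child

    def score(self):
        """Sum the answer values in this node and everything below it."""
        return sum(self.values) + sum(c.score() for c in self.children)

    def missing_dates(self, path=""):
        """Return paths of leaf nodes that have no completion date."""
        here = f"{path}/{self.name}" if path else self.name
        missing = []
        if not self.children and self.completed is None:
            missing.append(here)
        for child in self.children:
            missing.extend(child.missing_dates(here))
        return missing

# Root -> participant -> survey: no node is more than two levels down.
root = Node("study")
p1 = root.add(Node("P001"))
p1.add(Node("sleep", [3, 1], "2020-01-05"))
p1.add(Node("mood", [2], None))               # completion date omitted
p2 = root.add(Node("P002"))
p2.add(Node("sleep", [2], "2020-01-06"))

print(root.score())
print(root.missing_dates())
```

Keeping the tree this shallow means that any aggregate, whether per survey, per participant, or for the whole study, is computed by the same short recursive call.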
There are two main formats for output:
- CSV files
- Terminal output
CSV files are used for the output of data parsing, while terminal output is used for the results of data aggregation and data queries.