forked from ESIPFed/soil_data_model_survey
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path31-SurveyResults_record.Rmd
116 lines (78 loc) · 6.4 KB
/
31-SurveyResults_record.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
# Community Survey
```{r eval=FALSE, echo=FALSE}
rawRead.df <- readxl::read_excel('../data-raw/survey_results/Soil data compilation survey (Responses).xlsx', sheet=1)
#results.df <- tibble::as_tibble(t(rawRead.df))
#Drop columns with identifying or demigraphic information.
writeClean <- rawRead.df[,-(10:15)]
write_csv(writeClean, '../data/cleanSurveyResponses.csv')
```
```{r}
surveyResponse.df <- read_csv('data/cleanSurveyResponses.csv')
```
Many of our respondence were looking at questions around global biogeochemical processes, specifically carbon cycling.
Most of the data sources were drawn from the primary scientific literature with several respondences also drawing from direct response from study PI or colleagues.
One respondent mentioned online databases (e.g., USGS NWIS, USGS Geochemical Landscapes, NEON).
None of the study respondences mentioned DataOne or PI-driven data repositories for data discovery.
Identified challenges included QAQC protocols, difficulty getting data upon direct request from collegues, formatting heterogenity, missing or badly identified metadata, template development and stability, project management for aggregation effort in large groups.
Things that worked well in the process were orchestrated hackatons.
In general many respondants want data paired with a publication and already entired into a universal template.
There was frustration with brittle inhouse harmonization pipelines that were not always backwards compatable.
Automatic discovery from primary literature made another wishlist.
General FAIR data principles were recognized as important, though not named as such.
In general there were no differences in the harmonizatin pipeline process between the providers and aggregators.
Unsurprisingly open public data was well supported by the respondences with interest in broaderening scienific impact and increased visibility.
Attribution and engagement was a named concern of respondences, in particular collonial science was called out as being partiucalrlly problematic.
Time investments with unclear rewards.
Unclear what "data base" was appropreate for a given data set.
Data permissions issues.
Data is messy and data providers wish that was better understood by data aggregators.
Both error/uncertainty and the exact methods use things that data providers feel is often overlooked by data aggregators.
## Summary of responses
The survey followed the interviews and had a total of 23 responses.
Even though the levels of experience and purposes behind compiling data products varied, there were many commonalities across all respondents.
### Finding sources
When asked how they find their data sources, the majority of participants indicated that they used literature, while others used databases, reached out to colleagues that had worked on the topic, or even used a combination of all three.
These responses indicate that data accessibility is much less of a problem now than previously with conducting a meta-analysis.
### Painpoints and suggestions
When asked what went well with compiling the data product or what could have gone better, most people listed difficulties rather than successes.
Of the few successes, people reported comprehensive search results and the standardization of metadata and control vocabulary.
The difficulties of compiling the data product included trying to determine what authors meant when there was a lack of documentation, and the large amounts of time spent to sort data and create a template.
One respondent concluded that it was easy to automate ingestion of data from the same source, but very time consuming to mix data formats due to the need for harmonization.
### Ideal process
The participants shared that the ideal process for data harmonization involves gap-filling, standardizing units and table structure, as well as being able to preserve the original data yet build off of it for their own purposes.
### Main hurdles
Lastly, the main hurdles to contributing to data products and to other people using data include the time sunk into curating and formatting data products, the complex organization of data, and finding which data products to contribute to.
Overall, the difficulty of meta-analysis results from trying to harmonize such complex and diverse data sets.
## Responses
### Why do you compile data product(s)? What is the question you are looking to answer?
```{r}
cat(paste0(surveyResponse.df$`Why do you compile data product(s)? What is the question you are looking to answer?`, collapse='\n'))
```
### How do you find your data sources?
```{r}
cat(paste0(surveyResponse.df$`How do you find your data sources?`, collapse='\n'))
```
### What went well with how you compiled the data product or what do you wish gone better?
```{r}
cat(paste0(surveyResponse.df$`What went well with how you compiled the data product or what do you wish gone better?`, collapse='\n'))
```
### What is the ideal process for data harmonization? For the purposes of this question, this includes gap-filling, unit conversion, as well as standardizing headers and table structure.
```{r}
cat(paste0(surveyResponse.df$`What is the ideal process for data harmonization? For the purposes of this question, this includes gap-filling, unit conversion, as well as standardizing headers and table structure.`, collapse='\n'))
```
### Do you want your data to be compiled in a data product? Why or why not?
```{r}
cat(paste0(surveyResponse.df$`Do you want your data to be compiled in a data product? Why or why not?`, collapse='\n'))
```
### What is your main hurdle to contributing to data products and to other people using your data?
```{r}
cat(paste0(surveyResponse.df$`What is your main hurdle to contributing to data products and to other people using your data?`, collapse='\n'))
```
### What do you wish was better understood about how your data is collected or should be used?
```{r}
cat(paste0(surveyResponse.df$`What do you wish was better understood about how your data is collected or should be used?`, collapse='\n'))
```
### What is the ideal process for data harmonization from a data provider prospective? For the purposes of this question, this includes gap-filling, unit conversion, standardizing headers, and table structure.
```{r}
cat(paste0(surveyResponse.df$`What is the ideal process for data harmonization from a data provider prospective? For the purposes of this question, this includes gap-filling, unit conversion, standardizing headers, and table structure.`, collapse='\n'))
```