diff --git a/demystifying_data_mining/demystifying_data_mining.md b/demystifying_data_mining/demystifying_data_mining.md
new file mode 100644
index 000000000..5c832191d
--- /dev/null
+++ b/demystifying_data_mining/demystifying_data_mining.md
@@ -0,0 +1,115 @@
+
+
+# Demystifying Data Mining
+
+
+
+## Overview
+@comment
+
+**Is this module right for me?** @long_description
+
+**Estimated time to completion:** @estimated_time
+
+**Pre-requisites**
+No prerequisites
+
+**Learning Objectives**
+
+@learning_objectives
+
+
+
+## What is Data Mining?
+
+Data mining is the process of extracting useful information from large sets of data. Research often gathers huge amounts of data; data mining sifts through that data to extract the relevant information that can be used to make informed decisions.
+
+Data mining is the in-between phase that sits between gathering the data and creating a model.
+
+Models are systems for representing events or instances.
+
+Gathering data -> Mining data (DATA MINING!) -> Creating models
+
+The primary purpose of mining data is to identify trends, patterns, and relationships in order to make informed decisions and plans.
+
+**Note:** Data mining becomes especially useful when dealing with very large data sets; in general, the more data available, the more accurate and in-depth the trends, patterns, and relationships that can be identified.
+
+## Why is Data Mining important?
+
+Large companies use data mining for different reasons, one of which is to get as much information as possible from the data they hold about their consumers.
+
+Because data mining can also be used to predict trends, large companies use it to prepare for their next line of production. Stores use this predictive capability to determine things such as what consumers are buying most and which placements of goods encourage consumers to spend more. For example, if we place the cereal section close to the milk section, would consumers feel more inclined to buy cereal?
+
+Returning to trend prediction: data mining can be used to make predictions about the future. That is why it is so favoured by large businesses and corporations (who doesn't want the ability to predict the next big thing?). Data mining does this by consolidating the data and using previous events (taken from the data) to estimate the probability of what could happen in the future.
+
+
+Example
+
+M hospital decides to take the data it has collected over the last five years, clean it, and then mine it. Mining the data shows that during the last three springs there was a large influx of patients visiting the hospital with cases of rhinovirus. Based on the mined data, the hospital predicts that the same will happen this spring.
+
+Data mining allowed the hospital to search through the gathered data for relevant information in order to make a prediction about the future.
+
+
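As a rough illustration of the kind of search the hospital example describes, the Python sketch below counts rhinovirus visits per season and checks whether spring was the peak season in every year. The visit records, the `season_of` mapping, and the function names are all made up for illustration; this is not the method any real hospital uses, and a recurring pattern like this only suggests, and does not guarantee, what will happen next spring.

```python
from collections import Counter

# Hypothetical, already-cleaned visit records: (year, month, diagnosis).
# These values are invented purely for illustration.
visits = [
    (2021, 4, "rhinovirus"), (2021, 4, "rhinovirus"), (2021, 5, "rhinovirus"),
    (2021, 8, "rhinovirus"), (2021, 11, "fracture"),
    (2022, 3, "rhinovirus"), (2022, 4, "rhinovirus"), (2022, 5, "rhinovirus"),
    (2022, 7, "fracture"), (2022, 12, "rhinovirus"),
    (2023, 3, "rhinovirus"), (2023, 4, "rhinovirus"), (2023, 4, "rhinovirus"),
    (2023, 9, "fracture"), (2023, 10, "rhinovirus"),
]


def season_of(month):
    """Map a month number (1-12) to a Northern Hemisphere season name."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    return "winter"


def seasonal_counts(records, diagnosis):
    """Count visits with the given diagnosis per (year, season)."""
    counts = Counter()
    for year, month, dx in records:
        if dx == diagnosis:
            counts[(year, season_of(month))] += 1
    return counts


def peak_season_by_year(counts):
    """For each year, return the season with the most counted visits."""
    best = {}
    for (year, season), n in counts.items():
        if year not in best or n > best[year][1]:
            best[year] = (season, n)
    return {year: season for year, (season, _) in best.items()}


counts = seasonal_counts(visits, "rhinovirus")
peaks = peak_season_by_year(counts)

# If spring was the peak season in every recorded year, that is the
# recurring pattern the hospital would use to plan for next spring.
spring_every_year = all(season == "spring" for season in peaks.values())
print(peaks, spring_every_year)
```

Counting and comparing like this is about the simplest form of pattern finding; real data mining tools apply the same idea to far larger data sets and many more variables at once.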
+
+
+## Applications of Data Mining
+
+**Data mining in Healthcare:** With the data accumulated in healthcare, data mining can help find the most appropriate and cost-effective practices that benefit both the hospital and the patients.
+
+**Data mining in Research analysis:** Data mining tools are well suited to cleaning data, pre-processing data, and integrating data into a database, which makes them useful for researchers. Data mining can help identify correlations between activities, or co-occurring sequences, that can bring about a change in research direction. When used with data visualization and visual data mining, it can also help clarify research data.
+
+## Challenges of Data Mining
+
+**Big data:** Many existing systems struggle with handling, storing, and making use of the flood of unorganised input that comes with big data sets. Systems that are not built for this volume can crash.
+
+**User competency:** To fully gain the benefits of data mining, the user must understand the data available and the context of the information they are seeking. They must also know, at least generally, how the data mining tools work and what they can do.
+
+**Data quality and availability:** If the input is low quality, the output will also be low quality. Data mining needs data and, as with anything that depends on collected data, the quality of that data needs to be high.
+
+## Additional Resources
+
+[Top Ten Data Mining Applications in the Real World](https://intellipaat.com/blog/top-data-mining-applications/), from a free online educational site, provides more in-depth examples of how data mining can be applied to real-world situations and jobs.
+
+
+## Feedback
+
+In the beginning, we stated some goals.
+
+**Learning Objectives**
+
+@learning_objectives
+
+We ask you to fill out a brief (5 minutes or less) survey to let us know:
+
+* If we achieved the learning objectives
+* If the module difficulty was appropriate
+* If we gave you the experience you expected
+
+We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Demystifying+Data+Mining%22)!