Dami data mining technique #299

Open
wants to merge 15 commits into
base: main
from
115 changes: 115 additions & 0 deletions demystifying_data_mining/demystifying_data_mining.md
@@ -0,0 +1,115 @@
<!--

author: Agoro Oluwadamilare
email: agoroo@chop.edu
version: 2.0.0
module_template_version: 3.0.0
language: en
narrator: UK English Female
title: Demystifying Data Mining
comment: Understand what data mining is and why it is important.
long_description: Every day, huge amounts of data are generated, collected, and stored. Learn what data mining is and why it is important.
estimated_time: 15 minutes
@learning_objectives

After completion of this module, learners will be able to:

- Define data mining
- Explain why data mining is important
- Describe cases in which data mining could be used
- List the limitations of data mining

@end

link: https://chop-dbhi-arcus-education-website-assets.s3.amazonaws.com/css/styles.css

script: https://kit.fontawesome.com/83b2343bd4.js

-->

# Demystifying Data Mining

<div class = "overview">

## Overview
@comment

**Is this module right for me?** @long_description

**Estimated time to completion:** @estimated_time

**Pre-requisites**
No prerequisites

**Learning Objectives**

@learning_objectives

</div>

## What is Data Mining?

Data mining is the process of extracting useful information from large sets of data. Research often gathers huge amounts of data; data mining sifts through that data to extract the relevant information that can be used to make informed decisions.

Data mining is the phase between gathering the data and creating a model.

A model is a simplified representation of events or instances.

Gathering data -> Mining data (DATA MINING!) -> Creating models

The primary purpose of mining data is to identify trends, patterns, and relationships in order to make informed decisions and plans.

**Note:** Data mining is most useful when dealing with very large data sets. In general, the more data available, the more accurate and in-depth the trends, patterns, and relationships that can be identified.

## Why is Data Mining important?

Large companies use data mining for many reasons, one of which is to get as much information as possible from the data they hold about their consumers.

Because data mining can also be used to predict trends, large companies use it to prepare for their next line of production. Stores use the same predictive capability to determine which products consumers buy most and which placement of goods encourages consumers to spend more. For example, if the cereal section is placed close to the milk section, will consumers feel more inclined to buy cereal?
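
The cereal and milk question can be explored by counting how often items appear together in the same purchase. The following is a minimal sketch, not a production method: the `baskets` data is invented for illustration, and `pair_counts` and `confidence` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"cereal", "bananas"},
    {"milk", "cereal", "eggs"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single, predictable key order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    with_if = [b for b in baskets if if_item in b]
    if not with_if:
        return 0.0
    return sum(then_item in b for b in with_if) / len(with_if)


counts = pair_counts(baskets)
print(counts[("cereal", "milk")])             # baskets with both items
print(confidence(baskets, "milk", "cereal"))  # share of milk baskets with cereal
```

A high share suggests the two items are often bought together, but it does not by itself show that moving the shelves would change what people buy.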

Returning to trend prediction: data mining can be used to estimate what is likely to happen in the future. That is why it is so favoured by large businesses and corporations (who doesn't want the ability to predict the next big thing?). Data mining does this by consolidating the data and using previous events (taken from the data) to estimate the probability of what could happen in the future.

<div class = "care">
<b style="color: rgb(var(--color-highlight));">Example</b><br>

Hospital M takes the data it has collected over the last five years, cleans it, and then mines it. The mining shows that during the last three springs there was a large influx of patients visiting the hospital with rhinovirus infections. Based on the mined data, the hospital predicts that the same will happen this spring.

Data mining allowed the hospital to search through the gathered data for relevant information in order to make a prediction about the future.

</div>
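
The hospital example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `visits` records are invented, spring is taken to mean March to May, and averaging past years is only one very simple way to form an estimate.

```python
from collections import Counter
from datetime import date

# Hypothetical visit records: (visit date, diagnosis). Invented for illustration.
visits = [
    (date(2021, 4, 12), "rhinovirus"),
    (date(2021, 4, 20), "rhinovirus"),
    (date(2021, 7, 3), "fracture"),
    (date(2022, 3, 15), "rhinovirus"),
    (date(2022, 5, 2), "rhinovirus"),
    (date(2022, 11, 9), "rhinovirus"),
    (date(2023, 4, 8), "rhinovirus"),
    (date(2023, 5, 21), "rhinovirus"),
    (date(2023, 8, 30), "asthma"),
]

SPRING_MONTHS = {3, 4, 5}  # assumption: spring means March to May


def spring_cases_by_year(visits, diagnosis):
    """Count spring visits with the given diagnosis, grouped by year."""
    counts = Counter()
    for visit_date, dx in visits:
        if dx == diagnosis and visit_date.month in SPRING_MONTHS:
            counts[visit_date.year] += 1
    return dict(sorted(counts.items()))


def naive_forecast(yearly_counts):
    """Average of past years, used as a rough estimate for the next one."""
    if not yearly_counts:
        return 0.0
    return sum(yearly_counts.values()) / len(yearly_counts)


by_year = spring_cases_by_year(visits, "rhinovirus")
print(by_year)
print(naive_forecast(by_year))
```

With real data, the counts per spring would be compared against the rest of the year before calling it a pattern, and a proper forecasting method would replace the simple average.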


## Applications of Data Mining

**Data mining in Healthcare:** With the data accumulated in healthcare, data mining can help find the most appropriate and cost-effective practices that benefit both the hospital and the patients.

**Data mining in Research analysis:** Data mining workflows typically include cleaning data, pre-processing data, and integrating data into a database, which makes them well suited to researchers. Data mining can help identify correlations between activities, or co-occurring sequences, that can change the direction of the research. When used with data visualization and visual data mining, it can also help clarify research data.

## Challenges of Data Mining

**Big data:** Many existing systems struggle to handle, store, and make use of the flood of unorganised input that comes with big data sets. Systems that were not designed for this volume can slow down or crash.

**User competency:** To gain the full benefit of data mining, users must understand the data available and the context of the information they are seeking. They must also know, at least generally, how data mining tools work and what they can do.

**Data quality and availability:** Data mining needs data and, as with anything that depends on collected data, the quality of that data needs to be high. If the input is low quality, the information mined from it will also be low quality.
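
A few basic checks before mining can reveal low-quality input. The sketch below is illustrative only: the `records` are invented, `quality_report` is a hypothetical helper, and the plausible age range of 0 to 120 is an assumption.

```python
# Hypothetical patient records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "diagnosis": "asthma"},
    {"id": 2, "age": None, "diagnosis": "rhinovirus"},
    {"id": 3, "age": 210, "diagnosis": "rhinovirus"},
    {"id": 3, "age": 210, "diagnosis": "rhinovirus"},
    {"id": 4, "age": 8, "diagnosis": None},
]


def quality_report(records, max_age=120):
    """Count records with missing values, implausible ages, and exact duplicates."""
    missing = sum(1 for r in records if any(v is None for v in r.values()))
    implausible = sum(
        1 for r in records
        if r["age"] is not None and not 0 <= r["age"] <= max_age
    )
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the record
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "missing": missing,
        "implausible_age": implausible,
        "duplicates": duplicates,
    }


report = quality_report(records)
print(report)
```

Records flagged by checks like these would usually be corrected or set aside before mining, since patterns found in faulty data can be misleading.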

## Additional Resources

The article [Top Ten Data Mining Applications in the Real World](https://intellipaat.com/blog/top-data-mining-applications/), hosted on a free online educational site, provides more in-depth examples of how data mining can be applied to real-world situations and jobs.


## Feedback

In the beginning, we stated some goals.

**Learning Objectives**

@learning_objectives

We ask you to fill out a brief (5 minutes or less) survey to let us know:

* If we achieved the learning objectives
* If the module difficulty was appropriate
* If we gave you the experience you expected

We gather this information in order to iteratively improve our work. Thank you in advance for filling out [our brief survey](https://redcap.chop.edu/surveys/?s=KHTXCXJJ93&module_name=%22Demystifying+Data+Mining%22)!