This repository analyzes public sentiment toward political candidates using several datasets: social media interactions (Reddit), polling data, and Google Trends. The aim is to uncover sentiment patterns, shifts in support, and possible correlations between online discourse and polling or search-interest trends in the lead-up to an election. Key events, such as Joe Biden’s withdrawal from the race and Kamala Harris’s entry into it, are analyzed for their impact on public opinion.
-
data/
Contains the data in three forms: raw, filtered, and processed, guiding the user through the data preparation stages needed to meet the project’s objectives. Each folder holds the datasets at a different stage of preparation, so users can track the data transformation from initial collection to final analysis-ready form. In the final processed form, additional files beyond the main dataset have been retained. These files let readers explore the data and derive insights independently, and they support future exploratory work by our team.
-
scripts/
Contains all scripts used throughout the project. Scripts are organized by function, making it easier to follow the steps and replicate the processes carried out during project development.
-
visualizations/
Contains all images that visualize the experiments run throughout the project.
-
LICENSE
The usage license for this project, detailing the terms and conditions for using and sharing the repository. Please see this file for specific usage guidelines.
-
motivation.md
Outlines the goals, objectives, and research questions guiding this project, along with hypotheses and anticipated insights. It also discusses the selection of datasets and potential challenges faced during data preparation and analysis.
-
data_cleaning.md
Provides an in-depth guide to the data cleaning process, detailing each step taken to prepare the raw data for analysis. This file explains the rationale behind each transformation, including data deduplication, missing data handling, standardization, and other preprocessing tasks that ensure the datasets align with the project’s analytical goals.
-
mining_methods.md
Provides a detailed description of the data mining techniques used, the rationale behind their selection based on project objectives and data characteristics, and the configuration of the algorithms, including parameters, tools, and libraries employed.
-
results.md
Summarizes the results obtained from various data mining techniques, including visualizations such as graphs, tables, and other representations to facilitate interpretation.
-
evaluation.md
Describes the evaluation methods used.
-
interpretation.md
Details the findings of this project and their connection to the research questions.
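
As a rough illustration of the kind of steps that data_cleaning.md covers (deduplication, missing data handling, standardization), the sketch below applies them to a small list of records using only the Python standard library. It is not the project's actual cleaning code: the field names `id`, `text`, and `created_utc`, the function name `clean_records`, and the sample rows are all assumptions made for this example.

```python
from datetime import datetime, timezone


def clean_records(records):
    """Deduplicate, drop incomplete rows, and standardize fields.

    The field names ("id", "text", "created_utc") are hypothetical and are
    not taken from the project's datasets.
    """
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Missing data handling: skip rows without an id or without text.
        text = (rec.get("text") or "").strip()
        if rec.get("id") is None or not text:
            continue
        # Deduplication: keep only the first occurrence of each id.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        # Standardization: lowercase the text, collapse whitespace, and
        # convert the Unix timestamp to an ISO 8601 UTC date.
        created = datetime.fromtimestamp(rec["created_utc"], tz=timezone.utc)
        cleaned.append({
            "id": rec["id"],
            "text": " ".join(text.lower().split()),
            "date": created.date().isoformat(),
        })
    return cleaned


if __name__ == "__main__":
    # Made-up sample rows: one valid row, one duplicate, one with no text.
    sample = [
        {"id": "a1", "text": "  Great   debate tonight ", "created_utc": 1721520000},
        {"id": "a1", "text": "  Great   debate tonight ", "created_utc": 1721520000},
        {"id": "b2", "text": None, "created_utc": 1721606400},
    ]
    print(clean_records(sample))
```

Dropping rows with missing text, rather than imputing a value, is just one possible choice; the rationale for the choices actually made in this project is documented in data_cleaning.md.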