mayankesh239/PII_Tracker

PII_Tracker App

This application periodically fetches data from a file in a public GitHub repository, using a GitHub access token. The file contains patterns for Personally Identifiable Information (PII) data. The application then synchronizes this data with a MongoDB collection.

  • It fetches data periodically from the GitHub file specified by the link.
  • It stores the fetched data in a MongoDB collection.
  • It handles additions, updates, and deletes of entries in the GitHub file, reflecting the changes in the MongoDB collection on the next run.
  • The code is written in Python.
  • It handles edge cases: it catches GitHub API errors, tracks the last synchronized commit, compares commits to determine whether there are new changes, and logs errors for debugging.
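
The commit check in the last bullet could look roughly like the sketch below. It is not taken from main.py: the function names (latest_commit_url, fetch_latest_sha, needs_sync) are hypothetical, and it assumes the GitHub REST "list commits" endpoint filtered by file path, with the token sent as a Bearer header.

```python
import json
import urllib.parse
import urllib.request


def latest_commit_url(owner, repo, path):
    """Build the GitHub API URL listing the newest commit that touched `path`."""
    query = urllib.parse.urlencode({"path": path, "per_page": 1})
    return f"https://api.github.com/repos/{owner}/{repo}/commits?{query}"


def fetch_latest_sha(owner, repo, path, token):
    """Return the SHA of the newest commit for `path`, or None if there is none."""
    request = urllib.request.Request(
        latest_commit_url(owner, repo, path),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    # urlopen raises urllib.error.HTTPError on API errors; callers should log it.
    with urllib.request.urlopen(request, timeout=30) as response:
        commits = json.load(response)
    return commits[0]["sha"] if commits else None


def needs_sync(last_synced_sha, latest_sha):
    """A sync is needed only when a commit exists and differs from the stored one."""
    return latest_sha is not None and latest_sha != last_synced_sha
```

Storing the last synchronized SHA (for example in a small MongoDB document) lets a run exit early when the file has not changed.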

Note: it only collects the useful entries (those for which sensitive is marked as true in the file).
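
As an illustration of this filter, here is a minimal sketch. It assumes the file parses into a list of dictionaries with a boolean "sensitive" field; the other field names and the sample data are hypothetical, and the real file may be shaped differently.

```python
def filter_sensitive(entries):
    """Return only the entries explicitly marked as sensitive."""
    return [entry for entry in entries if entry.get("sensitive") is True]


# Hypothetical sample data; the real file's field names may differ.
sample_entries = [
    {"name": "email", "pattern": r"[^@\s]+@[^@\s]+", "sensitive": True},
    {"name": "hostname", "pattern": r"[a-z0-9.-]+", "sensitive": False},
    {"name": "notes", "pattern": r".*"},  # no flag: treated as not sensitive
]

sensitive_entries = filter_sensitive(sample_entries)
```

Entries without the flag are dropped here, which is one possible reading of "marked as true".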

Milestones achieved

  1. Given a GitHub link to a file that contains patterns for PII data, a cron job (see Cron Job Configuration) periodically fetches data from this file and stores it in the MongoDB collection configured through mongodb_uri in main.py.

  2. If a new entry is added to the file, the change is reflected in MongoDB on the next run. The same applies to updates and deletes.
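
One way to work out additions, updates, and deletes is to compare the fetched entries with the stored documents by a unique key. The sketch below is an assumption about how this could be done, not the logic in main.py; plan_sync and the "name" key are hypothetical.

```python
def plan_sync(fetched, stored, key="name"):
    """Compare fetched entries with stored documents.

    Returns (to_insert, to_update, to_delete): entries missing from the
    store, entries whose content changed, and keys no longer in the file.
    """
    fetched_by_key = {entry[key]: entry for entry in fetched}
    # Ignore MongoDB's generated _id so it does not register as a change.
    stored_by_key = {
        doc[key]: {field: value for field, value in doc.items() if field != "_id"}
        for doc in stored
    }

    to_insert = [e for k, e in fetched_by_key.items() if k not in stored_by_key]
    to_update = [
        e
        for k, e in fetched_by_key.items()
        if k in stored_by_key and stored_by_key[k] != e
    ]
    to_delete = [k for k in stored_by_key if k not in fetched_by_key]
    return to_insert, to_update, to_delete
```

The resulting plan could then be applied to the collection, for instance with pymongo insert, update, and delete operations keyed on the same field.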

Requirements

  • Python 3.x
  • pip (Python package installer)
  • GitHub access token
  • MongoDB URI

Installation

  1. Clone the repository:
$ git clone https://github.com/mayankesh239/PII_Tracker.git
  2. Navigate to the project directory:
$ cd PII_Tracker
  3. Install the required Python packages:
$ pip install -r requirements.txt

Configuration

  1. Generate a GitHub access token:
  • Go to https://github.com/settings/tokens.
  • Click on "Generate new token".
  • Give the token a suitable description and select the necessary scopes (e.g., repo access).
  • Click on "Generate token" and copy the generated access token.
  2. Set the GitHub access token as an environment variable:
  • Open the terminal and execute the following command:
    $ export GITHUB_ACCESS_TOKEN="your-access-token"
    
    Replace "your-access-token" with the GitHub access token you generated.
  3. Set the MongoDB URI:
  • Open the main.py file in a text editor.
  • Replace the value of the mongodb_uri variable (at line 14) with your MongoDB connection URI. To create a cluster in MongoDB Atlas, you can refer to "Create Cluster Using MongoDB Atlas".
  4. Configure the application:
  • Open the main.py file.
  • Update the following variables in the code:
    • repository_url: Set it to the GitHub repository URL containing the PII data file.
    • file_path: Set it to the file path of the PII data file within the repository.
    • mongodb_uri: Set it to the connection URI for your MongoDB database.
    • database_name: Set it to the name of the MongoDB database.
    • collection_name: Set it to the name of the MongoDB collection.
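
For illustration, the token and repository settings above could be validated at startup as in the sketch below. The helper names (load_token, split_repository_url) are hypothetical and may not match what main.py does; only the GITHUB_ACCESS_TOKEN variable name comes from the steps above.

```python
import os
from urllib.parse import urlparse


def load_token(environ=None):
    """Read the GitHub token from the environment and fail early if it is missing."""
    environ = os.environ if environ is None else environ
    token = environ.get("GITHUB_ACCESS_TOKEN", "").strip()
    if not token:
        raise RuntimeError("GITHUB_ACCESS_TOKEN is not set")
    return token


def split_repository_url(repository_url):
    """Split a URL like https://github.com/owner/repo(.git) into (owner, repo)."""
    parts = urlparse(repository_url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Not a repository URL: {repository_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

Failing early with a clear message is easier to diagnose under cron than an authentication error from the API later in the run.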

Usage

demo_1.webm

To run the application and perform data synchronization, execute the following command in the project directory:

$ python3 main.py

The application will fetch data from the configured GitHub repository file, filter the sensitive information based on the "sensitive" attribute, and update the MongoDB collection with the filtered data. It will log the execution status and any errors encountered in the pii_sync.log file.
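
Logging to pii_sync.log could be set up along the lines of this sketch, which uses Python's standard logging module. The names build_logger and run_sync and the log format are assumptions; the actual configuration in main.py may differ.

```python
import logging


def build_logger(log_path="pii_sync.log"):
    """Create a logger that appends timestamped records to the log file."""
    logger = logging.getLogger("pii_sync")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_sync(sync, logger):
    """Run one synchronization and record success or failure in the log."""
    try:
        sync()
    except Exception:
        # logger.exception also writes the traceback, which helps debugging.
        logger.exception("Synchronization failed")
        return False
    logger.info("Synchronization completed")
    return True
```

Under cron the working directory is typically the user's home directory rather than the project directory, so passing an absolute log path may be preferable.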

Cron Job Configuration

demo_2.webm

To set up a cron job for periodic execution, you can use the crontab command on Linux systems:

  1. Open the terminal and execute the following command:
$ crontab -e

If prompted to select an editor, choose your preferred editor (e.g., nano, vim).

  2. Add the following line to the crontab file to schedule the job at 10:32 PM every day:
32 22 * * * /usr/bin/python3 /path/to/your/pii_tracker/main.py

Replace /path/to/your/pii_tracker with the actual path to the project directory. Save the crontab file and exit the editor.

  3. Install postfix, a mail transfer agent that cron can use to deliver job output by mail:
$ sudo apt install postfix

During the installation, you will be prompted to choose the general type of configuration. Select "Internet Site" and press Enter. Then, enter your fully qualified domain name (FQDN) when prompted. If you don't have a registered domain name, you can use the hostname of your server as the FQDN. To find out the hostname, run the following command in your terminal:

$ hostname

The cron job will now run at the specified time and execute the PII synchronization process. You can check the execution and any potential error messages in the log file specified in your script's logging configuration (pii_sync.log in this case).

You can check the scheduled cron jobs by running the following command in the terminal:

crontab -l

If the cron job is not working, refer to this troubleshooting document, which lists some ways to fix common issues.

Note: this works on Linux. On Windows, you can use Task Scheduler instead (refer to this).
