parksangji/malicious_URL_detection

Malicious URL detection

This project extracts features from malicious and normal URLs, represents them as feature vectors, and uses multiple machine learning algorithms to distinguish malicious URLs from normal ones.

URL-based feature extraction

This project uses multiple machine learning algorithms to detect malicious URLs. The goal is to develop an algorithm that can predict whether a URL is malicious when various URLs are given as input. By analyzing the lexical features of normal and malicious URLs with various machine learning algorithms, a final accuracy of about 96% was obtained.

Development environment

  • Development environment: Linux, Jupyter
  • Development language: Python 3

Applied techniques and how they are used

  • The dataset contains about 430,000 normal URLs and 150,000 malicious URLs, for a total of about 580,000 URLs.
  • 80% of the URL data is used as the training set to train the models, and the remaining 20% is used as the test set to evaluate their performance.
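
The 80/20 split described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function name `train_test_split_urls`, the fixed seed, and the toy URLs are all assumptions made for the example.

```python
import random

def train_test_split_urls(urls, labels, test_ratio=0.2, seed=42):
    """Shuffle (url, label) pairs and split them into train and test sets."""
    if len(urls) != len(labels):
        raise ValueError("urls and labels must have the same length")
    pairs = list(zip(urls, labels))
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(pairs)
    n_test = int(round(len(pairs) * test_ratio))
    test, train = pairs[:n_test], pairs[n_test:]
    return train, test

# Toy data standing in for the real dataset (hypothetical URLs).
urls = [f"http://example{i}.com/page" for i in range(10)]
labels = [0] * 7 + [1] * 3  # 0 = normal, 1 = malicious
train, test = train_test_split_urls(urls, labels)
```

Because normal URLs outnumber malicious ones by roughly three to one, a stratified split that preserves the class ratio in both sets may be worth considering; the sketch above does not do that.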

  • Malicious URL prediction models are built with 8 machine learning algorithms, based on 22 lexical features extracted from the URLs.
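
To make the idea of a lexical feature vector concrete, here is a small sketch. The README does not list the 22 features, so the eight features below and the function name `lexical_features` are illustrative assumptions, not the project's actual feature set.

```python
import re
from urllib.parse import urlparse

def lexical_features(url):
    """Return a small, illustrative lexical feature vector for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Hypothetical check: the host is written as a dotted IPv4 literal.
    host_is_ip = bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))
    return [
        len(url),                         # total URL length
        len(host),                        # hostname length
        len(parsed.path),                 # path length
        url.count("."),                   # number of dots
        url.count("-"),                   # number of hyphens
        sum(ch.isdigit() for ch in url),  # number of digits
        int(host_is_ip),                  # 1 if the host is an IPv4 literal
        int("@" in url),                  # 1 if the URL contains '@'
    ]

# Hypothetical URL used only for illustration.
vec = lexical_features("http://192.168.0.1/login-update.php?id=42")
```

Each URL becomes a fixed-length list of numbers, which is the form most classifiers expect as input.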

Conclusion

  • By analyzing the lexical features of normal and malicious URLs with various machine learning algorithms, a final accuracy of about 96% was obtained.
  • Compared with the results of individual models, combining multiple models lowers the probability of misclassifying a malicious URL as normal or a normal URL as malicious.
  • Combining a larger number of models did not by itself produce higher accuracy; instead, accuracy stayed at a similarly high level whenever a particular model was included in the combination.
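
One common way to combine several models is majority voting. The README does not say which combination method the project used, so the sketch below is an assumption for illustration only; the labels and the three models' predictions are made up.

```python
def majority_vote(predictions):
    """Combine per-model 0/1 predictions (one list per model) by majority vote."""
    n_models = len(predictions)
    combined = []
    for votes in zip(*predictions):
        # A URL is flagged malicious (1) when more than half of the models say so.
        combined.append(1 if sum(votes) * 2 > n_models else 0)
    return combined

def error_counts(y_true, y_pred):
    """Return (false_negatives, false_positives), with 1 = malicious."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn, fp

# Made-up labels and predictions from three hypothetical models.
y_true  = [1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 0, 1, 0]
model_b = [1, 1, 1, 0, 1, 0]
model_c = [0, 1, 0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

In this toy data each model makes at least one mistake, but the mistakes fall on different URLs, so the vote corrects them. Real models often make correlated errors, so the gain in practice can be smaller.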

About

2021 Malicious URL Analysis Graduation Project🎓
