
Python library for analyzing Persian texts. It currently supports analyzing customer opinions and their recommendation status, and detecting seven emotions in Persian sentences.


RezaGooner/PerSent


PerSent (Persian Sentiment Analyzer)


Introduction

PerSent is a practical Python library designed for Persian sentiment analysis. The name stands for "Persian Sentiment Analyzer". Currently in its early testing phase, the library provides basic functionality and is available on PyPI. Install it using:

pip install PerSent

Current capabilities include:

  • Sentiment analysis of opinions/comments

  • Emotion analysis of texts (happiness, sadness, anger, surprise, fear, disgust, calmness)

  • Analysis of product/service reviews (recommended/not recommended/no idea)

  • Both single-text and batch CSV processing

  • Output displayed in terminal or saved to CSV with summary statistics

Initial repository that evolved into this library: Click here

We welcome user testing and feedback to improve the library. If you encounter bugs or have suggestions, please:

For installation issues due to dependency conflicts (especially with mingw-w64), consider using online platforms like DeepNote.com.

Structure

Comment Analysis Functions

train(train_csv, test_size=0.2, vector_size=100, window=5)

| Parameter | Data Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| train_csv | str | - | Path to CSV file containing training data with body and recommendation_status columns | Required |
| test_size | float | 0.2 | Proportion of test data (between 0.0 and 1.0) | Optional |
| vector_size | int | 100 | Output vector dimension for the Word2Vec model (embedding size) | Optional |
| window | int | 5 | Context window size for the Word2Vec model | Optional |
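To make window concrete: Word2Vec treats the words within at most window positions of a target word as its context. The sketch below uses a hypothetical helper, context_pairs (not part of PerSent), to list the (target, context) pairs a given window produces; whitespace tokenization is a simplifying assumption, and real Word2Vec implementations may additionally shrink the window randomly during training.

```python
def context_pairs(tokens, window=5):
    """Return (target, context) pairs within `window` positions of each target.

    Hypothetical helper illustrating the Word2Vec `window` parameter;
    not part of the PerSent API.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Simplifying assumption: whitespace tokenization of a short Persian comment
tokens = "کیفیت عالی داشت".split()  # "Excellent quality"
for target, context in context_pairs(tokens, window=1):
    print(target, "->", context)
```

A larger window gives each word more (and more distant) context words, which tends to capture broader topical similarity rather than close syntactic similarity.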

recommendation_status must be one of:

  • no_idea

  • recommended

  • not_recommended

Null/NaN values are converted to no_idea, which can reduce model accuracy.

  • Returns the model's accuracy score on the test split.
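Because missing labels silently become no_idea, it can help to clean the labels before training. A minimal sketch using the stdlib csv module, assuming the body and recommendation_status column names from the table above; normalize_status is a hypothetical helper, not part of PerSent. It mirrors the Null/NaN rule and rejects any other unexpected label.

```python
import csv
import io

VALID_STATUSES = {"no_idea", "recommended", "not_recommended"}


def normalize_status(value):
    """Map a raw label to a valid recommendation_status (hypothetical helper).

    Empty, missing, or "NaN" values become "no_idea", mirroring the documented
    Null/NaN handling; any other label outside the three values is an error.
    """
    if value is None or value.strip() == "" or value.strip().lower() == "nan":
        return "no_idea"
    status = value.strip()
    if status not in VALID_STATUSES:
        raise ValueError(f"invalid recommendation_status: {value!r}")
    return status


raw_rows = [
    ("کیفیت عالی داشت", "recommended"),      # "Excellent quality"
    ("اصلا راضی نبودم", "not_recommended"),  # "I wasn't satisfied at all"
    ("معمولی بود", ""),                      # "It was average" (missing label)
]

# In-memory buffer for the demo; for a real file use
# open("train.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["body", "recommendation_status"])
for body, status in raw_rows:
    writer.writerow([body, normalize_status(status)])
print(buffer.getvalue())
```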

analyzeText(text)

| Parameter | Data Type | Description | Optional/Required |
|---|---|---|---|
| text | str | The Persian text to be analyzed | Required |

The core function that analyzes a text and returns one of: "not_recommended", "recommended", or "no_idea".


saveModel()

loadModel()

Model persistence functions. Models are saved in the model directory.


analyzeCSV(input_csv, output_path, summary_path=None, text_column=0)

| Parameter | Data Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| input_csv | str | - | Path to input CSV file containing comments to analyze | Required |
| output_path | str | - | Path where the analyzed results CSV will be saved | Required |
| summary_path | str or None | None | Optional path to save a summary statistics CSV | Optional |
| text_column | int or str | 0 | Column index (int) or name (str) containing the text to analyze | Optional |

Batch-processes comments from a CSV file. For single-column files, text_column is not needed; otherwise, specify the column by name or by index (0-based; negative indices count from the end). The output contains:

1- Original text

2- Recommendation status

The optional summary_path writes a summary CSV with the following statistics:

  • Total count

  • Recommended count

  • Not recommended count

  • No idea count

  • Model accuracy (not implemented in current version)
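The summary counts listed above amount to tallying the predicted labels. A minimal sketch with collections.Counter, assuming the predictions are available as a list of label strings; summary_rows and the row names it emits are hypothetical and may not match PerSent's actual summary file.

```python
from collections import Counter

LABELS = ("recommended", "not_recommended", "no_idea")


def summary_rows(predictions):
    """Build (statistic, value) rows for a summary CSV (hypothetical format)."""
    counts = Counter(predictions)
    rows = [("total", len(predictions))]
    rows += [(label, counts.get(label, 0)) for label in LABELS]
    return rows


preds = ["recommended", "no_idea", "recommended", "not_recommended", "recommended"]
for name, value in summary_rows(preds):
    print(f"{name}: {value}")
```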

Returns a DataFrame and saves results.
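The text_column rules (an int index, 0-based with negative values counting from the end, or a column name) can be sketched as a small resolver over the CSV header. resolve_column is a hypothetical helper written to illustrate the documented behaviour; it is not PerSent's internal code.

```python
def resolve_column(header, text_column=0):
    """Return the 0-based position of `text_column` in `header`.

    Accepts an int (negative values count from the end, as in Python)
    or a str column name. Hypothetical helper, not part of PerSent.
    """
    if isinstance(text_column, bool):  # bool is a subclass of int; reject it
        raise TypeError("text_column must be an int or a str")
    if isinstance(text_column, int):
        n = len(header)
        if not -n <= text_column < n:
            raise IndexError(f"column index {text_column} out of range for {n} columns")
        return text_column % n  # normalize negative indices
    if isinstance(text_column, str):
        try:
            return header.index(text_column)
        except ValueError:
            raise KeyError(f"no column named {text_column!r}") from None
    raise TypeError("text_column must be an int or a str")


header = ["id", "date", "نظرات"]  # "نظرات" = "Comments"
print(resolve_column(header, 0))
print(resolve_column(header, -1))
print(resolve_column(header, "نظرات"))
```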


Emotion Analysis Functions

loadLex(csv_file, word_col=0, emotion_col=1, weight_col=2)

| Parameter | Data Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| csv_file | str | - | Path to CSV lexicon file | Required |
| word_col | int or str | 0 | Column index (int) or name (str) containing words | Optional |
| emotion_col | int or str | 1 | Column index (int) or name (str) containing emotion labels | Optional |
| weight_col | int or str | 2 | Column index (int) or name (str) containing weight values | Optional |

Loads a CSV with three columns:

1- Keywords

2- Emotion (happiness, sadness, anger, surprise, fear, disgust, calmness)

3- Emotion weight (defaults to 1 if unspecified, which can reduce accuracy)

Column indices are optional.
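A minimal sketch of reading such a lexicon with the stdlib csv module, assuming the file has a header row and that an empty weight cell falls back to 1, as described above. load_lexicon and its {word: [(emotion, weight), ...]} return shape are assumptions for illustration; PerSent's loadLex may store the lexicon differently. The Persian emotion labels follow the sample output further below.

```python
import csv
import io
from collections import defaultdict


def load_lexicon(f, word_col=0, emotion_col=1, weight_col=2):
    """Read a word/emotion/weight CSV into {word: [(emotion, weight), ...]}.

    Columns may be given by index or by header name. Empty or missing
    weights default to 1.0. Hypothetical helper, not PerSent's loadLex.
    """
    reader = csv.reader(f)
    header = next(reader)

    def pos(col):
        # A str is looked up in the header; an int is used as-is
        return header.index(col) if isinstance(col, str) else col

    w_i, e_i, wt_i = pos(word_col), pos(emotion_col), pos(weight_col)
    lexicon = defaultdict(list)
    for row in reader:
        if not row:
            continue  # skip blank lines
        word, emotion = row[w_i].strip(), row[e_i].strip()
        raw = row[wt_i].strip() if wt_i < len(row) else ""
        weight = float(raw) if raw else 1.0
        lexicon[word].append((emotion, weight))
    return dict(lexicon)


sample = io.StringIO(
    "word,emotion,weight\n"
    "شکست,غم,2\n"        # "failure" -> sadness, weight 2
    "خراب,عصبانیت,\n"    # "ruined" -> anger, empty weight defaults to 1
)
lex = load_lexicon(sample, "word", "emotion", "weight")
print(lex)
```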


train(train_csv, text_col='text', emotion_col='sentiment', weight_col='weight')

| Parameter | Data Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| train_csv | str | - | Path to training CSV file | Required |
| text_col | str or int | 'text' | Column name/index containing text data | Optional |
| emotion_col | str or int | 'emotion' | Column name/index containing emotion labels | Optional |
| weight_col | str or int | 'weight' | Column name/index containing weight values | Optional |

Trains the emotion model from a CSV file. Specifying the column names (or indices) is optional.


saveModel(model_name='weighted_sentiment_model')

| Parameter | Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| model_name | str | 'weighted_sentiment_model' | Base filename for saving the model (without extension) | Optional |

loadModel(model_name='weighted_sentiment_model')

| Parameter | Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| model_name | str | 'weighted_sentiment_model' | Base filename of the model to load (without extension) | Optional |

Model persistence functions (saved in model directory).
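The model_name convention (a base name without extension, stored under the model directory) can be sketched with pathlib and pickle. The directory name model, the .pkl extension, and the save_model/load_model/model_path helpers are assumptions for illustration; the library's actual file layout and serialization format may differ.

```python
import pickle
import tempfile
from pathlib import Path


def model_path(model_name="weighted_sentiment_model", model_dir="model"):
    """Build the file path for a model base name (assumed .pkl extension)."""
    return Path(model_dir) / f"{model_name}.pkl"


def save_model(obj, model_name="weighted_sentiment_model", model_dir="model"):
    path = model_path(model_name, model_dir)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the model directory if needed
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path


def load_model(model_name="weighted_sentiment_model", model_dir="model"):
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with model_path(model_name, model_dir).open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    model = {"lexicon": {"شکست": [("غم", 2.0)]}}  # stand-in for a trained model
    saved = save_model(model, "custom_model_name", model_dir=tmp)
    print(saved.name)
    assert load_model("custom_model_name", model_dir=tmp) == model
```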


analyzeText(text)

| Parameter | Type | Description | Optional/Required |
|---|---|---|---|
| text | str | Persian text to analyze | Required |

Analyzes a single text, returning percentage scores for each emotion.
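One common way to turn a weighted lexicon into per-emotion percentages is to sum the weights of matched words for each emotion and divide by the overall total. The sketch below assumes that approach, whitespace tokenization, and a hypothetical toy lexicon; it illustrates the idea and is not necessarily how PerSent's trained model computes its scores.

```python
EMOTIONS = ("شادی", "غم", "عصبانیت", "ترس", "تنفر", "شگفتی", "آرامش")
# happiness, sadness, anger, fear, disgust, surprise, calmness


def emotion_percentages(text, lexicon):
    """Score `text` against a {word: [(emotion, weight), ...]} lexicon.

    Returns {emotion: percentage}; all zeros if no word matches.
    """
    totals = dict.fromkeys(EMOTIONS, 0.0)
    for token in text.split():
        for emotion, weight in lexicon.get(token, []):
            totals[emotion] = totals.get(emotion, 0.0) + weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {e: 100.0 * w / grand_total for e, w in totals.items()}


# Hypothetical toy lexicon
lexicon = {
    "شکست": [("غم", 3.0)],        # "failure" -> sadness
    "خراب": [("عصبانیت", 1.0)],   # "ruined" -> anger
}
text = "امتحانم رو خراب کردم. احساس می‌کنم یک شکست خورده‌ی تمام عیارم."
scores = emotion_percentages(text, lexicon)
for emotion, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(f"{emotion}: {score:.2f}%")
```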


analyzeCSV(input_csv, output_csv, text_col='text', output_col='sentiment_analysis')

| Parameter | Type | Default Value | Description | Optional/Required |
|---|---|---|---|---|
| input_csv | str | - | Path to input CSV file containing text to analyze | Required |
| output_csv | str | - | Path to save analyzed results | Required |
| text_col | str/int | 'text' | Column name/index containing text to analyze | Optional |
| output_col | str | 'sentiment_analysis' | Column name for output results | Optional |

Batch-processes texts from a CSV file and returns True on success. Only input_csv and output_csv are required; the column names are optional.
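The batch flow described above (read text_col, score each row, write the result to output_col, return True) can be sketched with the stdlib csv module. analyze_csv and the score_text callback are hypothetical stand-ins; because the README does not say how the scores are serialized into output_col, the sketch stores only the top-scoring emotion as an assumption, and it accepts text_col as a header name only.

```python
import csv
import os
import tempfile


def analyze_csv(input_csv, output_csv, score_text, text_col="text",
                output_col="sentiment_analysis"):
    """Score each row's `text_col` and write results to `output_col`.

    `score_text` maps a string to {emotion: percentage}. Only a str
    `text_col` (a header name) is handled here. Returns True on success.
    """
    with open(input_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])
    if text_col not in fieldnames:
        raise KeyError(f"no column named {text_col!r}")
    if output_col not in fieldnames:
        fieldnames.append(output_col)
    for row in rows:
        scores = score_text(row[text_col])
        # Assumption: keep only the highest-scoring emotion label
        row[output_col] = max(scores, key=scores.get) if scores else ""
    with open(output_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return True


def toy_scorer(text):
    """Stand-in scorer: "happiness" if the text contains "خوشحال" ("happy")."""
    return {"شادی": 100.0} if "خوشحال" in text else {}


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.csv")
    dst_path = os.path.join(tmp, "output.csv")
    with open(src_path, "w", newline="", encoding="utf-8") as f:
        f.write("text\nخیلی خوشحالم\n")  # "I'm very happy"
    ok = analyze_csv(src_path, dst_path, toy_scorer)
    with open(dst_path, encoding="utf-8") as f:
        output_text = f.read()
print(output_text)
```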


Installation

Install via pip:

pip install PerSent

For specific versions:

pip install PerSent==<VERSION_NUMBER>

Usage

  • Comment Analysis

Basic single-text analysis:

from PerSent import CommentAnalyzer

analyzer = CommentAnalyzer()

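# Option 1: train on your own data (uncomment to use).
# Requires a CSV with "body" and "recommendation_status" columns;
# status must be one of: recommended / not_recommended / no_idea.
# accuracy = analyzer.train("train.csv")  # returns test accuracy
# analyzer.saveModel()

# Option 2: load the saved (pre-trained) model
analyzer.loadModel()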

# Predict
text = "کیفیت عالی داشت" # "Excellent quality"
result = analyzer.analyzeText(text)
print(f"Sentiment: {result}")  # Output: Sentiment: recommended

The included pre-trained model has about 70% accuracy. For better results, train on a larger dataset. I've prepared one, split into several files because of its size:

Download Here
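The accuracy figure above is the share of held-out test rows whose predicted label matches the true one, with test_size controlling how many rows are held out. The sketch below illustrates both ideas with a seeded shuffle and placeholder predictions; train_test_split and accuracy here are hypothetical helpers, and PerSent's actual splitting procedure may differ.

```python
import random


def train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle rows and hold out a `test_size` fraction as the test set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rows = list(range(10))
train_rows, test_rows = train_test_split(rows, test_size=0.2)
print(len(train_rows), len(test_rows))  # 8 2

y_true = ["recommended", "not_recommended", "no_idea", "recommended"]
y_pred = ["recommended", "recommended", "no_idea", "recommended"]
print(accuracy(y_true, y_pred))  # 0.75
```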


Batch CSV processing:

from PerSent import CommentAnalyzer
analyzer = CommentAnalyzer()
analyzer.loadModel()

# Basic usage (single-column CSV)
analyzer.analyzeCSV(
    input_csv="comments.csv",
    output_path="results.csv"
)

# Alternative usage patterns:
# 1. Using column index (0-based)
analyzer.analyzeCSV("comments.csv", "results.csv", None, 0)

# 2. Negative indices (count from end)
analyzer.analyzeCSV("comments.csv", "results.csv", None, -1)

# 3. Column name
analyzer.analyzeCSV("comments.csv", "results.csv", None, "نظرات") # "Comments" column

# 4. With summary (single-column)
analyzer.analyzeCSV("comments.csv", "results.csv", "summary.csv")

# 5. With summary and column specification
analyzer.analyzeCSV("comments.csv", "results.csv", "summary.csv", 2)

  • Emotion Analysis

Single text analysis with pre-trained model:

from PerSent import SentimentAnalyzer

analyzer = SentimentAnalyzer()
analyzer.loadModel()

sample_text = "امتحانم رو خراب کردم. احساس می‌کنم یک شکست خورده‌ی تمام عیارم."
# "I failed my exam. I feel like a complete failure."

result = analyzer.analyzeText(sample_text)
for emotion, score in sorted(result.items(), key=lambda x: x[1], reverse=True):
    print(f"{emotion}: {score:.2f}%")

Output:

غم: 36.00%        # sadness
عصبانیت: 36.00%    # anger
ترس: 28.00%       # fear
شادی: 0.00%       # happiness
تنفر: 0.00%       # disgust
شگفتی: 0.00%      # surprise
آرامش: 0.00%      # calmness

To train your own model:

analyzer.train('emotion_dataset.csv')

Required CSV columns:

1- Keywords

2- Emotion (happiness, sadness, anger, surprise, disgust, fear, calmness)

3- Emotion weight

Model persistence:

analyzer.saveModel("custom_model_name")
analyzer.loadModel("custom_model_name")

Batch CSV processing:

analyzer.analyzeCSV("input.csv", "output.csv")

Contribution

As mentioned, this library needs community collaboration. Please share suggestions, bugs, or feedback via: