Statement Sentiment Logistic Regression

This code performs several natural language processing (NLP) tasks. It first imports the libraries the rest of the script depends on.

The nltk (Natural Language Toolkit) library provides tools for processing and analyzing human language. The code calls two of its functions: word_tokenize, which splits a sentence into individual words, or “tokens,” and pos_tag, which labels each token with its part of speech (e.g., noun, verb, adjective).
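For example, running the two functions on a short sentence produces output along these lines (the exact part-of-speech tags can vary between nltk versions and tagger models, so they are only indicative):

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("Inflation remains far too high.")
print(tokens)
# ['Inflation', 'remains', 'far', 'too', 'high', '.']
print(nltk.pos_tag(tokens))
# e.g. [('Inflation', 'NN'), ('remains', 'VBZ'), ('far', 'RB'),
#       ('too', 'RB'), ('high', 'JJ'), ('.', '.')]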

The textblob library is used to determine the sentiment (positive or negative emotion) of a sentence. The code constructs a TextBlob object from a sentence; the object's sentiment attribute reports the polarity of the text, ranging from -1.0 (most negative) to 1.0 (most positive), and its subjectivity, ranging from 0.0 (objective) to 1.0 (subjective).
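As a quick illustration (the exact values depend on TextBlob's built-in lexicon, so the comments only indicate the expected sign and range):

import textblob

blob = textblob.TextBlob("The labor market is very strong.")
print(blob.sentiment.polarity)      # a float in [-1.0, 1.0]; expected to be > 0 here
print(blob.sentiment.subjectivity)  # a float in [0.0, 1.0]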

The code then defines a function called analyze_sentence that takes a sentence as an argument and performs three tasks: tokenizing the sentence, tagging each token's part of speech, and determining the sentiment of the sentence. It then prints the tokens, part-of-speech tags, and sentiment.

Next, the code defines a sample statement (an excerpt from a Federal Reserve speech) and uses the sent_tokenize function from nltk to split it into individual sentences. It then creates a list of labels, assigning 1 to each sentence whose TextBlob polarity is positive and 0 otherwise (neutral and negative sentences both receive 0).
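The splitting and labeling steps look like this in isolation (a minimal sketch on invented text, using the same polarity-greater-than-zero rule as the script):

import nltk
import textblob

text = "The outlook is excellent. The downturn was terrible."
sentences = nltk.sent_tokenize(text)
print(sentences)
# ['The outlook is excellent.', 'The downturn was terrible.']
labels = [1 if textblob.TextBlob(s).sentiment.polarity > 0 else 0 for s in sentences]
print(labels)  # expected [1, 0] with TextBlob's default lexicon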

The code then uses the CountVectorizer class from sklearn.feature_extraction.text to build a document-term matrix: each row corresponds to a sentence, each column to a vocabulary word, and each entry to a word count. This bag-of-words encoding is a common way to turn text into the numerical features that machine learning models require.
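To make the representation concrete, here is the count matrix for two tiny documents (get_feature_names_out requires scikit-learn 1.0 or later):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["inflation remains high", "inflation is falling"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
# ['falling' 'high' 'inflation' 'is' 'remains']
print(X.toarray())
# [[0 1 1 0 1]
#  [1 0 1 1 0]]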

The code then creates a LogisticRegression model from sklearn.linear_model and trains it on the count matrix and labels with the fit method. It then calls predict on the same sentences it was trained on, so the predictions largely just echo the TextBlob-derived training labels.
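A self-contained sketch of the fit/predict cycle on hypothetical labels follows. Note that fit raises a ValueError when the labels contain only one class, which can happen in the script above if TextBlob assigns every sentence the same sign:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

sentences = ["growth is strong", "unemployment is painfully high"]
labels = [1, 0]  # hypothetical labels standing in for the TextBlob-derived ones
X = CountVectorizer().fit_transform(sentences)
model = LogisticRegression().fit(X, labels)
print(model.predict(X))  # typically [1 0]: the model is re-predicting its own training data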

Finally, the code loops over the sentences, calling analyze_sentence on each one and printing the model's predicted label alongside it.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Dec 22 05:38:20 2022

@author: ramnot
"""
import nltk
import textblob
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# NOTE: newer nltk releases (3.9+) may also require the 'punkt_tab' and
# 'averaged_perceptron_tagger_eng' resources

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Function to analyze the content and tone of a sentence
def analyze_sentence(sentence):
  # Tokenize the sentence
  tokens = nltk.word_tokenize(sentence)
  # Tag the parts of speech of each token
  pos_tags = nltk.pos_tag(tokens)
  # Use TextBlob to determine the sentiment of the sentence
  sentiment = textblob.TextBlob(sentence).sentiment
  # Print the tokens, part of speech tags, and sentiment of the sentence
  print("Tokens:", tokens)
  print("Part of Speech Tags:", pos_tags)
  print("Sentiment:", sentiment)

# Example Federal Reserve statement
statement = "Today I will offer a progress report on the Federal Open Market Committee's (FOMC) efforts to restore price stability to the U.S. economy for the benefit of the American people. The report must begin by acknowledging the reality that inflation remains far too high."

# Split the statement into sentences
sentences = nltk.sent_tokenize(statement)

# Create a list of labels for each sentence (1 for positive polarity, 0 otherwise)
labels = []
for sentence in sentences:
  if textblob.TextBlob(sentence).sentiment.polarity > 0:
    labels.append(1)
  else:
    labels.append(0)

# Use CountVectorizer to create a matrix of word counts for each sentence
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)

# Fit a logistic regression model on the data
# (fit() raises a ValueError if every sentence received the same label,
# since logistic regression needs at least two classes to train)
model = LogisticRegression()
model.fit(X, labels)

# Make predictions on the sentences using the trained model
predictions = model.predict(X)

# Analyze each sentence and print the predicted label
for i in range(len(sentences)):
  print("Sentence:", sentences[i])
  analyze_sentence(sentences[i])
  print("Predicted Label:", predictions[i])
  print()