Doing some basic sentiment analysis on your own data isn’t a difficult process anymore: we have some fantastic libraries that make it easy. I recently upgraded my phone and copied over all my SMS messages, and I wanted to dig into that data and see what I could find out. Let’s start by applying basic sentiment analysis to it!
Start by downloading the full Stanford CoreNLP zip, extract it, and then extract the models jar it contains.
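The jar step can also be scripted. A .jar file is an ordinary zip archive, so .NET’s `ZipFile.ExtractToDirectory` can unpack it. This is a minimal sketch, assuming you run it in `dotnet fsi` (on .NET Framework you may need to `#r` System.IO.Compression.FileSystem first); the `extractJar` helper and any paths you pass it are hypothetical.

```fsharp
open System.IO
open System.IO.Compression

/// Hypothetical helper: unpack a .jar (a plain zip archive) into targetDir.
/// Does nothing if targetDir already exists, so re-running a script is cheap.
let extractJar (jarPath: string) (targetDir: string) =
    if not (Directory.Exists targetDir) then
        ZipFile.ExtractToDirectory(jarPath, targetDir)
```

Point `jarPath` at the models jar from the CoreNLP zip and `targetDir` at the folder you will later use as `jarDirectory`.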
We’re going to include a few dependencies in our F# project to make reading files and interacting with CoreNLP much easier. Add the following NuGet packages to your F# project:
- FsLab - a fantastic set of libraries that includes everything needed to get started with data science in F#
- Stanford.NLP.CoreNLP
- Stanford.NLP.Parser
- Stanford.NLP.Parser.Fsharp
The code is split into two scripts: Sentiment.fsx does the sentiment processing, and RunSentiment.fsx reads the SMS file and writes the results.
Sentiment.fsx
// Reference the IKVM and CoreNLP assemblies from the NuGet packages with #r here;
// the exact paths depend on your packages folder.
open System.IO
open java.util
open edu.stanford.nlp.pipeline
open edu.stanford.nlp.ling
open edu.stanford.nlp.sentiment
open edu.stanford.nlp.neural.rnn
open edu.stanford.nlp.trees

// Path to the extracted models
let jarDirectory = @"D:\data\stanford-corenlp-full-2015-12-09\models"

let props = Properties()
props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment") |> ignore
props.setProperty("sutime.binders", "0") |> ignore

// The default model paths are relative, so run CoreNLP from the models directory
Directory.SetCurrentDirectory(jarDirectory)
let pipeline = StanfordCoreNLP(props)

let evaluateSentiment (text:string) =
    let annotation = Annotation(text)
    pipeline.annotate(annotation)
    let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> java.util.ArrayList
    // For each sentence, read the predicted sentiment class (0-4)
    // and the probability of each of the five classes
    let sentiments =
        [ for s in sentences ->
            let sentence = s :?> Annotation
            let sentenceTree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree().getClass()) :?> Tree
            let sentiment = RNNCoreAnnotations.getPredictedClass(sentenceTree)
            let preds = RNNCoreAnnotations.getPredictions(sentenceTree)
            let probs = [ for i in 0..4 -> preds.get(i) ]
            sentiment, probs ]
    // -1 if the text had no sentences, otherwise the first sentence's class
    if sentiments.Length = 0 then
        -1
    else
        fst sentiments.Head

// Helper function matching sentiment value to English value
let getSentimentMeaning value =
    match value with
    | 0 -> "Negative"
    | 1 -> "Somewhat negative"
    | 2 -> "Neutral"
    | 3 -> "Somewhat positive"
    | 4 -> "Positive"
    | _ -> "Unknown"
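`evaluateSentiment` computes a prediction and five class probabilities for every sentence, but returns only the first sentence’s class, so a multi-sentence SMS is judged by its opening sentence. If you would rather score the whole message, one option is to average the per-sentence probability lists and take the most likely class. The sketch below is a hypothetical `aggregateSentiment` helper (not part of CoreNLP), assuming input shaped like the `sentiments` list inside `evaluateSentiment`: a class plus five probabilities per sentence.

```fsharp
/// Hypothetical helper: combine per-sentence (class, probabilities) pairs into one
/// class by averaging the five class probabilities and picking the most likely class.
let aggregateSentiment (sentences: (int * float list) list) =
    match sentences with
    | [] -> -1 // same "no sentences" marker that evaluateSentiment uses
    | _ ->
        let averaged =
            sentences
            |> List.map snd        // keep only the probability lists
            |> List.transpose      // one list per class, across all sentences
            |> List.map List.average
        averaged
        |> List.indexed            // pair each class index with its mean probability
        |> List.maxBy snd
        |> fst
```

Defined before `evaluateSentiment`, it could be returned as `aggregateSentiment sentiments` in place of `fst sentiments.Head`. This weights every sentence equally regardless of length; weighting by sentence length would be another reasonable choice.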
RunSentiment.fsx
#load "Sentiment.fsx"
#load "..\packages\FsLab.0.3.17\FsLab.fsx"

open Deedle
open Sentiment

let textData = @"D:\data\sms-20151102214743.csv"
let textFrame = Frame.ReadCsv(textData).GroupRowsBy<string>("date").DropSparseRows()

// "body" is the column that holds the SMS text: evaluate the sentiment of each SMS
let sentiments =
    textFrame.GetColumn<string>("body")
    |> Series.mapValues (fun v ->
        try
            printfn "%s" v // print each SMS as it is processed
            Sentiment.evaluateSentiment(v)
        with
        // Any failure, including exceptions thrown inside CoreNLP, is marked as -1
        | _ -> -1)

// Add the sentiment values as a new column and save the frame
textFrame?sentiment <- sentiments
textFrame.SaveCsv(@"D:\data\deedle-sms.csv", separator=',')
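The exception pattern used in the `with` handler of RunSentiment.fsx matters more than it looks. F#’s `Failure` active pattern only matches exceptions whose runtime type is exactly `System.Exception`, which is what `failwith` raises; any subclass, such as an exception thrown from inside CoreNLP, passes straight through it and stops the script. This self-contained comparison uses illustrative helper names to show the difference:

```fsharp
open System

/// Handler in the style of `| Failure msg -> -1`: only catches a plain System.Exception.
let scoreWithFailurePattern (score: unit -> int) =
    try score ()
    with
    | Failure _ -> -1

/// Catch-all handler: every exception is mapped to -1.
let scoreWithCatchAll (score: unit -> int) =
    try score ()
    with
    | _ -> -1

// failwith raises System.Exception, so the Failure pattern catches it and returns -1
printfn "%d" (scoreWithFailurePattern (fun () -> failwith "boom"))
// An InvalidOperationException is not matched by Failure and would propagate:
// scoreWithFailurePattern (fun () -> raise (InvalidOperationException "boom"))
// The catch-all handler turns it into -1
printfn "%d" (scoreWithCatchAll (fun () -> raise (InvalidOperationException "boom")))
```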
Running RunSentiment.fsx writes a new CSV file with an extra sentiment column holding the predicted class (0 to 4) for each SMS, or -1 where no sentiment could be computed.
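Once the sentiment column exists, finding the most positive and most negative correspondents is a group-and-average. The code above only reads the `date` and `body` columns, so the `Contact` field below is an assumption, as are the `ScoredSms` record and the `averageByContact` function. The sketch skips -1 rows and treats the 0 to 4 classes as a rough numeric scale.

```fsharp
/// Hypothetical shape of one scored message; a contact column is assumed here.
type ScoredSms = { Contact: string; Sentiment: int }

/// Average predicted class per contact, ignoring -1 ("no sentiment") rows,
/// sorted from most positive to most negative.
let averageByContact (messages: ScoredSms list) =
    messages
    |> List.filter (fun m -> m.Sentiment >= 0)
    |> List.groupBy (fun m -> m.Contact)
    |> List.map (fun (contact, ms) ->
        contact, ms |> List.averageBy (fun m -> float m.Sentiment))
    |> List.sortByDescending snd
```

Averaging ordinal classes is a simplification; counting messages per label with `getSentimentMeaning` is an alternative that avoids treating the classes as evenly spaced.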
Have fun looking through the data, finding positive and negative friends and trends! Though I wouldn’t recommend letting people know that they’re negative…