Basic sentiment analysis on your own data is no longer difficult: there are some fantastic libraries that make the process immensely easy. I recently upgraded my phone and copied over all my SMS messages, and I wanted to dig into that data and see what I could find out. Let’s start by applying basic sentiment analysis to it!
Get started by downloading the full Stanford CoreNLP zip. Extract it, then extract the models jar contained within it.
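A jar is an ordinary zip archive, so the second extraction can also be scripted. The following is a minimal sketch rather than part of the original workflow: `extractJar` is a hypothetical helper, and both paths are whatever your download produced.

```fsharp
open System.IO
open System.IO.Compression

/// Extracts a jar (an ordinary zip archive) into targetDir unless that
/// directory already exists, and returns the directory path.
/// Hypothetical helper for illustration; not part of CoreNLP.
let extractJar (jarPath: string) (targetDir: string) =
    if not (Directory.Exists targetDir) then
        ZipFile.ExtractToDirectory(jarPath, targetDir)
    targetDir
```

On .NET Framework F# Interactive you will probably need `#r "System.IO.Compression.FileSystem"` before the `open` lines, since that assembly hosts `ZipFile` there.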
We’re going to include a few dependencies in our F# project to make reading files and interacting with CoreNLP much easier. In your F# project include the following packages:
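- FsLab - a fantastic set of libraries that includes everything needed to get started with data science in F#
- Stanford.NLP.CoreNLP - the .NET build of Stanford CoreNLP, which the sentiment code below relies on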
We split the process into two scripts: Sentiment.fsx handles the sentiment processing and RunSentiment.fsx handles the file reading.
    // Sentiment.fsx
    // Assumes the Stanford.NLP.CoreNLP assemblies are already referenced
    // with #r directives matching your installed package version.
    open System.IO
    open java.util
    open edu.stanford.nlp.ling
    open edu.stanford.nlp.neural.rnn
    open edu.stanford.nlp.pipeline
    open edu.stanford.nlp.sentiment
    open edu.stanford.nlp.trees

    // Path to the extracted models
    let jarDirectory = @"D:\data\stanford-corenlp-full-2015-12-09\models"

    let props = Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment") |> ignore
    props.setProperty("sutime.binders", "0") |> ignore

    // Switch to the models directory so CoreNLP can find its model files
    Directory.SetCurrentDirectory(jarDirectory)
    let pipeline = StanfordCoreNLP(props)

    let evaluateSentiment (text:string) =
        let annotation = Annotation(text)
        pipeline.annotate(annotation)
        let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> java.util.ArrayList
        // For each sentence, read the predicted sentiment class and the class probabilities
        let sentiments =
            [ for s in sentences ->
                let sentence = s :?> Annotation
                let sentenceTree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree().getClass()) :?> Tree
                let sentiment = RNNCoreAnnotations.getPredictedClass(sentenceTree)
                let preds = RNNCoreAnnotations.getPredictions(sentenceTree)
                let probs = [ for i in 0..4 -> preds.get(i) ]
                sentiment, probs ]
        // Return the class of the first sentence, or -1 if no sentence was found
        if sentiments.Length = 0 then -1 else fst sentiments.Head

    // Helper function matching a sentiment value to its English meaning
    let getSentimentMeaning value =
        match value with
        | 0 -> "Negative"
        | 1 -> "Somewhat negative"
        | 2 -> "Neutral"
        | 3 -> "Somewhat positive"
        | 4 -> "Positive"
        | _ -> "Unknown"
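Note that evaluateSentiment reports only the first sentence's class, so a multi-sentence message is judged by its opening sentence alone. One possible alternative, sketched here under that observation, is to combine all per-sentence classes into a rounded mean. `averageSentiment` is a hypothetical helper, not something CoreNLP or the script above provides.

```fsharp
/// Rounded mean of per-sentence sentiment classes (0..4).
/// Returns -1 when there are no sentences, mirroring evaluateSentiment.
/// Hypothetical helper for illustration only.
let averageSentiment (classes: int list) =
    match classes with
    | [] -> -1
    | _ -> classes |> List.averageBy float |> round |> int
```

To try it, you could pass `sentiments |> List.map fst` to this function at the end of evaluateSentiment instead of taking the head of the list. Whether a mean is a sensible summary of ordinal classes is a judgment call; the first-sentence approach is simpler and may be enough for short texts.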
    // RunSentiment.fsx
    #load "Sentiment.fsx"
    #load "..\packages\FsLab.0.3.17\FsLab.fsx"
    open Deedle
    open Sentiment

    let textData = @"D:\data\sms-20151102214743.csv"
    let textFrame = Frame.ReadCsv(textData).GroupRowsBy<string>("date").DropSparseRows()

    // "body" is the column that holds the SMS text - evaluate sentiment for each SMS
    let sentiments =
        textFrame.GetColumn<string>("body")
        |> Series.mapValues (fun v ->
            try
                printfn "%s" v
                Sentiment.evaluateSentiment(v)
            with
            // Record any failure as -1; a Failure pattern would miss exceptions raised inside CoreNLP
            | _ -> -1)

    // Write data to the new column
    textFrame?sentiment <- sentiments
    textFrame.SaveCsv(@"D:\data\deedle-sms.csv", separator=',')
Executing RunSentiment.fsx produces a new CSV file with an added column containing the predicted sentiment for each message, where -1 marks a message that could not be scored.
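As a starting point for exploring the output, here is a small sketch that averages sentiment per contact. It works on plain F# records rather than the Deedle frame so that it stands alone. The `Contact` field is an assumption standing in for whichever column identifies the sender in your export, and the sample values are made up.

```fsharp
/// One scored message. Contact is a hypothetical field standing in for
/// whichever column identifies the sender in your export.
type ScoredMessage = { Contact: string; Sentiment: int }

/// Average sentiment per contact, most negative first.
/// Messages scored -1 (evaluation failed) are skipped.
let averageByContact (messages: ScoredMessage list) =
    messages
    |> List.filter (fun m -> m.Sentiment >= 0)
    |> List.groupBy (fun m -> m.Contact)
    |> List.map (fun (contact, ms) ->
        contact, ms |> List.averageBy (fun m -> float m.Sentiment))
    |> List.sortBy snd

// Made-up sample data, for illustration only
let sample =
    [ { Contact = "Alice"; Sentiment = 3 }
      { Contact = "Bob"; Sentiment = 1 }
      { Contact = "Alice"; Sentiment = 4 }
      { Contact = "Bob"; Sentiment = -1 } ]

averageByContact sample
|> List.iter (fun (contact, avg) -> printfn "%s: %.2f" contact avg)
```

Dropping the -1 rows before averaging keeps failed evaluations from dragging a contact's score down.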
Have fun looking through the data, finding positive and negative friends and trends! Though I wouldn’t recommend letting people know that they’re negative…