One of the things we'd really like to be able to do with reputation monitor is some kind of sentiment analysis - i.e. working out whether a story about you is positive or negative, good or bad.
This is hard, for a number of reasons, but we think we might be able to have a crack at it - there is a wealth of published material on the subject and hopefully we'll be able to come up with something useful.
Some of the challenges we have identified before writing a line of code (as opposed to those myriad problems you identify once you're writing code!) follow. We'd love any input you might have to guide us along the way:
- synonyms, antonyms, homonyms: words that mean the same thing, words that mean the opposite, words that look the same but mean different things, and all the other joys of the English language (does GLA refer to the Greater London Authority or the Greatest Living American?)
- sarcasm, irony and other humour: I think it's safe to say we won't be working this out algorithmically any time soon
- computing resources: do we stick with term weights, look at vector weighting or move to full Markov chains? There's a lot of information out there, and we need to be able to actually sift through it all...
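To make the simplest of those options concrete, here is a rough sketch of what a term-weight approach might look like. This is not what reputation monitor does (we haven't written any code yet); the word list, the weights and the function names `score_story` and `label_story` are all made up for illustration.

```python
import re

# Hypothetical, hand-picked term weights: positive words score above zero,
# negative words below. A real list would be far larger and probably learned
# from labelled stories rather than typed in by hand.
TERM_WEIGHTS = {
    "great": 1.0,
    "useful": 0.5,
    "good": 0.5,
    "bad": -0.5,
    "poor": -0.5,
    "terrible": -1.0,
}


def score_story(text, weights=TERM_WEIGHTS):
    """Sum the weights of every known term in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(word, 0.0) for word in words)


def label_story(text, weights=TERM_WEIGHTS):
    """Turn the raw score into a coarse positive / negative / neutral label."""
    score = score_story(text, weights)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `label_story("A great and useful service")` gives "positive", but `label_story("Oh great, another terrible outage")` comes out "neutral" because the two weights cancel. That is exactly the sarcasm problem from the list above: a plain sum of term weights has no idea that "great" is being used ironically.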
So, we're looking forward to seeing what we can create, but we'd also love your feedback and input.
Update: I saw SEOmoz's term targeting tool this morning, which partly prompted me to write this post (term targeting being a related semantic analysis to sentiment analysis), so I thought I would run it over this post. The results look great, guys - the top two targeted 2-word phrases were 'sentiment analysis' and 'reputation monitor', which I think is fair (the 3rd was 'London authority', which is slightly less intentional - although maybe we should aim to be the London authorities on sentiment analysis!)