Monday, 24 August 2015

Sentiment Analysis

The following subject has relevance to the development of the Universal Debating Project

From Wikipedia, the free encyclopedia
Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).


A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as "angry," "sad," and "happy."
Early work in this area includes Turney[1] and Pang,[2] who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang[3] and Snyder,[4] among others: Pang[3] expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a 3- or a 4-star scale, while Snyder[4] performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of a given restaurant, such as the food and atmosphere (on a five-star scale). Even though most statistical classification methods ignore the neutral class under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, specific classifiers such as max entropy[5] and SVMs[6] have been shown to benefit from the introduction of a neutral class, improving the overall accuracy of the classification.
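To make the document-level polarity task concrete, here is a minimal bag-of-words Naive Bayes sketch in pure Python. The training corpus, words and labels are invented for illustration; this is not the actual method of any cited paper, just the general technique they build on.

```python
import math
from collections import Counter, defaultdict

# Toy training corpus (invented for illustration).
train = [
    ("a wonderful and moving film", "pos"),
    ("brilliant acting and a great story", "pos"),
    ("dull plot and terrible acting", "neg"),
    ("a boring and painful waste of time", "neg"),
]

# Count word frequencies per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("a great and moving story"))  # -> pos
```

With more training data the same scheme extends naturally to a third, neutral class, as discussed above.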
A different method for determining sentiment is the use of a scaling system whereby words commonly associated with a negative, neutral or positive sentiment are assigned a number on a -10 to +10 scale (most negative to most positive). When a piece of unstructured text is analyzed using natural language processing, each concept in it is identified and given a score based on the sentiment words that relate to it and their associated values. This allows movement to a more sophisticated understanding of sentiment based on an 11-point scale. Alternatively, texts can be given separate positive and negative sentiment strength scores if the goal is to determine the sentiment strength in a text rather than its overall polarity.[7]
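A bare-bones sketch of the scaling idea, assuming a hand-made lexicon on the -10 to +10 scale (the words and scores here are invented for illustration; real lexicons are much larger):

```python
# Tiny hand-made lexicon on the -10..+10 scale (invented for illustration).
LEXICON = {"excellent": 9, "good": 5, "okay": 1, "poor": -5, "awful": -9}

def score_text(text):
    """Sum the lexicon scores of the sentiment words found in the text."""
    words = text.lower().split()
    return sum(LEXICON[w] for w in words if w in LEXICON)

print(score_text("the food was good but the service was awful"))  # 5 + (-9) = -4
```

A real system would relate each score to the concept it modifies rather than summing over the whole text, but the lexicon lookup is the common core.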
Another research direction is subjectivity/objectivity identification. This task is commonly[8] defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification:[9] the subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su,[10] results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang[11] showed that removing objective sentences from a document before classifying its polarity helped improve performance.
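The idea of removing objective sentences before polarity classification can be sketched very simply. The cue-word list below is invented for illustration; real subjectivity classifiers learn such cues from annotated data rather than using a fixed list.

```python
# Invented cue-word list; real systems learn these cues from annotated corpora.
SUBJECTIVE_CUES = {"i", "think", "feel", "love", "hate", "terrible", "wonderful"}

def subjective_sentences(sentences):
    """Keep only sentences containing at least one subjectivity cue word."""
    return [s for s in sentences
            if SUBJECTIVE_CUES & set(s.lower().replace(".", "").split())]

doc = ["The phone was released in 2014.",
       "I think the battery life is terrible."]
print(subjective_sentences(doc))  # only the second sentence survives
```

The surviving subjective sentences would then be passed on to a polarity classifier.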
A more fine-grained analysis model is called feature/aspect-based sentiment analysis.[12] It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, or the picture quality of a camera. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral.[13] More detailed discussions about this level of sentiment analysis can be found in Liu's NLP Handbook chapter, "Sentiment Analysis and Subjectivity".[14]
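A crude illustration of the aspect-level sub-problems: spot aspect terms, then look for an opinion word within a small window of tokens. The aspect and opinion word lists are invented for illustration, and real systems use dependency parses rather than a flat window.

```python
# Invented aspect and opinion lexicons for illustration only.
ASPECTS = {"screen", "battery", "camera"}
OPINIONS = {"great": "positive", "sharp": "positive",
            "weak": "negative", "blurry": "negative"}

def aspect_sentiment(text, window=2):
    """For each aspect term, look for an opinion word within `window` tokens."""
    tokens = text.lower().replace(",", "").split()
    found = {}
    for i, tok in enumerate(tokens):
        if tok in ASPECTS:
            nearby = tokens[max(0, i - window): i + window + 1]
            for w in nearby:
                if w in OPINIONS:
                    found[tok] = OPINIONS[w]
    return found

print(aspect_sentiment("the screen is sharp but the battery is weak"))
# -> {'screen': 'positive', 'battery': 'negative'}
```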

Methods and features

Existing approaches to sentiment analysis can be grouped into four main categories: keyword spotting, lexical affinity, statistical methods, and concept-level techniques.[15] Keyword spotting classifies text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored.[16] Lexical affinity not only detects obvious affect words, it also assigns arbitrary words a probable "affinity" to particular emotions.[17] Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, "bag of words" and Semantic Orientation from Pointwise Mutual Information (see Peter Turney's work[1] in this area). More sophisticated methods try to detect the holder of a sentiment (i.e. the person who maintains that affective state) and the target (i.e. the entity about which the affect is felt).[18] To mine an opinion in context and identify the feature that has been opinionated, the grammatical relationships of words are used; grammatical dependency relations are obtained by deep parsing of the text.[19] Unlike purely syntactical techniques, concept-level approaches leverage elements from knowledge representation such as ontologies and semantic networks and, hence, are also able to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.[20]
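Keyword spotting, the simplest of the four categories, can be sketched in a few lines. The affect lexicon below is invented for illustration (it is not WordNet-Affect or any of the resources cited later); the point is that the method only fires on unambiguous affect words and misses everything else.

```python
# Invented affect lexicon; keyword spotting fires only on unambiguous words.
AFFECT_WORDS = {
    "happy": "joy", "delighted": "joy",
    "sad": "sadness", "miserable": "sadness",
    "afraid": "fear", "terrified": "fear",
}

def spot_affects(text):
    """Return the set of affect categories triggered by words in the text."""
    return {AFFECT_WORDS[w] for w in text.lower().split() if w in AFFECT_WORDS}

print(spot_affects("she was delighted at first but then terrified"))
# -> {'joy', 'fear'} (set order may vary)
```

Negation ("not happy") defeats this method entirely, which is one reason the statistical and concept-level approaches above were developed.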
Open source software tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.[19] Knowledge-based systems, instead, make use of publicly available resources, e.g., WordNet-Affect,[21] SentiWordNet,[22] and SenticNet,[23] to extract the semantic and affective information associated with natural language concepts. Sentiment analysis can also be performed on visual content, i.e. images and videos. One of the first approaches in this direction is SentiBank,[24] which utilizes an adjective-noun pair representation of visual content.
A human analysis component is still required in sentiment analysis, as automated systems are unable to analyze the historical tendencies of an individual commenter or platform, and so often misclassify the expressed sentiment; automated systems reportedly misclassify around 23% of the comments that human analysts classify correctly.[25]
Sometimes the structure of sentiments and topics is fairly complex. Moreover, the problem of sentiment analysis is non-monotonic with respect to sentence extension and stop-word substitution (compare THEY would not let my dog stay in this hotel vs I would not let my dog stay in this hotel). To address this issue, a number of rule-based and reasoning-based approaches have been applied to sentiment analysis, including Defeasible Logic Programming.[26] A number of tree-traversal rules have also been applied to syntactic parse trees to extract the topicality of sentiment in an open-domain setting.[27][28]


The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by precision and recall. However, according to research, human raters typically agree only about 79% of the time[29] (see inter-rater reliability).
Thus, a program that is 70% accurate is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.[30] More sophisticated measures can be applied, but evaluation of sentiment analysis systems remains a complex matter. For sentiment analysis tasks returning a scale rather than a binary judgment, correlation is a better measure than precision because it takes into account how close the predicted value is to the target value.
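For scale-valued tasks, the correlation measure mentioned above is typically Pearson's r. A minimal sketch, using invented star-rating numbers for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between predicted and target scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Predicted vs. true star ratings (invented numbers).
predicted = [3, 4, 2, 5, 1]
target    = [3, 5, 2, 4, 1]
print(round(pearson(predicted, target), 3))  # -> 0.9
```

Unlike exact-match precision, a prediction of 4 stars against a target of 5 still earns partial credit here, which is the point being made above.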

Sentiment analysis and Web 2.0

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis.[31] Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit.[32] If Web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published.[33]
One step towards this aim is being accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis.[34] The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social network discussions.[35]
The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.[31] The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.
Even though short text strings might be a problem, sentiment analysis within microblogging has shown that Twitter can be seen as a valid offline indicator of political sentiment. Tweets’ political sentiment demonstrates close correspondence to parties’ and politicians’ political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape.[36]


  1. Turney, Peter (2002). "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews". Proceedings of the Association for Computational Linguistics. pp. 417–424. arXiv:cs.LG/0212032.
  2. Pang, Bo; Lee, Lillian; Vaithyanathan, Shivakumar (2002). "Thumbs up? Sentiment Classification using Machine Learning Techniques". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 79–86.
  3. Pang, Bo; Lee, Lillian (2005). "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales". Proceedings of the Association for Computational Linguistics (ACL). pp. 115–124.
  4. Snyder, Benjamin; Barzilay, Regina (2007). "Multiple Aspect Ranking using the Good Grief Algorithm". Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL). pp. 300–307.
  5. Vryniotis, Vasilis (2013). The Importance of the Neutral Class in Sentiment Analysis.
  6. Koppel, Moshe; Schler, Jonathan (2006). "The Importance of Neutral Examples for Learning Sentiment". Computational Intelligence 22. pp. 100–109.
  7. Thelwall, Mike; Buckley, Kevan; Paltoglou, Georgios; Cai, Di; Kappas, Arvid (2010). "Sentiment strength detection in short informal text". Journal of the American Society for Information Science and Technology 61 (12): 2544–2558. doi:10.1002/asi.21416.
  8. Pang, Bo; Lee, Lillian (2008). "4.1.2 Subjectivity Detection and Opinion Identification". Opinion Mining and Sentiment Analysis. Now Publishers Inc.
  9. Mihalcea, Rada; Banea, Carmen; Wiebe, Janyce (2007). "Learning Multilingual Subjective Language via Cross-Lingual Projections". Proceedings of the Association for Computational Linguistics (ACL). pp. 976–983.
  10. Su, Fangzhong; Markert, Katja (2008). "From Words to Senses: a Case Study in Subjectivity Recognition". Proceedings of Coling 2008, Manchester, UK.
  11. Pang, Bo; Lee, Lillian (2004). "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts". Proceedings of the Association for Computational Linguistics (ACL). pp. 271–278.
  12. Hu, Minqing; Liu, Bing (2004). "Mining and Summarizing Customer Reviews". Proceedings of KDD 2004.
  13. Liu, Bing; Hu, Minqing; Cheng, Junsheng (2005). "Opinion Observer: Analyzing and Comparing Opinions on the Web". Proceedings of WWW 2005.
  14. Liu, Bing (2010). "Sentiment Analysis and Subjectivity". In Indurkhya, N.; Damerau, F. J. Handbook of Natural Language Processing (Second ed.).
  15. Cambria, Erik; Schuller, Björn; Xia, Yunqing; Havasi, Catherine (2013). "New Avenues in Opinion Mining and Sentiment Analysis". IEEE Intelligent Systems 28 (2): 15–21. doi:10.1109/MIS.2013.30.
  16. Ortony, Andrew; Clore, G.; Collins, A. (1988). The Cognitive Structure of Emotions. Cambridge Univ. Press.
  17. Stevenson, Ryan; Mikels, Joseph; James, Thomas (2007). "Characterization of the Affective Norms for English Words by Discrete Emotional Categories". Behavior Research Methods 39 (4): 1020–1024.
  18. Kim, S. M.; Hovy, E. H. (2006). "Identifying and Analyzing Judgment Opinions". Proceedings of the Human Language Technology / North American Association of Computational Linguistics conference (HLT-NAACL 2006). New York, NY.
  19. Dey, Lipika; Haque, S. K. Mirajul (2008). "Opinion Mining from Noisy Text Data". Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data. pp. 83–90.
  20. Cambria, Erik; Hussain, Amir (2012). Sentic Computing: Techniques, Tools, and Applications. Springer.
  21. Strapparava, Carlo; Valitutti, Alessandro (2004). "WordNet-Affect: An affective extension of WordNet". Proceedings of LREC. pp. 1083–1086.
  22. Baccianella, Stefano; Esuli, Andrea; Sebastiani, Fabrizio (2010). "SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining". Proceedings of LREC. pp. 2200–2204. Retrieved 2014-04-05.
  23. Cambria, Erik; Olsher, Daniel; Rajagopal, Dheeraj (2014). "SenticNet 3: A common and common-sense knowledge base for cognition-driven sentiment analysis". Proceedings of AAAI. pp. 1515–1521.
  24. Borth, Damian; Ji, Rongrong; Chen, Tao; Breuel, Thomas; Chang, Shih-Fu (2013). "Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs". Proceedings of the ACM International Conference on Multimedia. pp. 223–232.
  25. "Case Study: Advanced Sentiment Analysis". Retrieved 18 October 2013.
  26. Galitsky, Boris; McKenna, Eugene William. "Sentiment Extraction from Consumer Reviews for Providing Product Recommendations". Retrieved 18 November 2013.
  27. Galitsky, Boris; Dobrocsi, Gabor; de la Rosa, Josep Lluís (2010). "Inverting Semantic Structure Under Open Domain Opinion Mining". FLAIRS Conference.
  28. Galitsky, Boris; Chen, Huanjin; Du, Shaobin (2009). "Inversion of Forum Content Based on Authors' Sentiments on Product Usability". AAAI Spring Symposium: Social Semantic Web: Where Web 2.0 Meets Web 3.0: 33–38.
  29. Ogneva, M. "How Companies Can Use Sentiment Analysis to Improve Their Business". Mashable. Retrieved 2012-12-13.
  30. Roebuck, K. Sentiment Analysis: High-impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors.
  31. Wright, Alex. "Mining the Web for Feelings, Not Facts". New York Times, 2009-08-23. Retrieved 2009-10-01.
  32. "Sentiment Analysis on Reddit". Retrieved 10 October 2014.
  33. Kirkpatrick, Marshall. ReadWriteWeb, 2009-04-15. Retrieved 2009-10-01.
  34. CORDIS. "Collective emotions in cyberspace (CYBEREMOTIONS)". European Commission, 2009-02-03. Retrieved 2010-12-13.
  35. Condliffe, Jamie. "Flaming drives online social networks". NewScientist, 2010-12-07. Retrieved 2010-12-13.
  36. Tumasjan, Andranik; Sprenger, Timm O.; Sandner, Philipp G.; Welpe, Isabell M. (2010). "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment". Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media.

Thursday, 13 August 2015

The Universal Debating Project

Basic Proposal by Robert Searle

Towards the Global Development of an Open Democratic "Super Brain" using Structured Data

The Universal Debating Project (or UDP) is an extremely ambitious proposal for an ongoing programme in which "all", or most, arguments for or against any topic of human knowledge could be presented in the clearest and shortest possible form. It would also include an online "encyclopedia" (like Wikipedia) for the pros and cons of any debate, which could be continually updated in real time on the internet. It would naturally adopt the networking P2P approach and, hence, be an open source of structured data emanating from laymen, experts, NGOs, scholarly papers, popular articles, documentaries, web feeds, aggregators (i.e. feed readers, or news readers), et al.
Of course, articles such as those on Wikipedia do try to present arguments for and against particular subjects. But how "complete" and how unbiased are they? Moreover, they probably deal mainly with major arguments, and to a lesser extent "minor" ones. In effect, what is needed is the most objective presentation of "all" possible pros and cons on most, if not "all", kinds of human knowledge.
Ideally, the Universal Debating Project should be publicly seen as the most reliable and credible central global source of such structured data in the world. Its aim is to achieve improved constructive reasoning and greater holistic objectivity. It should become of great practical value for educators, citizens, governments, NGOs, businesses, et al.


The Problem of Complexity

As the world becomes increasingly complex, it becomes more and more vital to:
a) reduce most, if not "all", introductory information on a topic into clear and manageable levels of data (i.e. Text Simplification), ideally using the least number of words (similar to notes, or "good" PowerPoint presentations, and as short comprehensive summaries consisting of one or more paragraphs);
b) reduce "all" major and "minor" resulting arguments for and against a topic in the most lucid manner possible, again ideally using the least number of words;
c) include, at the click of a button, a whole series of links to various sources; of course, the relevant sentences in many cases could be cached, or highlighted.
Special editors could do the above work, notably in connection with a) and b). This would mean that any pro and con arguments which are repeated could be "quickly" reduced into the least number of words and be free of emotional language. These could be emailed to those in a debate to see if the participants' opinions are presented accurately.
Apart from Wikipedia, mentioned earlier, there are of course any number of forums and discussion groups on the internet. These are fine as far as they go. But, as said before, how complete are their arguments for and against a certain topic? Naturally enough, such arguments are continually repeated again and again. This is where the UDP becomes all-important.
A vital aspect of all this is that it should be possible for people to become "instant experts." In other words, they should be able to become reasonably "expert" in the shortest space of time in, say, some aspect of economics, biology, physics, or whatever. Thus, there is an element here of anti-credentialism, in which essentially good arguments rely on good "objective" thinking rather than relying on the credibility of experts all the time.
Of course, those who have formal qualifications and training still play a vital role, but they must be prepared to submit their ideas and discoveries to the UDP. Thus, they could be "fully" scrutinised by other experts, and by the public without relevant credentials. This could all lead, in certain instances, to "quality" online global "brainstorming", or more precisely "brainwriting", sessions leading to "new" ideas that may have value in society and the world. Hence, Collective Intelligence at work.
It should be added here, too, that technical subjects such as physics, and the processes involved in mathematics, could be presented verbally and clearly. In the latter instance, an individual may have little or no training, but with "simple" verbal step-by-step presentations he or she could reach levels of "mathematical understanding."

Basic Systemization of Presentation on the Universal Debating Project

The structured data of the UDP could be like that of Pros and Cons: A Debater's Handbook, edited by Trevor Sather, which has gone through a number of editions since 1896. Here, two columns are presented, one for pro arguments and the other for con arguments. This, along with a brief lucid presentation of an issue or topic, should of course become ultimately universally standardized for the entire world, and act as a truly comprehensive complement to any number of "decentralized" sources of information.
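The two-column structure described above can be sketched as a simple data record. The topic, field names and argument texts below are all hypothetical, invented purely to illustrate one possible shape for UDP structured data:

```python
# A hypothetical structured record for one UDP topic, in the two-column
# spirit of the debaters' handbook described above (all content invented).
topic = {
    "title": "Should voting be compulsory?",
    "summary": "A short, neutral statement of the issue.",
    "pros": ["Higher turnout makes results more representative.",
             "Voting becomes a shared civic habit."],
    "cons": ["Forced participation may produce uninformed votes.",
             "It restricts the freedom to abstain."],
}

def two_columns(entry):
    """Render pros and cons side by side, one pair per line."""
    rows = [f"PRO: {pro}  |  CON: {con}"
            for pro, con in zip(entry["pros"], entry["cons"])]
    return "\n".join(rows)

print(two_columns(topic))
```

Because the record is plain structured data, it could be exchanged between "decentralized" sources and rendered in any standardized two-column form.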
An intriguing aspect of the Universal Debating Project is that we could have what is termed a "Rationality Count" (RC). This would be the electronic tracking of people's decision-making processes for and against a specific topic. It could give us valuable insight into the degrees of rationality people may have. For instance, 2,000 people may select pro argument a for topic C via the internet. Then, a con argument b could be presented online for the same topic C, and 1,500 decide to agree with it and, of course, press the right button on their computers to transmit their decision... and so on. We may well find interesting patterns if RCs are used: in other words, a "mapping out" of the "thinking processes" of participants in the UDP.
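The Rationality Count could start as a simple tally over a stream of recorded decisions. The vote records and identifiers below are hypothetical, scaled down from the 2,000/1,500 example above purely for illustration:

```python
from collections import Counter

# Hypothetical vote stream: (voter_id, topic, argument_id, stance).
votes = [
    (1, "C", "a", "pro"), (2, "C", "a", "pro"), (3, "C", "a", "pro"),
    (1, "C", "b", "con"), (2, "C", "b", "con"),
]

def rationality_count(votes, topic):
    """Tally agreement per argument for one topic."""
    tally = Counter()
    for _, t, arg, stance in votes:
        if t == topic:
            tally[(arg, stance)] += 1
    return dict(tally)

print(rationality_count(votes, "C"))
# -> {('a', 'pro'): 3, ('b', 'con'): 2}
```

Tracking which voters appear in both tallies would begin the "mapping out" of thinking processes described above.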
If the "Global Brain", or Universal Debating Project, were ever set up, its initial concern would probably be with major issues, notably social matters, economics, politics, and climate change/global warming. A site could be set up, and it could even have a motto such as "Fair Thinking, Fair World".

Also, it should be added that it is as yet unclear how such a proposal could be funded. It could use the Wikipedia model, or maybe not.

More Information

The following list of links may have direct and indirect relevance to the above P2P Foundation entry, and they are worthy of inclusion here. They also give us an idea of the immensity of the subject of debating, rationality, and thinking. One is a book by the parapsychologist Robert Thouless, in which he gave a "simplistic" presentation of the different forms of argument.
There are, naturally enough, a number of debating "organizations" on the internet. However, the scale, and indeed scope, of the UDP is generally far greater and "infinitely" more comprehensive. The UDP could also play a "central" role in the Global Brain proposal.
A key problem with policy making is that unforeseen consequences can often happen; hence the need for well-thought-through planning to reduce future problems. Such policy making could be aided by the structured data approach of the UDP. This has a list of links of great interest (the UDP could play a critical role in this).
The link below deals with games that have serious educational value, and could in certain situations even effect socio-economic change. Of course, a pro and con project such as the above could be presented in an attractive and stimulating manner.
Mind Maps are another way of presenting issues other than the pro, and con approach.
Semantics can have relevance.
There are a variety of ways of developing greater creativity. One such approach is Lateral Thinking.
The following deals with Upstream Engagement, in which people can have informed dialogue about subjects (notably in connection with "controversial" scientific innovation).
Another area of likely relevance is media bias. If undertaken correctly, the Universal Debating Project should be able to present the most "objective" presentation in the world of various topics, notably emotive issues such as genetically modified food and global warming.
A link of links
Big Data could play a big role in all this.
The following link is concerned with the idea(!) of Ideonomy, which would probably be of great relevance to the UDP.
(i.e. Smart Drugs)
(This could be seen as a complete contradiction to the UDP)
(an interesting list of book references on debating, etc.)
(a classic example of the "misuse" of data)
An important area of enquiry is how accurate and authentic statistics are. With the aid of the UDP, a set of them could be scrutinized rigorously.
There is an important radio programme which questions statistics...
Some interesting info can be found on the discussion section of this page/subject entry.
Finally, a blog has been set up. Other "relevant" subject matters not included as links in the above may also exist on the blog itself, and may be included at the P2P Foundation site.