Post Syndicated from Steven Cherry original https://spectrum.ieee.org/podcast/artificial-intelligence/machine-learning/polling-is-too-hardfor-humans
Steven Cherry Hi, this is Steven Cherry, for Radio Spectrum.
The Literary Digest, a now-defunct magazine, was founded in 1890. It offered—despite what you’d expect from its name—condensed versions of news-analysis and opinion pieces. By the mid-1920s, it had over a million subscribers. Some measure of its fame and popularity stemmed from accurately predicting every presidential election from 1916 to 1932, based on polls it conducted of its ever-growing readership.
Then came 1936. The Digest predicted that Kansas Governor Alf Landon would win in a landslide over the incumbent, Franklin Delano Roosevelt. Landon in fact captured only 38 percent of the vote. Roosevelt won 46 of the U.S.’s 48 states, the biggest landslide in presidential history. The magazine never recovered from its gaffe and folded two years later.
The Chicago Tribune did recover from its 1948 gaffe, one of the most famous newspaper headlines of all time, “Dewey Defeats Truman”—a headline that by the way was corrected in the second edition that election night to read “Democrats Make Sweep of State Offices,” and by the final edition, “Early Dewey Lead Narrow; Douglas, Stevenson Win,” referring to candidates that year for Senator and Governor. The Senator, Paul Douglas, by the way, was no relation to an earlier Senator from Illinois a century ago, Stephen Douglas.
The Literary Digest’s error was due, famously, to the way it conducted its polls—its readership, even though a million strong, was woefully unrepresentative of the nation’s voters as a whole.
The Tribune’s gaffe was in part due to a printers’ strike that forced the paper to settle on a first-edition banner headline hours earlier than it otherwise would have, but it made the guess with great confidence in part because of the unanimous consensus of the polling that year, which had Dewey ahead despite his running one of the most lackluster, risk-averse campaigns of all time.
Polls have been making mistakes ever since, and it’s always, fundamentally, the same mistake. They’re based on representative samples of the electorate that aren’t sufficiently representative.
After the election of 2016, in which the polling was not only wrong but itself might have inspired decisions that affected the outcome—where the Clinton campaign shepherded its resources; whether James Comey would hold a press conference—pollsters looked inward, re-weighted various variables, assured us that the errors of 2016 had been identified and addressed, and then proceeded to systematically mis-predict the 2020 presidential election much as they had four years earlier.
After a century of often-wrong results, it would be reasonable to conclude that polling is just too difficult for humans to get right.
But what about software? Amazon, Netflix, and Google do a remarkable job of predicting consumer sentiment, preferences, and behavior. Could artificial intelligence predict voter sentiment, preferences, and behavior?
Well, it’s not as if they haven’t tried. And results in 2020 were mixed. One system predicted Biden’s lead in the popular vote to be large, but his electoral college margin small—not quite the actual outcome. Another system was even further from the mark, giving Biden wins in Florida, Texas, and Ohio—adding up to a wildly off-base electoral margin.
One system, though, did remarkably well. As a headline in Fortune magazine put it the morning of election day, “The polls are wrong. The U.S. presidential race is a near dead heat, this AI ‘sentiment analysis’ tool says.” The AI tool predicted a popular vote of 50.2 percent for Biden, only about one-sixth of one percent from the actual total, and 47.3 percent for Trump, off by a mere one-tenth of one percent.
The AI company that Fortune magazine referred to is called Expert.ai, and its Chief Technology Officer, Marco Varone, is my guest today.
Marco, welcome to the podcast.
Marco Varone Hi everybody.
Steven Cherry Marco, AI-based speech recognition has been pretty good for 20 years; AI has been getting better and better at fraud detection for 25 years. AI beat the reigning chess champion back in 1997. Why has it taken so long to apply AI to polling, which is, after all … well, even in 2017, it was a $20.1 billion industry, which is about $20 billion more than chess.
Marco Varone Well, there are two reasons for this. The first one is that if you want to apply artificial intelligence to this kind of problem, you need to have the capability of understanding language in a pretty specific, deep, and nuanced way. And it is something that, frankly, for many, many years was very difficult and required a lot of investment and a lot of work in trying to go deeper than the traditional shallow understanding of text. So this was one element.
The second element is that, as you have seen in this particular case, polls, on average, are working still pretty well. But there are particular events, in particular a situation where there is a clear gap between what has been predicted and the final result. And there is a tendency to say, okay, on average, the results are not so bad. So don’t change too much because we can make it with good results, without the big changes that are always requiring investment, modification, and a complex process.
I would say that it’s a combination of the technology that needed to become better and better in understanding the capability of really extracting insights and small nuances from any kind of communication and the fact that for other types of polls, the current situation is not so bad.
The fact [is] that now there is a growing amount of information that you can easily analyze because it is everywhere in every social network, every communication in every blog and comments, made it a bit easier to say, okay, now we have a better technology, even in specific situations we can have access to a huge amount of data. So let’s try it. And this is what we did. And I believe this will become a major trend in the future.
Steven Cherry Every AI system needs data; Expert.ai uses social posts. How does it work?
Marco Varone Well, the social posts are, I would say, the most valuable kind of content that you can analyze in a situation like this, because on one side, it is a type of content that we know. When I say we know, it means that we have used this type of content for many other projects. It is normally the kind of content that we analyze for our traditional customers, looking for the actual comments and opinions about products, services, and particular events. Social content is easy to get—up to a point; with the recent scandals, it’s becoming a bit more difficult to have access to a huge amount of social data; in the past it was a bit simpler—and also it is something where you can find really every kind of person, every kind of expression, and every kind of discussion.
So it’s easier to analyze this content, to extract a big amount of insight, a big amount of information, and to try to tune … to create reasonably solid models that can be tested in a sort of realtime—there is a continuous stream of social content. There is an infinite number of topics that are discussed. And so you have the opportunity to have something that is plentiful, that is cheap, but has a big [?] expression and where you can really tune your models and tune your algorithms in a much faster and more cost-effective way than with other types of content.
Steven Cherry So that sort of thing requires something to count as a ground truth. What is your ground truth here?
Marco Varone Very, very, very good point … a very good question. The key point is that from the start, we have decided to invest a lot of money and a lot of effort in creating a sort of representation of knowledge that we have stored in a big knowledge graph that has been crafted manually, initially.
So we created this knowledge representation that is a sort of representation of the world knowledge, in a reduced form, and the language and the way that you express this knowledge. And we created this solid foundation, manually, so we have been able to build on a very solid and very structured foundation. On top of this foundation, it was possible, as I mentioned, to add the new knowledge, working, analyzing a big amount of data, social data is an example, but there are many other types of data that we use to enrich our knowledge. And so we are not influenced, like many other approaches, from a bias that you can take from extracting knowledge only from data.
So it’s the start of a two-tier system where we have this solid ground-truth foundation—the knowledge and information that expert linguists and people with a huge understanding of things have created. On top of that, we can add all the information that we can extract more or less automatically from different types of data. We believe that this was a huge investment that we made over the years, but it is paying big dividends and also giving us the possibility of understanding the language and the communication at a deeper level than with other approaches.
Steven Cherry And are you using only data from Twitter or from other social media as well?
Marco Varone No, no, we try to use as much social media as possible. The limitation sometimes is that with Twitter it is much easier and faster to get access to a bigger amount of information. For other social sources, sometimes it is not that easy because you can have issues in accessing the content or you have a very limited amount of information that you can download, or that is expensive—or some sources, you cannot really control them automatically. So Twitter becomes the first choice for the reason that it is easier to get a big volume. And if you are ready to pay, you can have access to the full Twitter firehose.
Steven Cherry The universe of people who post on social networks would seem to be skewed in any number of ways. Some people post more than the average. Some people don’t post much at all. People are sometimes more extreme in their views. How do you go from social media sentiment to voter sentiment? How do you avoid the Literary Digest problem?
Marco Varone Probably the most relevant element is our huge experience. Somehow we started to analyze big amounts of data, textual data, many, many years ago, and we were forced to really find a way of managing and balancing and avoiding this kind of noise or duplicated information or extra spurious information—[it] can really impact the capability of our solution to extract the real insights.
So I think that experience—a lot of experience in doing this for many, many years—is the second, secret element of our recipe in being able to do this kind of analysis. And I would add that you should also consider that we have done it several times; we started to analyze political content, things linked to political elections, a few years ago. So we had both this generic experience and a specific experience in finding how to tune the different parameters, how to set the different algorithms to try to minimize these kinds of noisy elements. You can’t remove them completely. It is impossible.
But for example, when we analyzed the social content for the Brexit referendum, in the UK, and we were able to guess—one of the few able to do this—the real result of it, we learned a lot of lessons and we were able to improve our capability. Clearly, this means that there is not a formula that is good for every kind of analysis.
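[Editor’s illustration: the interview does not describe how Expert.ai actually handles duplicated posts or very prolific accounts. As a minimal sketch under assumed inputs, one simple way to reduce that kind of noise is to drop exact reposts and let each author count once. The function name, the tuple layout, and the sample posts below are all hypothetical.]

```python
from collections import defaultdict

def per_author_average(posts):
    """Average sentiment with one 'vote' per author.

    posts: iterable of (author, text, score) tuples (hypothetical layout).
    Exact duplicate texts by the same author are dropped, then each
    author's scores are averaged before averaging across authors.
    """
    seen = set()
    by_author = defaultdict(list)
    for author, text, score in posts:
        key = (author, text.strip().lower())
        if key in seen:          # same author, same text: skip the repost
            continue
        seen.add(key)
        by_author[author].append(score)
    if not by_author:
        raise ValueError("no posts to average")
    author_means = [sum(s) / len(s) for s in by_author.values()]
    return sum(author_means) / len(author_means)

# Made-up posts: author "a" is prolific, author "b" posts once.
posts = [
    ("a", "Candidate X is great", 1.0),
    ("a", "Candidate X is great", 1.0),   # exact repost, dropped
    ("a", "X all the way", 1.0),
    ("b", "Not voting for X", -1.0),
]
naive = sum(score for _, _, score in posts) / len(posts)
print(naive)                      # 0.5
print(per_author_average(posts))  # 0.0
```

[The naive average is pulled toward the prolific author; counting each author once removes that particular skew, though it does nothing about who chooses to post in the first place.]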
Steven Cherry It’s sort of a commonplace that people express more extreme views on the Internet than they do in face-to-face encounters. The results from 2016 and 2020—and the Brexit result as well—suggest that the opposite may be the case. People’s voting reflects truly-held extreme views, while the polling reflects a sort of face-to-face façade.
Marco Varone Yes, I must admit that we had a small advantage in this—compared with many other companies and probably many other players that tried to guess the result of this election or the Brexit—being based where our technology is. Here in Italy, we saw this kind of situation happening much sooner than we have seen happening in other countries. So in Italy, we had, even many years ago, the strange situation where people, when they were polled, for an interview, were saying, “Oh, no, I think that is too extreme. I will never vote for this. I will vote for this other candidate or the other party.” But in the end, when the elections were over, you saw that, oh, this is not what really happened in the secret of the vote.
So I would say that this is a small secret, a small advantage that we have against many other people that try to guess this result, creating this kind of technology and implementation in Italy, where these small splits or exaggerated positioning deciding the vote for the election were happening earlier than we have seen elsewhere. Now it’s very common. This is happening not only in the U.S., but also in other countries. It was happening before … So we have been able to understand it sooner and try to adjust and balance our parameters accordingly.
Steven Cherry That’s so interesting. People have, of course, compared the Trump administration to the Berlusconi administration, but I didn’t realize that the comparison went back all the way to their initial candidacies. So in effect, the shy voter theory—especially the shy-Trump voter theory—is basically correct and people express themselves more authentically online.
Marco Varone Correct. This is what we are seeing, again and again. And it is something that I believe is not only happening in the political environment, but there it’s somehow stronger than in other places. As I told you, we are applying our artificial intelligence solution in many different fields, analyzing the feedback from customers of telco companies, banks, insurance companies. And you see that when you look at, for example, the content of the e-mails, or let me say official communication that they are exchanging between the customer and the company, everything is a bit smoother, more natural. The tone is under control. And then when you see the same kind of problem discussed in social content, everything is stronger. People are really trying to give a much stronger opinion, saying, I’ll never buy this kind of service or I had big problems with this company.
And so, again, this is something that we have seen also in other spaces. In the political situation, I believe it is even stronger because they are really not buying something like when you are interacting with a company, but you are trying to give your small contribution to the future of your country or your state or your local government. So probably there are even stronger sentiments and feelings for people. And in the social situation, they are really free because you are not really identified—normally you can be recognized, but in many cases you are not linked to the specific person doing that. So I believe that that is the strongest place where there is this, “Okay, I really wanted to say what I think, and this is the only place where I will tell this, because the risk of having a sort of a negative result is smaller.”
Steven Cherry Yeah. So not to belabor the point, but it does seem important. It’s commonly thought that the Internet goads people into holding more extreme positions than they really do, but the reality is that it instead frees them to express themselves more honestly.
A 2015 article in Nature argued that public opinion has become more extreme over time, and then the article looks at some of the possible causes. I’m wondering if you have seen that in your work and is it possible that standard polling techniques simply have not caught up with that?
Marco Varone Yes, I think that we can confirm we have seen that this kind of change we have … We have been applying our solution to social content for a good number of years. I would say not exactly from the start because you need to have a sort of minimum amount of data, but it’s been a big number of years. And I can confirm, yes, it’s something that we have seen is happening. I don’t know exactly if it is also something that is linked to the fact that people that are more vocal on social content are also part of the new generation of people that are younger, that have been able to use these kinds of channels of communication actively more or less from the start. I think that there are different elements in this, but for sure I can confirm this.
And in different countries, we have seen some significant variation. For example, you should expect that here in Italy it’s super-strong because the Italian people, for example, are considered very … They don’t fear to express their opinion, but I will say that in the U.S. and also in the U.K., we are seeing [it] even stronger. Okay, so it’s happening in all the countries where we are operating and there are some countries where it’s even stronger than in others. You will not be surprised that, for example, when you analyze the content in Germany, social content, everything is somehow more under control, exactly as you expect. So sometimes there are surprises. In other situations there are things that are more or less as you expect.
Steven Cherry I mentioned earlier Amazon, Netflix and Google. Are there similarities between what you’re doing here and what recommendation engines do?
Marco Varone There are elements in common and there are significant differences. The elements in common are that they are also using the capability that they have in analyzing the textual content to extract elements for the recommendation, but they are also using a lot of other information. For us, when we analyze something, more or less the only information that we can get access to is really the tweets, the posts, the articles, and other similar things. But Amazon, or Netflix, has access to a lot of other information. So on Amazon, you have the clicks, you have the history of the customer, you have the different path that has been followed in navigating the site. They have historical information. So they have a much richer set of data and the text part is only somehow a complement of it. So there are elements in common and differences.
And the other difference is that all these companies have a very shallow capability of understanding what is really written—in a comment, in a post, in a tweet—they tend to work more or less on a keyword level. Okay, this is a negative keyword; this is a positive keyword. With our artificial intelligence, we can go deeper than that. So we can get the emotion, the feeling—we can disambiguate much better small differences in the expression of the person because we can go to a deeper level of understanding. It is not like a person. A person is still better at understanding all the nuances, but it’s something that can add more value and allows us to compensate—up to a point—for the fact that we don’t have access to this huge set of other data that these big companies easily have because they track and they log everything.
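[Editor’s illustration: to make “keyword level” concrete, here is a minimal sketch of the kind of shallow scoring described above. It counts positive and negative words and nothing else. The word lists and function name are hypothetical and are not taken from any company’s system.]

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"good", "great", "love", "win"}
NEGATIVE = {"bad", "terrible", "hate", "never"}

def keyword_sentiment(text: str) -> int:
    """Return (# positive keywords) - (# negative keywords) in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

print(keyword_sentiment("I love this candidate"))        # 1
print(keyword_sentiment("I would never say he is bad"))  # -2
```

[The second sentence scores as strongly negative even though the speaker is defending the candidate, because the counter sees “never” and “bad” as two separate negative keywords. Handling negation and context is the sort of nuance that requires going deeper than keywords.]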
Steven Cherry I’m not sure humans always do better. You know, one of my complaints about the movie rating site Rotten Tomatoes is they take reviews by film reviewers and assess whether the review was a generally positive or generally negative review. It’s incredibly simplistic. And yet, in my opinion, they often get it wrong. I’d love to see a sentiment analysis software attack the movie-rating problem. Speaking of which, polling is more of a way to show off your company’s capabilities, yes? Your main business involves applications in industries like insurance and banking and publishing?
Marco Varone Correct. Absolutely. We decided that we would do it from time to time, as is said, to apply our technology and our solutions to this specific problem, not because we want to become a competitor of the companies doing these polls, but because we think it is a very good way to show the capability and the power of our technology and our solution, applied to a problem that is easily understood by everybody.
Normally what we do is to apply this kind of approach, for example, in analyzing the customer interaction between the customers and our clients or analyzing big amounts of social content to identify trends, patterns, emerging elements that can be emerging technologies or managing challenges.
Part of our customers are also in the intelligence space. So public police forces, national security, intelligence agencies … and use our AI platform to try to recognize possible threats, to help investigators and analysts to find the information that they want to find in a much faster and more structured way. Finally, I will say that our historical market is in publishing. Publishers are always searching for a way to enrich the content that they publish with the additional metadata so that the people reading and navigating inside the knowledge can really slice and dice the information across many dimensions or can then focus on specific topics, a specific place, or specific type of event.
Steven Cherry Returning to polling, the Pew Research Center is just one of many polling organizations that looked inward after 2020 and as far as I can tell, concluded that it needed to do still better sampling and weighting of voters. In other words, they just need to do a better job of what they had been doing. Do you think they could ever succeed at that or are they just on a failed path and they really need to start doing something more like what you’re doing?
Marco Varone I think that they are on a failed path and they need to really merge the two approaches. I believe that for the future, they really need to keep the good part of what they did for many, many years, because there is still a lot of value in that. But they are obliged to add this additional dimension because only working together with these two approaches, you can really find something that can give a good result. And I would say good prediction in the majority of the situations, even in these extreme events that are becoming more and more common. And this is sort of a part of how the world is changing.
So we think that they need to look at the kind of artificial technology, artificial intelligence technologies that we and other companies are making available because you cannot continue. This is not a problem of tuning the existing formulas. They should not discard it. It would be a big mistake, but for sure, in my opinion, they need a tool to blend the two things and spend the time to balance this combined model, because, again, if you just then merge the two approaches without spending time on balancing, the result would be even worse than what they have now.
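[Editor’s illustration: the “sampling and weighting” that traditional pollsters rely on can be shown with a minimal post-stratification sketch. The groups, respondent counts, and population shares below are invented for the example and do not describe any real poll.]

```python
def poststratified_share(sample, population_share):
    """Reweight group-level support by each group's share of the electorate.

    sample: {group: (respondents, supporters)}
    population_share: {group: fraction of the electorate}, summing to 1
    """
    return sum(
        population_share[group] * supporters / respondents
        for group, (respondents, supporters) in sample.items()
    )

# Made-up poll: college graduates are over-represented among respondents.
sample = {"college": (600, 360), "non_college": (400, 160)}
population_share = {"college": 0.4, "non_college": 0.6}

total_respondents = sum(n for n, _ in sample.values())
total_supporters = sum(s for _, s in sample.values())
raw = total_supporters / total_respondents
weighted = poststratified_share(sample, population_share)

print(round(raw, 2))       # 0.52
print(round(weighted, 2))  # 0.48
```

[In this toy case the unweighted sample shows the candidate ahead, while reweighting to the assumed electorate shows the candidate behind. The correction is only as good as the choice of groups and the assumed shares, which is the tuning problem discussed above.]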
Steven Cherry Well, Marco, I think that’s a very natural human need to predict the future, to help us plan accordingly, and a very natural cultural need to understand where our fellow citizens stand and feel and think about the important issues that face us. Polling tries to meet those needs. And if it’s been on the wrong path these many years, I hope there’s a right path and hopefully you’re pointing the way to it. Thanks for your work and for joining us today.
Marco Varone Thank you. It was a pleasure.
Steven Cherry We’ve been speaking with Marco Varone, CTO of Expert.ai, about polling, prediction, social media, and natural language processing.
Radio Spectrum is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers, a professional organization dedicated to advancing technology for the benefit of humanity.
This interview was recorded November 24, 2020. Our theme music is by Chad Crouch.
You can subscribe to Radio Spectrum on the Spectrum website, Spotify, Apple Podcasts, or wherever you get your podcasts. You can sign up for alerts or for our upcoming newsletter. And we welcome your feedback on the web or in social media.
For Radio Spectrum, I’m Steven Cherry.
Note: Transcripts are created for the convenience of our readers and listeners. The authoritative record of IEEE Spectrum’s audio programming is the audio version.
We welcome your comments on Twitter (@RadioSpectrum1 and @IEEESpectrum) and Facebook.