Spotify, Machine Learning, and the Business of Recommendation Engines

Post Syndicated from Steven Cherry original https://spectrum.ieee.org/podcast/consumer-electronics/audiovideo/spotify-machine-learning-and-the-business-of-recommendation-engines

Steven Cherry Hi, this is Steven Cherry for Radio Spectrum.

You’re surely familiar—though you may not know it by name—with the Paradox of Choice; we’re surrounded by it: 175 salad dressing choices, 80,000 possible Starbucks beverages, 50 different mutual funds for your retirement account.

“All of this choice,” psychologists say, “starts to be not only unproductive, but counterproductive—a source of pain, regret, worry about missed opportunities, and unrealistically high expectations.”

And yet, we have more choices than ever: 32,000 hours to watch on Netflix, 10 million e-books on our Kindles, 5000 different car makes and models, not counting color and dozens of options.

It’s too much. We need help. And that help is available in the form of recommendation engines. In fact, they may be helping us a bit too much, according to my guest today.

Michael Schrage is a research fellow at the MIT Sloan School’s Initiative on the Digital Economy. He advises corporations, including Procter & Gamble, Google, Intel, and Siemens, on innovation and investment, and he’s the author of several books including 2014’s The Innovator’s Hypothesis, and the 2020 book Recommendation Engines, newly published by MIT Press. He joins us today via Skype.

Steven Cherry Michael, welcome to the podcast.

Michael Schrage Thank you so much for having me.

Steven Cherry Michael, many of us think of recommendation engines as those helpful messages at Netflix or Amazon, such as “people like you also watched” or “also bought,” but also Yelp and TripAdvisor, predictive text choices and spelling corrections. And, of course, Twitter posts, Google results, and the order of things in your Facebook feed. How ubiquitous are recommendation engines?

Michael Schrage They’re ubiquitous. They’re pervasive. They’re becoming more so, in no small part because of the way you set up this talk. There are more choices, and more choices aren’t inherently better choices. So what are the ways that better data and better algorithms can personalize or customize or in some other way make more relevant a choice or an opportunity for you? That is the reason why I wrote the Recommendation Engines book, because this issue of, on one hand, the explosion of choice in the absence of time, the constraints of time, but the chance, the opportunity to get something that really resonates with you, that really pleasantly and instructively and empoweringly helps you: that’s a big deal. That’s a big deal. And I think it’s a big deal that’s going to become a bigger deal as machine learning algorithms kick in and our recommender systems, our recommendation engines become even smarter.

Steven Cherry I want to get to some examples, but before we do, it seems like prediction and recommendation are all tied up with one another. Do we need to distinguish them?

Michael Schrage That is an excellent, excellent question. And let me tell you why I think it’s such an excellent question. When I really began looking into this area, I thought of recommendation as just that, you know, analytics, and analytics proffers relevant choices. In fact, depending upon the kind of datasets you have access to, one can and should think of recommendation engines as generators of predictions of things you will like. Predictions of things you will engage with. Predictions of things you will buy. Now, what you like and what you engage with and what you buy may be different things to optimize around, but they are all different predictions. And so, yes, recommendation engines are indeed predictors.

Steven Cherry Back in the day Spectrum had some of the early articles on the Netflix Prize. Netflix now seems to collect data on just about every moment we spend on the service … what we queue up, how much bingeing we do, where exactly we stop a show for the night or forever …. It switched from a five-star rating system to a thumbs up or down, and it seems it doesn’t even need that information when there’s so much actual behavior to go by.

Michael Schrage You are exactly right on the Netflix instrumentation. It is remarkable what they have learned and what they do, and decline to disclose about what they’ve learned, about the difference between what’s called implicit versus explicit ratings. Explicit ratings are exactly what you’ve described, five stars. But in fact, thumbs up, thumbs down turns out to be statistically quite relevant and quite useful. The most important thing Netflix discovered (and of course, let’s not forget that Netflix didn’t begin as a streaming service; it began with its delivery system being the good old-fashioned United States Postal Service) was that people’s behavior was far more revealing and far more useful in terms of, yes, predicting preference.

Steven Cherry So Netflix encourages us to binge-watch. But it seems YouTube is really the master at queuing up a next video and playing it … you start a two-minute video and find that an hour has gone by … In the book, you say Uber uses much the same technique with its drivers. How does that work?

Michael Schrage Yes! YouTube has really literally, not just figuratively, re-engineered itself around recommendation. And TikTok was a born recommender. The circumstances for Uber were somewhat different, because what Uber discovered in its analytics was that if there was too much downtime between rides, some of its drivers would abandon the platform and become Lyft drivers. So the question then became, how do we keep our power drivers constructively, productively, cost-effectively, time-effectively engaged? And they began to do stacking. The team, using a platform I think called Leonardo, began analyzing what kind of requests were coming in. And they began sorting out and stacking requests so that drivers could literally, as they were dropping somebody off, have the option, a recommendation of one or two or three, depending upon what the flow was, what the queue was, of what ride match they could do next.

And that was obviously a very, very big deal because it was a win-win for the platform. It gave more choices for people who wanted to use the ride-hailing service. But it also kept the flow of drivers very, very smooth and consistent. So it was a demand-smoothing and a supply-smoothing approach. And recommendation is really key for that. In fact, forgive me for going on a bit longer on this, but this was one of the reasons why Netflix went into recommender systems and recommendation engines, because, of course, everybody wanted the blockbuster films back in the DVD days. So what could we give people if we were out of the top five blockbusters? So this was the emergence of recommendation engines to help manage the long- or longer-tail phenomenon. How can we customize the experience? How can we do a better job of managing inventory? So there are certain transcendent mathematical algorithmic techniques that were as relevant to the Uber you hail as to the movie you watch.

Steven Cherry Recommendation engine designers draw from psychology, but also, you say in the book, from behavioral economics, and then even one more area, persuasion technology. How do these three things differ and how do they fit into recommendation engines?

Michael Schrage They are very different flavors, and I’m very much appreciative of that question and that perception, because there’s the classic notion that, you know, recommendation is about persuasion, and there is persuasion technology. It’s been called captology. And the folks at Stanford were pioneers in that regard. And one of the people in one of the captology classes, the persuasion technology classes, ended up being a co-founder of Instagram.

And the whole notion there is, how do we persuade? How do we design a technology, an engagement, or an interaction to create persistent engagement, persistent awareness? So at Stanford, for understandable reasons, it was rooted in technology. Psychology, absolutely. The history of choice presentation: you mentioned the tyranny, the paradox of choice, Barry Schwartz’s work at Swarthmore. How do people cognitively process choice? When does choice become overwhelming? There’s the famous article by George Miller, “The Magical Number Seven, Plus or Minus Two,” which talks about the cognitive constraints that people have when they are trying to make a decision. This is where psychology and economics began to intersect, with Herb Simon, who was one of the pioneers of artificial intelligence from Carnegie Tech and then Carnegie Mellon, and bounded rationality. So there are limits; we can’t remember everything. So what are the limits and how do those limits constrain the quantity and quality of choices that we make?

This evolved into behavioral economics. This was Daniel Kahneman and Amos Tversky, and Kahneman also won the Nobel Prize in economics. Cognitive heuristics, shortcuts. And basically, because of the Internet, you had all of these incredible software and technical tools, and the Internet became the greatest laboratory for behavioral, economic, psychological, and captology experiments in the history of the world. And the medium, the format, the template that made the most sense for doing this kind of experimentation, mashing up persuasion technology with behavioral economics, was recommender systems, recommendation engines. They really became the domain where this work was tested, explored, exploited. And in 2007, RecSys, the Recommendation Systems Conference, an academic conference, was launched internationally, and people from all over the world and, most importantly, from all of these disciplines (computer science, psychology, economics, etc.) came and began sharing stuff in this regard. So it’s a remarkable, remarkable epistemological story, not just a technological or innovation story.

Steven Cherry You point out that startups nowadays are not only born digital but born recommending. You have three great case studies in the book: Spotify; ByteDance, which is the company behind TikTok; and Stitch Fix, which is a billion-dollar startup that applies data science to fashion. I want to talk about Spotify because the density of data is just a bit mind-bending here. Two hundred million users, 30 million songs, spectrogram data of each song’s tempo, key, and loudness, activity logs and user playlists, micro-genre analysis…. You were particularly impressed by Spotify’s Discover Weekly service, which uses all of that data. Could you tell us about it?

Michael Schrage Yes. And it’s the fifth anniversary and I just got a release saying that there have been over 2.3 billion hours streamed under Discover Weekly. And the whole idea is an obvious idea, but it’s an obvious idea that’s difficult to do. The idea was, what can you listen to that you’ve never heard before? Discover. This is key. One of the key elements of effective recommender-system/recommendation-engine design is discovery and serendipity. And what they did was basically launch a program where you get a playlist, where you get to hear stuff that you’ve never heard before but that you are likely to like. And how can they be confident that you are likely to like it? Because it deals directly with everything that you mentioned in setting up the question. The micro-genres, the tempo, the cadence, the different genres, what you have on your playlist, what your friends have on their playlists, etc. And of course, as with Netflix, they track your behavior. How long do you listen to the track? Do you skip it, etc.? And it’s proven to be remarkably successful.

And it illustrates to my mind one of the most interesting ontological, epistemological, esthetic, and philosophical issues that recommendation engine design raises: What is the nature of similarity? How similar is similar? What is the more important similarity in music? The lyrics? The tempo, the mood, the spectrograph, the acoustics, the time of day? What are the dimensions of similarity that matter most? And the algorithms, either individually or ensembled, tease out and identify and rank those similarities. And based on those similarities, they proffer this list, this playlist of songs, of music you are most likely to like. It’s remarkable. It’s a prediction of your future taste based on your past behavior. But! But! Not in a way that is simply, no pun intended, an echo of what you’ve liked in the past.

But it represents a trajectory of what you are likely to like in the future. And I find that fascinating because it’s using the past to predict serendipitous, surprising, and unexpected future preferences, seemingly unexpected future preferences. I think that’s a huge deal.
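One way to picture the “dimensions of similarity” question is as a weighted blend of per-dimension similarity scores. The Python sketch below is only an illustration, not a description of how Spotify works: the dimension names, the toy feature vectors, and the weights are all assumptions, and cosine similarity is just one common choice of measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def blended_similarity(track_a, track_b, weights):
    """Weighted average of per-dimension cosine similarities.

    track_a, track_b: dicts mapping a dimension name to a feature vector.
    weights: dict mapping the same dimension names to non-negative weights
    (hypothetical values; choosing them is exactly the open question).
    """
    total = sum(weights.values())
    score = sum(w * cosine(track_a[d], track_b[d]) for d, w in weights.items())
    return score / total

# Toy tracks: identical tempo profile, unrelated mood profile (made-up numbers).
song_a = {"tempo": [1.0, 0.0], "mood": [1.0, 0.0]}
song_b = {"tempo": [1.0, 0.0], "mood": [0.0, 1.0]}

tempo_heavy = blended_similarity(song_a, song_b, {"tempo": 3.0, "mood": 1.0})
mood_heavy = blended_similarity(song_a, song_b, {"tempo": 1.0, "mood": 3.0})
print(tempo_heavy, mood_heavy)
```

The same pair of songs looks quite similar when tempo is weighted heavily and quite dissimilar when mood is, which is the point of asking which dimensions matter most.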

Steven Cherry Yeah, the music case is so interesting to me, I think, because, you know, we want to hear new things that we’re going to like, but we also want to hear the old stuff that we know that we like. It’s that mix that’s really interesting. And it seems to me that you go to a concert and the artist, without all of the machinery of a recommendation engine, is doing that, him- or herself. They’re presenting the stuff off of the new album, but they’re making sure to give you a healthy selection of the old favorites.

I’m going to make a little bit of a leap here, but something like that, I think, goes on with … you mentioned ranking, and we have this big election coming up in the U.S., and a handful of jurisdictions have moved to ranked-choice voting. In its simplest form, this is where people select not just their preferred candidate, but they rank them. And then after an initial counting, the candidate with the fewest first-choice votes is dropped from the count and their ballots get redistributed based on people’s number-two choices, and so on until there’s a clear winner.

The idea is to get to a candidate who is acceptable to the largest number of voters instead of maybe one that’s more strongly preferred by a smaller number. And here’s the similarity where I think a concert is sort of in a crude form doing what the recommendation engine does. Runoff elections do this in a much cruder way. And so my question for you is, is there some way in a manageable form for recommendation systems to help us in voting for candidates and help us get to the candidate who is most acceptable to the largest number of voters?
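For readers who want the ranked-choice counting procedure spelled out, here is a minimal Python sketch of an instant-runoff tally. It is an illustration under stated assumptions rather than any jurisdiction’s actual rules: the function name, the ballot format (a list of candidate names in preference order), and the alphabetical tie-break are all hypothetical choices.

```python
from collections import Counter

def instant_runoff(ballots):
    """Return the winner of a simple instant-runoff count, or None if no ballots.

    ballots: list of lists, each ranking candidate names from most to
    least preferred. Ties are broken alphabetically, which is an arbitrary
    assumption made only to keep the sketch deterministic.
    """
    remaining = {name for ballot in ballots for name in ballot}
    while remaining:
        # Count each ballot for its highest-ranked candidate still in the race.
        tally = Counter({name: 0 for name in remaining})
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        active = sum(tally.values())
        leader, votes = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
        # A candidate with more than half of the active ballots wins.
        if votes * 2 > active or len(remaining) == 1:
            return leader
        # Otherwise drop the candidate with the fewest votes and recount.
        loser = min(tally.items(), key=lambda kv: (kv[1], kv[0]))[0]
        remaining.remove(loser)
    return None

# Toy election: A leads on first choices, but B is more broadly acceptable.
ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)
winner = instant_runoff(ballots)
print(winner)
```

In this toy election A has the most first-choice votes but no majority; once C is dropped and those ballots move to their second choice, B ends up with five of the nine ballots.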

Michael Schrage What a fascinating question. And just to be clear, my background is in computer science and economics. I’m not a political scientist. Some of my best friends are political scientists. And let me point out that there is a very, very rich literature on this. And I would go so far as to say that people who are really interested in pursuing this question should look at public choice literature. They should go back to the French. You know, Condorcet … the French came up with a variety of voting systems in this regard. But let me tell you one of my immediate, visceral reactions to it. Are we voting for people or are we voting for policies? What would be the better option or opportunity for people: to vote for a referendum on immigration, public health, or for the people to enact a variety of policies? Whereas, you know, we do not have direct democracy, there are certain areas, certain states, where you can, of course, vote directly on a particular question. The way I would repurpose your question would be, do we want recommendation engines that help us vote for people? Help us vote for policies? Or help us vote for some hybrid of the two?

Why am I complicating your seemingly simple question? Precisely because it is the kind of question that forces us to ask, “Well, what is the real purpose of the recommendation?” To get us to make a better vote for a person or to get a better outcome from the political system and the governance system?

Let me give you a real-world example that I’ve worked with companies on. We can come up with a recommendation system, a recommendation engine, that is optimized around the transaction, getting you to buy now. Now! We’re optimizing recommendations so that you will buy now! Or we say hmm, that customer, we can have a relationship with. Maybe what we should do is have recommendations that optimize customer lifetime value. And this is one of the most important questions going on for every single Internet platform company that I know. Google had exactly this issue with YouTube; it still has this issue with its advertising partners and platform. This is the exact issue that Amazon has. Clearly, it regards its Prime customers from a customer lifetime value perspective. So your political question raises the single most important issue. What are the recommendations optimized for: the vote in this election or the general public welfare over the next three to four to six years?
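The difference between optimizing for the transaction and optimizing for customer lifetime value can be pictured as ranking the same candidates under two different objective functions. The short Python sketch below is purely illustrative: the item names, the probabilities, the dollar figures, and the field names are made-up assumptions, not data from the book or from any company.

```python
def recommend(candidates, objective):
    """Return candidate names ordered best-first under the given objective."""
    ranked = sorted(candidates, key=objective, reverse=True)
    return [item["name"] for item in ranked]

# Hypothetical candidates with two hypothetical model predictions each.
candidates = [
    {"name": "flash_sale_gadget", "p_buy_now": 0.30, "expected_lifetime_value": 20.0},
    {"name": "starter_subscription", "p_buy_now": 0.10, "expected_lifetime_value": 180.0},
    {"name": "accessory_bundle", "p_buy_now": 0.20, "expected_lifetime_value": 45.0},
]

# Objective 1: maximize the chance of an immediate purchase.
by_transaction = recommend(candidates, lambda item: item["p_buy_now"])

# Objective 2: maximize predicted long-term value of the relationship.
by_lifetime_value = recommend(candidates, lambda item: item["expected_lifetime_value"])

print(by_transaction)
print(by_lifetime_value)
```

Nothing about the candidates changes between the two calls; only the objective does, and with it the item that gets recommended first.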

Steven Cherry That turned out to be more interesting than I thought it was going to be. I’d be remiss if I didn’t take a minute to ask you about your previous book, The Innovator’s Hypothesis. Its core idea is that cheap experiments are a better path to innovation than brainstorming sessions, innovation vacations, and a bunch of things that companies are often advised to do to promote innovation and creativity. Maybe you could just take a minute and tell us about the 5×5 framework.

Michael Schrage Oh, my goodness. So I’d be delighted. And just to be clear, the book on recommendation engines could not and would not have been written without the work that I did on The Innovator’s Hypothesis.

You have five people from different parts of the organization come up with a portfolio of five different experiments based on well-framed, rigorous, testable hypotheses. Imagine doing this with 15 or 20 teams throughout an organization. What you’re creating is an innovation pipeline, an experimentation pipeline. You see what percent and proportion of your hypotheses address the need of users, customers, partners, suppliers. Which ones are efficiency-oriented? Which ones are new value-oriented? Wow! What a fascinating way to gain insight into the creative and innovative and indeed the business culture of the enterprise. So I wanted to move the Schwerpunkt, the center of gravity, away from, “Where are good ideas coming from?” to, “Can we set up an infrastructure to frame, test, experiment, and scale business hypotheses that matter?” What do you think a recommendation engine is? A recommendation engine is a way to test hypotheses about what people want to watch, what they want to listen to, who they want to reach out to, who they want to share with. Recommendation engines are experimentation engines.

Steven Cherry I’m reminded of IDEO, the design company. It has a sort of motto, no meeting without a prototype.

Michael Schrage Yes. A prototype is an experiment. A prototype shouldn’t be…. Here’s where you know people are cheating: when they say “proof of concept.” Screw the proof of concept. You want skepticism when you want to validate a hypothesis. Screw validation. What do you want to learn? What do you want to learn? Prototypes are about learning. Experimentation is about learning. Recommendation engines are about learning—learning people’s preferences, what they’re willing to explore, what they’re not willing to explore. Learning is the key. Learning is the central organizing principle for why these technologies and these opportunities are so bloody exciting.

Steven Cherry You mentioned Stanford earlier and it seems there’s a direct connection between the two books here, and that is the famous 2007 class at Stanford where Facebook’s future programmers were taught to “move fast and break things.” There was a key class taught by the experimental psychologist B.J. Fogg.

Michael Schrage Right. B.J. is a remarkable guy. He is one of the pioneers of captology and persuasion technology. And one of the really impressive things about B.J. is he really took a prototyping/experimentation sensibility to all of this.

It used to be that if you took a class on entrepreneurship at Stanford or M.I.T. (and this is as recent as a decade ago), what did you have to come up with? What was your deliverable? A business plan. Screw the business plan! With things like GitHub and open-source software, you now have to come up with a prototype. What’s the cliché phrase? The MVP: the minimum viable prototype. Okay? But that’s really the key point here. We’re trying to turn prototypes into platforms for learning what matters most, learning about what matters most in terms of our customer or user orientation and value, and learning what matters most about what we need to build and how it needs to be built. How modular should it be? How scalable should it be? How bespoke should it be?

What’s the big difference between this in 2020 and 2010? What we’re building now will have the capability to learn: machine learning capabilities. One of the bad things that happened to me as I wrote my book was that, far faster than I expected, machine learning algorithms colonized the recommendation engine/recommender system world. And so I had to get up to speed, and get up to speed fast, on machine learning and machine learning platforms, because recommendation engines now are the paragon and paradigm of machine learning worldwide.

Steven Cherry Well, it seems that once our basic needs are satisfied, the most precious commodities we have are our time and attention. One of the central dilemmas of our age is that we may be giving over too much of our everyday life to recommendation engines, but we certainly can’t live our overly complex everyday lives without them. Michael, thank you for studying them, for writing this book about them, and for joining us today.

Michael Schrage My thanks for the opportunity. It was a pleasure.

Steven Cherry We’ve been speaking with Michael Schrage of the MIT Sloan School and author of the new book, Recommendation Engines, about how they are influencing more and more of our experiences.

This interview was recorded 26 August 2020. Our audio engineering was by Gotham Podcast Studio in New York. Our music is by Chad Crouch.

Radio Spectrum is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers.

For Radio Spectrum, I’m Steven Cherry.

Note: Transcripts are created for the convenience of our readers and listeners. The authoritative record of IEEE Spectrum’s audio programming is the audio version.

We welcome your comments on Twitter (@RadioSpectrum1 and @IEEESpectrum) and Facebook.