How a University of Nebraska supercomputer figured out what we want to read: BTN LiveBIG
What separates a bestselling novel from a non-bestselling novel? That’s the code University of Nebraska-Lincoln English professor and associate dean Matthew Jockers has cracked, with a little help.
Jockers, along with his co-author Jodie Archer, PhD, a former acquisitions editor for Penguin Books UK, enlisted the help of Tusker, a room-sized cluster computer that read 4,500 novels, quantifying the elements of each book to show what separates the successful ones from the not-so-successful ones.
Jockers and Archer detail their research in their new book, “The Bestseller Code”, available now. BTN LiveBIG spoke with Jockers to learn more about Tusker and cracking the bestseller code.
BTN LiveBIG: Explain to us how the Tusker algorithm works, how it determines what is a bestseller and what isn’t.
Matthew Jockers: The first thing that happens is there’s a bit of code that splits the book into every single sentence. It isolates each sentence and then it identifies the grammatical structure of each one of those sentences. Then it goes through a series of steps, which includes everything from part-of-speech tagging to extracting grammatical information.
We also have a part of the process that identifies the primary or major themes that are in each book. All this information gets extracted and saved so each book becomes a row in a spreadsheet.
In the next phase, you feed that massive spreadsheet to a different type of algorithm; you tell it which of the books are bestsellers and which are not. That machine looks at that huge string of data and tries to identify the common patterns in bestsellers and the common patterns in non-bestsellers.
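For readers who want to see the shape of that two-phase process, here is a minimal sketch in Python. It is not the code Jockers and Archer ran on Tusker. The two features (average sentence length and contraction rate) are hypothetical stand-ins for the part-of-speech and theme measurements described above, the nearest-centroid classifier is a deliberately simple substitute for whatever learning algorithm the authors used, and the sample texts are invented for illustration.

```python
import re
from statistics import mean


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def book_to_row(text):
    """Phase 1: turn one book into one row of numbers (assumes non-empty text)."""
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [
        len(words) / len(sentences),                # average sentence length
        sum("'" in w for w in words) / len(words),  # contraction rate (crude colloquial proxy)
    ]


def train(rows, labels):
    """Phase 2: average the rows of each labeled group into one centroid per label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: [mean(col) for col in zip(*group)]
        for label, group in groups.items()
    }


def predict(model, row):
    """Return the label whose centroid is closest to the new row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Invented toy "books"; real input would be whole manuscripts.
books = [
    "She didn't wait. He ran! They couldn't stop.",
    "The aforementioned circumstances necessitated considerable "
    "deliberation among the assembled participants.",
]
labels = ["bestseller", "non-bestseller"]

model = train([book_to_row(b) for b in books], labels)
guess = predict(model, book_to_row("I can't go. Don't ask."))
```

With only two features and two books, the result says nothing about real novels. The point is the structure: every book becomes a row, the labeled rows train a model, and the model then scores a manuscript it has not seen.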
BTN LiveBIG: Can you apply this formula when writing a book?
MJ: No, the system is designed to look at an entire manuscript. … It’s not like we can give you a formula or a prescription for how it affects you.
BTN LiveBIG: Can Tusker measure all types of books?
MJ: It could measure anything, but that would be a bad decision. It’s designed to look at adult fiction. Applying it to non-fiction doesn’t make a lot of sense, although I think that it would work with biographies, which are very narrative in nature.
BTN LiveBIG: What makes a bestseller?
MJ: I’ll tell you what those features are, but first I’ll give you my caveat: we found about 2,800 features that the machine found useful in helping it distinguish between a bestseller and a non-bestseller. If I talk about three or four of those 2,800 features, that’s a pretty radical reduction of the material the machine is using. And the machine is actually not just looking at two or three features in isolation, it’s looking at all of these features in aggregate. So that’s my disclaimer.
What we found is that bestsellers tend to have three dominant topics. Then we found that there was one particular topic that sort of had to be present in, or it was always present in bestsellers but was not always present in non-bestsellers in the right proportion. That topic we called human closeness. It’s different from sexual closeness, which is a topic that the machine told us is not a bestselling topic. Sex did not sell, despite “Fifty Shades of Grey”, which is an outlier. But this topic of human closeness is much more about sort of connected interpersonal relationships.
When we study character … we found that bestselling characters are engaged in actions — less passive, more active.
And what we’ve discovered in the style chapter is that bestselling authors have a style that’s a bit more colloquial, a bit more in the language of a common person. It’s not slang, but it’s not formal academic highbrow prose.
BTN LiveBIG: What, according to Tusker, is the unequivocal bestseller?
MJ: We said, “all right machine, which book has the best combination of all of these things (topic, characters, style and plot).” Now that doesn’t mean what’s the book that’s the best bestseller, it just means which book has everything going for it.
And the book was called “The Circle.” It’s a novel by Dave Eggers, and the wonderful, wonderful irony is that it’s a dystopian book about what happens with technology when things get out of control. So it’s a book that’s in many respects very critical of big data, and of course the computer picked it. There was a beautiful irony in that.
BTN LiveBIG: Have you run “The Bestseller Code” through Tusker?
MJ: No [laughs], I need to do that actually. People keep asking that question, but it’s designed to analyze fiction.
But we have a few reviews of “The Bestseller Code” that have said things like, “It’s a gripping journey, the book itself is a real page turner.” And so I suppose I owe it to us to analyze it and see what happens.