Now that we have our voting history data in the form of a Pandas data frame, we can move on to the next step: quantifying relationships in Survivor.
[This is the second article in my Survivor Alliance Analysis series. The first article focuses on scraping the Survivor wiki for voting history data. The third article focuses on visualizing the network of relationships. The fourth article focuses on comparing the alliance networks of different seasons.]
Numerous studies have examined the TV show Survivor. One analyzed the portrayal of women in the show, while another looked at the psychology of jury votes. However, all of the articles I’ve read study the show from a qualitative angle. I believe there is some benefit in taking a computational perspective, and that is what I’ll be doing in this article.
Relationships are the most important part of surviving in Survivor. Relationships lead to alliances, and alliances lead to voting patterns. The question is, can we go the other way around? By analyzing voting history data, can we draw a map of castaway relationships over time?
In the last post, we scraped the Survivor wiki for voting data and constructed a data frame out of it. For example, Caramoan’s data frame looks like this:
You may notice a few oddities in the data frame, since I made three simplifications.
First. The columns in the data frame represent voting rounds, not tribal councils. We define a voting round to be an instance wherein castaways vote. Whenever a tribal council ends in a tie, a re-vote is done. This means that a single tribal council can have two or more voting rounds, depending on whether ties occur. There are also times when someone is eliminated without leaving via tribal council, for example when a castaway gets sick and has to be evacuated, or simply gives up and quits the game. These eliminations are not considered.
Second. Cancelled votes are still counted. Since cancelled votes are still indicators of the castaways’ relationships, they should still be considered in the analysis.
Third. Jury votes are not considered.
When two castaways vote to eliminate the same person, the two are usually in an alliance. To represent relationships such as these, we introduce a simple affiliation index between each pair of castaways present in a voting round. This affiliation index can be either 1 or 0. It’s 1 if the pair voted together, and 0 if the pair did not.
To fix notation, let us consider contestants i and j at voting round t. The affiliation index of the two at time t is written as
You may notice that “voted together” is a bit vague. How can we tell whether a pair of castaways voted together? The most obvious case is when both voted to eliminate the same person.
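As a minimal sketch of this simplest case (not the implementation used for this series), assume one voting round is stored as a dictionary mapping each voter to the person they voted for. The function name `simple_affiliation` and the castaway Ana are hypothetical; Linda and Ty are the made-up castaways used later in this post.

```python
from itertools import combinations

def simple_affiliation(votes):
    """Return {(i, j): 0 or 1} for every pair of voters in one voting round.

    `votes` maps each voter to the castaway they voted against (assumed layout).
    """
    return {
        (i, j): int(votes[i] == votes[j])
        for i, j in combinations(sorted(votes), 2)
    }

# Hypothetical round: Linda and Ty vote together, Ana does not.
votes = {"Linda": "Ana", "Ty": "Ana", "Ana": "Ty"}
index = simple_affiliation(votes)
print(index[("Linda", "Ty")])   # 1
print(index[("Ana", "Linda")])  # 0
```

Pairs are keyed in alphabetical order, so each pair appears exactly once.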
However, this does not take into account a common strategy employed by a majority alliance to pick off minority members: vote splitting. Sometimes, the majority alliance is in a position comfortable enough that they are able to split the votes between two minority members. Why would they do that? To nullify the hidden immunity idol of the minority member, if he decides to play one.
To recognize a split vote, we have to look at everyone’s vote, including the targets’. Usually, more than two names appear on the ballots during a split vote: the two targets of the majority alliance and the one or more targets of the minority. My algorithm scans through the targets and checks whether two of them voted the same way (it is actually more general, as I’ll explain later). If two targets cast the same vote, then the castaways who voted for those two targets are considered to be in an alliance.
… That still sounds really confusing, so I’ll illustrate with an example.
Think back on the Andrea boot in Survivor Caramoan. Here, Cochran, Dawn, Sheri, Erik, and Brenda split their votes between Eddie and Andrea. On the other hand, Eddie and Andrea voted for Brenda.
Now, since Eddie and Andrea voted for the same person, we can consider the pair to be in an alliance, although a minority one. Not only that, we can conclude further that Cochran, Dawn, Sheri, Erik and Brenda are in an alliance as well even though some voted for Eddie and some voted for Andrea, since Eddie and Andrea targeted the same person within their group.
My algorithm recognizes this type of scenario. In fact, it’s more general – Andrea and Eddie don’t have to vote for the same person, they can vote for two people as long as these two people vote for either Andrea or Eddie alone. My algorithm doesn’t cover all cases since voting patterns can be quite complex. But as a first approximation, it works quite well.
Hence, each pair within the group of Cochran, Dawn, Sheri, Erik, and Brenda will have an affiliation index equal to 1 in the Andrea-boot voting round, and the pair Eddie-Andrea will have value 1 as well. All other pairs will have 0.
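Here is a simplified sketch of the basic split-vote rule, applied to the Andrea boot. It is not my full implementation: it only handles the case where two targets cast the same vote, and the function name `split_vote_affiliation` is made up for this example. The exact split of who voted for Eddie versus Andrea is assumed for illustration.

```python
from itertools import combinations

def split_vote_affiliation(votes):
    """Affiliation index per pair, treating a detected split vote as one bloc.

    `votes` maps voter -> target. Two targets who cast the same vote are
    taken to be the two halves of a split vote, so everyone who voted for
    either of them is placed in the same bloc.
    """
    bloc = {target: target for target in votes.values()}  # target -> bloc label
    for x, y in combinations(sorted(bloc), 2):
        # Only targets who also voted in this round can be compared.
        if x in votes and y in votes and votes[x] == votes[y]:
            old, new = bloc[y], bloc[x]
            for target in bloc:
                if bloc[target] == old:
                    bloc[target] = new
    return {
        (i, j): int(bloc[votes[i]] == bloc[votes[j]])
        for i, j in combinations(sorted(votes), 2)
    }

# Andrea boot; who voted for Eddie versus Andrea is assumed for illustration.
votes = {
    "Cochran": "Andrea", "Dawn": "Andrea", "Sheri": "Andrea",
    "Erik": "Eddie", "Brenda": "Eddie",
    "Eddie": "Brenda", "Andrea": "Brenda",
}
index = split_vote_affiliation(votes)
print(index[("Cochran", "Erik")])   # 1: different names, same split-vote bloc
print(index[("Andrea", "Eddie")])   # 1: both voted for Brenda
print(index[("Cochran", "Eddie")])  # 0
```

With these votes, 11 of the 21 pairs get a 1: the 10 pairs inside the majority group plus Eddie-Andrea.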
If we collect the affiliation indices for each pair of castaways for each voting round, we will be able to construct an affiliation data frame where each row is a castaway pair and each column is a voting round. Each entry in the data frame will be 1, 0, or NaN. A NaN appears when the castaway pair did not attend the voting round together.
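To make the layout concrete, here is a small sketch that builds such a data frame from a list of voting rounds. It uses only the simple same-target rule, and the `"Name-Name"` row labels, the function name `affiliation_frame`, and the castaways are all assumptions made for this example.

```python
from itertools import combinations
import pandas as pd

def affiliation_frame(rounds):
    """Build a pair-by-round table from a list of {voter: target} dicts."""
    columns = {}
    for t, votes in enumerate(rounds, start=1):
        columns[t] = pd.Series(
            {f"{i}-{j}": int(votes[i] == votes[j])
             for i, j in combinations(sorted(votes), 2)},
            dtype="float64",
        )
    # Aligning the columns leaves NaN where a pair did not attend a round together.
    return pd.DataFrame(columns)

# Hypothetical season: Ana is voted out in round 1, Ben in round 2.
rounds = [
    {"Linda": "Ana", "Ty": "Ana", "Ben": "Ana", "Ana": "Ty"},
    {"Linda": "Ben", "Ty": "Ben", "Ben": "Ty"},
]
frame = affiliation_frame(rounds)
print(frame)
```

Every pair involving Ana has NaN in the second column, since she was no longer around to vote.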
The implementation of the algorithm can be found here. It’s quite long, so I didn’t include it in the post.
We now have a rough idea of who voted together during each voting round. However, what we actually want to analyze is the dynamics of the castaways’ relationships over the course of the season, not just individual voting rounds.
We can use the affiliation index to define a quantity, the alliance index, to see how relationships change over time.
Suppose that we have two contestants i and j at voting round t. The alliance index is defined as
where vot(i,j,t) is the number of voting rounds that i and j attended together up to voting round t. Basically, the alliance index is the average of the affiliation index of contestants i and j up to voting round t.
For intuition, let’s introduce two hypothetical castaways, Linda and Ty. Suppose that voting round 7 just ended and that, so far, Linda and Ty have attended 5 voting rounds together. Suppose also that their alliance index currently sits at 0.80. This means that across those 5 voting rounds, they voted together 80% of the time (4 out of 5 voting rounds).
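Written out, with $a_{ij}(s)$ and $A_{ij}(t)$ as assumed symbols for the affiliation and alliance indices, and $R_{ij}(t)$ as an assumed label for the set of voting rounds up to $t$ that both $i$ and $j$ attended, the definition reads:

```latex
A_{ij}(t) = \frac{1}{\mathrm{vot}(i,j,t)} \sum_{s \in R_{ij}(t)} a_{ij}(s),
\qquad \mathrm{vot}(i,j,t) = \lvert R_{ij}(t) \rvert
```

Rounds that the pair did not attend together contribute neither to the sum nor to the count, so the index simply carries over through them.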
We can calculate the alliance index of a castaway pair using the following script. Here, series refers to a row in the affiliation data frame.
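A minimal sketch of such a calculation (not necessarily identical to the script in my repository), assuming `series` holds 1, 0, or NaN for each voting round:

```python
import numpy as np
import pandas as pd

def alliance_index(series):
    """Running average of a pair's affiliation index, skipping NaN rounds."""
    together = series.notna().cumsum()      # vot(i, j, t)
    voted_same = series.fillna(0).cumsum()  # rounds in which they voted together
    # Before the first shared round the count is 0, so mask it to get NaN.
    return voted_same / together.where(together > 0)

# Linda and Ty: 7 rounds so far, 5 attended together, 4 of those voting together.
series = pd.Series([1, 1, np.nan, 0, 1, 1, np.nan], index=range(1, 8))
result = alliance_index(series)
print(result.round(2).tolist())  # [1.0, 1.0, 1.0, 0.67, 0.75, 0.8, 0.8]
```

The index stays flat in rounds 3 and 7, where the pair did not attend together, and ends at 0.80.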
The more voting rounds that go into the alliance index, the more information it contains. If we want to summarize a castaway pair’s relationship over the whole season, we can calculate their alliance index up to the last voting round. If we do this for every pair of castaways, we can rank the resulting alliance indices and identify the top m relationships of each Survivor castaway.
However, there is one caveat. The more voting rounds that a pair of castaways attended together, the more confident we are that the alliance index is significant. Say, if Linda and Ty only experienced one voting round together, and if they actually voted together, then their alliance index would be 100%. If they experienced 10 voting rounds together, and voted together in all of these, then their alliance index would also be 100%. It’s apparent, however, that the alliance index in the second case is much more significant than the first, since their relationship has been tested a lot more.
For example, in the case of Millennials vs. Gen X, Adam-Mari would register an alliance index of 1.0, since they both voted for Figgy in Mari’s one and only tribal council. However, we would expect Adam-Hannah to rank better than Adam-Mari, even though Adam and Hannah voted differently in the first tribal, since we are considering the season as a whole.
We can get around this issue by using the number of voting rounds together (let’s call this k) as a filter. Consider again our hypothetical castaway Linda. Let’s fix k=5. What we mean by that is we scan all other castaways and consider only those who have attended 5 or more voting rounds with Linda. From those that remain, we select the m that have the highest final alliance index with Linda. In our Millennials vs. Gen X example, the Adam-Mari pair won’t be considered since Adam and Mari only attended one voting round together.
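The filter-then-rank step can be sketched as follows. This is an illustration under assumptions, not my exact code: rows are assumed to be labeled `"Name-Name"` (so names must not contain hyphens), the function name `top_relationships` is invented here, and the numbers are made up.

```python
import numpy as np
import pandas as pd

def top_relationships(affiliation, castaway, m, k):
    """Top-m partners of `castaway` among pairs with at least k shared rounds.

    `affiliation` is assumed to have "Name-Name" row labels and one column
    per voting round, holding 1, 0, or NaN.
    """
    is_pair = [castaway in label.split("-") for label in affiliation.index]
    rows = affiliation.loc[is_pair]
    rounds_together = rows.notna().sum(axis=1)
    final_index = rows.mean(axis=1)  # mean skips NaN: the final alliance index
    eligible = final_index[rounds_together >= k]
    return eligible.sort_values(ascending=False).head(m)

# Hypothetical affiliation data frame with six voting rounds.
affiliation = pd.DataFrame(
    [[1, 1, 1, 0, 1, 1],
     [1, np.nan, np.nan, np.nan, np.nan, np.nan],
     [0, 1, 1, 1, 1, np.nan],
     [1, 1, 0, 0, np.nan, np.nan]],
    index=["Linda-Ty", "Ana-Linda", "Ben-Linda", "Ben-Ty"],
    columns=range(1, 7),
)
top = top_relationships(affiliation, "Linda", m=2, k=5)
print(top)
```

Ana-Linda has a perfect index of 1.0 but only one shared round, so with k=5 it is dropped, just like the Adam-Mari pair.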
The best way to see the dynamics of the alliance index is to fix a contestant and plot his alliance index with other castaways as a time series over voting rounds. The following code plots the top m relationships (with a filter k) for each castaway in a specified season.
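A stripped-down sketch of such a plot, assuming the rows for one castaway’s top relationships have already been selected from the affiliation data frame. The function name `plot_relationships` and the data are hypothetical, and the styling is not that of the figures in this post.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_relationships(affiliation_rows, title):
    """Plot the running alliance index of each row (pair) over voting rounds."""
    fig, ax = plt.subplots()
    for label, series in affiliation_rows.iterrows():
        together = series.notna().cumsum()
        running = series.fillna(0).cumsum() / together.where(together > 0)
        ax.plot(running.index, running.values, marker="o", label=label)
    ax.set_xlabel("Voting round")
    ax.set_ylabel("Alliance index")
    ax.set_ylim(-0.05, 1.05)
    ax.set_title(title)
    ax.legend()
    return ax

# Hypothetical rows for Linda's two strongest relationships.
rows = pd.DataFrame(
    [[1, 1, 1, 0, 1, 1], [0, 1, 1, 1, 1, np.nan]],
    index=["Linda-Ty", "Ben-Linda"], columns=range(1, 7),
)
ax = plot_relationships(rows, "Linda's top relationships (hypothetical data)")
```

Calling `plt.show()` afterwards displays the figure when running interactively.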
The alliance index of a castaway pair can increase, stay constant, or decrease in each succeeding voting round. When the alliance index increases, we can think of the castaway pair’s relationship as strengthening (since they voted together). If it stays constant, then either they didn’t attend the voting round together, or one of them had already been voted out. If it decreases, then the castaway pair didn’t vote together in the voting round. In this case, we can think of their relationship as weakening.
To demonstrate, let’s take a look at the top 5 relationships of each castaway in Survivor Millennials vs. Gen X’s Final 10. To get significant alliance indices, let’s set our minimum number of ‘voting rounds together’ filter to 5.
Four of Jessica’s top relationships are from the Gen X tribe, which is expected since she is a Gen X-er as well. Note that Sunday and Bret are unexpected here. She didn’t have a strong bond with the two of them, which is apparent from the plot trends. Initially, her alliance index with Sunday is high, since they voted together at the first council, but then it slowly decreases. Her alliance index with Bret, on the other hand, starts out low, increases a bit, and ends quite low.
We can also see here how Jessica’s relationship with David goes up, probably due to David using his hidden immunity idol to save Jessica. Her alliance index with Adam starts at voting round 5 when the tribe expansion took place. Then it decreases during the Taylor vote (which shouldn’t have been the case since they split the vote – my algorithm didn’t detect this since Jay and Taylor didn’t have the same target), and increases thereafter.
Adam and David tie as Zeke’s top relationship during the whole season (their plots overlap starting at voting round 4). They voted together in all the rounds until the Hannah-Zeke tie. Following the tie, Zeke’s relationship with the two went down. On the other hand, his relationship with Hannah, Jay, and Wil increased over the season.
I forgot what happened with Wil. I think he was initially aligned with Hannah, Jay and the others in the Mari vote, explaining the high index. Eventually, he went with the Gen X-ers (Bret and Sunday) + Jay. But he flipped quite a number of times. I think the plot above tells an interesting story.
I think the algorithm detected Sunday’s top alliance members well. Bret, Jay, and Chris tie for the top spot by the end of her Survivor run.
Sunday and Wil are at the top of Jay’s relationships. A close third is Bret. Jay’s relationship with Wil started high, but it went down since Wil flipped temporarily to Adam, then went back to voting with Jay.
Unsurprisingly, Ken appears at the top of David’s ranking. Adam and Hannah follow. Starting at round 15, Adam and David’s index started going down; this is the point at which Adam started targeting David. Also, Jessica is in David’s 5th spot, even though David holds the 3rd spot in Jessica’s ranking. (Personal opinion: I wanted David to win.)
David would’ve been Ken’s top relationship if Ken hadn’t voted for David at the last tribal council. Adam and Hannah are Ken’s top 2, which is expected since they have been in an alliance for a long time. We can see that over the course of the season, Ken’s relationship with Jessica developed.
Ken, David, and Adam are at the top of Hannah’s ranking, as expected. Wil and Bret also appear here, surprisingly. Starting at voting round 8, Ken, David, and Adam’s indices with Hannah show an upward trajectory until the last vote, when the group voted out David.
Adam’s plot is nice to look at because we can see in which voting rounds he turned on his alliance. Starting at voting round 10 (Hannah-Zeke vote), his index with Zeke started going down. Starting at round 15 (David-Bret vote), his index with David started going down. We also see here that, initially, Hannah and Adam didn’t have a good relationship, but it grew significantly over the season.
The algorithm for detecting castaways who are voting together isn’t perfect. There are a lot of complex scenarios that it doesn’t recognize. Further tweaking is necessary to obtain more accurate results.
The analysis done above is more descriptive than predictive. You can’t really look at the plots and say that in the next voting round, two castaways would turn on each other. Really, then, what’s the point of all of this?
One other way we can use the alliance index is to summarize the relationships in a season. We can try to summarize the relationships of the castaways in a single, neat network. The alliance index lends itself perfectly to constructing a Survivor Alliance Network. From this network, we will be able to see the interconnections of the castaways.
To prepare for the next posts, you can check out my primer on network metrics. This is a very quick introduction to networks and some metrics used to measure centrality and clustering.
As always, the full code is available on my Github. Like the FB page or Subscribe via e-mail (see sidebar) to keep updated when the next post goes live. ‘Til next time!