User:Lucasvb/Majority and consensus under ordinal and cardinal perspectives

From electowiki

The notion of a majority is often claimed to be a defining trait of democracy and voting, if not the main trait. However, when the typical notion of a majority is put under scrutiny, particularly as it is applied to voting, many important issues appear which are usually taken for granted.

Deconstructing the majority

Majority rule is usually presented as a statement similar to "the largest side should win", or the 50% threshold is presented as the magical number after which everything falls into place.

While it feels intuitive that the "largest side should win", one question is rarely addressed: what is a "side"? Additionally, siding with what? How many "sides" are there? Does the 50% threshold make sense if there are more than two?

It should be clear that a majority, if it is to be sought and considered legitimate at all, should ideally be an inherent property of the voters, based on some underlying degree of cohesion. After all, if it is not an inherent property of the voters (to a sufficient degree), then in what sense is majority rule representative of the will of the voters?

This vague notion of "cohesion" raises the important question of whether such a thing even exists under all circumstances, which is what would warrant seeking it as a general goal.

It is reasonable to expect that, in a sufficiently fractured society in which many factions compete for mutually conflicting interests, no such degree of cohesion exists at all. What are we supposed to do, then? What ought to be the most democratic decision?

This exact scenario is the origin of plurality voting: let each faction be a "side", and let voters show their full allegiance to exactly one of these sides. The largest faction wins.

But in what sense was this a property of the voters? Such factions may not be completely distinct. In reality, people often agree on a few issues while still disagreeing on others. Factions overlap.

So how is forcing voters to pick one side representative of their true beliefs and allegiances?

Consensus and polarization

People can agree or disagree with each other to various degrees. Similarly, groups of people may share agreements and disagreements in various ways. As said earlier, factions overlap.

Considering the reality that there are hundreds if not thousands of potential relevant topics in which agreement and disagreement may occur, and which may play important roles in an election, it would be helpful to use precise terminology to describe the various scenarios.

The most straightforward way to deal with this is to not hastily assign people to factions, and instead look directly at the pertinent issues one by one, and the existing opinions.

Let's imagine we pick one of these issues and run an opinion poll on it with all the voters, where each voter rates their opinion on a scale from [completely disagree], through [neutral], to [completely agree]. In principle, this could be done for any issue we care about.

If it turns out the distribution of answers is concentrated around one of these opinions (no matter where), we can say there is a consensus, that is, the population is roughly in agreement on this issue. People who go against the consensus are holding fringe opinions.

Note that this is distinct from the notion of a "center" or "moderate", which are perceived as relative.

If the issue we picked was something like "dogs should be outlawed as pets", we would find that the consensus is around [completely disagree]. This is an extremist and resolute position for most people, not a centrist or moderate opinion, and it is the overwhelming consensus in our society. In fact, most of the consensuses that exist are extreme: murder should be illegal, taxes should be used for the public good, all children should have access to education, etc.

The notions of a "center" or a "moderate" are rarely useful when one considers the simultaneous existence of other issues, especially polarizing ones.

Polarization is when the distribution of opinions is concentrated in two separated groups. This could be between the [neutral] opinion and one of the extremes, or it could be between opposite extremes. Between polarization and consensus there is a continuum of possible distributions.

The "center" or "moderate" position, under strong polarization, is by definition an ideological desert: very few people hold that belief, which makes the "center"/"moderate" the fringe under polarization. There is no consensus.
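The contrast between a consensus and a polarized issue can be made concrete with a small simulation. This is only a sketch under stated assumptions: opinions are placed on a scale from -1 ([completely disagree]) to +1 ([completely agree]), the Gaussian clusters, their widths, and the helper name `share_near` are all hypothetical choices made for illustration.

```python
import random

def share_near(opinions, point, radius):
    """Fraction of voters whose opinion lies within `radius` of `point`."""
    return sum(abs(x - point) <= radius for x in opinions) / len(opinions)

def clip(x):
    """Keep an opinion inside the [-1, +1] scale."""
    return max(-1.0, min(1.0, x))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed consensus: everyone clusters near "completely disagree" (-0.8).
consensus = [clip(rng.gauss(-0.8, 0.15)) for _ in range(10_000)]

# Assumed polarization: each voter belongs to one of two opposite clusters.
polarized = [clip(rng.gauss(rng.choice([-0.8, 0.8]), 0.15)) for _ in range(10_000)]

# Under consensus, most voters sit close to the (extreme) consensus point.
consensus_share = share_near(consensus, -0.8, 0.25)

# Under polarization, the "center" (neutral, 0.0) is nearly empty.
center_share = share_near(polarized, 0.0, 0.25)

print(f"near the consensus: {consensus_share:.1%}")
print(f"near the center under polarization: {center_share:.1%}")
```

In this toy setup the consensus sits at an extreme of the scale, while the neutral point under polarization is held by almost nobody, which is the sense in which the "center" becomes the fringe.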

In reality, there are many, many issues at play in any given moment. Some will be polarized, some will be a consensus, others will be somewhere in between. How does this fit with our goal of thinking about a "majority" or "sides"? For that, we need to understand the dynamics of voting and how these multiple issues play a role.

Preference vs. evaluation

Under voting, options are presented, and voters are supposed to express their opinions through a ballot. Opinions can be complex due to the large number of issues, but the ballot is inherently confining. Voters then have to weigh their priorities across issues to maximize their expression on the ballot, given the circumstances.

One form of expressing such opinions is through ranked ballots, which express preference. Ranked preferences are inherently comparative, so a ranking is always between two options, such as "I prefer A to B" (or simply A>B). If more options exist, the ranking has to, somehow, work for all of them simultaneously. But fundamentally, a preference refers to a choice between exactly two alternatives.

The alternative is rated ballots, which express evaluations. The context of voting inherently makes the evaluations comparative, so rated ballots may still carry the same preference information as ranked ballots. But rated ballots do something more: they evaluate all options under the same comparative scale. The information is not strictly pairwise, but instead considers the relative assessment with respect to all other options, simultaneously. Crucially, under such a framework all information is taken into account together, in aggregate. This aggregate information, it turns out, carries even more information than its parts, as we will see later.
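The relationship between the two ballot types can be illustrated with a short sketch. It assumes a rated ballot is a mapping from candidate name to score (higher is better); the function name `pairwise_preferences` and the example scores are hypothetical.

```python
from itertools import combinations

def pairwise_preferences(scores):
    """Derive every pairwise preference implied by one rated ballot.

    Returns a dict mapping each unordered pair (a, b) to '>', '<' or '='.
    """
    prefs = {}
    for a, b in combinations(sorted(scores), 2):
        if scores[a] > scores[b]:
            prefs[(a, b)] = ">"
        elif scores[a] < scores[b]:
            prefs[(a, b)] = "<"
        else:
            prefs[(a, b)] = "="
    return prefs

# Two different evaluations of the same three candidates (0-5 scale assumed).
ballot_1 = {"A": 5, "B": 4, "C": 0}  # B is almost as good as A
ballot_2 = {"A": 5, "B": 1, "C": 0}  # B is almost as bad as C

# Both ballots collapse to the same ranking, A > B > C.
same_ranking = pairwise_preferences(ballot_1) == pairwise_preferences(ballot_2)
print(same_ranking)
```

The rated ballots contain the full set of pairwise preferences, but the reverse does not hold: the ranking alone cannot tell these two voters apart.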

These are two competing traditions in dealing with democracy: ordinalism (with rankings) and cardinalism (with evaluations).

Now, how does the notion of "majority" and "sides" arise in both?

Ordinalism and the "majority of preference"

Since rankings are inherently pairwise, and must be either decisive (A>B or B>A) or totally indifferent (A=B), they force voters to strictly "take sides" between A and B.

This leads to the typical 50% threshold of majority rule: between two sides, we pick the one with the most people. But as asked in the beginning, what is the legitimacy of this, and how inherent to the voters is this majority of preference?

To illustrate, let us imagine two issues and the distribution of voters regarding those issues. We will first consider two consensus issues, so the entire population is in strong agreement here.

In an election, there would be many candidates, and voters would cast ranked ballots giving preference information between any two of them. We will look at two such candidates out of many (the others will be hidden, as preference is strictly pairwise information). Keep in mind that this is not an election with only two candidates.

We will consider the two candidates each forming a "faction", defined by the preferences people express. How does the ordinal/ranked framework react to this situation?

In this 2D opinion space, in which there is a true consensus over both issues, taking into account only ranked preferences between two candidates (darker moving circles) leads to distortions, due to the creation of "artificial factions" between the two. (If there were no candidates, there would be only one big consensus faction in which everyone belongs.)

Note that while there is plenty of consensus, the ranked preferences "slice" the population in various ways (here, we assume a voter sides with whichever candidate is ideologically closer). Thus, rankings are inherently factionalist, and any "majority" created (shaded background) represents a distorted and artificial picture of the true opinions of the population under such a scenario.

Additionally, each of the two artificial factions will perceive its own "factional consensus" (moving crosses), which will be far away from the other. This happens even though both groups actually have a greater underlying consensus, which remains unchanged (black static cross, center). You can think of these crosses as "the ideological picture" the ranked preferences are painting to us. As you can see, it is a very distorted picture.
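The slicing effect described above can be reproduced numerically. The sketch below is only an illustration under stated assumptions: voters are drawn from a single 2D Gaussian (one consensus, no real factions), each voter sides with the nearer candidate, and the candidate positions, sample size and function names are all hypothetical.

```python
import math
import random

def centroid(points):
    """Mean position of a set of 2D points (the group's 'consensus')."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def split_by_preference(voters, cand_a, cand_b):
    """Assign each voter to the side of the ideologically closer candidate."""
    side_a = [v for v in voters if math.dist(v, cand_a) <= math.dist(v, cand_b)]
    side_b = [v for v in voters if math.dist(v, cand_a) > math.dist(v, cand_b)]
    return side_a, side_b

rng = random.Random(1)  # fixed seed so the run is repeatable
voters = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]

cand_a = (0.2, 0.0)  # assumed candidate near the consensus
cand_b = (2.0, 0.0)  # assumed fringe candidate

side_a, side_b = split_by_preference(voters, cand_a, cand_b)

overall = centroid(voters)     # the true consensus, near (0, 0)
faction_a = centroid(side_a)   # "factional consensus" of A's side
faction_b = centroid(side_b)   # "factional consensus" of B's side

print(len(side_a), len(side_b))
print(overall, faction_a, faction_b)
```

Although every voter comes from the same distribution, the fringe candidate's side appears to have its own consensus far from the overall one, purely because of where the candidates happen to draw the dividing line.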

Thus, under ordinalism or ranked preferences, "majority" is a property of the candidates more than that of the voters, as it is the candidates who are "drawing the line", not the voters. The voters are being forced to take sides which they do not create naturally.

Notice how fringe candidates (when the dots move towards the edge) can easily radicalize their minority faction, creating a highly distorted faction consensus near the fringe. In real life, complete allegiance to a faction, and support for political candidates, usually creates an echo chamber effect. These people will be more likely to side and engage with other "like-minded people", according to the faction that was established. But as we can see from the above diagram, even if the population as a whole shares a lot of consensus and agreement, a fringe candidate can generate the illusion of a faction having its own fringe consensus. Furthermore, if this occurs, it will be worse when a consensus candidate actually exists, as that pushes the dividing line further towards the fringe.

What if we had a mixture of polarization and consensus?

As you can see from the animation, under polarization the behavior is virtually identical. Rankings, by their own nature, cannot distinguish between true polarization and an artificial one.

However, despite these conceptual shortcomings, the "majority of preference" still serves its intended purpose, as the candidate closest to a consensus will very likely be on the majority's side, due to the very nature of consensus. This explains why majority rule has performed well enough in voting applications: it's a very good rule of thumb, but it is just that, a rule of thumb. In particular, it is a good rule of thumb for picking the candidate closest to the consensus, but as we can see from these animations, it is not adequate for inferring where the ideological consensus lies, as it cannot distinguish consensus from polarization.

Condorcet voting methods take this weakness into account, and attempt to test every possible pairwise faction split between multiple factions. If one faction always dominates the other, then it is much more likely to be a genuine consensus.
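The idea of testing every pairwise split can be sketched as a Condorcet winner check. This is a minimal illustration, not a full Condorcet method: it assumes every ballot is a complete ranking with no ties, it returns `None` when there is a cycle instead of resolving it, and the function name and example ballots are hypothetical.

```python
from itertools import combinations

def condorcet_winner(ballots):
    """Return the candidate who wins every pairwise majority contest, or None.

    Each ballot is a list of candidates, most preferred first.
    Assumes complete rankings over the same set of candidates.
    """
    candidates = sorted(set(ballots[0]))
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in ballots)
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            wins[a] += 1
        elif b_over_a > a_over_b:
            wins[b] += 1
    for c in candidates:
        if wins[c] == len(candidates) - 1:
            return c
    return None

# A has the most first preferences (4 of 9), but B beats both A and C pairwise.
ballots = 4 * [["A", "B", "C"]] + 3 * [["C", "B", "A"]] + 2 * [["B", "C", "A"]]
print(condorcet_winner(ballots))

# A cyclic profile has no candidate that dominates every split.
cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(condorcet_winner(cycle))
```

In the first profile no single split decides the outcome; B is selected only because it comes out ahead in every pairwise comparison, which is the property the text associates with a more genuine consensus.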

Ordinal voting methods all appeal to "majority rule" in one form or another. But given the fundamental limitations of the ordinal "majority", which is dominated by the candidates, not the voters, one should consider alternative justifications for this criterion.

In conclusion, since people have multiple attributes which can be used to classify them, in many attributes they will be a majority and in many others a minority. How are we supposed to claim that any one of these possible divisions of the population has a greater claim to power over the other? More importantly, what processes are defining this partition, and how legitimate are they?

Cardinalism and the "majority of consensus"

As we have seen, the notion of a "majority" as an inherent property of the voters is hard to establish using the ordinal formalism. The candidates have too much influence in what it actually conveys.

Under a cardinal framework, however, the concept is more subtle. Taking the consensus as the blueprint of voter cohesion, we can informally define a "majority of consensus" as the group of 50%+1 voters which lie closest to all of the existing consensuses. A more natural notion of majority can be defined in terms of the spatial model of voters.

As before, we will consider an election with many candidates. Voters would be casting cardinal ballots which inherently carry comparative information between the many candidates. We then look at what information would be available between two candidates, if we look at the scores given to both by the voters. Once more, this is not an election with two candidates, but a picture of the electorate that two candidates in an election provide to us.

We consider the smallest region around the consensus which contains a majority of voters within it. As opposed to the "majority of preference", this is a true majority, a property of the voters that is independent of candidates and whatever factions they create. In this diagram, this majority of consensus is denoted as a red circle around the consensus.
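As a rough illustration of this construction, the sketch below computes the consensus as the mean opinion and the radius of the circle as the median distance of the voters to it. The function name `majority_of_consensus` and the nine example voters are hypothetical, and the geometry is assumed to be plain Euclidean distance in opinion space.

```python
import math
import statistics

def majority_of_consensus(voters):
    """Return (consensus, radius) for a list of voter positions.

    consensus: the mean opinion, coordinate by coordinate.
    radius: the median distance from the voters to the consensus, so the
            closed ball of that radius holds at least half of the voters.
    """
    dims = len(voters[0])
    consensus = tuple(statistics.fmean(v[d] for v in voters) for d in range(dims))
    radius = statistics.median(math.dist(v, consensus) for v in voters)
    return consensus, radius

# Toy electorate: a tight core of five voters and four far-away voters.
voters = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (3, 0), (-3, 0), (0, 3), (0, -3)]

consensus, radius = majority_of_consensus(voters)
inside = sum(math.dist(v, consensus) <= radius for v in voters)

print(consensus, radius, inside)
```

Note that no candidate appears anywhere in this computation: the circle depends only on where the voters are.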

As voters are not being forced to take sides, and may support both candidates simultaneously to various degrees, there is no immediate notion of "factions" or "consensus within a faction", unless such a distinction exists in the voters themselves.

In practice, however, we do not have direct access to this geometric picture. We are confined by the information presented in ballots which relates directly to the candidates, as was the case with ranked ballots.

Can we recover the spirit of this "majority of consensus"? It turns out yes, we can.

In the diagram, the candidate closest to the consensus is being "magically" picked as the "winner", coloring the interior of the circle. There is no "voting" taking place! It is a completely geometric property being depicted, representing the candidate closest to the consensus. This candidate would be the closest to represent the "majority of consensus", by definition.

At the bottom, we have a distribution of distances from voters to the candidates, one distribution per candidate. This is what voters would be intuitively measuring during an election, and attempting to convey in their ballots. The vertical lines are the medians of the distributions, which is also used to plot the dashed circles around the consensus for each candidate. The dashed gray distribution is the distance distribution relative to the consensus, with the red line the median distance, which defines the "majority of consensus circle".

This is analogous to voters voting on a continuous cardinal scale, from 0 (candidate has exactly the same beliefs as the voter) to infinity (candidate is completely incomprehensible to the voter), mapping distance perfectly to this scale. In reality things are not so simple, but the goal here is to show that in principle the information is there. Also, since cardinal voting contains total comparative information (candidates are not judged in isolation), the best and worst candidates define a "yardstick" voters use to measure distance. If voters are to be taken as equally worthy in opinion, aggregating the cardinal ballots amounts to measuring with the "mean yardstick" of the voters.

The mean (not the median) of the distances exactly matches the coloring of the majority of consensus circle: if the mean distance to the yellow candidate is lower than that to the purple candidate (the "voting"), the yellow dot is geometrically closer to the consensus ("magically" selected from the geometry of the problem). Note that the mean is also used to define the consensus, not the median as one would naively expect. The median is inadequate under this scenario. (The reasons for this are a bit technical, so we omit them here. See the remarks at the end.)
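One way to see why a mean tracks proximity to the consensus is the following identity, which holds exactly when the quantity being averaged is the squared Euclidean distance: the mean squared distance from the voters to any point equals that point's squared distance to the mean opinion plus a constant (the spread of the voters around their mean). The sketch below checks this numerically. Treating the squared distance as the averaged quantity is an assumption of this sketch; the candidate positions and the helper name `mean_sq_distance` are hypothetical.

```python
import math
import random
import statistics

def mean_sq_distance(voters, cand):
    """Average squared Euclidean distance from the voters to a candidate."""
    return statistics.fmean(math.dist(v, cand) ** 2 for v in voters)

rng = random.Random(2)  # fixed seed so the run is repeatable
voters = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]

# Consensus defined as the mean opinion.
consensus = tuple(statistics.fmean(v[d] for v in voters) for d in range(2))
spread = mean_sq_distance(voters, consensus)  # constant term of the identity

yellow, purple = (0.4, -0.3), (-0.9, 0.6)  # assumed candidate positions

for cand in (yellow, purple):
    lhs = mean_sq_distance(voters, cand)
    rhs = math.dist(cand, consensus) ** 2 + spread
    assert math.isclose(lhs, rhs, rel_tol=1e-9)

# Because the constant is shared, both orderings must agree.
vote_says_yellow = mean_sq_distance(voters, yellow) < mean_sq_distance(voters, purple)
geometry_says_yellow = math.dist(yellow, consensus) < math.dist(purple, consensus)
print(vote_says_yellow, geometry_says_yellow)
```

Since the spread term is the same for every candidate, comparing mean squared distances is equivalent to comparing distances to the consensus, regardless of how the voters are distributed.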

Under an actual cardinal voting scheme, the mapping of distances to the ballot scale is bounded by the limited ballot, confined to discrete steps, and may not be linear. This will reduce the resolution and distort the results away from this idealized scenario. But this example shows that under consensus, the cardinal formalism adequately captures a notion of "majority of consensus", which is a fundamental property of voters.
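The loss of resolution can be illustrated with a hypothetical mapping from one voter's distances to a bounded, discrete ballot. Everything here is an assumption made for illustration: the 0 to 5 scale, the linear mapping, the use of the voter's nearest and farthest candidates as the two ends of the scale, and the function name `distances_to_scores`. Real voters need not behave this way.

```python
def distances_to_scores(distances, max_score=5):
    """Map one voter's distances to candidates onto a 0..max_score ballot.

    The nearest candidate gets max_score, the farthest gets 0, and the rest
    are placed linearly in between and rounded to whole scores.
    """
    nearest = min(distances.values())
    farthest = max(distances.values())
    if farthest == nearest:
        # The voter sees no difference between candidates.
        return {c: max_score for c in distances}
    span = farthest - nearest
    return {
        c: round(max_score * (farthest - d) / span)
        for c, d in distances.items()
    }

# Assumed distances from one voter to four candidates.
distances = {"A": 1.0, "B": 1.2, "C": 3.5, "D": 5.0}
ballot = distances_to_scores(distances)
print(ballot)
```

Candidates A and B sit at different distances from this voter but end up with the same score, which is the kind of coarsening the discrete ballot introduces.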

Moreover, even though voters can only express simple information about the candidates, the information given by all voters, taken together, has a direct connection to this "majority of consensus" notion.

What about the polarized case?

As we can see, the histograms of cardinal information between any two candidates in an election can reveal to us whether between the two candidates there is a consensus or a polarization.
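A toy version of these histograms can be built by binning the voters' distances to a candidate. The sketch uses a one-dimensional opinion axis and hand-picked voter positions so the result is easy to verify by eye; the positions, bin width and function name `histogram` are all hypothetical.

```python
def histogram(values, bin_width, n_bins):
    """Count values into n_bins equal-width bins starting at 0.

    Values beyond the last bin are counted in the last bin.
    """
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    return counts

candidate = 0.5  # assumed candidate position on a -1..+1 opinion axis

# Assumed consensus electorate: one cluster around the candidate.
consensus_voters = [0.2, 0.3, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9]

# Assumed polarized electorate: two clusters at opposite ends.
polarized_voters = [-0.9, -0.85, -0.8, -0.75, -0.7, 0.7, 0.75, 0.8, 0.85, 0.9]

consensus_hist = histogram([abs(v - candidate) for v in consensus_voters], 0.5, 4)
polarized_hist = histogram([abs(v - candidate) for v in polarized_voters], 0.5, 4)

print(consensus_hist)  # a single peak
print(polarized_hist)  # two peaks separated by an empty bin
```

The distance distribution has one peak under consensus and two separated peaks under polarization, even though each voter only reported how far the candidate is from them.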

As before, either the mean or the median is capable of predicting which candidate is closer to the overall consensus. This is a property independent of the distribution, and thus it always approximates the "majority of consensus". However, the mean will generally be more accurate at predicting proximity to the consensus.

Conclusion

  • There is no such thing as "the majority", as it is usually promoted in democracy and ranked method advocacy. It is not a property of the voters that we are "trying to find out" through the voting process.
  • The existence of multiple issues implies the existence of multiple majorities and minorities, which will generally be incompatible. What is the legitimacy of giving power/representation to any one of them?
  • A voting method cannot "guarantee a majority" in any meaningful or representative way.
  • Ranking encourages factionalism and creates artificial polarization where there is none. This distorts our picture of the true ideological distribution of voters and factions, and voters will respond to this by becoming even more factionalist.
  • If the goal of democracy is to represent the population as a whole, with all its agreements and disagreements, ranked methods are sub-optimal. If the goal of democracy is to promote the ideals of the dominant faction, established largely arbitrarily on the spot, then ranked methods suit this goal.
  • More generally, forcing voters to take sides destroys consensus and agreements. The corollary of this is that Instant-Runoff Voting is anti-consensus.
  • Condorcet methods are designed to make the best use of limited ranking information to find the consensus.

Final remarks

  • A ranked preference is the answer to the question "which of these two candidates does the voter feel is closer to their interests?", so it gives information about "distance". Thus, in the cardinal case, we are also showing continuous distance information. A more advanced model of voters would have to map distances to something like "utility", and then one would need to map utilities to cardinal ballots, and the distributions would look coarser in resolution. This would introduce too many arbitrary steps and wouldn't illustrate anything important. For our purposes, the distance is sufficient.
  • While the median is a better metric of "central tendency in response to outliers", that is only useful if we know where that median position is. This is not the information that is available to us with ranked ballots. All we have is "this side has 55% people, the other 45%" and so on. We have no information about "where" the line was drawn, and what the ideological distribution looks like at that location. One could have hoped to estimate this by imagining a "line" between the two candidates, and placing a point along this line that represents the ratios of the votes received by either side (the "consensus between the candidates"), but this would still have a "sideways" bias away from the consensus.
  • The reason the mean and not the median is used in defining the consensus is related to the role of consensus and polarization. Since we are trying to define the "majority of consensus", the contribution of polarizing issues to the "consensus" must be minimized, as they are not a consensus. Imagine the 1D case where there is maximum (50%+1, 50%-1) polarization on an issue, and all voters on either side have very sharp-peaked equal beliefs. The "consensus", if defined as the median opinion, would lie entirely within one of the factions, and the "majority of consensus" would account only for that faction, completely ignoring the other. So this definition cannot capture the notion of a consensus under polarization.
  • The "majority of consensus" reproduces the intuitive notion of majority, and it is well-captured by the median distance. However, the median is mathematically less capable of minimizing the distance to the consensus, as defined by the mean opinion as just explained. In the animations above, if one pays attention it can be seen that the smallest median distance does not correlate precisely with the color of the circle, "magically picked" by directly picking the candidate closer to the consensus. This is because the median still biases the results in favor of the dominant faction, as can be observed by how quickly the median lines move across the distance distributions in the polarized case. The mean is in a sense more "neutral" to the underlying polarization structure.
  • The mean is better suited than the median because it minimizes the sum of squares of Euclidean distances to the voters, so that the mean squared distance from the voters to any point grows directly with that point's Euclidean distance to the mean, whereas the geometric median minimizes the simple sum of distances. The sum of squares can be understood as a weighted sum, where each distance is weighted by a factor proportional to the distance itself, penalizing more heavily the points which stray too far away from the consensus.
  • The cardinal method closest to applying this notion of "majority of consensus" is likely Majority Judgement, but as per above, it will still bias towards majority factions, so even though it approximates the consensus it ultimately sides with the dominant faction.
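The 1D polarized case from the remarks above can be checked directly. The sketch assumes 100 voters with perfectly sharp beliefs, 51 at one extreme and 49 at the other, on a -1 to +1 scale; these numbers are chosen only to match the (50%+1, 50%-1) description.

```python
import statistics

# Maximum polarization: 51 voters at -1, 49 voters at +1.
opinions = [-1.0] * 51 + [1.0] * 49

median_opinion = statistics.median(opinions)
mean_opinion = statistics.fmean(opinions)

print(median_opinion)  # sits exactly on the larger faction
print(mean_opinion)    # sits between the factions, barely off-center
```

The median opinion coincides with the larger faction and ignores the other entirely, while the mean lands between the two, shifted only slightly towards the larger side.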