Third-order correlation

From electowiki

Third-order correlation is a measure of candidate correlation proposed by Dan Bishop. The name comes from the fact that the correlations can be computed with a third-order summation array.

Definitions

On a ballot, a candidate C is voted between A and B if C is voted either strictly lower than A and strictly higher than B, or strictly higher than A and strictly lower than B.

The correlation of A and B with respect to C, denoted "corr(A, B) wrt C", is the proportion of the ballots on which C is not voted between A and B.

The correlation of A and B is the minimum of corr(A, B) wrt C over all candidates C other than A and B.
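The three definitions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names, the `(weight, ranks)` ballot representation, and the candidates X, Y, Z are all assumptions made for the example. Each ballot maps a candidate to a rank number (1 is best), so equal numbers express a tie, and a tied candidate is never "strictly between" the other two.

```python
def is_between(ranks, c, a, b):
    """True if c is ranked strictly between a and b on this ballot.

    `ranks` maps candidate -> rank number (1 = best); equal numbers are ties.
    """
    lo, hi = sorted((ranks[a], ranks[b]))
    return lo < ranks[c] < hi


def corr_wrt(ballots, a, b, c):
    """Proportion of ballot weight on which c is NOT voted between a and b."""
    total = sum(weight for weight, _ in ballots)
    counted = sum(weight for weight, ranks in ballots
                  if not is_between(ranks, c, a, b))
    return counted / total


def corr(ballots, a, b, candidates):
    """Minimum of corr_wrt over every candidate other than a and b."""
    return min(corr_wrt(ballots, a, b, c)
               for c in candidates if c not in (a, b))


# Hypothetical three-candidate election: 3 ballots X>Y>Z and 1 ballot X>Z>Y.
example_ballots = [
    (3, {"X": 1, "Y": 2, "Z": 3}),
    (1, {"X": 1, "Z": 2, "Y": 3}),
]
print(corr_wrt(example_ballots, "X", "Y", "Z"))  # 0.75
```

Only the single X>Z>Y ballot places Z between X and Y, so three of the four ballots count towards the correlation.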

Example

Tennessee's four cities are spread throughout the state

Imagine that Tennessee is having an election on the location of its capital. The population of Tennessee is concentrated around its four major cities, which are spread throughout the state. For this example, suppose that the entire electorate lives in these four cities, and that everyone wants to live as near the capital as possible.

The candidates for the capital are:

  • Memphis, the state's largest city, with 42% of the voters, but located far from the other cities
  • Nashville, with 26% of the voters, near the center of Tennessee
  • Knoxville, with 17% of the voters
  • Chattanooga, with 15% of the voters

The preferences of the voters would be divided like this:

42% of voters (close to Memphis)
  1. Memphis
  2. Nashville
  3. Chattanooga
  4. Knoxville

26% of voters (close to Nashville)
  1. Nashville
  2. Chattanooga
  3. Knoxville
  4. Memphis

15% of voters (close to Chattanooga)
  1. Chattanooga
  2. Knoxville
  3. Nashville
  4. Memphis

17% of voters (close to Knoxville)
  1. Knoxville
  2. Chattanooga
  3. Nashville
  4. Memphis

Consider, for example, the correlation between Chattanooga and Memphis with respect to Knoxville. For brevity, the cities will be denoted by their initial letters.

  • On the M>N>C>K ballots, K is not voted between M and C. Therefore, the 42% of the ballots with this ranking are counted in corr(C, M) wrt K.
  • However, on the N>C>K>M ballots, K is voted between C and M, so these ballots do not count towards the correlation.
  • The same is true for the C>K>N>M ballots.
  • But on the K>C>N>M ballots, K is not voted between C and M, so these 17% of the ballots count towards the correlation.

Therefore, corr(C, M) wrt K = 42% + 17% = 59%. Computing every combination in the same way gives:

  • corr(C, K) wrt M = 100%
  • corr(C, K) wrt N = 100%
  • corr(C, M) wrt K = 59%
  • corr(C, M) wrt N = 26%
  • corr(C, N) wrt K = 85%
  • corr(C, N) wrt M = 100%
  • corr(K, M) wrt C = 41%
  • corr(K, M) wrt N = 26%
  • corr(K, N) wrt C = 15%
  • corr(K, N) wrt M = 100%
  • corr(M, N) wrt C = 74%
  • corr(M, N) wrt K = 74%
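The values in this list can be reproduced by accumulating ballot weights into a table indexed by candidate triples (A, B, C). The sketch below is one way to do that; the name `not_between` and the string encoding of rankings are assumptions made for illustration, and treating this table as the "third-order summation array" mentioned in the introduction is only a guess. It assumes every ballot is a complete strict ranking, as in this example.

```python
from itertools import permutations

# Ballots from the example: (percentage of voters, ranking from first to last).
ballots = [
    (42, "MNCK"),
    (26, "NCKM"),
    (15, "CKNM"),
    (17, "KCNM"),
]
candidates = "CKMN"

# not_between[(a, b, c)] accumulates the percentage of ballots on which
# c is NOT ranked between a and b, i.e. corr(a, b) wrt c in percent.
not_between = {triple: 0 for triple in permutations(candidates, 3)}
for weight, ranking in ballots:
    pos = {cand: i for i, cand in enumerate(ranking)}
    for a, b, c in not_between:
        lo, hi = sorted((pos[a], pos[b]))
        if not lo < pos[c] < hi:
            not_between[(a, b, c)] += weight

print(not_between[("C", "M", "K")])  # 59
```

The table is symmetric in its first two indices, so `not_between[("M", "C", "K")]` holds the same value as `not_between[("C", "M", "K")]`.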

The correlations between each possible pair of candidates are:

  • corr(C, K) = min(100%, 100%) = 100%
  • corr(C, M) = min(59%, 26%) = 26%
  • corr(C, N) = min(85%, 100%) = 85%
  • corr(K, M) = min(41%, 26%) = 26%
  • corr(K, N) = min(15%, 100%) = 15%
  • corr(M, N) = min(74%, 74%) = 74%

The most-correlated pair is Chattanooga and Knoxville, with a correlation of 100%: no ballot ranks Memphis or Nashville between them.
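The final step, taking the minimum for each pair and then picking the pair with the highest result, can be checked with a short Python sketch. The percentages are copied from the corr(A, B) wrt C list in this example; the variable names are illustrative assumptions.

```python
# corr(A, B) wrt C values from the example, in percent.
# Outer key: the pair (A, B); inner key: the third candidate C.
wrt = {
    ("C", "K"): {"M": 100, "N": 100},
    ("C", "M"): {"K": 59, "N": 26},
    ("C", "N"): {"K": 85, "M": 100},
    ("K", "M"): {"C": 41, "N": 26},
    ("K", "N"): {"C": 15, "M": 100},
    ("M", "N"): {"C": 74, "K": 74},
}

# The correlation of a pair is the minimum over all third candidates.
pair_corr = {pair: min(values.values()) for pair, values in wrt.items()}

# The most-correlated pair is the one with the largest minimum.
most_correlated = max(pair_corr, key=pair_corr.get)
print(most_correlated, pair_corr[most_correlated])  # ('C', 'K') 100
```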