Pairwise Least Squares: Difference between revisions

From electowiki

Revision as of 14:58, 21 October 2023

Pairwise least squares is a method that assigns a score to each candidate based on voters' pairwise preferences.

Summary

Least squares estimation is a technique from linear algebra that finds the closest solution to a system of equations.

A pairwise tally of preferences records, for each pair of candidates, the number of voters who prefer one candidate to the other.
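As an illustration (the ballots below are hypothetical, not taken from any example in this article), a pairwise tally can be computed from ranked ballots:

```python
def pairwise_tally(ballots, n):
    # d[i][k] = number of voters who rank candidate i above candidate k
    d = [[0] * n for _ in range(n)]
    for ballot in ballots:
        for pos, i in enumerate(ballot):
            for k in ballot[pos + 1:]:
                d[i][k] += 1
    return d

# Three hypothetical voters ranking candidates 0, 1, 2 from best to worst.
ballots = [[0, 1, 2], [1, 2, 0], [0, 2, 1]]
d = pairwise_tally(ballots, 3)
```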

Applying least squares to pairwise preferences produces a score for each candidate.

The original preferences are then approximated by taking the difference between any two candidates' scores. The approximation is a projection in the space of pairwise preferences: it subtracts any cyclic group preference component and minimizes the sum of squared errors in the preference counts.

These scores turn out to be essentially the same as Borda scores.

Abstraction

Let the election be modeled as a network.

Let x be a list of scores. Each node in the network is a candidate with a score.

Let A be a list of preferences. Every pair of nodes is connected by two edges. The edges are directed and represent preference for one candidate over another. In network theory, this is an incidence matrix with a row for each edge and a column for each node. For each preference, put a 1 for the “for” candidate and a -1 for the “against” candidate. Leave the rest as 0.

Let b be a tally of preferences. This is how many voters support a preference.

The problem is to find scores x for some given tally of preferences b:

Ax ≟ b.
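A minimal sketch of this setup for n = 3 candidates, using the same edge ordering as the Implementation section below (all ordered pairs, "for" index first):

```python
import numpy as np

# Incidence matrix A: one row per ordered pair (for, against),
# +1 in the "for" column and -1 in the "against" column.
n = 3
pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
A = np.zeros((len(pairs), n))
for r, (i, k) in enumerate(pairs):
    A[r, i], A[r, k] = 1, -1
```

Each row sums to zero, so adding a constant to all scores leaves Ax unchanged; A has rank n - 1.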

Approximation

Generally, the problem is unsolvable because there can be cyclic group preferences. Cyclic preferences cannot be produced as differences in score Ax, so no exact solution exists.

However, we can find a solution x̂ to a transformed equation:

Aᵀ A x̂ = Aᵀ b.

This new x̂ is an approximation of scores that fits the network model.
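A small numeric check, using a made-up, purely cyclic tally: the right-hand side Aᵀb vanishes, so the fitted scores all tie, which is exactly the cyclic component being discarded.

```python
import numpy as np

n = 3
pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
A = np.zeros((len(pairs), n))
for r, (i, k) in enumerate(pairs):
    A[r, i], A[r, k] = 1, -1

# A purely cyclic tally: one voter each for 0>1, 1>2, 2>0.
cycle = {(0, 1), (1, 2), (2, 0)}
b = np.array([1.0 if pair in cycle else 0.0 for pair in pairs])

# np.linalg.lstsq solves the normal equations A.T A x = A.T b
# (returning the minimum-norm solution when A is rank deficient).
xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
```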

Solution

There are actually many solutions because the same score could be added to each candidate. So we set the score of the first candidate to 0 in order to solve for one particular solution.

A₀ = A[:,1:]

The solution is:

x̂ = ( A₀ᵀ A₀ )⁻¹ A₀ᵀ b.
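A sketch of this formula for n = 3 with a made-up tally b, pinning candidate 0's score at 0:

```python
import numpy as np

n = 3
pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
A = np.zeros((len(pairs), n))
for r, (i, k) in enumerate(pairs):
    A[r, i], A[r, k] = 1, -1

A0 = A[:, 1:]                           # drop candidate 0's column
b = np.array([2., 1., 0., 2., 1., 0.])  # made-up tally, rows ordered as in pairs
x0 = np.linalg.solve(A0.T @ A0, A0.T @ b)
xhat = np.concatenate(([0.0], x0))      # candidate 0's score is fixed at 0
```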

Projection

Multiplying by Aᵀ ignores any component of b that is in the null space of Aᵀ. The null space of Aᵀ is composed entirely of cyclic group preferences: each row of Aᵀ lists the edges connected to one node, and any vector v in N(Aᵀ) satisfies Aᵀ v = 0, which is a statement of conservation of flow. Putting the solution of the transformed equation back into the original equation gives a preference vector p that has no cycles:

A x̂ = p.

The closest we can get to preferences b using our score model is to project b into the space of possible score model preferences Ax, which does not have any cyclic component. The projection can be represented by the matrix P. The below definition of P gives Pb = p.

P = A₀ ( A₀ᵀ A₀ )⁻¹ A₀ᵀ
Pb = A₀ ( A₀ᵀ A₀ )⁻¹ A₀ᵀ b
Pb = A₀ ( A₀ᵀ A₀ )⁻¹ A₀ᵀ A₀ x̂
Pb = A₀ x̂
Pb = p
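These identities can be checked numerically. This sketch builds P for n = 3 and a made-up tally, and verifies that P is idempotent (a projection applied twice changes nothing) and that Pb = A₀x̂:

```python
import numpy as np

n = 3
pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
A = np.zeros((len(pairs), n))
for r, (i, k) in enumerate(pairs):
    A[r, i], A[r, k] = 1, -1
A0 = A[:, 1:]

P = A0 @ np.linalg.inv(A0.T @ A0) @ A0.T   # projection onto the column space of A0
b = np.array([2., 1., 0., 2., 1., 0.])     # made-up tally
xhat = np.linalg.solve(A0.T @ A0, A0.T @ b)
p = P @ b
```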

Least Square Error

The projected preferences p are the closest to the actual preferences b among all vectors of the form Ax, that is, among the column space C(A). In other words, projecting made the difference b - p as small as possible.

p = argmin p ∈ C(A) || b - p ||

Let’s call this difference the error e.

e = b - p

The error e is a vector. The length of a vector is the square root of the sum of squares of the components. This means projecting also minimizes the sum of the squares of the error for each preference.

p = argmin p ∈ C(A) || b - p ||²
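A quick sketch of this minimality, using the same made-up tally as above: the error e is orthogonal to the columns of A₀, and perturbing the fitted scores only increases the error norm.

```python
import numpy as np

n = 3
pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
A = np.zeros((len(pairs), n))
for r, (i, k) in enumerate(pairs):
    A[r, i], A[r, k] = 1, -1
A0 = A[:, 1:]

b = np.array([2., 1., 0., 2., 1., 0.])  # made-up tally
xhat = np.linalg.solve(A0.T @ A0, A0.T @ b)
p = A0 @ xhat
e = b - p

# Any perturbation of the scores makes the error strictly larger.
worse = np.linalg.norm(b - A0 @ (xhat + np.array([0.1, 0.0])))
```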

Borda

The matrix A₀ᵀ A₀ is 2 ( n I - 1 ), where 1 denotes the all-ones matrix. A₀ᵀ b is, for each candidate, the sum of preferences for minus preferences against. So the scores x̂ are, up to a fixed linear transformation, the net Borda scores A₀ᵀ b:

A₀ᵀ b = 2 ( n I - 1 ) x̂
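This identity for A₀ᵀ A₀ is easy to verify numerically for any n; a sketch for n = 5:

```python
import numpy as np

def incidence(n):
    # One row per ordered pair (for, against): +1 and -1 in those columns.
    pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
    A = np.zeros((len(pairs), n))
    for r, (i, k) in enumerate(pairs):
        A[r, i], A[r, k] = 1, -1
    return A

n = 5
A0 = incidence(n)[:, 1:]
expected = 2 * (n * np.eye(n - 1) - np.ones((n - 1, n - 1)))
```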

Implementation

from numpy import *

def net(n):
    m = int(n * (n-1) )
    A = mat(zeros((m,n))) # incidence matrix
    e = mat(zeros((m,2))) # edge list
    r = 0
    for i in range(n):
        for k in range(n):
            if i == k:
                continue
            A[r,i] = 1
            A[r,k] = -1
            e[r] = mat([i,k])
            r = r + 1
    A0 = A[:,1:]  # pin the first candidate's score at 0
    L0 = (A0.T * A0) ** -1 * A0.T  # least squares operator (A0' A0)^-1 A0'
    z = zeros([1,L0.shape[1]])
    L = block([[z],[L0]])  # prepend the fixed 0 score for the first candidate
    return [e,A,L]

def edgeFromDense(d,e):
    m = e.shape[0]
    b = mat(zeros((m,1)))
    for i in range(m):
        k0 = int(e[i,0])
        k1 = int(e[i,1])
        b[i,0] = d[k0,k1]
    return b

def denseFromEdge(e,b,n):
    m = e.shape[0]
    d = mat(zeros((n,n)))
    for i in range(m):
        k0 = int(e[i,0])
        k1 = int(e[i,1])
        d[k0,k1] = b[i,0]
    return d

def solveNetwork(A,b,L):
    x = L * b
    p = A * x
    return [x,p]

def solveNetDense(d):
    n = d.shape[0]
    [e,A,L] = net(n)
    b = edgeFromDense(d,e)
    [x,p] = solveNetwork(A,b,L)
    pd = denseFromEdge(e,p,n)
    pd2 = pd + (d + d.T) / 2  # add back the symmetric pairwise-loop component
    return [x,pd,pd2]

def test():
    d = mat([[ 0.,  5.,  2.,  5.],
        [ 5.,  0.,  4.,  7.],
        [ 8.,  6.,  0., 10.],
        [ 5.,  3.,  0.,  0.]]
    ) # d = dense network graph
    [x,pd,pd2] = solveNetDense(d)
    print("Scores")
    print(x)
    print("Borda Scores")
    print(sum(d,1))
    print("Preferences")
    print(d)
    print("Projected Preferences")
    print(pd)
    print("Projected Preferences Plus Pairwise Loops")
    print(pd2)
    print("Higher Order (>2) Loops Removed")
    print(d - pd2)
    print("Borda Scores of Projected Preferences Plus Pairwise Loops")
    print(sum(pd2,1))

test()

External Links

* MIT Linear Algebra - Lecture 12 - Graphs, Networks, Incidence Matrices: https://www.youtube.com/watch?v=6-wh6yvk6uc&list=PL49CF3715CB9EF31D&index=12
* MIT Linear Algebra - Lecture 15 - Projections onto Subspaces: https://www.youtube.com/watch?v=Y_Ac6KiQ1t0&list=PL49CF3715CB9EF31D&index=15