Idea: Use a linked list to store elements but, every time an
element is sought, move it to the front of the list.
Intuition: Frequently-sought, or high-probability, elements will
tend to stay near the front of the list.
Searching in a Move-to-Front (MTF) list costs only a few pointer
assignments more than searching in a normal list. It's also very
easy to implement.
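As a sketch of how little machinery this takes (the class and method names here are my own, not from the notes), a singly linked MTF list in Python:

```python
class Node:
    """A singly linked list node."""
    def __init__(self, value):
        self.value = value
        self.next = None


class MTFList:
    """Linked list that moves each successfully sought element to the front."""
    def __init__(self, values=()):
        self.head = None
        # Insert in reverse so the list reads in the given order.
        for v in reversed(list(values)):
            node = Node(v)
            node.next = self.head
            self.head = node

    def search(self, value):
        """Return True if value is present, moving its node to the front."""
        prev, cur = None, self.head
        while cur is not None:
            if cur.value == value:
                if prev is not None:      # not already at the front
                    prev.next = cur.next  # unlink the node...
                    cur.next = self.head  # ...and relink it at the head
                    self.head = cur
                return True
            prev, cur = cur, cur.next
        return False

    def to_list(self):
        """Return the current front-to-back ordering, for inspection."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```

A successful search does the usual traversal plus only two or three extra pointer assignments; for example, after `MTFList("ABCD").search("C")` the order becomes C, A, B, D.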
But is the move-to-front heuristic really useful? We must somehow
measure the quality of this heuristic. To do so, we'll
compare the expected search time in an MTF list with that in an optimal
list of the same elements.
Note that we can almost never construct the optimal list, since the
probability distribution of queries is almost always unknown. So
we're hoping that the MTF heuristic gives an expected search time
close to that of the optimal list.
Given n elements, name them by their position in the optimal list:
1, 2, ..., n. Then (from the optimal list theorem)

    p_1 >= p_2 >= ... >= p_n,

where p_i is the probability that element i is sought.
Let E_OPT be the expected search time in the optimal list of
these elements:

    E_OPT = sum_{i=1}^{n} i * p_i.

Let E_MTF be the expected search time in the MTF list of the
same elements.
Clearly, E_OPT <= E_MTF, since the optimal list has
minimum expected search time of all possible lists.
Theorem

    E_MTF <= 2 * E_OPT.
Discussion
The theorem says that the expected search time of the MTF list is
at most twice optimal. This is a surprising result, since we don't
even know the probability distribution!
Consider, for example, these elements and their associated probabilities:

    x    | A   | B   | C   | D
    P(x) | 0.1 | 0.1 | 0.8 | 0.0

The optimal list is (C, A, B, D), so

    E_OPT = 1(0.8) + 2(0.1) + 3(0.1) + 4(0.0) = 1.3.

The theorem states that

    E_MTF <= 2(1.3) = 2.6.
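The arithmetic above can be checked directly. In this sketch, the dictionary and variable names are my own:

```python
probs = {"A": 0.1, "B": 0.1, "C": 0.8, "D": 0.0}

# Optimal list: elements in order of decreasing probability.
# (Python's sort is stable, so ties keep their original order.)
optimal = sorted(probs, key=probs.get, reverse=True)

# E_OPT = sum over 1-based positions i of i * p(element at position i).
e_opt = sum(i * probs[x] for i, x in enumerate(optimal, start=1))

print(optimal, round(e_opt, 10), round(2 * e_opt, 10))
# ['C', 'A', 'B', 'D'] 1.3 2.6
```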
Since we don't know P(x), we might have built an arbitrary,
static list, like (A, B, C, D).
Had we done so, we would have been stuck forever with an expected
search time of

    1(0.1) + 2(0.1) + 3(0.8) + 4(0.0) = 2.7.

Clearly, the MTF list is better, even
though it's not optimal. When we don't know the probability
distribution, it's best to use an MTF list rather than risk choosing a
bad ordering in a static list.
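A quick simulation makes the comparison concrete. This is a sketch under my own assumptions: the static order (A, B, C, D) stands in for an arbitrary bad guess, and lists model the two data structures:

```python
import random

probs = {"A": 0.1, "B": 0.1, "C": 0.8, "D": 0.0}

def search_cost(lst, x):
    """Number of comparisons to find x in lst (its 1-based position)."""
    return lst.index(x) + 1

random.seed(0)
queries = random.choices(list(probs), weights=probs.values(), k=100_000)

static = ["A", "B", "C", "D"]   # an arbitrary fixed ordering
mtf = ["A", "B", "C", "D"]      # starts the same, but adapts

static_total = 0
mtf_total = 0
for q in queries:
    static_total += search_cost(static, q)
    mtf_total += search_cost(mtf, q)
    mtf.remove(q)               # move-to-front step
    mtf.insert(0, q)

print(static_total / len(queries))  # close to 2.7
print(mtf_total / len(queries))     # noticeably smaller
```

The static list's average cost stays near 2.7 forever, while the MTF list quickly settles well below the theorem's 2.6 bound.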
Proof of Theorem
In the MTF list of these elements, let's perform enough searches so
that the ordering of the MTF list is unaffected by its initial
ordering. Each element will have been sought at least once.
We'll take a ``snapshot'' of the list at this point and will
analyse the expected search time.
Let pos(i) be the position of element i in this snapshot. The expected
value of pos(i) is (from our definition of expectation)

    E[pos(i)] = sum_{k=1}^{n} k * P(pos(i) = k).
For example, suppose we have

    pos(i) | percentage of time
    -------|-------------------
    1      | 50%
    2      | 10%
    3      | 20%
    4      | 20%

Then

    E[pos(i)] = 1(0.5) + 2(0.1) + 3(0.2) + 4(0.2) = 2.1,

and element i is, on average, in position 2.1.
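The expected position from this table is just a weighted sum, as this small sketch (with names of my own choosing) shows:

```python
# Fraction of time element i spends in each position, from the table above.
time_in_pos = {1: 0.50, 2: 0.10, 3: 0.20, 4: 0.20}

# E[pos(i)] = sum over positions k of k * P(pos(i) = k).
expected_pos = sum(k * frac for k, frac in time_in_pos.items())
print(round(expected_pos, 10))  # 2.1
```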
If we knew E[pos(i)] for every i, we would know E_MTF, since

    E_MTF = sum_{i=1}^{n} p_i * E[pos(i)].
So what is E[pos(i)]? Element i's position is one more than the number
of elements that precede it, so

    E[pos(i)] = 1 + sum_{j != i} P(j precedes i).

In the summation above, if element j precedes element i, it adds
one to the number of preceding elements.
So what is P(j precedes i)?
In the snapshot, j precedes i exactly when, of all the searches for i
and for j, the most recent one was a search for j. Restricting
attention to just those searches, each is a search for j with
probability p_j / (p_i + p_j). Therefore

    P(j precedes i) = p_j / (p_i + p_j).
Thus

    E_MTF = sum_i p_i * ( 1 + sum_{j != i} p_j / (p_i + p_j) )
          = 1 + sum_i sum_{j != i} p_i p_j / (p_i + p_j).

Note the symmetry in i and j! Each unordered pair {i, j} contributes
twice to the double sum. Therefore

    E_MTF = 1 + 2 * sum_{i < j} p_i p_j / (p_i + p_j).
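This closed form can be evaluated directly for the four-element example from earlier. In this sketch the probability dictionary is repeated as an assumption:

```python
from itertools import combinations

probs = {"A": 0.1, "B": 0.1, "C": 0.8, "D": 0.0}
p = list(probs.values())

# E_MTF = 1 + 2 * sum over unordered pairs i < j of p_i p_j / (p_i + p_j).
e_mtf = 1 + 2 * sum(
    pi * pj / (pi + pj)
    for pi, pj in combinations(p, 2)
    if pi + pj > 0  # skip pairs of zero-probability elements
)
print(e_mtf)  # roughly 1.46, well under the bound 2 * 1.3 = 2.6
```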
Note that

    p_i p_j / (p_i + p_j) <= p_j,    since p_i / (p_i + p_j) <= 1.

Using this fact:

    E_MTF <= 1 + 2 * sum_{i < j} p_j
           = 1 + 2 * sum_{j=1}^{n} (j - 1) * p_j
           = 1 + 2 * sum_j j * p_j - 2 * sum_j p_j
           = 2 * E_OPT - 1
           <= 2 * E_OPT.

That's it! We've proven that E_MTF <= 2 * E_OPT.
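As a sanity check on the derivation, both quantities can be computed for many random distributions and the bound verified numerically (a sketch; the function names are mine):

```python
import random
from itertools import combinations

def e_opt(p):
    """Expected search time of the optimal (decreasing-probability) list."""
    return sum(i * q for i, q in enumerate(sorted(p, reverse=True), start=1))

def e_mtf(p):
    """Steady-state expected search time of the MTF list."""
    return 1 + 2 * sum(a * b / (a + b) for a, b in combinations(p, 2))

random.seed(1)
for _ in range(1000):
    n = random.randint(2, 10)
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]  # a random probability distribution
    # The bound from the proof, with a tiny tolerance for rounding.
    assert e_mtf(p) <= 2 * e_opt(p) - 1 + 1e-9

print("bound holds on all random trials")
```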
Note that we had to get the p_i p_j / (p_i + p_j) form in
order to exploit the symmetry in i and j.
The key was to produce the sum_{j} (j - 1) * p_j
term, which we could rewrite in terms of E_OPT.
It's great that we can prove that the MTF heuristic is
actually useful.