Suppose we have a linked list and we pick a key. We then search
the list to recover the data associated with that key. What's the
expected search time? To answer this, we resort to random variables
and ``expectation.''
Events and Sample Spaces
Defining the problem more formally, we perform an experiment. The
experiment consists of picking a key and searching for it. The act of
picking a key and searching for it is an event.
Suppose the possible search keys are
. Then
the sample space of the experiment is the set,
, of events
that can occur:
where, in event
, we pick key
and search for it.
Each event,
, has some probability,
, of occurring.
If each event is equally likely to occur, the distribution of
probabilities is said to be uniform, and
.
Of course,
and
.
Random Variables and Expectation
A random variable is a variable that is associated with an
event. For example, when event
occurs, some amount of time is
required to search for key
. Let
be that amount of
time.
is a random variable.
The expected value of a random variable is the average value
of the random variable, given the probability distribution of events.
The expected value of a random variable,
, is denoted
.
If the probability distribution is uniform, the expected value is
is just the average value that we're familiar with:
But if the probability distribution is not uniform, more probable
events are given greater weight in the sum. Each event has a
corresponding value of the random variable. In the sum, that value is
given a weight (or ``importance'') equal to the probability
that the event occurs:
Note that the uniform distribution is just a special case of this.
Going back to our original question: What's the expected search
time in a linked list of
elements? Here's the list:
The experiment: Given a linked list of
elements, pick one key
which is known to be in the list and search for it.
The sample space: Let
be the set of keys in the list, so
denotes the event of picking key
and searching for it:
It's sometimes convenient, as we've done here, to name the events
after parts of the experiment, like the keys. There's nothing wrong
with calling an event
(that is, naming the event after the key,
). Whether
denotes a key or an event should be clear from
the context.
The probability distribution: We will assume that each event is
equally likely:
.
The random variable: Let
be a random variable associated
with event
.
is the time to search for key
.
So, what's
?
Thus, on average, we'll search about half-way down the list.
Here's a problem for you to solve: Suppose that the search times are not
uniform and that
Answer the next two questions in order. Don't click on the button
until you've worked on the problem yourself!