CSC 378: Data Structures and Algorithm Analysis

Probability Review

Probability

Suppose we have a linked list and we pick a key. We then search the list to recover the data associated with that key. What's the expected search time? To answer this, we resort to random variables and ``expectation.''

Events and Sample Spaces

Defining the problem more formally, we perform an experiment. The experiment consists of picking a key and searching for it. The act of picking a key and searching for it is an event.

Suppose the possible search keys are . Then the sample space of the experiment is the set, , of events that can occur:

where, in event , we pick key and search for it.

Each event, , has some probability, , of occurring. If each event is equally likely to occur, the distribution of probabilities is said to be uniform, and .

Of course, and .

Random Variables and Expectation

A random variable is a variable that is associated with an event. For example, when event occurs, some amount of time is required to search for key . Let be that amount of time. is a random variable.

The expected value of a random variable is the average value of the random variable, given the probability distribution of events.

The expected value of a random variable, , is denoted .

If the probability distribution is uniform, the expected value is is just the average value that we're familiar with:

But if the probability distribution is not uniform, more probable events are given greater weight in the sum. Each event has a corresponding value of the random variable. In the sum, that value is given a weight (or ``importance'') equal to the probability that the event occurs:

Note that the uniform distribution is just a special case of this.

Searching in a Linked List

Going back to our original question: What's the expected search time in a linked list of elements? Here's the list:

The experiment: Given a linked list of elements, pick one key which is known to be in the list and search for it.

The sample space: Let be the set of keys in the list, so denotes the event of picking key and searching for it:

It's sometimes convenient, as we've done here, to name the events after parts of the experiment, like the keys. There's nothing wrong with calling an event (that is, naming the event after the key, ). Whether denotes a key or an event should be clear from the context.

The probability distribution: We will assume that each event is equally likely: .

The random variable: Let be a random variable associated with event . is the time to search for key .

So, what's ?

Thus, on average, we'll search about half-way down the list.

A Problem

Here's a problem for you to solve: Suppose that the search times are not uniform and that

Answer the next two questions in order. Don't click on the button until you've worked on the problem yourself!