Conditional expectation
Conditional expectation of a random variable given an event
In probability theory, it may be tempting to say that a conditional expectation is simply the expectation of a conditional probability distribution. Thus if X is a random variable and A is an event with P(A) > 0, then the conditional probability distribution of X given A assigns the probability P(X ≤ x | A) to the interval from −∞ to x. If this distribution has a first moment, that moment is called the conditional expected value of X given A and is denoted E(X | A).
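As a small illustration of E(X | A), the following sketch uses a hypothetical example that is not part of the text above: X is the face shown by a fair six-sided die and A is the event that the face is even. The names `prob`, `conditional_expectation` and `is_even` are chosen here purely for illustration.

```python
from fractions import Fraction

# Assumed toy model: a fair six-sided die, each outcome with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in outcomes}


def conditional_expectation(x, event):
    """E(X | A): sum of x(w) P(w) over w in A, divided by P(A)."""
    p_event = sum(prob[w] for w in outcomes if event(w))
    if p_event == 0:
        raise ValueError("the conditioning event must have nonzero probability")
    return sum(x(w) * prob[w] for w in outcomes if event(w)) / p_event


def X(w):
    return w  # the random variable: the face shown


def is_even(w):
    return w % 2 == 0  # the event A


e_x = sum(X(w) * prob[w] for w in outcomes)          # ordinary expectation
e_x_given_even = conditional_expectation(X, is_even)  # conditional expectation
print(e_x, e_x_given_even)  # 7/2 4
```

Exact fractions are used so that the two values can be compared without rounding error.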
Conditional expectation of a random variable given another random variable
However, that account omits some matters of interest and utility. If Y is another random variable, then the conditional expected value E(X | Y = y) of X given the event that Y = y is a function of y; let us call it g(y). Then the conditional expectation E(X | Y) is g(Y), another random variable whose value depends on that of Y. (The notation is case-sensitive: capital Y denotes the random variable, while lower-case y denotes a particular value that Y may take.)
If X has a finite expected value, that is, if E(|X|) < ∞, then the conditional expectation E(X | Y) also has an expected value, and it is the same as that of X: E(E(X | Y)) = E(X). That fact is the law of total expectation. See also the law of total variance and the law of total probability.
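The distinction between the function g and the random variable g(Y), and the law of total expectation, can be checked on a small discrete case. The sketch below assumes a hypothetical setup of two fair dice, with Y the first die and X the total; none of these choices comes from the text above.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: two fair dice, 36 equally likely outcomes (first, second).
sample_space = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)


def X(w):
    return w[0] + w[1]  # total of the two dice


def Y(w):
    return w[0]  # value of the first die


def g(y):
    """g(y) = E(X | Y = y), an ordinary function of the number y."""
    matching = [w for w in sample_space if Y(w) == y]
    return sum(X(w) * p for w in matching) / (len(matching) * p)


def cond_exp(w):
    """E(X | Y) as a random variable: the outcome w is sent to g(Y(w))."""
    return g(Y(w))


e_x = sum(X(w) * p for w in sample_space)
e_cond = sum(cond_exp(w) * p for w in sample_space)
print(g(1), e_x, e_cond)  # 9/2 7 7
```

Here g(y) works out to y + 7/2, and averaging g(Y) over the distribution of Y recovers E(X), as the law of total expectation states.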
Conditional expectation of a random variable given a sigma-algebra
There is also the notion of the conditional expected value of X given a sigma-algebra G, denoted E(X | G), defined when E(|X|) < ∞. This is a random variable that is G-measurable and whose integral over any set in G is the same as the integral of X over that set. The existence of this conditional expectation follows from the Radon-Nikodym theorem; it is unique up to sets of probability zero. If X happens to be G-measurable, then E(X | G) = X almost surely.
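On a finite sample space the defining property can be verified directly. The following sketch assumes, for illustration only, a six-point sample space with uniform probability and a sigma-algebra G generated by a partition into three blocks; in that case E(X | G) is the probability-weighted average of X on each block. The names `blocks`, `cond_exp_given_partition` and `integral` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed toy model: six equally likely points; G is generated by a partition.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
blocks = [{0, 1}, {2, 3, 4}, {5}]


def cond_exp_given_partition(x):
    """E(x | G) as a dict w -> value; it is constant on each block of the partition."""
    result = {}
    for block in blocks:
        p_block = sum(prob[w] for w in block)
        average = sum(x[w] * prob[w] for w in block) / p_block
        for w in block:
            result[w] = average
    return result


def integral(x, event):
    """Integral of x over the event, i.e. the sum of x(w) P(w) for w in the event."""
    return sum(x[w] * prob[w] for w in event)


X = {w: Fraction(w * w) for w in omega}
Z = cond_exp_given_partition(X)

# Every set in G is a union of blocks (including the empty union).
g_sets = [set().union(*combo)
          for r in range(len(blocks) + 1)
          for combo in combinations(blocks, r)]
defining_property_holds = all(integral(Z, s) == integral(X, s) for s in g_sets)

# A G-measurable variable (constant on each block) is its own conditional expectation.
W = {0: 3, 1: 3, 2: 7, 3: 7, 4: 7, 5: -1}
fixed_point_holds = cond_exp_given_partition(W) == W
print(defining_property_holds, fixed_point_holds)
```

In the general (non-discrete) case no such explicit averaging formula is available, which is why existence is obtained from the Radon-Nikodym theorem instead.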
Hilbert space version
A more abstract take on conditional expectations (assuming, for example, the existence of first and second moments) comes from a general setting of two sigma-algebras F and G, with F being finer (that is, G is contained in F). In this case, if we form the Hilbert space H of square-integrable F-measurable functions with respect to a fixed probability measure on F, it contains a subspace consisting of (the classes of) the G-measurable functions. That subspace is closed, so there is an orthogonal projection operator onto it, and the conditional expectation of a function in H given G is its image under this projection.
For example, in the case of the ordinary expectation, we take G to be the trivial sigma-algebra, consisting only of the sample space Ω and the empty set. The G-measurable functions are then just the constants. Orthogonal projection of a typical random variable f onto the constants amounts to writing f as the constant random variable E(f) plus a remainder f − E(f) that is orthogonal to 1 and therefore has expectation 0.
Subject to the restriction on moments, this approach allows the formation of conditional expectations in general.
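The projection picture can be made concrete on a finite sample space, where the inner product of two random variables is E(fg). The sketch below is only an illustration under assumed choices (six equally likely points, a G generated by a three-block partition, and the hypothetical helper names `inner` and `project`); it checks that the remainder after projection is orthogonal to the subspace, both for the trivial sigma-algebra and for the finer one.

```python
from fractions import Fraction

# Assumed toy model: six equally likely points; G is generated by a partition.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
blocks = [{0, 1}, {2, 3, 4}, {5}]


def inner(f, g):
    """Inner product <f, g> = E(f g)."""
    return sum(f[w] * g[w] * prob[w] for w in omega)


def indicator(block):
    return {w: Fraction(int(w in block)) for w in omega}


def project(f, partition):
    """Orthogonal projection of f onto the functions constant on each block.

    The block indicators are mutually orthogonal, so the projection is the
    sum over blocks of (<f, 1_B> / <1_B, 1_B>) times 1_B.
    """
    out = {}
    for block in partition:
        ind = indicator(block)
        coeff = inner(f, ind) / inner(ind, ind)
        for w in block:
            out[w] = coeff
    return out


f = {w: Fraction(w * w) for w in omega}
one = {w: Fraction(1) for w in omega}

# Trivial sigma-algebra: a single block, so the projection is the constant E(f).
mean_f = project(f, [set(omega)])
remainder = {w: f[w] - mean_f[w] for w in omega}
orthogonal_to_one = inner(remainder, one) == 0

# Finer sigma-algebra G: the remainder is orthogonal to every block indicator.
proj_g = project(f, blocks)
remainder_g = {w: f[w] - proj_g[w] for w in omega}
orthogonal_to_g = all(inner(remainder_g, indicator(b)) == 0 for b in blocks)

# The projection is at least as close to f as the constant E(f), which also lies in the subspace.
closer = inner(remainder_g, remainder_g) <= inner(remainder, remainder)
print(mean_f[0], orthogonal_to_one, orthogonal_to_g, closer)
```

The block averages produced by `project` agree with the elementary formula for conditioning on each block, which is the finite-dimensional case of the statement that conditional expectation is an orthogonal projection.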
