Random variable with multiple component dimensions
In probability and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system; often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.
Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.
More formally, a multivariate random variable is a column vector $\mathbf{X} = (X_{1}, \ldots, X_{n})^{T}$ (or its transpose, which is a row vector) whose components are scalar-valued random variables on the same probability space as each other, $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the sample space, $\mathcal{F}$ is the sigma-algebra (the collection of all events), and $P$ is the probability measure (a function returning each event's probability).
Probability distribution
Every random vector gives rise to a probability measure on $\mathbb{R}^{n}$ with the Borel algebra as the underlying sigma-algebra. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.
The distributions of each of the component random variables $X_{i}$ are called marginal distributions. The conditional probability distribution of $X_{i}$ given $X_{j}$ is the probability distribution of $X_{i}$ when $X_{j}$ is known to be a particular value.
The cumulative distribution function $F_{\mathbf{X}}\colon \mathbb{R}^{n} \to [0,1]$ of a random vector $\mathbf{X} = (X_{1}, \ldots, X_{n})^{T}$ is defined as[1]: p.15

$$F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_{1} \leq x_{1}, \ldots, X_{n} \leq x_{n})$$ (Eq.1)

where $\mathbf{x} = (x_{1}, \ldots, x_{n})^{T}$.
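As an illustration of Eq.1 (not part of the cited source), the joint CDF at a point can be estimated by the fraction of sampled vectors lying componentwise below it. The minimal NumPy sketch below assumes a 2-dimensional random vector with independent standard normal components; the sample size and evaluation point are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative draws of a 2-dimensional random vector X = (X1, X2)^T
# with independent standard normal components.
samples = rng.standard_normal((100_000, 2))

def empirical_cdf(samples, x):
    """Estimate F_X(x) = P(X1 <= x1, ..., Xn <= xn) from sampled rows."""
    return np.mean(np.all(samples <= x, axis=1))

x = np.array([0.5, -0.2])
print("empirical F_X(x):", empirical_cdf(samples, x))
```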
Operations on random vectors
Random vectors can be subjected to the same kinds of algebraic operations as can non-random vectors: addition, subtraction, multiplication by a scalar, and the taking of inner products.
Affine transformations
Similarly, a new random vector $\mathbf{Y}$ can be defined by applying an affine transformation $g\colon \mathbb{R}^{n} \to \mathbb{R}^{n}$ to a random vector $\mathbf{X}$:

$$\mathbf{Y} = A\mathbf{X} + b,$$

where $A$ is an $n \times n$ matrix and $b$ is an $n \times 1$ column vector.

If $A$ is an invertible matrix and $\mathbf{X}$ has a probability density function $f_{\mathbf{X}}$, then the probability density of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(y) = \frac{f_{\mathbf{X}}\!\left(A^{-1}(y - b)\right)}{|\det A|}.$$
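As a sketch of this rule (an illustration, not taken from the cited sources): if $\mathbf{X}$ is standard normal in two dimensions, then $\mathbf{Y} = A\mathbf{X} + b$ is exactly Gaussian with mean $b$ and covariance $AA^{T}$, so the change-of-variables density can be checked against the exact density of $\mathbf{Y}$. The matrix, vector, and test point below are arbitrary.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density, written out explicitly."""
    d = x - mean
    k = len(mean)
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))

# Illustrative 2-dimensional example: X ~ N(0, I), Y = A X + b.
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])                      # invertible matrix (assumption)
b = np.array([1.0, -1.0])

def f_X(x):
    return gaussian_pdf(x, np.zeros(2), np.eye(2))

def f_Y(y):
    """Density of Y = A X + b via the formula f_X(A^{-1}(y - b)) / |det A|."""
    x = np.linalg.solve(A, y - b)
    return f_X(x) / abs(np.linalg.det(A))

# Cross-check: here Y is exactly N(b, A A^T), so both evaluations must agree.
y = np.array([0.3, 0.7])
print(f_Y(y), gaussian_pdf(y, b, A @ A.T))
```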
Invertible mappings
More generally, we can study invertible mappings of random vectors.[2]: p.290–291

Let $g$ be a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathbb{R}^{n}$ onto a subset $\mathcal{R}$ of $\mathbb{R}^{n}$, let $g$ have continuous partial derivatives in $\mathcal{D}$ and let the Jacobian determinant of $g$ be zero at no point of $\mathcal{D}$. Assume that the real random vector $\mathbf{X}$ has a probability density function $f_{\mathbf{X}}(\mathbf{x})$ and satisfies $P(\mathbf{X} \in \mathcal{D}) = 1$. Then the random vector $\mathbf{Y} = g(\mathbf{X})$ is of probability density

$$\left. f_{\mathbf{Y}}(\mathbf{y}) = \frac{f_{\mathbf{X}}(\mathbf{x})}{\left|\det \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}\right|} \right|_{\mathbf{x} = g^{-1}(\mathbf{y})} \mathbf{1}(\mathbf{y} \in R_{\mathbf{Y}})$$

where $\mathbf{1}$ denotes the indicator function and the set $R_{\mathbf{Y}} = \{\mathbf{y} = g(\mathbf{x}) : f_{\mathbf{X}}(\mathbf{x}) > 0\} \subseteq \mathcal{R}$ denotes the support of $\mathbf{Y}$.
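A minimal sketch of this change-of-variables formula for a nonlinear map (an illustration, not from the cited source): with $\mathbf{X}$ uniform on the open unit square and $g(x_1, x_2) = (-\ln x_1, -\ln x_2)$, the components of $\mathbf{Y} = g(\mathbf{X})$ are independent Exp(1) random variables, so the formula can be checked against $e^{-(y_1 + y_2)}$.

```python
import numpy as np

# Illustrative nonlinear example: X uniform on the open unit square (0,1)^2,
# Y = g(X) = (-ln X1, -ln X2). Then Y has independent Exp(1) components.

def f_X(x):
    """Density of X: 1 inside (0,1)^2, 0 outside."""
    return float(np.all((x > 0) & (x < 1)))

def g_inv(y):
    return np.exp(-y)                           # x = g^{-1}(y)

def f_Y(y):
    """Change-of-variables density of Y = g(X)."""
    x = g_inv(y)
    # Jacobian of g at x is diag(-1/x1, -1/x2), so |det dy/dx| = 1/(x1*x2).
    jac_det = 1.0 / np.prod(x)
    return f_X(x) / jac_det

y = np.array([0.4, 1.3])
print(f_Y(y), np.exp(-y.sum()))                 # both should equal e^{-(y1+y2)}
```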
Expected value
The expected value or mean of a random vector $\mathbf{X}$ is a fixed vector $\operatorname{E}[\mathbf{X}]$ whose elements are the expected values of the respective random variables.[3]: p.333

$$\operatorname{E}[\mathbf{X}] = (\operatorname{E}[X_{1}], \ldots, \operatorname{E}[X_{n}])^{\mathrm{T}}$$ (Eq.2)
Covariance and cross-covariance
Definitions
The covariance matrix (also called second central moment or variance-covariance matrix) of an $n \times 1$ random vector is an $n \times n$ matrix whose (i,j)th element is the covariance between the i-th and the j-th random variables. The covariance matrix is the expected value, element by element, of the $n \times n$ matrix computed as $[\mathbf{X} - \operatorname{E}[\mathbf{X}]][\mathbf{X} - \operatorname{E}[\mathbf{X}]]^{T}$, where the superscript T refers to the transpose of the indicated vector:[2]: p. 464 [3]: p.335

$$\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{Var}[\mathbf{X}] = \operatorname{E}[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{X} - \operatorname{E}[\mathbf{X}])^{T}] = \operatorname{E}[\mathbf{X}\mathbf{X}^{T}] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{T}$$ (Eq.3)

By extension, the cross-covariance matrix between two random vectors $\mathbf{X}$ and $\mathbf{Y}$ ($\mathbf{X}$ having $n$ elements and $\mathbf{Y}$ having $p$ elements) is the $n \times p$ matrix[3]: p.336

$$\operatorname{K}_{\mathbf{X}\mathbf{Y}} = \operatorname{Cov}[\mathbf{X},\mathbf{Y}] = \operatorname{E}[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{Y} - \operatorname{E}[\mathbf{Y}])^{T}] = \operatorname{E}[\mathbf{X}\mathbf{Y}^{T}] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{T}$$ (Eq.4)

where again the matrix expectation is taken element-by-element in the matrix. Here the (i,j)th element is the covariance between the i-th element of $\mathbf{X}$ and the j-th element of $\mathbf{Y}$.
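The expectations in Eq.3 and Eq.4 can be approximated by sample averages. The sketch below (dimensions, means, and covariances are arbitrary illustrative choices) builds the $n \times n$ sample covariance matrix and the $n \times p$ sample cross-covariance matrix, with rows as observations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative draws: X has n = 3 components, Y has p = 2 components.
N = 200_000
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.5],
                                 [0.0, 0.5, 1.5]],
                            size=N)
Y = X[:, :2] + rng.standard_normal((N, 2))   # correlated with X by construction

# Sample versions of Eq.3 and Eq.4 (rows are observations, columns components).
EX, EY = X.mean(axis=0), Y.mean(axis=0)
K_XX = (X - EX).T @ (X - EX) / N             # ~ E[(X-EX)(X-EX)^T], 3x3
K_XY = (X - EX).T @ (Y - EY) / N             # ~ E[(X-EX)(Y-EY)^T], 3x2

print(K_XX.shape, K_XY.shape)                # (3, 3) (3, 2)
print(np.allclose(K_XX, K_XX.T))             # covariance matrix is symmetric
```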
Properties
The covariance matrix is a symmetric matrix, i.e.[2]: p. 466

$$\operatorname{K}_{\mathbf{X}\mathbf{X}}^{T} = \operatorname{K}_{\mathbf{X}\mathbf{X}}.$$

The covariance matrix is a positive semidefinite matrix, i.e.[2]: p. 465

$$\mathbf{a}^{T}\operatorname{K}_{\mathbf{X}\mathbf{X}}\mathbf{a} \geq 0 \quad \text{for all } \mathbf{a} \in \mathbb{R}^{n}.$$

The cross-covariance matrix $\operatorname{Cov}[\mathbf{Y},\mathbf{X}]$ is simply the transpose of the matrix $\operatorname{Cov}[\mathbf{X},\mathbf{Y}]$, i.e.

$$\operatorname{K}_{\mathbf{Y}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{Y}}^{T}.$$
Uncorrelatedness
Two random vectors $\mathbf{X} = (X_{1}, \ldots, X_{m})^{T}$ and $\mathbf{Y} = (Y_{1}, \ldots, Y_{n})^{T}$ are called uncorrelated if

$$\operatorname{E}[\mathbf{X}\mathbf{Y}^{T}] = \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{T}.$$

They are uncorrelated if and only if their cross-covariance matrix $\operatorname{K}_{\mathbf{X}\mathbf{Y}}$ is zero.[3]: p.337
Correlation and cross-correlation
Definitions
The correlation matrix (also called second moment) of an $n \times 1$ random vector is an $n \times n$ matrix whose (i,j)th element is the correlation between the i-th and the j-th random variables. The correlation matrix is the expected value, element by element, of the $n \times n$ matrix computed as $\mathbf{X}\mathbf{X}^{T}$, where the superscript T refers to the transpose of the indicated vector:[4]: p.190 [3]: p.334

$$\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[\mathbf{X}\mathbf{X}^{\mathrm{T}}]$$ (Eq.5)

By extension, the cross-correlation matrix between two random vectors $\mathbf{X}$ and $\mathbf{Y}$ ($\mathbf{X}$ having $n$ elements and $\mathbf{Y}$ having $p$ elements) is the $n \times p$ matrix

$$\operatorname{R}_{\mathbf{X}\mathbf{Y}} = \operatorname{E}[\mathbf{X}\mathbf{Y}^{T}]$$ (Eq.6)
Properties
The correlation matrix is related to the covariance matrix by

$$\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{X}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{T}.$$

Similarly for the cross-correlation matrix and the cross-covariance matrix:

$$\operatorname{R}_{\mathbf{X}\mathbf{Y}} = \operatorname{K}_{\mathbf{X}\mathbf{Y}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{T}$$
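A quick numerical check of the relation $\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{X}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{T}$ on simulated data (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

N = 200_000
X = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.4], [0.4, 2.0]], size=N)

EX = X.mean(axis=0)
R_XX = X.T @ X / N                            # sample version of E[X X^T]
K_XX = (X - EX).T @ (X - EX) / N              # sample covariance matrix

# R_XX = K_XX + E[X] E[X]^T (up to sampling error)
print(np.max(np.abs(R_XX - (K_XX + np.outer(EX, EX)))))
```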
Orthogonality
Two random vectors of the same size $\mathbf{X}$ and $\mathbf{Y}$ are called orthogonal if

$$\operatorname{E}[\mathbf{X}^{T}\mathbf{Y}] = 0.$$
Independence
Two random vectors $\mathbf{X} = (X_{1}, \ldots, X_{m})^{T}$ and $\mathbf{Y} = (Y_{1}, \ldots, Y_{n})^{T}$ are called independent if for all $\mathbf{x}$ and $\mathbf{y}$

$$F_{\mathbf{X,Y}}(\mathbf{x,y}) = F_{\mathbf{X}}(\mathbf{x}) \cdot F_{\mathbf{Y}}(\mathbf{y})$$

where $F_{\mathbf{X}}(\mathbf{x})$ and $F_{\mathbf{Y}}(\mathbf{y})$ denote the cumulative distribution functions of $\mathbf{X}$ and $\mathbf{Y}$ and $F_{\mathbf{X,Y}}(\mathbf{x,y})$ denotes their joint cumulative distribution function. Independence of $\mathbf{X}$ and $\mathbf{Y}$ is often denoted by $\mathbf{X} \perp\!\!\!\perp \mathbf{Y}$. Written component-wise, $\mathbf{X}$ and $\mathbf{Y}$ are called independent if for all $x_{1}, \ldots, x_{m}, y_{1}, \ldots, y_{n}$

$$F_{X_{1},\ldots,X_{m},Y_{1},\ldots,Y_{n}}(x_{1},\ldots,x_{m},y_{1},\ldots,y_{n}) = F_{X_{1},\ldots,X_{m}}(x_{1},\ldots,x_{m}) \cdot F_{Y_{1},\ldots,Y_{n}}(y_{1},\ldots,y_{n}).$$
Characteristic function
The characteristic function of a random vector $\mathbf{X}$ with $n$ components is a function $\mathbb{R}^{n} \to \mathbb{C}$ that maps every vector $\mathbf{\omega} = (\omega_{1}, \ldots, \omega_{n})^{T}$ to a complex number. It is defined by[2]: p. 468

$$\varphi_{\mathbf{X}}(\mathbf{\omega}) = \operatorname{E}\left[e^{i(\mathbf{\omega}^{T}\mathbf{X})}\right] = \operatorname{E}\left[e^{i(\omega_{1}X_{1} + \cdots + \omega_{n}X_{n})}\right].$$
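For a zero-mean Gaussian random vector the characteristic function has the closed form $e^{-\frac{1}{2}\mathbf{\omega}^{T}K_{\mathbf{X}\mathbf{X}}\mathbf{\omega}}$, so a Monte Carlo estimate of $\operatorname{E}[e^{i\mathbf{\omega}^{T}\mathbf{X}}]$ can be compared against it. The sketch below uses arbitrary illustrative parameters and is not part of the cited source.

```python
import numpy as np

rng = np.random.default_rng(3)

K = np.array([[1.0, 0.6],
              [0.6, 2.0]])
X = rng.multivariate_normal(np.zeros(2), K, size=200_000)

def phi_mc(omega):
    """Monte Carlo estimate of E[exp(i * omega^T X)]."""
    return np.mean(np.exp(1j * X @ omega))

omega = np.array([0.7, -0.3])
exact = np.exp(-0.5 * omega @ K @ omega)      # closed form for zero-mean Gaussian X
print(phi_mc(omega), exact)
```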
Further properties
Expectation of a quadratic form
One can take the expectation of a quadratic form in the random vector $\mathbf{X}$ as follows:[5]: p.170–171

$$\operatorname{E}[\mathbf{X}^{T}A\mathbf{X}] = \operatorname{E}[\mathbf{X}]^{T}A\operatorname{E}[\mathbf{X}] + \operatorname{tr}(AK_{\mathbf{X}\mathbf{X}}),$$

where $K_{\mathbf{X}\mathbf{X}}$ is the covariance matrix of $\mathbf{X}$ and $\operatorname{tr}$ refers to the trace of a matrix, that is, the sum of the elements on its main diagonal (from upper left to lower right). Since the quadratic form is a scalar, so is its expectation.
Proof: Let $z$ be an $m \times 1$ random vector with $\operatorname{E}[z] = \mu$ and $\operatorname{Cov}[z] = V$ and let $A$ be an $m \times m$ non-stochastic matrix.

Then based on the formula for the covariance, if we denote $z^{T} = X$ and $z^{T}A^{T} = Y$, we see that:

$$\operatorname{Cov}[X,Y] = \operatorname{E}[XY^{T}] - \operatorname{E}[X]\operatorname{E}[Y]^{T}$$

Hence

$$\begin{aligned}\operatorname{E}[XY^{T}] &= \operatorname{Cov}[X,Y] + \operatorname{E}[X]\operatorname{E}[Y]^{T}\\ \operatorname{E}[z^{T}Az] &= \operatorname{Cov}[z^{T},z^{T}A^{T}] + \operatorname{E}[z^{T}]\operatorname{E}[z^{T}A^{T}]^{T}\\ &= \operatorname{Cov}[z^{T},z^{T}A^{T}] + \mu^{T}(\mu^{T}A^{T})^{T}\\ &= \operatorname{Cov}[z^{T},z^{T}A^{T}] + \mu^{T}A\mu,\end{aligned}$$

which leaves us to show that

$$\operatorname{Cov}[z^{T},z^{T}A^{T}] = \operatorname{tr}(AV).$$

This is true based on the fact that one can cyclically permute matrices when taking a trace without changing the end result (e.g.: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$).

We see that

$$\begin{aligned}\operatorname{Cov}[z^{T},z^{T}A^{T}] &= \operatorname{E}\left[\left(z^{T} - \operatorname{E}(z^{T})\right)\left(z^{T}A^{T} - \operatorname{E}\left(z^{T}A^{T}\right)\right)^{T}\right]\\ &= \operatorname{E}\left[(z^{T} - \mu^{T})(z^{T}A^{T} - \mu^{T}A^{T})^{T}\right]\\ &= \operatorname{E}\left[(z - \mu)^{T}(Az - A\mu)\right].\end{aligned}$$

And since

$$(z - \mu)^{T}(Az - A\mu)$$

is a scalar, then

$$(z - \mu)^{T}(Az - A\mu) = \operatorname{tr}\left((z - \mu)^{T}(Az - A\mu)\right) = \operatorname{tr}\left((z - \mu)^{T}A(z - \mu)\right)$$

trivially. Using the permutation we get:

$$\operatorname{tr}\left((z - \mu)^{T}A(z - \mu)\right) = \operatorname{tr}\left(A(z - \mu)(z - \mu)^{T}\right),$$

and by plugging this into the original formula we get:

$$\begin{aligned}\operatorname{Cov}\left[z^{T},z^{T}A^{T}\right] &= \operatorname{E}\left[(z - \mu)^{T}(Az - A\mu)\right]\\ &= \operatorname{E}\left[\operatorname{tr}\left(A(z - \mu)(z - \mu)^{T}\right)\right]\\ &= \operatorname{tr}\left(A \cdot \operatorname{E}\left((z - \mu)(z - \mu)^{T}\right)\right)\\ &= \operatorname{tr}(AV).\end{aligned}$$
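The identity $\operatorname{E}[\mathbf{X}^{T}A\mathbf{X}] = \operatorname{E}[\mathbf{X}]^{T}A\operatorname{E}[\mathbf{X}] + \operatorname{tr}(AK_{\mathbf{X}\mathbf{X}})$ can also be checked numerically on simulated draws; in the sketch below the mean vector, covariance matrix, and matrix A are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -0.5, 2.0])
K = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.3],
              [0.0, 0.3, 1.5]])
A = rng.standard_normal((3, 3))                   # arbitrary non-stochastic matrix

X = rng.multivariate_normal(mu, K, size=500_000)
mc = np.mean(np.einsum('ni,ij,nj->n', X, A, X))   # Monte Carlo E[X^T A X]
formula = mu @ A @ mu + np.trace(A @ K)
print(mc, formula)
```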
Expectation of the product of two different quadratic forms
One can take the expectation of the product of two different quadratic forms in a zero-mean Gaussian random vector $\mathbf{X}$ as follows:[5]: pp. 162–176

$$\operatorname{E}\left[(\mathbf{X}^{T}A\mathbf{X})(\mathbf{X}^{T}B\mathbf{X})\right] = 2\operatorname{tr}(AK_{\mathbf{X}\mathbf{X}}BK_{\mathbf{X}\mathbf{X}}) + \operatorname{tr}(AK_{\mathbf{X}\mathbf{X}})\operatorname{tr}(BK_{\mathbf{X}\mathbf{X}})$$

where again $K_{\mathbf{X}\mathbf{X}}$ is the covariance matrix of $\mathbf{X}$. Again, since both quadratic forms are scalars and hence their product is a scalar, the expectation of their product is also a scalar.
Applications
Portfolio theory
In portfolio theory in finance, an objective often is to choose a portfolio of risky assets such that the distribution of the random portfolio return has desirable properties. For example, one might want to choose the portfolio return having the lowest variance for a given expected value. Here the random vector is the vector $\mathbf{r}$ of random returns on the individual assets, and the portfolio return p (a random scalar) is the inner product of the vector of random returns with a vector w of portfolio weights, the fractions of the portfolio placed in the respective assets. Since $p = w^{T}\mathbf{r}$, the expected value of the portfolio return is $w^{T}\operatorname{E}(\mathbf{r})$ and the variance of the portfolio return can be shown to be $w^{T}Cw$, where $C$ is the covariance matrix of $\mathbf{r}$.
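A minimal numeric sketch of these portfolio formulas, with hypothetical expected returns, covariance matrix C, and weights w (none of the figures come from the article):

```python
import numpy as np

# Hypothetical figures for three risky assets.
mean_returns = np.array([0.08, 0.05, 0.12])      # E[r]
C = np.array([[0.10, 0.02, 0.04],                # covariance matrix of r
              [0.02, 0.06, 0.01],
              [0.04, 0.01, 0.20]])
w = np.array([0.5, 0.3, 0.2])                    # portfolio weights (sum to 1)

expected_return = w @ mean_returns               # w^T E(r)
variance = w @ C @ w                             # w^T C w
print(expected_return, variance)
```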
Regression theory
In linear regression theory, we have data consisting of n observations on a dependent variable y and n observations on each of k independent variables $x_{j}$. The observations on the dependent variable are stacked into a column vector y; the observations on each independent variable are also stacked into column vectors, and these latter column vectors are combined into a design matrix X (not denoting a random vector in this context) of observations on the independent variables. Then the following regression equation is postulated as a description of the process that generated the data:
$$y = X\beta + e,$$

where β is a postulated fixed but unknown vector of k response coefficients, and e is an unknown random vector reflecting random influences on the dependent variable. By some chosen technique such as ordinary least squares, a vector $\hat{\beta}$ is chosen as an estimate of β, and the estimate of the vector e, denoted $\hat{e}$, is computed as

$$\hat{e} = y - X\hat{\beta}.$$

Then the statistician must analyze the properties of $\hat{\beta}$ and $\hat{e}$, which are viewed as random vectors since a randomly different selection of n cases to observe would have resulted in different values for them.
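A short simulation sketch of this setup (the design matrix, true coefficients, and noise below are illustrative, not from the cited sources): ordinary least squares gives $\hat{\beta}$, and the residual vector $\hat{e} = y - X\hat{\beta}$ follows directly.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data: n observations, k independent variables (all values illustrative).
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])  # design matrix
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.standard_normal(n)       # y = X beta + e

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None) # ordinary least squares estimate
e_hat = y - X @ beta_hat                         # estimated residual vector
print(beta_hat)
print(X.T @ e_hat)                               # residuals orthogonal to regressors (~ 0)
```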
Vector time series
The evolution of a k×1 random vector $\mathbf{X}$ through time can be modelled as a vector autoregression (VAR) as follows:

$$\mathbf{X}_{t} = c + A_{1}\mathbf{X}_{t-1} + A_{2}\mathbf{X}_{t-2} + \cdots + A_{p}\mathbf{X}_{t-p} + \mathbf{e}_{t},$$

where the i-periods-back vector observation $\mathbf{X}_{t-i}$ is called the i-th lag of $\mathbf{X}$, c is a k × 1 vector of constants (intercepts), $A_{i}$ is a time-invariant k × k matrix and $\mathbf{e}_{t}$ is a k × 1 random vector of error terms.
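A minimal simulation sketch of a VAR(1) process, i.e. the model above with p = 1 (the coefficient values are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(6)

# VAR(1) with k = 2: X_t = c + A1 X_{t-1} + e_t.
c = np.array([0.1, -0.2])
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])                      # eigenvalues inside the unit circle
T = 1000
X = np.zeros((T, 2))
for t in range(1, T):
    e_t = rng.standard_normal(2) * 0.1           # random error term
    X[t] = c + A1 @ X[t - 1] + e_t

print(X.mean(axis=0))                            # approximate unconditional mean
print(np.linalg.solve(np.eye(2) - A1, c))        # theoretical mean (I - A1)^{-1} c
```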
References
- ^ Gallager, Robert G. (2013). Stochastic Processes Theory for Applications. Cambridge University Press. ISBN 978-1-107-03975-9.
- ^ a b c d e Lapidoth, Amos (2009). A Foundation in Digital Communication. Cambridge University Press. ISBN 978-0-521-19395-5.
- ^ a b c d e Gubner, John A. (2006). Probability and Random Processes for Electrical and Computer Engineers. Cambridge University Press. ISBN 978-0-521-86470-1.
- ^ Papoulis, Athanasios (1991). Probability, Random Variables and Stochastic Processes (Third ed.). McGraw-Hill. ISBN 0-07-048477-5.
- ^ a b Kendrick, David (1981). Stochastic Control for Economic Models. McGraw-Hill. ISBN 0-07-033962-7.
Further reading
- Stark, Henry; Woods, John W. (2012). "Random Vectors". Probability, Statistics, and Random Processes for Engineers (Fourth ed.). Pearson. pp. 295–339. ISBN 978-0-13-231123-6.