Quantum relative entropy

In quantum information theory, quantum relative entropy is a measure of distinguishability between two quantum states. It is the quantum mechanical analog of relative entropy.

Motivation

For simplicity, it will be assumed that all objects in the article are finite-dimensional.

We first discuss the classical case. Suppose the probabilities of a finite sequence of events are given by the probability distribution P = {p1, ..., pn}, but we mistakenly assume it to be Q = {q1, ..., qn}. For instance, we can mistake an unfair coin for a fair one. Under this erroneous assumption, our uncertainty about the j-th event, or equivalently the amount of information gained by observing the j-th event, is

$$-\log q_{j}.$$

The (assumed) average uncertainty of all possible events is then

$$-\sum_{j} p_{j}\log q_{j}.$$

On the other hand, the Shannon entropy of the probability distribution P, defined by

$$-\sum_{j} p_{j}\log p_{j},$$

is the real amount of uncertainty before observation. Therefore the difference between these two quantities

$$-\sum_{j} p_{j}\log q_{j}-\left(-\sum_{j} p_{j}\log p_{j}\right)=\sum_{j} p_{j}\log p_{j}-\sum_{j} p_{j}\log q_{j}$$

is a measure of the distinguishability of the two probability distributions P and Q. This is precisely the classical relative entropy, or Kullback–Leibler divergence:

$$D_{\mathrm{KL}}(P\|Q)=\sum_{j} p_{j}\log\frac{p_{j}}{q_{j}}.$$

Note

  1. In the definitions above, the convention 0·log 0 = 0 is assumed, since $\lim_{x\searrow 0} x\log(x)=0$. Intuitively, one would expect an event of zero probability to contribute nothing towards entropy.
  2. The relative entropy is not a metric. For example, it is not symmetric. The uncertainty discrepancy in mistaking a fair coin for an unfair one is not the same as in the opposite situation.
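
The two notes above can be illustrated with a minimal Python sketch. The function name `kl_divergence` and the biased coin with probabilities (0.9, 0.1) are illustrative assumptions, not part of the article; natural logarithms are used, and the case p_j > 0 with q_j = 0 is treated as infinite, following the usual convention.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats, using the convention 0 * log 0 = 0."""
    total = 0.0
    for p_j, q_j in zip(p, q):
        if p_j == 0.0:
            continue              # a zero-probability event contributes nothing
        if q_j == 0.0:
            return math.inf       # assumed convention: p_j > 0 but q_j = 0
        total += p_j * math.log(p_j / q_j)
    return total

fair = [0.5, 0.5]
biased = [0.9, 0.1]               # hypothetical unfair coin

d_fair_biased = kl_divergence(fair, biased)   # true coin fair, assumed biased
d_biased_fair = kl_divergence(biased, fair)   # true coin biased, assumed fair
print(d_fair_biased, d_biased_fair)
```

The two printed values differ, which is the asymmetry mentioned in note 2.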

Definition

As with many other objects in quantum information theory, quantum relative entropy is defined by extending the classical definition from probability distributions to density matrices. Let ρ be a density matrix. The von Neumann entropy of ρ, which is the quantum mechanical analog of the Shannon entropy, is given by

$$S(\rho)=-\operatorname{Tr}\rho\log\rho.$$

For two density matrices ρ and σ, the quantum relative entropy of ρ with respect to σ is defined by

$$S(\rho\|\sigma)=-\operatorname{Tr}\rho\log\sigma-S(\rho)=\operatorname{Tr}\rho\log\rho-\operatorname{Tr}\rho\log\sigma=\operatorname{Tr}\rho(\log\rho-\log\sigma).$$

We see that, when the states are classically related, i.e. ρσ = σρ, the definition coincides with the classical case. Because $\rho$ and $\sigma$ commute, they are simultaneously diagonalizable: $\rho=SD_{1}S^{*}$ and $\sigma=SD_{2}S^{*}$ for some unitary $S$, with $D_{1}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{n})$ and $D_{2}=\operatorname{diag}(\mu_{1},\ldots,\mu_{n})$. Then $S(\rho\|\sigma)=\sum_{j=1}^{n}\lambda_{j}\log\left(\frac{\lambda_{j}}{\mu_{j}}\right)$ is just the ordinary Kullback–Leibler divergence of the probability vector $(\lambda_{1},\ldots,\lambda_{n})$ with respect to the probability vector $(\mu_{1},\ldots,\mu_{n})$.
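
As a rough numerical check of the commuting case, the following NumPy sketch computes the quantum relative entropy from the eigendecompositions of the two states and compares it with the classical sum. The helper name `relative_entropy`, the tolerance `tol`, and the specific eigenvalues are assumptions made for illustration. The helper returns infinity whenever an eigenvector of ρ with nonzero weight overlaps the kernel of σ.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho || sigma) = Tr rho (log rho - log sigma), natural log."""
    p, v = np.linalg.eigh(rho)
    q, w = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ w) ** 2      # |<v_i, w_j>|^2
    total = 0.0
    for i, p_i in enumerate(p):
        if p_i <= tol:
            continue                           # 0 log 0 = 0
        total += p_i * np.log(p_i)
        for j, q_j in enumerate(q):
            if overlap[i, j] <= tol:
                continue
            if q_j <= tol:
                return np.inf                  # rho has weight on ker(sigma)
            total -= p_i * overlap[i, j] * np.log(q_j)
    return float(total)

# Build two commuting states from a shared (here real orthogonal) basis S.
rng = np.random.default_rng(0)
S, _ = np.linalg.qr(rng.normal(size=(3, 3)))
lam = np.array([0.5, 0.3, 0.2])                # illustrative eigenvalues of rho
mu = np.array([0.6, 0.3, 0.1])                 # illustrative eigenvalues of sigma
rho = S @ np.diag(lam) @ S.conj().T
sigma = S @ np.diag(mu) @ S.conj().T

quantum = relative_entropy(rho, sigma)
classical = float(np.sum(lam * np.log(lam / mu)))
print(quantum, classical)
```

Under these assumptions the two numbers agree up to floating-point error.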

Non-finite (divergent) relative entropy

In general, the support of a matrix M is the orthogonal complement of its kernel, i.e. $\operatorname{supp}(M)=\ker(M)^{\perp}$. When considering the quantum relative entropy, we adopt the convention that −s · log 0 = ∞ for any s > 0. This leads to the definition that

$$S(\rho\|\sigma)=\infty$$

when

$$\operatorname{supp}(\rho)\cap\ker(\sigma)\neq\{0\}.$$

This can be interpreted in the following way. Informally, the quantum relative entropy is a measure of our ability to distinguish two quantum states, where larger values indicate states that are more different. Orthogonal states are as different as quantum states can be, and this is reflected in the infinite quantum relative entropy of orthogonal quantum states. Following the argument given in the Motivation section, if we erroneously assume the state is $\sigma$ while the true state $\rho$ has support in $\ker(\sigma)$, this is an error impossible to recover from.

However, one should be careful not to conclude that the divergence of the quantum relative entropy $S(\rho\|\sigma)$ implies that the states $\rho$ and $\sigma$ are orthogonal or even very different by other measures. Specifically, $S(\rho\|\sigma)$ can diverge when $\rho$ and $\sigma$ differ by a vanishingly small amount as measured by some norm. For example, let $\sigma$ have the diagonal representation

$$\sigma=\sum_{n}\lambda_{n}|f_{n}\rangle\langle f_{n}|$$

with $\lambda_{n}>0$ for $n=0,1,2,\ldots$ and $\lambda_{n}=0$ for $n=-1,-2,\ldots$, where $\{|f_{n}\rangle,n\in\mathbb{Z}\}$ is an orthonormal set. The kernel of $\sigma$ is the space spanned by the set $\{|f_{n}\rangle,n=-1,-2,\ldots\}$. Next let

$$\rho=\sigma+\epsilon|f_{-1}\rangle\langle f_{-1}|-\epsilon|f_{1}\rangle\langle f_{1}|$$

for a small positive number $\epsilon$ (small enough that $\rho$ remains positive, i.e. $\epsilon\leq\lambda_{1}$). As $\rho$ has support (namely the state $|f_{-1}\rangle$) in the kernel of $\sigma$, $S(\rho\|\sigma)$ is divergent even though the trace norm of the difference $(\rho-\sigma)$ is $2\epsilon$. This means that the difference between $\rho$ and $\sigma$ as measured by the trace norm is vanishingly small as $\epsilon\to 0$ even though $S(\rho\|\sigma)$ is divergent (i.e. infinite). This property of the quantum relative entropy represents a serious shortcoming if not treated with care.
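
A finite-dimensional version of this example can be sketched in NumPy. The three-dimensional truncation to the basis (f_{-1}, f_0, f_1), the eigenvalues 0.5 and 0.5, the value of `eps`, and the helper `relative_entropy` are all assumptions chosen for illustration.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho || sigma) in nats; infinite if rho has weight on ker(sigma)."""
    p, v = np.linalg.eigh(rho)
    q, w = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ w) ** 2
    total = 0.0
    for i, p_i in enumerate(p):
        if p_i <= tol:
            continue                           # 0 log 0 = 0
        total += p_i * np.log(p_i)
        for j, q_j in enumerate(q):
            if overlap[i, j] <= tol:
                continue
            if q_j <= tol:
                return np.inf                  # -s log 0 = infinity for s > 0
            total -= p_i * overlap[i, j] * np.log(q_j)
    return float(total)

eps = 1e-6
# Basis order: f_{-1}, f_0, f_1. sigma has f_{-1} in its kernel.
sigma = np.diag([0.0, 0.5, 0.5])
# Move weight eps from f_1 onto f_{-1}.
rho = np.diag([eps, 0.5, 0.5 - eps])

trace_norm = float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))
divergence = relative_entropy(rho, sigma)
print(trace_norm, divergence)
```

In this sketch the trace norm of the difference is 2·`eps` while the relative entropy is reported as infinite. Swapping the arguments gives a finite value, since the support of σ lies inside the support of ρ.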

Non-negativity of relative entropy

Corresponding classical statement

For the classical Kullback–Leibler divergence, it can be shown that

$$D_{\mathrm{KL}}(P\|Q)=\sum_{j} p_{j}\log\frac{p_{j}}{q_{j}}\geq 0,$$

with equality if and only if P = Q. Colloquially, this means that the uncertainty calculated using erroneous assumptions is never smaller than the real amount of uncertainty.

To show the inequality, we rewrite

$$D_{\mathrm{KL}}(P\|Q)=\sum_{j} p_{j}\log\frac{p_{j}}{q_{j}}=\sum_{j}\left(-\log\frac{q_{j}}{p_{j}}\right)p_{j}.$$

Notice that log is a concave function. Therefore -log is convex. Applying Jensen's inequality, we obtain

$$D_{\mathrm{KL}}(P\|Q)=\sum_{j}\left(-\log\frac{q_{j}}{p_{j}}\right)p_{j}\geq-\log\left(\sum_{j}\frac{q_{j}}{p_{j}}p_{j}\right)=0.$$

Jensen's inequality also states that equality holds if and only if, for all i, $q_{i}=\left(\sum_{j}q_{j}\right)p_{i}$, i.e. P = Q.

The result

Klein's inequality states that the quantum relative entropy

$$S(\rho\|\sigma)=\operatorname{Tr}\rho(\log\rho-\log\sigma)$$

is non-negative in general. It is zero if and only if ρ = σ.

Proof

Let ρ and σ have spectral decompositions

$$\rho=\sum_{i}p_{i}v_{i}v_{i}^{*},\quad\sigma=\sum_{i}q_{i}w_{i}w_{i}^{*}.$$

So

$$\log\rho=\sum_{i}(\log p_{i})v_{i}v_{i}^{*},\quad\log\sigma=\sum_{i}(\log q_{i})w_{i}w_{i}^{*}.$$

Direct calculation gives

$$\begin{aligned}S(\rho\|\sigma)&=\sum_{k}p_{k}\log p_{k}-\sum_{i,j}(p_{i}\log q_{j})|v_{i}^{*}w_{j}|^{2}\\&=\sum_{i}p_{i}\left(\log p_{i}-\sum_{j}\log q_{j}|v_{i}^{*}w_{j}|^{2}\right)\\&=\sum_{i}p_{i}\left(\log p_{i}-\sum_{j}(\log q_{j})P_{ij}\right),\end{aligned}$$

where $P_{ij}=|v_{i}^{*}w_{j}|^{2}$.

Since the matrix $(P_{ij})_{ij}$ is a doubly stochastic matrix and $-\log$ is a convex function, the above expression is

$$\geq\sum_{i}p_{i}\left(\log p_{i}-\log\left(\sum_{j}q_{j}P_{ij}\right)\right).$$

Define $r_{i}=\sum_{j}q_{j}P_{ij}$. Then $\{r_{i}\}$ is a probability distribution. From the non-negativity of classical relative entropy, we have

$$S(\rho\|\sigma)\geq\sum_{i}p_{i}\log\frac{p_{i}}{r_{i}}\geq 0.$$

The second part of the claim follows from the fact that, since -log is strictly convex, equality is achieved in

$$\sum_{i}p_{i}\left(\log p_{i}-\sum_{j}(\log q_{j})P_{ij}\right)\geq\sum_{i}p_{i}\left(\log p_{i}-\log\left(\sum_{j}q_{j}P_{ij}\right)\right)$$

if and only if $(P_{ij})$ is a permutation matrix, which implies ρ = σ, after a suitable labeling of the eigenvectors $\{v_{i}\}$ and $\{w_{i}\}$.
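
The chain of inequalities in the proof can be checked numerically on a random example. The sketch below is only an illustration under stated assumptions: the helper `random_density_matrix`, the dimension 3, and the seed are arbitrary choices, and the states are generated so that they are full rank and all logarithms are finite.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative construction)."""
    g = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    m = g @ g.conj().T                 # positive definite with probability 1
    return m / np.trace(m).real

rng = np.random.default_rng(1)
rho = random_density_matrix(3, rng)
sigma = random_density_matrix(3, rng)

p, v = np.linalg.eigh(rho)             # rho   = sum_i p_i v_i v_i^*
q, w = np.linalg.eigh(sigma)           # sigma = sum_j q_j w_j w_j^*
P = np.abs(v.conj().T @ w) ** 2        # P_ij = |v_i^* w_j|^2

r = P @ q                              # r_i = sum_j q_j P_ij
s_quantum = float(np.sum(p * (np.log(p) - P @ np.log(q))))
d_classical = float(np.sum(p * np.log(p / r)))

print(P.sum(axis=0), P.sum(axis=1))    # row and column sums of P
print(s_quantum, d_classical)
```

The row and column sums of `P` come out as 1, reflecting that it is doubly stochastic, and the printed values satisfy S(ρ‖σ) ≥ Σ p_i log(p_i / r_i) ≥ 0 for this sample.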

Joint convexity of relative entropy

The relative entropy is jointly convex. For $0\leq\lambda\leq 1$ and states $\rho_{1},\rho_{2},\sigma_{1},\sigma_{2}$ we have

$$S(\lambda\rho_{1}+(1-\lambda)\rho_{2}\|\lambda\sigma_{1}+(1-\lambda)\sigma_{2})\leq\lambda S(\rho_{1}\|\sigma_{1})+(1-\lambda)S(\rho_{2}\|\sigma_{2}).$$
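
Joint convexity can be spot-checked on random full-rank states. This is a numerical illustration, not a proof; the helpers `logm_psd`, `relative_entropy`, and `random_density_matrix`, as well as the mixing weight 0.3, are assumptions introduced for the sketch.

```python
import numpy as np

def logm_psd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.log(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma); assumes both states are full rank."""
    diff = logm_psd(rho) - logm_psd(sigma)
    return float(np.trace(rho @ diff).real)

def random_density_matrix(dim, rng):
    g = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(2)
rho1, rho2, sigma1, sigma2 = (random_density_matrix(3, rng) for _ in range(4))
lam = 0.3                              # arbitrary mixing weight in [0, 1]

lhs = relative_entropy(lam * rho1 + (1 - lam) * rho2,
                       lam * sigma1 + (1 - lam) * sigma2)
rhs = (lam * relative_entropy(rho1, sigma1)
       + (1 - lam) * relative_entropy(rho2, sigma2))
print(lhs, rhs)
```

For this sample the mixture's relative entropy (`lhs`) does not exceed the weighted average (`rhs`).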

Monotonicity of relative entropy

The relative entropy cannot increase under completely positive trace-preserving (CPTP) operations $\mathcal{N}$ on density matrices:

$$S(\mathcal{N}(\rho)\|\mathcal{N}(\sigma))\leq S(\rho\|\sigma).$$

This inequality is called the monotonicity of quantum relative entropy and was first proved by Lindblad.
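
As an illustration, the partial trace over one subsystem is a CPTP map, so the inequality can be spot-checked on random two-qubit states. The helper names and the random full-rank states below are assumptions of this sketch.

```python
import numpy as np

def logm_psd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.log(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma); assumes both states are full rank."""
    diff = logm_psd(rho) - logm_psd(sigma)
    return float(np.trace(rho @ diff).real)

def random_density_matrix(dim, rng):
    g = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def trace_out_b(rho_ab, dim_a, dim_b):
    """Partial trace over the second subsystem B."""
    r = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)   # indices (a, b, a', b')
    return np.trace(r, axis1=1, axis2=3)             # sum over b = b'

rng = np.random.default_rng(3)
rho_ab = random_density_matrix(4, rng)
sigma_ab = random_density_matrix(4, rng)

before = relative_entropy(rho_ab, sigma_ab)
after = relative_entropy(trace_out_b(rho_ab, 2, 2), trace_out_b(sigma_ab, 2, 2))
print(before, after)
```

Discarding subsystem B can only make the two states harder to distinguish, so `after` does not exceed `before` for this sample.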

An entanglement measure

Let a composite quantum system have state space

$$H=\otimes_{k}H_{k}$$

and ρ be a density matrix acting on H.

The relative entropy of entanglement of ρ is defined by

$$D_{\mathrm{REE}}(\rho)=\min_{\sigma}S(\rho\|\sigma)$$

where the minimum is taken over the family of separable states. A physical interpretation of the quantity is the optimal distinguishability of the state ρ from separable states.

Clearly, when ρ is not entangled

$$D_{\mathrm{REE}}(\rho)=0$$

by Klein's inequality.
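
Computing the minimum over all separable states is hard in general, but any single separable σ gives an upper bound on the relative entropy of entanglement. The sketch below evaluates one such bound for a two-qubit Bell state, using the hand-picked separable state σ = (|00⟩⟨00| + |11⟩⟨11|)/2. Both the choice of σ and the helper `relative_entropy` are assumptions of this example, and the code does not perform the minimization.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho || sigma) in nats; handles rank-deficient states."""
    p, v = np.linalg.eigh(rho)
    q, w = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ w) ** 2
    total = 0.0
    for i, p_i in enumerate(p):
        if p_i <= tol:
            continue                           # 0 log 0 = 0
        total += p_i * np.log(p_i)
        for j, q_j in enumerate(q):
            if overlap[i, j] <= tol:
                continue
            if q_j <= tol:
                return np.inf                  # rho has weight on ker(sigma)
            total -= p_i * overlap[i, j] * np.log(q_j)
    return float(total)

# Bell state (|00> + |11>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi, phi)

# Hand-picked separable candidate: equal mixture of |00><00| and |11><11|.
sigma = np.diag([0.5, 0.0, 0.0, 0.5])

upper_bound = relative_entropy(rho, sigma)
print(upper_bound, np.log(2))
```

For this candidate the bound evaluates to log 2 (in nats).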

Relation to other quantum information quantities

One reason the quantum relative entropy is useful is that several other important quantum information quantities are special cases of it. Theorems are often stated in terms of the quantum relative entropy and then lead to immediate corollaries concerning the other quantities. Below, we list some of these relations.

Let $\rho_{AB}$ be the joint state of a bipartite system with subsystem A of dimension $n_{A}$ and B of dimension $n_{B}$. Let $\rho_{A}$, $\rho_{B}$ be the respective reduced states, and $I_{A}$, $I_{B}$ the respective identities. The maximally mixed states are $I_{A}/n_{A}$ and $I_{B}/n_{B}$. Then it is possible to show with direct computation that

$$S(\rho_{A}\|I_{A}/n_{A})=\log(n_{A})-S(\rho_{A}),$$
$$S(\rho_{AB}\|\rho_{A}\otimes\rho_{B})=S(\rho_{A})+S(\rho_{B})-S(\rho_{AB})=I(A:B),$$
$$S(\rho_{AB}\|\rho_{A}\otimes I_{B}/n_{B})=\log(n_{B})+S(\rho_{A})-S(\rho_{AB})=\log(n_{B})-S(B|A),$$

where I(A:B) is the quantum mutual information and S(B|A) is the quantum conditional entropy.
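
These three identities can be verified numerically for a random two-qubit state. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, natural logarithms are used, and the random state is full rank so that every logarithm is finite.

```python
import numpy as np

def logm_psd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.log(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma); assumes both states are full rank."""
    return float(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real)

def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr rho log rho for a full-rank state."""
    vals = np.linalg.eigvalsh(rho)
    return float(-np.sum(vals * np.log(vals)))

rng = np.random.default_rng(4)
n_a, n_b = 2, 2
g = rng.normal(size=(4, 8)) + 1j * rng.normal(size=(4, 8))
rho_ab = g @ g.conj().T
rho_ab = rho_ab / np.trace(rho_ab).real

r = rho_ab.reshape(n_a, n_b, n_a, n_b)          # indices (a, b, a', b')
rho_a = np.trace(r, axis1=1, axis2=3)           # trace out B
rho_b = np.trace(r, axis1=0, axis2=2)           # trace out A

s_a, s_b, s_ab = entropy(rho_a), entropy(rho_b), entropy(rho_ab)

lhs1 = relative_entropy(rho_a, np.eye(n_a) / n_a)
lhs2 = relative_entropy(rho_ab, np.kron(rho_a, rho_b))
lhs3 = relative_entropy(rho_ab, np.kron(rho_a, np.eye(n_b) / n_b))

mutual_information = s_a + s_b - s_ab           # I(A:B)
conditional_entropy = s_ab - s_a                # S(B|A)
print(lhs1, lhs2, lhs3)
```

Each left-hand side matches the corresponding entropy expression up to floating-point error.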
