Article Open access Peer-reviewed

A convergent gambling estimate of the entropy of English

1978; Institute of Electrical and Electronics Engineers; Volume: 24; Issue: 4 Language: English

10.1109/tit.1978.1055912

ISSN

1557-9654

Authors

Thomas M. Cover, R. King

Topic(s)

Evolutionary Algorithms and Applications

Abstract

In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text. If S_{n} denotes the subject's capital after n bets at 27 for 1 odds, and if it is assumed that the subject knows the underlying probability distribution for the process X, then the entropy estimate is \hat{H}_{n}(X) = (1 - (1/n) \log_{27} S_{n}) \log_{2} 27 bits/symbol. If the subject does not know the true probability distribution for the stochastic process, then \hat{H}_{n}(X) is an asymptotic upper bound for the true entropy. If X is stationary, E\hat{H}_{n}(X) \rightarrow H(X), H(X) being the true entropy of the process. Moreover, if X is ergodic, then by the Shannon-McMillan-Breiman theorem \hat{H}_{n}(X) \rightarrow H(X) with probability one. Preliminary indications are that English text has an entropy of approximately 1.3 bits/symbol, which agrees well with Shannon's estimate.
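The estimator described in the abstract can be evaluated directly from a gambler's capital trajectory. The following Python sketch is not from the paper; it is a minimal illustration, assuming proportional (Kelly-style) bets on a hypothetical 27-symbol alphabet, of how the capital S_n after n bets at 27-for-1 odds yields the estimate \hat{H}_{n}(X) = (1 - (1/n) \log_{27} S_{n}) \log_{2} 27. The names `simulate`, `bettor`, and `source` are illustrative, not part of the authors' procedure.

```python
import math
import random
import string

ALPHABET_SIZE = 27  # 26 letters plus space, as in Shannon's experiments


def entropy_estimate(capital, n, alphabet_size=ALPHABET_SIZE):
    """Gambling estimate H_n = (1 - (1/n) log_m S_n) * log2(m),
    where S_n is the gambler's capital after n bets at m-for-1 odds."""
    return (1.0 - math.log(capital, alphabet_size) / n) * math.log2(alphabet_size)


def simulate(bettor, source, n):
    """Place proportional bets b(x) (summing to 1) on each next symbol.
    When symbol x occurs, the capital is multiplied by 27 * b(x).
    The bettor should put positive mass on every possible symbol,
    otherwise the capital can hit zero and the estimate diverges."""
    capital = 1.0
    history = []
    for _ in range(n):
        bets = bettor(history)   # dict: symbol -> fraction of capital wagered
        x = source(history)      # next symbol of the text / process
        capital *= ALPHABET_SIZE * bets.get(x, 0.0)
        history.append(x)
    return entropy_estimate(capital, n)


if __name__ == "__main__":
    # Hypothetical demo: i.i.d. symbols from a known, mildly skewed distribution.
    symbols = list(string.ascii_lowercase) + [" "]
    weights = [2.0 if s in "etaoin " else 1.0 for s in symbols]
    total = sum(weights)
    dist = {s: w / total for s, w in zip(symbols, weights)}

    rng = random.Random(0)

    def source(history):
        return rng.choices(symbols, weights=[dist[s] for s in symbols])[0]

    def bettor(history):
        # Proportional betting with the true distribution known (the case in
        # which the estimate converges to the true entropy).
        return dist

    true_h = -sum(p * math.log2(p) for p in dist.values())
    print("true entropy (bits/symbol):", round(true_h, 3))
    print("gambling estimate         :", round(simulate(bettor, source, 20000), 3))
```

For an i.i.d. source with known distribution p and bets b = p, the capital grows by a factor 27 p(x) per symbol, so (1/n) \log_{27} S_n = 1 + (1/n) \sum \log_{27} p(x_i) and the estimate reduces to the empirical average of -\log_2 p(x_i), which converges to H(p); a human bettor on real English text plays the same game without knowing p, which is why the estimate is only an asymptotic upper bound in that case.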
