A convergent gambling estimate of the entropy of English
1978; Institute of Electrical and Electronics Engineers; Volume: 24; Issue: 4; Language: English
10.1109/tit.1978.1055912
ISSN: 1557-9654
Authors; Topic(s): Evolutionary Algorithms and Applications
Abstract: In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text. If S_{n} denotes the subject's capital after n bets at 27 for 1 odds, and if it is assumed that the subject knows the underlying probability distribution for the process X, then the entropy estimate is \hat{H}_{n}(X) = (1 - (1/n)\log_{27} S_{n}) \log_{2} 27 bits/symbol. If the subject does not know the true probability distribution for the stochastic process, then \hat{H}_{n}(X) is an asymptotic upper bound for the true entropy. If X is stationary, E\hat{H}_{n}(X) \rightarrow H(X), H(X) being the true entropy of the process. Moreover, if X is ergodic, then by the Shannon-McMillan-Breiman theorem \hat{H}_{n}(X) \rightarrow H(X) with probability one. Preliminary indications are that English text has an entropy of approximately 1.3 bits/symbol, which agrees well with Shannon's estimate.
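As an illustration of the estimator described in the abstract (not taken from the paper itself), the following Python sketch simulates a proportional bettor who knows an assumed i.i.d. distribution over a 27-symbol alphabet, accumulates the log of the capital at 27-for-1 odds, and recovers the entropy via \hat{H}_{n}(X) = (1 - (1/n)\log_{27} S_{n}) \log_{2} 27. The function names and the toy distribution are hypothetical; real English text is neither i.i.d. nor known to the bettor, in which case the estimate is only an asymptotic upper bound.

```python
import math
import random

ALPHABET = 27  # 26 letters plus the space symbol, as in the paper


def entropy_estimate(log27_capital, n, alphabet=ALPHABET):
    """Gambling entropy estimate in bits/symbol:
    H_hat_n = (1 - (1/n) * log_27 S_n) * log2(27),
    where log27_capital = log_27 S_n after n bets at 27-for-1 odds."""
    return (1.0 - log27_capital / n) * math.log2(alphabet)


def simulate_log_capital(probs, n, seed=0):
    """Simulate a proportional bettor who knows the true i.i.d. distribution `probs`.
    Each round the capital is multiplied by alphabet * probs[x]; we accumulate
    log_27 of the capital instead of the capital itself to avoid overflow."""
    rng = random.Random(seed)
    symbols = list(range(len(probs)))
    log27_capital = 0.0
    for _ in range(n):
        x = rng.choices(symbols, weights=probs)[0]
        log27_capital += math.log(len(probs) * probs[x], len(probs))
    return log27_capital


if __name__ == "__main__":
    # Toy skewed distribution over 27 symbols (illustrative only, not English text).
    probs = [0.5] + [0.5 / 26] * 26
    true_entropy = -sum(p * math.log2(p) for p in probs)
    n = 20000
    log27_S_n = simulate_log_capital(probs, n)
    print(f"true entropy      : {true_entropy:.3f} bits/symbol")
    print(f"gambling estimate : {entropy_estimate(log27_S_n, n):.3f} bits/symbol")
```

Because the bettor wagers proportionally to the true distribution, the expected per-round gain in log_27 capital is 1 - H(X)/log_2 27, so the estimate converges to H(X) as n grows, matching the stationary and ergodic convergence claims in the abstract.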