Speech recognition for a digital video library
1998; Wiley; Volume: 49; Issue: 7; Language: English
10.1002/(sici)1097-4571(1998)49
ISSN: 1097-4571
Authors: Michael Witbrock, Alexander G. Hauptmann
Topic(s): Image Retrieval and Classification Techniques
Journal of the American Society for Information Science, Volume 49, Issue 7, pp. 619-632
Michael J. Witbrock (corresponding author), Artificial Intelligence Research Group, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213; Justresearch, 4616 Henry St., Pittsburgh, PA 15213
Alexander G. Hauptmann, Artificial Intelligence Research Group, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
First published: 06 January 1999
https://doi.org/10.1002/(SICI)1097-4571(19980515)49:7 3.0.CO;2-A
Citations: 10
Abstract: The standard method for making the full content of audio and video material searchable is to annotate it with human-generated meta-data that describes the content in a way that the search can understand, as is done in the creation of multimedia CD-ROMs. However, for the huge amounts of data that could usefully be included in digital video and audio libraries, the cost of producing this meta-data is prohibitive. In the Informedia Digital Video Library, the production of the meta-data supporting the library interface is automated using techniques derived from artificial intelligence (AI) research. By applying speech recognition together with natural language processing, information retrieval, and image analysis, an interface has been produced that helps users locate the information they want and navigate or browse the digital video library more effectively. Specific interface components include automatic titles, filmstrips, video skims, word location marking, and representative frames for shots. Both the user interface and the information retrieval engine within Informedia are designed for use with automatically derived meta-data, much of which depends on speech recognition for its production. Some experimental information retrieval results are given, supporting a basic premise of the Informedia project: that transcripts generated by speech recognition can make multimedia material searchable. The Informedia project emphasizes the integration of speech recognition, image processing, natural language processing, and information retrieval to compensate for deficiencies in these individual technologies. © 1998 John Wiley & Sons, Inc.
Volume 49, Issue 7. Special Issue: Artificial Intelligence Techniques for Emerging Information Systems Applications. 15 May 1998. Pages 619-632.