Verification of Bangla Sentence Structure using N-Gram
Keywords:
N-gram, sentence structure, corpus, Witten-Bell smoothing, word error
Abstract
Statistical N-gram language modeling is used in many domains, such as spelling and syntax verification, speech recognition, machine translation, and character recognition. This paper describes a system for verifying Bangla sentence structure based on N-gram modeling. The system was trained on an experimental corpus of one million word tokens, taken from the BdNC01 corpus created in the SIPL Lab of Islamic University. Using sample texts collected from different newspapers, the system was tested on 1,000 correct and 1,000 incorrect sentences, and it correctly identified the structural validity of the test sentences at a rate of 93%. The paper also describes the limitations of the system along with possible solutions.
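The abstract names Witten-Bell smoothing but does not state the model order, the scoring rule, or the acceptance threshold. The Python sketch below illustrates the general idea under explicit assumptions: a bigram model, whitespace tokenization, sentence-boundary tokens `<s>` and `</s>`, and a rule that accepts a sentence when its average log2 probability per bigram is at least a chosen threshold. The class name `WittenBellBigram`, the function `is_valid`, the threshold value -3.0, and the five-sentence toy corpus are all hypothetical illustrations, not the authors' system or data. Interpolated Witten-Bell estimates P(w | h) = (C(h, w) + T(h) P(w)) / (C(h) + T(h)), where T(h) is the number of distinct words seen after history h.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


class WittenBellBigram:
    """Hypothetical bigram model with interpolated Witten-Bell smoothing (a sketch)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)  # history word -> counts of following words
        for sentence in sentences:
            tokens = [BOS] + sentence.split() + [EOS]  # assumption: whitespace tokenization
            self.unigrams.update(tokens[1:])  # count only predicted tokens (not <s>)
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[prev][word] += 1
        self.total = sum(self.unigrams.values())
        self.types = len(self.unigrams)

    def p_unigram(self, word):
        # Witten-Bell interpolation with a uniform distribution over the
        # vocabulary plus one extra slot reserved for unknown words.
        uniform = 1.0 / (self.types + 1)
        return (self.unigrams[word] + self.types * uniform) / (self.total + self.types)

    def p_bigram(self, prev, word):
        followers = self.bigrams.get(prev)  # .get does not create empty entries
        if not followers:
            return self.p_unigram(word)  # unseen history: use the unigram estimate
        count_h = sum(followers.values())  # C(h)
        types_h = len(followers)  # T(h): distinct words seen after prev
        return (followers[word] + types_h * self.p_unigram(word)) / (count_h + types_h)

    def avg_log_prob(self, sentence):
        # Average log2 probability per bigram, so sentence length matters less.
        tokens = [BOS] + sentence.split() + [EOS]
        logs = [math.log2(self.p_bigram(p, w)) for p, w in zip(tokens, tokens[1:])]
        return sum(logs) / len(logs)


def is_valid(model, sentence, threshold):
    # Assumed decision rule: accept when the average log probability clears the threshold.
    return model.avg_log_prob(sentence) >= threshold


# Hypothetical toy corpus of simple subject-object-verb Bangla sentences.
corpus = ["আমি ভাত খাই", "তুমি ভাত খাও", "সে বই পড়ে", "আমি বই পড়ি", "সে ভাত খায়"]
model = WittenBellBigram(corpus)

print(is_valid(model, "আমি ভাত খাই", -3.0))  # True: canonical word order
print(is_valid(model, "খাই ভাত আমি", -3.0))  # False: verb-first order is unseen
```

Averaging per bigram rather than multiplying raw probabilities keeps longer sentences from being penalized simply for their length; in practice the threshold would have to be tuned on held-out correct and incorrect sentences.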
Published
2014-01-15