Abstract: Statistical part-of-speech tagging models are limited by the size of the available annotated corpus and cannot capture the contextual information of the tag sequence. To address these limitations, this paper proposes a Hindi part-of-speech tagging model that combines deep learning with statistical learning. The model has a three-layer logical structure. First, in the word representation layer, a deep neural network extracts the morphological features of Hindi words, and the word2vec method generates low-dimensional, dense, real-valued word vectors that carry semantic information. Next, in the sequence representation layer, the morphological features and word vectors are fed into a trained deep neural network model, which extracts the features of the input sequence. Finally, in the CRF inference layer, the output states of the deep neural network and the current transition probability matrix serve as the parameters of the CRF model, from which the optimal label sequence is obtained. Experimental results show that the proposed method significantly outperforms the statistical models it is compared with.
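The CRF inference layer can be illustrated with a minimal sketch. It assumes a linear-chain CRF decoded with the Viterbi algorithm, a standard choice that the abstract does not name, and it assumes that both the per-token tag scores from the neural sequence layer and the tag transition matrix are given as additive (log-space) scores. The function `viterbi_decode`, the three-tag set, and all numbers below are hypothetical and illustrative only; they are not taken from the paper.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (assumed to be the output
        of the neural sequence representation layer).
    transitions[i][j]: score of moving from tag i to tag j (log-space).
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    # score[j]: best total score of any tag path ending in tag j so far
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        back: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag i that maximizes path score + transition i -> j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            back.append(best_i)
        score = new_score
        backpointers.append(back)
    # Start from the best final tag and follow the backpointers to the start
    path = [max(range(n_tags), key=lambda j: score[j])]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and illustrative scores for a three-token sentence
tags = ["NOUN", "POSTP", "VERB"]
emissions = [
    [2.0, 0.1, 0.5],
    [0.3, 1.5, 0.4],
    [0.6, 0.2, 1.8],
]
transitions = [
    [0.0, 1.0, 0.5],   # from NOUN
    [0.8, -1.0, 0.2],  # from POSTP
    [0.1, 0.0, -0.5],  # from VERB
]
print([tags[i] for i in viterbi_decode(emissions, transitions)])
# → ['NOUN', 'POSTP', 'VERB']
```

Decoding the whole sequence jointly is what lets the transition matrix capture context between adjacent tags: a per-token argmax over the emission scores alone would ignore which tag sequences are plausible.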