
Vestnik natsional'nogo issledovatel'skogo yadernogo universiteta "MIFI"


Application of Restricted Boltzmann Machines to Solve the Problem of Author’s Profiling of Russian Texts

https://doi.org/10.1134/S2304487X20050144

Abstract

The applicability of the restricted Boltzmann machine (RBM) to author profiling of Russian texts is studied using the tasks of determining an author's gender and age. The RBM serves as a transformer that extracts useful features from documents in which words are encoded by their morphological tags. Classification is performed by a composite two-layer module that includes MultinomialNB and LinearSVC. Four document corpora are used: three are labeled by author gender and the fourth by author age. The experiments show that the constructed model solves both tasks successfully, surpassing the baseline model (LinearSVC) by 7.5% in F1-score on average over the four corpora. In addition, the results are compared with those of other models from the literature, in particular a complex model based on a convolutional neural network and LSTM. This comparison demonstrates the efficiency of the constructed composite neural network based on the RBM and the stability of its results across the presented corpora.
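The pipeline described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the toy tag strings, the labels, the hyperparameters, and the reading of the "composite two-layer module" as a stack (MultinomialNB as the first layer, LinearSVC as the final estimator) are all assumptions made for illustration.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: each document is a sequence of morphological tags
# (the real corpora and the tag set are not given in the abstract).
docs = [
    "PRON VERB_Fem ADJ_Fem NOUN", "NOUN VERB_Fem ADV VERB_Fem",
    "PRON VERB_Fem NOUN ADJ_Fem", "ADV VERB_Fem PRON NOUN",
    "PRON VERB_Masc ADJ_Masc NOUN", "NOUN VERB_Masc ADV VERB_Masc",
    "PRON VERB_Masc NOUN ADJ_Masc", "ADV VERB_Masc PRON NOUN",
]
labels = ["female"] * 4 + ["male"] * 4

# Assumed two-layer classifier: MultinomialNB feeds its class probabilities
# to a LinearSVC final estimator. cv=2 only because the toy corpus is tiny.
two_layer = StackingClassifier(
    estimators=[("nb", MultinomialNB())],
    final_estimator=LinearSVC(),
    cv=2,
)

model = Pipeline([
    # Binary bag-of-tags encoding keeps the inputs in [0, 1] for the RBM.
    ("tags", CountVectorizer(token_pattern=r"\S+", lowercase=False, binary=True)),
    # The RBM acts as a transformer: documents -> hidden-unit activations,
    # which are non-negative and therefore valid input for MultinomialNB.
    ("rbm", BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20,
                         random_state=0)),
    ("clf", two_layer),
])

predicted = model.fit(docs, labels).predict(docs)
print(list(predicted))
```

On real data the RBM size, learning rate, and number of iterations would need tuning, and evaluation would use held-out documents rather than the training set.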

About the Authors

A. G. Sboev
National Research Center Kurchatov Institute; National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
Russian Federation
123182, 115409, Moscow

R. B. Rybka
National Research Center Kurchatov Institute
Russian Federation
123182, Moscow

Yu. A. Davydov
National Research Center Kurchatov Institute
Russian Federation
123182, Moscow

A. A. Selivanov
National Research Center Kurchatov Institute
Russian Federation
123182, Moscow



For citations:


Sboev A.G., Rybka R.B., Davydov Yu.A., Selivanov A.A. Application of Restricted Boltzmann Machines to Solve the Problem of Author’s Profiling of Russian Texts. Vestnik natsional'nogo issledovatel'skogo yadernogo universiteta "MIFI". 2020;9(5):475-480. (In Russ.) https://doi.org/10.1134/S2304487X20050144



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2304-487X (Print)