🙏 Acknowledgments
lexprep stands on the shoulders of giants. It integrates and depends on several excellent open-source libraries created by researchers and developers around the world. This page acknowledges their work and provides attribution.
We encourage you to visit their repositories, star their projects, and support their continued development.
English Libraries
g2p-en G2P
Grapheme-to-phoneme conversion using the CMU Pronouncing Dictionary and a neural network model for out-of-vocabulary words.
Authors: Kyubyong Park & Jongseok Kim (2019)
License: Apache License 2.0
github.com/Kyubyong/g2p
pyphen Syllables
Pure Python hyphenation library based on TeX-compatible hyphenation dictionaries from LibreOffice (Hunspell format).
Authors: Kozea
License: LGPL/GPL/MPL
github.com/Kozea/Pyphen
spaCy POS
Industrial-strength natural language processing library with pre-trained models for many languages.
Authors: Explosion
License: MIT License
spacy.io
Persian Libraries
PersianG2p G2P
Persian grapheme-to-phoneme conversion library. Handles Persian script and produces phoneme-like Latin transcriptions.
Authors: Demetry Pascal (forked from AzamRabiee)
License: MIT License
github.com/PasaOpasen/PersianG2P
Stanza POS
Stanford NLP Group's Python library for many human languages. Provides Universal Dependencies-style annotations including POS tagging and lemmatization.
Authors: Qi et al. (2020), Stanford NLP Group
License: Apache License 2.0
stanfordnlp.github.io/stanza
Japanese Libraries
Stanza POS
Stanford NLP Group's Python library for many human languages. Provides Universal Dependencies-style annotations.
Authors: Qi et al. (2020), Stanford NLP Group
License: Apache License 2.0
stanfordnlp.github.io/stanza
Fugashi Tokenizer
A Cython wrapper for MeCab, a Japanese morphological analyzer. Fast and easy to use from Python.
Authors: Paul O'Leary McCann
License: MIT License
github.com/polm/fugashi
UniDic Dictionary
UniDic is a dictionary for morphological analysis of Japanese, developed by NINJAL (National Institute for Japanese Language and Linguistics).
Authors: NINJAL
License: GPL/LGPL/BSD
clrd.ninjal.ac.jp/unidic
📖 Citing lexprep
If you use lexprep in your research, please cite it alongside the underlying libraries you used. This helps support the development of open-source research tools.
Mazaherizaveh, S. (2026). lexprep: A toolkit for research wordlist preparation. GitHub repository. https://github.com/sajjad-mazaheri/lexprep
Please also cite the specific language libraries you used (spaCy, Stanza, etc.) as listed in the Academic References section above.
License Information
lexprep itself is released under the MIT License, which allows free use, modification, and distribution.
However, some underlying libraries have different licenses (GPL, LGPL, Apache, etc.). If you redistribute lexprep or include it in other projects, please ensure compliance with all applicable licenses.
When in doubt, refer to the individual library repositories for their specific license terms and requirements.
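As a starting point for that check, the following minimal sketch prints the license each installed dependency declares in its own package metadata. It uses only the standard-library importlib.metadata module. The distribution names in ASSUMED_DISTRIBUTIONS and the helper declared_license are assumptions made for illustration, not part of lexprep; adjust the names to match your environment. Declared metadata can be incomplete or out of date, so treat the output as a hint and confirm against each project's repository.

```python
from importlib import metadata

# Assumed PyPI distribution names for the libraries credited on this page.
# These are illustrative guesses; edit them to match what you have installed.
ASSUMED_DISTRIBUTIONS = [
    "g2p-en",
    "pyphen",
    "spacy",
    "PersianG2p",
    "stanza",
    "fugashi",
    "unidic",
]


def declared_license(dist_name):
    """Return the license string a distribution declares, as best we can tell.

    Hypothetical helper: returns "not installed" if the distribution is
    missing and "unknown" if no license information is declared.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"

    # Newer packages may declare License-Expression; older ones use License.
    text = (meta.get("License-Expression") or meta.get("License") or "").strip()
    if text:
        # Some packages paste the full license text here; keep the first line.
        return text.splitlines()[0]

    # Fall back to trove classifiers such as
    # "License :: OSI Approved :: MIT License".
    classifiers = [
        c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")
    ]
    if classifiers:
        return "; ".join(c.split(" :: ")[-1] for c in classifiers)
    return "unknown"


if __name__ == "__main__":
    for name in ASSUMED_DISTRIBUTIONS:
        print(f"{name}: {declared_license(name)}")
```

Libraries that are not installed are reported as "not installed" rather than raising an error, so the script can be run in any environment.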