🙏 Acknowledgments

lexprep stands on the shoulders of giants. It depends on several excellent open-source libraries created by researchers and developers around the world. This page acknowledges their work and provides attribution.

We encourage you to visit their repositories, star their projects, and support their continued development.

🇺🇸 English Libraries

g2p-en G2P

Grapheme-to-phoneme conversion using the CMU Pronouncing Dictionary and a neural network model for out-of-vocabulary words.

Authors: Kyubyong Park & Jongseok Kim (2019)
License: Apache License 2.0
github.com/Kyubyong/g2p

pyphen Syllables

Pure Python hyphenation library based on TeX-compatible hyphenation dictionaries from LibreOffice (Hunspell format).

Authors: Kozea
License: LGPL/GPL/MPL
github.com/Kozea/Pyphen

spaCy POS

Industrial-strength natural language processing library with pre-trained models for many languages.

Authors: Explosion
License: MIT License
spacy.io

🇮🇷 Persian Libraries

PersianG2p G2P

Persian grapheme-to-phoneme conversion library. Handles Persian script and produces phoneme-like Latin transcriptions.

Authors: Demetry Pascal (forked from AzamRabiee)
License: MIT License
github.com/PasaOpasen/PersianG2P

Stanza POS

Stanford NLP Group's Python library for many human languages. Provides Universal Dependencies-style annotations including POS tagging and lemmatization.

Authors: Qi et al. (2020), Stanford NLP Group
License: Apache License 2.0
stanfordnlp.github.io/stanza

🇯🇵 Japanese Libraries

Stanza POS

Stanford NLP Group's Python library for many human languages. Provides Universal Dependencies-style annotations.

Authors: Qi et al. (2020), Stanford NLP Group
License: Apache License 2.0
stanfordnlp.github.io/stanza

Fugashi Tokenizer

A Cython wrapper for MeCab, a Japanese morphological analyzer. Fast and easy to use from Python.

Authors: Paul O'Leary McCann
License: MIT License
github.com/polm/fugashi

UniDic Dictionary

UniDic is a dictionary for morphological analysis of Japanese, developed by NINJAL (National Institute for Japanese Language and Linguistics).

Authors: NINJAL
License: GPL/LGPL/BSD
clrd.ninjal.ac.jp/unidic

📖 Citing lexprep

If you use lexprep in your research, please cite it alongside the underlying libraries you used. This helps support the development of open-source research tools.

Mazaherizaveh, S. (2026). lexprep: A toolkit for research
wordlist preparation. GitHub repository.
https://github.com/sajjad-mazaheri/lexprep

Please also cite the specific language libraries you used (spaCy, Stanza, etc.), as listed in the library sections above.

License Information

lexprep itself is released under the MIT License, which allows free use, modification, and distribution.

However, some underlying libraries have different licenses (GPL, LGPL, Apache, etc.). If you redistribute lexprep or include it in other projects, please ensure compliance with all applicable licenses.

When in doubt, refer to the individual library repositories for their specific license terms and requirements.
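
As a starting point for a license review, the sketch below reads the license metadata that each installed package declares about itself, using only the Python standard library. The distribution names in `PACKAGES` are assumptions based on the libraries listed on this page and may not match what your environment actually installs (for example, a different UniDic distribution). Declared metadata can be incomplete or out of date, so treat the output as a hint and confirm against each project's repository.

```python
from importlib import metadata

# Assumed PyPI distribution names for the libraries credited on this page.
# Adjust this list to match what is installed in your environment.
PACKAGES = ["g2p-en", "pyphen", "spacy", "PersianG2p", "stanza", "fugashi", "unidic"]


def license_report(packages):
    """Map each package name to its declared license.

    The value is None when the package is not installed, and "unknown"
    when it is installed but declares no license metadata.
    """
    report = {}
    for name in packages:
        try:
            meta = metadata.metadata(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        # Prefer an explicit license field, then fall back to Trove classifiers.
        declared = meta.get("License-Expression") or meta.get("License")
        classifiers = [
            c for c in (meta.get_all("Classifier") or [])
            if c.startswith("License ::")
        ]
        report[name] = declared or "; ".join(classifiers) or "unknown"
    return report


if __name__ == "__main__":
    for name, declared in license_report(PACKAGES).items():
        print(f"{name}: {declared if declared is not None else 'not installed'}")
```

Packages that are not installed are reported as such rather than raising an error, so the script can be run in an environment that uses only some of the supported languages.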