Sophia NLU Engine v1.0 Released
We're proud to release the Sophia NLU Engine v1.0, a lightweight, robust NLU (natural language understanding) engine written in Rust with a focus on efficiency and performance. Compact and self-contained, it processes up to 20,000 words/sec and includes a highly accurate POS tagger, a phrase interpreter, automated spelling correction, and more.
Its privacy-focused, self-contained design is free of external dependencies, API calls, and bulky PyTorch models, enabling instant setup of robust, in-house NLU solutions. It requires only a small Rust library and a 79MB (base) or 177MB (full) vocabulary data store, making it suitable even for wearables.
If you deploy AI agents of any kind and want a better way to understand what your users are saying, with no API calls or monthly bills, Sophia may be for you. It boasts:
- An extensive vocabulary of 914k (full) / 145k (base) words, including 65k MWEs (multi-word expressions) and 79k named entities, along with stems, plurals, synonyms, a vast multi-hierarchical categorization system, and more.
- A highly accurate POS tagger with a custom model architecture.
- Automated spelling correction.
- An advanced phrase interpreter that segments input into digestible phrases made up of verb and noun clauses.
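The announcement does not describe how Sophia's spelling correction works. The sketch below is a rough illustration of one common technique: picking the closest known word by Levenshtein edit distance. The functions `edit_distance` and `correct`, the tiny in-memory vocabulary, and the `max_distance` cutoff are all hypothetical. They are not part of Sophia's API or implementation.

```rust
// Hypothetical sketch of edit-distance spelling correction; not Sophia's implementation.

/// Levenshtein edit distance between two words, computed over Unicode chars
/// with a single rolling row of the dynamic-programming table.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: turning i chars of `a` into an empty string takes i deletions.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution (or match)
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns the word itself if it is already in the vocabulary. Otherwise it returns
/// the closest vocabulary word within `max_distance` edits, with ties going to the
/// earliest entry, or `None` if no word is close enough.
fn correct<'a>(word: &'a str, vocabulary: &[&'a str], max_distance: usize) -> Option<&'a str> {
    if vocabulary.contains(&word) {
        return Some(word);
    }
    vocabulary
        .iter()
        .map(|&candidate| (candidate, edit_distance(word, candidate)))
        .filter(|&(_, d)| d <= max_distance)
        .min_by_key(|&(_, d)| d)
        .map(|(candidate, _)| candidate)
}
```

For example, `correct("langauge", &["language", "luggage", "lounge"], 2)` returns `Some("language")`. A real corrector over a vocabulary of hundreds of thousands of words would need an index, such as a BK-tree or precomputed deletions, rather than this linear scan.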
Premium features are also available, including:
- A binary application with a localhost RPC server for instant setup from any programming language.
- Easy import of custom named entities into the vocabulary.
- Query selectors that provide intent classification by algorithmically matching user input against predefined phrases, with an optional LLM fallback.
- A user feedback pipeline that makes collecting feedback from users seamless.
- Detailed usage statistics, logging, and analytics.
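The announcement does not detail the query selector algorithm. The sketch below shows one way input could be matched against predefined phrases for each intent, assuming a simple token-overlap (Jaccard) score. A `None` result marks the point where an LLM fallback would be invoked. The names `tokens`, `jaccard`, and `classify`, and the `threshold` parameter, are hypothetical. Jaccard similarity is a stand-in and is not Sophia's actual matching method.

```rust
// Hypothetical sketch of phrase-matching intent classification; not Sophia's API.
use std::collections::HashSet;

/// Lowercases the input and splits it into a set of alphanumeric tokens.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B|, in the range [0.0, 1.0].
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Each intent has a list of example phrases. Returns the intent of the
/// best-scoring phrase if that score is at least `threshold`. Otherwise it
/// returns `None`, which is where a caller could fall back to an LLM.
fn classify<'a>(input: &str, intents: &[(&'a str, &[&str])], threshold: f64) -> Option<&'a str> {
    let query = tokens(input);
    let mut best: Option<(&'a str, f64)> = None;
    for &(intent, phrases) in intents {
        for phrase in phrases {
            let score = jaccard(&query, &tokens(phrase));
            // Keep the first phrase that beats every earlier score.
            if score >= threshold && best.map_or(true, |(_, s)| score > s) {
                best = Some((intent, score));
            }
        }
    }
    best.map(|(intent, _)| intent)
}
```

Consider a "weather" intent whose phrases include "what is the weather today". The input "What's the weather like today?" shares 4 of 7 distinct tokens with that phrase and is classified as "weather". By contrast, "book a flight to Paris" shares no tokens with any phrase and returns `None`.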
You can view the full feature list, try the online demo, and download the open source code, including the SDK, at: https://cicero.sh/sophia/
Future Roadmap
A major upgrade coming soon will bring advanced contextual awareness to Sophia, transforming it into a world-leading NLU engine. The upgrade has three main components:
- The existing categorization system will be replaced with a vector-based scoring system, resulting in superior word clustering and more granular word filtering.
- The phrase interpreter will be transformed from a heuristics-based interpreter into a hybrid that incorporates many small, efficient, and accurate custom models designed for the various English language constructs (e.g., anaphora resolution, phrase boundary detection, classification, and negation), resulting in an exceptionally accurate and robust phrase interpreter.
- Contextual awareness training, which, once complete, will allow Sophia to distinguish between, for example, "visit google.com", "visit Mark's idea", "visit the school", "visit my parents", "visit the magical kingdom in my dream", and so on. Because novel methodologies are being used, full training details will not be disclosed until the open source release.
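The roadmap does not specify how the vector-based scoring system will work. A common way to score word relatedness with vectors is cosine similarity. The sketch below assumes each word has a dense vector and returns the words closest to a query vector. The functions `cosine` and `nearest`, the `min_score` cutoff, and the three-dimensional toy vectors in the usage note are illustrative assumptions, not Sophia's data or API.

```rust
// Hypothetical sketch of vector-based word scoring; not Sophia's implementation.

/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Returns the vocabulary words whose vectors score at least `min_score`
/// against `query`, sorted from most to least similar.
fn nearest<'a>(query: &[f32], vocab: &[(&'a str, Vec<f32>)], min_score: f32) -> Vec<(&'a str, f32)> {
    let mut hits: Vec<(&'a str, f32)> = vocab
        .iter()
        .map(|(word, vector)| (*word, cosine(query, vector)))
        .filter(|&(_, score)| score >= min_score)
        .collect();
    // Sort descending by score; total_cmp gives a total order even for NaN.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

Consider toy vectors `school = [0.9, 0.1, 0.0]`, `university = [0.8, 0.2, 0.1]`, and `parents = [0.1, 0.9, 0.2]`. Querying with `[0.85, 0.15, 0.05]` and `min_score = 0.9` keeps the two education-related words and drops `parents`, whose cosine is about 0.29. A score threshold like this is one way such a system could support more granular filtering than fixed category membership.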
POS Tagger
Over the coming days and weeks, multiple upgrades to the vocabulary data store will be released to further enhance the POS tagger, with the goal of reaching 100% accuracy with full confidence. All upgrades will be open source and available to the public.
Multi-Lingual Support
In the near future, full multi-lingual support will be developed, with vocabulary data stores curated and trained for each language. Romance languages will come first due to their similarity to English, with all other languages to follow. Please bear with us, as tonal languages such as those spoken throughout Southeast Asia, right-to-left (RTL) languages, and others will take additional resources to perfect.
Get Started with Sophia Today!
For the full feature list, online demo, and open source download, please visit: https://cicero.sh/sophia/
Sophia is open source and free for individual use. If you will be using it commercially, please consider acquiring a premium license to show your support for Cicero's mission of breaking our dependence on big tech through open source innovation, as outlined in the mission statement and the Origins and End Goals posts.
If you have any questions or concerns, please complete the contact form for a prompt response.