ESP32 Wrapper for Marisa compressed static dictionary lookups


Posted by arundale » Fri Sep 23, 2022 1:13 pm

Try my new Arduino library: https://github.com/siara-cc/marisa-esp32

This is an ESP32 Arduino wrapper for Marisa, a library for building and querying compressed static dictionaries. The original library for Microprocessors [can be found here](https://github.com/siara-cc/marisa-esp32).

Applications
  • Compressing large string arrays
  • Spell check
  • Autocomplete
Although spell check and autocomplete may sound far-fetched for an ESP32, these features are quite useful when implementing a messaging system over BLE or LoRa, such as the [Meshtastic library](https://github.com/meshtastic/Meshtastic-device), since the input devices available for keying in messages can be quite restrictive.

Getting started

The example provided with this Arduino library includes a dictionary of 487,568 words and phrases that was created using Marisa. Almost all English words or phrases should be found in this dictionary. Flash the example onto any ESP32 dev board and query the dictionary by typing an English word.

Marisa provides four functions: `lookup`, `reverse_lookup`, `common_prefix_search`, and `predictive_search`. All of them can be tried out using the given example. Screenshots of how each of these functions works are shown below:

- lookup() and reverse_lookup()

Image

- common_prefix_search()

Image

- predictive_search()

Image
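
To make the meaning of the four operations concrete, here is a small stand-in written with a plain sorted `std::vector` instead of a trie. It is not the Marisa API: the `ToyDict` type, its method signatures, and the choice of using the vector index as the key ID are all assumptions made for illustration only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a static dictionary (NOT the Marisa API).
// Assumption: keys are sorted and unique, and a key's ID is its index.
struct ToyDict {
    std::vector<std::string> keys;

    // lookup: key -> ID, or -1 if the key is not in the dictionary.
    int lookup(const std::string& key) const {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        if (it == keys.end() || *it != key) return -1;
        return static_cast<int>(it - keys.begin());
    }

    // reverse_lookup: ID -> key (empty string if the ID is out of range).
    std::string reverse_lookup(int id) const {
        if (id < 0 || id >= static_cast<int>(keys.size())) return "";
        return keys[id];
    }

    // common_prefix_search: every key that is a prefix of the query.
    std::vector<std::string> common_prefix_search(const std::string& query) const {
        std::vector<std::string> out;
        for (const auto& k : keys)
            if (query.compare(0, k.size(), k) == 0) out.push_back(k);
        return out;
    }

    // predictive_search: every key that starts with the query.
    // In a sorted list these keys are contiguous, starting at lower_bound.
    std::vector<std::string> predictive_search(const std::string& query) const {
        std::vector<std::string> out;
        auto it = std::lower_bound(keys.begin(), keys.end(), query);
        for (; it != keys.end() && it->compare(0, query.size(), query) == 0; ++it)
            out.push_back(*it);
        return out;
    }
};
```

With the keys `app`, `apple`, `apply`, and `banana`, a common prefix search for `applesauce` yields `app` and `apple`, while a predictive search for `app` yields `app`, `apple`, and `apply`. The latter is the operation that autocomplete would typically build on.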

Dependencies / pre-requisites

There are no dependencies other than the Arduino and ESP32 core SDK. The dictionaries can be built using the tools provided with the [Marisa library](https://github.com/siara-cc/marisa-esp32).

Using your own dictionaries in sketches

To query your own dictionary in Arduino sketches, make a text file with a list of entries, one per line, and build the dictionary using the `marisa-build` utility provided under the tools folder of the Marisa library, for example:

```
marisa-build mylist.txt -o mydict.bin
```

After this, `mydict.bin` can be incorporated as a flash string in sketches by converting it into a string using the `print_file_as_string.cpp` utility provided with this library. It can be compiled and run using the commands:

```
g++ -o print_file_as_string print_file_as_string.cpp
./print_file_as_string mydict.bin > string.txt
```

Then copy and paste the contents of `string.txt` into your sketch and load it as a `marisa::Trie` object as shown below:

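```cpp
// Placeholder: fill in the value for your own dictionary.
#define KEY_COUNT <enter number of entries>
// Dictionary bytes kept in flash; paste the contents of string.txt here.
const char data[] PROGMEM = "<paste here>";
marisa::Trie trie;
trie.map(data, KEY_COUNT);
```

Once the dictionary is loaded as an object, it can be queried using any of the four functions, as shown in the supplied example.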
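
For intuition about the conversion step described above, here is a sketch of how raw bytes can be turned into text that a C++ compiler accepts inside a string literal. The exact output format of `print_file_as_string.cpp` is not shown in this post, so this is only one possible approach, and the function name `to_c_string_literal` is hypothetical. It uses three-digit octal escapes because, unlike `\x` hex escapes, they have a fixed maximum length and so cannot absorb the character that follows.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the library): render raw bytes as the
// body of a C/C++ string literal, one three-digit octal escape per byte.
std::string to_c_string_literal(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size() * 4);  // each byte becomes 4 characters
    char buf[5];                    // backslash + 3 digits + terminator
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

Escaping every byte makes the output four times larger than the input as text, but the compiled literal occupies one byte per input byte, so the size in flash is unaffected.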

Acknowledgements
License

The sketch provided with this library is available under the Apache 2.0 License, but the original Marisa library is dual-licensed (BSD and LGPL), and any use of this library should also follow its [licensing terms](https://github.com/s-yata/marisa-trie/b ... COPYING.md).

Issues

Please contact the author (Arundale Ramanathan <arun@siara.cc>) or create an issue in the GitHub repository if you face problems.
