diff --git a/docs/faq.md b/docs/faq.md index f1ba09e..5bc381b 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -6,3 +6,9 @@ FAQ When you execute recognize() function (ex: `recognize(image, 'eng')`), the language model to download is determined by the 2nd argument of recognize(). (`eng` in the example) Tesseract.js will first check if \*.traineddata already exists. (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API), Node.js: fs, in the folder you execute the command) If the \*.traineddata doesn't exist, it will fetch \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzip and store in IndexedDB or fs, you can delete it manually and it will download again for you. + +## How can I train my own \*.traineddata? + +For tesseract.js v2, check [TrainingTesseract 4.00](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00) + +For tesseract.js v1, check [Training Tesseract 3.03–3.05](https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05)