diff --git a/README.md b/README.md
index 85dca15..c4c27c8 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@
 [![Downloads Month](https://img.shields.io/npm/dm/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js)

- Version 2 is now available and under development in the master branch
+ Version 2 beta is now available and under development in the master branch
Check the support/1.x branch for version 1

@@ -26,25 +26,45 @@ It works in the browser using [webpack](https://webpack.js.org/) or plain script
 After you [install it](#installation), using it is as simple as:

 ```javascript
-import { TesseractWorker } from 'tesseract.js';
-const worker = new TesseractWorker();
-
-worker.recognize(myImage)
-  .progress(progress => {
-    console.log('progress', progress);
-  }).then(result => {
-    console.log('result', result);
-  });
+import Tesseract from 'tesseract.js';
+
+Tesseract.recognize(
+  'https://tesseract.projectnaptha.com/img/eng_bw.png',
+  'eng',
+  { logger: m => console.log(m) }
+).then(({ data: { text } }) => {
+  console.log(text);
+});
+```
+
+Or, more imperatively:
+
+```javascript
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker({
+  logger: m => console.log(m)
+});
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

 [Check out the docs](#docs) for a full explanation of the API.

-## Major changes in v2
-- Upgrade to tesseract v4
+## Major changes in v2 beta
+- Upgrade to tesseract v4.1 (using emscripten 1.38.45)
 - Support multiple languages at the same time, eg: eng+chi_tra for English and Traditional Chinese
 - Supported image formats: png, jpg, bmp, pbm
 - Support WebAssembly (fallback to ASM.js when browser doesn't support)
+- Support TypeScript

 ## Installation

@@ -54,7 +74,7 @@ Tesseract.js works with a `
+
@@ -103,7 +123,7 @@ npm start
 ```

 The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser.
-It will automatically rebuild `tesseract.dev.js` and `worker.min.js` when you change files in the src folder.
+It will automatically rebuild `tesseract.dev.js` and `worker.dev.js` when you change files in the **src** folder.

 You can also run the development server in Gitpod ( a free online IDE and dev environment for GitHub that will automate your dev setup ) with a single click.
diff --git a/docs/api.md b/docs/api.md
index b95d6c9..2f1bedb 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -1,5 +1,249 @@
 # API

+- [createWorker()](#create-worker)
+  - [Worker.load](#worker-load)
+  - [Worker.loadLanguage](#worker-load-language)
+  - [Worker.initialize](#worker-initialize)
+  - [Worker.setParameters](#worker-set-parameters)
+  - [Worker.recognize](#worker-recognize)
+  - [Worker.detect](#worker-detect)
+  - [Worker.terminate](#worker-terminate)
+- [createScheduler()](#create-scheduler)
+  - [Scheduler.addWorker](#scheduler-add-worker)
+  - [Scheduler.addJob](#scheduler-add-job)
+  - [Scheduler.getQueueLen](#scheduler-get-queue-len)
+  - [Scheduler.getNumWorkers](#scheduler-get-num-workers)
+- [setLogging()](#set-logging)
+- [recognize()](#recognize)
+- [detect()](#detect)
+- [PSM](#psm)
+- [OEM](#oem)
+
+---
+
+
+## createWorker(options): Worker
+
+createWorker() is a factory function that creates a Tesseract worker. A worker is a Web Worker in the browser and a child process in Node.js.
+
+**Arguments:**
+
+- `options`: an object of custom options
+  - `corePath`: path to the tesseract-core.js script
+  - `langPath`: path for downloading traineddata; do not include a trailing `/`
+  - `workerPath`: path for downloading the worker script
+  - `dataPath`: path for saving traineddata in the WebAssembly file system; rarely needs to be changed
+  - `cachePath`: path for the cached traineddata; mainly useful in Node.js, in the browser it only changes the key in IndexedDB
+  - `cacheMethod`: a string indicating how the cache is managed; one of:
+    - write: read the cache and write back (default)
+    - readOnly: read the cache but do not write back
+    - refresh: do not read the cache, but write back
+    - none: neither read the cache nor write back
+  - `workerBlobURL`: a boolean indicating whether to load the worker script via a Blob URL, default: true
+  - `gzip`: a boolean indicating whether the remote traineddata is gzipped, default: true
+  - `logger`: a function that logs progress, e.g. `m => console.log(m)`
+
+
+**Examples:**
+
+```javascript
+const { createWorker } = Tesseract;
+const worker = createWorker({
+  langPath: '...',
+  logger: m => console.log(m),
+});
+```
+
+## Worker
+
+A Worker performs OCR-related tasks. It takes a few steps to set up a Worker before it is fully functional. The full flow is:
+
+- load
+- loadLanguage
+- initialize
+- setParameters (optional)
+- recognize or detect
+- terminate
+
+Each function is async, so you need to use async/await or Promises. When a call resolves, you get an object like:
+
+```json
+{
+  "jobId": "Job-1-123",
+  "data": { ... }
+}
+```
+
+`jobId` is generated by Tesseract.js, but you can pass your own when calling any of the functions above.
+
+
+### Worker.load(jobId): Promise
+
+Worker.load() loads the tesseract.js-core scripts (downloading them from remote if they are not present) and makes the Web Worker/child process ready for the next action.
+
+**Arguments:**
+
+- `jobId`: optional, see above
+
+**Examples:**
+
+```javascript
+(async () => {
+  await worker.load();
+})();
+```
+
+
+### Worker.loadLanguage(langs, jobId): Promise
+
+Worker.loadLanguage() loads traineddata from the cache (or downloads it from remote) and puts it into the WebAssembly file system.
+
+**Arguments:**
+
+- `langs`: a string indicating which languages' traineddata to load; multiple languages are joined with **+**, e.g. **eng+chi\_tra**
+- `jobId`: optional, see above
+
+**Examples:**
+
+```javascript
+(async () => {
+  await worker.loadLanguage('eng+chi_tra');
+})();
+```
+
+
+### Worker.initialize(langs, oem, jobId): Promise
+
+Worker.initialize() initializes the Tesseract API and makes it ready for OCR tasks.
+
+**Arguments:**
+
+- `langs`: a string indicating the languages to load into the Tesseract API; it can be a subset of the languages whose traineddata you loaded with Worker.loadLanguage()
+- `oem`: an enum indicating the OCR Engine Mode to use
+- `jobId`: optional, see above
+
+**Examples:**
+
+```javascript
+(async () => {
+  /** You can load more languages in advance, but use only part of them in Worker.initialize() */
+  await worker.loadLanguage('eng+chi_tra');
+  await worker.initialize('eng');
+})();
+```
+
+### Worker.setParameters(params, jobId): Promise
+
+Worker.setParameters() sets parameters for the Tesseract API (using SetVariable()). Parameters change Tesseract's behavior, and some of them, like tessedit\_char\_whitelist, are very useful.
+
+**Arguments:**
+
+- `params`: an object mapping parameter names to values
+- `jobId`: optional, see above
+
+**Supported Parameters:**
+
+| name | type | default value | description |
+| ---- | ---- | ------------- | ----------- |
+| tessedit\_ocr\_engine\_mode | enum | OEM.LSTM\_ONLY | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for the definition of each mode |
+| tessedit\_pageseg\_mode | enum | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for the definition of each mode |
+| tessedit\_char\_whitelist | string | '' | setting whitelist characters makes the result contain only these characters; useful when the content of the image is limited |
+| tessjs\_create\_hocr | string | '1' | only 2 values, '0' or '1'; when the value is '1', tesseract.js includes hocr in the result |
+| tessjs\_create\_tsv | string | '1' | only 2 values, '0' or '1'; when the value is '1', tesseract.js includes tsv in the result |
+| tessjs\_create\_box | string | '0' | only 2 values, '0' or '1'; when the value is '1', tesseract.js includes box in the result |
+| tessjs\_create\_unlv | string | '0' | only 2 values, '0' or '1'; when the value is '1', tesseract.js includes unlv in the result |
+| tessjs\_create\_osd | string | '0' | only 2 values, '0' or '1'; when the value is '1', tesseract.js includes osd in the result |
+
+**Examples:**
+
+```javascript
+(async () => {
+  await worker.setParameters({
+    tessedit_char_whitelist: '0123456789',
+  });
+})();
+```
+
+
+
+### Worker.recognize(image, options, jobId): Promise
+
+### Worker.detect(image, jobId): Promise
+
+### Worker.terminate(jobId): Promise
+
+
+## createScheduler(): Scheduler
+
+
+### Scheduler.addWorker(worker): string
+
+
+### Scheduler.addJob(action, ...payload): Promise
+
+
+### Scheduler.getQueueLen(): number
+
+Scheduler.getQueueLen() returns the length of the job queue.
+
+
+### Scheduler.getNumWorkers(): number
+
+Scheduler.getNumWorkers() returns the number of workers added to the scheduler.
+
+
+### Scheduler.terminate(): Promise
+
+Scheduler.terminate() terminates all added workers, which is handy for a quick cleanup.
+
+**Examples:**
+
+```javascript
+(async () => {
+  await scheduler.terminate();
+})();
+```
+
+
+## setLogging(logging: boolean)
+
+setLogging() sets the logging flag. Call `setLogging(true)` to see detailed information, which is useful for debugging.
+
+**Arguments:**
+
+- `logging`: a boolean indicating whether to show detailed logs, default: false
+
+**Examples:**
+
+```javascript
+const { setLogging } = Tesseract;
+setLogging(true);
+```
+
+
+## recognize(image, langs, options): Promise
+
+recognize() is a shortcut that runs a whole recognition task in a single call. It is not recommended for real applications, but it is useful when you want to save some time.
+
+See [Tesseract.js](../src/Tesseract.js)
+
+
+## detect(image, options): Promise
+
+Same idea as recognize(), but it runs detect instead.
+
+See [Tesseract.js](../src/Tesseract.js)
+
+
+## PSM
+
+See [PSM.js](../src/constants/PSM.js)
+
+
+## OEM
+
+See [OEM.js](../src/constants/OEM.js)
+
 ## TesseractWorker.recognize(image, lang, [, options]) -> [TesseractJob](#tesseractjob)
 Figures out what words are in `image`, where the words are in `image`, etc.
 > Note: `image` should be sufficiently high resolution.
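The Scheduler entries in the api.md hunk (`addWorker`, `addJob`, `getQueueLen`, `getNumWorkers`) are still mostly bare headings. As a rough mental model of how a scheduler can spread jobs over several workers, which is what the "with multiple workers to speed up" example in docs/examples.md relies on, here is a minimal sketch in plain JavaScript: jobs wait in a queue, and each idle worker takes the next job. This is hypothetical and not the tesseract.js implementation. `createMiniScheduler`, `makeMockWorker` and the `Worker-N` ID format are made up for illustration, and the mock worker only echoes its input instead of running OCR.

```javascript
// Hypothetical sketch: NOT tesseract.js's createScheduler(). It only models the
// documented methods addWorker, addJob, getQueueLen and getNumWorkers.
function createMiniScheduler() {
  const workers = []; // every worker that has been added
  const idle = [];    // workers currently free to take a job
  const queue = [];   // jobs waiting for a free worker

  // Hand queued jobs to idle workers until either list runs out.
  const dispatch = () => {
    while (idle.length > 0 && queue.length > 0) {
      const worker = idle.shift();
      const { action, payload, resolve, reject } = queue.shift();
      Promise.resolve()
        .then(() => worker[action](...payload)) // e.g. worker.recognize(image)
        .then(resolve, reject)
        .finally(() => {
          idle.push(worker); // the worker is free again
          dispatch();        // pick up the next queued job, if any
        });
    }
  };

  return {
    addWorker(worker) {
      workers.push(worker);
      idle.push(worker);
      dispatch();
      return `Worker-${workers.length}`; // hypothetical ID format
    },
    addJob(action, ...payload) {
      return new Promise((resolve, reject) => {
        queue.push({ action, payload, resolve, reject });
        dispatch();
      });
    },
    getQueueLen: () => queue.length,
    getNumWorkers: () => workers.length,
  };
}

// Hypothetical mock worker: "recognizes" an image by echoing its name after 10 ms.
const makeMockWorker = (name) => ({
  recognize: (image) => new Promise((resolve) => {
    setTimeout(() => resolve({ data: { text: `${name}:${image}` } }), 10);
  }),
});

// Usage: two workers, four jobs.
(async () => {
  const scheduler = createMiniScheduler();
  scheduler.addWorker(makeMockWorker('w1'));
  scheduler.addWorker(makeMockWorker('w2'));
  const jobs = ['a.png', 'b.png', 'c.png', 'd.png']
    .map((image) => scheduler.addJob('recognize', image));
  console.log(scheduler.getQueueLen()); // 2: two jobs are running, two are waiting
  const results = await Promise.all(jobs);
  console.log(results.map((r) => r.data.text));
})();
```

Because `addJob()` returns one Promise per job, `Promise.all()` returns the results in submission order even though the jobs may be served by different workers.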
diff --git a/docs/examples.md b/docs/examples.md
index cc08942..eccd752 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -12,217 +12,147 @@ Example repositories:
 ### basic

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

 ### with detailed progress

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker({
+  logger: m => console.log(m), // Add logger here
+});
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

 ### with multiple languages, separate by '+'

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng+chi_tra'
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng+chi_tra');
+  await worker.initialize('eng+chi_tra');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```
+### with whitelist char (^2.0.0-beta.1)

-### with whitelist char (^2.0.0-alpha.5)
+```javascript
+import { createWorker } from 'tesseract.js';

-Sadly, whitelist chars is not supported in tesseract.js v4, so in tesseract.js we need to switch to tesseract v3 mode to make it work.
+const worker = createWorker();

-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker, OEM } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessedit_ocr_engine_mode': OEM.TESSERACT_ONLY,
-      'tessedit_char_whitelist': '0123456789-.',
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  await worker.setParameters({
+    tessedit_char_whitelist: '0123456789',
   });
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

-### with different pageseg mode (^2.0.0-alpha.5)
+### with different pageseg mode (^2.0.0-beta.1)

 Check here for more details of pageseg mode: https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker, PSM } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessedit_pageseg_mode': PSM.SINGLE_BLOCK,
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
-```
-
-### with pdf output (^2.0.0-alpha.12)
+import { createWorker, PSM } from 'tesseract.js';

-In this example, pdf file will be downloaded in browser and write to file system in Node.js
+const worker = createWorker();

-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessjs_create_pdf': '1',
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  await worker.setParameters({
+    tessedit_pageseg_mode: PSM.SINGLE_BLOCK,
   });
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

-If you want to handle pdf file by yourself
+### with pdf output (^2.0.0-beta.1)

-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessjs_create_pdf': '1',
-      'tessjs_pdf_auto_download': false, // disable auto download
-      'tessjs_pdf_bin': true, // add pdf file bin array in result
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ files: { pdf } }) => {
-    console.log(Object.values(pdf)); // As pdf is an array-like object, you need to do a little convertion first.
-    worker.terminate();
-  });
-```
+Please check the **examples** folder for details.
-### with preload language data
+Browser: [download-pdf.html](../examples/browser/download-pdf.html)
+Node: [download-pdf.js](../examples/node/download-pdf.js)

-```javascript
-const Tesseract = require('tesseract.js');
-
-const { TesseractWorker, utils: { loadLang } } = Tesseract;
-const worker = new TesseractWorker();
-
-loadLang({ langs: 'eng', langPath: worker.options.langPath })
-  .then(() => {
-    worker
-      .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
-      .progress(p => console.log(p))
-      .then(({ text }) => {
-        console.log(text);
-        worker.terminate();
-      });
-  });
+### with only part of the image (^2.0.0-beta.1)
+```javascript
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+const rectangles = [
+  { left: 0, top: 0, width: 500, height: 250 },
+];
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', { rectangles });
+  console.log(text);
+  await worker.terminate();
+})();
 ```

-### with only part of the image (^2.0.0-alpha.12)
+### with multiple workers to speed up (^2.0.0-beta.1)

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      tessjs_image_rectangle_left: 0,
-      tessjs_image_rectangle_top: 0,
-      tessjs_image_rectangle_width: 500,
-      tessjs_image_rectangle_height: 250,
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker, createScheduler } from 'tesseract.js';
+
+const scheduler = createScheduler();
+const worker1 = createWorker();
+const worker2 = createWorker();
+
+(async () => {
+  await worker1.load();
+  await worker2.load();
+  await worker1.loadLanguage('eng');
+  await worker2.loadLanguage('eng');
+  await worker1.initialize('eng');
+  await worker2.initialize('eng');
+  scheduler.addWorker(worker1);
+  scheduler.addWorker(worker2);
+  /** Add 10 recognition jobs */
+  const results = await Promise.all(Array(10).fill(0).map(() => (
+    scheduler.addJob('recognize', 'https://tesseract.projectnaptha.com/img/eng_bw.png')
+  )));
+  console.log(results);
+  await scheduler.terminate(); // It also terminates all workers.
+})();
 ```
diff --git a/docs/faq.md b/docs/faq.md
index ca1ddd7..b8dd046 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -3,9 +3,9 @@ FAQ

 ## How does tesseract.js download and keep \*.traineddata?

-When you execute recognize() function (ex: `recognize(image, 'eng')`), the language model to download is determined by the 2nd argument of recognize(). (`eng` in the example)
+The language model is downloaded by `worker.loadLanguage()`, and you then need to pass the same langs (or a subset of them) to `worker.initialize()`.

-Tesseract.js will first check if \*.traineddata already exists. (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API), Node.js: fs, in the folder you execute the command) If the \*.traineddata doesn't exist, it will fetch \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzip and store in IndexedDB or fs, you can delete it manually and it will download again for you.
+While downloading the language model, Tesseract.js first checks whether \*.traineddata already exists (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API), Node.js: fs, in the folder where you run the command). If it doesn't exist, Tesseract.js fetches \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzips it and stores it in IndexedDB or fs. You can delete it manually and it will be downloaded again.

 ## How can I train my own \*.traineddata?
@@ -15,26 +15,28 @@ For tesseract.js v1, check [Training Tesseract 3.03–3.05](https://github.com/t

 ## How can I get HOCR, TSV, Box, UNLV, OSD?

-Starting from 2.0.0-alpha.10, you can get all these information in the final result.
+Starting from 2.0.0-beta.1, you can get all of this information in the final result.

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', 'eng', {
+import { createWorker } from 'tesseract.js';
+const worker = createWorker({
+  logger: m => console.log(m)
+});
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  await worker.setParameters({
     tessedit_create_box: '1',
     tessedit_create_unlv: '1',
     tessedit_create_osd: '1',
-  })
-  .then((result) => {
-    console.log(result.text);
-    console.log(result.hocr);
-    console.log(result.tsv);
-    console.log(result.box);
-    console.log(result.unlv);
-    console.log(result.osd);
   });
+  const { data: { text, hocr, tsv, box, unlv } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  console.log(hocr);
+  console.log(tsv);
+  console.log(box);
+  console.log(unlv);
+})();
 ```
diff --git a/docs/local-installation.md b/docs/local-installation.md
index cc7c0f6..1f18fe9 100644
--- a/docs/local-installation.md
+++ b/docs/local-installation.md
@@ -9,10 +9,20 @@ Because of this we recommend loading `tesseract.js` from a CDN. But if you reall
 In Node.js environment, the only path you may want to customize is languages/langPath.

 ```javascript
-const worker = Tesseract.TesseractWorker({
-  workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-alpha.13/dist/worker.min.js',
+Tesseract.recognize(image, langs, {
+  workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/worker.min.js',
   langPath: 'https://tessdata.projectnaptha.com/4.0.0',
-  corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm.js',
+  corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js',
+});
+```
+
+Or
+
+```javascript
+const worker = createWorker({
+  workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/worker.min.js',
+  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
+  corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js',
 });
 ```

@@ -23,6 +33,6 @@ A string specifying the location of the [worker.js](./dist/worker.min.js) file.
 A string specifying the location of the tesseract language files, with default value 'https://tessdata.projectnaptha.com/4.0.0'. Language file URLs are calculated according to the formula `langPath + langCode + '.traineddata.gz'`.

 ### corePath
-A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm.js' (fallback to tesseract-core.asm.js when WebAssembly is not available).
+A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js' (fallback to tesseract-core.asm.js when WebAssembly is not available).

-Another WASM option is 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.js' which is a script that loads 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm'. But it fails to fetch at this moment.
+Another WASM option is 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.js' which is a script that loads 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm'. But it fails to fetch at this moment.
diff --git a/docs/tesseract_parameters.md b/docs/tesseract_parameters.md
index 6b6d598..3b2071b 100644
--- a/docs/tesseract_parameters.md
+++ b/docs/tesseract_parameters.md
@@ -1,12 +1,14 @@
 Tesseract.js Parameters
 =======================

-In the 3rd argument of `TesseractWorker.recognize()`, you can pass a params object to customize the result of OCR, below are supported parameters in tesseract.js so far.
+When initializing
+
+In the 3rd argument of `recognize()`, you can pass a params object to customize the result of OCR; below are the parameters supported by tesseract.js so far.

 Example:

 ```javascript
-import Tesseract from 'tesseract.js';
+import { createWorker, OEM, PSM } from 'tesseract.js';

 const { TesseractWorker, OEM, PSM } = Tesseract;
 const worker = new TesseractWorker();
@@ -24,17 +26,8 @@ worker
 | tessedit\_ocr\_engine\_mode | enum | OEM.LSTM\_ONLY | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for definition of each mode |
 | tessedit\_pageseg\_mode | enum | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for definition of each mode |
 | tessedit\_char\_whitelist | string | '' | setting white list characters makes the result only contains these characters, useful the content in image is limited |
-| tessjs\_create\_pdf | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js generates a pdf output |
 | tessjs\_create\_hocr | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result |
 | tessjs\_create\_tsv | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result |
 | tessjs\_create\_box | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result |
 | tessjs\_create\_unlv | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result |
 | tessjs\_create\_osd | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result |
-| tessjs\_pdf\_name | string | 'tesseract.js-ocr-result' | the name of the generated pdf file |
-| tessjs\_pdf\_title | string | 'Tesseract.js OCR Result' | the title of the generated pdf file |
-| tessjs\_pdf\_auto\_download | boolean | true | If the value is true, tesseract.js will automatic download/writeFile pdf file |
-| tessjs\_pdf\_bin | boolean | false | whether to include pdf binary array in the result object (result.files.pdf) |
-| tessjs\_image\_rectangle\_left | number | 0 | The left of the sub-rectangle of the image. |
-| tessjs\_image\_rectangle\_top | number | 0 | The top of the sub-rectangle of the image. |
-| tessjs\_image\_rectangle\_width | number | -1 | The width of the sub-rectangle of the image, -1 means auto width detection |
-| tessjs\_image\_rectangle\_height | number | -1 | The height of the sub-rectangle of the image, -1 means auto height detection |
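The `cacheMethod` option of `createWorker()` in docs/api.md combines two independent switches: whether cached traineddata is read, and whether freshly downloaded traineddata is written back. The sketch below spells out that four-row table in plain JavaScript. It is hypothetical and not taken from tesseract.js: `cacheFlags`, `loadTraineddata`, the `Map` used as a cache and the `download` callback are illustrative stand-ins for IndexedDB/fs and the real network fetch.

```javascript
// Hypothetical sketch of the four cacheMethod modes listed for createWorker()
// in docs/api.md. Not tesseract.js source code: the Map-based cache and the
// download callback stand in for IndexedDB/fs and the real network fetch.
const CACHE_METHODS = new Map([
  ['write', { read: true, write: true }],     // documented default
  ['readOnly', { read: true, write: false }],
  ['refresh', { read: false, write: true }],
  ['none', { read: false, write: false }],
]);

function cacheFlags(cacheMethod = 'write') {
  const flags = CACHE_METHODS.get(cacheMethod);
  if (flags === undefined) {
    throw new Error(`Unknown cacheMethod: ${cacheMethod}`);
  }
  return flags;
}

// Load traineddata for one language, honoring the read/write flags.
async function loadTraineddata(lang, { cacheMethod, cache, download }) {
  const { read, write } = cacheFlags(cacheMethod);
  if (read && cache.has(lang)) {
    return cache.get(lang); // cache hit: no download needed
  }
  const data = await download(lang); // cache miss, or reading is disabled
  if (write) {
    cache.set(lang, data); // keep it for the next call
  }
  return data;
}

// Usage: with the default 'write' mode, the second call is served from the cache.
(async () => {
  const cache = new Map();
  let downloads = 0;
  const download = async (lang) => {
    downloads += 1;
    return `${lang}.traineddata`;
  };
  await loadTraineddata('eng', { cacheMethod: 'write', cache, download });
  await loadTraineddata('eng', { cacheMethod: 'write', cache, download });
  console.log(downloads); // 1
})();
```

Splitting each mode into a `read` flag and a `write` flag keeps the four documented behaviors in one lookup table instead of a chain of string comparisons.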