diff --git a/README.md b/README.md
index 85dca15..c4c27c8 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@
[![Downloads Month](https://img.shields.io/npm/dm/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js)
- Version 2 is now available and under development in the master branch
+ Version 2 beta is now available and under development in the master branch
Check the support/1.x branch for version 1
@@ -26,25 +26,45 @@ It works in the browser using [webpack](https://webpack.js.org/) or plain script
After you [install it](#installation), using it is as simple as:
```javascript
-import { TesseractWorker } from 'tesseract.js';
-const worker = new TesseractWorker();
-
-worker.recognize(myImage)
- .progress(progress => {
- console.log('progress', progress);
- }).then(result => {
- console.log('result', result);
- });
+import Tesseract from 'tesseract.js';
+
+Tesseract.recognize(
+ 'https://tesseract.projectnaptha.com/img/eng_bw.png',
+ 'eng',
+ { logger: m => console.log(m) }
+).then(({ data: { text } }) => {
+ console.log(text);
+});
+```
+
+Or, in a more imperative style:
+
+```javascript
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker({
+ logger: m => console.log(m)
+});
+
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ await worker.terminate();
+})();
```
[Check out the docs](#docs) for a full explanation of the API.
-## Major changes in v2
-- Upgrade to tesseract v4
+## Major changes in v2 beta
+- Upgrade to tesseract v4.1 (using emscripten 1.38.45)
- Support multiple languages at the same time, eg: eng+chi_tra for English and Traditional Chinese
- Supported image formats: png, jpg, bmp, pbm
- Support WebAssembly (fallback to ASM.js when browser doesn't support)
+- Support TypeScript
## Installation
@@ -54,7 +74,7 @@ Tesseract.js works with a `
+
@@ -103,7 +123,7 @@ npm start
```
The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser.
-It will automatically rebuild `tesseract.dev.js` and `worker.min.js` when you change files in the src folder.
+It will automatically rebuild `tesseract.dev.js` and `worker.dev.js` when you change files in the **src** folder.
You can also run the development server in Gitpod ( a free online IDE and dev environment for GitHub that will automate your dev setup ) with a single click.
diff --git a/docs/api.md b/docs/api.md
index b95d6c9..2f1bedb 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -1,5 +1,249 @@
# API
+- [createWorker()](#create-worker)
+ - [Worker.load](#worker-load)
+ - [Worker.loadLanguage](#worker-load-language)
+ - [Worker.initialize](#worker-initialize)
+ - [Worker.setParameters](#worker-set-parameters)
+ - [Worker.recognize](#worker-recognize)
+ - [Worker.detect](#worker-detect)
+ - [Worker.terminate](#worker-terminate)
+- [createScheduler()](#create-scheduler)
+ - [Scheduler.addWorker](#scheduler-add-worker)
+ - [Scheduler.addJob](#scheduler-add-job)
+ - [Scheduler.getQueueLen](#scheduler-get-queue-len)
+ - [Scheduler.getNumWorkers](#scheduler-get-num-workers)
+- [setLogging()](#set-logging)
+- [recognize()](#recognize)
+- [detect()](#detect)
+- [PSM](#psm)
+- [OEM](#oem)
+
+---
+
+
+## createWorker(options): Worker
+
+createWorker() is a factory function that creates a Tesseract worker. A worker is essentially a Web Worker in the browser and a Child Process in Node.
+
+**Arguments:**
+
+- `options` an object of customized options
+  - `corePath` path to the tesseract-core.js script
+  - `langPath` path for downloading traineddata; do not include a trailing `/`
+  - `workerPath` path for downloading the worker script
+  - `dataPath` path for saving traineddata in the WebAssembly file system; rarely needs to be changed
+  - `cachePath` path for the cached traineddata; more useful in Node, in the browser it only changes the key in IndexedDB
+  - `cacheMethod` a string indicating the cache management method, one of:
+    - write: read the cache and write back (default)
+    - readOnly: read the cache but do not write back
+    - refresh: do not read the cache, but write back
+    - none: neither read the cache nor write back
+  - `workerBlobURL` a boolean that defines whether to use a Blob URL for the worker script, default: true
+  - `gzip` a boolean that defines whether the remote traineddata is gzipped, default: true
+  - `logger` a function to log progress, for example `m => console.log(m)`
+
+
+**Examples:**
+
+```javascript
+const { createWorker } = Tesseract;
+const worker = createWorker({
+ langPath: '...',
+ logger: m => console.log(m),
+});
+```
+
+## Worker
+
+A Worker helps you do OCR-related tasks. It takes a few steps to set up a Worker before it is fully functional. The full flow is:
+
+- load
+- loadLanguage
+- initialize
+- setParameters // optional
+- recognize or detect
+- terminate
+
+Each function is async, so using async/await or Promise is required. When it is resolved, you get an object:
+
+```json
+{
+ "jobId": "Job-1-123",
+ "data": { ... }
+}
+```
+
+jobId is generated by Tesseract.js, but you can supply your own when calling any of the functions above.
+
+
+### Worker.load(jobId): Promise
+
+Worker.load() loads the tesseract.js-core scripts (downloading them from the remote if they are not present) and makes the Web Worker/Child Process ready for the next action.
+
+**Arguments:**
+
+- `jobId` Please see details above
+
+**Examples:**
+
+```javascript
+(async () => {
+ await worker.load();
+})();
+```
+
+
+### Worker.loadLanguage(langs, jobId): Promise
+
+Worker.loadLanguage() loads traineddata from the cache, or downloads it from the remote, and puts it into the WebAssembly file system.
+
+**Arguments:**
+
+- `langs` a string indicating which languages' traineddata to download; multiple languages are concatenated with **+**, e.g. **eng+chi\_tra**
+- `jobId` Please see details above
+
+**Examples:**
+
+```javascript
+(async () => {
+ await worker.loadLanguage('eng+chi_tra');
+})();
+```
+
+
+### Worker.initialize(langs, oem, jobId): Promise
+
+Worker.initialize() initializes the Tesseract API and makes sure it is ready for OCR tasks.
+
+**Arguments:**
+
+- `langs` a string indicating the languages to be loaded by the Tesseract API; it can be a subset of the language traineddata you loaded with Worker.loadLanguage.
+- `oem` an enum indicating the OCR Engine Mode to use
+- `jobId` Please see details above
+
+**Examples:**
+
+```javascript
+(async () => {
+ /** You can load more languages in advance, but use only part of them in Worker.initialize() */
+ await worker.loadLanguage('eng+chi_tra');
+ await worker.initialize('eng');
+})();
+```
+
+### Worker.setParameters(params, jobId): Promise
+
+Worker.setParameters() sets parameters for the Tesseract API (using SetVariable()). It changes the behavior of Tesseract, and some parameters, such as tessedit\_char\_whitelist, are very useful.
+
+**Arguments:**
+
+- `params` an object with key and value of the parameters
+- `jobId` Please see details above
+
+**Supported Parameters:**
+
+| name | type | default value | description |
+| ---- | ---- | ------------- | ----------- |
+| tessedit\_ocr\_engine\_mode | enum | OEM.LSTM\_ONLY | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for definition of each mode |
+| tessedit\_pageseg\_mode | enum | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for definition of each mode |
+| tessedit\_char\_whitelist | string | '' | setting white list characters makes the result contain only these characters, useful when the content of the image is limited |
+| tessjs\_create\_hocr | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result |
+| tessjs\_create\_tsv | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result |
+| tessjs\_create\_box | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result |
+| tessjs\_create\_unlv | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result |
+| tessjs\_create\_osd | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result |
+
+**Examples:**
+
+```javascript
+(async () => {
+ await worker.setParameters({
+ tessedit_char_whitelist: '0123456789',
+ });
+})();
+```
+
+
+
+### Worker.recognize(image, options, jobId): Promise
+
+### Worker.detect(image, jobId): Promise
+
+### Worker.terminate(jobId): Promise
+
+
+## createScheduler(): Scheduler
+
+
+### Scheduler.addWorker(worker): string
+
+
+### Scheduler.addJob(action, ...payload): Promise
+
+
+### Scheduler.getQueueLen(): number
+
+Scheduler.getQueueLen() returns the length of the job queue.
+
+
+### Scheduler.getNumWorkers(): number
+
+Scheduler.getNumWorkers() returns the number of workers added to the scheduler.
+
+
+### Scheduler.terminate(): Promise
+
+Scheduler.terminate() terminates all workers added, useful to do quick clean up.
+
+**Examples:**
+
+```javascript
+(async () => {
+ await scheduler.terminate();
+})();
+```
+
+
+## setLogging(logging: boolean)
+
+setLogging() sets the logging flag. Call `setLogging(true)` to see detailed information, which is useful for debugging.
+
+**Arguments:**
+
+- `logging` boolean to define whether to see detailed logs, default: false
+
+**Examples:**
+
+```javascript
+const { setLogging } = Tesseract;
+setLogging(true);
+```
+
+
+## recognize(image, langs, options): Promise
+
+recognize() is a shortcut for quickly running a recognition task. It is not recommended for use in real applications, but it is useful when you want to save some time.
+
+See [Tesseract.js](../src/Tesseract.js)
+
+
+## detect(image, options): Promise
+
+Same background as recognize(), but it performs detection instead.
+
+See [Tesseract.js](../src/Tesseract.js)
+
+
+## PSM
+
+See [PSM.js](../src/constants/PSM.js)
+
+
+## OEM
+
+See [OEM.js](../src/constants/OEM.js)
+
## TesseractWorker.recognize(image, lang, [, options]) -> [TesseractJob](#tesseractjob)
Figures out what words are in `image`, where the words are in `image`, etc.
> Note: `image` should be sufficiently high resolution.
diff --git a/docs/examples.md b/docs/examples.md
index cc08942..eccd752 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -12,217 +12,147 @@ Example repositories:
### basic
```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
- });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ await worker.terminate();
+})();
```
### with detailed progress
```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
- });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker({
+ logger: m => console.log(m), // Add logger here
+});
+
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ await worker.terminate();
+})();
```
### with multiple languages, separate by '+'
```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng+chi_tra'
- )
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
- });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng+chi_tra');
+ await worker.initialize('eng+chi_tra');
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ await worker.terminate();
+})();
```
+### with whitelist char (^2.0.0-beta.1)
-### with whitelist char (^2.0.0-alpha.5)
+```javascript
+import { createWorker } from 'tesseract.js';
-Sadly, whitelist chars is not supported in tesseract.js v4, so in tesseract.js we need to switch to tesseract v3 mode to make it work.
+const worker = createWorker();
-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker, OEM } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng',
- {
- 'tessedit_ocr_engine_mode': OEM.TESSERACT_ONLY,
- 'tessedit_char_whitelist': '0123456789-.',
- }
- )
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ await worker.setParameters({
+ tessedit_char_whitelist: '0123456789',
});
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ await worker.terminate();
+})();
```
-### with different pageseg mode (^2.0.0-alpha.5)
+### with different pageseg mode (^2.0.0-beta.1)
Check here for more details of pageseg mode: https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163
```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker, PSM } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng',
- {
- 'tessedit_pageseg_mode': PSM.SINGLE_BLOCK,
- }
- )
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
- });
-```
-
-### with pdf output (^2.0.0-alpha.12)
+import { createWorker, PSM } from 'tesseract.js';
-In this example, pdf file will be downloaded in browser and write to file system in Node.js
+const worker = createWorker();
-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng',
- {
- 'tessjs_create_pdf': '1',
- }
- )
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ await worker.setParameters({
+ tessedit_pageseg_mode: PSM.SINGLE_BLOCK,
});
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ await worker.terminate();
+})();
```
-If you want to handle pdf file by yourself
+### with pdf output (^2.0.0-beta.1)
-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng',
- {
- 'tessjs_create_pdf': '1',
- 'tessjs_pdf_auto_download': false, // disable auto download
- 'tessjs_pdf_bin': true, // add pdf file bin array in result
- }
- )
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ files: { pdf } }) => {
- console.log(Object.values(pdf)); // As pdf is an array-like object, you need to do a little convertion first.
- worker.terminate();
- });
-```
+Please check the **examples** folder for details.
-### with preload language data
+Browser: [download-pdf.html](../examples/browser/download-pdf.html)
+Node: [download-pdf.js](../examples/node/download-pdf.js)
-```javascript
-const Tesseract = require('tesseract.js');
-
-const { TesseractWorker, utils: { loadLang } } = Tesseract;
-const worker = new TesseractWorker();
-
-loadLang({ langs: 'eng', langPath: worker.options.langPath })
- .then(() => {
- worker
- .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
- .progress(p => console.log(p))
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
- });
- });
+### with only part of the image (^2.0.0-beta.1)
+```javascript
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+const rectangles = [
+ { left: 0, top: 0, width: 500, height: 250 },
+];
+
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', { rectangles });
+ console.log(text);
+ await worker.terminate();
+})();
```
-### with only part of the image (^2.0.0-alpha.12)
+### with multiple workers to speed up (^2.0.0-beta.1)
```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize(
- 'https://tesseract.projectnaptha.com/img/eng_bw.png',
- 'eng',
- {
- tessjs_image_rectangle_left: 0,
- tessjs_image_rectangle_top: 0,
- tessjs_image_rectangle_width: 500,
- tessjs_image_rectangle_height: 250,
- }
- )
- .progress((p) => {
- console.log('progress', p);
- })
- .then(({ text }) => {
- console.log(text);
- worker.terminate();
- });
+import { createWorker, createScheduler } from 'tesseract.js';
+
+const scheduler = createScheduler();
+const worker1 = createWorker();
+const worker2 = createWorker();
+
+(async () => {
+ await worker1.load();
+ await worker2.load();
+ await worker1.loadLanguage('eng');
+ await worker2.loadLanguage('eng');
+ await worker1.initialize('eng');
+ await worker2.initialize('eng');
+ scheduler.addWorker(worker1);
+ scheduler.addWorker(worker2);
+ /** Add 10 recognition jobs */
+ const results = await Promise.all(Array(10).fill(0).map(() => (
+ scheduler.addJob('recognize', 'https://tesseract.projectnaptha.com/img/eng_bw.png')
+ )));
+ console.log(results);
+ await scheduler.terminate(); // It also terminates all workers.
+})();
```
diff --git a/docs/faq.md b/docs/faq.md
index ca1ddd7..b8dd046 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -3,9 +3,9 @@ FAQ
## How does tesseract.js download and keep \*.traineddata?
-When you execute recognize() function (ex: `recognize(image, 'eng')`), the language model to download is determined by the 2nd argument of recognize(). (`eng` in the example)
+The language model is downloaded by `worker.loadLanguage()` and you need to pass the langs to `worker.initialize()`.
-Tesseract.js will first check if \*.traineddata already exists. (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API), Node.js: fs, in the folder you execute the command) If the \*.traineddata doesn't exist, it will fetch \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzip and store in IndexedDB or fs, you can delete it manually and it will download again for you.
+While downloading the language model, Tesseract.js first checks whether \*.traineddata already exists (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API), Node.js: fs, in the folder where you execute the command). If the \*.traineddata does not exist, it fetches \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzips it, and stores it in IndexedDB or fs. You can delete it manually and it will be downloaded again.
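
The check-then-download flow described above can be sketched roughly as follows. This is an illustration only, not the library's implementation: `loadTrainedData`, `cache`, and `fakeDownload` are hypothetical names, with a `Map` standing in for IndexedDB or the file system and a stub standing in for the fetch-and-ungzip step.

```javascript
// Illustrative sketch only: `cache` stands in for IndexedDB or the file
// system, and `download` for the real fetch-and-ungzip step.
const loadTrainedData = async (lang, cache, download) => {
  if (cache.has(lang)) {
    // Already stored: reuse it and skip the network entirely.
    return { data: cache.get(lang), fromCache: true };
  }
  const data = await download(lang);
  cache.set(lang, data); // store it so the next call finds it
  return { data, fromCache: false };
};

// Usage with in-memory stand-ins (hypothetical, no network access)
const cache = new Map();
const fakeDownload = async (lang) => `${lang}-traineddata`;

const demo = (async () => {
  const first = await loadTrainedData('eng', cache, fakeDownload);
  const second = await loadTrainedData('eng', cache, fakeDownload);
  console.log(first.fromCache, second.fromCache); // false true
  return [first, second];
})();
```

Deleting the entry from `cache` in this sketch corresponds to deleting the stored file manually: the next call downloads it again.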
## How can I train my own \*.traineddata?
@@ -15,26 +15,28 @@ For tesseract.js v1, check [Training Tesseract 3.03–3.05](https://github.com/t
## How can I get HOCR, TSV, Box, UNLV, OSD?
-Starting from 2.0.0-alpha.10, you can get all these information in the final result.
+Starting from 2.0.0-beta.1, you can get all of this information in the final result.
```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
- .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', 'eng', {
+import { createWorker } from 'tesseract.js';
+const worker = createWorker({
+ logger: m => console.log(m)
+});
+
+(async () => {
+ await worker.load();
+ await worker.loadLanguage('eng');
+ await worker.initialize('eng');
+ await worker.setParameters({
tessedit_create_box: '1',
tessedit_create_unlv: '1',
tessedit_create_osd: '1',
- })
- .then((result) => {
- console.log(result.text);
- console.log(result.hocr);
- console.log(result.tsv);
- console.log(result.box);
- console.log(result.unlv);
- console.log(result.osd);
});
+ const { data: { text, hocr, tsv, box, unlv } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+ console.log(text);
+ console.log(hocr);
+ console.log(tsv);
+ console.log(box);
+ console.log(unlv);
+})();
```
diff --git a/docs/local-installation.md b/docs/local-installation.md
index cc7c0f6..1f18fe9 100644
--- a/docs/local-installation.md
+++ b/docs/local-installation.md
@@ -9,10 +9,20 @@ Because of this we recommend loading `tesseract.js` from a CDN. But if you reall
In Node.js environment, the only path you may want to customize is languages/langPath.
```javascript
-const worker = Tesseract.TesseractWorker({
- workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-alpha.13/dist/worker.min.js',
+Tesseract.recognize(image, langs, {
+ workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/worker.min.js',
langPath: 'https://tessdata.projectnaptha.com/4.0.0',
- corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm.js',
+ corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js',
+})
+```
+
+Or
+
+```javascript
+const worker = createWorker({
+ workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/worker.min.js',
+ langPath: 'https://tessdata.projectnaptha.com/4.0.0',
+ corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js',
});
```
@@ -23,6 +33,6 @@ A string specifying the location of the [worker.js](./dist/worker.min.js) file.
A string specifying the location of the tesseract language files, with default value 'https://tessdata.projectnaptha.com/4.0.0'. Language file URLs are calculated according to the formula `langPath + langCode + '.traineddata.gz'`.
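
As a rough illustration of that formula, the sketch below builds one URL per language code. `languageUrls` is a hypothetical helper, not part of tesseract.js. It assumes that a `/` separator is inserted between `langPath` and the language code (since `langPath` should not end with `/`) and that the `.gz` suffix is dropped when the data is not gzipped.

```javascript
// Hypothetical helper (not part of tesseract.js): builds the download URL
// for each language in a '+'-separated langs string.
// Assumptions: '/' is added between langPath and the language code,
// and '.gz' is appended only when gzip is true.
const languageUrls = (langPath, langs, gzip = true) =>
  langs.split('+').map(
    (code) => `${langPath}/${code}.traineddata${gzip ? '.gz' : ''}`,
  );

console.log(languageUrls('https://tessdata.projectnaptha.com/4.0.0', 'eng+chi_tra'));
```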
### corePath
-A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm.js' (fallback to tesseract-core.asm.js when WebAssembly is not available).
+A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js' (fallback to tesseract-core.asm.js when WebAssembly is not available).
-Another WASM option is 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.js' which is a script that loads 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm'. But it fails to fetch at this moment.
+Another WASM option is 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.js' which is a script that loads 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm'. But it fails to fetch at this moment.
diff --git a/docs/tesseract_parameters.md b/docs/tesseract_parameters.md
index 6b6d598..3b2071b 100644
--- a/docs/tesseract_parameters.md
+++ b/docs/tesseract_parameters.md
@@ -1,12 +1,14 @@
Tesseract.js Parameters
=======================
-In the 3rd argument of `TesseractWorker.recognize()`, you can pass a params object to customize the result of OCR, below are supported parameters in tesseract.js so far.
+When initializing
+
+In the 3rd argument of `recognize()`, you can pass a params object to customize the result of OCR. Below are the parameters supported in tesseract.js so far.
Example:
```javascript
-import Tesseract from 'tesseract.js';
+import { createWorker, OEM, PSM } from 'tesseract.js';
const { TesseractWorker, OEM, PSM } = Tesseract;
const worker = new TesseractWorker();
@@ -24,17 +26,8 @@ worker
| tessedit\_ocr\_engine\_mode | enum | OEM.LSTM\_ONLY | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for definition of each mode |
| tessedit\_pageseg\_mode | enum | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for definition of each mode |
| tessedit\_char\_whitelist | string | '' | setting white list characters makes the result only contains these characters, useful the content in image is limited |
-| tessjs\_create\_pdf | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js generates a pdf output |
| tessjs\_create\_hocr | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result |
| tessjs\_create\_tsv | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result |
| tessjs\_create\_box | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result |
| tessjs\_create\_unlv | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result |
| tessjs\_create\_osd | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result |
-| tessjs\_pdf\_name | string | 'tesseract.js-ocr-result' | the name of the generated pdf file |
-| tessjs\_pdf\_title | string | 'Tesseract.js OCR Result' | the title of the generated pdf file |
-| tessjs\_pdf\_auto\_download | boolean | true | If the value is true, tesseract.js will automatic download/writeFile pdf file |
-| tessjs\_pdf\_bin | boolean | false | whether to include pdf binary array in the result object (result.files.pdf) |
-| tessjs\_image\_rectangle\_left | number | 0 | The left of the sub-rectangle of the image. |
-| tessjs\_image\_rectangle\_top | number | 0 | The top of the sub-rectangle of the image. |
-| tessjs\_image\_rectangle\_width | number | -1 | The width of the sub-rectangle of the image, -1 means auto width detection |
-| tessjs\_image\_rectangle\_height | number | -1 | The height of the sub-rectangle of the image, -1 means auto height detection |