# API

- [createWorker()](#create-worker)
  - [Worker.load](#worker-load)
  - [Worker.loadLanguage](#worker-load-language)
  - [Worker.initialize](#worker-initialize)
  - [Worker.setParameters](#worker-set-parameters)
  - [Worker.recognize](#worker-recognize)
  - [Worker.detect](#worker-detect)
  - [Worker.terminate](#worker-terminate)
- [createScheduler()](#create-scheduler)
  - [Scheduler.addWorker](#scheduler-add-worker)
  - [Scheduler.addJob](#scheduler-add-job)
  - [Scheduler.getQueueLen](#scheduler-get-queue-len)
  - [Scheduler.getNumWorkers](#scheduler-get-num-workers)
  - [Scheduler.terminate](#scheduler-terminate)
- [setLogging()](#set-logging)
- [recognize()](#recognize)
- [detect()](#detect)
- [PSM](#psm)
- [OEM](#oem)

---

<a name="create-worker"></a>
## createWorker(options): Worker

createWorker is a factory function that creates a Tesseract worker. A worker is essentially a Web Worker in the browser and a Child Process in Node.

**Arguments:**

- `options` an object of customized options
  - `corePath` path for tesseract-core.js script
  - `langPath` path for downloading traineddata, do not include `/` at the end of the path
  - `workerPath` path for downloading worker script
  - `dataPath` path for saving traineddata in the WebAssembly file system; rarely needs to be changed
  - `cachePath` path for the cached traineddata; more useful in Node, since in the browser it only changes the key in IndexedDB
  - `cacheMethod` a string that selects the cache management method; should be one of the following
    - write: read the cache and write back (default method)
    - readOnly: read the cache and do not write back
    - refresh: do not read the cache, but write back
    - none: do not read the cache and do not write back
  - `workerBlobURL` a boolean to define whether to use Blob URL for worker script, default: true
  - `gzip` a boolean to define whether the traineddata from the remote is gzipped, default: true
  - `logger` a function to log the progress, a quick example is `m => console.log(m)`
  - `errorHandler` a function to handle worker errors, a quick example is `err => console.error(err)`


**Examples:**

```javascript
const { createWorker } = Tesseract;
const worker = createWorker({
  langPath: '...',
  logger: m => console.log(m),
});
```
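
The constraints on the options listed above can be checked before a worker is created. The following is a minimal sketch, not part of Tesseract.js: `normalizeWorkerOptions` is a hypothetical helper and the URL is a placeholder. It only validates `cacheMethod` against the four documented values and strips a trailing `/` from `langPath`.

```javascript
// Hypothetical helper, not part of Tesseract.js: sanity-check an options
// object before passing it to createWorker().
const CACHE_METHODS = ['write', 'readOnly', 'refresh', 'none'];

function normalizeWorkerOptions(options = {}) {
  const normalized = { ...options };

  // cacheMethod must be one of the documented values.
  if (normalized.cacheMethod !== undefined
      && !CACHE_METHODS.includes(normalized.cacheMethod)) {
    throw new Error(`Unknown cacheMethod: ${normalized.cacheMethod}`);
  }

  // langPath must not end with "/".
  if (typeof normalized.langPath === 'string') {
    normalized.langPath = normalized.langPath.replace(/\/+$/, '');
  }

  return normalized;
}

const options = normalizeWorkerOptions({
  langPath: 'https://example.com/lang-data/', // placeholder URL
  cacheMethod: 'readOnly',
});
console.log(options.langPath);
```

The resulting object can then be passed as `createWorker(options)`.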

## Worker

A Worker performs the OCR-related tasks. It takes a few steps to set up a Worker before it is fully functional. The full flow is:

- load
- loadLanguage
- initialize
- setParameters // optional
- recognize or detect
- terminate

Each function is async, so using async/await or Promises is required. When the Promise resolves, you get an object:

```json
{
  "jobId": "Job-1-123",
  "data": { ... }
}
```

jobId is generated by Tesseract.js, but you can supply your own when calling any of the functions above.
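
The flow above can be wrapped in a single helper. This is a minimal sketch, assuming the caller passes in a worker returned by `createWorker()`; `recognizeOnce` and the `'my-job-1'` ID are illustrative names, not part of Tesseract.js.

```javascript
// Sketch only: `worker` is whatever createWorker() returned; the custom job
// ID string is an arbitrary example.
async function recognizeOnce(worker, image, langs = 'eng') {
  try {
    await worker.load();
    await worker.loadLanguage(langs);
    await worker.initialize(langs);
    // jobId is the last argument, so pass an empty options object before it.
    const { jobId, data } = await worker.recognize(image, {}, 'my-job-1');
    return { jobId, text: data.text };
  } finally {
    // Always release the Web Worker / Child Process, even after an error.
    await worker.terminate();
  }
}
```

With the real library this would be called as `recognizeOnce(createWorker(), image)` inside an async function.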

<a name="worker-load"></a>
### Worker.load(jobId): Promise

Worker.load() loads the tesseract.js-core scripts (downloading them from a remote source if they are not present) and makes the Web Worker/Child Process ready for the next action.

**Arguments:**

- `jobId` Please see details above

**Examples:**

```javascript
(async () => {
  await worker.load();
})();
```

<a name="worker-load-language"></a>
### Worker.loadLanguage(langs, jobId): Promise

Worker.loadLanguage() loads traineddata from the cache, or downloads it from a remote source, and puts it into the WebAssembly file system.

**Arguments:**

- `langs` a string indicating which languages' traineddata to download; multiple languages are concatenated with **+**, e.g. **eng+chi\_tra**
- `jobId` Please see details above

**Examples:**

```javascript
(async () => {
  await worker.loadLanguage('eng+chi_tra');
})();
```
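
When the language codes live in an array, the `langs` string can be assembled programmatically. A minimal sketch follows; `joinLangs` is a hypothetical helper, not part of Tesseract.js.

```javascript
// Hypothetical helper: build the "+"-separated langs string from an array.
function joinLangs(codes) {
  if (!Array.isArray(codes) || codes.length === 0) {
    throw new Error('At least one language code is required');
  }
  return codes.map((code) => code.trim()).join('+');
}

console.log(joinLangs(['eng', 'chi_tra'])); // eng+chi_tra
```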

<a name="worker-initialize"></a>
### Worker.initialize(langs, oem, jobId): Promise

Worker.initialize() initializes the Tesseract API and makes sure it is ready for OCR tasks.

**Arguments:**

- `langs` a string indicating the languages to be loaded by the Tesseract API; it can be a subset of the language traineddata you loaded with Worker.loadLanguage.
- `oem` an enum indicating the OCR Engine Mode to use
- `jobId` Please see details above

**Examples:**

```javascript
(async () => {
  /** You can load more languages in advance, but use only part of them in Worker.initialize() */
  await worker.loadLanguage('eng+chi_tra');
  await worker.initialize('eng');
})();
```
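
The subset relationship between the two `langs` strings can be checked up front. This is a minimal sketch; `isSubsetOfLoaded` is a hypothetical helper, not part of Tesseract.js.

```javascript
// Hypothetical check: every language passed to initialize() should already
// have been loaded with loadLanguage().
function isSubsetOfLoaded(initLangs, loadedLangs) {
  const loaded = new Set(loadedLangs.split('+'));
  return initLangs.split('+').every((lang) => loaded.has(lang));
}

console.log(isSubsetOfLoaded('eng', 'eng+chi_tra'));     // true
console.log(isSubsetOfLoaded('eng+deu', 'eng+chi_tra')); // false
```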

<a name="worker-set-parameters"></a>
### Worker.setParameters(params, jobId): Promise

Worker.setParameters() sets parameters for the Tesseract API (using SetVariable()). It changes the behavior of Tesseract, and some parameters, like tessedit\_char\_whitelist, are very useful.

**Arguments:**

- `params` an object with key and value of the parameters
- `jobId` Please see details above

**Supported Parameters:**

| name                        | type   | default value     | description                                                                                                                     |
| --------------------------- | ------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| tessedit\_ocr\_engine\_mode | enum   | OEM.DEFAULT       | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for definition of each mode |
| tessedit\_pageseg\_mode     | enum   | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for definition of each mode |
| tessedit\_char\_whitelist   | string | ''                | setting whitelist characters makes the result contain only these characters, useful when the content of the image is limited    |
| preserve\_interword\_spaces | string | '0'               | '0' or '1', keeps the space between words                                                                                       |
| tessjs\_create\_hocr        | string | '1'               | only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result                                      |
| tessjs\_create\_tsv         | string | '1'               | only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result                                       |
| tessjs\_create\_box         | string | '0'               | only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result                                       |
| tessjs\_create\_unlv        | string | '0'               | only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result                                      |
| tessjs\_create\_osd         | string | '0'               | only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result                                       |

**Examples:**

```javascript
(async () => {
  await worker.setParameters({
    tessedit_char_whitelist: '0123456789',
  });
})();
```
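
Because the `tessjs_create_*` parameters are strings rather than booleans, a small conversion step can prevent mistakes. The following is a minimal sketch; `outputParams` is a hypothetical helper, not part of Tesseract.js.

```javascript
// Hypothetical helper: turn boolean output switches into the '0' / '1'
// strings that the tessjs_create_* parameters expect.
function outputParams({ hocr, tsv, box, unlv, osd } = {}) {
  const flags = { hocr, tsv, box, unlv, osd };
  const params = {};
  for (const [name, enabled] of Object.entries(flags)) {
    // Leave unspecified formats out so their defaults stay in effect.
    if (enabled !== undefined) {
      params[`tessjs_create_${name}`] = enabled ? '1' : '0';
    }
  }
  return params;
}

const params = {
  tessedit_char_whitelist: '0123456789',
  ...outputParams({ hocr: false, tsv: false, box: true }),
};
console.log(params);
```

The resulting object can be passed as `worker.setParameters(params)`.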

<a name="worker-recognize"></a>
### Worker.recognize(image, options, jobId): Promise

Worker.recognize() provides the core function of Tesseract.js: it executes OCR.

It figures out what words are in `image`, where the words are in `image`, etc.
> Note: `image` should be sufficiently high resolution.
> Often, the same image will get much better results if you upscale it before calling `recognize`.

**Arguments:**

- `image` see [Image Format](./image-format.md) for more details.
- `options` an object of customized options
  - `rectangle` an object specifying the region of the image you want to recognize; the object should contain top, left, width and height, see the example below.
- `jobId` Please see details above

**Output:**

**Examples:**

```javascript
const { createWorker } = Tesseract;
(async () => {
  const worker = createWorker();
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize(image);
  console.log(text);
})();
```

With rectangle

```javascript
const { createWorker } = Tesseract;
(async () => {
  const worker = createWorker();
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize(image, {
    rectangle: { top: 0, left: 0, width: 100, height: 100 },
  });
  console.log(text);
})();
```
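
If a region is known by two opposite corners rather than by its size, it can be converted into the `rectangle` shape first. This is a minimal sketch; `rectangleFromCorners` is a hypothetical helper, and pixel coordinates are assumed.

```javascript
// Hypothetical helper: build a `rectangle` option from two opposite corners
// given in pixel coordinates.
function rectangleFromCorners(x1, y1, x2, y2) {
  return {
    left: Math.min(x1, x2),
    top: Math.min(y1, y2),
    width: Math.abs(x2 - x1),
    height: Math.abs(y2 - y1),
  };
}

const rectangle = rectangleFromCorners(120, 40, 20, 90);
console.log(rectangle); // { left: 20, top: 40, width: 100, height: 50 }
```

The result can be passed as `worker.recognize(image, { rectangle })`.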

<a name="worker-detect"></a>
### Worker.detect(image, jobId): Promise

Worker.detect() performs OSD (Orientation and Script Detection) on the image instead of OCR.

**Arguments:**

- `image` see [Image Format](./image-format.md) for more details.
- `jobId` Please see details above

**Examples:**

```javascript
const { createWorker } = Tesseract;
(async () => {
  const worker = createWorker();
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data } = await worker.detect(image);
  console.log(data);
})();
```

<a name="worker-terminate"></a>
### Worker.terminate(jobId): Promise

Worker.terminate() terminates the worker and cleans up.

**Arguments:**

- `jobId` Please see details above

```javascript
(async () => {
  await worker.terminate();
})();
```

<a name="create-scheduler"></a>
## createScheduler(): Scheduler

createScheduler() is a factory function that creates a scheduler. A scheduler manages a job queue and a pool of workers so that multiple workers can work together, which is useful when you want to speed up processing.

**Examples:**

```javascript
const { createScheduler } = Tesseract;
const scheduler = createScheduler();
```

### Scheduler

<a name="scheduler-add-worker"></a>
### Scheduler.addWorker(worker): string

Scheduler.addWorker() adds a worker to the worker pool inside the scheduler. It is recommended to add each worker to only one scheduler.

**Arguments:**

- `worker` see Worker above

**Examples:**

```javascript
const { createWorker, createScheduler } = Tesseract;
const scheduler = createScheduler();
const worker = createWorker();
scheduler.addWorker(worker);
```

<a name="scheduler-add-job"></a>
### Scheduler.addJob(action, ...payload): Promise

Scheduler.addJob() adds a job to the job queue; the scheduler then waits for an idle worker to take the job.

**Arguments:**

- `action` a string indicating the action you want to perform; currently only **recognize** and **detect** are supported
- `payload` an arbitrary number of arguments, depending on the action you called.

**Examples:**

```javascript
(async () => {
  const { data: { text } } = await scheduler.addJob('recognize', image, options);
  const { data } = await scheduler.addJob('detect', image);
})();
```
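
To benefit from several workers, jobs have to be queued together rather than awaited one by one. The following is a minimal sketch, assuming the scheduler already holds initialized workers added through `addWorker()`; `recognizeAll` is a hypothetical helper, not part of Tesseract.js.

```javascript
// Sketch only: `scheduler` is whatever createScheduler() returned, with
// initialized workers already added through addWorker().
async function recognizeAll(scheduler, images) {
  // Queue every image at once; the scheduler hands each job to an idle worker.
  const results = await Promise.all(
    images.map((image) => scheduler.addJob('recognize', image)),
  );
  // Promise.all keeps the results in the same order as `images`.
  return results.map(({ data }) => data.text);
}
```

Awaiting each `addJob` call inside a loop would instead process the images sequentially, leaving all but one worker idle.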

<a name="scheduler-get-queue-len"></a>
### Scheduler.getQueueLen(): number

Scheduler.getQueueLen() returns the length of the job queue.

<a name="scheduler-get-num-workers"></a>
### Scheduler.getNumWorkers(): number

Scheduler.getNumWorkers() returns the number of workers added to the scheduler.
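
The two getters can be combined into a simple status line, for example for logging. This is a minimal sketch; `describeScheduler` is a hypothetical helper, not part of Tesseract.js.

```javascript
// Sketch only: works with any object exposing getQueueLen() and
// getNumWorkers(), such as the value returned by createScheduler().
function describeScheduler(scheduler) {
  const queued = scheduler.getQueueLen();
  const workers = scheduler.getNumWorkers();
  return `${queued} job(s) queued, ${workers} worker(s) registered`;
}
```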

<a name="scheduler-terminate"></a>
### Scheduler.terminate(): Promise

Scheduler.terminate() terminates all added workers; useful for a quick clean-up.

**Examples:**

```javascript
(async () => {
  await scheduler.terminate();
})();
```

<a name="set-logging"></a>
## setLogging(logging: boolean)

setLogging() sets the logging flag. Call `setLogging(true)` to see detailed information, which is useful for debugging.

**Arguments:**

- `logging` boolean to define whether to see detailed logs, default: false

**Examples:**

```javascript
const { setLogging } = Tesseract;
setLogging(true);
```

<a name="recognize"></a>
## recognize(image, langs, options): Promise

recognize() is a shortcut for quickly running a recognize task. It is not recommended for real applications, but it is useful when you want to save some time.

See [Tesseract.js](../src/Tesseract.js)

<a name="detect"></a>
## detect(image, options): Promise

Same background as recognize(), but it performs detection instead.

See [Tesseract.js](../src/Tesseract.js)

<a name="psm"></a>
## PSM

See [PSM.js](../src/constants/PSM.js)

<a name="oem"></a>
## OEM

See [OEM.js](../src/constants/OEM.js)