Jerome Wu
6 years ago
5 changed files with 195 additions and 204 deletions
@ -0,0 +1,146 @@
# API

## Tesseract.recognize(image [, options]) -> [TesseractJob](#tesseractjob)
Figures out what words are in `image`, where the words are in `image`, etc.
> Note: `image` should be sufficiently high resolution.
> Often, the same image will get much better results if you upscale it before calling `recognize`.

- `image`: see [Image Format](./image-format.md) for more details.
- `options` is either absent (in which case it is interpreted as `'eng'`), a string specifying a language short code from the [language list](./tesseract_lang_list.md), or a flat JSON object that may:
  + include properties that override some subset of the [default tesseract parameters](./tesseract_parameters.md)
  + include a `lang` property with a value from the [list of lang parameters](./tesseract_lang_list.md); multiple languages can be combined with `+`, e.g. `eng+chi_tra`

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, `catch` and `finally` methods can be used to act on the result.
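
The three accepted shapes of `options` can be sketched as a small normalization step. This is only an illustration: `normalizeOptions` and `splitLanguages` are hypothetical helpers, not part of Tesseract.js, and defaulting `lang` to `'eng'` for an object without a `lang` property is an assumption.

```javascript
// Hypothetical helper (not part of Tesseract.js): normalizes the `options`
// argument of `recognize` following the rules described above.
function normalizeOptions(options) {
  // Absent options are interpreted as English.
  if (options === undefined || options === null) {
    return { lang: 'eng' };
  }
  // A bare string is treated as a language short code.
  if (typeof options === 'string') {
    return { lang: options };
  }
  // A flat object is copied; `lang` falls back to 'eng' if missing (assumption).
  return Object.assign({ lang: 'eng' }, options);
}

// Hypothetical helper: splits a combined value such as 'eng+chi_tra' into its codes.
function splitLanguages(lang) {
  return lang.split('+').filter(code => code.length > 0);
}

console.log(normalizeOptions());      // { lang: 'eng' }
console.log(normalizeOptions('spa')); // { lang: 'spa' }
console.log(splitLanguages('eng+chi_tra')); // [ 'eng', 'chi_tra' ]
```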

### Simple Example:
```javascript
const worker = new Tesseract.TesseractWorker();
worker
  .recognize(myImage)
  .then(function(result){
    console.log(result);
  });
```

### More Complicated Example:
```javascript
const worker = new Tesseract.TesseractWorker();
// if we know our image is of Spanish words without the letter 'e':
worker
  .recognize(myImage, {
    lang: 'spa',
    tessedit_char_blacklist: 'e',
  })
  .then(function(result){
    console.log(result);
  });
```

## Tesseract.detect(image) -> [TesseractJob](#tesseractjob)

Figures out what script (e.g. 'Latin', 'Chinese') the words in `image` are written in.

- `image`: see [Image Format](./image-format.md) for more details.

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, `catch` and `finally` methods can be used to act on the result of the script detection.

```javascript
const worker = new Tesseract.TesseractWorker();
worker
  .detect(myImage)
  .then(function(result){
    console.log(result);
  });
```

## TesseractJob

A TesseractJob is an object returned by a call to `recognize` or `detect`. It's inspired by the ES6 Promise interface and provides `then` and `catch` methods. It also provides a `finally` method, which is fired regardless of the job's fate. One important difference is that these methods return the job itself (to enable chaining) rather than a new object.

Typical use is:
```javascript
const worker = new Tesseract.TesseractWorker();
worker.recognize(myImage)
  .progress(message => console.log(message))
  .catch(err => console.error(err))
  .then(result => console.log(result))
  .finally(resultOrError => console.log(resultOrError));
```

Which is equivalent to:
```javascript
const worker = new Tesseract.TesseractWorker();
const job1 = worker.recognize(myImage);

job1.progress(message => console.log(message));

job1.catch(err => console.error(err));

job1.then(result => console.log(result));

job1.finally(resultOrError => console.log(resultOrError));
```
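
The "returns the job itself" behaviour can be illustrated with a toy object. `MiniJob` below is a hypothetical stand-in written for this sketch, not the real TesseractJob class; its `emitProgress`, `resolve` and `reject` methods exist only to simulate a worker reporting back.

```javascript
// Toy stand-in for a TesseractJob (hypothetical, for illustration only).
class MiniJob {
  constructor() {
    this.handlers = { progress: [], then: [], catch: [], finally: [] };
  }

  // Each registration method stores the callback and returns the job itself.
  progress(callback) { this.handlers.progress.push(callback); return this; }
  then(callback) { this.handlers.then.push(callback); return this; }
  catch(callback) { this.handlers.catch.push(callback); return this; }
  finally(callback) { this.handlers.finally.push(callback); return this; }

  // Simulation hooks (assumptions of this sketch, not a documented API).
  emitProgress(message) { this.handlers.progress.forEach(cb => cb(message)); }
  resolve(result) {
    this.handlers.then.forEach(cb => cb(result));
    this.handlers.finally.forEach(cb => cb(result));
  }
  reject(error) {
    this.handlers.catch.forEach(cb => cb(error));
    this.handlers.finally.forEach(cb => cb(error));
  }
}

const job = new MiniJob();
const log = [];
const returned = job
  .progress(message => log.push(['progress', message]))
  .catch(err => log.push(['catch', err]))
  .then(result => log.push(['then', result]))
  .finally(resultOrError => log.push(['finally', resultOrError]));

job.emitProgress({ recognized: 0.5 });
job.resolve({ text: 'Hello' });

console.log(returned === job);            // true: every call returned the same job
console.log(log.map(entry => entry[0])); // [ 'progress', 'then', 'finally' ]
```

Because every method returns the same object, the chained form and the statement-per-call form register exactly the same callbacks on the same job.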

### TesseractJob.progress(callback: function) -> TesseractJob
Sets `callback` as the function that will be called every time the job progresses.
- `callback` is a function with the signature `callback(progress)` where `progress` is a JSON object.

For example:
```javascript
const worker = new Tesseract.TesseractWorker();
worker.recognize(myImage)
  .progress(function(message){console.log('progress is: ', message)});
```

The console will show something like:
```javascript
progress is: {loaded_lang_model: "eng", from_cache: true}
progress is: {initialized_with_lang: "eng"}
progress is: {set_variable: Object}
progress is: {set_variable: Object}
progress is: {recognized: 0}
progress is: {recognized: 0.3}
progress is: {recognized: 0.6}
progress is: {recognized: 0.9}
progress is: {recognized: 1}
```
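
A progress callback usually has to tell these message shapes apart. The following is a minimal sketch that relies only on the keys visible in the sample output above; `describeProgress` is a hypothetical helper, and the wording for a model that was not loaded from cache is an assumption.

```javascript
// Hypothetical formatter for the progress messages shown above.
function describeProgress(message) {
  // `recognized` is a fraction between 0 and 1; test the type so that 0 is handled.
  if (typeof message.recognized === 'number') {
    return 'recognizing: ' + Math.round(message.recognized * 100) + '%';
  }
  if (message.loaded_lang_model) {
    // Assumption: anything not served from cache was fetched remotely.
    const source = message.from_cache ? 'cache' : 'remote source';
    return 'loaded language model "' + message.loaded_lang_model + '" from ' + source;
  }
  if (message.initialized_with_lang) {
    return 'initialized with "' + message.initialized_with_lang + '"';
  }
  if (message.set_variable) {
    return 'set a Tesseract variable';
  }
  return 'unknown progress message';
}

console.log(describeProgress({ recognized: 0.3 })); // recognizing: 30%
```

It could then be passed along as `.progress(message => console.log(describeProgress(message)))`.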

### TesseractJob.then(callback: function) -> TesseractJob
Sets `callback` as the function that will be called if and when the job successfully completes.
- `callback` is a function with the signature `callback(result)` where `result` is a JSON object.

For example:
```javascript
const worker = new Tesseract.TesseractWorker();
worker.recognize(myImage)
  .then(function(result){console.log('result is: ', result)});
```

The console will show something like:
```javascript
result is: {
  blocks: Array[1]
  confidence: 87
  html: "<div class='ocr_page' id='page_1' ..."
  lines: Array[3]
  oem: "DEFAULT"
  paragraphs: Array[1]
  psm: "SINGLE_BLOCK"
  symbols: Array[33]
  text: "Hello World↵from beyond↵the Cosmic Void↵↵"
  version: "3.04.00"
  words: Array[7]
}
```
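
As a rough sketch of how a `then` callback might consume such a result, the hypothetical helper below picks out a few of the fields shown above. The mock object stands in for real output, and it assumes that `↵` in the console listing represents a newline character.

```javascript
// Hypothetical helper: condenses a recognition result into a short summary.
function summarizeResult(result) {
  return {
    text: result.text.trim(),        // drop the trailing blank lines
    confidence: result.confidence,
    wordCount: result.words.length,
    lineCount: result.lines.length,
  };
}

// Mock result shaped like the sample output above (the array contents are placeholders).
const mockResult = {
  text: 'Hello World\nfrom beyond\nthe Cosmic Void\n\n',
  confidence: 87,
  words: new Array(7).fill({}),
  lines: new Array(3).fill({}),
};

console.log(summarizeResult(mockResult));
```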

### TesseractJob.catch(callback: function) -> TesseractJob
Sets `callback` as the function that will be called if the job fails.
- `callback` is a function with the signature `callback(error)` where `error` is a JSON object.

### TesseractJob.finally(callback: function) -> TesseractJob
Sets `callback` as the function that will be called regardless of whether the job fails or succeeds.
- `callback` is a function with the signature `callback(resultOrError)` where `resultOrError` is a JSON object.
@ -0,0 +1,13 @@
# Image Format

Supported formats: **bmp, jpg, png, pbm**

The main Tesseract.js functions (e.g. `recognize`, `detect`) take an `image` parameter, which should be something image-like. What counts as "image-like" depends on whether the code runs in the browser or in Node.js.

In a browser, an image can be:
- an `img`, `video`, or `canvas` element
- a `File` object (from a file `<input>`)
- a path or URL to an accessible image

In Node.js, an image can be:
- a path to a local image
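
When the image is given as a path or URL, a quick pre-check against the supported formats listed above can catch obvious mistakes early. This is a sketch with a hypothetical helper, `hasSupportedExtension`; it only inspects the file extension, which is a heuristic and says nothing about the actual file contents.

```javascript
// Extensions taken from the "Supported formats" line above.
const SUPPORTED_EXTENSIONS = ['bmp', 'jpg', 'png', 'pbm'];

// Hypothetical helper: true if the path or URL ends in a supported extension.
function hasSupportedExtension(pathOrUrl) {
  // Drop any query string or fragment, then take the text after the last dot.
  const clean = pathOrUrl.split(/[?#]/)[0];
  const dot = clean.lastIndexOf('.');
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.includes(clean.slice(dot + 1).toLowerCase());
}

console.log(hasSupportedExtension('./scan.PNG')); // true
console.log(hasSupportedExtension('notes.txt'));  // false
```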
@ -0,0 +1,24 @@
## Local Installation

In a browser environment, `tesseract.js` simply provides the API layer. Internally, it opens a WebWorker to handle requests. That worker loads code from the Emscripten-built `tesseract.js-core`, which is hosted on a CDN, and then dynamically loads language files hosted on another CDN.

Because of this, we recommend loading `tesseract.js` from a CDN. If you really need all your files to be local, you can pass extra arguments to `TesseractWorker` to specify custom paths for the worker, languages, and core.

In a Node.js environment, the only path you may want to customize is the language path, `langPath`.

```javascript
const worker = new Tesseract.TesseractWorker({
  workerPath: 'https://cdn.jsdelivr.net/gh/naptha/tesseract.js@v2.0.0/dist/worker.min.js',
  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
  corePath: 'https://cdn.jsdelivr.net/gh/naptha/tesseract.js-core@v2.0.0-beta.5/tesseract-core.js',
});
```

### workerPath
A string specifying the location of the [worker.js](./dist/worker.min.js) file.

### langPath
A string specifying the location of the tesseract language files, with default value 'https://tessdata.projectnaptha.com/4.0.0'. Language file URLs are calculated according to the formula `langPath + langCode + '.traineddata.gz'`.
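
The URL formula can be sketched as a small function. Taken literally, the formula concatenates `langPath` and `langCode` directly, and the text does not say whether `langPath` needs a trailing slash; this sketch assumes a `/` separator is added when `langPath` does not already end with one. `languageFileUrl` is a hypothetical name, not a Tesseract.js function.

```javascript
// Hypothetical helper illustrating the langPath formula.
// Assumption: a '/' is inserted when langPath has no trailing slash.
function languageFileUrl(langPath, langCode) {
  const base = langPath.endsWith('/') ? langPath : langPath + '/';
  return base + langCode + '.traineddata.gz';
}

console.log(languageFileUrl('https://tessdata.projectnaptha.com/4.0.0', 'eng'));
// https://tessdata.projectnaptha.com/4.0.0/eng.traineddata.gz
```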

### corePath
A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://cdn.jsdelivr.net/gh/naptha/tesseract.js-core@v2.0.0-beta.5/tesseract-core.js'.