Version 2 is now available and under development in the master branch. Read a story about v2: <a href="https://jeromewu.github.io/why-i-refactor-tesseract.js-v2/">Why I refactor tesseract.js v2?</a><br>
Check the <a href="https://github.com/naptha/tesseract.js/tree/support/1.x">support/1.x</a> branch for version 1
</h3>
<br>
Tesseract.js is a JavaScript library that gets words in [almost any language](./docs/tesseract_lang_list.md) out of images. ([Demo](http://tesseract.projectnaptha.com/))
Image Recognition
[Check out the docs](#documentation) for a full explanation of the API.
## Major changes in v3
- Significantly faster performance
  - Runtime reduction of 84% for Browser and 96% for Node.js when recognizing the [example images](./examples/data)
- Upgrade to Tesseract v5.1.0 (using emscripten 3.1.18)
- Added SIMD-enabled build for supported devices
- Added support:
  - Node.js version 18
- Removed support:
  - ASM.js version, any other old versions of Tesseract.js-core (<3.0.0)
  - Node.js versions 10 and 12
## Major changes in v2
- Upgrade to tesseract v4.1.1 (using emscripten 1.39.10 upstream)
- Support WebAssembly (falls back to ASM.js when the browser doesn't support it)
- Support TypeScript
Read a story about v2: <a href="https://jeromewu.github.io/why-i-refactor-tesseract.js-v2/">Why I refactor tesseract.js v2?</a><br>
Check the <a href="https://github.com/naptha/tesseract.js/tree/support/1.x">support/1.x</a> branch for version 1
## Installation
Tesseract.js works with a `<script>` tag via local copy or CDN, with webpack via `npm` and on Node.js with `npm/yarn`.
### Node.js
**Tesseract.js v3 requires Node.js v14 or higher**
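
A project that depends on Tesseract.js v3 may want to check this requirement at startup. The following is a minimal sketch, not part of the Tesseract.js API: `MIN_NODE_MAJOR` and `meetsNodeRequirement` are hypothetical names introduced here for illustration, and the only runtime value used is Node's own `process.versions.node`.

```javascript
// Minimum Node.js major version, taken from the requirement stated above.
const MIN_NODE_MAJOR = 14;

// Hypothetical helper (not a Tesseract.js function): returns true when a
// version string such as "18.12.1" or "v14.0.0" meets the minimum major version.
function meetsNodeRequirement(version, minMajor = MIN_NODE_MAJOR) {
  // Strip an optional leading "v", then read the leading integer (the major version).
  const major = Number.parseInt(String(version).replace(/^v/, ''), 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Warn early instead of failing later inside the OCR code path.
if (!meetsNodeRequirement(process.versions.node)) {
  console.warn(
    `Node.js ${process.versions.node} detected; Tesseract.js v3 requires v14 or higher.`
  );
}
```
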
The main Tesseract.js functions (e.g. recognize, detect) take an `image` parameter. The image formats and data types supported are listed below.

Supported image formats: **bmp, jpg, png, pbm, webp**

For browser and Node, supported data types are:
- string with base64 encoded image (fits `data:image\/([a-zA-Z]*);base64,([^"]*)` regexp)
- buffer

For browser only, supported data types are:
- `File` or `Blob` object
- `img` or `canvas` element

For Node only, supported data types are:
- string containing a path to local image

Note: images must be a supported image format **and** a supported data type. For example, a buffer containing a png image is supported. A buffer containing raw pixel data is not supported.
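
To make the format and data type distinction concrete, here is a rough Node-only sketch of a pre-check that could run before an image is handed to `recognize`. It is not how Tesseract.js itself inspects input: `sniffFormat` and `classifyImageInput` are hypothetical helpers, the magic-byte checks are a common heuristic rather than a full decoder, and anchoring the regexp at the start of the string is an assumption made here.

```javascript
// Hypothetical helpers for illustration only; not part of the Tesseract.js API.

// The base64 pattern quoted above, anchored at the start (an assumption).
const DATA_URI = /^data:image\/([a-zA-Z]*);base64,([^"]*)/;

// Guess the image format of a Buffer from its leading "magic" bytes.
// Returns null when the bytes match none of the supported formats,
// which is the expected result for raw pixel data in most cases.
function sniffFormat(buf) {
  const ascii = (start, end) => buf.toString('latin1', start, end);
  if (buf.length >= 4 && buf[0] === 0x89 && ascii(1, 4) === 'PNG') return 'png';
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) return 'jpg';
  if (buf.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WEBP') return 'webp';
  if (buf.length >= 2 && ascii(0, 2) === 'BM') return 'bmp';
  if (buf.length >= 2 && (ascii(0, 2) === 'P1' || ascii(0, 2) === 'P4')) return 'pbm';
  return null;
}

// Classify an input by data type and, where it can be determined, by format.
function classifyImageInput(image) {
  if (Buffer.isBuffer(image)) {
    const format = sniffFormat(image);
    return format
      ? { type: 'buffer', format }
      : { type: 'unsupported', format: null };
  }
  if (typeof image === 'string') {
    const match = DATA_URI.exec(image);
    // For data URIs the format is the subtype named in the URI itself.
    if (match) return { type: 'base64', format: match[1].toLowerCase() };
    // Any other string is treated as a path to a local image (Node only).
    return { type: 'path', format: null };
  }
  return { type: 'unsupported', format: null };
}
```

With this sketch, a buffer that starts with the PNG signature is classified as `{ type: 'buffer', format: 'png' }`, while a buffer of raw RGBA bytes normally comes back as `unsupported`. Because the check is a heuristic, raw data that happens to begin with bytes such as `BM` would be misclassified.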