> # UNDER CONSTRUCTION
> ## Due for Release on Monday, Oct 3, 2016

# tesseract.js

Tesseract.js is a pure JavaScript version of the Tesseract OCR Engine that can recognize English, Chinese, Russian, and 60 other languages.

Tesseract.js lets your code get the words out of scanned documents and other images.

# Installation

Tesseract.js can be loaded on a page from a local copy, or installed through npm.

### Local

First grab copies of `tesseract.js` and `tesseract.worker.js` from the [dist folder](https://github.com/naptha/tesseract.js/tree/master/dist). Then include `tesseract.js` on your page, and set `Tesseract.workerUrl` to the URL of your copy of `tesseract.worker.js`.

## npm

### TODO

walp

# Docs

* [Tesseract.recognize(image: ImageLike[, options]) -> TesseractJob](#tesseractrecognizeimage-imagelike-options---tesseractjob)
    + [Simple Example:](#simple-example)
    + [More Complicated Example:](#more-complicated-example)
* [Tesseract.detect(image: ImageLike) -> TesseractJob](#tesseractdetectimage-imagelike---tesseractjob)
* [ImageLike](#imagelike)
* [TesseractJob](#tesseractjob)
    + [TesseractJob.progress(callback: function) -> TesseractJob](#tesseractjobprogresscallback-function---tesseractjob)
    + [TesseractJob.then(callback: function) -> TesseractJob](#tesseractjobthencallback-function---tesseractjob)
    + [TesseractJob.error(callback: function) -> TesseractJob](#tesseractjoberrorcallback-function---tesseractjob)
* [Tesseract Remote File Options](#tesseract-remote-file-options)
    + [Tesseract.coreUrl](#tesseractcoreurl)
    + [Tesseract.workerUrl](#tesseractworkerurl)
    + [Tesseract.langUrl](#tesseractlangurl)

## Tesseract.recognize(image: [ImageLike](#imagelike)[, options]) -> [TesseractJob](#tesseractjob)

Figures out what words are in `image`, where the words are in `image`, etc.

- `image` is any [ImageLike](#imagelike) object.
- `options` is an optional flat JSON object.
  `options` may:
    + include properties that override some subset of the [default tesseract parameters](./tesseract_parameters.md)
    + include a `lang` property with a value from the [list of lang parameters](./tesseract_lang_list.md)

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, and `error` methods can be used to act on the result.

### Simple Example:

```javascript
Tesseract.recognize('#my-image')
    .then(function(result){
        console.log(result)
    })
```

### More Complicated Example:

```javascript
// if we know our image is of Spanish words without the letter 'e':
Tesseract.recognize('#my-image', {
    lang: 'spa',
    tessedit_char_blacklist: 'e'
})
    .then(function(result){
        console.log(result)
    })
```

## Tesseract.detect(image: [ImageLike](#imagelike)) -> [TesseractJob](#tesseractjob)

Figures out what script (e.g. 'Latin', 'Chinese') the words in `image` are written in.

- `image` is any [ImageLike](#imagelike) object.

Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, and `error` methods can be used to act on the result of the script detection.

```javascript
Tesseract.detect('#my-image')
    .then(function(result){
        console.log(result)
    })
```

## ImageLike

The main Tesseract.js functions take an `image` parameter, which should be something that is 'image-like'. That means `image` should be

- an `img` element, or a selector string (as accepted by `document.querySelector`) that matches an `img` element
- a `video` element, or a selector string that matches a `video` element
- a `canvas` element, or a selector string that matches a `canvas` element
- a CanvasRenderingContext2D (returned by `canvas.getContext('2d')`)
- the absolute `url` of an image from the same website that is running your script. Browser security policies don't allow access to the content of images from other websites :(

## TesseractJob

A TesseractJob is an object returned by a call to `recognize` or `detect`. All methods of a given TesseractJob return that TesseractJob to enable chaining.
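
The chaining behaviour described above can be illustrated without the library. The following is only a rough sketch under stated assumptions: `makeJob` and `emit` are hypothetical names invented for this illustration, and the code is not how Tesseract.js implements its jobs. It shows why returning the same object from every method lets calls be chained.

```javascript
// Hypothetical stand-in for a TesseractJob (NOT the library's implementation).
function makeJob() {
    var handlers = {};
    var job = {
        // Each method stores its callback and returns the same job object.
        progress: function (callback) { handlers.progress = callback; return job; },
        then: function (callback) { handlers.then = callback; return job; },
        error: function (callback) { handlers.error = callback; return job; },
        // `emit` is a hypothetical helper that simulates events from a worker.
        emit: function (type, payload) {
            if (handlers[type]) handlers[type](payload);
            return job;
        }
    };
    return job;
}

var seen = [];
var job = makeJob();

// Because every method returns `job`, the calls can be chained.
var sameJob = job
    .progress(function (message) { seen.push(['progress', message.recognized]); })
    .error(function (err) { seen.push(['error', err]); })
    .then(function (result) { seen.push(['then', result.confidence]); });

// Simulate one progress update followed by a successful result.
job.emit('progress', { recognized: 0.5 });
job.emit('then', { confidence: 87 });
```

With the real library, the job comes from `Tesseract.recognize` or `Tesseract.detect` rather than from a helper like `makeJob`.
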
Typical use is:

```javascript
Tesseract.recognize('#my-image')
    .progress(function(message){ console.log(message) })
    .error(function(err){ console.error(err) })
    .then(function(result){ console.log(result) })
```

Which is equivalent to:

```javascript
var job1 = Tesseract.recognize('#my-image');

job1.progress(function(message){ console.log(message) });
job1.error(function(err){ console.error(err) });
job1.then(function(result){ console.log(result) })
```

### TesseractJob.progress(callback: function) -> TesseractJob

Sets `callback` as the function that will be called every time the job progresses.

- `callback` is a function with the signature `callback(progress)` where `progress` is a JSON object.

For example:

```javascript
Tesseract.recognize('#my-image')
    .progress(function(message){ console.log('progress is:', message) })
```

The console will show something like:

```javascript
progress is: {loaded_lang_model: "eng", from_cache: true}
progress is: {initialized_with_lang: "eng"}
progress is: {set_variable: Object}
progress is: {set_variable: Object}
progress is: {recognized: 0}
progress is: {recognized: 0.3}
progress is: {recognized: 0.6}
progress is: {recognized: 0.9}
progress is: {recognized: 1}
```

### TesseractJob.then(callback: function) -> TesseractJob

Sets `callback` as the function that will be called if and when the job successfully completes.

- `callback` is a function with the signature `callback(result)` where `result` is a JSON object.

For example:

```javascript
Tesseract.recognize('#my-image')
    .then(function(result){ console.log('result is:', result) })
```

The console will show something like:

```javascript
result is: {
    blocks: Array[1]
    confidence: 87
    html: "