> # UNDER CONTRUCTION > ## Due for Release on Monday, Oct 3, 2016 # tesseract.js Tesseract.js is a pure javascript version of the Tesseract OCR Engine that can recognize English, Chinese, Russian, and 60 other languages. Tesseract.js lets your code get the words out of scanned documents and other images. # Installation Tesseract.js works with a ` ``` ### Local First grab copies of `tesseract.js` and `tesseract.worker.js` from the [dist folder](https://github.com/naptha/tesseract.js/tree/master/dist). Then include `tesseract.js` on your page, and set `Tesseract.workerUrl` like this: ```html ``` ## npm ### TODO # Docs ## ImageLike The main Tesseract.js functions take an `image` parameter, which should be something that is 'image-like'. That means `image` should be - an `img` element or querySelector that matches an `img` element - a `video` element or querySelector that matches a `video` element - a `canvas` element or querySelector that matches a `canvas` element - a CanvasRenderingContext2D (returned by `canvas.getContext('2d')`) - the absolute `url` of an image from the same website that is running your script. Browser security policies don't allow access to the content of images from other websites :( ## Tesseract.recognize(image: [ImageLike](#imagelike)[, options]) -> [TesseractJob](#tesseractjob) Figures out what words are in `image`, where the words are in `image`, etc. - `image` is any [ImageLike](#imagelike) object. - `options` is an optional flat json object. `options` may: + override some subset of the [default tesseract parameters](./tesseract_parameters.md) + specify a lang parameter that is one of `'afr'`, `'ara'`, `'aze'`, `'bel'`, `'ben'`, `'bul'`, `'cat'`, `'ces'`, `'chi_sim'`, `'chi_tra'`, `'chr'`, `'frak'`, `'dan'`, `'frak'`, `'deu'`, `'ell'`, `'eng'`, `'enm'`, `'epo'`, `'equ'`, `'est'`, `'eus'`, `'fin'`, `'fra'`, `'frk'`, `'frm'`, `'glg'`, `'grc'`, `'heb'`, `'hin'`, `'hrv'`, `'hun'`, `'ind'`, `'isl'`, `'ita'`, `'ita_old'`, `'jpn'`, `'kan'`, `'kor'`, `'lav'`, `'lit'`, `'mal'`, `'meme'`, `'mkd'`, `'mlt'`, `'msa'`, `'nld'`, `'nor'`, `'osd'`, `'pol'`, `'por'`, `'ron'`, `'rus'`, `'frak'`, `'slk'`, `'slv'`, `'spa'`, `'spa_old'`, `'sqi'`, `'srp'`, `'swa'`, `'swe'`, `'tam'`, `'tel'`, `'tgl'`, `'tha'`, `'tur'`, `'ukr'`, or `'vie'`. The default is `'eng'`. Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, and `error` methods can be used to act on the result. Example: ```javascript Tesseract.recognize('#my-image') .then(function(result){ console.log(result) }) ``` ## Tesseract.detect(image: [ImageLike](#imagelike)) -> [TesseractJob](#tesseractjob) Figures out what script (e.g. 'Latin', 'Chinese') the words in image are written in. - `image` is any [ImageLike](#imagelike) object. Returns a [TesseractJob](#tesseractjob) whose `then`, `progress`, and `error` methods can be used to act on the result of the script. ```javascript Tesseract.detect('#my-image') .then(function(result){ console.log(result) }) ``` ## TesseractJob A TesseractJob is an an object returned by a call to recognize or detect. All methods of a TesseractJob return itself to enable chaining. Typical use is: ```javascript var job1 = Tesseract.recognize('#my-image') job1.progress(function(message){console.log(message)}) .error(function(err){console.error(err)}) .then(function(result){console.log(result)}) ``` or more concisely: ```javascript Tesseract.recognize('#my-image') .progress(function(message){console.log(message)}) .error(function(err){console.error(err)}) .then(function(result){console.log(result)}) ``` ### TesseractJob.progress(callback: function) -> TesseractJob Sets `callback` as the function that will be called every time the job progresses. `callback` is a function with the signature `callback(progress)` where progress is json object. For example: ```javascript Tesseract.recognize('#my-image') .progress(function(message){console.log('progress is: 'message)}) ``` The console will show something like: ```javascript progress is: {loaded_lang_model: "eng", from_cache: true} progress is: {initialized_with_lang: "eng"} progress is: {set_variable: Object} progress is: {set_variable: Object} progress is: {recognized: 0} progress is: {recognized: 0.3} progress is: {recognized: 0.6} progress is: {recognized: 0.9} progress is: {recognized: 1} ``` ### TesseractJob.then(callback: function) -> TesseractJob Sets `callback` as the function that will be called if and when the job successfully completes. For example: ```javascript Tesseract.recognize('#my-image') .then(function(result){console.log('result is: 'result)}) ``` The console will show something like: ```javascript progress is: { blocks: Array[1] confidence: 87 html: "