
rearranging readme

pull/12/head
Guillermo 9 years ago
parent
commit
dff5b07613
  1. 78
      README.md


@ -4,6 +4,8 @@
# tesseract.js # tesseract.js
Tesseract.js is a pure javascript version of the Tesseract OCR Engine that can recognize English, Chinese, Russian, and 60 other languages. Tesseract.js is a pure javascript version of the Tesseract OCR Engine that can recognize English, Chinese, Russian, and 60 other languages.
Tesseract.js lets your code get the words out of scanned documents and other images.
<!-- ![alt text]( "Logo Title Text 1") --> <!-- ![alt text]( "Logo Title Text 1") -->
# Installation # Installation
@ -12,14 +14,11 @@ Tesseract.js works with a `<script>` tag via local copy or cdn, or with `npm` (i
## Script Tag ## Script Tag
### CDN ### CDN
```html ```html
<script src='https://cdn.rawgit.com/naptha/tesseract.js/5ed4c0bc/dist/tesseract.js'></script> <script src='https://cdn.rawgit.com/naptha/tesseract.js/a01d2a2/dist/tesseract.js'></script>
<script> <script>
var worker = createTesseractWorker('https://cdn.rawgit.com/naptha/tesseract.js/5ed4c0bc/dist/tesseract.worker.js') Tesseract.recognize('#my-image')
worker.recognize('#my-image')
.progress(function (p) { console.log('progress', p) }) .progress(function (p) { console.log('progress', p) })
.then(function (result) { console.log('result', result) }) .then(function (result) { console.log('result', result) })
</script> </script>
@ -27,16 +26,16 @@ worker.recognize('#my-image')
### Local ### Local
First grab copies of `tesseract.js` and `tesseract.worker.js` from the [dist folder](https://github.com/naptha/tesseract.js/tree/master/dist). Then include `tesseract.js` on your page like this: First grab copies of `tesseract.js` and `tesseract.worker.js` from the [dist folder](https://github.com/naptha/tesseract.js/tree/master/dist). Then include `tesseract.js` on your page, and set `Tesseract.workerUrl` like this:
```html ```html
<script src='/path/to/tesseract.js'></script> <script src='/path/to/tesseract.js'></script>
<script> <script>
var worker = createTesseractWorker('/path/to/tesseract.worker.js') Tesseract.workerUrl = 'http://www.absolute-path-to/tesseract.worker.js'
worker.recognize('#my-image') Tesseract.recognize('#my-image')
.progress(function (p) { console.log('progress', p) }) .progress(function (p) { console.log('progress', p) })
.then(function (result) { console.log('result', result) }) .then(function (result) { console.log('result', result) })
</script> </script>
@ -51,30 +50,45 @@ worker.recognize('#my-image')
```--> ```-->
# Docs # Docs
## Tesseract.recognize(image) -> [TesseractJob](#tesseractjob)
Returns a TesseractJob whose `then` method can be used to act on the result of the OCR. ## ImageLike
The main Tesseract.js functions take an `image` parameter, which should be something that is 'image-like'.
For example: That means `image` should be
- an `img` element or querySelector that matches an `img` element
`image` can be - a `video` element or querySelector that matches a `video` element
- an `img` element or querySelector that matches an `img` element - a `canvas` element or querySelector that matches a `canvas` element
- a `video` element or querySelector that matches a `video` element - a CanvasRenderingContext2D (returned by `canvas.getContext('2d')`)
- a `canvas` element or querySelector that matches a `canvas` element - the absolute `url` of an image from the same website that is running your script. Browser security policies don't allow access to the content of images from other websites :(
- a CanvasRenderingContext2D (returned by `canvas.getContext('2d')`)
- the absolute `url` of an image from the same website that is running your script. Browser security policies don't allow access to the content of images from other websites :(
- ## Tesseract.recognize(image: [ImageLike](#imagelike)[, options]) -> [TesseractJob](#tesseractjob)
Figures out what words are in the image, where the words are, etc.
## Tesseract.detect(image) -> [TesseractJob](#tesseractjob) - `image` should be an [ImageLike](#imagelike) object.
Returns a TesseractJob whose `then` method can be used to act on the result of the OCR. - `options` is an optional parameter with tesseract specific keys
+ hi
For example: Returns a [TesseractJob](#tesseractjob) whose `then` method can be used to act on the result.
`image` can be Example:
- an `img` element or querySelector that matches an `img` element ```javascript
- a `video` element or querySelector that matches a `video` element Tesseract.recognize('#my-image')
- a `canvas` element or querySelector that matches a `canvas` element .then(function(result){
- a CanvasRenderingContext2D (returned by `canvas.getContext('2d')`) console.log(result)
- the absolute `url` of an image from the same website that is running your script. Browser security policies don't allow access to the content of images from other websites :( })
```
## Tesseract.detect(image: [ImageLike](#imagelike)) -> [TesseractJob](#tesseractjob)
Figures out what script (e.g. 'Latin', 'Chinese') the words in the image are written in.
`image` should be an [ImageLike](#imagelike) object.
Returns a [TesseractJob](#tesseractjob) whose `then` method can be used to act on the result of the script.
```javascript
Tesseract.detect('#my-image')
.then(function(result){
console.log(result)
})
```
## TesseractJob ## TesseractJob
A TesseractJob is an object returned by a call to recognize or detect. A TesseractJob is an object returned by a call to recognize or detect.
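
The ImageLike list in this diff says a `url` must point to an image on the same website as the running script. As a rough illustration of that restriction, the sketch below compares origins using the standard `URL` class. `isSameOrigin` is a hypothetical helper, not part of Tesseract.js, and the example URLs are made up.

```javascript
// Hypothetical helper (an assumption, not a Tesseract.js function):
// returns true when imageUrl and pageUrl share scheme, host, and port.
function isSameOrigin(imageUrl, pageUrl) {
  // Resolve imageUrl against pageUrl so relative paths are handled too.
  var image = new URL(imageUrl, pageUrl)
  var page = new URL(pageUrl)
  return image.origin === page.origin
}

// Made-up URLs for illustration only.
console.log(isSameOrigin('https://example.com/img/scan.png', 'https://example.com/index.html'))
console.log(isSameOrigin('https://other.example.org/scan.png', 'https://example.com/index.html'))
```

In a browser, one might call `isSameOrigin(someUrl, window.location.href)` before passing `someUrl` to `Tesseract.recognize`, so that a cross-site image is caught early.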
