
Bringing all files up to 2.0; removed index.html in favor of GitHub pages site

pull/17/head 2.0.0
Matthew Holt 11 years ago
parent
commit
087b42ca0e
  1. 390
      README.md
  2. 214
      index.html
  3. 6
      jquery.parse.min.js
  4. 2
      tests.html

390
README.md

@@ -1,12 +1,13 @@
Parse (jquery.parse) Plugin
===========================
Parse CSV with Javascript
========================================
The jQuery Parse plugin is a robust and efficient CSV (character-separated values) parser with these features:
Papa Parse (formerly the jQuery Parse Plugin) is a robust and powerful CSV (character-separated values) parser with these features:
- Parses delimited text strings without any fuss
- Attach to `<input type="file">` elements to load and parse files from disk
- Automatically detects delimiter (or specify a delimiter yourself)
- Header row support
- Supports streaming large inputs
- Utilize the header row, if present
- Gracefully handles malformed data
- Optional dynamic typing so that numeric data is parsed as numbers
- Descriptive and contextual errors
@@ -16,380 +17,16 @@ The jQuery Parse plugin is a robust and efficient CSV (character-separated value
Demo
----
**[jsFIDDLE DEMO](http://jsfiddle.net/mholt/nCaee/)**
Or download the repository and open `index.html` in your browser.
Visit **[PapaParse.com](http://papaparse.com/#demo)** to give Papa a whirl!
Get Started
-----------
For production: [jquery.parse.min.js](https://github.com/mholt/jquery.parse/blob/master/jquery.parse.min.js)
For debug/dev: [jquery.parse.js](https://github.com/mholt/jquery.parse/blob/master/jquery.parse.js)
### Config object
Any time you invoke the parser, you may customize it using a "config" object. It supports these properties:
| Option | Default | Description
|-------------------- | ------- | ---------------
| **`delimiter`** | ` ` | The delimiting character. Leave blank to auto-detect. If you specify a delimiter, it must be a string of length 1, and cannot be `\n`, `\r`, or `"`.
| **`header`** | `true` | If true, interpret the first row of parsed data as column titles; fields are returned separately from the data, and data will be returned keyed to its field name. Duplicate field names would be problematic. If false, the parser simply returns an array (list) of arrays (rows), including the first row.
| **`dynamicTyping`** | `true` | If true, fields that are only numeric will be converted to a number type. If false, each parsed datum is returned as a string.
| **`preview`** | `0` | If preview > 0, only that many rows will be parsed.
### Parsing strings
To parse a delimited text string with default settings, simply do:
```javascript
var results = $.parse(csvString);
```
Or to customize the settings, pass in a config object with any properties you wish to change:
```javascript
var results = $.parse(csvString, {
delimiter: "\t",
header: false,
dynamicTyping: false,
preview: 10
});
```
### Parsing files
You can parse multiple files from multiple `<input type="file">` elements like so, where each property is optional:
```javascript
$('input[type=file]').parse({
config: {
// base settings to use for each file
},
before: function(file, inputElem)
{
// executed before parsing each file begins;
// see documentation for how return values
// affect the behavior of the plugin
},
error: function(err, file, inputElem)
{
// executed if an error occurs during loading the file,
// or if the file being iterated is the wrong type,
// or if the input element has no files selected
},
complete: function(results, file, inputElem, event)
{
// executed when parsing each file completes;
// this function receives the parse results
}
});
```
In order to be parsed, a file must have "text" in its MIME type.
#### Callbacks
As indicated above, there are callbacks you can use when parsing files.
##### `before(file, inputElem)`
If the next file in the queue is found to be some sort of "text" MIME type, this callback will be executed immediately before setting up the FileReader, loading the file, and parsing it. It receives the file object and the `<input>` element so you can inspect the file to be parsed.
You can change what happens next depending on what you return:
- Return `"skip"` to skip parsing this file.
- Return `false` to abort parsing this and all other files in the queue.
- Return a config object to alter the options for parsing this file only.
Returning anything else, including `undefined`, continues without any changes.
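The return-value rules above can be sketched as a small dispatcher. This is only an illustration, not the plugin's actual code: `interpretBeforeResult` and the `action` labels are hypothetical names, and merging the returned object over the base config is an assumption about how "alter the options for this file only" might work.

```javascript
// Hypothetical helper (not part of the plugin's API): maps the value
// returned by a `before` callback to what happens next for that file.
function interpretBeforeResult(returned, baseConfig) {
  if (returned === "skip") {
    // Skip this file only; the rest of the queue continues
    return { action: "skip" };
  }
  if (returned === false) {
    // Abort this file and every remaining file in the queue
    return { action: "abort" };
  }
  if (typeof returned === "object" && returned !== null) {
    // Assumption: the returned object overrides base settings for this file only
    return { action: "parse", config: Object.assign({}, baseConfig, returned) };
  }
  // Anything else, including undefined, continues without changes
  return { action: "parse", config: baseConfig };
}
```

Copying the base config before merging keeps one file's overrides from leaking into the next file in the queue.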
##### `error(err, file, inputElem)`
Invoked if there is an error loading the file. It receives an object that implements the [`DOMError`](https://developer.mozilla.org/en-US/docs/Web/API/DOMError) interface (i.e. call `err.name` to get the error), the file object at hand, and the `<input>` element from which the file was selected.
Errors can occur before reading the file if:
- the HTML element has no files chosen
- a file chosen is not a "text" type (e.g. "text/csv" or "text/plain")
- a user-defined callback function (`before`) aborted the process
Otherwise, errors are invoked by FileReader when opening the file. *Parse errors are not reported here, but are reported in the results later on.*
Use [jquery.parse.min.js](https://github.com/mholt/jquery.parse/blob/master/jquery.parse.min.js) for production.
##### `complete(results, file, inputElem, event)`
Invoked when parsing a file completes. It receives the results of the parse (including errors), the file object, the `<input>` element from which the file was chosen, and the FileReader-generated event.
Output
------
Whether you're parsing strings or files, the results returned by the parser are the same since, under the hood, the FileReader loads a file as a string.
The results will always have this basic structure:
```javascript
{
results: // parse results
errors: // parse errors, keyed by row
}
```
If no delimiter is specified and a delimiter cannot be auto-detected, an error keyed by "config" will be produced, and a default delimiter will be chosen.
**Example input:**
Item,SKU,Cost,Quantity
Book,ABC1234,10.95,4
Movie,DEF5678,29.99,3
### Results if `header: true` and `dynamicTyping: true`
With a header row, each value is keyed to its field name, so the result is an object with `fields` and `rows`. The fields are an array of strings, and the rows are an array of objects:
```json
{
"results": {
"fields": [
"Item",
"SKU",
"Cost",
"Quantity"
],
"rows": [
{
"Item": "Book",
"SKU": "ABC1234",
"Cost": 10.95,
"Quantity": 4
},
{
"Item": "Movie",
"SKU": "DEF5678",
"Cost": 29.99,
"Quantity": 3
}
]
},
"errors": {
"length": 0
}
}
```
Notice how the numeric values were converted to numbers. That is what `dynamicTyping` does.
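As a rough sketch of what such a conversion could look like (not the parser's actual implementation), the snippet below converts a field only when the whole string is numeric. The `NUMERIC` pattern and the `toDynamicType` name are assumptions made for illustration; the real parser may accept other numeric forms.

```javascript
// Assumption: "only numeric" means an optional minus sign, digits,
// and an optional decimal part. The real parser's rule may differ.
var NUMERIC = /^-?\d+(\.\d+)?$/;

// Hypothetical helper: returns a number for purely numeric strings,
// and the original string for everything else.
function toDynamicType(value) {
  return NUMERIC.test(value) ? parseFloat(value) : value;
}

// "ABC1234" stays a string because it is not purely numeric
var typedRow = ["Book", "ABC1234", "10.95", "4"].map(toDynamicType);
```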
With a header row, the field count must be the same on each row, or a FieldMismatch error will be produced for that row. (Without a header row, lines can have a variable number of fields without errors.)
### Results if `header: false` and `dynamicTyping: false`
Without a header row, the result is an array (list) of arrays (rows).
```json
{
"results": [
[
"Item",
"SKU",
"Cost",
"Quantity"
],
[
"Book",
"ABC1234",
"10.95",
"4"
],
[
"Movie",
"DEF5678",
"29.99",
"3"
]
],
"errors": {
"length": 0
}
}
```
Notice how, since dynamic typing is disabled, the numeric values are strings.
If you are concerned about optimizing the performance of the parser, disable dynamic typing. That should speed things up by at least 2x.
Parse Errors
------------
Parse errors are returned alongside the results as an array of objects. Here is the structure of an error object:
```javascript
{
type: "", // Either "Quotes" or "FieldMismatch"
code: "", // Standardized error code like "UnexpectedQuotes"
message: "", // Human-readable error details
line: 0, // Line of original input
row: 0, // Row index of parsed data where error is
index: 0 // Character index within original input
}
```
Assuming the default settings, suppose the input is malformed:
Item,SKU,Cost,Quantity
Book,"ABC1234,10.95,4
Movie,DEF5678,29.99,3
Notice the stray quotes on the second line. This is the output:
```json
{
"results": {
"fields": [
"Item",
"SKU",
"Cost",
"Quantity"
],
"rows": [
{
"Item": "Book",
"SKU": "ABC1234,10.95,4\nMovie,DEF5678,29.99,3"
}
]
},
"errors": {
"0": [
{
"type": "FieldMismatch",
"code": "TooFewFields",
"message": "Too few fields: expected 4 fields but parsed 2",
"line": 2,
"row": 0,
"index": 66
},
{
"type": "Quotes",
"code": "MissingQuotes",
"message": "Unescaped or mismatched quotes",
"line": 2,
"row": 0,
"index": 66
}
],
"length": 2
}
}
```
If the header row is disabled, field counting does not occur because there is no need to key the data to the field name. Thus we only get a Quotes error:
```json
{
"results": [
[
"Item",
"SKU",
"Cost",
"Quantity"
],
[
"Book",
"ABC1234,10.95,4\nMovie,DEF5678,29.99,3"
]
],
"errors": {
"1": [
{
"type": "Quotes",
"code": "MissingQuotes",
"message": "Unescaped or mismatched quotes",
"line": 2,
"row": 1,
"index": 66
}
],
"length": 1
}
}
```
Suppose a field value with a delimiter is not escaped:
Item,SKU,Cost,Quantity
Book,ABC1234,10,95,4
Movie,DEF5678,29.99,3
Again, notice the second line, "10,95" instead of "10.95". This field *should* be quoted: `"10,95"` but the parser handles the problem gracefully:
```json
{
"results": {
"fields": [
"Item",
"SKU",
"Cost",
"Quantity"
],
"rows": [
{
"Item": "Book",
"SKU": "ABC1234",
"Cost": 10,
"Quantity": 95,
"__parsed_extra": [
"4"
]
},
{
"Item": "Movie",
"SKU": "DEF5678",
"Cost": 29.99,
"Quantity": 3
}
]
},
"errors": {
"0": [
{
"type": "FieldMismatch",
"code": "TooManyFields",
"message": "Too many fields: expected 4 fields but parsed 5",
"line": 2,
"row": 0,
"index": 43
}
],
"length": 1
}
}
```
Since files with headers are supposed to have the same number of fields per row, any extra fields are parsed into a special array field named "__parsed_extra" in the order that the remaining line was parsed.
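A minimal sketch of that keying step, assuming the row has already been split into values: `keyRow` is a hypothetical name rather than a function in the plugin, and the values are left as strings (no dynamic typing) to keep the example short.

```javascript
// Hypothetical helper: keys each value to its field name and collects
// any values beyond the last field under "__parsed_extra", in order.
function keyRow(fields, values) {
  var row = {};
  for (var i = 0; i < values.length; i++) {
    if (i < fields.length) {
      row[fields[i]] = values[i];
    } else {
      if (!row.__parsed_extra) {
        row.__parsed_extra = [];
      }
      row.__parsed_extra.push(values[i]);
    }
  }
  return row;
}

var fields = ["Item", "SKU", "Cost", "Quantity"];
// Five values against four fields: the fifth lands in __parsed_extra
var keyed = keyRow(fields, ["Book", "ABC1234", "10", "95", "4"]);
```

A row with too few values simply ends up with fewer keys, which matches the shape of the two-field row in the stray-quote example.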
For usage instructions, see the [homepage](http://papaparse.com) and, for more detail, the [documentation](http://papaparse.com/docs.html).
@@ -400,14 +37,7 @@ The Parser component is under test. Download this repository and open `tests.htm
The Parser function
-------------------
Inside this jQuery plugin is a `Parser` function that performs the parsing of delimited text. It does not depend upon jQuery. This plugin uses jQuery to attach to `<input type="file">` elements and to make it more convenient to activate and use the parsing mechanism.
Contributing
-------------------
------------
Please feel free to chip in! If you'd like to see a feature or fix, pull down the code and submit a pull request. But remember, if you're changing anything in the Parser function, a pull request, *with test*, is best. (All changes to the parser component should be validated with tests.)
If you'd like to see a feature or bug fix, pull down the code and submit a pull request. But remember, if you're changing anything in the Parser function, a pull request, *with test*, is best. (All changes to the parser component should be validated with tests.) You may also open issues for discussion or join in on Twitter with [#PapaParse](https://twitter.com/search?q=%23PapaParse&src=typd&f=realtime)

214
index.html

@@ -1,214 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<title>Parse jQuery Plugin</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
<script src="jquery.parse.js"></script>
<style>
body {
font-family: sans-serif;
}
textarea,
#delim {
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
font: 14px/1.5em 'Monaco', monospace;
outline: none;
}
textarea {
width: 100%;
padding: 10px;
height: 260px;
}
#delim {
width: 80px;
}
#tabdelim {
font-size: 12px;
}
.container {
width: 100%;
}
.text-center {
text-align: center;
}
code {
white-space: pre;
font: 12px/1.25em 'Monaco', monospace;
background: #EEE;
display: block;
padding: 5px;
}
label {
white-space: nowrap;
}
button {
font-size: 18px;
padding: 10px 40px;
}
</style>
</head>
<body>
<div class="container">
<div class="text-center">
Delimiter: <input type="text" id="delim" value="" maxlength="1" placeholder="auto"> <a href="javascript:" id="tabdelim">(Tab)</a>
<br>
<label><input type="checkbox" id="header" checked> Header row</label>
&nbsp; &nbsp;
<label><input type="checkbox" id="dyntype" checked> Dynamic typing</label>
&nbsp; &nbsp;
<label><input type="checkbox" id="stream"> Stream results</label>
</div>
<br>
<textarea id="tb" placeholder="CSV input">Address,City,State,Zipcode,Name,Phone Number,Group,URL
1 Crossgates Mall Road,Albany,NY,12203,Apple Store Cross Gates,(518) 869-3192,"Example ""Group"" 1",http://www.apple.com/retail/crossgates/
Duke Rd & Walden Ave,Buffalo,NY,14225,Apple Store Walden Galleria,(716) 685-2762,Example Group 2,http://www.apple.com/retail/walden/
630 Old Country Rd.,Garden City,NY,11530,Apple Store Roosevelt Field,(516) 248-3347,Example Group 3,http://www.apple.com/retail/rooseveltfield/
160 Walt Whitman Rd.,Huntington Station,NY,11746,Apple Store Walt Whitman,(631) 425-1563,Example Group 3,http://www.apple.com/retail/waltwhitman/
9553 Carousel Center Drive,Syracuse,NY,13290,Apple Store Carousel,(315) 422-8484,Example Group 2,http://www.apple.com/retail/carousel/
2655 Richmond Ave,Staten Island,NY,10314,Apple Store Staten Island,(718) 477-4180,Example Group 1,http://www.apple.com/retail/statenisland/
7979 Victor Road,Victor,NY,14564,Apple Store Eastview,(585) 421-3030,Example Group 1,http://www.apple.com/retail/eastview/
1591 Palisades Center Drive,West Nyack,NY,10994,Apple Store Palisades,(845) 353-6756,Example Group 2,http://www.apple.com/retail/palisades/
125 Westchester Ave.,White Plains,NY,10601,Apple Store The Westchester,(914) 428-1877,Example Group 3,http://www.apple.com/retail/thewestchester/
103 Prince Street,New York,NY,10012,Apple Store SoHo,(212) 226-3126,Example Group 2,http://www.apple.com/retail/soho/</textarea>
<br>
<div class="text-center">
<button id="parseText">Parse Text</button>
<br><hr>
or
<br><hr>
<input type="file" id="fileinput1" multiple>
<input type="file" id="fileinput2" multiple>
<br><br>
<button id="parseFiles">Parse File(s)</button>
</div>
<br><br>
<code id="output"></code>
</div>
<script>
$(function()
{
var rowCount = 0, queued = 0;
var bigParse = 5243000; // about 5 MB
var bigRender = 1024 * 10; // 10 KB
// Note: when streaming from a large file, I'm able to get about 1 million rows every 1.8s
$('#parseText').click(function()
{
rowCount = 0;
$('#parseFiles').prop('disabled', true);
if (is('stream'))
console.log("Now parsing input (not showing progress for massive performance boost)...");
// TODO: Build in some performance logging?
var start = performance.now();
var results = $.parse($('#tb').val(), userConfig());
var end = performance.now();
console.log(Math.round(end - start) + " ms to parse input text");
if (!is('stream'))
render(results);
else
{
console.log("Rows parsed:", rowCount);
render({"message": "Results were streamed and were not aggregated in order to save memory. See console for row count."});
}
$('#parseFiles').prop('disabled', false);
});
$('#parseFiles').click(function()
{
rowCount = 0, queued = 0;
var start = performance.now();
$('#fileinput1, #fileinput2').parse(
{
before: function(file, inputElem)
{
console.log("BEFORE", file, inputElem);
if (file.size && file.size > bigParse && !is('stream'))
{
if (!confirm("WARNING - " + file.name + " is a large file, but you chose not to stream the results. This could make your browser tab lock up. Continue?"))
return false;
}
queued++;
$('#parseFiles').prop('disabled', true);
if (is('stream'))
console.log("File is being parsed and the results are being streamed... (not showing progress for massive performance boost)");
},
error: function(err, file, elem)
{
console.log("ERROR", err, file, elem);
if (err.name != "NoFileError")
queued--;
if (queued == 0 || err.name == "AbortError")
$('#parseFiles').prop('disabled', false);
},
complete: function(data, file, inputElem, event)
{
var end = performance.now();
console.log("COMPLETE", file.size < bigRender ? data : "(too big to render data here or file was streamed)", file, inputElem, event);
console.log(Math.round(end - start) + " ms to parse file");
queued--;
if (file.size && file.size < bigRender && !is('stream'))
render(data);
else
render({"message": "File was streamed or is too big to render here; open Developer Tools to see the console output instead"});
if (queued == 0)
$('#parseFiles').prop('disabled', false);
if (is('stream'))
console.log("Rows parsed:", rowCount);
},
config: userConfig()
});
});
$('#tabdelim').click(function()
{
$('#delim').val("\t");
});
function userConfig()
{
return {
delimiter: $("#delim").val(),
header: is('header'),
dynamicTyping: is('dyntype'),
stream: is('stream') ? function(results) { rowCount++; } : false
};
}
function is(checkboxId)
{
return $('#'+checkboxId).is(':checked');
}
function render(results)
{
$('#output').text(JSON.stringify(results, undefined, 2));
}
});
</script>
</body>
</html>

6
jquery.parse.min.js vendored

File diff suppressed because one or more lines are too long

2
tests.html

@@ -1,7 +1,7 @@
<!DOCTYPE html>
<html>
<head>
<title>jQuery Parse Plugin Tests</title>
<title>Parser Tests</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/2.0.3/jquery.min.js"></script>
<script src="jquery.parse.js"></script>
<script src="tests.js"></script>
