$.parse(inputString[, config])
Returns a parse results object.
With default settings:
var results = $.parse(csvString);
With custom config:
var results = $.parse(csvString, {
    delimiter: "\t",
    header: false,
    dynamicTyping: false,
    preview: 10,
    step: function(data, file, inputElem) {
        console.log(data.results);
    }
});
By defining a step callback, the input will be streamed, and step will be executed after each row is parsed.
$(selector).parse(settings)
Where selector selects file input elements and settings is an object as described below.
You can parse one or more files from one or more <input type="file">
elements like so, where each property is optional:
$('input[type=file]').parse({
    config: {
        // base config to use for each file
    },
    before: function(file, inputElem)
    {
        // executed before parsing each file begins;
        // what you return here controls the flow
    },
    error: function(err, file, inputElem)
    {
        // executed if an error occurs while loading the file,
        // or if the file being iterated is the wrong type,
        // or if the input element has no files selected
    },
    complete: function(results, file, inputElem, event)
    {
        // executed when parsing each file completes;
        // this function receives the parse results
    }
});
config
should be a config object as described below.
before
is an optional callback that lets you inspect each file before parsing begins. Return:
"skip" to skip parsing just that file
false to abort parsing this and all other files in the queue
undefined to continue without any changes
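For instance, a before callback could filter the queue by file extension. A minimal sketch (the .csv check is only an illustration):
$('input[type=file]').parse({
    before: function(file, inputElem) {
        // Skip anything that doesn't look like a CSV file,
        // but keep parsing the rest of the queue.
        if (!/\.csv$/i.test(file.name))
            return "skip";
        // Returning nothing (undefined) continues normally.
    }
});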
error
is executed when there is a problem getting the file ready to parse. (Parse errors are not reported here.) It receives an object that implements the DOMError interface, the File object at hand, and the <input> element from which the file was selected. Errors can occur before reading the file if, for example, the file is the wrong type or the input element has no files selected.
complete
is invoked when parsing a file completes. It receives the results of the parse (including parse errors), the File object, the <input> element from which the file was chosen, and the FileReader-generated event.
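For example, the error and complete callbacks might be wired up like this (a sketch; the log messages are illustrative):
$('input[type=file]').parse({
    error: function(err, file, inputElem) {
        // err implements the DOMError interface
        console.log("Could not read " + (file ? file.name : "file") + ": " + err.name);
    },
    complete: function(results, file, inputElem, event) {
        // results includes parse errors, keyed by row
        console.log("Finished " + file.name + " with " + results.errors.length + " error(s)");
    }
});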
Use a config object to specify the parser's behavior.
Any time you invoke the parser, you may customize its behavior using a "config" object. It supports these properties:
delimiter
The delimiting character. Leave blank to auto-detect. If you specify a delimiter, it must be a string of length 1, and it cannot be \n, \r, or ".
header
If true, the first row of parsed data will be interpreted as column titles (fields). Fields are returned separately from the rows, and each data point will be keyed to its field name. If false, the parser simply returns an array of arrays, including the first row.
dynamicTyping
If true, fields that are only numeric will be converted to a number type. If false, each parsed datum is returned as a string.
preview
If preview > 0, only that many rows will be parsed.
step
To use a stream, define a callback function here which receives the data, row-by-row, as each row is parsed. If parsing a file, step also receives the source file and file input element. Return false
to abort the process.
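For instance, a step callback can cut a long parse short once it has seen enough rows. A minimal sketch (the 1,000-row cutoff is arbitrary):
var rowCount = 0;

$.parse(csvString, {
    step: function(data) {
        rowCount++; // data holds the freshly parsed row
        if (rowCount >= 1000)
            return false; // aborts the rest of the parse
    }
});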
If not otherwise specified, the default config is:
{
    delimiter: "",
    header: true,
    dynamicTyping: true,
    preview: 0,
    step: undefined
}
Parse output is always an object like this:
{
    results: // parse results
    errors:  // parse errors, keyed by row
}
results
will be an array of arrays if the header row is disabled, or an object containing fields and rows (where each row is keyed by field name) if the header row is enabled.
With default config (header row and dynamic typing enabled):
{
    "results": {
        "fields": ["Item", "SKU", "Cost", "Quantity"],
        "rows": [
            { "Item": "Book",  "SKU": "ABC1234", "Cost": 10.95, "Quantity": 4 },
            { "Item": "Movie", "SKU": "DEF5678", "Cost": 29.99, "Quantity": 3 }
        ]
    },
    "errors": {
        "length": 0
    }
}
With header row and dynamic typing disabled:
{
    "results": [
        ["Item", "SKU", "Cost", "Quantity"],
        ["Book", "ABC1234", "10.95", "4"],
        ["Movie", "DEF5678", "29.99", "3"]
    ],
    "errors": {
        "length": 0
    }
}
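To make the difference concrete, here's how you would read the same value from each shape (a sketch, assuming csvString holds the sample data above):
var withHeader = $.parse(csvString); // default config
console.log(withHeader.results.rows[1].Cost); // 29.99 (a Number)

var plain = $.parse(csvString, { header: false, dynamicTyping: false });
console.log(plain.results[2][2]); // "29.99" (a String; row 0 is the header row)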
Parse errors are returned in this format, keyed by the row number, alongside the "length" property (shown above), which is included for convenience:
{
    type: "",    // A generalization of the error
    code: "",    // Standardized error code
    message: "", // Human-readable details
    line: 0,     // Line of original input
    row: 0,      // Row index of parsed data where the error occurred
    index: 0     // Character index within original input
}
type
will be one of "Abort", "Quotes", "Delimiter", or "FieldMismatch".
code
is a standardized string that identifies the specific error.
index
will be the character index across the entire input where the error occurred; it is not the index of the offending character on that line.
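Since errors is keyed by row number alongside its length property, you can walk it like this (a sketch, assuming one error object per offending row):
var output = $.parse(csvString);

for (var row in output.errors) {
    if (row === "length")
        continue; // skip the convenience property
    var err = output.errors[row];
    console.log("Row " + row + " (line " + err.line + "): " + err.message);
}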
Papa can load and parse very large files by using streams.
By defining a step callback function, you're able to receive parsed results, row-by-row, as the data is read. This dramatically reduces memory usage and helps prevent the browser from crashing.
A stream is a unique data structure which, given infinite time, gives you infinite space. So if you're short on memory (as client computers often are), use a stream.
There is a trade-off, though. Typically, when we gain speed, we pay with space, and the opposite is true, too. Streaming uses significantly less memory with large inputs, but since the reading happens in chunks and results are processed at each row instead of at the very end, it can be slower.
But consider the alternative: upload the file to a remote server, open and process it there using a (hopefully) fast and accurate parser, then compress it and have the client download the results. How long does it take you to upload a 500 MB or 1 GB file? Then consider that the server still has to open the file and read its contents, which is what the client could have done minutes ago. The server might parse it faster with natively-compiled binaries, but only if its resources are dedicated to the task and it isn't already parsing files for many other users.
So unless your clients have a fiber line and you have a scalable cloud application, local parsing by streaming is nearly guaranteed to be faster.
Here is a simple example of a step function:
$('input[type=file]').parse({
    config: {
        step: function(data, file, inputElem) {
            console.log("Row data:", data.results);
            console.log("Row errors:", data.errors);
        }
    },
    complete: function() {
        console.log("All done!");
    }
});
Notice that the step function receives the data row-by-row; data has the same structure as the output described above.
When streaming, you don't get a complete results object at the end, unless you assemble it manually. And really, don't do that: it defeats the purpose of using a stream. Just take the bits you need as they come through.
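For example, instead of collecting every row, tabulate what you need as rows arrive. A sketch using the Cost field from the sample data above (and assuming the per-row data mirrors the overall output shape, as noted):
var totalCost = 0;

$('input[type=file]').parse({
    config: {
        step: function(data) {
            // With a header row, each parsed row is keyed by field name.
            var row = data.results.rows[0];
            totalCost += row.Cost || 0; // "Cost" is from the sample data
        }
    },
    complete: function() {
        console.log("Total cost:", totalCost);
    }
});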
In some very unscientific testing (on the fastest 2013 MacBook Pro), we were able to load files of about 250 MB for parsing in Chrome without crashing the tab. Beyond that, Chrome started to choke. Actual performance may vary widely. But keep in mind that file size may not be the only factor in choosing to stream.
Getting parsed results one row at a time is usually less convenient to work with; it's hard to see the big picture. (But the big picture might be really big.) As results stream in, you can tabulate stats or keep track of whatever you need to, but you wouldn't want to reassemble all the data...
The space required by the parsed results is often much larger than that of the original input file. The convenience of JavaScript objects afforded by 64-bit pointers (to make each value quickly accessible) takes up a lot more space than globbing it all together like a file does (at the cost of accessibility). In other words, the output may not fit in memory even if the input does.
You can stream string input, too, though that's often not necessary: input that comfortably fits in a textarea is usually small enough that it doesn't need to be streamed.
Papa uses HTML5's FileReader API to load files, which uses a stream to read in the data. FileReader doesn't technically allow us to hook into the underlying stream (other than providing occasional progress reports), but it does let us load the file in chunks/blobs. Don't worry about that, though: if you want to stream, you'll still get results, row-by-row, in your step function.
Papa can convert numeric values to true numbers for you
By default, parsed values are returned as strings. Dynamic typing is a feature built into Papa that converts numeric values to a Number type. When dynamic typing is enabled, parsed values that resemble a number will be converted to one.
If you're performing mathematical operations on the data, it will be very helpful. (You'd probably rather add two numbers than concatenate them, right?)
The cost is performance, as usual. Each parsed value is matched against a regular expression to determine whether it is numeric. You probably won't notice the degraded performance except with very large inputs, and even then it may not be significantly slower in many cases. But if you absolutely need the best performance possible, turn off dynamic typing (and the header row).
Papa can convert numbers like: 1, -2, 1.23, -4.56, .123, 1., 2., 1.23e4, 5.67E+7, -1.23e4, 5.67e-7, etc.
If, for some reason, a value is padded by whitespace, the padding will be ignored. Within the actual data, however, whitespace is significant. For example, floats represented using scientific notation should not have spaces around the "e" character.
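To see the effect, compare the same input parsed with and without dynamic typing (a sketch; the output shape follows the examples above):
var input = "Cost\n10.95";

var typed = $.parse(input, { dynamicTyping: true });
console.log(typed.results.rows[0].Cost);   // 10.95 (a Number)

var untyped = $.parse(input, { dynamicTyping: false });
console.log(untyped.results.rows[0].Cost); // "10.95" (a String)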
Help make Papa better
Please, feel free to fork Papa on GitHub and submit a pull request. Remember, the Parser component is under test, so if you're making changes to the actual parsing mechanisms, be sure to add a test case to validate your change.
You can open an issue on GitHub to ask questions or start discussion, or you can hashtag #PapaParse on Twitter.