Thanks to all <a href="https://github.com/mholt/jquery.parse/graphs/contributors">contributors</a>!
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><a href="https://github.com/mholt/PapaParse">Star</a> and <a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
</div>-->
<div class="footer-main">
<div class="grid-container">
<div class="grid-40 text-center">
<div class="logo">P</div>
<br><br>
Papa Parse by <a href="https://twitter.com/mholt6">Matt Holt</a>
There are a thousand CSV libraries for JavaScript. Papa is different. It's written with correctness and performance in mind. Papa is the first (and so far only) multi-threaded CSV parser that runs on web pages. It can parse files gigabytes in size without crashing the browser. It correctly handles malformed or edge-case CSV text. It can parse files on the local file system or download them over the Internet. Papa is boss.
</p>
@@ -68,38 +72,43 @@
As of version 4, Papa Parse is the <a href="http://jsperf.com/javascript-csv-parsers/4">fastest CSV parser</a> for the browser, whereas it used to be the slowest.
</p>
<h4 id="nodejs">Can I use Papa Parse server-side with Node.js?</h4>
<h6 id="nodejs">Can I use Papa Parse server-side with Node.js?</h6>
<p>
There's a fork of Papa called <a href="https://github.com/Rich-Harris/BabyParse" target="_blank">Baby Parse</a> which is <a href="https://www.npmjs.org/package/babyparse">published on npm</a>. Some features are unavailable (like worker threads and file opening/downloading), but the core parser is functional.
</p>
<h4 id="dependencies">Does Papa Parse have any dependencies?</h4>
<h6 id="dependencies">Does Papa Parse have any dependencies?</h6>
<p>
No. Papa Parse has no dependencies. If jQuery is present, however, it plugs in to make it easier to select files from the DOM.
</p>
<h4 id="browsers">Which browsers is it compatible with?</h4>
<h6 id="browsers">Which browsers is it compatible with?</h6>
<p>
All modern, competent browsers should support all of the features. However, as usual, use IE at your own risk. It looks like IE 10+ and Safari 6+ should support all the features. Firefox and Chrome should work with all features back to versions 3 and 4. Opera 11 and up should be fine. If you really need to use Papa in old IE or Opera, then keep the fancy features off and you may be in luck.
</p>
<h4 id="async">Can Papa Parse be loaded asynchronously (after the page loads)?</h4>
<h6 id="async">Can Papa Parse be loaded asynchronously (after the page loads)?</h6>
<p>
Not without some minor modifications. When Papa Parse loads, it has to obtain the script's relative path in order to facilitate worker threads. If you don't need this feature, Papa can be loaded asynchronously by removing or commenting out a couple of lines. If you do need workers, you can just hardcode the script's path. See <a href="https://github.com/mholt/PapaParse/issues/69#issuecomment-49886575">issue 69</a> and <a href="https://github.com/mholt/PapaParse/issues/87">issue 87</a> for more information.
Yes. But if you want to use Web Workers, you'll need to specify the relative path to Papa Parse. To do this, set <a href="/docs#readonly">Papa.SCRIPT_PATH</a> to the relative path of the Papa Parse file. In synchronous loading, this is automatically detected.
</p>
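<p>
A minimal sketch of that setup, assuming Papa Parse has been loaded asynchronously. The path <code>/js/papaparse.min.js</code> and the helper name <code>enableWorkers</code> are made up for illustration; use wherever your copy of the file actually lives.
</p>

```javascript
// Hypothetical location of the Papa Parse file, relative to the page.
var PAPA_PATH = "/js/papaparse.min.js";

// Call this once the asynchronously loaded script has defined Papa.
// The Papa object is passed in so the sketch does not rely on a global.
function enableWorkers(papa) {
  // Tell Papa where its own file is, so it can start a worker from it.
  papa.SCRIPT_PATH = PAPA_PATH;
  return papa;
}

// In a page, after the script has loaded: enableWorkers(window.Papa);
```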
<h6 id="combine">Can I build Papa Parse into the same file with other JS dependencies?</h6>
<p>
Yes, but then don't use the Web Worker feature unless your other dependencies are battle-hardened for worker threads.
</p>
<h4 id="open-source">Is it open source? (Can I contribute something?)</h4>
<h6 id="open-source">Is it open source? (Can I contribute something?)</h6>
<p>
Yes, please! I don't want to do this all by myself. Head over to the <a href="https://github.com/mholt/PapaParse">GitHub project page</a> and hack away. If you're making a significant change, open an issue first so we can talk about it.
</p>
<h4 id="fast-mode">Why wouldn't I always enable fast mode?</h4>
<h6 id="fast-mode">Why wouldn't I always enable fast mode?</h6>
<p>
Fast mode makes <a href="http://jsperf.com/javascript-csv-parsers/3">Papa Parse screaming fast</a>, but you wouldn't want to use it if there are (or may be) quoted fields in your input. Fast mode is fast because it makes one major assumption: no quoted fields. But if you know that your input has no quotes, turn that sucker on. With fast mode on, 1 GB files can be parsed in about 20 seconds.
</p>
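<p>
To see why the "no quoted fields" assumption matters, here is a small illustration. It is not Papa's code; <code>splitNaive</code> and <code>splitQuoted</code> are hypothetical helpers that compare a plain split with a quote-aware split of a single line.
</p>

```javascript
// Naive approach: only correct when no field is quoted.
function splitNaive(line) {
  return line.split(",");
}

// Quote-aware approach for one line (illustration only):
// commas inside double quotes are kept, and "" is an escaped quote.
function splitQuoted(line) {
  var fields = [];
  var current = "";
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // delimiter outside quotes ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

var line = 'Holt,"Provo, UT",2014';
var naive = splitNaive(line);   // 4 pieces: the quoted comma is treated as a delimiter
var quoted = splitQuoted(line); // 3 fields: the quoted comma stays inside its field
```

<p>
The plain split does far less work per character, which is the trade: it is faster, and it is wrong as soon as a quoted field shows up.
</p>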
@@ -112,28 +121,26 @@
<br><br>
<h4 id="streaming">Streaming</h4>
<hr>
<h3 id="streaming">Streaming</h3>
<h4>Can Papa load and parse huge files?</h4>
<h6>Can Papa load and parse huge files?</h6>
<p>
Yes. Parsing huge text files is facilitated by <i>streaming</i>, where the file is loaded a little bit at a time, parsed, and the results are sent to your <a href="/docs#config">step</a> callback function, row-by-row.
</p>
<h4>How do I stream my input?</h4>
<h6>How do I stream my input?</h6>
<p>
Just specify a <a href="/docs#config">step</a> callback function. Results will <i>not</i> be available after parsing is finished, however. You have to inspect the results one row at a time.
</p>
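<p>
A rough sketch of what that looks like. The helper name <code>countRows</code> is hypothetical, and the Papa object is passed in as a parameter (in a page it would be <code>window.Papa</code>) so the snippet does not depend on a global. It only keeps a running tally, since nothing is accumulated for you when streaming.
</p>

```javascript
// papa:   the Papa Parse object
// input:  a File or a CSV string
// onDone: called with the number of rows once parsing has finished
function countRows(papa, input, onDone) {
  var rowCount = 0;
  papa.parse(input, {
    step: function (results) {
      // Called once per parsed row; inspect results here, as they arrive.
      rowCount++;
    },
    complete: function () {
      // No accumulated results here when streaming, only our own tally.
      onDone(rowCount);
    }
  });
}
```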
<h4>What is a stream and when should I stream files?</h4>
<h6>What is a stream and when should I stream files?</h6>
<p>
A stream is a unique data structure which, given infinite time, gives you infinite space.
So if you're short on memory (as browsers often are), use a stream.
</p>
<h4>Wait, does that mean streaming takes more time?</h4>
<h6>Wait, does that mean streaming takes more time?</h6>
<p>
Yes and no. Typically, when we gain speed, we pay with space. The opposite is true, too. Streaming uses significantly less memory with large inputs, but since the reading happens in chunks and results are processed at each row instead of at the very end, yes, it can be slower.
</p>
@@ -144,26 +151,26 @@
So unless your clients have <a href="http://google.com/fiber">a fiber line</a> and you have a scalable cloud application, local parsing by streaming is nearly guaranteed to be faster.
</p>
<h4>How do I get all the results together after streaming?</h4>
<h6>How do I get all the results together after streaming?</h6>
<p>
You don't. Unless you assemble it manually. And really, don't do that... it defeats the purpose of using a stream. Just take the parts you need as they come through.
</p>
<h4>Does Papa use a true stream?</h4>
<h6>Does Papa use a true stream?</h6>
<p>
Papa uses HTML5's FileReader API which uses a stream. FileReader doesn't technically allow us to hook into the underlying stream, but it does let us load the file in pieces. But fortunately you don't have to worry about that; it's all taken care of for you. Just take the results one row at a time.
</p>
<h4>Can I stream files over a network or the Internet?</h4>
<h6>Can I stream files over a network or the Internet?</h6>
<p>
Yes, Papa Parse supports this. It will download a file in pieces using HTTP's standard Range header, then pass the parsed results to your step function just like a local file. However, these requests may not work cross-origin (different domain/hostname), depending on the server's configuration.
</p>
<p>
Streaming remote files also requires the Content-Range header in the server's response. Most production-ready servers support this header, but Python's SimpleHTTPServer does not. If you need a quick and easy server, <a href="https://github.com/rif/spark">spark</a> will do the trick. <code>./spark . -port=4100</code>
Streaming remote files also requires the Content-Range header in the server's response. Most production-ready servers support this header, but Python's SimpleHTTPServer does not. If you need a quick and easy server, <a href="https://github.com/rif/spark">spark</a> will do the trick: <code>$ ./spark</code>
</p>
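<p>
For a feel of what "download a file in pieces" means, here is a sketch of the Range header values a client could send for a file of known size. This is not Papa's internal code; <code>rangeHeaders</code> and the chunk size are assumptions for illustration. A server that supports ranges answers each request with the matching Content-Range header.
</p>

```javascript
// Build the Range header values needed to download totalBytes
// in pieces of at most chunkBytes each. Byte ranges are inclusive.
function rangeHeaders(totalBytes, chunkBytes) {
  var headers = [];
  for (var start = 0; start < totalBytes; start += chunkBytes) {
    var end = Math.min(start + chunkBytes, totalBytes) - 1;
    headers.push("bytes=" + start + "-" + end);
  }
  return headers;
}

var ranges = rangeHeaders(2500, 1024);
// ranges: ["bytes=0-1023", "bytes=1024-2047", "bytes=2048-2499"]
```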
<h4>Can I pause and resume parsing?</h4>
<h6>Can I pause and resume parsing?</h6>
<p>
Yes, as long as you are streaming and not using a worker. Your <a href="/docs#step">step callback</a> is passed a ParserHandle which has pause, resume, and abort functions.
</p>
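<p>
A sketch of pausing on each row while some slower work finishes. The names <code>processRowsOneAtATime</code> and <code>handleRow</code> are hypothetical, and <code>handleRow</code> is assumed to finish asynchronously (say, after saving the row somewhere) and then call the function it was given.
</p>

```javascript
// papa:      the Papa Parse object (window.Papa in a page)
// input:     a File or CSV string; streaming only, no worker
// handleRow: hypothetical function (results, next) that calls next() when done
// onDone:    called when parsing has finished
function processRowsOneAtATime(papa, input, handleRow, onDone) {
  papa.parse(input, {
    step: function (results, parser) {
      // parser is the handle with pause, resume, and abort.
      parser.pause();
      handleRow(results, function () {
        parser.resume(); // carry on with the next row
      });
    },
    complete: onDone
  });
}
```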
@@ -173,23 +180,20 @@
<br><br>
<h4 id="workers">Multi-Threading (Workers)</h4>
<hr>
<h3 id="workers">Multi-threading (Workers)</h3>
<h4>What is a web worker? Why use one?</h4>
<h6>What is a web worker? Why use one?</h6>
<p>
<a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker">HTML5 Web Workers</a> facilitate basic multi-threading in the browser. This means that a web page can spawn a new thread in the operating system that runs JavaScript code. This is highly beneficial for long-running scripts that would otherwise lock up the web page.
</p>
<h4>How do I use a worker?</h4>
<h6>How do I use a worker?</h6>
<p>
Just specify <code>worker: true</code> in your <a href="/docs#config">config</a>. You'll also need to make a <code>complete</code> callback (unless you're streaming) so that you can get the results, because using a worker makes the parse function asynchronous.
</p>
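<p>
Roughly, that could look like the sketch below. <code>parseInWorker</code> and <code>onRows</code> are hypothetical names, and the Papa object is passed in as a parameter; the parsed rows are assumed to arrive in the <code>data</code> property of the results.
</p>

```javascript
// papa:   the Papa Parse object (window.Papa in a page)
// input:  a File or CSV string
// onRows: called later with the parsed rows
function parseInWorker(papa, input, onRows) {
  papa.parse(input, {
    worker: true,
    complete: function (results) {
      // Runs once the worker has finished and handed the results back.
      onRows(results.data);
    }
  });
  // Execution continues here right away; the rows arrive later via onRows.
}
```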
<h4>When should I use a worker?</h4>
<h6>When should I use a worker?</h6>
<p>
That's up to you. The most typical reason to use a web worker is if your web page becomes unresponsive during parsing. In other words, if it freezes and you can't click things or the scrolling becomes choppy. If that happens, some browsers (like Firefox) will warn the user that a script has become unresponsive or is taking a long time (even if it's working properly). If this happens to you or some of your users, consider using a web worker, at least for the large inputs.
</p>
@@ -197,7 +201,7 @@
However, read the next answer for more info. Using workers has performance implications (both good and bad).
</p>
<h4>What are the performance implications of using a worker thread?</h4>
<h6>What are the performance implications of using a worker thread?</h6>
<p>
Using a worker will be a little slower. In JavaScript, threads don't share memory. That's really annoying because sharing memory is the primary reason for multi-threading. As such, all parse results in a worker thread need to be <i>copied</i> to the main thread. And if you're parsing a string in a worker thread, that string also needs to be copied into the worker in the first place. (Files will be opened or downloaded by the worker itself, so the input doesn't need to be copied from the main thread in those cases.)
</p>
@@ -209,26 +213,57 @@
</p>
<h4>Can I stream and use a worker at the same time?</h4>
<h6>Can I stream and use a worker at the same time?</h6>
<p>
Yup. If the input is too large to fit in memory (or large enough to crash the browser), streaming is <i>always</i> the answer, even in a worker thread. Workers keep the page reactive. Streaming makes it able to fit in memory. Use both if you need to.
Thanks to all <a href="https://github.com/mholt/jquery.parse/graphs/contributors">contributors</a>!
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><a href="https://github.com/mholt/PapaParse">Star</a> and <a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
</div>-->
<div class="footer-main">
<div class="grid-container">
<div class="grid-40 text-center">
<div class="logo">P</div>
<br><br>
Papa Parse by <a href="https://twitter.com/mholt6">Matt Holt</a>
That's okay. Papa will scan the first few rows of input to find the right delimiter for you. You can also set the delimiting character manually. Either way, the delimiter used is returned with every result set.
<div class="grid-33">
<p class="lover">
<a href="https://smartystreets.com">SmartyStreets</a> verifies addresses, many of which are in CSV files. Papa Parse can process huge files in the browser. <i>"We rapidly built an awesome client-side file processor with Papa Parse."</i>
<a href="http://jannah.github.io/MetaReader/">MetaReader</a> helps you see your data from a meta level before you start detailed analysis. <i>"Papa Parse made it very easy to load and ready user CSV files in the browser on the client side."</i>
</p>
</div>
<div class="grid-33">
<p class="lover">
<a href="http://jannah.github.io/MetaReader/">EpiML</a> is an agent-based mathematical model for the web, still in its early stages of development. <i>"Papa makes it so easy to use CSV, which is good for scientists."</i>
</p>
</div>
<div class="clear"></div>
<hr>
<div class="grid-100 text-center">
<br>
<b><a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js" class="add-lover-link subheader"><i class="fa fa-plus-square"></i> Add your link (it's free)</a></b>
</div>
</div>
</section>
<section id="parse">
<div class="grid-container narrow-grid">
<div class="grid-100">
<h4>CSV Parsing</h4>
<h5>"Isn't parsing CSV just <code>String.split(',')</code>?"</h5>
<p>Heavens, no. Papa does it right. Just pass in the CSV string with an optional <a href="/docs#config">configuration</a>.</p>
<div class="grid-40 suffix-5">
<div class="note" id="local-files">Parse local files</div>
<h4>"Great, but I have a <i>file</i> to parse."</h4>
<p>
Just give Papa a <a href="https://developer.mozilla.org/en-US/docs/Web/API/File">File</a> instead of a string. And a callback.
<h5>"Great, but I have a <i>file</i> to parse."</h5>
<p>Then give Papa a <a href="https://developer.mozilla.org/en-US/docs/Web/API/File">File</a> instead of a string. Since file parsing is asynchronous, don't forget a callback.</p>
That's what streaming is for. Specify a <code>step</code> callback to receive the results row-by-row. This way, you won't load the whole file into memory and crash the browser.
<p>That's what streaming is for. Specify a step callback to receive the results row-by-row. This way, you won't load the whole file into memory and crash the browser.</p>
Oh. Yeah, that happens when a long-running script is executing in the same thread. Use a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker">Worker</a> thread by specifying <code>worker: true</code>. It may take slightly longer, but your page will stay reactive.
</p>
</div>
<div class="grid-55">
<code class="block">Papa.parse(bigFile, {
<p>That happens when a long-running script is executing in the same thread as the page. Use a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker">Worker</a> thread by specifying <code>worker: true</code>. It may take slightly longer, but your page will stay reactive.</p>
<h4>"Hey, these numbers are all parsed as strings."</h4>
<p>
Everything is parsed as strings. If you need the convenience, you can have numeric and boolean data automatically converted to number and boolean types.
</p>
</div>
<div class="grid-55">
<code class="block"><span class="comment">// All parsed data is normally returned as a string.
// Dynamic typing converts numbers to numbers
// and booleans to booleans.</span>
<section id="type-conversion">
<div class="grid-container narrow-grid">
<div class="grid-100">
<h4>Type Conversion</h4>
<h5>"Hey, these numbers are parsed as strings."</h5>
<p><i>Everything</i> is parsed as strings. If you want numbers and booleans, you can enable dynamic typing to do the conversion for you.</p>
<pre><code class="language-javascript">// Converts numeric/boolean data
var results = Papa.parse(csv, {
dynamicTyping: true
});</code>
});</code></pre>
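<p>
As a rough picture of what that conversion amounts to, here is a hand-rolled sketch. <code>convertField</code> is a hypothetical helper, not Papa's implementation, and Papa's exact rules (letter case, whitespace, number formats) may differ.
</p>

```javascript
// Convert one parsed field: booleans and numbers get real types,
// everything else stays a string.
function convertField(value) {
  if (value === "true") return true;
  if (value === "false") return false;
  // Only treat the field as a number if the whole thing looks like one.
  if (/^-?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$/.test(value)) {
    return parseFloat(value);
  }
  return value;
}

var row = ["42", "3.14", "true", "Papa", ""].map(convertField);
// row: [42, 3.14, true, "Papa", ""]
```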
</div>
<div class="clear"></div>
<hr>
</div>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="comments">Comments</div>
<h4>"I forgot to mention: my CSV files have comments in them."</h4>
<p>
Okay, first off: that's really weird. But you can skip those lines... just specify the comment character.
</p>
</div>
<div class="grid-55">
<code class="block"><span class="comment">// Mostly found in academia, some CSV files
// may have commented lines in them</span>
<section id="comments">
<div class="grid-container narrow-grid">
<div class="grid-100">
<h4>Comments</h4>
<h5>"I forgot to mention: my CSV files have comments in them."</h5>
<p>Okay, first off: that's really weird. But fortunately, you can skip those lines... just specify the comment string.</p>
<pre><code class="language-javascript">// Mostly found in academia, some CSV files
// may have commented lines in them
var results = Papa.parse(csv, {
comments: "#"
});</code>
});</code></pre>
</div>
<div class="clear"></div>
<hr>
</div>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="errors">Error handling</div>
<h4>"I'm getting tired, are we done—aw, shoot. Errors."</h4>
<p>
Yeah, almost done. Fortunately, Papa handles errors pretty well. The <a href="http://tools.ietf.org/html/rfc4180">CSV standard</a> is somewhat <strike>loose</strike> ambiguous, so Papa tries to consider the edge cases. For example, mismatched fields aren't always the end of the world.
</p>
</div>
<div class="grid-55">
<code class="block"><span class="comment">// Example error:</span>
<section id="errors">
<div class="grid-container narrow-grid">
<div class="grid-100">
<h4>Error Handling</h4>
<h5>"Aw, shoot. Errors."</h5>
<p>Papa handles errors pretty well. The <a href="http://tools.ietf.org/html/rfc4180">CSV standard</a> is somewhat <strike>loose</strike> ambiguous, so Papa is designed for edge cases. For example, mismatched fields won't break parsing.</p>
<pre><code class="language-javascript">// Example error:
{
type: "FieldMismatch",
code: "TooManyFields",
message: "Expected 3 fields, but parsed 4",
row: 1
}</code>
}</code></pre>
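<p>
One way to put those error objects to use, sketched under the assumption that they arrive as an array of objects shaped like the example above. <code>describeErrors</code> is a hypothetical helper name.
</p>

```javascript
// Turn error objects into one readable line each.
function describeErrors(errors) {
  return errors.map(function (err) {
    return "Row " + err.row + ": " + err.code + " (" + err.type + "): " + err.message;
  });
}

var messages = describeErrors([
  {
    type: "FieldMismatch",
    code: "TooManyFields",
    message: "Expected 3 fields, but parsed 4",
    row: 1
  }
]);
// messages[0]: "Row 1: TooManyFields (FieldMismatch): Expected 3 fields, but parsed 4"
```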
</div>
<div class="clear"></div>
<hr>
</div>
</section>
<div class="grid-40 suffix-5">
<div class="note" id="jquery">jQuery Plugin</div>
<h4>"Can I use Papa with jQuery?"</h4>
<p>
Sure, but it's not required. You can use jQuery to select file input elements and then parse their files. Papa exposes its file parsing API as a jQuery plugin only when jQuery is defined. Papa Parse has <b>no dependencies</b>.
</p>
</div>
<div class="grid-55">
<code class="block">$("input[type=file]").parse({
<section id="jquery">
<div class="grid-container narrow-grid">
<div class="grid-100">
<h4>jQuery Plugin</h4>
<h5>"Can I use Papa with jQuery?"</h5>
<p>Sure, but it's not required. You can use jQuery to select file input elements and then parse their files. Papa exposes its file parsing API as a jQuery plugin only when jQuery is defined. Papa Parse has <b>no dependencies</b>.</p>
<i class="fa fa-download"></i> Get Papa Parse on GitHub
<i class="fa fa-github"></i> GitHub
</a>
<a href="/demo" class="button red">
<i class="fa fa-bolt"></i> Try the demo
<i class="fa fa-magic"></i> Demo
</a>
<a href="/docs" class="button gray">
<i class="fa fa-book"></i> Documentation
</a>
</div>
<div class="clear"></div>
</section>
</div>
</main>
</main>
<footer>
<footer>
<!--<div class="footer-top">
<h3>Make Your Papa Proud</h3>
<h4><a href="https://github.com/mholt/PapaParse">Star</a> and <a href="https://github.com/mholt/PapaParse/blob/gh-pages/resources/js/lovers.js">shout</a> if you love #PapaParse</h4>
description:"verifies addresses, many of which are submitted in CSV files. Papa Parse can process files with over a million addresses in the browser.",
quote:"Because of Papa Parse, we rapidly built an awesome client-side list processing service."
},
{
link:"http://jannah.github.io/MetaReader",
name:"MetaReader",
description:"helps you see your data from a meta perspective before you start detailed analysis.",
quote:"Papa Parse made it very easy to load and ready user CSV files in the browser on the client side."
},
{
link:"https://github.com/JamesJansson/EpiML",
name:"EpiML",
description:"is an agent-based mathematical model for the web, still in its early stages of development.",
quote:"Papa makes it so easy to use CSV, which is good for scientists."
},
{
link:"https://wikipedia.org",
name:"Wikipedia",
description:"uses Papa Parse in VisualEditor to help article editors effortlessly build data tables from text files."