DataMade maintains several core open source libraries that power many of our products and applications.
De-duplication and data linking
dedupe
A Python library that quickly performs de-duplication, entity resolution and linking on large, structured data.
csvdedupe
Command-line tools that use the dedupe Python library to de-duplicate CSV files.
The dedupe library also powers dedupe.io, our product that provides a web interface for quickly and automatically finding similar rows in a spreadsheet or database, using machine learning methods.
Parsing messy data
usaddress
A Python library for parsing unstructured United States address strings into components like AddressNumber, StreetName and ZipCode.
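As a rough illustration of what working with those components might look like, here is a minimal sketch. It assumes the usaddress package is installed and relies on its `parse` function, which returns a list of (token, label) pairs. The helper names `parse_address` and `group_components` are hypothetical and exist only for this example, and the sample pairs are hand-written rather than real library output.

```python
def parse_address(address):
    """Label each token of a US address string with usaddress."""
    # Imported lazily so the helper below also works without the package.
    import usaddress  # third-party; assumed installed with `pip install usaddress`
    return usaddress.parse(address)  # list of (token, label) pairs


def group_components(pairs):
    """Collapse (token, label) pairs into a {label: "joined tokens"} dict."""
    grouped = {}
    for token, label in pairs:
        grouped.setdefault(label, []).append(token)
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


# Hand-written pairs in the shape parse_address returns (not real library output).
sample = [
    ("123", "AddressNumber"),
    ("Main", "StreetName"),
    ("St.", "StreetNamePostType"),
    ("60601", "ZipCode"),
]
print(group_components(sample))
```

With the package installed, `group_components(parse_address("123 Main St. Chicago, IL 60601"))` would run the same grouping over real parser output.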
probablepeople
A Python library for parsing unstructured Western name strings into components like GivenName, MiddleInitial, Surname, or Corporation.
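A minimal sketch of how tagged name components might be used follows. It assumes the probablepeople package is installed and relies on its `tag` function, which returns a mapping of labels to text along with a name type such as "Person" or "Corporation". The helpers `tag_name` and `display_name` are hypothetical, and the sample mappings are hand-written rather than real library output.

```python
def tag_name(name):
    """Tag a name string with probablepeople."""
    # Imported lazily so the helper below also works without the package.
    import probablepeople  # third-party; assumed installed with `pip install probablepeople`
    return probablepeople.tag(name)  # (label -> text mapping, name type)


def display_name(tagged, name_type):
    """Build a short display string from tagged name components."""
    if name_type == "Corporation":
        # Keep every corporate component, in the order it was tagged.
        return " ".join(tagged.values())
    # For people, keep only the given name and surname when present.
    parts = [tagged.get("GivenName"), tagged.get("Surname")]
    return " ".join(part for part in parts if part)


# Hand-written mappings in the shape tag_name returns (not real library output).
person = {"GivenName": "Ada", "MiddleInitial": "B.", "Surname": "Lovelace"}
company = {"CorporationName": "Acme", "CorporationLegalType": "Inc."}
print(display_name(person, "Person"))
print(display_name(company, "Corporation"))
```

Branching on the returned name type is one way to handle people and organizations that appear in the same column.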
parserator
Need to parse some messy text? We created a toolkit for making domain-specific probabilistic parsers. To create a parser, all you need is some training data to teach your parser about its domain. We used this framework to build usaddress and probablepeople.
Bonus! You can also parse names and addresses online without writing any code. Check out parserator.datamade.us.
Guides
Site launch checklist
Did you forget to set up Google Analytics again? What about load testing? We created a checklist of final tasks to do before launching a public, open source website or tool.
Data making guide
DataMade’s guide to creating non-destructive, repeatable scripts for extracting, transforming and loading (ETL) data.
Data analysis guide
The principles that inform DataMade's approach to data analysis, as well as the tools and organizational practices that make it possible.
Code testing guide
The principles, procedure and patterns that guide test writing at DataMade.
Data visualization
Searchable map template with Google Fusion Tables
An HTML and JavaScript template, powered by Fusion Tables, that helps you turn a spreadsheet into a fully customizable searchable map.
CSV to HTML Table
Display any CSV file as a searchable, filterable, pretty HTML table. Done in 100% JavaScript.