Open source tools

DataMade maintains several core open source libraries that power many of our products and applications.

Parsing messy data

  • 📍 usaddress
    A Python library for parsing unstructured United States address strings into components like AddressNumber, StreetName, and ZipCode.

  • 👀 probablepeople
    A Python library for parsing unstructured Western name strings into components like GivenName, MiddleInitial, Surname, or Corporation.

  • 🔧 parserator
    Need to parse some messy text? We created a toolkit for making domain-specific probabilistic parsers. To create a parser, all you need is some training data to teach your parser about its domain. We used this framework to build usaddress and probablepeople.
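
As a rough illustration of what "parsing into components" looks like in practice, here is a minimal sketch. It assumes usaddress is installed from PyPI and relies on usaddress.parse(), which returns a list of (token, label) pairs. The group_by_label helper is hypothetical, written only for this example, and is not part of either library. The labels you get back depend on the trained model, so no output is shown.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Join tokens that share a label, keeping labels in first-seen order.

    `pairs` is a list of (token, label) tuples, the shape returned by
    usaddress.parse(). This helper is hypothetical, not a library function.
    """
    grouped = defaultdict(list)
    for token, label in pairs:
        grouped[label].append(token)
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


if __name__ == "__main__":
    try:
        import usaddress  # third-party package: pip install usaddress
    except ImportError:
        print("usaddress is not installed; skipping the live example")
    else:
        # parse() labels each token of the address string
        pairs = usaddress.parse("123 Main St. Suite 100 Chicago, IL")
        print(group_by_label(pairs))
```

Both libraries also provide a tag() function that merges consecutive tokens with the same label for you and reports the type of string it detected. probablepeople exposes the same parse() and tag() pair for names.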

Bonus! You can also parse names and addresses online without using any code. Check out parserator.datamade.us!

Guides

  • ✅ Site launch checklist
    Did you forget to set up Google Analytics again? What about load testing? We created a checklist of final tasks to do before launching a public, open source website or tool.

  • 📘 Data making guidelines
    DataMade’s guide to creating non-destructive, repeatable scripts for extracting, transforming, and loading (ETL) data.

  • 📘 Testing guidelines
    The principles, procedure, and patterns that guide writing automated tests at DataMade.

  • 📘 How to
    Wide-ranging documentation of our work practices and tech stack.