Towards A Less Harmful Data Practice: The DataMade Way

Data can be a powerful tool for illuminating our world, but it can also reproduce bias, perpetuate harm, and subvert truth. A significant amount of our work at DataMade involves making data available, accessible, and intelligible; we therefore must take steps to minimize the biases, harm, and fallacies that can come from utilizing data irresponsibly.

This living document summarizes DataMade’s approach to a less harmful data practice with regards to the kinds of sensitive data we work with most often. This list includes crime report data; race, gender, and ethnicity data; individual-level public record data; and user data collection practices.

These recommendations guide our work with our current and future partners. Where possible, we are taking steps to revise or add to past work that does not align with these principles.

Crime report data

A myriad of state, county, and city level crime data reports exist alongside the national Uniform Crime Report (UCR) and National Crime Victimization Survey datasets. Our knowledge of the role of crime report data has grown, and we understand that crime report data can be used irresponsibly to justify the further policing and surveillance of communities of Black, Indigenous, and People of Color.

With this in mind, we commit to the following with regards to crime report data:

We refuse to redistribute raw crime report data. We commit to looking for alternative datasets to meet our partners’ needs.
We refuse to use crime data as an indicator of the criminality of the communities the data are about, particularly in public-facing projects. We commit to using the minimum amount of crime report data necessary for internal reports or projects, and to contextualizing it to remind users of these data’s limitations.

Journalists like Pascal Sabino at Block Club Chicago use policing data appropriately to surface how Chicago police unjustly search Black Chicagoans under the pretense of traffic stops.

Race, gender, and ethnicity data

Race, gender, and ethnicity data are both common and useful. When these labels are self-reported, they help us to understand people’s relationship to their society. However, when these classifications are imposed, they tell us more about the harmful effects of the systems and algorithms that use these imposed labels than they necessarily do about the people whom the data is about.

With this in mind, we commit to the following with regards to race, gender, and ethnicity data:

We refuse to rely solely on US Census-designated categorizations while designing surveys or questionnaires. We commit to favoring self-defined and self-reported identifications in public-facing data.
We refuse to leave imposed race, gender, and ethnicity data uncontextualized, or to use it to tell data stories about people or communities. We commit to using these data to tell stories about systems where these categorizations create disparate outcomes.

Journalism coming out of The Circuit project, which DataMade is a partner in, compellingly reports on defendants’ disparate treatment based on the demographic classifications imposed on them by Cook County Circuit Courts. The US Census provides an imperfect initial model for balancing self-reported demographic data while simultaneously using defined categories for data analysis.

Individual-level public records

Individual-level public record information about public servants and occasionally high-profile private citizens can be used to make the actions of people with significant power transparent and to hold these people accountable for their actions.

While accountability is important, it is equally important to balance individuals’ need for privacy. To that end, we commit to the following:

We refuse to publish personally identifying information (PII) unless there is a justifiable public interest in redistributing it. We commit to removing PII wherever we can, including by obscuring and/or suppressing records in instances where a combination of characteristics results in an extremely small number of records. We will make decisions on this by asking ourselves the following questions:
- Is PII necessary to support this project’s objective? If so, what is the minimum number of fields we need in order to achieve this objective?
- Would publishing PII cause more harm to people who are already disadvantaged?
- Can the goals of this project or story be met by focusing only on a few specific individuals?

Though US courts do not currently recognize this right to privacy, we do look to the European General Data Protection Regulation (“the Right to Be Forgotten”) for guidance on when to avoid using or remove PII from our sites.

User data collection

User data collection can allow a website to enhance user experience. In the US currently, though, it is commonplace to collect and sell user data without any transparency. This prevents users from being able to consent to having their data collected and/or sold, and precludes the possibility of collecting user data ethically.

A number of European companies employ clear, concise, and transparent ways of requesting user data; this blog post lists a few. Google’s data practices guide speaks to Google’s data practices in a concise and intelligible way.

In order to provide users with a positive user experience on our sites while allowing them control over their own data, we commit to the following:

We refuse to collect data indiscriminately and without our users’ consent. We commit to being intentional about the data we collect about our sites’ users by limiting ourselves to collecting data necessary for ease of use and by allowing users to opt out of data collection.
We refuse to sell our users’ data. We commit to being transparent about the trade-offs between ease of use and cross-site tracking to our sites’ users by making information about our integrations’ data collection practices easy to find and understand.

Our ultimate commitment

As conversations around using, collecting, and distributing data continue and evolve, we will revisit these commitments to ensure we handle the data we work with responsibly.