AI Ethics

Danger

This is a draft document that currently requires legal review. It should not yet be used as an example of legal advice or policy from Neocrym.

Caution

This policy is currently being drafted at Neocrym. Until it is ratified, it may not describe current official policy.

This document describes the ethical principles that we follow at Neocrym when it comes to the training and inference of machine learning models.

The casual reader can develop a strong understanding of our ethical principles from the Table of Contents alone. More serious readers may enjoy reading this document from start to finish to learn the how and why behind our ethical practices.

Table of Contents

  • Terminology used in this document

  • General exceptions to this AI Ethics Code

  • Ethical choices of training set data samples

    • We will not train models on data samples that could reveal a human’s protected attributes

    • We will not train models on “private” data samples without permission

  • Ethical choices of training set data labels

    • We will not train models that predict labels containing protected attributes

    • We will minimize our usage of individual human labels on creative datasets

  • Ethical design and usage of discriminative models

    • A discriminative model should not directly be used to decide whether to do business with a person

    • We will not train models with the intent of identifying specific individuals

  • Ethical design and usage of generative models

    • We will not use generative models to impersonate a deceased human, even with the permission of their estate

    • We will prevent our generative models from infringing upon their own training datasets

    • We will not use generative models to generate pornographic images or videos

    • We will not intentionally train generative models to stereotype a protected attribute

Terminology used in this document

Basic machine learning terminology

A dataset is a set of pairs—each one made up of a sample and a label.

A model accepts a sample and returns a prediction of what the sample’s corresponding label might be.

A model learns how to make good predictions with a training dataset using the following process:

  1. The model receives samples from the training set and returns predicted labels.

  2. The training dataset contains the actual labels for each sample, so the model compares the predicted labels against the actual labels.

  3. The model improves itself in the “direction” specified by the difference between the predicted and actual labels.

A model learns to make decisions using the training set. A separate dataset called the validation set is used to compare models as they improve. A test set is another dataset used to compare different models at the end of the training process.

If we have successfully trained a model, it is ready to be used for inference—the use of the model to generate predictions from data samples whose true labels are unknown to us.
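
To make this terminology concrete, the following is a minimal sketch of the train/validate/test/inference workflow described above, written with PyTorch. The toy dataset, tiny model, and hyperparameters are illustrative placeholders, not a description of any actual Neocrym pipeline.

```python
# A minimal sketch of the train/validate/test/inference workflow, using PyTorch.
# Everything here (dataset, model, hyperparameters) is an illustrative placeholder.
import torch
from torch import nn

# Toy dataset: samples are 16-dimensional vectors, labels are one of 3 classes.
def make_split(n):
    samples = torch.randn(n, 16)
    labels = torch.randint(0, 3, (n,))
    return samples, labels

train_x, train_y = make_split(1000)   # used to fit the model's weights
val_x, val_y = make_split(200)        # used to compare models as they improve
test_x, test_y = make_split(200)      # used once, at the end, to compare final models

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    # 1. The model receives samples and returns predicted labels (logits).
    predictions = model(train_x)
    # 2. Compare the predicted labels against the actual labels.
    loss = loss_fn(predictions, train_y)
    # 3. The model improves itself in the "direction" given by that difference.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Validation: measure accuracy on data the model did not train on.
    with torch.no_grad():
        val_acc = (model(val_x).argmax(dim=1) == val_y).float().mean()

# Inference: generate predictions for samples whose true labels are unknown to us.
with torch.no_grad():
    new_sample = torch.randn(1, 16)
    predicted_label = model(new_sample).argmax(dim=1)
```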

Generative and discriminative machine learning models

This document distinguishes between two types of machine learning models:

  1. A discriminative model accepts a sample and predicts a label. For example, a discriminative model may accept a song as input and output a prediction of its genre.

  2. A generative model generates a new sample from scratch. For example, a generative model may accept a random vector and output (parts of) an original song.

In other documents published by Neocrym, we often refer to discriminative models as detectors, classifiers, or regressors because the word “discriminate” has negative connotations among many non-technical people.

Discriminative and generative neural networks are often used to train each other, such as in generative adversarial networks (GANs).
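
The following minimal sketch contrasts the two interfaces. The class names, feature sizes, and single-layer architectures are hypothetical placeholders; in a GAN, a generator like the one below would be trained against a discriminator.

```python
# A minimal sketch contrasting discriminative and generative model interfaces.
# Class names, shapes, and architectures are hypothetical, purely for illustration.
import torch
from torch import nn

class GenreClassifier(nn.Module):
    """Discriminative: sample in, predicted label out."""
    def __init__(self, n_features=128, n_genres=10):
        super().__init__()
        self.net = nn.Linear(n_features, n_genres)

    def forward(self, song_features):    # (batch, n_features)
        return self.net(song_features)   # (batch, n_genres) genre scores

class SongGenerator(nn.Module):
    """Generative: random vector in, new sample out."""
    def __init__(self, latent_dim=64, n_features=128):
        super().__init__()
        self.net = nn.Linear(latent_dim, n_features)

    def forward(self, random_vector):    # (batch, latent_dim)
        return self.net(random_vector)   # (batch, n_features) generated "song"
```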

Terminology relating to harmful biases

The word bias can mean many things. In this document we use “bias” to describe unfair (and often illegal) judgements against groups of humans.

The phrase protected attribute often refers to human characteristics that are protected by employment or civil rights laws. The actual laws defining protected classes can vary across jurisdictions. For the purpose of this AI Ethics Code, the following is an (incomplete!) list of protected attributes:

  • religion

  • creed

  • political beliefs

  • biological sex

  • gender, gender identity, and gender expression

  • genetic information, medical history, and disability status

  • family planning, pregnancy status, and family size

  • nationality, national origin, and citizenship status

  • race, color, and ethnicity

We use the phrase harmful bias to refer to any mechanism that could lead to judging a person—whether real or imaginary—based on any protected attribute. Harmful biases can originate from many places, including:

  • intentionally-biased human decisions. Many organizations, such as banks, schools, employers, and governments, have made biased decisions as institutional policy, such as redlining. If you wanted to train a model on historical mortgage data or real estate sales, you would have to find a way to control for lenders’ history of institutionalized racism.

  • unconsciously-biased human decisions. Nowadays, most people will not admit to consciously holding harmful biases, but academic research like Harvard’s Implicit Association Test (IAT) claims to show that nearly everybody acts on unconscious biases. Because of this, it may be impossible to guarantee that any individual human can make unbiased decisions.

  • hardware, software, or algorithm design.

  • within the weights of a statistical model. A model may have tens of billions of parameters, which cannot easily be directly studied or analyzed. Sometimes the statistical properties of a model’s inputs and outputs can be used to suggest bias exists, but it is an unsolved problem to prove that a sophisticated model has no harmful biases at all.
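
For illustration only, here is a rough sketch of the kind of input/output statistical check mentioned in the last bullet above: comparing a model’s positive-prediction rate across two groups. The model, data, and threshold are hypothetical, and a small gap here does not prove the absence of harmful bias.

```python
# A rough sketch of a statistical bias check: compare a model's positive-prediction
# rate across two groups. A disparity only *suggests* bias; a small gap proves nothing.
import numpy as np

def positive_rate(model, samples):
    """Fraction of samples the model scores above 0.5."""
    scores = model(samples)
    return float(np.mean(scores > 0.5))

def demographic_parity_gap(model, samples_group_a, samples_group_b):
    """Absolute difference in positive-prediction rates between two groups."""
    return abs(positive_rate(model, samples_group_a) - positive_rate(model, samples_group_b))

# Usage with a stand-in "model" that just averages features:
toy_model = lambda x: x.mean(axis=1)
group_a = np.random.rand(500, 8)
group_b = np.random.rand(500, 8) + 0.1   # deliberately shifted distribution
print(f"Positive-rate gap between groups: {demographic_parity_gap(toy_model, group_a, group_b):.3f}")
```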

Most of this AI Ethics Code is focused on minimizing harmful biases when training machine learning models.

General exceptions to this AI Ethics Code

Each of the principles in this document comes with its own exceptions, but all of the principles are also subject to the general exceptions listed here:

These principles apply to business-critical models

The principles in this document only apply to machine learning models that are either:

  • used as part of externally-facing software products.

  • used in internal tools to help employees and contractors make business decisions.

  • used to generate data that we release to the public.

This AI Ethics Code does not apply to models trained or used for internal machine learning research. For example, we may want to intentionally train a “research model” with harmful biases, specifically to get a better understanding of how models learn biases. Such research models and their outputs should not be published or integrated into any of our internal or external software products.

These principles do not apply to our vendors

This AI Ethics Code also does not apply to machine learning models sold, leased, or licensed to us from vendors. Because their datasets, training methods, and policies are proprietary, we would not be able to enforce our AI Ethics Code against them.

Sometimes we eliminate harmful biases. Sometimes we want to study them

Sometimes we want to train a model with harmful biases specifically to predict the behavior of humans with harmful biases.

Ethical choices of training set data samples

We will not train models on data samples that could reveal a human’s protected attributes

The principle

We will not directly train models on datasets where the samples can reveal—for example—a person’s skin color. Examples of such datasets include:

  • music album art.

  • image data extracted from music videos.

  • text other than music lyrics or song metadata.

  • socioeconomic or biographical data, such as language, locale, time zone, employment history, zip code, or race.

While there may be ways to reduce harmful biases learned by models trained on these datasets, we cannot take the risk.

The exceptions

There are two main exceptions to this principle:

  • Models that predict similarity and/or identify copyright infringement. This principle forbids us from training a model that predicts “quality” or commercial success from album art. However, we may still want to train image-hashing models that can help identify whether one artist has plagiarized another artist’s album art (see the sketch after this list).

  • Models trained on raw audio. In many cases, we may want to understand the biases in a raw audio dataset, especially if they directly correlate with what type of music listeners enjoy. While we don’t endorse such biases or believe they cause good music to be made, it is important that we understand them.
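
As an illustration of the image-hashing exception above, the following sketch assumes the third-party imagehash and Pillow libraries; the file names and distance threshold are hypothetical, and this is not a description of our production tooling.

```python
# A minimal sketch of flagging near-duplicate album art with a perceptual hash.
# Assumes the third-party `imagehash` and Pillow libraries; threshold is illustrative.
from PIL import Image
import imagehash

def album_art_looks_copied(path_a: str, path_b: str, max_distance: int = 8) -> bool:
    """Return True if two covers have perceptual hashes within `max_distance` bits."""
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    # ImageHash objects support subtraction, which yields the Hamming distance.
    return (hash_a - hash_b) <= max_distance

# Usage: compare a newly submitted cover against a known artist's cover (hypothetical paths).
# album_art_looks_copied("submitted_cover.png", "original_cover.png")
```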

We will not train models on “private” data samples without permission

The principle

We will not train models on unreleased music without the artists’ consent. There are many ways we may end up in possession of unreleased music, such as:

  • demo tapes sent to us by artists.

  • music created but not yet released by artists already signed to Neocrym.

  • music that artists upload to online tools, web applications, or cloud storage services that we operate while they are still working on it.

This rule exists so that we do not violate confidentiality with entities that send us private data.

The exceptions

The exceptions to this principle are all centered around detecting bad behavior. We reserve the right to use private data samples for:

  • detecting malicious inputs. For example, if you send an email to someone at Neocrym, we may mark it as spam and use it to train a spam classifier. But we wouldn’t use the emails you send us to train a model that generates text—unless you consented to such a use case. Similarly, any credit card purchases you make with Neocrym are considered to be private, but data from suspicious or fraudulent transactions may be used to improve fraud detection models.

  • investigating potential data leaks. For example, if we have a private server containing unreleased music and private data, we reserve the right to calculate hashes and fingerprints from that private data to determine whether our private server has been compromised and to identify your/our private data being distributed on the Internet.
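
For illustration, a minimal sketch of the hash-based leak check described above follows. The directory layout is hypothetical, and a real audio fingerprint is more robust than an exact file hash, which only catches byte-identical copies.

```python
# A minimal sketch of a hash-based leak check. Paths are hypothetical; an exact
# SHA-256 hash only catches byte-identical copies, unlike a real audio fingerprint.
import hashlib
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_private_index(private_dir: str) -> set:
    """Hashes of every unreleased track stored on our private server."""
    return {sha256_of_file(p) for p in Path(private_dir).glob("**/*.mp3")}

def file_is_leaked(suspicious_path: str, private_index: set) -> bool:
    """True if a file found in the wild exactly matches an unreleased track."""
    return sha256_of_file(Path(suspicious_path)) in private_index
```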

Ethical choices of training set data labels

We will not train models that predict labels containing protected attributes

The principle

For example, we will not train a model that tries to predict whether the audio of a given song contains a male or female singer. However, this principle does allow training models on the same audio, but with a different predicted label—such as genre or popularity instead of gender—unless forbidden by another principle in this document.

We will minimize our usage of individual human labels on creative datasets

The principle

At Neocrym, our goal is to find underground songs that have the potential to become tomorrow’s hit song. This sets up an obvious ethical problem: Every individual has an opinion of what a hit song sounds like, but that opinion may be tainted by harmful biases.

We want to avoid training a model based on the opinions of a small number of employees or data-labelers. As such, we only teach a model about popularity using features extracted from the aggregate behavior of tens of millions of music fans.
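
For illustration only, the following sketch (with hypothetical field names) shows the difference between an individual opinion label and an aggregate behavioral feature: instead of one employee labeling a track as “good,” we derive features from the events of many listeners.

```python
# An illustrative sketch (hypothetical field names) of deriving aggregate
# behavioral features from listener events, rather than individual opinion labels.
from collections import defaultdict

def aggregate_popularity_features(listen_events):
    """listen_events: iterable of dicts like
    {"track_id": "t1", "listener_id": "u42", "completed": True}."""
    plays = defaultdict(int)
    completions = defaultdict(int)
    listeners = defaultdict(set)
    for event in listen_events:
        track = event["track_id"]
        plays[track] += 1
        completions[track] += int(event["completed"])
        listeners[track].add(event["listener_id"])
    return {
        track: {
            "play_count": plays[track],
            "unique_listeners": len(listeners[track]),
            "completion_rate": completions[track] / plays[track],
        }
        for track in plays
    }
```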

Of course, the collective behavior of millions of people still encodes a bias—potentially even a harmful bias. But in the interest of helping artists make money, we need to model and analyze such collective biases.

The exceptions

The important part of this principle is not to let a few humans define broad opinion-based predictions of creativity, quality, or potential.

Other than that, there are numerous exceptions where we do need to attach human labels to our data samples:

  • identifying fraud. Many artists increase their streaming site popularity metrics using fraudulent methods, such as using bots that pretend to be real listeners. When we train models on these popularity metrics, we risk training a model that is fooled by fraud. Because fraudsters continually innovate beyond the automated systems meant to detect fraud, our best hope for detecting fraud is by using human analysts to label tracks or artists as suspicious.

  • identifying offensive content. Because the concept of what is offensive is so specific to human cultures, it would be impossible for us to identify offensive content with only machine-generated models. There are many “dog whistle” phrases like the 14 Words that humans recognize as a racist slogan, but a computer may not.

  • recommendations. Recommender systems are trained from the aggregate behavior of many users, but each user’s recommendations are heavily influenced by their own behavior. This ethical principle should not prevent us from deploying recommender systems.

  • personalization. We might create a model capable of generating or modifying music, but individual artists using the model may want to customize it to their own preferences. Artists can customize a generative model by labeling a few data samples and using them to fine-tune the model’s decoder (see the sketch after this list).

  • psychoacoustic benchmarking. Lossy audio codecs like MP3 dramatically reduce the size of an audio file by discarding audio information that humans cannot hear. If we are developing software with psychoacoustic properties, we need to test it against human ears.

  • CAPTCHAs. Many products like reCAPTCHA distinguish humans from bots by having them solve visual or audio puzzles. The resulting human-generated answers are used to train models solving problems like optical character recognition and speech-to-text. This principle does not prevent us from generating, using, or solving CAPTCHAs or similar products.

  • near-deterministic or factual labels. There are many simple labels, such as whether a song contains any human voices at all, that are not provided by our data sources. These labels are not opinions governing creativity or potential; instead, they are near-deterministic facts for which machines still need human-provided labels.
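
As an illustration of the personalization exception above, the following sketch assumes a PyTorch encoder-decoder generative model and shows fine-tuning only the decoder on a handful of artist-labeled samples; the architecture, shapes, and hyperparameters are hypothetical.

```python
# A minimal sketch of personalizing a generative model by fine-tuning only its
# decoder on a few artist-labeled samples. Architecture and data are hypothetical.
import torch
from torch import nn

encoder = nn.Linear(128, 32)   # stand-in for a pretrained encoder
decoder = nn.Linear(32, 128)   # stand-in for a pretrained decoder

# Freeze the shared encoder; only the decoder adapts to the artist's preferences.
for param in encoder.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# A handful of samples the artist has labeled as the sound they want.
artist_samples = torch.randn(8, 128)

for step in range(100):
    latent = encoder(artist_samples)
    reconstruction = decoder(latent)
    loss = loss_fn(reconstruction, artist_samples)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```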

Ethical design and usage of discriminative models

A discriminative model should not directly be used to decide whether to do business with a person

The principle

Many ethical ML “cautionary tales” involve companies creating models to approve or deny applicants for jobs, loans, insurance policies, and so forth. These models learned from historical data that contained illegal biases and therefore made illegally biased decisions going forward.

We recognize the problematic history of using machine learning to judge people, and it makes us very careful about what we do. Some things we keep in mind when we do our work:

  • A model never has the final say. We use machine learning models to search and sort through tens of millions of songs that have been ignored by the rest of the music industry. Our use of machine learning makes us very good at finding underestimated talent, but no judgement from a model guarantees that we will work with an artist, nor prevents us from doing so.

  • We recognize that all models are biased, whether we know it or not. Philosophically speaking, there is no such thing as a human or machine learning model that is literally absent of any type of bias whatsoever. Whenever we see the output from a model, we have to consider whether the output is the result of an adversarial and harmful bias.

  • We use models to counteract our human biases. The last hundred-odd years of music industry history are filled with music executives demonstrating unethical biases involving race, gender, socioeconomic status, and so forth. While we recognize that our models probably have biases too, we build and maintain models so they can challenge and counteract the human biases we hold.

The exceptions

Unfortunately, there are many cases where a model may directly decide whether we do business with a person, and we cannot change our usage of such a model without creating a major risk for our business.

We may use such models for:

  • defensive computer security tools. We may use many different models to protect our company, our employees, and our property. It is infeasible for a human to review decisions from such models. Some examples include:

    • models trained on emails to filter spam.

    • models trained on network traffic to filter denial-of-service attacks.

    • models trained on malware to filter out MP3 files and other email attachments containing computer viruses.

  • detecting copyright infringement. At Neocrym, we look for undiscovered songs from independent artists that have an auditory resemblance to popular hit songs. However, the easiest way to create a song resembling a hit is to plagiarize from a hit song. It could be incredibly harmful for our business if we end up financing, promoting, or releasing a work that infringes upon another work. As such, we may scan songs using copyright detection models without validating every single prediction with a human-led investigation.

We will not train models with the intent of identifying specific individuals

The principle

This principle’s intention is to protect the privacy of any humans whose biometric information was collected by accident. There are three rules:

  • Neocrym will not train models that can identify a person from a given image, audio, or video as input.

  • Neocrym will not train models that generate or collect faceprints or voiceprints for the purpose of identifying specific individuals. In this context, a faceprint or voiceprint is a piece of data used to identify a person—in contrast to an audio fingerprint, which is used to identify a specific piece of audio.

  • Neocrym will not train stylometry models for the purpose of recognizing the author of a passage of text.

The exceptions

As previously noted, this principle does not forbid training models that can recognize other songs for the purpose of copyright detection. A satisfactory copyright detection system must know the names of the rightsholders of an infringed work. Therefore, a copyright detection system would have some limited ability to connect an arbitrary song to an individual’s identity. However, this is a necessary compromise to avoid infringing on copyrights.

Ethical design and usage of generative models

We will not use generative models to impersonate a deceased human, even with the permission of their estate

The principle

Many people and companies in the entertainment industry are excited about video and audio “deepfakes” for the purpose of simulating new content from deceased artists.

Even if a deceased person’s estate approves the creation of computer-generated depictions of them, Neocrym would not want to be involved in such a project. We recognize that the approval of a person’s estate is not the same as the approval of the person.

The exceptions

We could depict a deceased person as long as that person specifically gave us permission while they were alive. Similarly, this principle does not require that we unpublish existing depictions of people who have since died.

We will prevent our generative models from infringing upon their own training datasets

The principle

Machine learning models sometimes output portions of their training samples verbatim. Because of this, there is always a risk that a generative model trained on copyrighted data could end up committing copyright infringement.

There are various ways to solve this problem. The exact solution is left up to the reader.
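
For illustration only, one simple (and far from complete) way to catch verbatim regurgitation in generated text is to check whether any long n-gram from the output also appears in an index built from the training set; the sketch below is not a description of our actual safeguards.

```python
# A minimal sketch of flagging verbatim regurgitation in generated text: check
# whether any n-token span of the output also appears in the training set.
def ngrams(tokens, n=8):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_training_index(training_texts, n=8):
    index = set()
    for text in training_texts:
        index |= ngrams(text.split(), n)
    return index

def regurgitates_training_data(generated_text, training_index, n=8) -> bool:
    """True if the generated text shares any n-token span with the training set."""
    return bool(ngrams(generated_text.split(), n) & training_index)
```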

We will not use generative models to generate pornographic images or videos

The principle

The principles of this document already allow the use of generative models to create depictions of living people who consent to being depicted.

However, those depictions must be non-pornographic. This principle forbids creating and/or using generative models to generate any human depiction that is pornographic in nature.

There are numerous unsolvable ethical and legal issues with using generative models specifically to create or modify pornography:

  • Issues involving child pornography. Pornographic depictions of minors—even fictional depictions—are illegal in much of the industrialized world. There is no possible way to prevent a porn-generating model from generating images that are either considered by law enforcement to be fictional child pornography or mistaken for real child pornography.

  • Plausible deniability for real crimes. The existence of a model capable of generating fictional child pornography (or fictional depictions of crimes like sexual assault) can be used to cast doubt on genuine evidence in real criminal cases. Eventually, generative models will become good enough to replicate the exact statistical characteristics of real images, at which point no human or computer will be able to visually inspect the difference between fake photographic evidence and real photographic evidence.

  • Potential for blackmail. “Revenge porn” describes pornographic content that is distributed without the consent of everybody depicted. Adversaries often publish—or threaten to publish—revenge porn specifically to blackmail a person depicted in the porn. If a porn-generating model can produce sufficiently-realistic images of a victim, then the model could easily become a tool for humiliation or blackmail.

The exceptions

This AI Ethics Code does not apply to humans creating pornography without the use of generative machine learning models. Neocrym may have other restrictions on employees’ production or consumption of pornography, but such restrictions are not in this document.

Additionally, this AI Ethics Code does not forbid the development and use of generative models that are not specifically related to pornography, even if they can be used on pornography. For example, photo editing software can use models to make human faces more attractive—such as whitening teeth or removing blemishes. Such a model functions identically whether the photo depicts a clothed person or a nude person.

We will not intentionally train generative models to stereotype a protected attribute

The principle

We will not train models that behave in ways that are specific to one subgroup of people. For example, if we build a model that sings or raps a given set of lyrics, we will never have it use the n-word. We would want to restrict such a model to only the subset of English that anybody can speak without causing offense. This principle generalizes to many other stereotypes or cultural sensitivities.

The exceptions

This principle is intended for when Neocrym or our artists create a new character from scratch. It does not apply when artists use computer programs to change their appearance or voice, as long as the output image or audio is still credited and recognized as the artist.