|Machine learning (ML)||A mathematical model that is able to improve its performance on a task by exposure to data.|
|Deep neural networks||ML models with multiple latent (hidden) layers, allowing for non-linear output and complex interactions between layers. Deep neural networks power “deep learning,” which enables tasks such as image recognition, natural language processing (NLP), and complex predictions. Subtypes of deep neural networks are classified based on the relationship between hidden layers and include convolutional, recurrent, gated graph, and generative adversarial neural networks.|
|Training, test, and validation sets||Training set: Dataset from which the model learns the optimal parameters to accomplish the task. Test set: Dataset on which the performance of a trained, parameterized model is evaluated. Validation set: Dataset used to evaluate the model’s performance during training. Unlike the test set, it is used during training to select the model’s hyperparameters.|
|Supervised learning||A subset of ML in which the outcomes to be learned by the model (“labels”) are provided in the training set. For example, teaching a model to identify breast cancer patients for study inclusion would require training it on a set of patients labeled as having or not having breast cancer, and then evaluating it on a new set of patients, with and without breast cancer, whose labels are withheld from the model.|
|Unsupervised learning||A subset of ML in which there are no pre-specified labels for the model to learn to predict; instead, models identify hidden patterns in the data.|
|Natural language processing (NLP)||A branch of artificial intelligence that enables computers to interpret human language. Much modern NLP uses deep neural networks in which words and their relationships to each other are encoded as high-dimensional vectors, enabling the model to parse the meaning of new text it is presented with.|
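
The roles of the training, validation, and test sets defined above can be illustrated with a minimal sketch. It is not drawn from the source: the synthetic data, the 60/20/20 split, the single-parameter linear model, and the helper names (`split_dataset`, `fit_slope`, `mean_squared_error`) are all assumptions chosen for illustration. The training set is used to learn the parameter `w`, the validation set to choose a hyperparameter (here a penalty strength), and the test set only for the final evaluation.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle labeled rows and split them into training, validation, and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def fit_slope(train, penalty):
    """Learn the parameter w of y ~ w * x by penalized least squares."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, y in train)
    return sxy / (sxx + penalty)


def mean_squared_error(w, rows):
    """Average squared difference between the labels and the predictions w * x."""
    return sum((y - w * x) ** 2 for x, y in rows) / len(rows)


# Hypothetical labeled data: each row is (feature, label), with label = 2 * feature + noise.
noise = random.Random(42)
data = [(x, 2.0 * x + noise.gauss(0, 0.5)) for x in (i / 10 for i in range(1, 101))]

train, val, test = split_dataset(data)

# Hyperparameter selection: fit on the training set, compare on the validation set.
candidate_penalties = [0.0, 1.0, 10.0, 100.0]
best_penalty = min(
    candidate_penalties,
    key=lambda p: mean_squared_error(fit_slope(train, p), val),
)

# Final model: parameter learned from the training set with the chosen hyperparameter.
w = fit_slope(train, best_penalty)

# The test set is touched once, after all modeling decisions have been made.
test_mse = mean_squared_error(w, test)
print(f"penalty={best_penalty}, w={w:.3f}, test MSE={test_mse:.3f}")
```

Because the labels are supplied with the data, this is also a small instance of supervised learning. In practice a library routine would usually handle the split; the hand-written version is used here only to keep the sketch self-contained.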