Here are some definitions for commonly used terms and technologies in machine learning. I’ll try to update and improve this page with new entries over time.

Apache Spark — A distributed computing framework for large-scale data processing and machine learning.

Artificial Neural Networks — Machine learning algorithms inspired by biological neural networks.

Back-propagation — An algorithm for training neural networks in which errors are propagated backwards through the network.

Big Data — Data that is difficult to process on a single machine, typically on the order of terabytes or more. The term can also refer to machine learning and other types of analysis performed on data of this scale.

Classification — A machine learning problem in which each observation is assigned to one of two or more classes.

Clustering — The process of grouping observations that are similar according to a particular criterion.

Cython — A Python-like language used to give C-like performance to Python code.

Cross Validation — ...
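
To make the Back-propagation entry above a little more concrete, here is a minimal sketch in plain Python. It assumes a toy network with one input, one hidden unit, and one output, sigmoid activations, and a squared-error loss. The function names (`train_step`, `train`), the starting weights, and the learning rate are all made up for illustration and are not taken from any particular library.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w1, w2, x, target, lr=0.5):
    """One forward and one backward pass through a toy 1-1-1 network."""
    # Forward pass: input -> hidden -> output.
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    loss = 0.5 * (y - target) ** 2

    # Backward pass: the output error is propagated back with the chain rule.
    delta_out = (y - target) * y * (1.0 - y)       # error signal at the output unit
    delta_hidden = delta_out * w2 * h * (1.0 - h)  # error signal at the hidden unit

    # Gradient-descent update for each weight.
    w2_new = w2 - lr * delta_out * h
    w1_new = w1 - lr * delta_hidden * x
    return w1_new, w2_new, loss


def train(x=1.0, target=0.9, steps=200):
    """Repeat the update and record the loss at every step."""
    w1, w2 = 0.1, -0.2  # arbitrary starting weights (assumption)
    losses = []
    for _ in range(steps):
        w1, w2, loss = train_step(w1, w2, x, target)
        losses.append(loss)
    return losses
```

Calling `train()` returns the list of losses, which should shrink as the weights are adjusted. Real networks apply the same idea to whole matrices of weights rather than two scalars.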