Here are some definitions for commonly used terms/technologies in machine learning. I’ll try to update and improve this page with new entries over time.
Apache Spark — A framework for distributed computing, used for large-scale data manipulation and machine learning.
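As a rough sketch, assuming PySpark is installed and running locally, a distributed aggregation might look like this:

```python
from pyspark.sql import SparkSession

# Start a local Spark session (assumes a working PySpark installation).
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# Build a small DataFrame and run a distributed group-by aggregation.
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["value", "key"])
df.groupBy("key").count().show()

spark.stop()
```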
Artificial Neural Networks — Machine learning algorithms inspired by biological neural networks.
Back-propagation — An algorithm for training neural networks in which errors are propagated backwards through the network.
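To make the idea concrete, here is a minimal sketch (illustrative only, not a full treatment) of a one-hidden-layer network trained on XOR with Numpy, where the squared-error gradient is propagated backwards through both layers:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    # Forward pass through the network.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the error is propagated back layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0] (may vary with initialisation)
```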
Big Data — Data which is too large or unwieldy to process on a single machine, typically on the order of terabytes or more. The term is also used for machine learning and other analyses performed on data at this scale.
Classification — A machine learning problem in which each observation is assigned to one of two or more classes.
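For example, a minimal classification sketch with scikit-learn (defined below) might look like:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)        # observations and their class labels
clf = LogisticRegression(max_iter=1000)  # a simple linear classifier
clf.fit(X, y)                            # learn from the labelled data
print(clf.predict(X[:5]))                # predicted classes for new observations
```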
Clustering — The process of grouping observations that are similar according to a particular criterion.
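As an illustrative sketch, k-means clustering in scikit-learn groups observations by their distance to cluster centres:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # which cluster each observation belongs to
print(kmeans.cluster_centers_)  # the centre of each cluster
```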
Cython — A Python-like language used to give C-like performance to Python code.
Cross Validation — A method for evaluating the performance of a learning algorithm. Particularly useful for small datasets.
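A sketch with scikit-learn, where the data is split into 5 folds and the model is trained and scored on each split in turn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)  # 5-fold cross validation
print(scores, scores.mean())   # per-fold accuracy and the average
```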
Data Science — A field covering machine learning, data cleaning and preparation, and data analysis techniques such as visualisation.
Deep Learning — A class of machine learning algorithms which use artificial neural networks with many layers.
Face Detection — The problem of determining whether an image contains a face, and if so where it is.
Face Recognition — The problem of identifying whose face appears in an image.
Feature Extraction — The process of finding relevant features in a set of data.
Gradient Descent — An optimization method which can find a minimum of a function by following the gradient.
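For instance, minimising f(x) = (x - 3)^2 by repeatedly stepping against the gradient f'(x) = 2(x - 3):

```python
x = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (x - 3)        # derivative of (x - 3)^2
    x -= learning_rate * gradient
print(x)                          # converges towards the minimum at x = 3
```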
Hyper-parameter — A parameter of a machine learning algorithm which is set by the user rather than learned from the data, such as the number of neighbours k in k-nearest neighbours.
k-nearest Neighbors — An algorithm which makes a prediction based on the k-nearest observations.
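A small sketch with scikit-learn, predicting the majority label of the 3 nearest training observations:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]   # one-dimensional observations
y = [0, 0, 0, 1, 1, 1]                  # their labels
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.5], [10.5]]))     # [0 1]: the majority vote of the 3 nearest neighbours
```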
Kaggle — A platform which hosts data science competitions.
Linear Algebra — A field of mathematics concerning linear mappings between vector spaces. Essential to machine learning.
Machine Learning — Algorithms which improve their performance with experience. A computational branch of statistics.
Model Selection — The process of choosing hyper-parameters for a machine learning algorithm.
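One common approach, sketched here with scikit-learn, is a grid search over candidate hyper-parameters scored by cross validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X, y)
print(search.best_params_)   # the hyper-parameter value chosen by cross validation
```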
Natural Language Processing — A field of computer science concerned with the analysis of natural (human) languages.
Numpy — A Python array/matrix library.
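For example:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
print(A @ x)           # matrix-vector product
print(A.mean(axis=0))  # column means
```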
OpenCV — A computer vision library in C++ with bindings for Python.
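A small sketch using the Python bindings on a synthetic image, so no image file is needed:

```python
import cv2
import numpy as np

image = np.zeros((100, 100, 3), dtype=np.uint8)       # a blank colour image
cv2.circle(image, (50, 50), 20, (255, 255, 255), -1)  # draw a filled white circle
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)        # convert to grayscale
edges = cv2.Canny(gray, 50, 150)                      # detect edges
print(edges.shape, int(edges.max()))                  # (100, 100) 255
```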
Optimization — The branch of mathematics concerned with finding the minimum or maximum of a function. Essential to many machine learning algorithms.
Pandas — The Python Data Analysis library.
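For example:

```python
import pandas as pd

df = pd.DataFrame({"height": [1.7, 1.8, 1.6], "weight": [70, 80, 55]})
print(df.describe())            # summary statistics for each column
print(df[df["height"] > 1.65])  # select rows by a condition
```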
Principal Components Analysis — A classic feature extraction algorithm based on projection of the data onto a lower-dimensional subspace.
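A sketch with scikit-learn, projecting random data onto its two main directions of variance:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)        # 100 observations with 5 features
pca = PCA(n_components=2).fit(X)  # find the two principal components
X_reduced = pca.transform(X)      # project the data into that subspace
print(X_reduced.shape)            # (100, 2)
```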
Python — A high-level programming language, popular for machine learning applications.
Regression — A machine learning problem involving the prediction of a real-valued scalar or vector.
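A minimal sketch with scikit-learn, fitting a line to noisy data and predicting a real-valued output:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4]])
y = np.array([2.1, 3.9, 6.2, 8.1])   # roughly y = 2x
model = LinearRegression().fit(X, y)
print(model.predict([[5]]))          # a real-valued prediction, close to 10
```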
Singular Value Decomposition — A well-known matrix factorisation method.
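With Numpy, the decomposition A = U S V^T can be computed and checked directly:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: A is recovered from its factors
```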
Scikit-learn — A library for Machine Learning in Python.
Scipy — A Python library for scientific computing.
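For example, its optimisation module can minimise a function numerically:

```python
from scipy import optimize

result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)
print(result.x)   # close to 3, the minimiser of (x - 3)^2
```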
Statistics — A branch of mathematics concerned with finding useful patterns in data.
Stochastic Gradient Descent — A variant of gradient descent which estimates the gradient from a single example (or a small batch) at each step, making it fast on large datasets; commonly used for training deep learning models.
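A hand-rolled sketch with Numpy, fitting the slope of a line from one randomly chosen example per step:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
y = 2.0 * x + rng.normal(0, 0.1, 1000)   # the true slope is 2

w, lr = 0.0, 0.1
for _ in range(5000):
    i = rng.integers(len(x))              # sample a single example
    grad = 2 * (w * x[i] - y[i]) * x[i]   # gradient of that example's squared error
    w -= lr * grad
print(w)                                  # close to 2
```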
Tensor — A multidimensional array.
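In Numpy, for instance:

```python
import numpy as np

t = np.zeros((2, 3, 4))   # a three-dimensional tensor: a 2 x 3 x 4 array
print(t.ndim, t.shape)    # 3 (2, 3, 4)
```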
Tensorflow — A deep learning library developed by Google.
Test Set — A set of examples/observations used for evaluating the prediction performance of an algorithm.
Theano — A tensor manipulation library for Python which can run code on the GPU.
Training Set — A set of examples/observations used for training a machine learning algorithm.
Validation Set — A set of examples/observations used for tuning the hyper-parameters of an algorithm during training.
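A sketch with scikit-learn showing how the three sets defined above fit together: hold out a test set for the final evaluation, then carve a validation set out of the remaining data for tuning:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))   # roughly a 60% / 20% / 20% split
```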