
Chapter -1 The Machine Learning Dictionary


Here are some definitions for commonly used terms/technologies in machine learning. I’ll try to update and improve this page with new entries over time.

Apache Spark — A library for distributed computing for large-scale data manipulation and machine learning.

Artificial Neural Networks — Machine learning algorithms inspired by biological neural networks.

Back-propagation — An algorithm for training neural networks in which errors are propagated backwards through the network.

Big Data — Data which is difficult to work with on a single machine, typically on the order of terabytes or more. The term can also refer to machine learning and other types of analysis on data of this scale.

Classification — A machine learning problem involving the prediction of which of two or more classes an observation belongs to.

Clustering — The process of grouping observations that are similar according to a particular criterion.

Cython — A Python-like language used to give C-like performance to Python code.

Cross Validation — A method for evaluating the performance of a learning algorithm by repeatedly training on part of the data and testing on the held-out remainder. It is particularly useful for small datasets.
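As a rough illustration, here is a minimal sketch of k-fold cross validation, one common variant. The function name `k_fold_indices` is made up for this example, and it only produces the index splits. Training and scoring a model on each split is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Each observation is used for testing exactly once across the folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled first, and the scores from each fold are averaged to give the final estimate.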

Data Science — A field covering machine learning, data cleaning and preparation, and data analysis techniques such as visualisation.

Deep Learning — A class of machine learning algorithms which use artificial neural networks with many layers.

Face Detection — The problem of determining whether an image contains a face.

Face Recognition — The problem of identifying whose face appears in an image.

Feature Extraction — The process of finding relevant features in a set of data.

Gradient Descent — An optimization method which can find a local minimum of a function by repeatedly stepping in the direction of the negative gradient.
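A minimal sketch of the idea for a function of one variable is shown below. The function name `gradient_descent`, the learning rate and the number of steps are all illustrative choices, not a recommended setup.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Approximate a minimum of a 1-D function, given its gradient `grad`."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step against the gradient
    return x


# Example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

If the learning rate is too large the iterates can overshoot and diverge, and if it is too small convergence becomes slow.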

Hyper-parameter — A parameter of a machine learning algorithm that is set by the user before training, rather than learned from the data.

k-nearest Neighbors — An algorithm which makes a prediction for a new observation based on the k nearest observations in the training set.
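Here is a small sketch of k-nearest neighbors used for classification, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy data are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy clusters, labelled "a" and "b".
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

The value of k is itself a hyper-parameter: small values follow the training data closely, while larger values give smoother predictions.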

Kaggle — A platform which hosts data science competitions.

Linear Algebra — A field of mathematics concerning linear mappings between vector spaces. Essential to machine learning.

Machine Learning — Algorithms which improve their performance with experience. A computational branch of statistics.

Model Selection — The process of choosing a model, or the hyper-parameters of a machine learning algorithm, based on how well each candidate performs.

Natural Language Processing — A field of computer science concerned with the analysis of natural (human) languages.

NumPy — A Python array/matrix library.

OpenCV — A computer vision library in C++ with bindings for Python.

Optimization — The branch of mathematics concerned with finding the minimum or maximum of a function. Essential to many machine learning algorithms.

Pandas — The Python Data Analysis library.

Principal Components Analysis — A classic feature extraction algorithm based on projection onto a lower-dimensional subspace.

Python — A high-level programming language, popular for machine learning applications.

Regression — A machine learning problem involving the prediction of a real-valued scalar or vector.

Singular Value Decomposition — A well-known matrix factorisation method.

Scikit-learn — A library for Machine Learning in Python.

SciPy — A Python library for scientific computing.

Statistics — A branch of mathematics concerned with finding useful patterns in data.

Stochastic Gradient Descent — A variant of gradient descent which estimates the gradient from a randomly chosen example (or small batch of examples) at each step. It is fast and commonly used in deep learning.
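To make the idea concrete, here is a minimal sketch that fits a one-parameter model y = w * x with a squared-error loss, using one random example per step. The function name `sgd_fit_slope`, the toy data and the settings are assumptions made for this illustration.

```python
import random


def sgd_fit_slope(xs, ys, learning_rate=0.01, steps=200, seed=0):
    """Fit y = w * x by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))            # pick one example at random
        error = w * xs[i] - ys[i]
        w -= learning_rate * 2 * xs[i] * error  # gradient of (w * x - y)^2 w.r.t. w
    return w


# Toy data generated from y = 2x, so the fitted slope should approach 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]
w = sgd_fit_slope(xs, ys)
print(w)
```

Each step only looks at a single example, which is why the method scales to datasets too large to process in one pass per update.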

Tensor — A multidimensional array.

TensorFlow — A deep learning library developed by Google.

Test Set — A set of examples/observations used for evaluating the prediction performance of an algorithm.

Theano — A tensor manipulation library for Python which can run code on the GPU.

Training Set — A set of examples/observations used for training a machine learning algorithm.

Validation Set — A set of examples/observations, kept separate from the training and test sets, used for tuning the hyper-parameters of an algorithm whilst training.
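The training, validation and test sets are usually made by splitting one dataset into non-overlapping parts. Below is a minimal sketch of such a split. The function name `split_dataset` and the 60/20/20 proportions are illustrative assumptions rather than a fixed rule.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into train, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The test set should only be used once, at the end, so that it gives an honest estimate of performance on unseen data.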
