Betting on a profession of the future is synonymous with betting on technology. And one of the safest bets for scientists is, without a doubt, Artificial Intelligence. Within it, the branch of Machine Learning offers a multitude of professional opportunities.
Machine Learning is a technological speciality that straddles AI and Data Science: it uses data and algorithms to mimic the way humans learn, gradually improving its own accuracy.
These algorithms can predict almost any type of variable, which is why machine learning can be applied in all kinds of sectors.
These machines, or rather these pieces of software, learn continuously: as they acquire more data, they become more capable, to the point of modelling patterns in human behaviour.
Ensemble algorithms, or ensembles, are a type of machine learning algorithm that improves generalisation by using different combination strategies. In other words: they are the union of several simple algorithms that together form a more complex and powerful one.
It is worth noting that, although there are different types of ensemble algorithms in machine learning, such as majority voting, bagging, boosting and stacking, in this post we want to focus on boosting and bagging. Below, we explain both methods and the differences between them in detail.
Firstly, it should be noted that bagging and boosting are both ensemble methods designed to improve statistical learning. Bagging mainly reduces variance (the variability of predictions with respect to the mean), while boosting mainly reduces bias.
That said, bagging stands for Bootstrap aggregating. The "bootstrap" here is the statistical resampling technique of sampling with replacement, not the set of open-source web development tools of the same name. Bagging combines different models, built from an initial family, in a way that reduces variance and helps avoid over-fitting. In other words, when we use bagging we are employing several machine learning models at once.
This methodology allows predictive errors to compensate for one another: each model is trained on a subset drawn randomly, with replacement, from the global training set.
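To make this concrete, here is a minimal sketch of bagging using scikit-learn's BaggingClassifier. The dataset and parameters are purely illustrative, and the `estimator` argument assumes scikit-learn 1.2 or later (older versions call it `base_estimator`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 50 trees is trained on a bootstrap sample (drawn with
# replacement) of the training set; predictions are combined by vote.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,  # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print(f"Bagging test accuracy: {bagging.score(X_test, y_test):.3f}")
```

Because each tree sees a different random subset, their individual errors tend to cancel out when the votes are aggregated.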
The bagging method is widely used with so-called decision trees. Do you know what they consist of?
First of all, decision trees are prediction models made up of binary rules. In other words: yes or no.
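As a small illustration (the dataset choice is ours, not part of the original explanation), you can fit a shallow tree with scikit-learn and print its yes/no rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree and print its binary (yes/no) decision rules
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree))  # each split is a yes/no test on one feature
```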
Combined with bagging, these decision trees form what are known as Random Forest models, or random forests. Each tree is trained on a slightly different bootstrap sample, and the prediction for a new observation is obtained by passing it through each individual tree and aggregating their outputs.
These random forests are a widely used form of bagging, thanks to their performance and speed.
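A minimal random forest sketch, again with illustrative data and parameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A random forest is bagging over decision trees, plus a random subset
# of features considered at each split, which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")
```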
In contrast to bagging (which is known for its speed), boosting is a general methodology of slow, sequential learning. In this method, a wide variety of models obtained from a weak predictor are combined in order to produce a better one.
Thus, shallowly grown decision trees are used here: small, simple trees that combine well.
Likewise, boosting attempts to fix the prediction errors of the previous models: the trees are built in sequence, each one seeking to improve on the previous classification.
The result is an additive model, in which more weight is given to misclassified samples than to those that are classified correctly.
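AdaBoost is the classic example of such an additive, reweighting scheme. Here is a minimal sketch (illustrative data; the `estimator` argument again assumes scikit-learn 1.2 or later):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Shallow trees ("stumps") are fitted sequentially; after each round,
# misclassified samples get more weight so the next tree focuses on them.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    random_state=1,
)
boosting.fit(X_train, y_train)
print(f"Boosting test accuracy: {boosting.score(X_test, y_test):.3f}")
```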
At the IMMUNE Institute of Technology we offer complementary training programmes so that you will never again have to ask about "bagging and boosting methods: what's the difference". At IMMUNE you can become an expert in machine learning.
To begin with, we have the Degree in Software Development Engineering, as well as our Master in Data Science, with which you can become a data scientist. And if you are short on time, we also offer a Data Analytics Bootcamp and a course on Voice Tech. Welcome to the training of the present and the future!