
Does Boosting really work as the name implies?

As we know, XGBoost is an ensemble learning technique, specifically a boosting one. Let's take a step back and have a look at ensembles.

In layman's terms, an ensemble is nothing but a group, and trust me, that is the whole idea behind ensembles. They combine the decisions from multiple models to improve the overall performance. It is like asking different people for their opinion on something and then forming an overall opinion collectively.

Below is an animation (GIF) that illustrates the idea of an ensemble with a real-life scenario.

Ensemble Learning, Source: machinelearningknowledge.ai

Ensemble learning is considered one of the ways to tackle the bias-variance tradeoff in decision trees.

There are various ways of doing ensemble learning, but two of them are widely used: Bagging and Boosting.

Let's quickly see how Bagging and Boosting work...

BAGGING is an ensemble technique used to reduce the variance of our predictions by combining the results of multiple classifiers modeled on different sub-samples of the same data set.

In a nutshell, BAGGING comes from two words, "Bootstrap" and "Aggregation". Bootstrapping refers to subsetting the data, and aggregation refers to combining the results that we get from the different models.

Bagging Ensemble Learning

Random forest is one of the most famous and widely used bagging models.
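To make this concrete, here is a minimal bagging sketch using scikit-learn on a synthetic dataset (this example is my own illustration, not code from the post): a random forest bags many decision trees, each trained on a bootstrap sample, and additionally subsamples features at every split.

```python
# Minimal bagging sketch: a random forest is bagging of decision trees,
# each tree trained on a bootstrap sample with random feature subsets per split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```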

BOOSTING is a sequential process, where each subsequent model attempts to correct the errors of the previous one. The succeeding models depend on the previous model and hence work sequentially. Boosting fits a sequence of weak learners (models that are only slightly better than random guessing, such as small decision trees) to weighted versions of the data, where more weight is given to examples that were misclassified in earlier rounds/iterations.

Mathematically, it can be expressed as below:
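The formula itself appeared as an image in the original post; for reference, the standard additive form of a boosted ensemble (a generic sketch, not XGBoost-specific notation) is

F_M(x) = \sum_{m=1}^{M} \gamma_m \, h_m(x), \qquad F_m(x) = F_{m-1}(x) + \gamma_m \, h_m(x),

where each h_m is a weak learner (for example, a shallow decision tree) and \gamma_m is the weight given to it.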

In a nutshell, Bagging vs. Boosting looks like this:

Bagging vs Boosting

Many boosting algorithms give an additional boost to the model's accuracy; a few of the popular ones are AdaBoost, Gradient Boosting Machine (GBM), XGBoost, LightGBM, and CatBoost.

Remember, the basic principle behind all boosting algorithms is the same as discussed above; it is just some specialty that makes each of them different from the others. We will now focus on XGBoost and look at its functionality.

Is it truly a winning model?

XGBoost has become a widely used and popular tool among Kaggle competitors and data scientists in industry, as it has been battle-tested in production on large-scale problems.

XGBoost in a nutshell

The amount of flexibility and the features XGBoost offers back up that claim. Its name stands for eXtreme Gradient Boosting. The implementation of XGBoost offers several advanced features for model tuning, computing environments, and algorithm enhancement. It is capable of performing the three main forms of gradient boosting (standard Gradient Boosting, Stochastic Gradient Boosting, and Regularized Gradient Boosting), and it is robust enough to support fine-tuning and the addition of regularization parameters.
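As a quick, purely illustrative sketch (the parameter values below are arbitrary choices of mine, not recommendations from the post), the three forms map onto XGBoost's scikit-learn wrapper roughly like this:

```python
# Illustrative only: how the three gradient boosting flavours surface as
# XGBoost parameters (values are arbitrary examples, not tuned settings).
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=300,       # number of boosting rounds (plain gradient boosting)
    learning_rate=0.1,      # shrinkage applied to each new tree
    subsample=0.8,          # row subsampling per tree -> stochastic gradient boosting
    colsample_bytree=0.8,   # column subsampling per tree
    reg_lambda=1.0,         # L2 penalty on leaf weights -> regularized gradient boosting
    reg_alpha=0.0,          # L1 penalty on leaf weights
)
```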

Salient features of XGBoost include regularization (L1 and L2 penalties on leaf weights), parallelized tree construction, sparsity-aware handling of missing values, tree pruning, a built-in cross-validation routine, and cache-aware, out-of-core computation for large datasets.

Why use XGBoost?

The two major reasons to use XGBoost are its execution speed and its model performance.

But, how does it work?

Let's quickly recap gradient boosting: it is an ensemble method that sequentially adds predictors, with each new one correcting its predecessors. However, instead of assigning different weights to the training examples after every iteration, this method fits the new model to the residuals of the previous prediction and then minimizes the loss when adding the latest predictor. So, in the end, you are updating your model using gradient descent, hence the name gradient boosting. It is supported for both regression and classification problems.
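To make the residual-fitting idea concrete, here is a bare-bones sketch of gradient boosting for squared-error regression (my own illustration, not code from this post; the function names are made up). With squared loss, the negative gradient is exactly the residual, so each new tree is simply fitted to the current residuals.

```python
# Bare-bones gradient boosting for regression with squared error.
# With squared loss, the negative gradient equals the residual (y - prediction),
# so each new tree is fitted to the residuals of the current ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    base = float(np.mean(y))                 # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                 # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)               # fit the new weak learner to the residuals
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict_gradient_boosting(base, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred
```

Real implementations such as XGBoost add shrinkage, subsampling, regularization, and second-order information on top of this basic loop.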

Objective function and Optimization

The objective function (loss function plus regularization) that we need to optimize at iteration t is the following:
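The formula was shown as an image in the original post; written out (following the XGBoost paper), it is

\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),

where l is a differentiable loss function, \hat{y}_i^{(t-1)} is the prediction of the first t-1 trees for example i, f_t is the new tree being added, and \Omega is the regularization term.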

Loss function with Regularization Term

I am attaching my hand-written notes to explain things in a better way:

The regularization term in XGBoost is basically given as:
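(Reproduced here in place of the original image, following the XGBoost paper:)

\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2,

where T is the number of leaves in the tree, w_j is the score (weight) of leaf j, and \gamma and \lambda control how strongly tree complexity and large leaf weights are penalized.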

The mean squared error loss function has a very friendly form, with a linear term (often called the residual term) and a quadratic term. It is not easy to get such a nice form for other notable loss functions (such as the logistic loss). So, in general, we take the second-order Taylor expansion of the loss function:
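Written out (again in place of the original image), the second-order approximation is

\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),

with g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}) and h_i = \partial^{2}_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}) being the first and second derivatives of the loss with respect to the previous prediction.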

After removing the terms that are constant at step t, this becomes our optimization goal for the new tree. An important advantage of this definition is that the value of the objective function depends only on the first and second derivatives of the loss, g_i and h_i. This is how XGBoost supports custom loss functions.
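To show what this enables in practice, here is a hedged sketch of a custom objective using XGBoost's native training API (the synthetic data, the function name squared_error_obj, and the parameter values are my own illustration): a custom objective is just a callable that returns the per-example gradient g_i and hessian h_i for the current predictions.

```python
# Sketch: custom objective for XGBoost's native API (illustrative example).
# A custom objective returns the per-example gradient and hessian, which is
# exactly what the second-order approximation above needs.
import numpy as np
import xgboost as xgb

def squared_error_obj(predt, dtrain):
    y = dtrain.get_label()
    grad = predt - y              # g_i: first derivative of 0.5 * (predt - y)^2
    hess = np.ones_like(predt)    # h_i: second derivative
    return grad, hess

# Small synthetic regression problem, just so the snippet runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=squared_error_obj)
```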

The authors of XGBoost have divided the parameters into four categories: general parameters, booster parameters, learning task parameters, and command line parameters. Here, I have highlighted the majority of the parameters to consider while tuning.
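For illustration (these values are arbitrary starting points I chose, not tuned recommendations from the post), a typical parameter dictionary for the native API touches the first three categories:

```python
# Illustrative parameter dictionary for xgb.train (values are arbitrary examples).
params = {
    # General parameters
    "booster": "gbtree",             # tree-based booster (vs. "gblinear" / "dart")
    # Booster parameters
    "eta": 0.1,                      # learning rate / shrinkage
    "max_depth": 6,                  # maximum depth of each tree
    "min_child_weight": 1,           # minimum sum of hessian required in a leaf
    "subsample": 0.8,                # row subsampling per tree
    "colsample_bytree": 0.8,         # column subsampling per tree
    "lambda": 1.0,                   # L2 regularization on leaf weights
    "alpha": 0.0,                    # L1 regularization on leaf weights
    # Learning task parameters
    "objective": "binary:logistic",  # learning task and loss
    "eval_metric": "logloss",        # metric reported during training
}
```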

After covering all of this, you are probably realizing that XGBoost really is a competition-winning model, right?

That is it for this blog. I will do a practical implementation in Python and share the amazing results of XGBoost in my upcoming blog.

Happy Learning!!
