In another article, we discussed basic concepts around decision trees or CART algorithms and the advantages and limitations of using a decision tree in Regression or Classification problems.
Read more on that here:
For a video introduction on Decision Trees, check out this 8-minute lesson:
In this article, we are going to focus on:
Memoization is a type of caching that stores the result of a deterministic function. More specifically, memoization is an optimization technique used to accelerate programs by storing the results of function calls and returning the cached result when redundant inputs arise.
In other words, memoization prevents a program from running the same calculation twice.
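As a minimal sketch of the idea, assuming Python's functools.lru_cache as the cache: the function slow_square and the counter call_count are hypothetical names used only for illustration. The second call with the same input is answered from the cache, so the function body runs only once.

```python
import functools

call_count = 0  # hypothetical counter: how many times the body actually runs


@functools.lru_cache(maxsize=None)
def slow_square(n):
    """Stand-in for an expensive deterministic computation."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed, then stored in the cache
second = slow_square(12)  # same input: served from the cache
print(first, second, call_count)  # 144 144 1
```

Caching like this is only safe when the function is deterministic, which is why the definition above insists on it.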
Let’s see this behavior with an artificially slow Python function.
When we run slow_func.py, we get the following output:
A decision tree is a supervised machine learning algorithm that can be used for regression and classification problems. A decision tree follows a set of nested if-else conditions to make predictions.
Since decision trees can be used for both classification and regression, the algorithm used to grow them is often called CART (Classification and Regression Trees). There is no single decision tree algorithm: several have been proposed, but we will focus on the CART algorithm used in scikit-learn.
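To make the nested if-else picture concrete, here is a hand-written sketch of what a small fitted classification tree amounts to. The feature names (petal_length, petal_width), the thresholds, and the class labels are made up for illustration and do not come from a trained model.

```python
def predict_species(petal_length, petal_width):
    """Hand-written stand-in for a depth-2 classification tree."""
    if petal_length <= 2.5:      # root node test
        return "setosa"          # leaf
    else:
        if petal_width <= 1.7:   # internal node test
            return "versicolor"  # leaf
        else:
            return "virginica"   # leaf


print(predict_species(1.4, 0.2))  # setosa
print(predict_species(4.5, 1.4))  # versicolor
print(predict_species(6.0, 2.2))  # virginica
```

A tree-growing algorithm such as CART chooses the tests and thresholds from the training data instead of having them written by hand.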
Decision trees are binary trees where each node represents a…
It is a common mistake to diagnose a model as "overfitting the data" simply by comparing a metric in training versus the same metric in testing. Some models are just designed in a way such that they tend to have high train accuracy.
You make the point for Random Forest very clearly in your graphs.
This is one case where understanding how a model works would lead to one fewer rabbit hole. Depending on your specific implementation, a decision tree evaluates all possible splits or a subset of them. However, for any given split, the feature's scale doesn't make any difference in computing the gain in Gini (or entropy or misclassification rate). So the encoders should produce equivalent results.
Of course, since random forests are random, you get a little fluctuation.
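The scale argument can be checked on a toy example. The sketch below is a brute-force split search over a single feature, not scikit-learn's implementation, and the data and function names are invented for illustration. Rescaling the feature keeps the ordering of the rows, so the best split separates the same rows and has the same weighted Gini impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Try every threshold on one feature; return (weighted Gini, left-side row indices)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if best is None or score < best[0]:
            best = (score, tuple(left))
    return best


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # toy feature values
ys = ["a", "a", "a", "b", "b", "a"]       # toy class labels
rescaled = [100.0 * x + 7.0 for x in xs]  # same ordering, different scale

print(best_split(xs, ys) == best_split(rescaled, ys))  # True
```

Only the threshold value itself changes with the scale; the partition of the rows, and therefore the impurity, does not.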
This article will discuss a data science competition we did with one of our classes. We will discuss the five best-scoring models and their complexity.
The challenge is to create a machine learning model that predicts fish weight. The student whose model has the lowest mean-squared error (MSE) will be declared the winner!
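For reference, MSE is the average of the squared differences between the actual and the predicted values. A minimal sketch, with made-up weights that are not taken from the competition data:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


actual_weights = [200.0, 340.0, 120.0]     # made-up fish weights
predicted_weights = [210.0, 330.0, 120.0]  # made-up model predictions
print(mean_squared_error(actual_weights, predicted_weights))
```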
Hello! Welcome to the famous Tsukiji fish market of Tokyo, Japan! We came here to collect data on the fish they have, but we didn’t wake up at 5 am for the tuna auction. By the time we showed up, there…
We begin with some background on functional programming concepts and a discussion of timing and tracing.
Next, we illustrate the decorator pattern and its syntax with two examples,
timefunc. To do this, we use the Python libraries
Then we move to a deeper discussion of functools.wraps and how it preserves the metadata of a decorated function. Lastly, we show this preservation with some examples.
Tracing is recording the inputs and outputs of functions as the program runs. Experienced programmers use tracing to troubleshoot programs, often as a substitute for…
We’ve all heard of arguments and keyword arguments (args and kwargs) when discussing Python functions. Positional arguments are matched to parameters by their order, while keyword arguments, as the name suggests, are matched by name. When writing functions, **kwargs are often passed directly into a function definition.
This function can handle any number of args and kwargs because of the asterisk(s) used in the function definition. These asterisks are packing and unpacking operators.
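A short sketch of both roles of the asterisks, using the hypothetical functions describe_call and add (neither is from the original listing): in a definition the operators pack extra arguments into a tuple and a dict, and in a call they unpack a sequence or a dict back into arguments.

```python
def describe_call(*args, **kwargs):
    """Packing: extra positional arguments land in a tuple, keyword arguments in a dict."""
    return args, kwargs


def add(a, b, c=0):
    return a + b + c


packed_args, packed_kwargs = describe_call(1, 2, unit="kg")
print(packed_args)    # (1, 2)
print(packed_kwargs)  # {'unit': 'kg'}

# Unpacking: the same operators spread a sequence or a dict back into arguments.
numbers = (1, 2)
options = {"c": 10}
print(add(*numbers, **options))  # 13
```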
When this file is run, the following output is generated.
Our goal is to have a codebase of pure functions that we can decorate with a tracer. By applying this decorator to pure functions, we can debug code without using a cumbersome debugger. This reduces developer pain as debuggers are often tedious and difficult to work with.
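One way such a tracer could look is sketched below. The decorator name trace and the example function add are assumptions made for illustration. Because add is pure, the printed inputs and output fully describe the call.

```python
import functools


def trace(func):
    """Print the inputs and the output of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"{func.__name__}(args={args}, kwargs={kwargs}) -> {result!r}")
        return result
    return wrapper


@trace
def add(a, b=0):
    """A pure function: the output depends only on the inputs."""
    return a + b


total = add(2, b=3)  # prints: add(args=(2,), kwargs={'b': 3}) -> 5
```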
Strictly speaking, in a functional programming paradigm, all functions are pure functions. What are they, how can we code them and why are they useful?
Before we get into that, let's review mathematical functions.
Mathematical functions, such as cos(x), return a single value. Give it an
Our goal is to create a reusable way to trace functions in Python. We do this by coding a decorator with Python's functools library. This decorator will then be applied to functions whose runtime we are interested in.
The code below represents a common decorator pattern that has a reusable and flexible structure. Notice the placement of functools.wraps. It is a decorator for our closure. This decorator preserves func’s metadata as it is passed to the closure.
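A minimal sketch of a timing decorator that follows this pattern is shown here. The body of timefunc and the example function slow_sum are assumptions written for illustration; only the use of functools.wraps on the closure is taken from the description above.

```python
import functools
import time


def timefunc(func):
    """Report how long each call to func takes."""
    @functools.wraps(func)  # copies func's name, docstring, etc. onto the closure
    def closure(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result
    return closure


@timefunc
def slow_sum(n):
    """Sum the integers below n."""
    return sum(range(n))


answer = slow_sum(1000)
print(slow_sum.__name__)  # slow_sum
print(slow_sum.__doc__)   # Sum the integers below n.
```

Without the functools.wraps line, slow_sum.__name__ would be "closure" and the docstring would be None, because the name slow_sum would then refer to the undecorated closure's own metadata.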
If we did not use functools.wraps to decorate our closure on line 7, the wrong name…
Data Scientist, Software Developer and Educator