Apart from data standardization, we can also scale data by compressing it into a fixed range. One of the biggest use cases for this is compressing data into the range [0, 1]. This allows us to view the data in terms of proportions, or percentages, based on the minimum and maximum values in the data.
The formula for scaling based on a range is a two-step process. For a given data value, x, we first compute the proportion of the value with respect to the minimum and maximum of the data, d_\min and d_\max, respectively:
x_\text{prop} = \frac{x - d_\min}{d_\max - d_\min}
The formula above computes the proportion of the data value, x_\text{prop}.
Warning
Note that this only works if the data values are not all the same (i.e., d_\max \ne d_\min); otherwise, computing the proportion would require dividing by zero.
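As a minimal sketch of this first step, the proportion can be computed in plain Python. The function name `proportion` and the sample values are hypothetical, chosen only for illustration; the function raises an error in the degenerate case described in the warning.

```python
def proportion(x, d_min, d_max):
    """Return where x falls between d_min and d_max, as a fraction."""
    if d_max == d_min:
        # Degenerate case from the warning: the denominator would be zero.
        raise ValueError("all data values are equal; proportion is undefined")
    return (x - d_min) / (d_max - d_min)


# Hypothetical sample data, for illustration only.
data = [2.0, 4.0, 10.0]
d_min, d_max = min(data), max(data)
props = [proportion(x, d_min, d_max) for x in data]
print(props)  # [0.0, 0.25, 1.0]
```

The smallest value maps to 0, the largest to 1, and everything else lands in between according to its relative position.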
We then use the proportion of the value to scale to the specified range, [r_\min, r_\max]:
x_\text{scale} = x_\text{prop}\cdot (r_\max - r_\min) + r_\min
from sklearn.preprocessing import MinMaxScaler
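# the default range is [0, 1]; here we pass a custom range of [-2, 3]
custom_scaler = MinMaxScaler(feature_range=(-2, 3))
# learn the per-column min and max from data
custom_scaler.fit(data)
# apply the fitted scaling to new_data
transformed = custom_scaler.transform(new_data)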
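To connect the two formulas to the scaler, here is a rough NumPy sketch of the same two-step computation, applied column by column. The helper name `range_scale` and the sample arrays are assumptions made for illustration and are not part of scikit-learn. The minimum and maximum come from `data`, and the resulting mapping is then applied to `new_data`, mirroring the fit-then-transform pattern.

```python
import numpy as np


def range_scale(data, new_data, r_min=0.0, r_max=1.0):
    """Scale new_data into [r_min, r_max] using the column-wise min/max of data."""
    data = np.asarray(data, dtype=float)
    new_data = np.asarray(new_data, dtype=float)
    d_min = data.min(axis=0)
    d_max = data.max(axis=0)
    if np.any(d_max == d_min):
        # Same restriction as the warning: a constant column has no range.
        raise ValueError("a column has identical values; proportion is undefined")
    x_prop = (new_data - d_min) / (d_max - d_min)  # step 1: proportion
    return x_prop * (r_max - r_min) + r_min        # step 2: map to the range


# Hypothetical sample arrays (rows are observations, columns are features).
data = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 50.0]])
new_data = np.array([[2.0, 30.0], [5.0, 10.0]])
scaled = range_scale(data, new_data, r_min=-2.0, r_max=3.0)
print(scaled)
```

Because the minimum and maximum are taken from `data` rather than `new_data`, any value in `new_data` that falls outside the original minimum and maximum will land outside [r_\min, r_\max] under these formulas.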