LabelEncoder
The LabelEncoder class encodes categorical string labels as unique integer values, allowing categorical data to be used in numerical models. It also provides functionality to reverse the encoding and retrieve the original string labels.
Overview
Label encoding is a preprocessing step that converts categorical labels (strings) into numerical values, which is essential for many machine learning algorithms. Each unique label is assigned a unique integer, making it easy to handle categorical variables.
Parameters
LabelEncoder/fit(y: List[str])
- y: Target labels, a list of categorical values in string format.
Returns: A fitted LabelEncoder model, which stores mappings between classes and their integer encodings.
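For illustration, a minimal fitting sketch, assuming the bolt module is already imported and using placeholder color labels:
# Each distinct label receives its own integer code
colors = ["red", "green", "blue", "red"]
color_model = LabelEncoder/fit(colors)
# color_model now carries the classes / inverse_classes mappings described under Model Attributes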
LabelEncoder/transform(model: LabelEncoder, y: List[str])
- model: A fitted LabelEncoder instance.
- y: List of categorical values (strings) to encode, which should only include labels known to the model.
Returns: Encoded labels as a list of integers (u24) corresponding to each categorical label in y.
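As a hedged sketch of encoding with a fitted model (imports omitted; the fruit labels are placeholder data):
# Every label passed to transform was seen during fitting
model = LabelEncoder/fit(["apple", "banana", "cherry"])
encoded = LabelEncoder/transform(model, ["cherry", "apple"])
# encoded is a list of u24 codes, one per input label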
LabelEncoder/inverse_transform(model: LabelEncoder, y: List[u24])
- model: A fitted LabelEncoder instance.
- y: List of integer-encoded labels to decode back into the original string labels.
Returns: Decoded labels as a list of strings corresponding to the original labels.
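And a matching decoding sketch, again using only the calls documented on this page:
model = LabelEncoder/fit(["apple", "banana", "cherry"])
codes = LabelEncoder/transform(model, ["banana", "cherry"])
labels = LabelEncoder/inverse_transform(model, codes)
# labels holds the original strings: ["banana", "cherry"]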
Model Attributes
After fitting the model using LabelEncoder/fit, the following attributes become available:
- classes: A dictionary mapping each unique string label to its corresponding integer encoding (u24). Useful for understanding which integer values correspond to which labels.
- inverse_classes: A dictionary mapping each integer encoding (u24) back to its original string label. Useful for reversing transformations.
Accessing Model Attributes
After the LabelEncoder model is fitted, you can inspect the mappings to understand how the labels are encoded:
model = LabelEncoder/fit(y)
# Accessing the encoded class mappings on the fitted model
open LabelEncoder: model
with IO:
  IO/print("Classes:", model.classes)
  IO/print("Inverse Classes:", model.inverse_classes)
Examples
Basic Usage
# Sample categorical labels
y = ["apple", "banana", "cherry", "apple", "banana"]
# Fit the LabelEncoder model to the data
model = LabelEncoder/fit(y)
# Transform the labels to their integer encoding
encoded_y = LabelEncoder/transform(model, y)
# Inverse transform the encoded labels to retrieve the original labels
decoded_y = LabelEncoder/inverse_transform(model, encoded_y)
with IO:
  IO/print("Encoded Labels:", encoded_y)
  # Output: Encoded Labels: [0, 1, 2, 0, 1]
  IO/print("Decoded Labels:", decoded_y)
  # Output: Decoded Labels: ["apple", "banana", "cherry", "apple", "banana"]
Explanation of Encoding and Decoding
- Encoding: Converts categorical labels to unique integers, enabling algorithms that require numerical input.
- Decoding: Reverses the encoding, returning integers back to their original categorical labels. This is helpful for interpreting model outputs when the encoded labels are used in predictions.
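Because decoding undoes encoding for labels the model has already seen, chaining the two reproduces the input. A compact round-trip sketch, using only the functions documented above:
y = ["spam", "ham", "spam"]
model = LabelEncoder/fit(y)
round_trip = LabelEncoder/inverse_transform(model, LabelEncoder/transform(model, y))
# round_trip reproduces y: ["spam", "ham", "spam"]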
Common Issues and Error Handling
- Unknown Labels in Transform: Attempting to transform a label that was not present during fitting results in an error. Make sure every label in y passed to transform is already known to the model (see the sketch below).
- Incompatible Data in Inverse Transform: Ensure that the integer values passed to inverse_transform were previously produced by transform. Passing out-of-range integers may lead to undefined results.
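A rough guard against both issues is to encode only labels that were present at fit time and to decode only values produced by transform. The sketch below is illustrative, with placeholder fruit labels, and relies only on the documented calls:
model = LabelEncoder/fit(["apple", "banana"])
# Safe: both labels below were present when the model was fitted
encoded = LabelEncoder/transform(model, ["banana", "apple"])
# Unsafe: "cherry" was never fitted, so transforming it would raise an error
# LabelEncoder/transform(model, ["cherry"])
# To check which labels a model knows, inspect model.classes as shown under
# "Accessing Model Attributes" above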