LabelEncoder
The LabelEncoder class encodes categorical string labels as unique integer values, allowing categorical data to be used in numerical models. It also provides functionality to reverse the encoding and retrieve the original string labels.
Overview
Label encoding is a preprocessing step that converts categorical labels (strings) into numerical values, which is essential for many machine learning algorithms. Each unique label is assigned a unique integer, making it easy to handle categorical variables.
Parameters
LabelEncoder/fit(y: List[str])
- y: Target labels, a list of categorical values in string format.
Returns: A fitted LabelEncoder model, which stores mappings between classes and their integer encodings.
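For illustration, a minimal fitting sketch, assuming the bolt module is already imported and using placeholder color labels:
# Each distinct label receives its own integer code
colors = ["red", "green", "blue", "red"]
color_model = LabelEncoder/fit(colors)
# color_model now carries the classes / inverse_classes mappings described under Model Attributes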
LabelEncoder/transform(model: LabelEncoder, y: List[str])
- model: A fitted LabelEncoder instance.
- y: List of categorical values (strings) to encode, which should only include labels known to the model.
Returns: Encoded labels as a list of integers (u24) corresponding to each categorical label in y.
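As a hedged sketch of encoding with a fitted model (imports omitted; the fruit labels are placeholder data):
# Every label passed to transform was seen during fitting
model = LabelEncoder/fit(["apple", "banana", "cherry"])
encoded = LabelEncoder/transform(model, ["cherry", "apple"])
# encoded is a list of u24 codes, one per input label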
LabelEncoder/inverse_transform(model: LabelEncoder, y: List[u24])
- model: A fitted LabelEncoder instance.
- y: List of integer-encoded labels to decode back into the original string labels.
Returns: Decoded labels as a list of strings corresponding to the original labels.
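And a matching decoding sketch, again using only the calls documented on this page:
model = LabelEncoder/fit(["apple", "banana", "cherry"])
codes = LabelEncoder/transform(model, ["banana", "cherry"])
labels = LabelEncoder/inverse_transform(model, codes)
# labels holds the original strings: ["banana", "cherry"]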
Model Attributes
After fitting the model using LabelEncoder/fit, the following attributes become available:
- classes: A dictionary mapping each unique string label to its corresponding integer encoding (u24). Useful for understanding which integer values correspond to which labels.
- inverse_classes: A dictionary mapping each integer encoding (u24) back to its original string label. Useful for reversing transformations.
Accessing Model Attributes
After the LabelEncoder model is fitted, you can inspect the mappings to understand how the labels are encoded:
model = LabelEncoder/fit(y)
# Accessing the encoded class mappings on the fitted model
open LabelEncoder: model
with IO:
  IO/print("Classes:", model.classes)
  IO/print("Inverse Classes:", model.inverse_classes)
Examples
Basic Usage
# Sample categorical labels
y = ["apple", "banana", "cherry", "apple", "banana"]
# Fit the LabelEncoder model to the data
model = LabelEncoder/fit(y)
# Transform the labels to their integer encoding
encoded_y = LabelEncoder/transform(model, y)
# Inverse transform the encoded labels to retrieve the original labels
decoded_y = LabelEncoder/inverse_transform(model, encoded_y)
with IO:
  IO/print("Encoded Labels:", encoded_y)
  # Output: Encoded Labels: [0, 1, 2, 0, 1]
  IO/print("Decoded Labels:", decoded_y)
  # Output: Decoded Labels: ["apple", "banana", "cherry", "apple", "banana"]
Explanation of Encoding and Decoding
- Encoding: Converts categorical labels to unique integers, enabling algorithms that require numerical input.
- Decoding: Reverses the encoding, returning integers back to their original categorical labels. This is helpful for interpreting model outputs when the encoded labels are used in predictions.
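Because decoding undoes encoding for labels the model has already seen, chaining the two reproduces the input. A compact round-trip sketch, using only the functions documented above:
y = ["spam", "ham", "spam"]
model = LabelEncoder/fit(y)
round_trip = LabelEncoder/inverse_transform(model, LabelEncoder/transform(model, y))
# round_trip reproduces y: ["spam", "ham", "spam"]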
Common Issues and Error Handling
- Unknown Labels in Transform: Attempting to transform a label that was not present during fitting results in an error. Make sure every label in y passed to transform is already known to the model (see the sketch below).
- Incompatible Data in Inverse Transform: Ensure that the integer values passed to inverse_transform were previously produced by transform. Passing out-of-range integers may lead to undefined results.
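A rough guard against both issues is to encode only labels that were present at fit time and to decode only values produced by transform. The sketch below is illustrative, with placeholder fruit labels, and relies only on the documented calls:
model = LabelEncoder/fit(["apple", "banana"])
# Safe: both labels below were present when the model was fitted
encoded = LabelEncoder/transform(model, ["banana", "apple"])
# Unsafe: "cherry" was never fitted, so transforming it would raise an error
# LabelEncoder/transform(model, ["cherry"])
# To check which labels a model knows, inspect model.classes as shown under
# "Accessing Model Attributes" above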