Safe Random Forest Notebook#

A Quick Start Guide to implementing Safer Random Forests#

Let’s start by making some data with one disclosive case#

  • We’ll do this by adding an example to the iris data and giving it a new class to make things really obvious.

  • The same risks exist for more complex data sets, but everyone knows iris.
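
To make the risk concrete, here is a minimal sketch of how a class represented by a single record stands out. It uses only the standard library and a toy label list shaped like the data built in the next cell, not the real arrays. The helper name `rare_classes` and its threshold of 5 are illustrative assumptions and are not part of sacroml.

```python
from collections import Counter


def rare_classes(labels, threshold=5):
    """Return {label: count} for classes with fewer than `threshold` records."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < threshold}


# Toy labels: 50 records for each of the three iris classes,
# plus one extra record carrying a brand-new class (4).
labels = [0] * 50 + [1] * 50 + [2] * 50 + [4]

flagged = rare_classes(labels)
print(flagged)
```

Any model that learns to predict the flagged class perfectly has, in effect, memorised one individual record.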

[1]:
import os
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target


# print the min and max values in each feature to help hand-craft the disclosive point
for feature in range(4):
    print(f"feature {feature} min {np.min(X[:,feature])}, max {np.max(X[:,feature])}")

# now add a single disclosive point with features [7, 2, 4.5, 1] and label 4
X = np.vstack([X, (7, 2.0, 4.5, 1)])
y = np.append(y, 4)
feature 0 min 4.3, max 7.9
feature 1 min 2.0, max 4.4
feature 2 min 1.0, max 6.9
feature 3 min 0.1, max 2.5
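
One quick use of the printed ranges is to check whether the hand-crafted point falls inside them on every feature. The sketch below hard-codes the four ranges from the output above instead of recomputing them. `within_ranges` is a hypothetical helper written for this illustration.

```python
# Per-feature (lowest, highest) values as printed by the cell above.
ranges = [(4.3, 7.9), (2.0, 4.4), (1.0, 6.9), (0.1, 2.5)]

# The hand-crafted disclosive point added to the iris data.
point = (7, 2.0, 4.5, 1)


def within_ranges(point, ranges):
    """Return one boolean per feature: True if the value lies in [low, high]."""
    return [low <= value <= high for value, (low, high) in zip(point, ranges)]


inside = within_ranges(point, ranges)
print(inside)
```

If every entry is `True`, the point is not an outlier on any single feature, so it is only its class label that makes it unique.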

Some basic libraries for visualization#

[2]:
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

Defining a new class SafeRandomForestClassifier#

  • Don’t forget to import the SafeModel classes.

[3]:
from sacroml.safemodel.classifiers import SafeRandomForestClassifier
[4]:
safeRFModel = SafeRandomForestClassifier(n_estimators=100)  # optionally pass criterion="entropy"

safeRFModel.fit(X, y)

print(f"Training set accuracy in this safe case is {safeRFModel.score(X,y)}")
fig, ax = plt.subplots(10, 10, figsize=(15, 15))
for row in range(10):
    for column in range(10):
        whichTree = 10 * row + column
        treeRowCol = safeRFModel.estimators_[whichTree]
        _ = plot_tree(treeRowCol, filled=True, ax=ax[row][column], fontsize=1)
Preliminary checks: WARNING: model parameters may present a disclosure risk:
- parameter min_samples_leaf = 1 identified as less than the recommended min value of 5.
Training set accuracy in this safe case is 1.0
[Figure: the 100 decision trees of the fitted forest, plotted in a 10 × 10 grid]
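
The warning above compares a model parameter against a recommended minimum. As a rough illustration of that kind of rule check, here is a minimal standard-library sketch. The rule table `RECOMMENDED_MIN` and the function `check_params` are hypothetical; the only detail taken from the output above is the threshold of 5 for `min_samples_leaf`. This is not how sacroml implements its checks.

```python
# Hypothetical rule table: parameter name -> recommended minimum value.
RECOMMENDED_MIN = {"min_samples_leaf": 5}


def check_params(params, rules=RECOMMENDED_MIN):
    """Return a list of warning strings for parameters below their minimum."""
    warnings = []
    for name, minimum in rules.items():
        value = params.get(name)
        if value is not None and value < minimum:
            warnings.append(
                f"parameter {name} = {value} identified as less than "
                f"the recommended min value of {minimum}."
            )
    return warnings


# Parameters like those used in the cell above (scikit-learn's default leaf size).
default_like = {"n_estimators": 100, "min_samples_leaf": 1}
# The same forest with a larger minimum leaf size.
safer = {"n_estimators": 100, "min_samples_leaf": 5}

print(check_params(default_like))
print(check_params(safer))
```

A leaf size of 1 lets a tree isolate a single training record in its own leaf, which is how the training accuracy reaches 1.0 even with the hand-crafted point included.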

Using the save and reporting functionality#

[5]:
safeRFModel.save(name="testSaveRF.pkl")
safeRFModel.preliminary_check()
safeRFModel.request_release(path="testSaveRF", ext="pkl")
Preliminary checks: WARNING: model parameters may present a disclosure risk:
- parameter min_samples_leaf = 1 identified as less than the recommended min value of 5.

The checkfile reports any warnings and recommendations in YAML format#

[6]:
target_yaml = os.path.normpath("testSaveRF/target.yaml")
with open(target_yaml) as f:
    print(f.read())
dataset_name: ''
dataset_module_path: ''
features: {}
generalisation_error: .nan
safemodel:
- researcher: unknown
  model_type: RandomForestClassifier
  details: 'WARNING: model parameters may present a disclosure risk:

    - parameter min_samples_leaf = 1 identified as less than the recommended min value
    of 5.'
  k_anonymity: '1'
  recommendation: Do not allow release
  reason: 'WARNING: model parameters may present a disclosure risk:

    - parameter min_samples_leaf = 1 identified as less than the recommended min value
    of 5.'
  timestamp: '2025-12-02 21:12:52'
model_type: SklearnModel
model_name: SafeRandomForestClassifier
model_params:
  n_estimators: 100
  bootstrap: true
  oob_score: false
  n_jobs: null
  random_state: null
  verbose: 0
  warm_start: false
  class_weight: null
  max_samples: null
  criterion: gini
  max_depth: null
  min_samples_split: 2
  min_samples_leaf: 1
  min_weight_fraction_leaf: 0.0
  max_features: sqrt
  max_leaf_nodes: null
  min_impurity_decrease: 0.0
  ccp_alpha: 0.0
model_path: model.pkl
X_train_path: ''
y_train_path: ''
X_test_path: ''
y_test_path: ''
X_train_orig_path: ''
y_train_orig_path: ''
X_test_orig_path: ''
y_test_orig_path: ''
proba_train_path: ''
proba_test_path: ''
indices_train_path: ''
indices_test_path: ''
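
For a quick look at the outcome without reading the whole report, the key fields can be pulled out programmatically. The sketch below scans an excerpt copied from the output above using only the standard library. `find_field` is a hypothetical helper that handles only simple single-line `key: value` entries; a real YAML parser would be the more robust choice for anything beyond a spot check.

```python
# Excerpt copied from the report printed above.
report_text = """\
safemodel:
- researcher: unknown
  model_type: RandomForestClassifier
  k_anonymity: '1'
  recommendation: Do not allow release
"""


def find_field(text, key):
    """Return the value of the first single-line `key: value` entry, or None."""
    prefix = f"{key}:"
    for line in text.splitlines():
        # Drop indentation and any leading list marker ("- ").
        stripped = line.strip().lstrip("- ").strip()
        if stripped.startswith(prefix):
            return stripped[len(prefix):].strip().strip("'")
    return None


recommendation = find_field(report_text, "recommendation")
k_anonymity = find_field(report_text, "k_anonymity")
print(recommendation, k_anonymity)
```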

Putting it all together#

  • Don’t forget to import the SafeModel classes.

[7]:
from sacroml.safemodel.classifiers import SafeRandomForestClassifier

safeRFModel = SafeRandomForestClassifier(n_estimators=100)  # optionally pass criterion="entropy"
safeRFModel.fit(X, y)
safeRFModel.save(name="testSaveRF.pkl")
safeRFModel.preliminary_check()
safeRFModel.request_release(path="testSaveRF", ext="pkl")
Preliminary checks: WARNING: model parameters may present a disclosure risk:
- parameter min_samples_leaf = 1 identified as less than the recommended min value of 5.
Preliminary checks: WARNING: model parameters may present a disclosure risk:
- parameter min_samples_leaf = 1 identified as less than the recommended min value of 5.

Examine the checkfile contents#

[8]:
target_yaml = os.path.normpath("testSaveRF/target.yaml")
with open(target_yaml) as f:
    print(f.read())
dataset_name: ''
dataset_module_path: ''
features: {}
generalisation_error: .nan
safemodel:
- researcher: unknown
  model_type: RandomForestClassifier
  details: 'WARNING: model parameters may present a disclosure risk:

    - parameter min_samples_leaf = 1 identified as less than the recommended min value
    of 5.'
  k_anonymity: '1'
  recommendation: Do not allow release
  reason: 'WARNING: model parameters may present a disclosure risk:

    - parameter min_samples_leaf = 1 identified as less than the recommended min value
    of 5.'
  timestamp: '2025-12-02 21:12:52'
model_type: SklearnModel
model_name: SafeRandomForestClassifier
model_params:
  n_estimators: 100
  bootstrap: true
  oob_score: false
  n_jobs: null
  random_state: null
  verbose: 0
  warm_start: false
  class_weight: null
  max_samples: null
  criterion: gini
  max_depth: null
  min_samples_split: 2
  min_samples_leaf: 1
  min_weight_fraction_leaf: 0.0
  max_features: sqrt
  max_leaf_nodes: null
  min_impurity_decrease: 0.0
  ccp_alpha: 0.0
model_path: model.pkl
X_train_path: ''
y_train_path: ''
X_test_path: ''
y_test_path: ''
X_train_orig_path: ''
y_train_orig_path: ''
X_test_orig_path: ''
y_test_orig_path: ''
proba_train_path: ''
proba_test_path: ''
indices_train_path: ''
indices_test_path: ''
