drifter_ml.classification_tests package

Submodules

drifter_ml.classification_tests.classification_tests module

class drifter_ml.classification_tests.classification_tests.ClassificationTests(clf, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

The general goal of this class is to test classification algorithms. The tests in this class move from simple to sophisticated:

  • cross_val_average : the average of all folds must be above some number
  • cross_val_lower_boundary : each fold must be above the lower boundary
  • lower_boundary_per_class : each class must be above a given lower boundary; the lower boundary can differ per class
  • cross_val_anomaly_detection : the score for each fold must have a deviance from the average below a set tolerance
  • cross_val_per_class_anomaly_detection : the score for each class for each fold must have a deviance from the average below a set tolerance

As you can see, at each level of sophistication we need more data to get representative sets. But if more data is available, then we are able to test more cases. The more data we have to test against, the more confident we can be about how well our model does.

Another lens through which to view these classes of tests is stringency. If we need our model to work absolutely all the time, it might be important to use the most sophisticated class of test - something with cross validation, per class. It’s worth noting that increased stringency isn’t always a good thing. Statistical models, by definition, aren’t supposed to cover every case perfectly; they are supposed to be flexible. So you should only use the most stringent checks if you truly have a ton of data. Otherwise, you will more or less ‘overfit’ your test suite in the search for errors. Testing in machine learning, like in software engineering, is very much an art: you need to cover enough cases without going overboard.
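For example, a minimal sketch of wiring up the test suite might look like the following (the DataFrame layout, the feature and target names, and the DecisionTreeClassifier are illustrative assumptions, not requirements of the library):

    import pandas as pd
    from sklearn import tree
    from drifter_ml.classification_tests import ClassificationTests

    # illustrative data: two feature columns and a binary target
    df = pd.DataFrame({
        "feature_one": range(100),
        "feature_two": range(100, 200),
        "target": [0, 1] * 50,
    })

    clf = tree.DecisionTreeClassifier()
    clf.fit(df[["feature_one", "feature_two"]], df["target"])

    test_suite = ClassificationTests(clf, df, "target", ["feature_one", "feature_two"])

    # the simplest style of check: every cross validation fold must clear a fixed floor
    print(test_suite.cross_val_precision_lower_boundary(lower_boundary=0.5, cv=3))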

classifier_testing_per_class(precision_lower_boundary: dict, recall_lower_boundary: dict, f1_lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks:

  • the precision score per class,
  • the recall score per class,
  • the f1 score per class

Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • precision_lower_boundary (dict) – the lower boundary for each class’ precision score
  • recall_lower_boundary (dict) – the lower boundary for each class’ recall score
  • f1_lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • True if every class’s precision, recall, and f1 scores are greater than their lower boundaries
  • False if any class’s score is less than its lower boundary
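Continuing the sketch above, a hedged example of per-class boundaries (assuming the dictionary keys are the class labels, here 0 and 1; the values are arbitrary):

    precision_floors = {0: 0.7, 1: 0.6}
    recall_floors = {0: 0.7, 1: 0.6}
    f1_floors = {0: 0.7, 1: 0.6}

    # True only if every class clears every boundary
    print(test_suite.classifier_testing_per_class(precision_floors, recall_floors, f1_floors))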

cross_val_classifier_testing(precision_lower_boundary: float, recall_lower_boundary: float, f1_lower_boundary: float, cv=3, average='binary')

Runs the cross validated lower boundary methods for:

  • precision,
  • recall,
  • f1 score

The basic idea for these three methods is to check whether each metric stays above a given lower bound. We can set the same lower boundary for precision, recall, and f1, or specify each separately depending on the necessary criteria.

Parameters:
  • precision_lower_boundary (float) – the lower boundary for a given precision score
  • recall_lower_boundary (float) – the lower boundary for a given recall score
  • f1_lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • True if the precision, recall, and f1 tests all pass
  • False otherwise
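Continuing the sketch above, the combined cross validated check might be invoked like this (the boundary values are illustrative):

    print(test_suite.cross_val_classifier_testing(
        precision_lower_boundary=0.6,
        recall_lower_boundary=0.6,
        f1_lower_boundary=0.6,
        cv=3,
    ))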

cross_val_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) f1 score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average f1 score is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance
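Continuing the sketch above, an illustrative call (the tolerance value is arbitrary):

    # every fold's f1 score must stay within 0.1 of the average fold score
    print(test_suite.cross_val_f1_anomaly_detection(tolerance=0.1, cv=3, method="mean"))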

cross_val_f1_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) f1 scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average f1 score must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if the average of the f1 scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance
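Continuing the sketch above, an illustrative call (the 0.7 floor is arbitrary):

    # the average f1 score across folds must exceed 0.7
    print(test_suite.cross_val_f1_avg(minimum_center_tolerance=0.7, cv=3))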

cross_val_f1_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy. It generates the k fold (cross validation) f1 scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than the lower_boundary
  • False if any fold’s f1 score is less than the lower_boundary

cross_val_per_class_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class f1 score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average f1 score is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_per_class_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class precision score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average precision is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_per_class_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class recall score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average recall is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_per_class_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the cross validated per class roc auc score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average roc auc is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) precision score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average precision is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance

cross_val_precision_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) precision scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average precision must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if the average of the precision scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance

cross_val_precision_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy. It generates the k fold (cross validation) precision scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given precision score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are greater than the lower_boundary
  • False if any fold’s precision score is less than the lower_boundary

cross_val_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) recall score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average recall is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance

cross_val_recall_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) recall scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average recall must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if the average of the recall scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance

cross_val_recall_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy. It generates the k fold (cross validation) recall scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given recall score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the lower_boundary
  • False if any fold’s recall score is less than the lower_boundary

cross_val_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the k fold (cross validation) roc auc score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average roc auc is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance

cross_val_roc_auc_avg(minimum_center_tolerance, cv=3, average='micro', method='mean')

This generates the k fold (cross validation) roc auc scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average roc auc must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if the average of the roc auc scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance

cross_val_roc_auc_lower_boundary(lower_boundary, cv=3, average='micro')

This is possibly the most naive strategy. It generates the k fold (cross validation) roc auc scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given roc auc score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the lower_boundary
  • False if any fold’s roc auc score is less than the lower_boundary

describe_scores(scores, method)

Describes scores.

Parameters:
  • scores (array-like) – the scores from the model, as a list or numpy array
  • method (string) – the method to use to calculate central tendency and spread
Returns:

Returns the central tendency and spread, by method:

  • mean : central tendency is the mean, spread is the standard deviation
  • median : central tendency is the median, spread is the interquartile range
  • trimean : central tendency is the trimean, spread is the trimean absolute deviation
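An illustrative call, continuing the sketch above and assuming the method returns a (center, spread) pair as the description suggests:

    center, spread = test_suite.describe_scores([0.81, 0.83, 0.79, 0.82], method="trimean")
    print(center, spread)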

f1_cv(cv, average='binary')

This method performs cross-validation over f1-score.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold f1-score.
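Continuing the sketch above, the raw per-fold scores can be inspected directly, for example to build a custom check:

    scores = test_suite.f1_cv(cv=3)
    print(scores)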

f1_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks the f1 score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the f1
Returns:

  • True if all the classes of the f1 scores are greater than the lower_boundary
  • False if any class’s f1 score is less than the lower_boundary

get_test_score(cross_val_dict)
is_binary()

Returns True if the number of classes == 2, False otherwise.

precision_cv(cv, average='binary')

This method performs cross-validation over precision.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold precision.

precision_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks the precision score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ precision score
  • average (string) – how to calculate the precision
Returns:

  • True if all the classes of the precision scores are greater than the lower_boundary
  • False if any class’s precision score is less than the lower_boundary

recall_cv(cv, average='binary')

This method performs cross-validation over recall.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold recall.

recall_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks the recall score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ recall score
  • average (string) – how to calculate the recall
Returns:

  • True if all the classes of the recall scores are greater than the lower_boundary
  • False if any class’s recall score is less than the lower_boundary

reset_average(average)

Resets the average parameter to the correct setting. If the classification problem is not binary, the average is changed to ‘micro’. Otherwise, the current average is returned.

roc_auc_cv(cv, average='micro')

This method performs cross-validation over roc_auc.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold roc_auc.

roc_auc_exception()

Ensures roc_auc score is used correctly. ROC AUC is only defined for binary classification.

roc_auc_lower_boundary_per_class(lower_boundary: dict, average='micro')

This is a slightly less naive strategy. It checks the roc auc score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ roc auc score
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the classes of the roc auc scores are greater than the lower_boundary
  • False if any class’s roc auc score is less than the lower_boundary

run_energy_stress_test(sample_sizes: list, max_energy_usages: list, print_to_screen=False, print_to_pdf=False)

This is a performance test to ensure that the model is energy efficient.

Note: the model must take longer than 5 seconds to run, otherwise energyusage cannot accurately estimate the energy cost; below that point the cost is negligible. Therefore, when testing, please try to use reasonable sample sizes based on expected throughput.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_energy_usages (list) – the maximum energy usage allowed for each sample size’s prediction
Returns:

  • True if all samples predict within the maximum allowed energy usage
  • False otherwise

run_time_stress_test(sample_sizes: list, max_run_times: list)

This is a performance test to ensure that the model runs fast enough.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_run_times (list) – the maximum time in seconds that each sample should take to predict
Returns:

  • True if all samples predict within the maximum allowed time
  • False otherwise
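Continuing the sketch above, a hedged example (the sample sizes and the per-size latency budgets in seconds are illustrative):

    print(test_suite.run_time_stress_test(
        sample_sizes=[100, 1000, 10000],
        max_run_times=[1, 2, 5],
    ))
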
spread_cross_val_classifier_testing(precision_tolerance: float, recall_tolerance: float, f1_tolerance: float, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) scores for:

  • precision,
  • recall,
  • f1

If any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • precision_tolerance (float) – the tolerance modifier for how far below the center the precision score can be before False is returned
  • recall_tolerance (float) – the tolerance modifier for how far below the center the recall score can be before False is returned
  • f1_tolerance (float) – the tolerance modifier for how far below the center the f1 score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • True if all the folds of the precision, recall, and f1 scores are greater than the center - (spread * tolerance)
  • False if any fold scores less than the center - (spread * tolerance)
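Continuing the sketch above, an illustrative call (tolerances of two spreads below the center are arbitrary choices):

    print(test_suite.spread_cross_val_classifier_testing(
        precision_tolerance=2.0,
        recall_tolerance=2.0,
        f1_tolerance=2.0,
        method="mean",
        cv=10,
    ))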

spread_cross_val_f1_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) f1 scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than the center - (spread * tolerance)
  • False if any fold’s f1 score is less than the center - (spread * tolerance)

spread_cross_val_precision_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) precision scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are greater than the center - (spread * tolerance)
  • False if any fold’s precision score is less than the center - (spread * tolerance)

spread_cross_val_recall_anomaly_detection(tolerance, method='mean', cv=3, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) recall scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the center - (spread * tolerance)
  • False if any fold’s recall score is less than the center - (spread * tolerance)

spread_cross_val_roc_auc_anomaly_detection(tolerance, method='mean', cv=10, average='micro')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) roc auc scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the center - (spread * tolerance)
  • False if any fold’s roc auc score is less than the center - (spread * tolerance)

trimean(data)

I’m exposing this as a public method because the trimean is not implemented in enough packages.

Formula: (25th percentile + 2*50th percentile + 75th percentile)/4

Parameters: data (array-like) – an iterable, either a list or a numpy array
Returns: the trimean
Return type: float
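A quick numpy sketch of the formula above, independent of the library:

    import numpy as np

    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    print((q1 + 2 * q2 + q3) / 4)  # 5.0 for this symmetric example
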
trimean_absolute_deviation(data)

The trimean absolute deviation is the average distance from the trimean.

Parameters: data (array-like) – an iterable, either a list or a numpy array
Returns: the average distance to the trimean
Return type: float
class drifter_ml.classification_tests.classification_tests.ClassifierComparison(clf_one, clf_two, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

cross_val_f1(clf, cv=3, average='binary')
cross_val_f1_per_class(clf, cv=3, average='binary')
cross_val_per_class_two_model_classifier_testing(cv=3, average='binary')
cross_val_precision(clf, cv=3, average='binary')
cross_val_precision_per_class(clf, cv=3, average='binary')
cross_val_recall(clf, cv=3, average='binary')
cross_val_recall_per_class(clf, cv=3, average='binary')
cross_val_roc_auc(clf, cv=3, average='micro')
cross_val_roc_auc_per_class(clf, cv=3, average='micro')
cross_val_two_model_classifier_testing(cv=3, average='binary')
f1_per_class(clf, average='binary')
is_binary()
precision_per_class(clf, average='binary')
recall_per_class(clf, average='binary')
reset_average(average)
roc_auc_exception()
roc_auc_per_class(clf, average='micro')
two_model_classifier_testing(average='binary')
two_model_prediction_run_time_stress_test(sample_sizes)
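A hedged sketch of comparing two fitted models on the same test frame, reusing the illustrative DataFrame df from the ClassificationTests example above (the RandomForestClassifier is an arbitrary second model, and what the comparison returns is not documented here, so the result is simply printed):

    from sklearn import ensemble, tree
    from drifter_ml.classification_tests.classification_tests import ClassifierComparison

    features = ["feature_one", "feature_two"]
    clf_one = tree.DecisionTreeClassifier().fit(df[features], df["target"])
    clf_two = ensemble.RandomForestClassifier().fit(df[features], df["target"])

    comparison = ClassifierComparison(clf_one, clf_two, df, "target", features)
    print(comparison.cross_val_two_model_classifier_testing(cv=3))
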
class drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

Bases: object

f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

The Scikit-Learn f1 score; see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html

The difference between this f1 score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this f1_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

f1 – F1 score of the positive class in binary classification or weighted average of the f1 scores of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]
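A small sketch of the edge case described above (assuming the class can be instantiated with no arguments, as its signature suggests):

    from drifter_ml.classification_tests.classification_tests import FixedClassificationMetrics

    metrics = FixedClassificationMetrics()
    # per the fix described above, the all-zero case scores 1.0
    print(metrics.f1_score([0, 0, 0], [0, 0, 0]))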

precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

The Scikit-Learn precision score, see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html

The difference between this precision score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this precision_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string,) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’ : string
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’ : string
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’ : string
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’ : string
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’ : string
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

precision – Precision of the positive class in binary classification or weighted average of the precision of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]

recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

The Scikit-Learn recall score; see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html

The difference between this recall score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this recall_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string,) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like) – array-like of shape = [n_samples], optional Sample weights.
Returns:

recall – Recall of the positive class in binary classification or weighted average of the recall of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]

roc_auc_score(y_true, y_pred, labels=None, pos_label=1, average='micro', sample_weight=None)

The Scikit-Learn roc_auc score; see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html

The difference between this roc_auc score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this roc_auc_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string) –

    string, [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

roc_auc – Roc_auc score of the positive class in binary classification or weighted average of the roc_auc scores of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]

Module contents

class drifter_ml.classification_tests.ClassificationTests(clf, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

The general goal of this class it to test classification algorithms. The tests in this class move from simple to sophisticated:

  • cross_val_average : the average of all folds must be above some number
  • cross_val_lower_boundary : each fold must be above the lower boundary
  • lower_boundary_per_class : each class must be above a given lower boundary the lower boundary per class can be different
  • cross_val_anomaly_detection : the score for each fold must have a deviance from the average below a set tolerance
  • cross_val_per_class_anomaly_detection : the score for each class for each fold must have a deviance from the average below a set tolerance

As you can see, at each level of sophistication we need more data to get representative sets. But if more data is available, then we are able to test increasingly more cases. The more data we have to test against, the more sure we can be about how well our model does.

Another lense to view each classes of tests, is with respect to stringency. If we need our model to absolutely work all the time, it might be important to use the most sophisticated class - something with cross validation, per class. It’s worth noting, that increased stringency isn’t always a good thing. Statistical models, by definition aren’t supposed to cover every case perfectly. They are supposed to be flexible. So you should only use the most strigent checks if you truly have a ton of data. Otherwise, you will more or less ‘overfit’ your test suite to try and look for errors. Testing in machine learning like in software engineering is very much an art. You need to be sure to cover enough cases, without going overboard.

classifier_testing_per_class(precision_lower_boundary: dict, recall_lower_boundary: dict, f1_lower_boundary: dict, average='binary')

This is a slightly less naive stragey, it checks the: * precision score per class, * recall score per class, * f1 score per class Each class is boundary is mapped to the class via a dictionary allowing for different lower boundaries, per class. if any of the classes are less than the lower boundary, then False is returned.

Parameters:
  • precision_lower_boundary (dict) – the lower boundary for each class’ precision score
  • recall_lower_boundary (dict) – the lower boundary for each class’ recall score
  • f1_lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the precision
Returns:

  • True if all the classes of the precision scores are
  • greater than the lower_boundary
  • False if the classes for the precision scores are
  • less than the lower_boundary

cross_val_classifier_testing(precision_lower_boundary: float, recall_lower_boundary: float, f1_lower_boundary: float, cv=3, average='binary')

runs the cross validated lower boundary methods for: * precision, * recall, * f1 score The basic idea for these three methods is to check if the accuracy metric stays above a given lower bound. We can set the same precision, recall, or f1 score lower boundary or specify each depending on necessary criteria.

Parameters:
  • precision_lower_boundary (float) – the lower boundary for a given precision score
  • recall_lower_boundary (float) – the lower boundary for a given recall score
  • f1_lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • Returns True if precision, recall and f1 tests
  • work.
  • False otherwise

cross_val_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) f1 score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for f1 score
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for f1 score

cross_val_f1_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) f1 scores, then based on computes the average of those scores. The way the average scheme works is, an average is calculated and then if the average is less than the minimum tolerance, then False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average f1 score must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if all the folds of the f1 score are greater than
  • the minimum_center_tolerance
  • False if the average folds for the f1 score are less than
  • the minimum_center_tolerance

cross_val_f1_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive stragey, it generates the k fold (cross validation) f1 scores, if any of the k folds are less than the lower boundary, then False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than
  • the lower_boundary
  • False if the folds for the f1 scores are less than
  • the lower_boundary

cross_val_per_class_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class f1 score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for f1 score
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for f1 score

cross_val_per_class_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class percision score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for precision
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for precision

cross_val_per_class_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class recall score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for recall
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for recall

cross_val_per_class_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the cross validated per class roc auc score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for roc auc
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for roc auc

cross_val_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) precision score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for precision
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for precision

cross_val_precision_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) precision scores, then based on computes the average of those scores. The way the average scheme works is, an average is calculated and then if the average is less than the minimum tolerance, then False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average precision must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if all the folds of the precision are greater than
  • the minimum_center_tolerance
  • False if the average folds for the precision are less than
  • the minimum_center_tolerance

cross_val_precision_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive stragey, it generates the k fold (cross validation) precision scores, if any of the k folds are less than the lower boundary, then False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given precision score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are
  • greater than the lower_boundary
  • False if the folds for the precision scores are
  • less than the lower_boundary

cross_val_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) recall score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for recall
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for recall

cross_val_recall_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) recall scores, then based on computes the average of those scores. The way the average scheme works is, an average is calculated and then if the average is less than the minimum tolerance, then False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average recall must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if the average of the recall scores is greater than the minimum_center_tolerance
  • False if the average of the recall scores is less than the minimum_center_tolerance

cross_val_recall_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy: it generates the k-fold (cross-validation) recall scores and, if any of the k folds is less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given recall score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the lower_boundary
  • False if any of the folds of the recall scores are less than the lower_boundary

cross_val_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the k-fold (cross-validation) roc auc scores for anomalies. The anomaly detection scheme works as follows: the average roc auc across folds is calculated and, if any fold's deviance from that average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if every fold's deviance from the average roc auc is within the tolerance
  • False if any fold's deviance from the average roc auc exceeds the tolerance

cross_val_roc_auc_avg(minimum_center_tolerance, cv=3, average='micro', method='mean')

This generates the k-fold (cross-validation) roc auc scores and then computes the average of those scores. If the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average roc auc must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if the average of the roc auc scores is greater than the minimum_center_tolerance
  • False if the average of the roc auc scores is less than the minimum_center_tolerance

cross_val_roc_auc_lower_boundary(lower_boundary, cv=3, average='micro')

This is possibly the most naive strategy: it generates the k-fold (cross-validation) roc auc scores and, if any of the k folds is less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given roc auc score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the lower_boundary
  • False if any of the folds of the roc auc scores are less than the lower_boundary

describe_scores(scores, method)

Describes a set of scores by their central tendency and spread.

Parameters:
  • scores (array-like) – the scores from the model, as a list or numpy array
  • method (string) – the method to use to calculate central tendency and spread
Returns:

  Returns the central tendency and spread, according to method.

  Methods:
  • mean – central tendency: mean; spread: standard deviation
  • median – central tendency: median; spread: interquartile range
  • trimean – central tendency: trimean; spread: trimean absolute deviation
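
A small sketch of a call, assuming the hypothetical test_suite from the earlier example and that the center and spread are returned together as described above:

    # Cross-validated scores to summarise (illustrative values).
    scores = [0.91, 0.88, 0.90, 0.86, 0.93]

    # Expected to yield the trimean and the trimean absolute deviation.
    center, spread = test_suite.describe_scores(scores, method="trimean")
    print(center, spread)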

f1_cv(cv, average='binary')

This method performs cross-validation over f1-score.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold f1-score.
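
Continuing the earlier hypothetical sketch, the raw fold scores can be inspected directly:

    # One f1 score per fold; useful for plotting or custom checks.
    fold_scores = test_suite.f1_cv(cv=5, average="binary")
    print(fold_scores)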

f1_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy: it checks the f1 score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the f1
Returns:

  • True if all the classes' f1 scores are greater than the lower_boundary
  • False if any class's f1 score is less than the lower_boundary
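
A minimal sketch, assuming the hypothetical test_suite from the earlier example and that the dictionary is keyed by class label (here the binary labels 0 and 1):

    # Require an f1 score of at least 0.85 for class 0 and 0.80 for class 1.
    assert test_suite.f1_lower_boundary_per_class({0: 0.85, 1: 0.80})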

get_test_score(cross_val_dict)
is_binary()

Returns True if the number of classes == 2, False otherwise.

precision_cv(cv, average='binary')

This method performs cross-validation over precision.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold precision.

precision_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy: it checks the precision score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ precision score
  • average (string) – how to calculate the precision
Returns:

  • True if all the classes' precision scores are greater than the lower_boundary
  • False if any class's precision score is less than the lower_boundary

recall_cv(cv, average='binary')

This method performs cross-validation over recall.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold recall.

recall_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy: it checks the recall score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ recall score
  • average (string) – how to calculate the recall
Returns:

  • True if all the classes' recall scores are greater than the lower_boundary
  • False if any class's recall score is less than the lower_boundary

reset_average(average)

Resets the average to an appropriate value: if the classification problem is not binary, the average is changed to ‘micro’; otherwise, the current average is returned.

roc_auc_cv(cv, average='micro')

This method performs cross-validation over roc_auc.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’, ‘micro’(default), ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold roc_auc.

roc_auc_exception()

Ensures roc_auc score is used correctly. ROC AUC is only defined for binary classification.

roc_auc_lower_boundary_per_class(lower_boundary: dict, average='micro')

This is a slightly less naive strategy: it checks the roc auc score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ roc auc score
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the classes' roc auc scores are greater than the lower_boundary
  • False if any class's roc auc score is less than the lower_boundary

run_energy_stress_test(sample_sizes: list, max_energy_usages: list, print_to_screen=False, print_to_pdf=False)

This is a performance test to ensure that the model is energy efficient.

Note: the model must take longer than 5 seconds to run, otherwise energyusage cannot accurately estimate the energy cost; below that threshold, the cost is negligible. Therefore, when testing, please try to use reasonable sample sizes based on expected throughput.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_energy_usages (list) – the maximum energy that prediction on each sample is allowed to use, one maximum per sample size.
Returns:

  • True if all samples predict within the maximum allowed energy usage.
  • False otherwise.
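
A hedged sketch of a call, reusing the hypothetical test_suite from the earlier example; the energy budgets are placeholders whose units and magnitudes depend on the underlying energyusage package:

    # Check predictions on batches of 10,000 and 100,000 rows against
    # per-batch energy budgets (placeholder values).
    assert test_suite.run_energy_stress_test(
        sample_sizes=[10000, 100000],
        max_energy_usages=[0.001, 0.01],
    )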

run_time_stress_test(sample_sizes: list, max_run_times: list)

This is a performance test to ensure that the model runs fast enough.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_run_times (list) – the maximum time in seconds that each sample should take to predict, at a maximum.
Returns:

  • True if all samples predict within the maximum allowed time.
  • False otherwise.
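
A minimal sketch, again assuming the hypothetical test_suite from the earlier example:

    # Predicting 100 rows must finish within 1 second, 10,000 rows within 5.
    assert test_suite.run_time_stress_test(
        sample_sizes=[100, 10000],
        max_run_times=[1, 5],
    )
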
spread_cross_val_classifier_testing(precision_tolerance: float, recall_tolerance: float, f1_tolerance: float, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) precision, recall, and f1 scores; if any of the k folds scores less than center - (spread * tolerance) for its metric, False is returned.

Parameters:
  • precision_tolerance (float) – the tolerance modifier for how far below the center the precision score can be before False is returned
  • recall_tolerance (float) – the tolerance modifier for how far below the center the recall score can be before False is returned
  • f1_tolerance (float) – the tolerance modifier for how far below the center the f1 score can be before False is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the precision, recall, and f1 scores
Returns:

  • True if all the folds of the precision, recall, and f1 scores are greater than the center - (spread * tolerance)
  • False if any fold of the precision, recall, or f1 scores is less than the center - (spread * tolerance)
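
A sketch of the combined spread-based check, assuming the hypothetical test_suite from the earlier example; per the description above, each tolerance acts as a multiplier on the spread:

    # Each fold's precision, recall and f1 must stay above
    # center - (spread * 2) for its respective metric.
    assert test_suite.spread_cross_val_classifier_testing(
        precision_tolerance=2, recall_tolerance=2, f1_tolerance=2,
        method="trimean", cv=10, average="binary"
    )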

spread_cross_val_f1_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) f1 scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than the center - (spread * tolerance)
  • False if any fold of the f1 scores is less than the center - (spread * tolerance)

spread_cross_val_precision_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) precision scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are greater than the center - (spread * tolerance)
  • False if any fold of the precision scores is less than the center - (spread * tolerance)

spread_cross_val_recall_anomaly_detection(tolerance, method='mean', cv=3, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) recall scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the center - (spread * tolerance)
  • False if any fold of the recall scores is less than the center - (spread * tolerance)

spread_cross_val_roc_auc_anomaly_detection(tolerance, method='mean', cv=10, average='micro')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) roc auc scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the center - (spread * tolerance)
  • False if any fold of the roc auc scores is less than the center - (spread * tolerance)

trimean(data)

I’m exposing this as a public method because the trimean is not implemented in enough packages.

Formula: (25th percentile + 2*50th percentile + 75th percentile)/4

Parameters:
  • data (array-like) – an iterable, either a list or a numpy array
Returns:
  the trimean
Return type:
  float
trimean_absolute_deviation(data)

The trimean absolute deviation is the average absolute distance of the data from the trimean.

Parameters:
  • data (array-like) – an iterable, either a list or a numpy array
Returns:
  the average distance to the trimean
Return type:
  float
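
To make the formulas concrete, a quick worked example of the same arithmetic with numpy (the class methods should agree with this on the same data, up to the percentile interpolation used):

    import numpy as np

    scores = np.array([0.70, 0.80, 0.85, 0.90, 0.95])
    q1, q2, q3 = np.percentile(scores, [25, 50, 75])

    # (25th percentile + 2*50th percentile + 75th percentile) / 4
    trimean = (q1 + 2 * q2 + q3) / 4

    # Average absolute distance of each score from the trimean.
    tad = np.mean(np.abs(scores - trimean))
    print(trimean, tad)
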
class drifter_ml.classification_tests.ClassifierComparison(clf_one, clf_two, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

cross_val_f1(clf, cv=3, average='binary')
cross_val_f1_per_class(clf, cv=3, average='binary')
cross_val_per_class_two_model_classifier_testing(cv=3, average='binary')
cross_val_precision(clf, cv=3, average='binary')
cross_val_precision_per_class(clf, cv=3, average='binary')
cross_val_recall(clf, cv=3, average='binary')
cross_val_recall_per_class(clf, cv=3, average='binary')
cross_val_roc_auc(clf, cv=3, average='micro')
cross_val_roc_auc_per_class(clf, cv=3, average='micro')
cross_val_two_model_classifier_testing(cv=3, average='binary')
f1_per_class(clf, average='binary')
is_binary()
precision_per_class(clf, average='binary')
recall_per_class(clf, average='binary')
reset_average(average)
roc_auc_exception()
roc_auc_per_class(clf, average='micro')
two_model_classifier_testing(average='binary')
two_model_prediction_run_time_stress_test(sample_sizes)
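
A minimal sketch of the comparison class; the dataset and models are placeholders, and since the methods above are listed without descriptions, the exact pass criterion is assumed rather than confirmed:

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from drifter_ml.classification_tests import ClassifierComparison

    # Two fitted candidate models on the same toy dataset.
    data = load_breast_cancer()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df["target"] = data.target
    clf_one = DecisionTreeClassifier().fit(df[data.feature_names], df["target"])
    clf_two = LogisticRegression(max_iter=5000).fit(df[data.feature_names], df["target"])

    comparison = ClassifierComparison(
        clf_one, clf_two, df, "target", list(data.feature_names)
    )

    # Presumably compares cross-validated precision, recall and f1 between
    # the two models; see the method list above for the available checks.
    assert comparison.cross_val_two_model_classifier_testing(cv=3, average="binary")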