drifter_ml.classification_tests package

Submodules

drifter_ml.classification_tests.classification_tests module

class drifter_ml.classification_tests.classification_tests.ClassificationTests(clf, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

The general goal of this class is to test classification algorithms. The tests in this class move from simple to sophisticated:

  • cross_val_average : the average of all folds must be above some number
  • cross_val_lower_boundary : each fold must be above the lower boundary
  • lower_boundary_per_class : each class must be above a given lower boundary; the lower boundary can differ per class
  • cross_val_anomaly_detection : the score for each fold must have a deviance from the average below a set tolerance
  • cross_val_per_class_anomaly_detection : the score for each class for each fold must have a deviance from the average below a set tolerance

As you can see, at each level of sophistication we need more data to get representative sets. But if more data is available, then we are able to test more cases. The more data we have to test against, the more confident we can be about how well our model does.

Another lens through which to view these classes of tests is stringency. If we need our model to work absolutely all the time, it might be important to use the most sophisticated class of test - something with cross validation, per class. It’s worth noting that increased stringency isn’t always a good thing. Statistical models, by definition, aren’t supposed to cover every case perfectly; they are supposed to be flexible. So you should only use the most stringent checks if you truly have a ton of data. Otherwise, you will more or less ‘overfit’ your test suite in the search for errors. Testing in machine learning, like in software engineering, is very much an art: you need to cover enough cases without going overboard.
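For example, a minimal sketch of wiring up the test suite might look like the following (the DataFrame layout, the feature and target names, and the DecisionTreeClassifier are illustrative assumptions, not requirements of the library):

    import pandas as pd
    from sklearn import tree
    from drifter_ml.classification_tests import ClassificationTests

    # illustrative data: two feature columns and a binary target
    df = pd.DataFrame({
        "feature_one": range(100),
        "feature_two": range(100, 200),
        "target": [0, 1] * 50,
    })

    clf = tree.DecisionTreeClassifier()
    clf.fit(df[["feature_one", "feature_two"]], df["target"])

    test_suite = ClassificationTests(clf, df, "target", ["feature_one", "feature_two"])

    # the simplest style of check: every cross validation fold must clear a fixed floor
    print(test_suite.cross_val_precision_lower_boundary(lower_boundary=0.5, cv=3))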

classifier_testing_per_class(precision_lower_boundary: dict, recall_lower_boundary: dict, f1_lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks:

  • the precision score per class,
  • the recall score per class,
  • the f1 score per class

Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • precision_lower_boundary (dict) – the lower boundary for each class’ precision score
  • recall_lower_boundary (dict) – the lower boundary for each class’ recall score
  • f1_lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • True if every class’s precision, recall, and f1 scores are greater than their lower boundaries
  • False if any class’s score is less than its lower boundary
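Continuing the sketch above, a hedged example of per-class boundaries (assuming the dictionary keys are the class labels, here 0 and 1; the values are arbitrary):

    precision_floors = {0: 0.7, 1: 0.6}
    recall_floors = {0: 0.7, 1: 0.6}
    f1_floors = {0: 0.7, 1: 0.6}

    # True only if every class clears every boundary
    print(test_suite.classifier_testing_per_class(precision_floors, recall_floors, f1_floors))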

cross_val_classifier_testing(precision_lower_boundary: float, recall_lower_boundary: float, f1_lower_boundary: float, cv=3, average='binary')

Runs the cross validated lower boundary methods for:

  • precision,
  • recall,
  • f1 score

The basic idea for these three methods is to check whether each metric stays above a given lower bound. We can set the same lower boundary for precision, recall, and f1, or specify each separately depending on the necessary criteria.

Parameters:
  • precision_lower_boundary (float) – the lower boundary for a given precision score
  • recall_lower_boundary (float) – the lower boundary for a given recall score
  • f1_lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • True if the precision, recall, and f1 tests all pass
  • False otherwise
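Continuing the sketch above, the combined cross validated check might be invoked like this (the boundary values are illustrative):

    print(test_suite.cross_val_classifier_testing(
        precision_lower_boundary=0.6,
        recall_lower_boundary=0.6,
        f1_lower_boundary=0.6,
        cv=3,
    ))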

cross_val_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) f1 score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average f1 score is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance
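Continuing the sketch above, an illustrative call (the tolerance value is arbitrary):

    # every fold's f1 score must stay within 0.1 of the average fold score
    print(test_suite.cross_val_f1_anomaly_detection(tolerance=0.1, cv=3, method="mean"))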

cross_val_f1_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) f1 scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average f1 score must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if the average of the f1 scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance
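Continuing the sketch above, an illustrative call (the 0.7 floor is arbitrary):

    # the average f1 score across folds must exceed 0.7
    print(test_suite.cross_val_f1_avg(minimum_center_tolerance=0.7, cv=3))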

cross_val_f1_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy. It generates the k fold (cross validation) f1 scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than the lower_boundary
  • False if any fold’s f1 score is less than the lower_boundary

cross_val_per_class_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class f1 score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average f1 score is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_per_class_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class precision score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average precision is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_per_class_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class recall score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average recall is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_per_class_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the cross validated per class roc auc score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold and every class, the deviance from the average roc auc is within the tolerance
  • False if any deviance from the average exceeds the tolerance

cross_val_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) precision score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average precision is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance

cross_val_precision_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) precision scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average precision must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if the average of the precision scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance

cross_val_precision_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy. It generates the k fold (cross validation) precision scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given precision score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are greater than the lower_boundary
  • False if any fold’s precision score is less than the lower_boundary

cross_val_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) recall score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average recall is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance

cross_val_recall_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) recall scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average recall must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if the average of the recall scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance

cross_val_recall_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy. It generates the k fold (cross validation) recall scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given recall score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the lower_boundary
  • False if any fold’s recall score is less than the lower_boundary

cross_val_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the k fold (cross validation) roc auc score, based on anomalies. The anomaly detection scheme works as follows: an average is calculated, and if the deviance from the average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if, for every fold, the deviance from the average roc auc is within the tolerance
  • False if any fold’s deviance from the average exceeds the tolerance

cross_val_roc_auc_avg(minimum_center_tolerance, cv=3, average='micro', method='mean')

This generates the k fold (cross validation) roc auc scores and then computes the average of those scores. The way the average scheme works is: an average is calculated, and if the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average roc auc must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if the average of the roc auc scores across folds is greater than the minimum_center_tolerance
  • False if the average is less than the minimum_center_tolerance

cross_val_roc_auc_lower_boundary(lower_boundary, cv=3, average='micro')

This is possibly the most naive strategy. It generates the k fold (cross validation) roc auc scores; if any of the k folds are less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given roc auc score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the lower_boundary
  • False if any fold’s roc auc score is less than the lower_boundary

describe_scores(scores, method)

Describes scores.

Parameters:
  • scores (array-like) – the scores from the model, as a list or numpy array
  • method (string) – the method to use to calculate central tendency and spread
Returns:

Returns the central tendency and spread, by method:

  • mean : central tendency is the mean, spread is the standard deviation
  • median : central tendency is the median, spread is the interquartile range
  • trimean : central tendency is the trimean, spread is the trimean absolute deviation
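An illustrative call, continuing the sketch above and assuming the method returns a (center, spread) pair as the description suggests:

    center, spread = test_suite.describe_scores([0.81, 0.83, 0.79, 0.82], method="trimean")
    print(center, spread)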

f1_cv(cv, average='binary')

This method performs cross-validation over f1-score.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold f1-score.
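Continuing the sketch above, the raw per-fold scores can be inspected directly, for example to build a custom check:

    scores = test_suite.f1_cv(cv=3)
    print(scores)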

f1_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks the f1 score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the f1
Returns:

  • True if all the classes of the f1 scores are greater than the lower_boundary
  • False if any class’s f1 score is less than the lower_boundary

get_test_score(cross_val_dict)
is_binary()

Returns True if the number of classes == 2, False otherwise.

precision_cv(cv, average='binary')

This method performs cross-validation over precision.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold precision.

precision_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks the precision score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ precision score
  • average (string) – how to calculate the precision
Returns:

  • True if all the classes of the precision scores are greater than the lower_boundary
  • False if any class’s precision score is less than the lower_boundary

recall_cv(cv, average='binary')

This method performs cross-validation over recall.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold recall.

recall_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy. It checks the recall score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ recall score
  • average (string) – how to calculate the recall
Returns:

  • True if all the classes of the recall scores are greater than the lower_boundary
  • False if any class’s recall score is less than the lower_boundary

reset_average(average)

Resets the average parameter to the correct setting. If the classification problem is not binary, the average is changed to ‘micro’. Otherwise, the current average is returned.

roc_auc_cv(cv, average='micro')

This method performs cross-validation over roc_auc.

Parameters:
  • cv (int) – The number of cross validation folds to perform
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

The scores of the k-fold roc_auc.

roc_auc_exception()

Ensures roc_auc score is used correctly. ROC AUC is only defined for binary classification.

roc_auc_lower_boundary_per_class(lower_boundary: dict, average='micro')

This is a slightly less naive strategy. It checks the roc auc score per class. Each lower boundary is mapped to its class via a dictionary, allowing for different lower boundaries per class. If any class scores below its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ roc auc score
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the classes of the roc auc scores are greater than the lower_boundary
  • False if any class’s roc auc score is less than the lower_boundary

run_energy_stress_test(sample_sizes: list, max_energy_usages: list, print_to_screen=False, print_to_pdf=False)

This is a performance test to ensure that the model is energy efficient.

Note: the model must take longer than 5 seconds to run, otherwise energyusage cannot accurately estimate the energy cost; below that point the cost is negligible. Therefore, when testing, please try to use reasonable sample sizes based on expected throughput.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_energy_usages (list) – the maximum energy usage allowed for each sample size’s prediction
Returns:

  • True if all samples predict within the maximum allowed energy usage
  • False otherwise

run_time_stress_test(sample_sizes: list, max_run_times: list)

This is a performance test to ensure that the model runs fast enough.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_run_times (list) – the maximum time in seconds that each sample should take to predict
Returns:

  • True if all samples predict within the maximum allowed time
  • False otherwise
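Continuing the sketch above, a hedged example (the sample sizes and the per-size latency budgets in seconds are illustrative):

    print(test_suite.run_time_stress_test(
        sample_sizes=[100, 1000, 10000],
        max_run_times=[1, 2, 5],
    ))
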
spread_cross_val_classifier_testing(precision_tolerance: float, recall_tolerance: float, f1_tolerance: float, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) scores for:

  • precision,
  • recall,
  • f1

If any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • precision_tolerance (float) – the tolerance modifier for how far below the center the precision score can be before False is returned
  • recall_tolerance (float) – the tolerance modifier for how far below the center the recall score can be before False is returned
  • f1_tolerance (float) – the tolerance modifier for how far below the center the f1 score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • True if all the folds of the precision, recall, and f1 scores are greater than the center - (spread * tolerance)
  • False if any fold scores less than the center - (spread * tolerance)
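Continuing the sketch above, an illustrative call (tolerances of two spreads below the center are arbitrary choices):

    print(test_suite.spread_cross_val_classifier_testing(
        precision_tolerance=2.0,
        recall_tolerance=2.0,
        f1_tolerance=2.0,
        method="mean",
        cv=10,
    ))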

spread_cross_val_f1_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) f1 scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than the center - (spread * tolerance)
  • False if any fold’s f1 score is less than the center - (spread * tolerance)

spread_cross_val_precision_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) precision scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are greater than the center - (spread * tolerance)
  • False if any fold’s precision score is less than the center - (spread * tolerance)

spread_cross_val_recall_anomaly_detection(tolerance, method='mean', cv=3, average='binary')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) recall scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the center - (spread * tolerance)
  • False if any fold’s recall score is less than the center - (spread * tolerance)

spread_cross_val_roc_auc_anomaly_detection(tolerance, method='mean', cv=10, average='micro')

This is a somewhat intelligent strategy. It generates the k fold (cross validation) roc auc scores; if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before False is returned
  • method (string) – see describe_scores for more details:
    • mean : the center is the mean, the spread is the standard deviation
    • median : the center is the median, the spread is the interquartile range
    • trimean : the center is the trimean, the spread is the trimean absolute deviation
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the center - (spread * tolerance)
  • False if any fold’s roc auc score is less than the center - (spread * tolerance)

trimean(data)

I’m exposing this as a public method because the trimean is not implemented in enough packages.

Formula: (25th percentile + 2*50th percentile + 75th percentile)/4

Parameters: data (array-like) – an iterable, either a list or a numpy array
Returns: the trimean
Return type: float
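A quick numpy sketch of the formula above, independent of the library:

    import numpy as np

    data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    print((q1 + 2 * q2 + q3) / 4)  # 5.0 for this symmetric example
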
trimean_absolute_deviation(data)

The trimean absolute deviation is the average distance from the trimean.

Parameters: data (array-like) – an iterable, either a list or a numpy array
Returns: the average distance to the trimean
Return type: float
class drifter_ml.classification_tests.classification_tests.ClassifierComparison(clf_one, clf_two, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

cross_val_f1(clf, cv=3, average='binary')
cross_val_f1_per_class(clf, cv=3, average='binary')
cross_val_per_class_two_model_classifier_testing(cv=3, average='binary')
cross_val_precision(clf, cv=3, average='binary')
cross_val_precision_per_class(clf, cv=3, average='binary')
cross_val_recall(clf, cv=3, average='binary')
cross_val_recall_per_class(clf, cv=3, average='binary')
cross_val_roc_auc(clf, cv=3, average='micro')
cross_val_roc_auc_per_class(clf, cv=3, average='micro')
cross_val_two_model_classifier_testing(cv=3, average='binary')
f1_per_class(clf, average='binary')
is_binary()
precision_per_class(clf, average='binary')
recall_per_class(clf, average='binary')
reset_average(average)
roc_auc_exception()
roc_auc_per_class(clf, average='micro')
two_model_classifier_testing(average='binary')
two_model_prediction_run_time_stress_test(sample_sizes)
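A hedged sketch of comparing two fitted models on the same test frame, reusing the illustrative DataFrame df from the ClassificationTests example above (the RandomForestClassifier is an arbitrary second model, and what the comparison returns is not documented here, so the result is simply printed):

    from sklearn import ensemble, tree
    from drifter_ml.classification_tests.classification_tests import ClassifierComparison

    features = ["feature_one", "feature_two"]
    clf_one = tree.DecisionTreeClassifier().fit(df[features], df["target"])
    clf_two = ensemble.RandomForestClassifier().fit(df[features], df["target"])

    comparison = ClassifierComparison(clf_one, clf_two, df, "target", features)
    print(comparison.cross_val_two_model_classifier_testing(cv=3))
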
class drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

Bases: object

f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

The Scikit-Learn f1 score; see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html

The difference between this f1 score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this f1_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

f1 – F1 score of the positive class in binary classification or weighted average of the f1 scores of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]
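A small sketch of the edge case described above (assuming the class can be instantiated with no arguments, as its signature suggests):

    from drifter_ml.classification_tests.classification_tests import FixedClassificationMetrics

    metrics = FixedClassificationMetrics()
    # per the fix described above, the all-zero case scores 1.0
    print(metrics.f1_score([0, 0, 0], [0, 0, 0]))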

precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

The Scikit-Learn precision score, see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html

The difference between this precision score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this precision_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string,) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’ : string
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’ : string
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’ : string
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’ : string
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’ : string
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

precision – Precision of the positive class in binary classification or weighted average of the precision of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]

recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

The Scikit-Learn recall score; see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html

The difference between this recall score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this recall_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string,) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like) – array-like of shape = [n_samples], optional Sample weights.
Returns:

recall – Recall of the positive class in binary classification or weighted average of the recall of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]

roc_auc_score(y_true, y_pred, labels=None, pos_label=1, average='micro', sample_weight=None)

The Scikit-Learn roc_auc score; see the full documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html

The difference between this roc_auc score and the one in scikit-learn is that we fix a small bug: when all the values in y_true and y_pred are zero, this roc_auc_score returns one (which scikit-learn does not do at present).

Parameters:
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) target values.
  • y_pred (1d array-like, or label indicator array / sparse matrix) – Estimated targets as returned by a classifier
  • labels (list, optional) – The set of labels to include when average != binary, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices. By default, all labels in y_true and y_pred are used in sorted order.
  • pos_label (str or int, 1 by default) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
  • average (string) –

    string, [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
  • sample_weight (array-like of shape = [n_samples], optional) – Sample weights.
Returns:

roc_auc – Roc_auc score of the positive class in binary classification or weighted average of the roc_auc scores of each class for the multiclass task.

Return type:

float (if average is not None) or array of float, shape = [n_unique_labels]

Module contents

class drifter_ml.classification_tests.ClassificationTests(clf, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

The general goal of this class it to test classification algorithms. The tests in this class move from simple to sophisticated:

  • cross_val_average : the average of all folds must be above some number
  • cross_val_lower_boundary : each fold must be above the lower boundary
  • lower_boundary_per_class : each class must be above a given lower boundary the lower boundary per class can be different
  • cross_val_anomaly_detection : the score for each fold must have a deviance from the average below a set tolerance
  • cross_val_per_class_anomaly_detection : the score for each class for each fold must have a deviance from the average below a set tolerance

As you can see, at each level of sophistication we need more data to get representative sets. But if more data is available, then we are able to test increasingly more cases. The more data we have to test against, the more sure we can be about how well our model does.

Another lense to view each classes of tests, is with respect to stringency. If we need our model to absolutely work all the time, it might be important to use the most sophisticated class - something with cross validation, per class. It’s worth noting, that increased stringency isn’t always a good thing. Statistical models, by definition aren’t supposed to cover every case perfectly. They are supposed to be flexible. So you should only use the most strigent checks if you truly have a ton of data. Otherwise, you will more or less ‘overfit’ your test suite to try and look for errors. Testing in machine learning like in software engineering is very much an art. You need to be sure to cover enough cases, without going overboard.

classifier_testing_per_class(precision_lower_boundary: dict, recall_lower_boundary: dict, f1_lower_boundary: dict, average='binary')

This is a slightly less naive stragey, it checks the: * precision score per class, * recall score per class, * f1 score per class Each class is boundary is mapped to the class via a dictionary allowing for different lower boundaries, per class. if any of the classes are less than the lower boundary, then False is returned.

Parameters:
  • precision_lower_boundary (dict) – the lower boundary for each class’ precision score
  • recall_lower_boundary (dict) – the lower boundary for each class’ recall score
  • f1_lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the precision
Returns:

  • True if all the classes of the precision scores are
  • greater than the lower_boundary
  • False if the classes for the precision scores are
  • less than the lower_boundary

cross_val_classifier_testing(precision_lower_boundary: float, recall_lower_boundary: float, f1_lower_boundary: float, cv=3, average='binary')

runs the cross validated lower boundary methods for: * precision, * recall, * f1 score The basic idea for these three methods is to check if the accuracy metric stays above a given lower bound. We can set the same precision, recall, or f1 score lower boundary or specify each depending on necessary criteria.

Parameters:
  • precision_lower_boundary (float) – the lower boundary for a given precision score
  • recall_lower_boundary (float) – the lower boundary for a given recall score
  • f1_lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the metrics (precision, recall, f1)
Returns:

  • Returns True if precision, recall and f1 tests
  • work.
  • False otherwise

cross_val_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) f1 score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for f1 score
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for f1 score

cross_val_f1_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) f1 scores, then based on computes the average of those scores. The way the average scheme works is, an average is calculated and then if the average is less than the minimum tolerance, then False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average f1 score must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if all the folds of the f1 score are greater than
  • the minimum_center_tolerance
  • False if the average folds for the f1 score are less than
  • the minimum_center_tolerance

cross_val_f1_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive stragey, it generates the k fold (cross validation) f1 scores, if any of the k folds are less than the lower boundary, then False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than
  • the lower_boundary
  • False if the folds for the f1 scores are less than
  • the lower_boundary

cross_val_per_class_f1_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class f1 score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average f1 score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the f1 score
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for f1 score
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for f1 score

cross_val_per_class_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class percision score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for precision
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for precision

cross_val_per_class_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the cross validated per class recall score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for recall
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for recall

cross_val_per_class_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the cross validated per class roc auc score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for roc auc
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for roc auc

cross_val_precision_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) precision score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average precision
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for precision
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for precision

cross_val_precision_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) precision scores, then based on computes the average of those scores. The way the average scheme works is, an average is calculated and then if the average is less than the minimum tolerance, then False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average precision must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
  • method (string) – how to calculate the center
Returns:

  • True if all the folds of the precision are greater than
  • the minimum_center_tolerance
  • False if the average folds for the precision are less than
  • the minimum_center_tolerance

cross_val_precision_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive stragey, it generates the k fold (cross validation) precision scores, if any of the k folds are less than the lower boundary, then False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given precision score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are
  • greater than the lower_boundary
  • False if the folds for the precision scores are
  • less than the lower_boundary

cross_val_recall_anomaly_detection(tolerance: float, cv=3, average='binary', method='mean')

This checks the k fold (cross validation) recall score, based on anolamies. The way the anomaly detection scheme works is, an average is calculated and then if the deviance from the average is greater than the set tolerance, then False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average recall
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if all the deviances from average for all the folds
  • are above tolerance for recall
  • False if any of the deviances from the average for any of
  • the folds are below the tolerance for recall

cross_val_recall_avg(minimum_center_tolerance, cv=3, average='binary', method='mean')

This generates the k fold (cross validation) recall scores, then based on computes the average of those scores. The way the average scheme works is, an average is calculated and then if the average is less than the minimum tolerance, then False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average recall must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
  • method (string) – how to calculate the center
Returns:

  • True if the average of the recall scores is greater than the minimum_center_tolerance
  • False if the average of the recall scores is less than the minimum_center_tolerance

cross_val_recall_lower_boundary(lower_boundary, cv=3, average='binary')

This is possibly the most naive strategy: it generates the k-fold (cross-validation) recall scores and, if any of the k folds is less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given recall score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the lower_boundary
  • False if any of the folds of the recall scores are less than the lower_boundary

cross_val_roc_auc_anomaly_detection(tolerance: float, cv=3, average='micro', method='mean')

This checks the k-fold (cross-validation) roc auc scores for anomalies. The anomaly detection scheme works as follows: the average roc auc across folds is calculated and, if any fold's deviance from that average is greater than the set tolerance, False is returned.

Parameters:
  • tolerance (float) – the tolerance from the average roc auc
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if every fold's deviance from the average roc auc is within the tolerance
  • False if any fold's deviance from the average roc auc exceeds the tolerance

cross_val_roc_auc_avg(minimum_center_tolerance, cv=3, average='micro', method='mean')

This generates the k-fold (cross-validation) roc auc scores and then computes the average of those scores. If the average is less than the minimum tolerance, False is returned.

Parameters:
  • minimum_center_tolerance (float) – the average roc auc must be greater than this number
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
  • method (string) – how to calculate the center
Returns:

  • True if the average of the roc auc scores is greater than the minimum_center_tolerance
  • False if the average of the roc auc scores is less than the minimum_center_tolerance

cross_val_roc_auc_lower_boundary(lower_boundary, cv=3, average='micro')

This is possibly the most naive strategy: it generates the k-fold (cross-validation) roc auc scores and, if any of the k folds is less than the lower boundary, False is returned.

Parameters:
  • lower_boundary (float) – the lower boundary for a given roc auc score
  • cv (int) – the number of folds to consider
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the lower_boundary
  • False if any of the folds of the roc auc scores are less than the lower_boundary

describe_scores(scores, method)

Describes a set of scores by their central tendency and spread.

Parameters:
  • scores (array-like) – the scores from the model, as a list or numpy array
  • method (string) – the method to use to calculate central tendency and spread
Returns:

  Returns the central tendency and spread, according to method.

  Methods:
  • mean – central tendency: mean; spread: standard deviation
  • median – central tendency: median; spread: interquartile range
  • trimean – central tendency: trimean; spread: trimean absolute deviation
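
A small sketch of a call, assuming the hypothetical test_suite from the earlier example and that the center and spread are returned together as described above:

    # Cross-validated scores to summarise (illustrative values).
    scores = [0.91, 0.88, 0.90, 0.86, 0.93]

    # Expected to yield the trimean and the trimean absolute deviation.
    center, spread = test_suite.describe_scores(scores, method="trimean")
    print(center, spread)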

f1_cv(cv, average='binary')

This method performs cross-validation over f1-score.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold f1-score.
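
Continuing the earlier hypothetical sketch, the raw fold scores can be inspected directly:

    # One f1 score per fold; useful for plotting or custom checks.
    fold_scores = test_suite.f1_cv(cv=5, average="binary")
    print(fold_scores)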

f1_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy: it checks the f1 score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ f1 score
  • average (string) – how to calculate the f1
Returns:

  • True if all the classes' f1 scores are greater than the lower_boundary
  • False if any class's f1 score is less than the lower_boundary
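
A minimal sketch, assuming the hypothetical test_suite from the earlier example and that the dictionary is keyed by class label (here the binary labels 0 and 1):

    # Require an f1 score of at least 0.85 for class 0 and 0.80 for class 1.
    assert test_suite.f1_lower_boundary_per_class({0: 0.85, 1: 0.80})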

get_test_score(cross_val_dict)
is_binary()

Returns True if the number of classes == 2, False otherwise.

precision_cv(cv, average='binary')

This method performs cross-validation over precision.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold precision.

precision_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy: it checks the precision score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ precision score
  • average (string) – how to calculate the precision
Returns:

  • True if all the classes' precision scores are greater than the lower_boundary
  • False if any class's precision score is less than the lower_boundary

recall_cv(cv, average='binary')

This method performs cross-validation over recall.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’(default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold recall.

recall_lower_boundary_per_class(lower_boundary: dict, average='binary')

This is a slightly less naive strategy: it checks the recall score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ recall score
  • average (string) – how to calculate the recall
Returns:

  • True if all the classes' recall scores are greater than the lower_boundary
  • False if any class's recall score is less than the lower_boundary

reset_average(average)

Resets the average to an appropriate value: if the classification problem is not binary, the average is changed to ‘micro’; otherwise, the current average is returned.

roc_auc_cv(cv, average='micro')

This method performs cross-validation over roc_auc.

Parameters:
  • cv (*) – The number of cross validation folds to perform
  • average (*) –

    [None, ‘binary’, ‘micro’(default), ‘macro’, ‘samples’, ‘weighted’] This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

    ’binary’:
    Only report results for the class specified by pos_label. This is applicable only if targets (y_{true, pred}) are binary.
    ’micro’:
    Calculate metrics globally by counting the total true positives, false negatives and false positives.
    ’macro’:
    Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
    ’weighted’:
    Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
    ’samples’:
    Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
Returns:

  The scores of the k-fold roc_auc.

roc_auc_exception()

Ensures roc_auc score is used correctly. ROC AUC is only defined for binary classification.

roc_auc_lower_boundary_per_class(lower_boundary: dict, average='micro')

This is a slightly less naive strategy: it checks the roc auc score per class. Each class's lower boundary is mapped to the class via a dictionary, allowing for different lower boundaries per class. If any class scores less than its lower boundary, False is returned.

Parameters:
  • lower_boundary (dict) – the lower boundary for each class’ roc auc score
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the classes' roc auc scores are greater than the lower_boundary
  • False if any class's roc auc score is less than the lower_boundary

run_energy_stress_test(sample_sizes: list, max_energy_usages: list, print_to_screen=False, print_to_pdf=False)

This is a performance test to ensure that the model is energy efficient.

Note: the model must take longer than 5 seconds to run, otherwise energyusage cannot accurately estimate the energy cost; below that threshold, the cost is negligible. Therefore, when testing, please try to use reasonable sample sizes based on expected throughput.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_energy_usages (list) – the maximum energy that prediction on each sample is allowed to use, one maximum per sample size.
Returns:

  • True if all samples predict within the maximum allowed energy usage.
  • False otherwise.
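
A hedged sketch of a call, reusing the hypothetical test_suite from the earlier example; the energy budgets are placeholders whose units and magnitudes depend on the underlying energyusage package:

    # Check predictions on batches of 10,000 and 100,000 rows against
    # per-batch energy budgets (placeholder values).
    assert test_suite.run_energy_stress_test(
        sample_sizes=[10000, 100000],
        max_energy_usages=[0.001, 0.01],
    )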

run_time_stress_test(sample_sizes: list, max_run_times: list)

This is a performance test to ensure that the model runs fast enough.

Parameters:
  • sample_sizes (list) – the size of each sample to test for doing a prediction, each sample size is an integer
  • max_run_times (list) – the maximum time in seconds that each sample should take to predict, at a maximum.
Returns:

  • True if all samples predict within the maximum allowed time.
  • False otherwise.
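
A minimal sketch, again assuming the hypothetical test_suite from the earlier example:

    # Predicting 100 rows must finish within 1 second, 10,000 rows within 5.
    assert test_suite.run_time_stress_test(
        sample_sizes=[100, 10000],
        max_run_times=[1, 5],
    )
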
spread_cross_val_classifier_testing(precision_tolerance: float, recall_tolerance: float, f1_tolerance: float, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) precision, recall, and f1 scores; if any of the k folds scores less than center - (spread * tolerance) for its metric, False is returned.

Parameters:
  • precision_tolerance (float) – the tolerance modifier for how far below the center the precision score can be before False is returned
  • recall_tolerance (float) – the tolerance modifier for how far below the center the recall score can be before False is returned
  • f1_tolerance (float) – the tolerance modifier for how far below the center the f1 score can be before False is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the precision, recall, and f1 scores
Returns:

  • True if all the folds of the precision, recall, and f1 scores are greater than the center - (spread * tolerance)
  • False if any fold of the precision, recall, or f1 scores is less than the center - (spread * tolerance)
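
A sketch of the combined spread-based check, assuming the hypothetical test_suite from the earlier example; per the description above, each tolerance acts as a multiplier on the spread:

    # Each fold's precision, recall and f1 must stay above
    # center - (spread * 2) for its respective metric.
    assert test_suite.spread_cross_val_classifier_testing(
        precision_tolerance=2, recall_tolerance=2, f1_tolerance=2,
        method="trimean", cv=10, average="binary"
    )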

spread_cross_val_f1_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) f1 scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the f1 score
Returns:

  • True if all the folds of the f1 scores are greater than the center - (spread * tolerance)
  • False if any fold of the f1 scores is less than the center - (spread * tolerance)

spread_cross_val_precision_anomaly_detection(tolerance, method='mean', cv=10, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) precision scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the precision
Returns:

  • True if all the folds of the precision scores are greater than the center - (spread * tolerance)
  • False if any fold of the precision scores is less than the center - (spread * tolerance)

spread_cross_val_recall_anomaly_detection(tolerance, method='mean', cv=3, average='binary')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) recall scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the recall
Returns:

  • True if all the folds of the recall scores are greater than the center - (spread * tolerance)
  • False if any fold of the recall scores is less than the center - (spread * tolerance)

spread_cross_val_roc_auc_anomaly_detection(tolerance, method='mean', cv=10, average='micro')

This is a somewhat intelligent strategy: it generates the k-fold (cross-validation) roc auc scores and, if any of the k folds scores less than the center - (spread * tolerance), False is returned.

Parameters:
  • tolerance (float) – the tolerance modifier for how far below the center the score can be before a false is returned
  • method (string) – see describe_scores for more details.
    • mean : the center is the mean, the spread is the standard deviation.
    • median : the center is the median, the spread is the interquartile range.
    • trimean : the center is the trimean, the spread is the trimean absolute deviation.
  • average (string) – how to calculate the roc auc
Returns:

  • True if all the folds of the roc auc scores are greater than the center - (spread * tolerance)
  • False if any fold of the roc auc scores is less than the center - (spread * tolerance)

trimean(data)

I’m exposing this as a public method because the trimean is not implemented in enough packages.

Formula: (25th percentile + 2*50th percentile + 75th percentile)/4

Parameters:
  • data (array-like) – an iterable, either a list or a numpy array
Returns:
  the trimean
Return type:
  float
trimean_absolute_deviation(data)

The trimean absolute deviation is the average absolute distance of the data from the trimean.

Parameters:
  • data (array-like) – an iterable, either a list or a numpy array
Returns:
  the average distance to the trimean
Return type:
  float
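
To make the formulas concrete, a quick worked example of the same arithmetic with numpy (the class methods should agree with this on the same data, up to the percentile interpolation used):

    import numpy as np

    scores = np.array([0.70, 0.80, 0.85, 0.90, 0.95])
    q1, q2, q3 = np.percentile(scores, [25, 50, 75])

    # (25th percentile + 2*50th percentile + 75th percentile) / 4
    trimean = (q1 + 2 * q2 + q3) / 4

    # Average absolute distance of each score from the trimean.
    tad = np.mean(np.abs(scores - trimean))
    print(trimean, tad)
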
class drifter_ml.classification_tests.ClassifierComparison(clf_one, clf_two, test_data, target_name, column_names)

Bases: drifter_ml.classification_tests.classification_tests.FixedClassificationMetrics

cross_val_f1(clf, cv=3, average='binary')
cross_val_f1_per_class(clf, cv=3, average='binary')
cross_val_per_class_two_model_classifier_testing(cv=3, average='binary')
cross_val_precision(clf, cv=3, average='binary')
cross_val_precision_per_class(clf, cv=3, average='binary')
cross_val_recall(clf, cv=3, average='binary')
cross_val_recall_per_class(clf, cv=3, average='binary')
cross_val_roc_auc(clf, cv=3, average='micro')
cross_val_roc_auc_per_class(clf, cv=3, average='micro')
cross_val_two_model_classifier_testing(cv=3, average='binary')
f1_per_class(clf, average='binary')
is_binary()
precision_per_class(clf, average='binary')
recall_per_class(clf, average='binary')
reset_average(average)
roc_auc_exception()
roc_auc_per_class(clf, average='micro')
two_model_classifier_testing(average='binary')
two_model_prediction_run_time_stress_test(sample_sizes)
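
A minimal sketch of the comparison class; the dataset and models are placeholders, and since the methods above are listed without descriptions, the exact pass criterion is assumed rather than confirmed:

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from drifter_ml.classification_tests import ClassifierComparison

    # Two fitted candidate models on the same toy dataset.
    data = load_breast_cancer()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df["target"] = data.target
    clf_one = DecisionTreeClassifier().fit(df[data.feature_names], df["target"])
    clf_two = LogisticRegression(max_iter=5000).fit(df[data.feature_names], df["target"])

    comparison = ClassifierComparison(
        clf_one, clf_two, df, "target", list(data.feature_names)
    )

    # Presumably compares cross-validated precision, recall and f1 between
    # the two models; see the method list above for the available checks.
    assert comparison.cross_val_two_model_classifier_testing(cv=3, average="binary")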