metrics Module
Metrics that can be used to evaluate the performance of learners.
author:  Nitin Madnani (nmadnani@ets.org) 

author:  Michael Heilman (mheilman@ets.org) 
author:  Dan Blanchard (dblanchard@ets.org) 
organization:  ETS 

skll.metrics.correlation(y_true, y_pred, corr_type='pearson')
Calculate the specified correlation between y_true and y_pred. y_pred can be multi-dimensional. If y_pred is 1-dimensional, it may contain probabilities, most-likely classification labels, or regressor predictions. In that case, we simply return the correlation between y_true and y_pred. If y_pred is multi-dimensional, it contains probabilities for multiple classes, in which case we infer the most likely labels and then compute the correlation between those and y_true.
Parameters:  y_true (array-like of float) – The true/actual/gold labels for the data.
 y_pred (array-like of float) – The predicted/observed labels for the data.
 corr_type (str, optional) – Which type of correlation to compute. Possible choices are pearson, spearman, and kendall_tau. Defaults to pearson.
Returns: ret_score – Correlation value if well-defined, else 0.0.
Return type: float
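
The behavior described for multi-dimensional predictions can be illustrated with a minimal standard-library sketch. This is not SKLL's implementation: the helper names (pearson, most_likely_labels, correlation_sketch) are hypothetical, only the Pearson case is covered, and the sketch assumes that the column index of each probability row equals the class label.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # A constant input has zero variance, so the correlation is not defined
    return cov / denom if denom > 0 else 0.0


def most_likely_labels(prob_rows):
    """Return the index of the highest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


def correlation_sketch(y_true, y_pred):
    # Multi-dimensional predictions: reduce class probabilities to labels first
    if y_pred and isinstance(y_pred[0], (list, tuple)):
        y_pred = most_likely_labels(y_pred)
    return pearson(y_true, y_pred)


y_true = [0, 1, 2, 1]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]]
print(most_likely_labels(probs))  # [0, 1, 2, 2]
print(correlation_sketch(y_true, probs))
```

Returning 0.0 for a constant input mirrors the documented "else 0.0" return value.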

skll.metrics.f1_score_least_frequent(y_true, y_pred)
Calculate the F1 score of the least frequent label/class in y_true for y_pred.
Parameters:  y_true (array-like of float) – The true/actual/gold labels for the data.
 y_pred (array-like of float) – The predicted/observed labels for the data.
Returns: ret_score – F1 score of the least frequent label.
Return type: float
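
As a rough illustration of what this metric measures, the sketch below finds the least frequent label in y_true and computes its F1 score by hand. The function names are hypothetical and the tie-breaking rule (the first label encountered wins) is an assumption of this sketch, not documented SKLL behavior.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 score for a single label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_least_frequent_sketch(y_true, y_pred):
    counts = Counter(y_true)
    # Assumption: on ties, the first label seen in y_true is chosen
    least_frequent = min(counts, key=counts.get)
    return f1_for_label(y_true, y_pred, least_frequent)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(f1_least_frequent_sketch(y_true, y_pred))  # 0.5
```

Here label 1 is the least frequent; it has one true positive, one false positive, and one false negative, so precision and recall are both 0.5.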

skll.metrics.kappa(y_true, y_pred, weights=None, allow_off_by_one=False)
Calculates the kappa inter-rater agreement between the gold standard and the predicted ratings. Potential values range from -1 (representing complete disagreement) to 1 (representing complete agreement). A kappa value of 0 is expected if all agreement is due to chance.
In the course of calculating kappa, all items in y_true and y_pred will first be converted to floats and then rounded to integers.
It is assumed that y_true and y_pred contain the complete range of possible ratings.
This function contains a combination of code from yorchopolis’s kappa-stats and Ben Hamner’s Metrics projects on GitHub.
Parameters:  y_true (array-like of float) – The true/actual/gold labels for the data.
 y_pred (array-like of float) – The predicted/observed labels for the data.
 weights (str or np.array, optional) – Specifies the weight matrix for the calculation. Options are:
   None = unweighted kappa
   'quadratic' = quadratic-weighted kappa
   'linear' = linear-weighted kappa
   two-dimensional numpy array = a custom matrix of weights. Each weight corresponds to the \(w_{ij}\) values in the Wikipedia description of how to calculate weighted Cohen’s kappa.
  Defaults to None.
 allow_off_by_one (bool, optional) – If true, ratings that are off by one are counted as equal, and all other differences are reduced by one. For example, 1 and 2 will be considered equal, whereas 1 and 3 will have a difference of 1 when building the weights matrix. Defaults to False.
Returns: k – The kappa score, or weighted kappa score.
Return type: float
Raises: AssertionError – If y_true and y_pred differ in length.
 ValueError – If labels cannot be converted to int.
 ValueError – If the weight scheme is invalid.
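
To make the weighting options and the allow_off_by_one adjustment concrete, here is a minimal pure-Python sketch of weighted Cohen's kappa that follows the description above. It is an illustration under stated assumptions rather than SKLL's code: the name kappa_sketch is hypothetical, custom weight matrices are not supported, and returning 1.0 when the expected disagreement is zero is a choice made for this sketch.

```python
def kappa_sketch(y_true, y_pred, weights=None, allow_off_by_one=False):
    # Convert to floats, then round to integers, as the description states
    y_true = [int(round(float(y))) for y in y_true]
    y_pred = [int(round(float(y))) for y in y_pred]
    assert len(y_true) == len(y_pred)

    # The rating range is taken from the data themselves
    min_rating = min(min(y_true), min(y_pred))
    max_rating = max(max(y_true), max(y_pred))
    idx = range(max_rating - min_rating + 1)
    n = len(y_true)

    # Observed confusion matrix: rows are true ratings, columns are predictions
    observed = [[0] * len(idx) for _ in idx]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    hist_true = [sum(observed[i]) for i in idx]
    hist_pred = [sum(observed[i][j] for i in idx) for j in idx]

    def weight(i, j):
        diff = abs(i - j)
        if allow_off_by_one and diff:
            diff -= 1  # off-by-one counts as equal; larger gaps shrink by one
        if weights is None:
            return 1.0 if diff else 0.0
        if weights == "linear":
            return float(diff)
        if weights == "quadratic":
            return float(diff ** 2)
        raise ValueError(f"Invalid weight scheme: {weights!r}")

    # Weighted observed disagreement vs. disagreement expected by chance
    observed_w = sum(weight(i, j) * observed[i][j] for i in idx for j in idx)
    expected_w = sum(
        weight(i, j) * hist_true[i] * hist_pred[j] for i in idx for j in idx
    ) / n
    # Sketch-only choice: no expected disagreement is treated as full agreement
    return 1.0 - observed_w / expected_w if expected_w else 1.0


gold = [1, 2, 3, 1, 2, 3]
pred = [1, 2, 3, 3, 2, 1]
print(kappa_sketch(gold, pred))                       # 0.5
print(kappa_sketch(gold, pred, weights="quadratic"))  # 0.0
print(kappa_sketch(gold, pred, weights="quadratic", allow_off_by_one=True))
```

In the example, the two disagreements are each two points apart, so quadratic weighting penalizes them heavily enough to drive kappa to chance level (0.0), while unweighted kappa stays at 0.5.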

skll.metrics.register_custom_metric(custom_metric_path, custom_metric_name)
Import, load, and register the custom metric function from the given path.
Parameters:  custom_metric_path (str) – The path to a custom metric.
 custom_metric_name (str) – The name of the custom metric function to load. This function must take only two array-like arguments: the true labels and the predictions, in that order.
Raises: ValueError – If the custom metric path does not end in ‘.py’.
 NameError – If the name of the custom metric file conflicts with an already existing attribute in skll.metrics or if the custom metric name conflicts with a scikit-learn or SKLL metric.
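
A custom metric that satisfies the two-argument requirement might look like the sketch below. The file name custom_metrics.py and the function fraction_within_one are hypothetical, chosen only for illustration; the registration call is shown commented out because it requires SKLL to be installed and the file to exist on disk.

```python
# Contents of a hypothetical file named custom_metrics.py


def fraction_within_one(y_true, y_pred):
    """Fraction of predictions within one point of the true label.

    Takes exactly two array-like arguments, true labels first and
    predictions second, as register_custom_metric requires.
    """
    pairs = list(zip(y_true, y_pred))
    return sum(1 for t, p in pairs if abs(t - p) <= 1) / len(pairs)


# Registering the metric would then look like this (requires SKLL):
# from skll.metrics import register_custom_metric
# register_custom_metric("custom_metrics.py", "fraction_within_one")

print(fraction_within_one([1, 2, 3, 4], [1, 3, 5, 4]))  # 0.75
```

The function name should not clash with an existing scikit-learn or SKLL metric, since that would trigger the NameError described above.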

skll.metrics.use_score_func(func_name, y_true, y_pred)
Call the scoring function in sklearn.metrics.SCORERS with the given name. This takes care of handling keyword arguments that were pre-specified when creating the scorer. This applies any sign-flipping that was specified by make_scorer() when the scorer was created.
Parameters:  func_name (str) – The name of the objective function to use from SCORERS.
 y_true (array-like of float) – The true/actual/gold labels for the data.
 y_pred (array-like of float) – The predicted/observed labels for the data.
Returns: ret_score – The scored result from the given scorer.
Return type: float
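
The two behaviors mentioned above, pre-specified keyword arguments and sign-flipping, can be sketched without scikit-learn. Everything in this block is a hypothetical stand-in: make_scorer_sketch, SCORERS_SKETCH, and use_score_func_sketch only mimic the idea and are not the scikit-learn or SKLL implementations.

```python
def mean_squared_error(y_true, y_pred):
    """A loss: lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fraction_within(y_true, y_pred, tolerance=0):
    """A score: fraction of predictions within `tolerance` of the truth."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


def make_scorer_sketch(score_func, greater_is_better=True, **kwargs):
    # Store the function, its sign, and any pre-specified keyword arguments
    sign = 1 if greater_is_better else -1
    return {"func": score_func, "sign": sign, "kwargs": kwargs}


# Hypothetical registry standing in for a name-to-scorer mapping
SCORERS_SKETCH = {
    "neg_mse": make_scorer_sketch(mean_squared_error, greater_is_better=False),
    "within_one": make_scorer_sketch(fraction_within, tolerance=1),
}


def use_score_func_sketch(func_name, y_true, y_pred):
    scorer = SCORERS_SKETCH[func_name]
    # Apply the stored keyword arguments, then flip the sign for losses
    raw = scorer["func"](y_true, y_pred, **scorer["kwargs"])
    return scorer["sign"] * raw


y_true = [1, 2, 3]
y_pred = [1, 3, 5]
print(use_score_func_sketch("neg_mse", y_true, y_pred))     # negative of the MSE
print(use_score_func_sketch("within_one", y_true, y_pred))  # uses tolerance=1
```

Flipping the sign of a loss means that a larger returned value is always better, so callers can compare scores from different scorers the same way.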