Solution proposal - exercise 8
Support Vector Machines
INF4300 - Digital Image Analysis
Ole-Johan Skrede
University of Oslo
October 2017
Task 1
Note: This is a somewhat elaborate explanation, but you will find the answer to Task 1 in it somewhere.
We introduce the Support Vector Machines with a binary, linearly separable classification problem. Let $\mathbf{x} \in \mathbb{R}^d$ be a $d$-dimensional feature vector, and $y \in \{0, 1\}$ the label of the class which $\mathbf{x}$ belongs to. In the case of a binary classification, we have a training set

$$\mathcal{T} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$$

consisting of examples from two classes: $\omega_0$ and $\omega_1$. (Notice that I use the more general "label 0" and "label 1" instead of the labels $-1$ and $+1$ often seen in SVM treatments, which I find a bit limiting and confusing.) To make things easier below, we define the index sets $I_0 = \{i : y_i = 0\}$ and $I_1 = \{i : y_i = 1\}$.
We know that the data points (the feature vectors $\mathbf{x}_i$) are linearly separable, which means that there exist (infinitely many) hyperplanes of the form

$$\mathbf{w}^T \mathbf{x} + b = 0$$

such that $\mathbf{w}^T \mathbf{x}_i + b > 0$ for all $i \in I_0$, and $\mathbf{w}^T \mathbf{x}_i + b < 0$ for all $i \in I_1$.
We are going to write this last result in a particular form. In order to illustrate it, we use the 2D example shown in Figure 1.
In this example, all red data points belong to class 0 and all green data points belong to class 1. We have drawn three line segments:
- A: The line segment “closest” to class 0 such that all elements in class 0 are above it, and all elements in class 1 are below it.
- C: The line segment “closest” to class 1 such that all elements in class 1 are below it, and all elements in class 0 are above it. It has the same slope (governed by $\mathbf{w}$) as line segment A.
- B: The line segment with the same slope that lies in the middle of A and C.
If you study the figure, you will notice that

$$\mathbf{w}^T \mathbf{x}_i + b \geq c_0, \quad \forall i \in I_0$$

or

$$\mathbf{w}^T \mathbf{x}_i + b \leq -c_1, \quad \forall i \in I_1,$$

for some positive constants $c_0$ and $c_1$. If we scale the equations in the set with $1/c_0$ and $1/c_1$ respectively, we end up with

$$\tilde{\mathbf{w}}^T \mathbf{x}_i + \tilde{b} \geq 1, \quad \forall i \in I_0, \qquad \tilde{\mathbf{w}}^T \mathbf{x}_i + \tilde{b} \leq -1, \quad \forall i \in I_1.$$

By introducing some indicator variables

$$s_i = \begin{cases} +1 & \text{if } i \in I_0, \\ -1 & \text{if } i \in I_1, \end{cases}$$

and multiplying them into the equations above, we get

$$s_i(\tilde{\mathbf{w}}^T \mathbf{x}_i + \tilde{b}) \geq 1, \quad \forall i.$$

Updating $\mathbf{w} \leftarrow \tilde{\mathbf{w}}$ and $b \leftarrow \tilde{b}$, we end up with the equation

$$s_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1, \quad i = 1, \dots, n.$$

This illustrates that we have some freedom to vary $\mathbf{w}$ and $b$ within the constraints of the last equation.
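For completeness: the distance between the parallel lines A and C, the margin, is $2 / \|\mathbf{w}\|$, and the Support Vector Machine chooses the hyperplane B that maximizes it. This yields the standard hard-margin formulation

$$\min_{\mathbf{w},\, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad s_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1, \quad i = 1, \dots, n.$$

The data points that satisfy the constraint with equality (the points lying on A and C) are the support vectors.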
Task 2
First, let us define the data.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
X_0 = np.array([[1, 9], [5, 5], [1, 1]])
X_1 = np.array([[8, 5], [13, 1], [13, 9]])
# Concatenate them and provide an extra label vector
X = np.vstack((X_0, X_1))
Y = np.array([0, 0, 0, 1, 1, 1])
Then, we create a scatter plot function that will be used throughout this exercise.
def make_meshgrid(X, h=.02, x_range=None, y_range=None):
    """Make a meshgrid covering the range of X. This is used to display classification regions.
    Args:
        X: numpy array of shape [n, 2] containing n 2d feature vectors
        h: step size controlling the resolution of the grid
        x_range, y_range: optional (min, max) tuples overriding the range derived from X
    """
if x_range is None:
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
else:
x_min, x_max = x_range
if y_range is None:
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
else:
y_min, y_max = y_range
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
return xx, yy
def scatter(X, Y, xx=None, yy=None, Z=None, plot_mean=False, x_range=None, y_range=None):
"""
Scatter plot with optional classification area and mean plot
Args:
X: numpy array of shape [n, 2] where n is the total number of datapoints
        Y: numpy array of shape [n] containing the labels {0, 1, ...} of X
        xx: meshgrid x
        yy: meshgrid y
        Z: The result of applying some prediction function on all points in xx and yy
        plot_mean: if True, also plot the two class means
        x_range, y_range: optional (min, max) tuples overriding the plot limits
    """
    cmap = plt.cm.coolwarm
if x_range is None:
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
else:
x_min, x_max = x_range
if y_range is None:
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
else:
y_min, y_max = y_range
plt.figure(figsize=(10, 10))
if xx is not None and yy is not None and Z is not None:
# Color class regions
plt.gca().contourf(xx, yy, Z, cmap=cmap, alpha=0.7)
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=cmap, marker='o', edgecolors='k')
if plot_mean:
mean_0 = np.mean(X[Y == 0], axis=0)
mean_1 = np.mean(X[Y == 1], axis=0)
        plt.scatter(mean_0[0], mean_0[1], c='cyan', marker='x')
        plt.scatter(mean_1[0], mean_1[1], c='magenta', marker='x')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
    plt.gca().set_aspect('equal')
plt.show()
In this case, the decision boundary will be the vertical line at $x = 6.5$, halfway between the two closest points. The support vectors will be the data points closest to the decision boundary, namely $(5, 5)$ for class 0 and $(8, 5)$ for class 1.
# Create a meshgrid of our domain
xx, yy = make_meshgrid(X)
# Define our class regions according to the decision boundary above
Z = (xx > 6.5)*1.0
scatter(X, Y, xx, yy, Z)
Task 3
a)
X = np.array([[2, 2], [3, 3], [4, 4], [5, 5], [4, 6], [3, 7], [4, 8], [5, 9], [6, 10],
[6, 2], [7, 3], [8, 4], [9, 5], [8, 6], [7, 7], [7, 8], [7, 9], [8, 10]])
Y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
scatter(X, Y)
b)
In the case of a Gaussian classifier with a scalar identity covariance matrix, the decision boundary (assuming equal priors) would be the line equidistant to the two class means, and normal to the line segment joining them. So, let us create a function that computes this line segment.
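In symbols, the boundary is

$$\{\mathbf{x} : \|\mathbf{x} - \boldsymbol{\mu}_0\| = \|\mathbf{x} - \boldsymbol{\mu}_1\|\},$$

the perpendicular bisector of the segment joining the class means $\boldsymbol{\mu}_0$ and $\boldsymbol{\mu}_1$; this is what compute_normal below implements.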
def compute_normal(X, Y):
"""
    Computes the line consisting of points equidistant from the means
    of the two classes in (X, Y).
    A line through a point (x0, y0) and normal to the vector v = [a, b]
    obeys the equation
        a(x - x0) + b(y - y0) = 0.
    In this case, the vector between the mean points is our normal vector,
    and the point it should pass through is the midpoint between the means.
"""
X_0 = X[Y == 0]
X_1 = X[Y == 1]
mean_0 = np.mean(X_0, axis=0)
mean_1 = np.mean(X_1, axis=0)
avg_01 = [(mean_0[0] + mean_1[0])/2, (mean_0[1] + mean_1[1])/2]
# Compute min and max for x, mostly for plotting reasons
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
# Compute the line equidistant to both means
    if np.isclose(mean_0[1], mean_1[1]):  # Vertical decision boundary
        y = np.linspace(y_min, y_max)
        x = np.full_like(y, avg_01[0])
else:
x = np.linspace(x_min, x_max)
y = avg_01[1] - (mean_1[0] - mean_0[0])/(mean_1[1] - mean_0[1])*(x - avg_01[0])
return np.column_stack((x, y))
Now, we can use this function to plot the mean values together with the decision boundary.
decision_boundary = compute_normal(X, Y)
# In this case, the decision boundary happens to also be vertical
xx, yy = make_meshgrid(X)
Z = (xx > decision_boundary[0, 0])*1.0
scatter(X, Y, xx, yy, Z, plot_mean=True)
c)
We see that we classify one data point from class 0, namely $(6, 10)$, as class 1. Since we have 18 data points in total, this is an error rate of $1 / 18$, or about $5.6\%$.
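As a quick sanity check of this claim (assuming X, Y, and decision_boundary from b) are still in scope), we can classify the training points with the mean-based boundary:
# Class 1 lies to the right of the vertical boundary
predictions = (X[:, 0] > decision_boundary[0, 0]).astype(int)
print("Error rate: {:.3f}".format(np.mean(predictions != Y)))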
d)
In this case, we can see (by inspection of the scatter plot) that the support vectors are $(5, 5)$, $(6, 10)$, and $(7, 9)$. In order to draw the classification area, I use the svm module from the sklearn library. We will explore this library further below, so no explanation is given here.
from sklearn import svm
classifier = svm.SVC(C=100.0, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
e)
As we can see, the SVM classifies the training set with an error rate of 0.
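We can verify this directly from the fitted classifier:
# Fraction of training points the SVM gets wrong (should be 0 here)
print("Training error rate:", np.mean(classifier.predict(X) != Y))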
Task 4
Assuming that we have downloaded the data and put it in a reasonable location, we can take a look at it.
a)
## Load the data
import scipy.io
normaldistdata = scipy.io.loadmat('../../images/mynormaldistdataset.mat')
print("Normaldistdata keys: ", normaldistdata.keys())
bananadata = scipy.io.loadmat('../../images/mybananadataset.mat')
print("Bananadata keys: ", bananadata.keys())
Normaldistdata keys: dict_keys(['A', '__version__', '__header__', '__globals__', 'a'])
Bananadata keys: dict_keys(['__version__', 'A', 'B', 'a', '__header__', '__globals__', 'b'])
In this case, A contains data points and a contains corresponding labels. Similarly, b contains the labels for the data in B.
First, let us look at the data in normaldistdata.
X = normaldistdata['A']
Y = np.squeeze(normaldistdata['a'])
print("Number of datapoints in class 0: ", len(X[Y == 0]))
print("Number of datapoints in class 1: ", len(X[Y == 1]))
scatter(X, Y)
Number of datapoints in class 0: 100
Number of datapoints in class 1: 100
Then, dataset A in mybananadataset.
X = bananadata['A']
Y = np.squeeze(bananadata['a'])
print("Number of datapoints in class 0: ", len(X[Y == 0]))
print("Number of datapoints in class 1: ", len(X[Y == 1]))
scatter(X, Y)
Number of datapoints in class 0: 94
Number of datapoints in class 1: 106
And finally, dataset B in mybananadataset.
X = bananadata['B']
Y = np.squeeze(bananadata['b'])
print("Number of datapoints in class 0: ", len(X[Y == 0]))
print("Number of datapoints in class 1: ", len(X[Y == 1]))
scatter(X, Y)
Number of datapoints in class 0: 94
Number of datapoints in class 1: 106
b)
First, we create some functions that we can use to evaluate the results.
from collections import defaultdict
def evaluate(reference, proposal):
"""Compute evaluation metrics based on the confusion matrix of the proposed labeling,
given a reference labeling.
    Confusion matrix relative to class 1:

                      PROPOSAL
                   class    0    1
        REFERENCE      0   TN   FP
                       1   FN   TP

    Note that in a multilabel (> 2) case, it would be useful to compute a metric for each
    label. In the binary case, we can simplify by naming label 1 positive and label 0
    negative, and get one value for each metric.
    """
num_classes = 2
cm = np.zeros((num_classes, num_classes))
for i, ref_val in enumerate(reference):
prop_val = proposal[i]
cm[ref_val, prop_val] += 1
metrics = defaultdict(list)
for c in range(num_classes):
metrics['TP'].append(cm[c, c]) # True positive
metrics['FP'].append(np.sum(cm[:, c]) - metrics['TP'][c]) # False positive
metrics['FN'].append(np.sum(cm[c, :]) - metrics['TP'][c]) # False negative
metrics['TN'].append(np.sum(cm) - metrics['TP'][c] - metrics['FN'][c] - metrics['FP'][c]) # True negative
# Sensitivity
if metrics['FN'][c] + metrics['TP'][c] > 0:
metrics['tpr'].append(metrics['TP'][c] / (metrics['FN'][c] + metrics['TP'][c]))
else:
metrics['tpr'].append(0.0)
# Specificity
if metrics['FP'][c] + metrics['TN'][c] > 0:
metrics['tnr'].append(metrics['TN'][c] / (metrics['FP'][c] + metrics['TN'][c]))
else:
metrics['tnr'].append(0.0)
# Precision
if metrics['FP'][c] + metrics['TP'][c] > 0:
metrics['ppv'].append(metrics['TP'][c] / (metrics['FP'][c] + metrics['TP'][c]))
else:
metrics['ppv'].append(0.0)
# Accuracy
if np.sum(cm) > 0:
metrics['acc'].append((metrics['TP'][c] + metrics['TN'][c]) / np.sum(cm))
else:
metrics['acc'].append(0.0)
return metrics
def average_metrics(list_of_metrics):
"""Average list of metrics."""
sum_metrics = {}
for metrics in list_of_metrics:
for key, metric in metrics.items():
if key in sum_metrics.keys():
sum_metrics[key] += np.array(metric)
else:
sum_metrics[key] = np.array(metric)
avg_metrics = {}
for key, val in sum_metrics.items():
avg_metrics[key] = val / len(list_of_metrics)
return avg_metrics
def pretty_print(metrics):
"""Print metrics in a table"""
print_header = True
for name, values in sorted(metrics.items()):
if print_header:
printstr = "{0:<20} ".format("Metric name")
for i, _ in enumerate(values):
printstr += "Label {0:<2} ".format(i)
printstr += "Average"
print_header = False
print(printstr)
printstr = "{0:<20} ".format(name)
for val in values:
printstr += "{0:>8,.3f} ".format(val)
printstr += "{0:>8,.3f}".format(np.mean(values))
print(printstr)
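Before applying these helpers to real output, here is a tiny hand-made sanity check (my own example, not part of the exercise), where one class-0 point is proposed as class 1:
# Expect one FP for class 1 and, equivalently, one FN for class 0
ref = np.array([0, 0, 1, 1])
prop = np.array([0, 1, 1, 1])
pretty_print(evaluate(ref, prop))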
Now, we can start to explore the svm module.
from sklearn import svm
X = normaldistdata['A']
Y = np.squeeze(normaldistdata['a'])
classifier = svm.SVC()
classifier.fit(X, Y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
The model is now trained on the data, with the default hyperparameter values listed above. We can use this to classify new data points.
print("Point (1, 2) is predicted to have label", classifier.predict([[1, 2]]))
Point (1, 2) is predicted to have label [1]
The parameters we care about in this task are C, kernel, and gamma. In this subtask we shall use a linear kernel, and various values of $C$. Let us try with $C = 1$ first.
classifier = svm.SVC(C=1.0, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 2.000 0.000 1.000
FP 0.000 2.000 1.000
TN 100.000 98.000 99.000
TP 98.000 100.000 99.000
acc 0.990 0.990 0.990
ppv 1.000 0.980 0.990
tnr 1.000 0.980 0.990
tpr 0.980 1.000 0.990
It seems like some of the outliers are misclassified. By increasing $C$, we put a higher penalty on misclassified training points, and can try to alleviate this.
classifier = svm.SVC(C=10.0, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 0.000 1.000 0.500
FP 1.000 0.000 0.500
TN 99.000 100.000 99.500
TP 100.000 99.000 99.500
acc 0.995 0.995 0.995
ppv 0.990 1.000 0.995
tnr 0.990 1.000 0.995
tpr 1.000 0.990 0.995
This is better, but it still seems like some of the outliers, now from the other class, are misclassified. We will try to increase $C$ once more.
classifier = svm.SVC(C=100.0, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 0.000 0.000 0.000
FP 0.000 0.000 0.000
TN 100.000 100.000 100.000
TP 100.000 100.000 100.000
acc 1.000 1.000 1.000
ppv 1.000 1.000 1.000
tnr 1.000 1.000 1.000
tpr 1.000 1.000 1.000
We are now able to classify the training set perfectly.
c)
The classifier also gives information about the support vectors.
print("Support vectors: ", classifier.support_vectors_)
print("Support vector indices: ", classifier.support_)
print("Number of support vectors for class 0 and class respectively: ", classifier.n_support_)
Support vectors: [[ 0.07008741 0.16943029]
[ 0.05 0.25 ]
[ 0.2316275 0.28493687]]
Support vector indices: [114 199 39]
Number of support vectors for class 0 and class 1, respectively: [2 1]
So, we can try to remove some vectors whose indices are not [114, 199, 39].
X_sub = X[30:, :]
Y_sub = Y[30:]
classifier = svm.SVC(C=100.0, kernel='linear')
classifier.fit(X_sub, Y_sub)
xx, yy = make_meshgrid(X_sub)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X_sub, Y_sub, xx, yy, Z)
pretty_print(evaluate(Y_sub, classifier.predict(X_sub)))
Metric name Label 0 Label 1 Average
FN 0.000 0.000 0.000
FP 0.000 0.000 0.000
TN 70.000 100.000 85.000
TP 100.000 70.000 85.000
acc 1.000 1.000 1.000
ppv 1.000 1.000 1.000
tnr 1.000 1.000 1.000
tpr 1.000 1.000 1.000
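One way to verify that the boundary really is unchanged is to compare the fitted hyperplane parameters, which a linear-kernel SVC exposes through coef_ and intercept_:
# The two hyperplanes should be (near) identical, since only
# non-support vectors were removed from the training set
clf_full = svm.SVC(C=100.0, kernel='linear').fit(X, Y)
clf_sub = svm.SVC(C=100.0, kernel='linear').fit(X_sub, Y_sub)
print("Full data: w =", clf_full.coef_, " b =", clf_full.intercept_)
print("Subset:    w =", clf_sub.coef_, " b =", clf_sub.intercept_)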
As expected, removing non-support vectors did not change the decision boundary, but let us try removing all except the support vectors.
X_sub = np.array([X[114, :], X[199, :], X[39, :]])
Y_sub = np.array([Y[114], Y[199], Y[39]])
classifier = svm.SVC(C=100.0, kernel='linear')
classifier.fit(X_sub, Y_sub)
xx, yy = make_meshgrid(X_sub, x_range=(-3, 3), y_range=(-3, 3))
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X_sub, Y_sub, xx, yy, Z, x_range=(-3, 3), y_range=(-3, 3))
pretty_print(evaluate(Y_sub, classifier.predict(X_sub)))
Metric name Label 0 Label 1 Average
FN 0.000 0.000 0.000
FP 0.000 0.000 0.000
TN 1.000 2.000 1.500
TP 2.000 1.000 1.500
acc 1.000 1.000 1.000
ppv 1.000 1.000 1.000
tnr 1.000 1.000 1.000
tpr 1.000 1.000 1.000
d)
As we see below, the linear SVM does a decent job of separating the two classes, but it is of course unable to fit this data perfectly.
X = bananadata['A']
Y = np.squeeze(bananadata['a'])
classifier = svm.SVC(C=0.001, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 16.000 11.000 13.500
FP 11.000 16.000 13.500
TN 95.000 78.000 86.500
TP 78.000 95.000 86.500
acc 0.865 0.865 0.865
ppv 0.876 0.856 0.866
tnr 0.896 0.830 0.863
tpr 0.830 0.896 0.863
classifier = svm.SVC(C=0.1, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 11.000 12.000 11.500
FP 12.000 11.000 11.500
TN 94.000 83.000 88.500
TP 83.000 94.000 88.500
acc 0.885 0.885 0.885
ppv 0.874 0.895 0.884
tnr 0.887 0.883 0.885
tpr 0.883 0.887 0.885
classifier = svm.SVC(C=10.0, kernel='linear')
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 13.000 12.000 12.500
FP 12.000 13.000 12.500
TN 94.000 81.000 87.500
TP 81.000 94.000 87.500
acc 0.875 0.875 0.875
ppv 0.871 0.879 0.875
tnr 0.887 0.862 0.874
tpr 0.862 0.887 0.874
e)
Let us try some different values of $\gamma$. The default value seems to be okay.
num_features = 2
classifier = svm.SVC(C=1.0, kernel='rbf', gamma=1/num_features) # This is the default gamma
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 0.000 0.000 0.000
FP 0.000 0.000 0.000
TN 106.000 94.000 100.000
TP 94.000 106.000 100.000
acc 1.000 1.000 1.000
ppv 1.000 1.000 1.000
tnr 1.000 1.000 1.000
tpr 1.000 1.000 1.000
Reducing $\gamma$ seems to underfit the data.
classifier = svm.SVC(C=1.0, kernel='rbf', gamma=1e-2)
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 9.000 8.000 8.500
FP 8.000 9.000 8.500
TN 98.000 85.000 91.500
TP 85.000 98.000 91.500
acc 0.915 0.915 0.915
ppv 0.914 0.916 0.915
tnr 0.925 0.904 0.914
tpr 0.904 0.925 0.914
Increasing it, on the other hand, seems to overfit.
classifier = svm.SVC(C=1.0, kernel='rbf', gamma=10)
classifier.fit(X, Y)
xx, yy = make_meshgrid(X)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X, Y, xx, yy, Z)
pretty_print(evaluate(Y, classifier.predict(X)))
Metric name Label 0 Label 1 Average
FN 0.000 0.000 0.000
FP 0.000 0.000 0.000
TN 106.000 94.000 100.000
TP 94.000 106.000 100.000
acc 1.000 1.000 1.000
ppv 1.000 1.000 1.000
tnr 1.000 1.000 1.000
tpr 1.000 1.000 1.000
f)
Manual grid search with cross-validation
We create a function to do the whole thing.
def manual_grid_search_with_cross_validation(C_arr, gamma_arr):
"""Do a grid search of values in C_arr and gamma_arr using cross-validation.
Note that the sample size of 200 is hard-coded, and the 10-fold cross validation is
carried out with this in mind.
Return a dict of dicts of dicts with evaluation metrics.
metrics_collection[C][gamma] = [metrics_1, metrics_2, ..., metrics_10]
"""
    # We partition the data into 10 random partitions (since we are doing 10-fold cross-validation).
    # One way of doing this is to shuffle the indices [0, n-1] and then use a different slice of
    # them as the test fold in each iteration. Note that X and Y are read from the enclosing scope.
    num_samples = len(Y)  # n = 200 in this case
    indices = np.arange(0, num_samples)
    np.random.shuffle(indices)
metrics_collection = {}
for C in C_arr:
gamma_metrics = {}
for gamma in gamma_arr:
list_of_metrics = []
for i in range(10):
# Select indices for train and test
start_ind = 20*i
end_ind = start_ind + 20
temp_test_indices = indices[start_ind:end_ind]
test_indices = np.zeros(num_samples, dtype=bool)
test_indices[temp_test_indices] = True
                train_indices = ~test_indices
# Partition data
X_train = X[train_indices, :]
Y_train = Y[train_indices]
X_test = X[test_indices, :]
Y_test = Y[test_indices]
# Train
classifier = svm.SVC(C=C, kernel='rbf', gamma=gamma)
classifier.fit(X_train, Y_train)
# Test
Y_test_proposal = classifier.predict(X_test)
# Evaluate
list_of_metrics.append(evaluate(Y_test, Y_test_proposal))
gamma_metrics[gamma] = average_metrics(list_of_metrics)
metrics_collection[C] = gamma_metrics
return metrics_collection
Let us create a printout so that we can make sense of the results.
def print_table(metrics_collection, label=None):
"""Print a table of results from each iteration of the grid search.
Either for a given label or the average over labels (default)
"""
    header_str = "{0:>11} {1:>11}".format("C", "gamma")
    print_header = True
max_vals = {}
for C, gamma_dict in sorted(metrics_collection.items()):
for gamma, avg_met in sorted(gamma_dict.items()):
if print_header:
for key, _ in sorted(avg_met.items()):
header_str += " {0:>11}".format(key)
max_vals[key] = [0.0, 0.0, 0.0]
print(header_str)
print_header = False
result_str = "{0:>11,.4f} {1:>11,.4f}".format(C, gamma)
for key, values in sorted(avg_met.items()):
if label is None:
this_val = np.mean(values)
else:
this_val = values[label]
result_str += " {0:>11,.3f}".format(this_val)
if this_val > max_vals[key][2]:
max_vals[key] = [C, gamma, this_val]
print(result_str)
print("\nMaximum values")
print("{0:<20} {1:>7} {2:>8} {3:>8}".format("Metric name", "value", "C", "gamma"))
for key, val in sorted(max_vals.items()):
print("{0:<20} {1:>7,.3f} {2:>8,.4f} {3:>8,.4f}".format(key, val[2], val[0], val[1]))
def result_matrix(metrics_collection, metric, label):
    """Return a numpy array of values given a metric and a label, together with dicts that map
    keys to indices and vice versa."""
    num_gamma_values = len(next(iter(metrics_collection.values())))
    num_C_values = len(metrics_collection)
    result = np.zeros((num_C_values, num_gamma_values))
    C_map = {"key2ind": {}, "ind2key": {}}
    gamma_map = {"key2ind": {}, "ind2key": {}}
    for C_ind, (C, gamma_dict) in enumerate(sorted(metrics_collection.items())):
        C_map["key2ind"][C] = C_ind
        C_map["ind2key"][C_ind] = C
        for gamma_ind, (gamma, avg_metric) in enumerate(sorted(gamma_dict.items())):
            gamma_map["key2ind"][gamma] = gamma_ind
            gamma_map["ind2key"][gamma_ind] = gamma
            result[C_ind, gamma_ind] = avg_metric[metric][label]
    return result, C_map, gamma_map
def result_array(metrics_collection):
    """Create a 4D array of metric results and return it, together with a map to navigate the values in the array.
    Args:
        metrics_collection: Dictionary of dictionaries of metrics: metrics_collection[C][gamma] = avg_metrics
    Returns:
        result: A numpy array of shape [num_labels, num_C_values, num_gamma_values, num_metrics]
        array_map: A dict of dicts of dicts: array_map[field][mapping][key/ind] = ind/key
    """
    # Peek at one avg_metrics dict to determine the dimensions
    first_avg_met = next(iter(next(iter(metrics_collection.values())).values()))
    num_labels = len(next(iter(first_avg_met.values())))
    num_metrics = len(first_avg_met)
    num_gamma_values = len(next(iter(metrics_collection.values())))
    num_C_values = len(metrics_collection)
    result = np.zeros((num_labels, num_C_values, num_gamma_values, num_metrics))
    label_map = {"key2ind": {label: label for label in range(num_labels)},
                 "ind2key": {label: label for label in range(num_labels)}}
    metric_map = {"key2ind": {}, "ind2key": {}}
    C_map, gamma_map = {}, {}
    for label in range(num_labels):
        for metric_ind, metric_key in enumerate(sorted(first_avg_met.keys())):
            metric_map["key2ind"][metric_key] = metric_ind
            metric_map["ind2key"][metric_ind] = metric_key
            C_gamma_matrix, C_map, gamma_map = result_matrix(metrics_collection, metric_key, label)
            result[label, :, :, metric_ind] = C_gamma_matrix
    array_map = {"label": label_map, "C": C_map, "gamma": gamma_map, "metric": metric_map}
    return result, array_map
We are now ready to make some actual experiments.
X = bananadata['A']
Y = np.squeeze(bananadata['a'])
# Note: we do not use the ranges for `C` and `gamma` suggested in the lecture notes.
C_arr = np.power(2.0, np.arange(-5, 10, 2))
gamma_arr = np.power(2.0, np.arange(-5, 6, 2))
metrics_collection = manual_grid_search_with_cross_validation(C_arr, gamma_arr)
print_table(metrics_collection)
C gamma FN FP TN TP acc ppv tnr tpr
0.0312 0.0312 1.250 1.250 8.750 8.750 0.875 0.870 0.871 0.871
0.0312 0.1250 1.700 1.700 8.300 8.300 0.830 0.879 0.839 0.839
0.0312 0.5000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
0.0312 2.0000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
0.0312 8.0000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
0.0312 32.0000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
0.1250 0.0312 0.750 0.750 9.250 9.250 0.925 0.928 0.928 0.928
0.1250 0.1250 0.300 0.300 9.700 9.700 0.970 0.971 0.973 0.973
0.1250 0.5000 0.100 0.100 9.900 9.900 0.990 0.990 0.991 0.991
0.1250 2.0000 4.650 4.650 5.350 5.350 0.535 0.316 0.506 0.506
0.1250 8.0000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
0.1250 32.0000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
0.5000 0.0312 0.450 0.450 9.550 9.550 0.955 0.956 0.954 0.954
0.5000 0.1250 0.100 0.100 9.900 9.900 0.990 0.990 0.991 0.991
0.5000 0.5000 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
0.5000 2.0000 0.350 0.350 9.650 9.650 0.965 0.960 0.972 0.972
0.5000 8.0000 2.900 2.900 7.100 7.100 0.710 0.825 0.700 0.700
0.5000 32.0000 4.700 4.700 5.300 5.300 0.530 0.265 0.500 0.500
2.0000 0.0312 0.150 0.150 9.850 9.850 0.985 0.986 0.986 0.986
2.0000 0.1250 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
2.0000 0.5000 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
2.0000 2.0000 0.100 0.100 9.900 9.900 0.990 0.988 0.992 0.992
2.0000 8.0000 1.400 1.400 8.600 8.600 0.860 0.888 0.874 0.874
2.0000 32.0000 3.150 3.150 6.850 6.850 0.685 0.814 0.669 0.669
8.0000 0.0312 0.100 0.100 9.900 9.900 0.990 0.991 0.991 0.991
8.0000 0.1250 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
8.0000 0.5000 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
8.0000 2.0000 0.100 0.100 9.900 9.900 0.990 0.988 0.992 0.992
8.0000 8.0000 1.400 1.400 8.600 8.600 0.860 0.888 0.874 0.874
8.0000 32.0000 3.150 3.150 6.850 6.850 0.685 0.814 0.669 0.669
32.0000 0.0312 0.100 0.100 9.900 9.900 0.990 0.991 0.988 0.988
32.0000 0.1250 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
32.0000 0.5000 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
32.0000 2.0000 0.100 0.100 9.900 9.900 0.990 0.988 0.992 0.992
32.0000 8.0000 1.400 1.400 8.600 8.600 0.860 0.888 0.874 0.874
32.0000 32.0000 3.150 3.150 6.850 6.850 0.685 0.814 0.669 0.669
128.0000 0.0312 0.000 0.000 10.000 10.000 1.000 1.000 1.000 1.000
128.0000 0.1250 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
128.0000 0.5000 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
128.0000 2.0000 0.100 0.100 9.900 9.900 0.990 0.988 0.992 0.992
128.0000 8.0000 1.400 1.400 8.600 8.600 0.860 0.888 0.874 0.874
128.0000 32.0000 3.150 3.150 6.850 6.850 0.685 0.814 0.669 0.669
512.0000 0.0312 0.100 0.100 9.900 9.900 0.990 0.991 0.990 0.990
512.0000 0.1250 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
512.0000 0.5000 0.050 0.050 9.950 9.950 0.995 0.995 0.995 0.995
512.0000 2.0000 0.100 0.100 9.900 9.900 0.990 0.988 0.992 0.992
512.0000 8.0000 1.400 1.400 8.600 8.600 0.860 0.888 0.874 0.874
512.0000 32.0000 3.150 3.150 6.850 6.850 0.685 0.814 0.669 0.669
Maximum values
Metric name value C gamma
FN 4.700 0.0312 0.5000
FP 4.700 0.0312 0.5000
TN 10.000 128.0000 0.0312
TP 10.000 128.0000 0.0312
acc 1.000 128.0000 0.0312
ppv 1.000 128.0000 0.0312
tnr 1.000 128.0000 0.0312
tpr 1.000 128.0000 0.0312
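As an aside, the result_matrix helper defined earlier can pull a single metric out of metrics_collection as a C-by-gamma matrix; a hypothetical usage (not part of the run above):
# Accuracy grid for label 0: rows indexed by C, columns by gamma
acc_matrix, C_map, gamma_map = result_matrix(metrics_collection, 'acc', 0)
print(acc_matrix)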
Note: A pseudo-random number generator is invoked both in the partitioning of the data and internally in the SVM classifier. The analysis may therefore vary from run to run. Ideally, we should have set a random seed in order to reproduce the results.
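A minimal sketch of such seeding, assuming it suffices to fix NumPy's global generator (which drives the partition shuffling above):
np.random.seed(42)  # arbitrary, but fixed, seed
metrics_collection = manual_grid_search_with_cross_validation(C_arr, gamma_arr)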
From this run, it looks like $C = 128$ and $\gamma = 0.0312$ is the most promising parameter combination. We will now retrain the model using these values on the entire A dataset, and test it on the B dataset.
X_train = bananadata['A']
Y_train = np.squeeze(bananadata['a'])
classifier = svm.SVC(C=128.0, kernel='rbf', gamma=0.0312)
classifier.fit(X_train, Y_train)
X_test = bananadata['B']
Y_test = np.squeeze(bananadata['b'])
xx, yy = make_meshgrid(X_test)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X_test, Y_test, xx, yy, Z)
pretty_print(evaluate(Y_test, classifier.predict(X_test)))
Metric name Label 0 Label 1 Average
FN 1.000 2.000 1.500
FP 2.000 1.000 1.500
TN 104.000 93.000 98.500
TP 93.000 104.000 98.500
acc 0.985 0.985 0.985
ppv 0.979 0.990 0.985
tnr 0.981 0.989 0.985
tpr 0.989 0.981 0.985
In this case it looks like we achieved quite a good result on the test set. For fun, let us try some of the other promising candidates, e.g. $C = 8$ and $\gamma = 0.5$.
classifier = svm.SVC(C=8.0, kernel='rbf', gamma=0.5)
classifier.fit(X_train, Y_train)
X_test = bananadata['B']
Y_test = np.squeeze(bananadata['b'])
xx, yy = make_meshgrid(X_test)
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
scatter(X_test, Y_test, xx, yy, Z)
pretty_print(evaluate(Y_test, classifier.predict(X_test)))
Metric name Label 0 Label 1 Average
FN 1.000 1.000 1.000
FP 1.000 1.000 1.000
TN 105.000 93.000 99.000
TP 93.000 105.000 99.000
acc 0.990 0.990 0.990
ppv 0.989 0.991 0.990
tnr 0.991 0.989 0.990
tpr 0.989 0.991 0.990
This model actually performs better on the test set. This is something we often encounter in classification, and if we have enough data, it is common to use a hold-out validation set for hyperparameter optimization (instead of cross-validation). But of course, we cannot have validation sets all the way down, so at some point we have to settle on some model. Inspired by this observation, it is not uncommon to train an ensemble of models (so-called ensemble learning) and use some sort of average of the models in the classification.
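As a sketch of that idea (not part of the original exercise), sklearn's VotingClassifier can take a majority vote over a few of the promising candidates from the grid search above:
from sklearn.ensemble import VotingClassifier
# Hard majority vote over three hyperparameter candidates from the grid search
ensemble = VotingClassifier(estimators=[
    ('svm_a', svm.SVC(C=128.0, kernel='rbf', gamma=0.0312)),
    ('svm_b', svm.SVC(C=8.0, kernel='rbf', gamma=0.5)),
    ('svm_c', svm.SVC(C=2.0, kernel='rbf', gamma=0.125))],
    voting='hard')
ensemble.fit(X_train, Y_train)
print("Test error rate:", np.mean(ensemble.predict(X_test) != Y_test))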
Parameter grid search with cross-validation using sklearn
For more information, see sklearn's pages on cross-validation and grid search. The code below is based on an example found here.
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC
parameters = [{'kernel': ['rbf'],
'gamma': gamma_arr,
'C': C_arr}]
metrics = ['accuracy', 'recall_macro']
for metric in metrics:
print("# Tuning hyper-parameters for {}".format(metric))
classifier = GridSearchCV(SVC(), parameters, cv=10, scoring=metric)
classifier.fit(X_train, Y_train)
print("\nBest parameter set found on training set:")
print(classifier.best_params_)
print("\nGrid scores on training set:")
means = classifier.cv_results_['mean_test_score']
stds = classifier.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, classifier.cv_results_['params']):
print("{0:>7,.3f} (+/-{1:>7,.3f}) for {2}".format(mean, std * 2, params))
print("\nDetailed classification report:")
print("\nThe model is trained on the full training set.")
print("The scores are computed on the full test set.")
Y_true, Y_pred = Y_test, classifier.predict(X_test)
print(classification_report(Y_true, Y_pred))
# Tuning hyper-parameters for accuracy
Best parameter set found on training set:
{'gamma': 0.5, 'kernel': 'rbf', 'C': 0.125}
Grid scores on training set:
0.890 (+/- 0.100) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 0.03125}
0.865 (+/- 0.128) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 0.03125}
0.530 (+/- 0.020) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 0.03125}
0.530 (+/- 0.020) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 0.03125}
0.530 (+/- 0.020) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 0.03125}
0.530 (+/- 0.020) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 0.03125}
0.925 (+/- 0.095) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 0.125}
0.980 (+/- 0.049) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 0.125}
0.995 (+/- 0.030) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 0.125}
0.530 (+/- 0.020) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 0.125}
0.530 (+/- 0.020) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 0.125}
0.530 (+/- 0.020) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 0.125}
0.950 (+/- 0.079) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 0.5}
0.990 (+/- 0.040) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 0.5}
0.990 (+/- 0.039) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 0.5}
0.970 (+/- 0.049) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 0.5}
0.695 (+/- 0.151) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 0.5}
0.530 (+/- 0.020) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 0.5}
0.990 (+/- 0.040) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 2.0}
0.985 (+/- 0.064) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 2.0}
0.990 (+/- 0.039) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 2.0}
0.980 (+/- 0.066) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 2.0}
0.835 (+/- 0.172) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 2.0}
0.690 (+/- 0.166) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 2.0}
0.985 (+/- 0.064) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 8.0}
0.990 (+/- 0.060) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 8.0}
0.990 (+/- 0.039) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 8.0}
0.980 (+/- 0.066) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 8.0}
0.835 (+/- 0.172) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 8.0}
0.690 (+/- 0.166) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 8.0}
0.990 (+/- 0.060) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 32.0}
0.990 (+/- 0.060) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 32.0}
0.990 (+/- 0.039) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 32.0}
0.980 (+/- 0.066) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 32.0}
0.835 (+/- 0.172) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 32.0}
0.690 (+/- 0.166) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 32.0}
0.985 (+/- 0.064) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 128.0}
0.990 (+/- 0.060) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 128.0}
0.990 (+/- 0.039) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 128.0}
0.980 (+/- 0.066) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 128.0}
0.835 (+/- 0.172) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 128.0}
0.690 (+/- 0.166) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 128.0}
0.980 (+/- 0.081) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 512.0}
0.990 (+/- 0.060) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 512.0}
0.990 (+/- 0.039) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 512.0}
0.980 (+/- 0.066) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 512.0}
0.835 (+/- 0.172) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 512.0}
0.690 (+/- 0.166) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 512.0}
Detailed classification report:
The model is trained on the full training set.
The scores are computed on the full test set.
precision recall f1-score support
0 0.98 0.99 0.98 94
1 0.99 0.98 0.99 106
avg / total 0.99 0.98 0.99 200
# Tuning hyper-parameters for recall_macro
Best parameter set found on training set:
{'gamma': 0.5, 'kernel': 'rbf', 'C': 0.125}
Grid scores on training set:
0.888 (+/- 0.100) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 0.03125}
0.856 (+/- 0.136) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 0.03125}
0.500 (+/- 0.000) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 0.03125}
0.500 (+/- 0.000) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 0.03125}
0.500 (+/- 0.000) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 0.03125}
0.500 (+/- 0.000) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 0.03125}
0.924 (+/- 0.099) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 0.125}
0.979 (+/- 0.052) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 0.125}
0.994 (+/- 0.033) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 0.125}
0.500 (+/- 0.000) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 0.125}
0.500 (+/- 0.000) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 0.125}
0.500 (+/- 0.000) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 0.125}
0.949 (+/- 0.084) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 0.5}
0.989 (+/- 0.044) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 0.5}
0.990 (+/- 0.041) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 0.5}
0.969 (+/- 0.052) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 0.5}
0.675 (+/- 0.163) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 0.5}
0.500 (+/- 0.000) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 0.5}
0.989 (+/- 0.044) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 2.0}
0.985 (+/- 0.064) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 2.0}
0.990 (+/- 0.041) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 2.0}
0.980 (+/- 0.065) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 2.0}
0.824 (+/- 0.185) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 2.0}
0.670 (+/- 0.182) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 2.0}
0.984 (+/- 0.066) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 8.0}
0.990 (+/- 0.061) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 8.0}
0.990 (+/- 0.041) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 8.0}
0.980 (+/- 0.065) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 8.0}
0.824 (+/- 0.185) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 8.0}
0.670 (+/- 0.182) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 8.0}
0.990 (+/- 0.061) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 32.0}
0.990 (+/- 0.061) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 32.0}
0.990 (+/- 0.041) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 32.0}
0.980 (+/- 0.065) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 32.0}
0.824 (+/- 0.185) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 32.0}
0.670 (+/- 0.182) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 32.0}
0.985 (+/- 0.066) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 128.0}
0.990 (+/- 0.061) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 128.0}
0.990 (+/- 0.041) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 128.0}
0.980 (+/- 0.065) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 128.0}
0.824 (+/- 0.185) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 128.0}
0.670 (+/- 0.182) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 128.0}
0.979 (+/- 0.084) for {'gamma': 0.03125, 'kernel': 'rbf', 'C': 512.0}
0.990 (+/- 0.061) for {'gamma': 0.125, 'kernel': 'rbf', 'C': 512.0}
0.990 (+/- 0.041) for {'gamma': 0.5, 'kernel': 'rbf', 'C': 512.0}
0.980 (+/- 0.065) for {'gamma': 2.0, 'kernel': 'rbf', 'C': 512.0}
0.824 (+/- 0.185) for {'gamma': 8.0, 'kernel': 'rbf', 'C': 512.0}
0.670 (+/- 0.182) for {'gamma': 32.0, 'kernel': 'rbf', 'C': 512.0}
Detailed classification report:
The model is trained on the full training set.
The scores are computed on the full test set.
precision recall f1-score support
0 0.98 0.99 0.98 94
1 0.99 0.98 0.99 106
avg / total 0.99 0.98 0.99 200
As is evident, sklearn and I did not end up with entirely the same results, but for this data (which is quite easy), the solution space near the optimum is quite flat, so that is no surprise. There is also, obviously, the possibility that there are some bugs in my implementation.