Hyperparameter Tuning
This tutorial for MPSTime will take you through tuning the hyperparameters of the fitMPS algorithm to maximise either imputation or classification performance.
Setup
For this tutorial, we'll be solving a classification hyperoptimisation problem and an imputation hyperoptimisation problem, using the same noisy trendy sinusoid dataset from the Classification and Imputation sections.
using MPSTime, Random
rng = Xoshiro(1); # fix rng seed
ntimepoints = 100; # specify number of samples per instance
ntrain_instances = 300; # specify num training instances
ntest_instances = 200; # specify num test instances
X_train = vcat(
trendy_sine(ntimepoints, ntrain_instances ÷ 2; sigma=0.1, slope=[-3,0,3], period=(12,15), rng=rng)[1],
trendy_sine(ntimepoints, ntrain_instances ÷ 2; sigma=0.1, slope=[-3,0,3], period=(16,19), rng=rng)[1]
);
y_train = vcat(
fill(1, ntrain_instances ÷ 2),
fill(2, ntrain_instances ÷ 2)
);
X_test = vcat(
trendy_sine(ntimepoints, ntest_instances ÷ 2; sigma=0.2, slope=[-3,0,3], period=(12,15), rng=rng)[1],
trendy_sine(ntimepoints, ntest_instances ÷ 2; sigma=0.2, slope=[-3,0,3], period=(16,19), rng=rng)[1]
);
y_test = vcat(
fill(1, ntest_instances ÷ 2),
fill(2, ntest_instances ÷ 2)
);
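The generated matrices store one time series per row (an assumption consistent with the rest of this tutorial), so their shapes follow directly from the setup; a quick optional sanity check:
# rows are instances, columns are time points
@assert size(X_train) == (ntrain_instances, ntimepoints)
@assert size(X_test) == (ntest_instances, ntimepoints)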
Given how computationally intensive hyperparameter tuning can be, if you're running these examples yourself it's a good idea to take advantage of the multiprocessing built into the MPSTime library (see the Distributed Computing section).
Hyperoptimising classification
The hyperparameter tuning algorithms in MPSTime support every numerical hyperparameter that may be specified by MPSOptions. For this problem, we'll generate a small search space over the three most important hyperparameters: the maximum MPS bond dimension chi_max, the physical dimension d, and the learning rate eta. Every other hyperparameter will be left at its default value.
The variables to optimise, along with their upper and lower bounds, are specified with the syntax params = (<variable_name_1>=(<lower_bound_1>, <upper_bound_1>), ...), e.g.
params = (
eta=(1e-3, 1e-1),
d=(2,8),
chi_max=(20,40)
)
When solving real-world problems, it's a good idea to explore a larger d and chi_max search space, but for this example it will serve well enough.
K-fold cross validation with tune()
To optimise the hyperparameters on your dataset, simply call tune:
julia> nfolds = 5;
julia> best_params, cache = tune(
X_train,
y_train,
nfolds,
params,
MPSRandomSearch();
objective=MisclassificationRate(),
maxiters=20, # for demonstration purposes only, typically this should be much larger
logspace_eta=true
);
iter 1, cvfold 1: training MPS with (chi_max = 37, d = 8, eta = 0.023357214690901226)...
iter 1, cvfold 2: training MPS with (chi_max = 37, d = 8, eta = 0.023357214690901226)...
iter 1, cvfold 3: training MPS with (chi_max = 37, d = 8, eta = 0.023357214690901226)...
iter 1, cvfold 4: training MPS with (chi_max = 37, d = 8, eta = 0.023357214690901226)...
iter 1, cvfold 5: training MPS with (chi_max = 37, d = 8, eta = 0.023357214690901226)...
iter 1, cvfold 1: finished. MPS (chi_max = 37, d = 8, eta = 0.023357214690901226) finished in 128.25s (train=126.38s, loss=1.87s)
iter 1, cvfold 2: finished. MPS (chi_max = 37, d = 8, eta = 0.023357214690901226) finished in 129.0s (train=127.11s, loss=1.89s)
iter 1, cvfold 3: finished. MPS (chi_max = 37, d = 8, eta = 0.023357214690901226) finished in 128.57s (train=126.74s, loss=1.84s)
iter 1, cvfold 4: finished. MPS (chi_max = 37, d = 8, eta = 0.023357214690901226) finished in 128.78s (train=126.96s, loss=1.82s)
iter 1, cvfold 5: finished. MPS (chi_max = 37, d = 8, eta = 0.023357214690901226) finished in 127.77s (train=125.94s, loss=1.83s)
iter 1, t=139.39: Mean CV Loss: 0.040000000000000015
iter 2, cvfold 1: training MPS with (chi_max = 39, d = 6, eta = 0.0379269019073225)...
[...]
iter 20, cvfold 4: finished. MPS (chi_max = 21, d = 2, eta = 0.018329807108324356) finished in 13.52s (train=13.45s, loss=0.08s)
iter 20, cvfold 5: finished. MPS (chi_max = 21, d = 2, eta = 0.018329807108324356) finished in 14.98s (train=14.47s, loss=0.5s)
iter 20, t=843.55: Mean CV Loss: 0.08
julia> best_params
(chi_max = 31,
d = 4,
eta = 0.07847599703514611,)
julia> cache[values(best_params)] # retrieve loss from the cache
0.0033333333333333435 # equivalent to 99.67% accuracy
This returns best_params: a named tuple containing the optimised hyperparameters, and cache: a dictionary that saves the mean loss of every tested hyperparameter combination.
The arguments used here are:
- X_train: timeseries data in a matrix; time series are rows.
- y_train: vector of time series class labels.
- nfolds: number of cross validation folds to use. The folding type can be specified with the foldmethod keyword.
- params: hyperparameters to tune and their upper and lower bounds; see the previous section.
- MPSRandomSearch(): the tuning algorithm to use; see the tuning algorithm section.
- objective=MisclassificationRate(): optimise the raw misclassification rate (1 - accuracy). The other options are BalancedMisclassificationRate() (optimises balanced accuracy) and ImputationLoss() (see imputation hyperoptimisation).
- maxiters=20: the maximum number of solver iterations. Here we use a very small number for demonstration purposes.
- logspace_eta=true: a useful option that tells the tuning algorithm to sample the eta search space logarithmically.
There are many more customisation options for tune; see the docstring and extended help for more information and advanced use cases.
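Because cache maps each tested hyperparameter tuple (ordered as in best_params) to its mean cross-validation loss, you can also rank every combination the search tried; a minimal sketch:
# rank all tested hyperparameter combinations by mean CV loss, best first
ranked = sort(collect(cache); by=last);
first(ranked, 5) # the five best (hyperparameters => loss) pairs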
Evaluating model performance with evaluate()
If you want to estimate the performance of MPSTime on a dataset, you can call the evaluate function, which resamples your data into train/test splits using a provided resampling strategy (the default is k-fold cross validation), tunes hyperparameters on the "training" set of each split, and evaluates on the test set. It can be called with the following syntax:
julia> nresamples = 3; # number of outer "resampling" folds - usually 30
julia> Xs = vcat(X_train, X_test);
julia> ys = vcat(y_train, y_test);
julia> results = evaluate(
Xs,
ys,
nresamples,
params,
MPSRandomSearch();
n_cvfolds=nfolds, # the number of folds used by tune()
objective=MisclassificationRate(),
tuning_maxiters=20
);
Beginning fold 1:
Fold 1: iter 1, cvfold 1: training MPS with (chi_max = 40, d = 8, eta = 0.09478947368421053)...
Fold 1: iter 1, cvfold 2: training MPS with (chi_max = 40, d = 8, eta = 0.09478947368421053)...
Fold 1: iter 1, cvfold 3: training MPS with (chi_max = 40, d = 8, eta = 0.09478947368421053)...
Fold 1: iter 1, cvfold 4: training MPS with (chi_max = 40, d = 8, eta = 0.09478947368421053)...
Fold 1: iter 1, cvfold 5: training MPS with (chi_max = 40, d = 8, eta = 0.09478947368421053)...
[...]
Fold 3: iter 20, cvfold 4: finished. MPS (chi_max = 28, d = 2, eta = 0.06873684210526317) finished in 19.26s (train=19.14s, loss=0.12s)
Fold 3: iter 20, cvfold 5: finished. MPS (chi_max = 28, d = 2, eta = 0.06873684210526317) finished in 19.86s (train=19.78s, loss=0.08s)
Fold 3: iter 20, t=875.4: Mean CV Loss: 0.011985526910900046
fold 3: t=2899.6: training MPS with (chi_max = 39, d = 3, eta = 0.016631578947368424)... done
julia> results[1]
Dict{String, Any} with 13 entries:
"time" => 1055.61
"objective" => "MisclassificationRate()"
"train_inds" => [340, 445, 262, 132, 89, 379, 225, 59, 224, 57 … 495, 484, 355, 322, 284, 363, …
"optimiser" => "MPSRandomSearch(:LatinHypercube)"
"fold" => 1
"test_inds" => [133, 477, 112, 148, 13, 453, 342, 83, 252, 455 … 151, 483, 74, 380, 61, 297, 1…
"tuning_windows" => nothing
"eval_windows" => nothing
"cache" => Dict((39, 6, 0.0166316)=>0.0149706, (33, 7, 0.0426842)=>0.0300769, (36, 7, 0.0635…
"tuning_pms" => nothing
"loss" => [0.00598802]
"eval_pms" => nothing
"opts" => MPSOptions(-5, 10, 32, 0.0531053, 4, :Legendre_No_Norm, false, 2, 1.0e-10, 1, Flo…
julia> losses = getindex.(results, "loss")
3-element Vector{Vector{Float64}}:
[0.005988023952095856]
[0.005988023952095856]
[0.0060240963855421326]
This outputs a vector of results dictionaries containing the losses on each resample fold, as well as a lot of other useful information. See the docstring for more detail, as well as a plethora of customisation options.
A very common extension of evaluate is to customise the resampling strategy. The simplest way to do this is to pass a vector of (training_indices, testing_indices) tuples to the foldmethod keyword. For example, to use scikit-learn's StratifiedShuffleSplit to generate the train/test splits:
julia> using PyCall
julia> nresamples = 3;
julia> py"""
from sklearn.model_selection import StratifiedShuffleSplit # requires a python environment with sklearn installed
sp = StratifiedShuffleSplit(n_splits=$nresamples, test_size=$(length(y_test)), random_state=1)
folds_py = sp.split($Xs, $ys)
"""
julia> folds = [(tr_inds .+ 1, te_inds .+ 1) for (tr_inds, te_inds) in py"folds_py"] # shift indices up by 1
3-element Vector{Tuple{Vector{Int64}, Vector{Int64}}}:
([197, 472, 462, 108, 133, 258, 179, 57, 149, 373 … 279, 94, 234, 473, 319, 378, 387, 92, 359, 35], [354, 313, 137, 239, 316, 479, 274, 145, 134, 485 … 110, 417, 346, 141, 165, 50, 77, 23, 347, 130])
([399, 105, 322, 289, 281, 187, 131, 18, 56, 231 … 463, 38, 491, 288, 408, 430, 330, 185, 481, 353], [345, 469, 396, 10, 96, 452, 245, 76, 367, 84 … 202, 153, 94, 446, 372, 152, 79, 387, 51, 301])
[...]
julia> results = evaluate(
Xs,
ys,
nresamples,
params,
MPSRandomSearch();
objective=MisclassificationRate(),
tuning_maxiters=20,
foldmethod=folds
);
[...]
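If you'd rather avoid the Python dependency, a hand-rolled stratified shuffle split is straightforward. The helper below is a sketch (stratified_shuffle_folds is not part of MPSTime); it samples test indices from each class in proportion to the class frequencies:
using Random

function stratified_shuffle_folds(ys, nresamples, ntest; rng=Xoshiro(1))
    folds = Vector{Tuple{Vector{Int}, Vector{Int}}}()
    for _ in 1:nresamples
        test_inds = Int[]
        for c in unique(ys)
            # shuffle this class's indices and take a proportional share of the test set
            class_inds = shuffle(rng, findall(==(c), ys))
            n_c = round(Int, ntest * length(class_inds) / length(ys))
            append!(test_inds, class_inds[1:n_c])
        end
        train_inds = setdiff(1:length(ys), test_inds)
        push!(folds, (shuffle(rng, train_inds), sort(test_inds)))
    end
    return folds
end

folds = stratified_shuffle_folds(ys, nresamples, length(y_test)); # pass to foldmethod as before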
Hyperoptimising imputation
The tune and evaluate methods may both be used to minimise imputation loss, with a small amount of extra setup. Setting objective=ImputationLoss() will optimise an MPS for imputation performance by minimising the mean absolute error (MAE) between predicted and unseen data. To accomplish this, MPSTime takes data from the test (or validation) set, corrupts a portion of it, and then predicts what the corrupted data should be based on the uncorrupted values. There are two methods for corrupting the test (or validation) data:
- Setting the windows (or eval_windows) keyword in tune (or evaluate, respectively) to a vector of 'windows'. Each window is a vector of missing/corrupted data indices, for example
windows = [[1,3,7],[4,5,6]]
will take each timeseries in the test set and create two 'corrupted' test series: one missing the 1st, 3rd, and 7th values, and one missing the 4th, 5th, and 6th. A tune call using fixed windows is sketched just after this list.
- Setting the pms (or eval_pms) keyword in tune (or evaluate, respectively) to a vector of 'percentage missings'. This generates corrupted time series by removing randomly selected contiguous blocks that make up a specified percentage of the data. For example,
pms = [0.05, 0.05, 0.6, 0.95]
will generate four corrupted time series from each element of the test (or validation) set: two with missing blocks that make up 5% of their length, and one each with 60% and 95% missing. The imputation tuning loss is the average MAE of imputing every window on every element of the test (or validation) set.
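For instance, the first option can be exercised directly: the sketch below reuses the classification search space from earlier and tunes against two fixed missing windows (the window positions are hypothetical, chosen for illustration only):
windows = [collect(20:40), collect(60:80)]; # two fixed 21-point missing blocks
best_params, cache = tune(
X_train,
y_train,
nfolds,
params,
MPSRandomSearch();
objective=ImputationLoss(),
windows=windows,
maxiters=20
)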
Example: Calling tune with percentages missing. Tune the MPS on an imputation problem with randomly selected missing blocks 5%, 15%, 25%, ..., 95% long.
params = (
d=(8,12),
chi_max=(30,50)
)
best_params, cache = tune(
X_train,
y_train,
nfolds,
params,
MPSRandomSearch();
objective=ImputationLoss(),
pms=collect(0.05:0.1:0.95),
maxiters=20,
logspace_eta=true
)
iter 1, cvfold 1: training MPS with (chi_max = 47, d = 12)...
iter 1, cvfold 2: training MPS with (chi_max = 47, d = 12)...
iter 1, cvfold 3: training MPS with (chi_max = 47, d = 12)...
iter 1, cvfold 4: training MPS with (chi_max = 47, d = 12)...
iter 1, cvfold 5: training MPS with (chi_max = 47, d = 12)...
iter 1, cvfold 1: finished. MPS (chi_max = 47, d = 12) finished in 515.58s (train=402.83s, loss=112.75s)
iter 1, cvfold 2: finished. MPS (chi_max = 47, d = 12) finished in 511.0s (train=399.52s, loss=111.48s)
iter 1, cvfold 3: finished. MPS (chi_max = 47, d = 12) finished in 516.33s (train=406.92s, loss=109.41s)
iter 1, cvfold 4: finished. MPS (chi_max = 47, d = 12) finished in 519.06s (train=405.13s, loss=113.93s)
iter 1, cvfold 5: finished. MPS (chi_max = 47, d = 12) finished in 522.12s (train=406.87s, loss=115.25s)
iter 1, t=534.28: Mean CV Loss: 0.44716868140737487
iter 2, cvfold 1: training MPS with (chi_max = 50, d = 11)...
[...]
iter 20, cvfold 4: finished. MPS (chi_max = 31, d = 8) finished in 112.81s (train=66.33s, loss=46.48s)
iter 20, cvfold 5: finished. MPS (chi_max = 31, d = 8) finished in 113.9s (train=66.32s, loss=47.58s)
iter 20, t=5223.05: Mean CV Loss: 0.4755302875557089
julia> best_params
(chi_max = 48,
d = 9,)
julia> cache[values(best_params)]
0.39402101779354365
Example: Using evaluate with the Missing Completely At Random tool. Tune the MPS on an imputation problem by completely randomly corrupting 5%, 15%, 25%, ..., or 95% of each test (or validation) time series. See the Missing Completely at Random tool.
inds = collect(1:size(Xs,2))
rng = Xoshiro(42)
pms = 0.05:0.1:0.95
mcar_windows = [mcar(inds, pm; rng=rng)[2] for pm in pms]
results = evaluate(
Xs,
ys,
nresamples,
params,
MPSRandomSearch();
objective=ImputationLoss(),
eval_windows=mcar_windows,
tuning_maxiters=20,
)
Fold 1: iter 1, cvfold 1: training MPS with (chi_max = 50, d = 12)...
[...]
Fold 3: iter 20, cvfold 5: finished. MPS (chi_max = 35, d = 8) finished in 635.05s (train=255.05s, loss=380.0s)
Fold 3: iter 20, t=20328.2: Mean CV Loss: 0.24440015696620948
fold 3: t=42210.43: training MPS with (chi_max = 48, d = 12)... done
julia> losses = getindex.(results, "loss");
julia> mean.(losses)
3-element Vector{Float64}:
0.21739511880650658
0.21129840885487566
0.21441462843171047
julia> getindex.(results, "opts") # print out the MPSOptions objects
3-element Vector{MPSOptions}:
MPSOptions(-5, 10, 47, 0.01, 11, :Legendre_No_Norm, false, 2, 1.0e-10, 1, Float64, :KLD, :TSGO, false,
(false, true), false, false, false, true, false, false, 1234, 4, -1, (0.0, 1.0), false, "divide_and_conquer")
MPSOptions(-5, 10, 50, 0.01, 12, :Legendre_No_Norm, false, 2, 1.0e-10, 1, Float64, :KLD, :TSGO, false,
(false, true), false, false, false, true, false, false, 1234, 4, -1, (0.0, 1.0), false, "divide_and_conquer")
MPSOptions(-5, 10, 48, 0.01, 12, :Legendre_No_Norm, false, 2, 1.0e-10, 1, Float64, :KLD, :TSGO, false,
(false, true), false, false, false, true, false, false, 1234, 4, -1, (0.0, 1.0), false, "divide_and_conquer")
julia> results[1]
Dict{String, Any} with 13 entries:
"time" => 6152.08
"objective" => "ImputationLoss()"
"train_inds" => [340, 445, 262, 132, 89, 379, 225, 59, 224, 57 … 495, 484, 355, 322, 284, 363, …
"optimiser" => "MPSRandomSearch(:LatinHypercube)"
"fold" => 1
"test_inds" => [133, 477, 112, 148, 13, 453, 342, 83, 252, 455 … 151, 483, 74, 380, 61, 297, 1…
"tuning_windows" => [[35, 97], [10, 36, 38, 49, 60, 62, 65, 72, 81, 82, 88, 92, 93, 97], [9, 16, 24, …
"eval_windows" => [[35, 97], [10, 36, 38, 49, 60, 62, 65, 72, 81, 82, 88, 92, 93, 97], [9, 16, 24, …
"cache" => Dict((49, 10)=>0.217511, (41, 10)=>0.233205, (45, 8)=>0.231089, (31, 8)=>0.259456…
"tuning_pms" => nothing
"loss" => [0.192093, 0.193801, 0.183081, 0.189859, 0.194329, 0.19409, 0.211727, 0.235688, 0…
"eval_pms" => nothing
"opts" => MPSOptions(-5, 10, 47, 0.01, 11, :Legendre_No_Norm, false, 2, 1.0e-10, 1, Float64…
Minimising a custom loss function
Custom objectives can be used by implementing a custom loss value type (CustomLoss <: MPSTime.TuningLoss) and extending the definition of MPSTime.eval_loss with the signature
eval_loss(
CustomLoss(),
mps::TrainedMPS,
X_validation::AbstractMatrix,
y_validation::AbstractVector,
windows;
p_fold=nothing,
distribute::Bool=false
) -> Union{Float64, Vector{Float64}}
As a simple example, we could implement a custom misclassification rate with the following:
julia> using Statistics; import MPSTime: TuningLoss, eval_loss; # Statistics provides mean
julia> struct CustomMisclassificationRate <: TuningLoss end;
julia> function eval_loss(
::CustomMisclassificationRate,
mps::TrainedMPS,
X_val::AbstractMatrix,
y_val::AbstractVector,
windows;
p_fold=nothing,
distribute::Bool=false
)
return [1. - mean(classify(mps, X_val) .== y_val)] # misclassification rate, vector for type stability
end;
julia> results = evaluate(
Xs,
ys,
nfolds,
params,
MPSRandomSearch();
objective=CustomMisclassificationRate(),
tuning_maxiters=20
)
[...]
Since imputation-type losses have multiple windows, the general output type of the eval_loss function is a vector. Because of this, the tune() function always optimises mean(eval_loss(...)). The p_fold variable is a tuple containing information used for logging, and the distribute flag is enabled by evaluate when distribute_final_eval is true. As an example of implementing both of these, here's MPSTime's implementation of eval_loss(::ImputationLoss, ...):
`eval_loss` source code
function eval_loss(
::ImputationLoss,
mps::TrainedMPS,
X_val::AbstractMatrix,
y_val::AbstractVector,
windows::Union{Nothing, AbstractVector}=nothing;
p_fold::Union{Nothing, Tuple}=nothing,
distribute::Bool=false,
method::Symbol=:median
)
if ~isnothing(p_fold)
verbosity, pre_string, tstart, fold, nfolds = p_fold
logging = verbosity >= 2
foldstr = isnothing(fold) ? "" : "cvfold $fold:"
else
logging = false
end
imp = init_imputation_problem(mps, X_val, y_val, verbosity=-5);
numval = size(X_val, 1)
# conversion from instance index to something MPS_impute understands.
cmap = countmap(y_val) # from StatsBase
classes = vcat([fill(k,v) for (k,v) in pairs(cmap)]...)
class_ind = vcat([1:v for v in values(cmap)]...)
if distribute
loss_by_window = @sync @distributed (+) for inst in 1:numval
logging && print(pre_string, "$foldstr Evaluating instance $inst/$numval...")
t = time()
ws = Vector{Float64}(undef, length(windows))
for (iw, impute_sites) in enumerate(windows)
stats = MPS_impute(imp, classes[inst], class_ind[inst], impute_sites, method; NN_baseline=false, plot_fits=false)[4]
ws[iw] = stats[1][:MAE]
end
logging && println("done ($(MPSTime.rtime(t))s)") # rtime just nicely prints time elapsed since $t in seconds
ws
end
loss_by_window /= numval
else
instance_scores = Matrix{Float64}(undef, numval, length(windows)) # score for each instance across all % missing
for inst in 1:numval
logging && print(pre_string, "$foldstr Evaluating instance $inst/$numval...")
t = time()
for (iw, impute_sites) in enumerate(windows)
stats = MPS_impute(imp, classes[inst], class_ind[inst], impute_sites, method; NN_baseline=false, plot_fits=false)[4]
instance_scores[inst, iw] = stats[1][:MAE]
end
logging && println("done ($(MPSTime.rtime(t))s)") # rtime just nicely prints time elapsed since $t in seconds
end
loss_by_window = mean(instance_scores; dims=1)[:]
end
return loss_by_window
end
Choice of tuning algorithm
The hyperparameter tuning algorithm used by tune (or evaluate) can be specified with the optimiser argument. This supports the default builtin MPSRandomSearch methods, as well as (in theory) any solver that is supported by the Optimization.jl interface. Note that many of these solvers struggle with discrete search spaces, such as tuning the integer-valued chi_max and d. Some of them require initial conditions (set provide_x0=true), and some require no initial conditions (set provide_x0=false), so your mileage may vary. By default, tune() handles optimisers attempting to evaluate discrete hyperparameters at a non-integer value by rounding, and uses its own cache to avoid rounding-based cache misses. This is effective, but has the downside of making maxiters inaccurate (as repeated hyperparameter evaluations caused by rounding result in a 'skipped' iteration).
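As a hedged sketch of what swapping in an external solver might look like, consider the derivative-free differential evolution solver from OptimizationBBO.jl (assumed to be installed separately; whether any particular solver copes with the discrete chi_max and d dimensions is not guaranteed):
using OptimizationBBO # provides BBO_adaptive_de_rand_1_bin_radiuslimited
best_params, cache = tune(
X_train,
y_train,
nfolds,
params,
BBO_adaptive_de_rand_1_bin_radiuslimited();
objective=MisclassificationRate(),
maxiters=20,
provide_x0=false, # this solver generates its own starting population
logspace_eta=true
)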
MPSTime.MPSRandomSearch
— Type MPSRandomSearch(sampling::Symbol=:LatinHypercube)
Value type used to specify a random search algorithm for hyperparameter tuning an MPS.
Sampling
Specifies the method used to determine the search space. The supported sampling methods are:
- :LatinHypercube: an implementation of LatinHypercubeSampling.jl's random (pseudo-) Latin Hypercube search space generator. Supports both discrete and continuous hyperparameters.
- :UniformRandom: generate a search space by randomly sampling from the interval [lower bound, upper bound] for each hyperparameter. Supports both discrete and continuous hyperparameters.
- :Exhaustive: perform an exhaustive grid search of all hyperparameters within the lower and upper bounds. Only supports discrete hyperparameters. Not actually a random search.
Distributed computing
Both tune and evaluate support several different parallel processing paradigms for different use cases, compatible with processors added via Distributed.jl's addprocs function. For example, to distribute each fold of the classification-style evaluation above, run:
using Distributed
nfolds = 30;
e = copy(ENV);
e["OMP_NUM_THREADS"] = "1"; # attempt to prevent threading
e["JULIA_NUM_THREADS"] = "1"; # attempt to prevent threading
addprocs(nfolds; env=e);
@everywhere using MPSTime
Xs = vcat(X_train, X_test);
ys = vcat(y_train, y_test);
results = evaluate(
Xs,
ys,
nfolds,
params,
MPSRandomSearch();
objective=MisclassificationRate(),
tuning_maxiters=20,
distribute_folds=true
)
[...]
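Alternatively, to parallelise within each fold rather than across folds, the random-search grid itself can be distributed (a sketch; note that distribute_iters disables the cache for thread safety):
results = evaluate(
Xs,
ys,
nfolds,
params,
MPSRandomSearch();
objective=MisclassificationRate(),
tuning_maxiters=20,
distribute_iters=true
)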
See the respective docstrings for more information.
Docstrings
Hyperparameter tuning
MPSTime.tune
— Functionfunction tune(
Xs::AbstractMatrix,
[ys::AbstractVector],
nfolds::Integer,
parameters::NamedTuple,
optimiser=MPSRandomSearch(:LatinHypercube);
<Keyword Arguments>) -> best_parameters::NamedTuple, cache::Dictionary
Perform nfolds-fold hyperparameter tuning of an MPS on the timeseries data Xs, optionally specifying the data classes ys. Returns a NamedTuple containing the optimal hyperparameters, and a cache Dictionary that saves the loss of every tested hyperparameter combination.
parameters specifies the hyperparameters to tune and (optionally) a search space. Currently, every numeric field of MPSOptions is supported. parameters are specified as named tuples, with the key being the name of the hyperparameter (see Example below). There are a couple of options for specifying the bounds:
- The preferred option is a Tuple of lower/upper bounds: "params=(eta=(lower_bound, upper_bound), ...)", which allows the optimiser to choose any value in the interval [lower_bound, upper_bound]. You can also pass an empty tuple: "params=(eta=(), ...)", which allows the parameter to take any non-negative value.
- As a Vector of possible values, e.g. params = (d=[1,3,6,7,8], ...). For convenience, you can also use the Tuple syntax "params=(d=(start,step,stop), ...)", which is equivalent to "params = (d=start:step:stop, ...)".
Note that if you use the first method, the upper and lower bounds are passed to the hyperparameter tuning algorithm, but may not be strictly enforced depending on your choice of algorithm. Alternatively, use the enforce_bounds=false keyword argument to disable bounds checking completely (for compatible optimisers).
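To illustrate, the bounds-specification styles above can be mixed freely within one NamedTuple (a sketch):
params = (
eta=(1e-3, 1e-1), # lower/upper interval bounds
d=[2, 4, 6, 8], # explicit Vector of candidate values
chi_max=(20, 5, 40) # (start, step, stop), equivalent to 20:5:40
)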
Example:
To tune a classification problem by searching the hyperparameter space η ∈ [0.001, 0.1], d ∈ {5,6,7}, and χmax ∈ {20,21,...,25}:
julia> params = (
eta=(1e-3, 1e-1),
d=(5,7),
chi_max=(20,25)
); # configure the search space
julia> nfolds = 5;
julia> best_params, cache = tune(
X_train, # Training data as a matrix, rows are time series
y_train, # Vector of time series class labels
nfolds, # Number of cross validation folds
params,
MPSRandomSearch(); # Tuning algorithm
objective=MisclassificationRate(), # Type of loss to use
maxiters=20, # Maximum number of tuning iterations
logspace_eta=true # When true, the eta search space [0.001, 0.1] is sampled logarithmically
)
[...]
julia> best_opts = MPSOptions(best_params); # convert to an MPSOptions object
[...]
Other problem classes are available, see the MPSTime documentation.
Hyperparameter Tuning Methods
The hyperparameter tuning algorithm can be specified with the optimiser argument. This supports the default builtin MPSRandomSearch methods, as well as (in theory) any solver that is supported by the Optimization.jl interface. Note that many of these solvers struggle with discrete search spaces, such as tuning the integer-valued chi_max and d, or tuning an eta specified with a vector. Some of them require initial conditions (set provide_x0=true), and some require no initial conditions (set provide_x0=false), so your mileage may vary. By default, tune() handles optimisers attempting to evaluate discrete hyperparameters at a non-integer value by rounding, and using its own cache to avoid rounding-based cache misses.
There are a lot of keyword arguments... Extended help is available with ??tune
See also: evaluate
Extended Help
Keyword Arguments
Hyperparameter Options
opts0::AbstractMPSOptions=MPSOptions(; verbosity=-5, log_level=-1, sigmoid_transform=objective isa ClassificationLoss)
: Default hyperparamaters to pass tofitMPS
. Hyperparameter candidates are generated by modifyingopts0
with the values in the search space specified byparameters
.enforce_bounds::Bool=true
: Whether to pass the constraints given in params to the optimisation algorithm.logspace_eta::Bool=false
: Whether to treat theeta
parameterspace as logarithmic. E.g. settingparameters=(eta=(10^-3,10^-1) )
andlogspace_eta=true
will sample eacheta_candidate
from the log10 search space [-3.,-1.], and then passeta = 10^(eta_candidate)
toMPSOptions
.input_supertype::Type=Float64
: A numeric type that can represent the types of each hyperparameter being tuned as well as their upper and lower bounds. Typically,Float64
is sufficient, but it can be set toInt
for purely discrete optimisation problems etc. This is necessary for mixed integer / Float hyperparameter tuning because certain solvers inOptimization.jl
require variables in the search space to all be the same type.
Loss and Windowing
- objective::TuningLoss=ImputationLoss(): the objective of the hyperparameter optimisation. This comes in two categories:
Imputation Loss
objective=ImputationLoss() uses the mean of the mean absolute error to measure the performance of an MPS on imputing unseen data. First, it generates 'corrupted' time series data by applying missing data windows to the validation set, using one of the following options:
- pms::Union{Nothing, AbstractVector}=nothing: stands for 'percentage missings'. Will remove randomly selected contiguous blocks from each time series in the validation set, according to the percentages missing listed in the pms vector. For example, pms=[0.05, 0.05, 0.6, 0.95] will generate four windows: two with 5% missing, and one each with 60% and 95% missing.
- windows::Union{Nothing, AbstractVector, Dict}=nothing: manually input missing windows. Expects a vector of missing windows, or a dictionary where values(windows) is a vector of missing windows.
The tuning loss is the average of computing the mean absolute error of imputing every window on every element of the validation set.
Classification Loss
Classification-type problems can be hyperparameter tuned to minimise either MisclassificationRate() (1 - classification accuracy) or BalancedMisclassificationRate() (1 - balanced accuracy).
Custom Loss Functions
Custom losses can be used by implementing a custom loss value type (CustomLoss <: TuningLoss) and extending the definition of MPSTime.eval_loss with the signature
eval_loss(
CustomLoss(),
mps::TrainedMPS,
X_validation::AbstractMatrix,
y_validation::AbstractVector,
windows;
p_fold=nothing,
distribute::Bool=false
) -> Union{Float64, Vector{Float64}}
If eval_loss returns a vector, then tune() will optimise mean(eval_loss(...)). For concrete examples, see the documentation.
Tuning algorithm
- abstol::Float64=1e-3: passed directly to Optimization.jl; absolute tolerance in changes to the objective (loss) function.
- maxiters::Integer=250: maximum number of iterations allowed when solving.
- provide_x0::Bool=true: whether to provide initial conditions to the solve; ignored by MPSRandomSearch. The initial condition will be opts0, unless it contains a hyperparameter outside the range specified by parameters, in which case the lower bound of that hyperparameter will be used.
- rng::Union{Integer, AbstractRNG}=1: an integer or RNG object used to seed any randomness in imputation window or search space generation.
- kwargs...: any further keyword arguments are passed through to Optimization.jl via the Optimization.solve function.
Folds and Cross validation
- foldmethod::Union{Function, AbstractVector}=make_stratified_cvfolds: the method used to generate the train/validation folds from Xs. Can either be an nfolds-long Vector of [train_indices::Vector, validation_indices::Vector] pairs, or a function that produces them, with the signature foldmethod(Xs, ys, nfolds; rng::AbstractRNG). To clarify, the tune function determines the train/validation splits for the ith fold in the following way:
julia> folds::Vector = foldmethod isa Function ? foldmethod(Xs,ys, nfolds; rng=rng) : foldmethod;
julia> train_inds, validation_inds = folds[i];
julia> X_train, y_train = Xs[train_inds, :], ys[train_inds];
julia> X_validation, y_validation = Xs[validation_inds, :], ys[validation_inds];
Logging
- verbosity::Integer=1: controls how explicit the logging is. 0 for none, 5 for maximum. This is separate from the verbosity in MPSOptions.
- pre_string::String="": prints this string on the same line before logging messages are printed. Useful for logging when calling tune() multiple times in parallel.
Distributed Computing
Parallel processing is available using processors added via Distributed.jl's addprocs function.
- distribute_iters::Bool=false: when using an MPSRandomSearch, distribute the search grid across all available processors. For thread safety, using distribute_iters disables caching.
- distribute_folds::Bool=false: distribute each fold to its own processor. Scales up to at most nfolds processors. Not very compatible with distribute_iters.
- workers::AbstractVector{Int}=distribute_folds ? workers() : Int[]: workers that may be used to distribute folds; does not affect distribute_iters. This can be used to run multiple instances of tune() on different sets of workers.
- disable_nondistributed_threading::Bool=false: attempts to disable threading using BLAS.set_num_threads(1) and Strided.disable_threads() (may not work if using the MKL.jl linear algebra backend).
MPSTime.evaluate
— Functionfunction evaluate(
Xs::AbstractMatrix,
[ys::AbstractVector],
nfolds::Integer,
tuning_parameters::NamedTuple,
tuning_optimiser=MPSRandomSearch();
<Keyword Arguments>) -> results::Vector{Dictionary}
Evaluate the performance of MPSTime by hyperparameter tuning on nfolds resampled folds of the timeseries dataset Xs with classes ys.
tuning_parameters controls the hyperparameters to tune over, and tuning_optimiser specifies the hyperparameter tuning algorithm. These are passed directly to tune; refer to its documentation for details.
Example
To evaluate a classification problem by searching the hyperparameter space η ∈ [0.001, 0.1], d ∈ {5,6,7}, and χmax ∈ {20,21,...,25}:
julia> params = (
eta=(1e-3, 1e-1),
d=(5,7),
chi_max=(20,25)
);
julia> nfolds = 30;
julia> results = evaluate(
X_train, # training data as a matrix, rows are time series
y_train, # Vector of time series class labels
nfolds, # number of resample folds
params, # search space
MPSRandomSearch(); # Hyperparameter search method, see Extended help
objective=MisclassificationRate(), # Type of loss to use
maxiters=20, # Maximum number of tuning iterations
logspace_eta=true # the eta search space [0.001, 0.1] is sampled logarithmically
)
[...]
Return value
A length-nfolds vector of dictionaries that contain detailed information about each fold. Each dictionary has the following keys:
- "fold"=>Integer: the fold index.
- "objective"=>String: the objective (loss function) this fold was trained on.
- "train_inds"=>Vector: the indices (rows of Xs) this fold was tuned/trained on.
- "test_inds"=>Vector: the indices (rows of Xs) this fold was tested on.
- "optimiser"=>String: name of the optimiser used to hyperparameter tune each fold.
- "tuning_windows"=>Vector: the windows used to hyperparameter tune this fold.
- "tuning_pms"=>Vector: the 'Percentages Missing' used to hyperparameter tune this fold (possibly used to generate tuning_windows).
- "eval_windows"=>Vector: the windows used to evaluate the test loss.
- "eval_pms"=>Vector: the 'Percentages Missing' used to evaluate the test loss (possibly used to generate eval_windows).
- "time"=>Float64: total time to tune and test this fold in seconds.
- "opts"=>MPSOptions: optimal options for this fold as determined by tune(). Used to compute the test loss.
- "cache"=>Dict: cache of the validation losses of every set of hyperparameters evaluated on this fold. Disabled if distribute_iters is true.
- "loss"=>Union{Vector{Float64}, Float64}: the test loss of this fold. If objective is an ImputationLoss(), this is a vector with each entry corresponding to a window in results[fold]["eval_windows"].
There are a lot of keyword arguments... Extended help is available with ??evaluate
Extended Help
Keyword Arguments
Loss and Windowing
- objective::TuningLoss=ImputationLoss(): the loss used to evaluate and tune the MPS. If it's an ImputationLoss, then either pms or windows must be specified for each of evaluation and tuning. See the tune extended documentation for more details.
- eval_pms::Union{Nothing, AbstractVector}=nothing: 'Percentage MissingS' used to evaluate the test loss.
- eval_windows::Union{Nothing, AbstractVector, Dict}=nothing: windows used to evaluate the test loss.
- tuning_pms::Union{Nothing, AbstractVector}=eval_pms: 'Percentage MissingS' passed to tune, and used to compute the validation loss.
- tuning_windows::Union{Nothing, AbstractVector, Dict}=eval_windows: windows passed to tune, and used to compute the validation loss.
- rng::Union{Integer, AbstractRNG}=1: an integer or RNG object used to seed any randomness in imputation window or search space generation. Random.seed!(fold) is called prior to tuning each fold, so that any optimisation algorithms that are random but don't take rng objects should still be deterministic.
- tuning_rng::AbstractVector{<:Union{Integer, AbstractRNG}}=collect(1:nfolds): passed through to tune. An integer or RNG object used to seed any randomness in tuning imputation window generation or hyperparameter searching.
- opts0::AbstractMPSOptions=MPSOptions(; verbosity=-5, log_level=-1, sigmoid_transform=(objective isa ClassificationLoss)): options that are modified by the best options returned by tune. Used to train the MPS which evaluates the test loss.
- tuning_opts0::AbstractMPSOptions=opts0: initial guess passed as opts0 to tune that sets the values of the non-tuned hyperparameters. Should generally always be the same as opts0, but can be specified separately in case you wish to make the final MPS train with more verbosity etc.
Resampling and Cross Validation
- foldmethod::Union{Function, Vector}=make_stratified_cvfolds: can either be an nfolds-long Vector of [train_indices::Vector, test_indices::Vector] pairs, or a function that produces them, with the signature foldmethod(Xs, ys, nfolds; rng::AbstractRNG). To clarify, the evaluate function determines the train/test splits for the ith fold in the following way:
julia> folds::Vector = foldmethod isa Function ? foldmethod(Xs,ys, nfolds; rng=rng) : foldmethod;
julia> train_inds, test_inds = folds[i];
julia> X_train, y_train = Xs[train_inds, :], ys[train_inds];
julia> X_test, y_test = Xs[test_inds, :], ys[test_inds];
foldmethod defaults to nfolds-fold cross validation.
- tuning_foldmethod::Union{Function, Vector}=make_stratified_cvfolds: same as above, although it is passed to tune and used to split the training set into hyperparameter train/validation sets. The number of folds is specified by the n_cvfolds keyword.
- fold_inds::Vector{<:Integer}=collect(1:nfolds): a vector of the fold indices to evaluate. This can be used to split large training runs into batches, or to resume a halted benchmark.
Distributed Computing
Several parallel processing paradigms are available for different use cases, implemented using processors added via Distributed.jl's addprocs function.
- distribute_iters::Bool=false: when using an MPSRandomSearch, for each fold, distribute the search grid across all available processors. For thread safety, using distribute_iters disables caching.
- distribute_folds::Bool=false: allocate one processor to each fold.
- distribute_cvfolds::Bool=false: equivalent to passing distribute_folds to tune. Allocates a processor to each hyperparameter train/val split.
- distribute_final_eval::Bool=false: allocate a processor to each test timeseries when computing the test loss. Useful when the test set is very large. This is the only option compatible with the others.
Saving and Resuming
If write is enabled, evaluate will automatically resume if it finds a partially complete run.
- write::Bool=false: whether to write output to files. If true, it will save temporary files, saving each completed fold inside "$writedir/$(simname)_temp/", and the final result to "$writedir/$(simname).jld2".
- writedir::String="evals": the directory to save data to.
- simname::String="$(objective)_$(tuning_optimiser)_f=$(nfolds)_cv$(n_cvfolds)_iters=$(tuning_maxiters)": the simulation name. Used to determine the save location.
- delete_tmps::Bool=length(fold_inds)==nfolds: whether to delete the temp directory at the end.
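For example, a long benchmark that should survive interruptions might be launched as follows (a sketch; the simname is arbitrary and the other arguments are as in the examples above):
results = evaluate(
Xs,
ys,
nresamples,
params,
MPSRandomSearch();
objective=MisclassificationRate(),
tuning_maxiters=20,
write=true, # save each completed fold to disk
writedir="evals",
simname="trendy_sine_benchmark" # rerunning with the same simname resumes the run
)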
Logging
- verbosity::Integer=1: controls how explicit the logging is. 0 for none, 5 for maximum. This is separate from the verbosity in MPSOptions.
Hyperparameter Tuning Options
These options are passed directly to their corresponding keywords in tune.
- n_cvfolds::Integer=5: corresponds to nfolds in tune; the number of train/val splits.
- logspace_eta::Bool=false: whether to treat the eta parameter space as logarithmic. E.g. setting parameters=(eta=(10^-3,10^-1),) and logspace_eta=true will sample each eta_candidate from the log10 search space [-3.,-1.], and then pass eta = 10^(eta_candidate) to MPSOptions.
- input_supertype::Type=Float64: a numeric type that can represent the types of each hyperparameter being tuned as well as their upper and lower bounds. Typically, Float64 is sufficient, but it can be set to Int for purely discrete optimisation problems etc. This is necessary for mixed integer/float hyperparameter tuning because certain solvers in Optimization.jl require variables in the search space to all be the same type.
- tuning_abstol::Float64=1e-3: passed directly to Optimization.jl; absolute tolerance in changes to the objective (loss) function.
- tuning_maxiters::Integer=250: maximum number of iterations allowed when solving.
- provide_x0::Bool=true: whether to provide initial conditions to the solve; ignored by MPSRandomSearch. The initial condition will be opts0, unless it contains a hyperparameter outside the range specified by parameters, in which case the lower bound of that hyperparameter will be used.
Further keyword arguments to evaluate are passed through to tune, and then to Optimization.jl through the Optimization.solve function.
Hyperparameter tuning utilities
MPSTime.make_stratified_cvfolds
— Functionmake_stratified_cvfolds(
Xs::AbstractMatrix,
ys::AbstractVector,
nfolds::Integer;
rng::Union{Integer, AbstractRNG},
shuffle::Bool=true
) -> folds::Vector{Vector{Vector{Int}}}
Creates nfolds-fold stratified cross validation train/validation splits for hyperparameter tuning, with the form:
julia> train_indices_fold_i, validation_indices_fold_i = folds[i];
Uses MLJ's StratifiedCV method (https://juliaai.github.io/MLJBase.jl/stable/resampling/#MLJBase.StratifiedCV).
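For example, to generate and inspect the splits directly (a sketch based on the signature above):
using MPSTime, Random
folds = make_stratified_cvfolds(X_train, y_train, 5; rng=Xoshiro(1));
train_inds, val_inds = folds[1];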
MPSTime.eval_loss
— Functioneval_loss(
::TuningLoss,
mps::TrainedMPS,
X_val::AbstractMatrix,
y_val::AbstractVector,
windows::Union{Nothing, AbstractVector}=nothing;
p_fold::Union{Nothing, Tuple}=nothing,
distribute::Bool=false,
)
Evaluate the TuningLoss of mps on the validation time-series dataset specified by X_val and y_val.
p_fold allows verbose logging during runs of tune and evaluate. When computing an imputation loss, windows are used to compute the imputation losses, as specified in tune, and distribute will distribute the loss calculation across each time-series instance.
MPSTime.MPSRandomSearch
— Type MPSRandomSearch(sampling::Symbol=:LatinHypercube)
Value type used to specify a random search algorithm for hyperparameter tuning an MPS.
Sampling
Specifies the method used to determine the search space. The supported sampling methods are:
- :LatinHypercube: an implementation of LatinHypercubeSampling.jl's random (pseudo-) Latin Hypercube search space generator. Supports both discrete and continuous hyperparameters.
- :UniformRandom: generate a search space by randomly sampling from the interval [lower bound, upper bound] for each hyperparameter. Supports both discrete and continuous hyperparameters.
- :Exhaustive: perform an exhaustive grid search of all hyperparameters within the lower and upper bounds. Only supports discrete hyperparameters. Not actually a random search.