Sets which of the following tools will be executed:
For example, this line will run the first six tools:
tools = 1-6
The value of this setting is a comma-separated list of tool numbers, ranges of tool numbers (e.g., 1-6), or a combination of both (e.g., 1,3-6).
Tools 2-6 can only run after tool 1 has distributed the examples into cross-validation folds. Suppose we want to run algorithms 3 and 5 in separate runs. In the first run we set "tools = 1, 3", and in the second "tools = 5". This way, both algorithms run on the same cross-validation folds.
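To make the value syntax concrete, here is a minimal Python sketch of how a "tools" value could be expanded into individual tool numbers. The function name parse_tools is hypothetical and this is not the pipeline's own parser; it only illustrates the documented syntax of single numbers and inclusive ranges joined by commas.

```python
def parse_tools(value: str) -> list[int]:
    """Expand a 'tools' value such as '1,3-6' into a sorted list of tool numbers.

    Hypothetical helper for illustration; ranges are treated as inclusive.
    """
    tools: set[int] = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty entries, e.g. from a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # int() ignores surrounding spaces
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            tools.update(range(start, end + 1))
        else:
            tools.add(int(part))
    return sorted(tools)


print(parse_tools("1-6"))    # [1, 2, 3, 4, 5, 6]
print(parse_tools("1,3-6"))  # [1, 3, 4, 5, 6]
print(parse_tools("1, 3"))   # [1, 3]
```

Read this way, "tools = 1, 3" in the first run creates the folds and runs algorithm 3, while "tools = 5" in the second run reuses those folds.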
Sets the path to a data set with a hierarchical class.
baselineDataset = data/Enron.harff.zip
Sets the path to the folder where the pipeline writes its output.
outputFolder = hierarchy-decomposition-pipeline/output/Enron/
The pipeline constructs random forest ensembles. This setting specifies the number of trees in the forest.
numTrees = 500
Default value is 500.
Maximum amount of memory available to the machine learning algorithms.
memory = 5g
The value consists of a number followed by a unit letter: 'k' or 'K' for kilobytes, 'm' or 'M' for megabytes, or 'g' or 'G' for gigabytes. Default value is 2g.
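The following Python sketch shows one way to validate a "memory" value and convert it to bytes. The function parse_memory is hypothetical, and treating 1k as 1024 bytes (binary multiples) is an assumption; the setting's description does not say whether decimal or binary units are meant.

```python
import re

# Bytes per unit letter. Binary multiples (1k = 1024 bytes) are an assumption.
_UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(value: str) -> int:
    """Convert a 'memory' value such as '5g' or '512M' into a number of bytes.

    Hypothetical helper: accepts a whole number followed by k/K, m/M, or g/G.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgG])\s*", value)
    if match is None:
        raise ValueError(f"invalid memory value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit.lower()]


print(parse_memory("5g"))    # 5368709120
print(parse_memory("512M"))  # 536870912
```

A value without a unit letter, such as "5", is rejected here because the documented format always includes one.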
Maximum number of processors available for parallel tasks.
numProcessors = 4
Default value is 2.
Sets the number of cross-validation folds.
numFolds = 10
Default value is 10.
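The sketch below illustrates what distributing examples into folds (tool 1) can look like and why fixing the folds once matters. It is a plain random assignment written for illustration; the function assign_folds and its seed parameter are hypothetical, and the pipeline's actual fold construction (for example, whether it stratifies by class) is not described here.

```python
import random


def assign_folds(num_examples: int, num_folds: int = 10, seed: int = 0) -> list[int]:
    """Return a fold index (0 .. num_folds-1) for each example.

    Hypothetical sketch: shuffles example indices with a fixed seed and deals
    them out round-robin, so fold sizes differ by at most one.
    """
    if num_folds < 2 or num_folds > num_examples:
        raise ValueError("need 2 <= num_folds <= num_examples")
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    folds = [0] * num_examples
    for position, example in enumerate(indices):
        folds[example] = position % num_folds
    return folds


folds = assign_folds(25, num_folds=10)
sizes = [folds.count(k) for k in range(10)]
print(sizes)  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Once such an assignment is written out, any later run that reads it evaluates its algorithm on exactly the same folds, which is the point of running tool 1 before tools 2-6.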
Sets a comma-separated list of confidence thresholds at which threshold-dependent performance measures are computed. Thresholds are probabilities, i.e., values between 0 and 1.
thresholds = 0.5, 0.7
Default value is "0.5, 0.7, 0.9".
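To show what "threshold-dependent" means, here is a hedged Python sketch that parses a "thresholds" value and, for one example, turns predicted class probabilities into a set of predicted classes at each threshold, then computes precision and recall. All names, the hierarchical class labels such as "A/B", and the probabilities are hypothetical; predicting a class when its probability is at least the threshold (>=) is an assumption, and the pipeline may use different measures.

```python
def parse_thresholds(value: str) -> list[float]:
    """Parse a 'thresholds' value such as '0.5, 0.7' and check each is a probability."""
    thresholds = [float(part) for part in value.split(",") if part.strip()]
    for t in thresholds:
        if not 0.0 <= t <= 1.0:
            raise ValueError(f"threshold {t} is not a probability")
    return thresholds


def predicted_labels(probabilities: dict[str, float], threshold: float) -> set[str]:
    """Classes whose predicted probability reaches the threshold (>= is an assumption)."""
    return {label for label, p in probabilities.items() if p >= threshold}


def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision and recall of one predicted label set against the true labels."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical class probabilities for one example, and its true classes.
probs = {"A": 0.95, "A/B": 0.75, "A/C": 0.55, "D": 0.2}
actual = {"A", "A/B"}

for t in parse_thresholds("0.5, 0.7, 0.9"):
    predicted = predicted_labels(probs, t)
    print(t, sorted(predicted), precision_recall(predicted, actual))
```

Raising the threshold makes predictions more conservative: in this made-up example, precision rises from 0.5 to 0.7 while recall drops at 0.9, which is why results are reported at several thresholds.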
Sets the path to an unlabelled data set. This setting is mandatory when the annotation tool is used.
unlabelledSet = data/Enron-unlabelledSet.harff.zip
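As a summary of the settings above, this Python sketch reads "key = value" lines and fills in the documented default values for settings that are omitted. It is a hypothetical reader, not the pipeline's own loader: the function load_settings is illustrative, and skipping lines that start with '#' as comments is an assumption about the file format.

```python
# Documented defaults; settings without a documented default are not listed.
DEFAULTS = {
    "numTrees": "500",
    "memory": "2g",
    "numProcessors": "2",
    "numFolds": "10",
    "thresholds": "0.5, 0.7, 0.9",
}


def load_settings(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, filling in documented defaults for missing keys.

    Hypothetical sketch; treating '#' lines as comments is an assumption.
    """
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value': {line!r}")
        settings[key.strip()] = value.strip()
    return settings


example = """
tools = 1-6
baselineDataset = data/Enron.harff.zip
outputFolder = hierarchy-decomposition-pipeline/output/Enron/
memory = 5g
numProcessors = 4
thresholds = 0.5, 0.7
"""

settings = load_settings(example)
print(settings["memory"])    # 5g (overrides the default 2g)
print(settings["numFolds"])  # 10 (default)
```

Values are kept as strings here so that each can be validated by its own rules, for instance the number-plus-unit format of memory or the comma-separated list of thresholds.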