The algorithm constructs one ensemble for each non-leaf node in the hierarchy. Each ensemble discriminates among the child nodes of its node. Steps:
A new example is classified with all of the ensembles. The PCT leaf that the example reaches contains the probabilities of the child labels given their parent. The probabilities from the multiple ensembles are combined, and the hierarchy constraint is applied to them.
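The combination step can be illustrated with a minimal sketch. It assumes a tree-shaped hierarchy (one parent per label) and assumes that the hierarchy constraint is enforced by multiplying conditional probabilities along the path to the root, which guarantees that a child is never more probable than its parent. The function name, the `root` placeholder and the toy labels are hypothetical and are not taken from the pipeline.

```python
def apply_hierarchy_constraint(parents, cond_prob, root="root"):
    """Turn P(child | parent) into P(child) by multiplying along the path to the root.

    Assumption: product rule on a tree-shaped hierarchy; the pipeline may differ.
    """
    full = {root: 1.0}  # the artificial root is always "present"

    def resolve(label):
        # Compute the parent's probability first, then scale the child by it.
        if label not in full:
            full[label] = cond_prob[label] * resolve(parents[label])
        return full[label]

    for label in cond_prob:
        resolve(label)
    del full[root]
    return full


# Toy hierarchy: root -> A, B and A -> A.1, A.2 (hypothetical labels).
parents = {"A": "root", "B": "root", "A.1": "A", "A.2": "A"}
# Conditional probabilities as they might be read from the PCT leaves.
cond_prob = {"A": 0.8, "B": 0.2, "A.1": 0.5, "A.2": 0.25}

probs = apply_hierarchy_constraint(parents, cond_prob)
```

With these toy values, every child ends up with a probability no larger than that of its parent, which is the property the hierarchy constraint is meant to ensure.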
The set of most specific labels is obtained with the following steps:
The algorithm constructs an ensemble that differentiates among the most specific labels. Steps:
When a new example is classified, the PCT leaf that it reaches contains the probabilities that the example is associated with each of the most specific labels. Probabilities for inner labels are set to zero.
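As a small illustration of the last point, the sketch below expands a prediction over the most specific labels into a prediction over the whole hierarchy, with inner labels set to zero. The function name and the toy labels are hypothetical.

```python
def expand_to_hierarchy(all_labels, most_specific_probs):
    """Keep the predicted probabilities of the most specific labels; inner labels get 0.0."""
    return {label: most_specific_probs.get(label, 0.0) for label in all_labels}


# Hypothetical hierarchy in which "A" is an inner label.
all_labels = ["A", "A.1", "A.2", "B"]
# Probabilities as they might be read from the PCT leaf.
leaf_probs = {"A.1": 0.6, "A.2": 0.1, "B": 0.3}

full_prediction = expand_to_hierarchy(all_labels, leaf_probs)
```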
The algorithm constructs an ensemble for each of the most specific labels. Each ensemble differentiates between a specific label and all the other labels in the training set. Steps:
A new example is classified with all of the ensembles. In each ensemble, the PCT leaf that the example reaches contains the probability that the example is associated with that ensemble's label l. The probabilities from the multiple ensembles are combined. Probabilities for inner labels are set to zero.
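The one-label-against-the-rest setup can be sketched as follows. The sketch assumes that each training example carries a set of most specific labels and that combining the predictions simply means collecting one probability per label. Both helper names and the toy data are hypothetical.

```python
def one_vs_rest_targets(example_labels, most_specific_labels):
    """Build one binary target vector per label: 1 if the example has it, else 0."""
    return {
        label: [1 if label in labels else 0 for labels in example_labels]
        for label in most_specific_labels
    }


def pool_predictions(per_label_probability, inner_labels):
    """Collect the per-ensemble probabilities and set inner labels to zero."""
    pooled = dict(per_label_probability)
    pooled.update({label: 0.0 for label in inner_labels})
    return pooled


# Three toy training examples annotated with most specific labels.
example_labels = [{"A.1"}, {"A.2", "B"}, {"B"}]
targets = one_vs_rest_targets(example_labels, ["A.1", "A.2", "B"])

# One probability per ensemble, as it might be read from the PCT leaves.
pooled = pool_predictions({"A.1": 0.7, "A.2": 0.1, "B": 0.4}, inner_labels=["A"])
```

Because each ensemble is trained independently, the combined probabilities need not sum to one.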
Data sets with hierarchical class differ:
Accordingly, there is no single best-performing algorithm. For example:
In many cases it is not clear in advance which algorithm will outperform the others. Therefore, the pipeline implements all five algorithms together with a tool that compares their predictive performance in cross-validation.
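The idea of such a comparison can be sketched in a few lines. This is a generic illustration, not the pipeline's tool: the fold construction (contiguous, unshuffled), the accuracy score and the two toy "algorithms" are all assumptions made for the example, and a real comparison would use an evaluation measure suited to hierarchical multi-label predictions.

```python
from statistics import mean


def k_fold_indices(n_examples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_examples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def compare_algorithms(algorithms, examples, targets, score, k=5):
    """Return the mean cross-validated score of each algorithm."""
    results = {}
    for name, fit in algorithms.items():
        fold_scores = []
        for train_idx, test_idx in k_fold_indices(len(examples), k):
            # fit() returns a prediction function trained on the training fold.
            predict = fit([examples[i] for i in train_idx], [targets[i] for i in train_idx])
            predictions = [predict(examples[i]) for i in test_idx]
            fold_scores.append(score([targets[i] for i in test_idx], predictions))
        results[name] = mean(fold_scores)
    return results


def accuracy(true, predicted):
    """Fraction of test examples predicted correctly."""
    return sum(t == p for t, p in zip(true, predicted)) / len(true)


# Two toy "algorithms" that ignore the training data (placeholders only).
algorithms = {
    "always_a": lambda X, y: (lambda x: "a"),
    "always_b": lambda X, y: (lambda x: "b"),
}
examples = list(range(10))
targets = ["a"] * 6 + ["b"] * 4

results = compare_algorithms(algorithms, examples, targets, accuracy, k=5)
```

Every algorithm is evaluated on the same folds, so the resulting scores are directly comparable.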
PCTs are explained in the paper:
Vens C., Struyf J., Schietgat L., Džeroski S., Blockeel H. (2008) Decision Trees for Hierarchical Multi-label Classification. Machine Learning, 73, 185-214. https://doi.org/10.1007/s10994-008-5077-3
The five algorithms that construct ensemble models from data sets with hierarchical class are explained in the paper:
Vidulin V., Džeroski S. (2020) Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems. In: Appice A., Tsoumakas G., Manolopoulos Y., Matwin S. (eds) Discovery Science. DS 2020. Lecture Notes in Computer Science, vol 12323. Springer, Cham. https://doi.org/10.1007/978-3-030-61527-7_32