Risk bounds for embedded variable selection in classification trees

Abstract : The problems of model selection and variable selection for classification trees are considered jointly. A penalized criterion is proposed that explicitly takes into account the number of variables, and a risk bound is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out or cross-validation. A simulation study is performed to compare the form of the proposed theoretical penalized criterion with the form obtained after tuning the regularization parameter via cross-validation.
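
As a rough illustration of the CART pruning criterion mentioned in the abstract, the sketch below selects a pruned subtree by minimizing training error plus a tuning parameter times the number of leaves, which is the standard cost-complexity form. It does not reproduce the criterion proposed in the article, whose penalty also accounts for the number of variables and whose exact form is not given in this record. The names `Subtree`, `penalized_criterion` and `select_subtree`, as well as the candidate values, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Sequence


class Subtree(NamedTuple):
    """A candidate pruned subtree, summarized by two numbers."""
    n_leaves: int          # number of leaves of the subtree
    training_error: float  # misclassification rate on the training sample


def penalized_criterion(tree: Subtree, alpha: float) -> float:
    """Standard CART cost-complexity criterion: error + alpha * number of leaves."""
    return tree.training_error + alpha * tree.n_leaves


def select_subtree(candidates: Sequence[Subtree], alpha: float) -> Subtree:
    """Return the candidate minimizing the criterion (ties go to the smaller tree)."""
    return min(candidates, key=lambda t: (penalized_criterion(t, alpha), t.n_leaves))


# Made-up sequence of pruned subtrees, for illustration only (not from the article).
candidates = [
    Subtree(n_leaves=1, training_error=0.40),
    Subtree(n_leaves=2, training_error=0.25),
    Subtree(n_leaves=4, training_error=0.15),
    Subtree(n_leaves=8, training_error=0.12),
]

# A larger alpha penalizes leaves more heavily and selects a smaller tree.
for alpha in (0.0, 0.02, 0.2):
    chosen = select_subtree(candidates, alpha)
    print(f"alpha={alpha}: {chosen.n_leaves} leaves")
```

In practice alpha is not known in advance, which is why the abstract notes that it has to be calibrated by hold-out or cross-validation: each candidate value would be scored on data not used to grow the tree, rather than on the training error used here.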
Document type :
Journal articles
https://hal.inrae.fr/hal-02636643
Contributor : Migration ProdInra
Submitted on : Wednesday, May 27, 2020 - 9:03:34 PM
Last modification on : Friday, August 5, 2022 - 2:38:10 PM

Citation

Servane Gey, Tristan Mary-Huard. Risk bounds for embedded variable selection in classification trees. IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2014, 60 (3), pp.1688-1699. ⟨10.1109/TIT.2014.2298874⟩. ⟨hal-02636643⟩