Accounting for the dimensionality of the dependence in analyses of contingency tables obtained with Check-All-That-Apply and Free-Comment
Abstract
Check-All-That-Apply (CATA) and Free-Comment (FC) both yield a contingency table containing the citation counts of words or descriptors (columns) by products (rows). This table is most often analysed using correspondence analysis (CA). CA decomposes the dependence between products and descriptors into successive axes, each capturing the maximum remaining dependence, so that the dependence captured decreases from one axis to the next. This is reasonable only if the dependence has first been established by a chi-square test. However, the p-value of this test is not valid when the observations are not independent or when the contingency table contains too many cells with low expected counts. In addition, rejecting independence with a chi-square test only means that at least the first CA axis captures some dependence; it says nothing about the subsequent axes. This paper presents a test to determine the number of CA axes that capture significant dependence and proposes a Monte-Carlo approach to compute valid p-values for this test. The variability of the products' coordinates in the CA space is often evaluated by means of a total bootstrap procedure. The paper proposes relying on this test to determine the number of axes to consider in the Procrustes rotations of such a procedure. Finally, to investigate which words are cited more often for each product, the paper proposes performing Fisher's exact tests per cell on the derived contingency table obtained by reversing the CA computations on the axes that capture significant dependence. The benefits of accounting for the dimensionality of the dependence in these analyses are demonstrated on real CATA data.
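To make the idea of testing axes one at a time more concrete, the following Python sketch illustrates one possible reading of it. It is not the procedure of the paper. It assumes that CA is computed from the singular value decomposition of the standardized residuals, that the statistic for axis k is the total count times the inertia carried by axes k and beyond, and that null tables are drawn from a multinomial distribution fitted to the reconstruction on the first k-1 axes. Multinomial sampling treats citations as independent, so this sketch does not address the non-independence issue raised above. All function names (`ca_svd`, `reconstruct`, `residual_stat`, `mc_axis_pvalues`) and the toy counts are hypothetical.

```python
import numpy as np

def ca_svd(table):
    """SVD of the standardized residuals of a contingency table (CA core)."""
    counts = np.asarray(table, dtype=float)
    n = counts.sum()
    P = counts / n
    E = np.outer(P.sum(axis=1), P.sum(axis=0))  # independence model
    # Standardized residuals; cells with zero expectation are left at 0.
    S = np.divide(P - E, np.sqrt(E), out=np.zeros_like(P), where=E > 0)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return n, E, U, sv, Vt

def reconstruct(table, n_axes):
    """Table rebuilt by reversing the CA computations on the first n_axes axes."""
    n, E, U, sv, Vt = ca_svd(table)
    S_k = (U[:, :n_axes] * sv[:n_axes]) @ Vt[:n_axes, :]
    return n * (E + np.sqrt(E) * S_k)

def residual_stat(table, k):
    """Total count times the inertia carried by axes k, k+1, ... (k is 1-based)."""
    n, _, _, sv, _ = ca_svd(table)
    return n * np.sum(sv[k - 1:] ** 2)

def mc_axis_pvalues(table, n_sim=999, seed=0):
    """Monte-Carlo p-value per axis (assumed scheme, multinomial resampling)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(table)
    n = int(counts.sum())
    n_axes = min(counts.shape) - 1
    pvals = []
    for k in range(1, n_axes + 1):
        observed = residual_stat(counts, k)
        # Null model for axis k: dependence limited to the first k-1 axes.
        # Negative reconstructed cells are clipped to 0 (an assumption).
        null = np.clip(reconstruct(counts, k - 1), 0.0, None)
        probs = (null / null.sum()).ravel()
        exceed = 0
        for _ in range(n_sim):
            sim = rng.multinomial(n, probs).reshape(counts.shape)
            exceed += residual_stat(sim, k) >= observed
        pvals.append((1 + exceed) / (n_sim + 1))
    return pvals

# Toy counts (hypothetical): 4 products (rows) by 5 descriptors (columns).
toy = np.array([
    [30, 12,  8, 20,  5],
    [10, 28,  9, 18,  7],
    [ 9, 11, 27, 19,  6],
    [11, 10,  9, 21, 25],
])
print(mc_axis_pvalues(toy, n_sim=199))
```

With zero axes, `reconstruct` returns the expected counts under independence, and with all axes it returns the observed table. A reconstruction on the retained axes only is the kind of derived table on which per-cell tests could then be run; how that table is rounded and tested is not specified in this abstract.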