Conference paper, 2023

Learning to Reason: Predict-then-Optimize vs Predict-and-Optimize

Abstract

Many real-life decision-making problems involve reasoning or optimizing over discrete variables on ill-defined problems, where exact constraints or parameters are unknown and only indirect, correlated variables are observed together with solutions. In these situations, one may want to machine-learn how to predict a solution directly from the observed variables, effectively learning "how to reason" or "how to optimize". A promising research direction combines Machine Learning (ML) and discrete reasoning (DR) [1]: the problem is formulated as a discrete optimization/reasoning problem whose parameters are predicted from data. Because of the various types of observed variables and the complexity of their relationship to hidden constraints and parameters, Deep Learning (DL) is often considered a suitable learning tool here. However, DL requires differentiable loss functions, whose gradients are either zero or unavailable for discrete reasoning [2]. Two approaches have emerged to tackle such problems. On the one hand, the Predict-then-optimize approach learns how to predict the discrete problem to solve and then solves it. Its basic components are well understood, learning can be efficient, and exact DR can be performed during inference. But the effect of parameter errors on the final solution is not accounted for [3], which makes the approach sub-optimal in low-data regimes. For this reason, much attention has focused on the direct integration of optimization in the loss function, an approach often denoted as Predict-and-optimize or Decision-focused learning [4]. The challenge here is to provide efficient end-to-end differentiable training. Some approaches rely on efficient differentiable optimization layers that capture the DR problem through a continuous relaxation [5, 6, 7].
They can tackle complex NP-hard problems at the cost of approximate solving during inference: even optimally parameterized relaxations will fail to offer exact solutions. Another family of approaches extracts meaningful gradients from exact DR solvers during training [2, 8, 9], enabling exact solving during inference. This, however, limits their application to very small instances of NP-hard DR problems (e.g., scheduling with only 3 tasks [3]) because of the tremendous optimization cost during training, a cost worsened by the essentially random parameterization of the predicted problems in the first training epochs. We compare a Predict-then-optimize and a Predict-and-optimize approach on a classical Constraint Programming (CP) benchmark, the NP-complete sudoku problem. Our goal is to learn how to solve new sudoku grids from examples of solved grids (as we already did using ML technology [10]), described here as a sequence of 81 numbers, each with an associated grid coordinate. To relax the Boolean nature of constraints (satisfied or violated), we learn the parameters of a weighted CP model (a binary Cost Function Network, or CFN [11]) using the CFN solver toulbar2 [12] during inference and during training (in Predict-and-optimize mode). For Predict-then-optimize, we introduce the Gangster-PLL loss, a variant of the well-established pseudo-log-likelihood loss [13]; for Predict-and-optimize, we use the Hinge loss [9], both with regularization on the learned CFN parameters. We observe that the Predict-then-optimize approach combines efficient learning with exact reasoning during inference, providing fully accurate solutions even in relatively low-data regimes. The far more expensive Predict-and-optimize approach struggles to learn how to solve a sudoku grid, requiring more tuning of the exact
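As a rough illustration of the Predict-then-optimize side, a plain pseudo-log-likelihood loss over a pairwise cost function network can be sketched as follows. This is a minimal NumPy sketch under assumptions: the tensor layout, the `pll_loss` name, and the uniform treatment of all variable pairs are invented for illustration, and the Gangster-PLL variant introduced in the paper is not reproduced here.

```python
import numpy as np

def pll_loss(costs, y):
    """Negative pseudo-log-likelihood of a solution y under a pairwise
    cost function network (CFN).

    costs[i, j, a, b] : cost of assigning value a to variable i and
                        value b to variable j (assumed layout).
    y                 : observed solution, shape (n,), values in 0..d-1.
    """
    n, _, d, _ = costs.shape
    # Local field E_i(a) = sum_{j != i} costs[i, j, a, y[j]]: the cost of
    # setting variable i to value a while every other variable keeps its
    # observed value.
    idx = np.broadcast_to(y.reshape(1, n, 1, 1), (n, n, d, 1))
    fields = np.take_along_axis(costs, idx, axis=3).squeeze(3)   # (n, n, d)
    fields = fields * ~np.eye(n, dtype=bool)[:, :, None]         # drop j == i
    energy = fields.sum(axis=1)                                  # (n, d)
    # Conditional log-probability log P(x_i = a | x_{-i}) as a numerically
    # stable log-softmax of the negated energies.
    z = -energy - (-energy).max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Negative pseudo-log-likelihood, averaged over variables.
    return -logp[np.arange(n), y].mean()
```

With all costs at zero, every value is equally likely and the loss equals log d (log 9 for a sudoku grid encoded as 81 variables with 9 values each); minimizing it over predicted costs pushes each observed cell value to be the cheapest choice given the rest of the grid.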
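On the Predict-and-optimize side, the general recipe of margin-based decision-focused training (an exact solver in the loop, a hinge-style loss whose subgradient is the difference between the target and the solver's solution) can be caricatured on a toy problem where exact solving is trivial: picking the k cheapest items. This is only a sketch of the recipe, not the paper's CFN/toulbar2 setup; the `solve` and `hinge_step` names, the margin-free loss variant, and the toy problem itself are all illustrative assumptions.

```python
import numpy as np

def solve(theta, k):
    """Toy exact 'solver': select the k cheapest items under costs theta."""
    y = np.zeros_like(theta)
    y[np.argsort(theta, kind="stable")[:k]] = 1.0
    return y

def hinge_step(theta, y_star, k, lr=0.1):
    """One decision-focused update. The (margin-free) hinge-style loss
    theta @ (y_star - y_hat) is non-negative and is zero exactly when the
    solver already returns the target solution y_star; its subgradient
    with respect to theta is (y_star - y_hat)."""
    y_hat = solve(theta, k)
    loss = theta @ (y_star - y_hat)
    return theta - lr * (y_star - y_hat), loss

# Tiny training loop: learn costs under which the solver outputs y_star.
theta = np.array([3.0, 4.0, 1.0, 2.0])
y_star = np.array([1.0, 1.0, 0.0, 0.0])   # target: items 0 and 1
for _ in range(100):
    theta, loss = hinge_step(theta, y_star, k=2)
```

Note that the loop calls the exact solver at every single update, which is precisely why this family of approaches scales poorly once the solver faces a genuinely NP-hard problem rather than a toy selection task.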

Dates and versions

hal-04222883, version 1 (29-09-2023)


Cite

Marianne Defresne, Thomas Schiex, Sophie Barbe. Learning to Reason: Predict-then-Optimize vs Predict-and-Optimize. AAAI 2023 Bridge on Constraint Programming and Machine Learning, Feb 2023, Washington, United States. ⟨hal-04222883⟩
