The objective of this study was to empirically compare the partial least squares (PLS) regression model and the principal components regression (PCR) model for predicting the protein percentage in rice semolina. The estimates were obtained from absorbance values in the near-infrared region. A total of 135 samples of rice semolina were collected between 2004 and 2012 from several pet food plants in Costa Rica. The convergence of the results was validated through bootstrapping techniques. The observations were split into two groups: one data set used to estimate the best regression model (n=120) and a validation data set (n=15). The models fitted to the estimation data set were sensitive to outliers; consequently, one observation was removed to obtain the best PLS model. In the validation of the regression models, goodness of fit was assessed with the mean squared error of prediction (MSEP), the root mean square error of prediction (RMSEP), the standard error of prediction (SEP), the ratio of performance to deviation (RPD), and plots of observed against predicted values. These confirmed a better fit for the PLS regression (SEP=0.304) than for the PCR model (SEP=0.312). The simulation method also showed better convergence of the results for the PLS regression technique in predicting the percentage of protein in rice semolina.

Keywords: principal components regression, partial least squares, near-infrared spectroscopy, bootstrap.
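
The validation statistics named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in near-infrared calibration work: MSEP as the mean squared residual, RMSEP as its square root, SEP as the bias-corrected standard deviation of the residuals, and RPD as the standard deviation of the observed values divided by SEP. The study itself may have used slightly different formulas (for example, RPD computed against RMSEP). The function name `prediction_metrics` and the protein values below are hypothetical and are not taken from the study.

```python
import math


def prediction_metrics(observed, predicted):
    """Return MSEP, RMSEP, bias, SEP and RPD for a validation set.

    Assumed definitions (not confirmed by the source):
      MSEP  = mean of squared residuals
      RMSEP = sqrt(MSEP)
      SEP   = sample standard deviation of residuals around their mean (bias)
      RPD   = sample standard deviation of observed values / SEP
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(observed)
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

    msep = sum(e * e for e in residuals) / n
    rmsep = math.sqrt(msep)

    # Bias is the mean residual; SEP measures spread around that bias.
    bias = sum(residuals) / n
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

    # Spread of the reference values, used as the numerator of RPD.
    mean_obs = sum(observed) / n
    sd_obs = math.sqrt(sum((y - mean_obs) ** 2 for y in observed) / (n - 1))
    rpd = sd_obs / sep if sep > 0 else math.inf

    return {"msep": msep, "rmsep": rmsep, "bias": bias, "sep": sep, "rpd": rpd}


# Hypothetical protein percentages, for illustration only.
reference = [7.0, 8.0, 9.0, 10.0]
model_output = [7.5, 7.5, 9.5, 9.5]
print(prediction_metrics(reference, model_output))
```

Under these definitions, a lower SEP and a higher RPD on the same validation set indicate the better-predicting model, which is how the PLS and PCR values reported above would be compared.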