Both the choice of data preprocessing and the selection of training-set samples and modeling variables are critical when building a partial least squares (PLS) quantitative model from hyperspectral data. In this paper, we study the sugar content of the sugar-sweetened apple, a characteristic fruit of Southern Xinjiang. We compare the effects on the model of four preprocessing methods, namely multiplicative scatter correction (MSC), Savitzky-Golay (SG) smoothing, range standardization and mean centering, as well as the influence of four training-set selection methods: random sampling (RS), Kennard-Stone (KS), sample set partitioning based on joint X-Y distance (SPXY) and interval sampling. Using 10-fold cross-validation to validate the models, we also compare the model built with genetic algorithm (GA) variable selection against full-spectrum PLS modeling. The results show that the best preprocessing method is the combination of SG smoothing and range standardization, and that the model built on the training set chosen by interval sampling outperforms those built with the other three selection methods. Compared with full-spectrum PLS modeling, the GA-PLS model shows substantial improvement in prediction speed, prediction ability and principal component selection. The correlation coefficient, RMS and prediction accuracy of the model are 0.936229, 0.919618 and 0.938575, respectively.
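To illustrate one of the training-set selection methods compared above, the following is a minimal sketch of Kennard-Stone (KS) selection in plain Python. It is not the authors' implementation: the function name `kennard_stone`, the use of Euclidean distance on the raw feature vectors, and the toy one-dimensional "spectra" are all assumptions made for illustration. The procedure starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away.

```python
import math


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by Kennard-Stone.

    X is a list of equal-length numeric sequences (one per sample).
    Hypothetical helper for illustration; Euclidean distance is assumed.
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")

    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the selection with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # For each candidate, track its distance to the nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # Pick the candidate that is farthest from the current selection.
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])

    return selected


# Toy data (hypothetical): five samples with a single feature each.
spectra = [[0.0], [1.0], [2.0], [10.0], [5.0]]
train_idx = kennard_stone(spectra, 3)
print(train_idx)  # [0, 3, 4]
```

The samples not returned would form the test set. SPXY follows the same selection loop but, as its name indicates, uses a joint distance over both the spectra (X) and the reference values (Y) instead of a distance over X alone.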