CLASSIFICATION OF SOYBEAN PODS USING THE DEEP LEARNING TECHNIQUES

Crop productivity estimates inform economic decisions about crops, agricultural management, and land use, among others. However, soybean productivity is commonly estimated by visual methods based on the classification of pods, which is slow, costly, and susceptible to human error. Thus, the objective of this work was to train two deep learning models to classify soybean pods according to the number of grains, based on images obtained using a smartphone. Data collection was carried out at the Federal University of Viçosa (UFV). The work consisted of capturing images with a smartphone and training two deep learning models: Mask R-CNN and YOLOv4. To capture the images, the soybean pods were pulled from the plants and placed in a white-bottom container. This procedure was repeated for each plant collected. Both models tended towards a better classification of the two- and three-grain pods, reaching a value of 90% for the F1 score metric. This may have occurred because of the greater number of these two types of pods in the chosen cultivars. Finally, the potential of using deep learning to classify soybean pods based on the number of grains was observed.


INTRODUCTION
The soybean crop is highly valued in international production due to its high nutritional value and its productive potential (MAUAD et al., 2011). Thus, estimating soybean productivity becomes a strategy for rural producers, agro-industrial companies, and even the government itself because of the impact that this crop has on the country's economy (RAMOS et al., 2017). However, the most widely used method to estimate soybean productivity depends on human sight, which is costly and error-prone.
Unconventional techniques for obtaining information to estimate crop productivity are being applied in the field (ALVES et al., 2018; CHAN et al., 2020; MILLER et al., 2018). These technologies aim to reduce human interference, making the process less costly and less susceptible to human error. Among current technologies, deep learning techniques stand out for solving problems from visual data.
Deep learning is an artificial intelligence technique that uses artificial neural networks to learn to recognize patterns in complex data. This technique can help solve problems that require visual estimates from data such as images. Thus, it is important to choose a tool that can be implemented in image capture equipment. Smartphones are an example of an affordable and popular tool for capturing images. Their applicability in the field can be seen in crop yield estimation (TEDESCO-OLIVEIRA et al., 2020) and disease classification (NGUGI et al., 2020), among others. Combining the two technologies, smartphones and deep learning, can provide fast and accessible information to estimate the productivity of a crop of extreme domestic economic importance.
Specifically for the soybean crop, deep learning has been used to identify and diagnose diseases and pests that affect crop productivity and to select soybean varieties resistant to atypical conditions (ETIENNE et al., 2021; ZHU et al., 2019). Thus, the objective of this work was to train deep learning models to classify soybean pods based on the number of grains they contain, a factor that can contribute to the calculation of soybean productivity estimates.

MATERIAL AND METHODS
The material and methods are described chronologically: data were first collected, then processed, and finally evaluated. The process in which the work was developed is summarized in Figure 1.

Data collection
In this work, two soybean cultivars were used, TMG 7063 IPRO (cultivar 1) and TMG 7363 RR (cultivar 2). The cultivars were sown in the experimental area of the Federal University of Viçosa (UFV) in Viçosa, state of Minas Gerais. The area was sown in December 2020, with an average of 26 to 30 plants per 1 m².
The digital images used to build the database were acquired between February 18 and March 4, 2021, with soybean plants in the phenological stages between R6 and R8. The images were captured in the field between 8:00 a.m. and 12:00 p.m. This variation in capture time is important to obtain data with different levels of luminosity, making the models capable of working in different lighting conditions. As a result, the models can handle images taken at different times of the day or in environments with shadows and irregular lighting.
The plants used in the experiment were selected at random, including plants of the two cultivars that varied in size, from small to large. To capture the images, the plants were collected from the experimental area. All pods were removed from the plants and placed in a white-bottom container (Figure 2). This procedure was performed for all collected plants, and the distance between the camera and the container filled with pods was not standardized. The distance was left unstandardized so that the deep learning models could classify objects in different situations, making them practical and useful for those who use them. Furthermore, no additional equipment is needed to fix the distance between the pods and the camera lens, resulting in a simple and accessible process.
The images were obtained using a smartphone with a dual camera of 48 megapixels and five megapixels, a 1/2" sensor, and an f/1.8 aperture. Throughout the collection, the smartphone settings were kept on automatic. According to the characteristics of the mobile device used, the generated images had a size of 3000 x 4000 pixels. In all, 495 images of both cultivars were captured, as seen in Figure 2, each image containing more than one pod. In total, the database contained 23,193 pods. The Mask R-CNN model allows the use of three subsets: training (80% of the database), validation (10% of the database), and test (10% of the database), while the YOLO model so far allows only the training (80% of the database) and test (20% of the database) subsets.
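The two partitioning schemes described above can be sketched as follows. This is a minimal illustration, assuming the split is made at the image level with a seeded random shuffle; the source does not state how images were assigned to subsets, and the file names and the `split_dataset` helper are hypothetical.

```python
import random


def split_dataset(items, fractions, seed=0):
    """Shuffle items and cut them into consecutive subsets sized by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    subsets, start = [], 0
    for i, frac in enumerate(fractions):
        # The last subset takes whatever remains, so rounding never drops an image.
        if i == len(fractions) - 1:
            end = len(shuffled)
        else:
            end = start + round(frac * len(shuffled))
        subsets.append(shuffled[start:end])
        start = end
    return subsets


# Hypothetical file names standing in for the 495 captured images.
images = [f"img_{i:03d}.jpg" for i in range(495)]

# Mask R-CNN: training / validation / test (80% / 10% / 10%).
mask_train, mask_val, mask_test = split_dataset(images, (0.8, 0.1, 0.1))

# YOLO: training / test (80% / 20%).
yolo_train, yolo_test = split_dataset(images, (0.8, 0.2))
```

With 495 images, both schemes leave 396 images for training and 99 for the remaining subset or subsets. Because each image holds many pods, an image-level split does not guarantee the same percentages at the pod level.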

Model pre-processing and selection
For the classification of pods, two deep learning models were chosen: Mask R-CNN (HE et al., 2018) and YOLOv4. Mask R-CNN stands out for its efficiency in instance segmentation. It extends the Faster R-CNN model with a mask prediction that outlines the object of interest, showing its exact location in the image. This model has been widely used in agriculture as a solution to problems in the sector (DE CARVALHO et al., 2021; LEE et al., 2020; MEKHALFI et al., 2021; VALICHARLA, 2021). In turn, YOLO (You Only Look Once) (REDMON et al., 2016) has several versions, with improvements and minor modifications among them. In general, the YOLO model is known for its high detection speed, a result of its simultaneous processing, which determines the bounding box coordinates and the classes of the objects of interest in a single pass (REDMON et al., 2016).
As supervised models require labels for their training, it was necessary to label each pod to generate a true bounding box. With this box, the model can identify the object of interest and extract its inherent characteristics, which allows the model to learn. In this work, the objects of interest were the pods, which were labeled based on the number of grains they had: "one", "two", "three" and "four". Because different models were used, the images had to be labeled in different software, as each model requires a different label format. Figure 3a presents the labeling for the Mask R-CNN model, for which the online software VGG Image Annotator was used (DUTTA; ZISSERMAN, 2019). Figure 2b
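To illustrate why each model needs its own label format, the sketch below converts one pixel bounding box into a line of the plain-text format commonly used by Darknet-based YOLO models (class index followed by the box center, width, and height, all normalized by the image size). The class order, the example box, the 3000-wide by 4000-high orientation, and the `to_yolo_line` helper are assumptions made for illustration, not details taken from this work.

```python
# Class labels from the text; the index order is an assumption.
CLASS_NAMES = ["one", "two", "three", "four"]


def to_yolo_line(label, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel box (top-left corner plus size) to a YOLO label line."""
    class_id = CLASS_NAMES.index(label)
    # YOLO stores the box center, not the corner, normalized to [0, 1].
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    width = box_w / img_w
    height = box_h / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical three-grain pod in an image assumed to be 3000 px wide, 4000 px high.
line = to_yolo_line("three", 1200, 1800, 300, 400, 3000, 4000)
print(line)  # 2 0.450000 0.500000 0.100000 0.100000
```

Because the coordinates are normalized, the same label line remains valid if the image is resized before training.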

Evaluation of the models
The metrics of precision (Equation 1), recall (Equation 2), and F1 score (Equation 3) were used to evaluate both models. Precision quantifies the proportion of predicted values that are true values. Recall measures the proportion of actual values that the model was able to classify correctly. Finally, the F1 score is the harmonic mean of precision and recall, summarizing the performance for a single class.

$$P = \frac{TP}{TP + FP} \times 100 \quad (1)$$

$$R = \frac{TP}{TP + FN} \times 100 \quad (2)$$

$$F1 = \frac{2 \times P \times R}{P + R} \quad (3)$$

Where: P = precision, %; R = recall, %; F1 = F1 score, %; TP = True Positive, dimensionless; FP = False Positive, dimensionless; FN = False Negative, dimensionless.
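The three metrics can be computed per class directly from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean), expressed in percent; the counts are hypothetical and are not results from this work.

```python
def precision(tp, fp):
    """Share of predictions for a class that are correct, in percent."""
    return 100.0 * tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual objects of a class that were found, in percent."""
    return 100.0 * tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall, in percent."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a single class (not taken from the paper).
tp, fp, fn = 45, 5, 15

p = precision(tp, fp)   # 45 of 50 predictions are correct
r = recall(tp, fn)      # 45 of 60 actual pods are found
f1 = f1_score(p, r)

print(f"P = {p:.2f}%, R = {r:.2f}%, F1 = {f1:.2f}%")
```

The guards against a zero denominator return 0 when a class has no predictions or no actual objects; this is one common convention and is an assumption here.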

RESULTS AND DISCUSSION
Figure 4 shows the results of the evaluation metrics for the Mask R-CNN and YOLOv4 models. In both models, classes "three" and "two" achieved the best performance compared to classes "one" and "four". This result can be related to the number of pods of each class per plant. The cultivars used had a higher incidence of two- and three-grain pods. Therefore, the superior classification may result from this higher incidence, which provided a greater number of training samples for both models and caused the observed disparity. According to the results in Figure 4, the Mask R-CNN model reached a precision of 100% for class "one". This metric gives the percentage of predictions that match the actual value. Thus, although the model did not detect all objects of this class, every pod it did classify as class "one" was correct. On the other hand, the recall shows the opposite behavior, with a value of 18.75% (Figure 4a). This metric reports the percentage of actual values that were predicted correctly. Here the model failed because it misclassified or did not detect most of the pods in class "one".
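As a worked illustration of how the two values reported for class "one" combine, assume the 100% figure is the precision and that the F1 score is the harmonic mean of precision and recall. Under those assumptions, the implied F1 score would be approximately:

```latex
F1 = \frac{2 \times P \times R}{P + R}
   = \frac{2 \times 100 \times 18.75}{100 + 18.75}
   = \frac{3750}{118.75}
   \approx 31.58\%
```

This is arithmetic on the two figures quoted above, not a value read from Figure 4. It shows how the harmonic mean is pulled towards the lower of the two metrics, so a perfect precision cannot compensate for a low recall.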
In the work carried out by Yu et al. (2019), the Mask R-CNN model was used to detect strawberry fruits, reaching 95.78% and 95.41% for precision and recall, respectively. In the experiment carried out by Ganesh et al. (2019), using different color channels, maximum values of 97.54%, 86.73%, and 88.67% were reached for precision, recall, and F1 score in the classification of oranges. Using YOLOv3-tiny, Mazzia et al. (2020) evaluated the detection of apple fruits, achieving 83% and 69% for precision and recall, respectively. In these works, the object of interest had a visually different color from the vegetation. In the current work, however, the models had to overcome the difficulty of classifying similar objects whose differences rest on subtle characteristics such as size and shape, which are less expressive traits than color.
Among the models, Mask R-CNN showed the better performance. This may be due to the training time: as the models have different architectures, YOLO may have required a longer training time, since it spends less time observing and extracting information from each image than Mask R-CNN. However, in some images both models presented similar difficulties. As seen in Figure 5, neither model was able to detect the same pod. The lack of visible grain boundaries in this pod may have prevented the models from detecting it correctly, since these boundaries can be an important characteristic for classifying pods.
Despite the differences found in the results, both models present satisfactory values (Figure 4) that are close to those in the literature (AFONSO et al., 2020; DAVIS et al., 2020; HUANG et al., 2020; UZAL et al., 2018; XU et al., 2020). This is important because the classification of soybean pods is a crucial factor for estimating productivity, as well as for the genetic improvement of the crop. By removing the dependence on human sight, errors tend to be minimized, as a machine does not slow down or tire as human beings do. In addition to being a method that

Figure 1. Flowchart showing the material and methods used in the execution of this work

Figure 2. Image captured from pods of a single plant using the image acquisition method

Figure 3. Labeling performed in LabelImg for the YOLO model (a) and in VGG Image Annotator for the Mask R-CNN model (b)

Figure 4. Performance of the Mask R-CNN (a) and YOLOv4 (b) models regarding the classification of soybean pods according to the number of grains they contain, using the metrics of precision, recall, and F1 score

Figure 5. Classification with the use of Mask R-CNN (a) and YOLOv4 (b) on the test subset, where both failed to classify the same pod

BANDEIRA, P. M. C. et al.