Labeled temperate hardwood tree stomatal image datasets from seven taxa of Populus and 17 hardwood species – Scientific Data


Machine learning (ML) algorithms have shown potential in automatically detecting and measuring stomata. However, ML algorithms require substantial data to train and optimize models efficiently, and their potential is restricted by the limited availability and quality of stomatal images. To overcome this obstacle, we have compiled a collection of around 11,000 unique images of temperate broadleaf angiosperm tree leaf stomata from various projects conducted between 2015 and 2022. The dataset includes over 7,000 images of 17 commonly encountered hardwood species, such as oak, maple, ash, elm, and hickory, and over 3,000 images of 55 genotypes from seven Populus taxa. In each image, inner_guard_cell_walls and whole_stomata (stomatal aperture and guard cells) were labeled, and each image has a corresponding YOLO label file that can be converted into other annotation formats. With our dataset, users can (1) employ state-of-the-art machine learning models to identify, count, and quantify leaf stomata; (2) explore the diverse range of stomatal characteristics across different types of hardwood trees; and (3) develop new indices for measuring stomata.

Stomatal responses to environmental factors, such as humidity and soil moisture, are crucial for driving photosynthesis, productivity, water yield, ecohydrology, and climate forcing. To fully understand these responses, however, we must improve our understanding of their mechanistic basis. Unfortunately, current stomatal studies are limited by the laborious and time-consuming process of manually counting and measuring stomatal properties, which results in small datasets and limited image scales when observing stomata. Large stomatal image datasets are therefore needed to develop fast, high-throughput methods for studying stomata.

Artificial intelligence (AI) has high potential for developing annotated, high-throughput stomatal measuring methods, which could significantly enhance scientists’ ability to conduct large-scale and intensive stomatal studies. Recently, state-of-the-art machine learning algorithms, such as deep learning and specifically convolutional neural networks (CNNs), have been designed to solve complex image detection and segmentation problems, resulting in various applications tailored to specific objectives. One of the most efficient and straightforward CNN architectures is You Only Look Once (YOLO), proposed by Redmon et al. This architecture has been used for stomatal detection, counting, and measuring, and these studies have shown the potential of machine learning algorithms for automated stomatal detection and measurement. However, fine-tuning and improvement of machine learning-based stomatal study methods are currently limited by stomatal image datasets that are small, inconsistent, monotypic, and poorly accessible.

To avoid overfitting, many studies have enlarged their stomatal image datasets during machine learning training using augmentation techniques such as random translation, rotation, flipping, and zooming. While such image preprocessing techniques can increase the training sample size, model performance may still be limited by variability in stomatal characteristics. For example, methods trained on datasets of specific species may only be sensitive to those species and may not generalize to other species. Therefore, it is crucial to create a publicly accessible leaf stomatal image database for developing state-of-the-art, machine learning-based stomatal measuring methods that ecologists, plant biologists, and ecophysiologists can use.
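To illustrate how a flipping augmentation affects annotations as well as pixels, the following minimal sketch mirrors a bounding-box label horizontally. It assumes the widely used YOLO text format of `class x_center y_center width height`, with coordinates normalized to the image size; the function name is hypothetical, and the sketch does not reproduce the pipeline of any particular study.

```python
def flip_yolo_label_horizontally(line: str) -> str:
    """Mirror one YOLO bounding-box label left to right.

    Assumes the line holds 'class x_center y_center width height'
    with all coordinates normalized to the range [0, 1].
    """
    parts = line.split()
    class_id = parts[0]
    x_center, y_center, width, height = (float(v) for v in parts[1:5])

    # A horizontal flip only moves the box center along the x axis;
    # the y position, width, and height are unchanged.
    flipped_x_center = 1.0 - x_center

    return (
        f"{class_id} {flipped_x_center:.6f} {y_center:.6f} "
        f"{width:.6f} {height:.6f}"
    )


def flip_yolo_labels_horizontally(label_text: str) -> str:
    """Apply the horizontal flip to every non-empty line of a label file."""
    flipped = [
        flip_yolo_label_horizontally(line)
        for line in label_text.splitlines()
        if line.strip()
    ]
    return "\n".join(flipped)
```

The image itself must be mirrored with the same transform so that pixels and labels stay aligned. Translation, rotation, and zooming need analogous label updates.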

Our collection consists of around 11,000 unique images of hardwood leaf stomata collected from projects conducted between 2015 and 2022. Within the hardwood stomatal dataset, there are more than 7,000 images of 17 common hardwood species, such as oak, maple, ash, elm, and hickory. Additionally, the dataset contains over 3,000 images of 55 genotypes from seven Populus taxa (Tables 1, 2). We labeled inner_guard_cell_walls as “0” and whole_stomata (stomatal aperture and guard cells) as “1”, and created a YOLO label file for each image. These images and corresponding labels are freely accessible, making it easier to train machine learning models and analyze leaf stomatal traits. With the help of our dataset, users can (1) train cutting-edge machine learning models for high-throughput detection, counting, and measurement of leaf stomata of temperate hardwood trees; (2) investigate the diversity in stomatal characteristics across various types of hardwood trees; and (3) develop novel indices for measuring stomata.
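As a rough illustration of how the label files might be read, the sketch below parses YOLO label text, converts each entry to pixel coordinates, and counts entries per class. The class ids 0 and 1 follow the labeling described above. Everything else is an assumption: the labels are taken to be standard YOLO bounding boxes (`class x_center y_center width height`, normalized to the image size), and the function and type names are hypothetical.

```python
from dataclasses import dataclass

# Class ids as described in the text: 0 and 1.
CLASS_NAMES = {0: "inner_guard_cell_walls", 1: "whole_stomata"}


@dataclass
class PixelBox:
    class_name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, image_width: int, image_height: int) -> PixelBox:
    """Convert one normalized YOLO bounding-box line to pixel coordinates.

    Assumes the standard 'class x_center y_center width height' layout.
    """
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    return PixelBox(
        class_name=CLASS_NAMES[class_id],
        x_min=(x_center - width / 2) * image_width,
        y_min=(y_center - height / 2) * image_height,
        x_max=(x_center + width / 2) * image_width,
        y_max=(y_center + height / 2) * image_height,
    )


def read_labels(label_text: str, image_width: int, image_height: int):
    """Return all boxes in a label file plus a per-class count."""
    boxes = []
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for line in label_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        box = parse_yolo_line(line, image_width, image_height)
        boxes.append(box)
        counts[box.class_name] += 1
    return boxes, counts


# Usage with made-up label text for a hypothetical 1000 x 500 pixel image.
example_labels = "1 0.5 0.5 0.2 0.1\n0 0.5 0.5 0.1 0.04\n"
example_boxes, example_counts = read_labels(example_labels, 1000, 500)
```

Pixel-space boxes of this kind are a common intermediate step when converting to other annotation formats. The count of whole_stomata entries gives a per-image stomatal count.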
