Project Details
Description
Artificial intelligence and machine learning have in recent years been applied to the analysis of very large scale (VLS) images, such as those encountered in aerial or satellite imagery and digital histopathology, so that domain scientists can explore the data and form novel hypotheses. Current state-of-the-art deep learning techniques require vast amounts of detailed annotations (a.k.a. labels) as training data, and the amount needed can grow in proportion to the size of the input images. As a result, acquiring enough high-resolution training data is either impossible or very expensive. In this project, the research team will develop a methodology that uses weaker (or auxiliary) signals collected in much smaller, low-resolution images to efficiently constrain the spatial (or temporal) statistical distribution of the labels in the high-resolution image. The framework significantly reduces the human effort needed for the mundane task of annotating VLS images, which is crucial for several exciting applications that predict environmental trends and cancer treatment outcomes. The developed techniques are general, and their application will be demonstrated in two different domains involving very large images: satellite imagery and digital histopathology. In environmental applications, the ability to directly connect satellite imagery to policy-relevant metrics of interest (e.g., population trends, urbanization, and biodiversity loss) would radically improve our capacity to monitor the globe.
Similarly, being able to reliably extract high-resolution information from whole slide images of histopathology will be highly useful for cancer research focused on the development of novel diagnostic tests and numerous precision medicine applications (e.g., patient stratification, treatment selection, and prediction of disease progression, recurrence, treatment response, and disease-free survival through downstream correlations with clinical, radiologic, laboratory, molecular, pharmacologic, and outcomes data).
The technical aims of the project are: i) The research team addresses the problem of super-resolving dense annotations by matching label statistics across resolutions. The general methodology for differentiable loss functions maps auxiliary constraints to high-resolution labels. Each Label Super-Resolution loss is a differentiable distance metric between a distribution and a set of statistical values; ii) The research team generalizes the concept of super-resolution to topological information (through persistent homology) and uses multi-task learning to produce latent representations that can be the basis of various inference tasks; iii) In the developed framework, the research team models missing auxiliary data, heterogeneous auxiliary data, and dynamic image sets of the same area, and the resulting losses can be easily integrated into RNN/transformer architectures and adversarial learning paradigms; iv) The research team evaluates two modalities of incremental human engagement: 1) showing annotators the effects of their annotation choices to help them develop intuition for high-return areas, and 2) a reinforcement-learning-based active learning framework that imitates how domain experts select what kinds of data to label; and v) The research team develops and evaluates ideas through a number of well-grounded applications of Label Super-Resolution.
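The idea in aim i) of matching label statistics across resolutions can be illustrated with a minimal sketch. It is not the project's actual loss: it assumes the auxiliary signal is a per-tile class-frequency vector and uses a plain squared distance, and the names `block_label_frequencies`, `label_stat_loss`, and `block` are hypothetical. NumPy is used for readability; in an autodiff framework the same expression would be differentiable with respect to the per-pixel predictions.

```python
import numpy as np


def block_label_frequencies(probs, block):
    """Average per-pixel class probabilities over non-overlapping tiles.

    probs: array of shape (H, W, C); each pixel holds a distribution over C classes.
    block: tile side length (hypothetical stand-in for the resolution gap).
    Returns an array of shape (H // block, W // block, C).
    """
    h, w, c = probs.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by block")
    # Split rows and columns into (tile index, offset within tile) pairs.
    tiles = probs.reshape(h // block, block, w // block, block, c)
    return tiles.mean(axis=(1, 3))


def label_stat_loss(probs, target_freqs, block):
    """Mean squared distance between predicted and target class frequencies.

    target_freqs: array of shape (H // block, W // block, C), the assumed
    low-resolution statistics that constrain the high-resolution labels.
    """
    pred = block_label_frequencies(probs, block)
    return float(np.mean(np.sum((pred - target_freqs) ** 2, axis=-1)))


# Toy 4x4 prediction with 2 classes: left half is class 0, right half is class 1.
probs = np.zeros((4, 4, 2))
probs[:, :2, 0] = 1.0
probs[:, 2:, 1] = 1.0

# Assumed low-resolution targets: every 2x2 tile is an even mix of both classes.
targets = np.full((2, 2, 2), 0.5)
loss = label_stat_loss(probs, targets, block=2)
```

The loss is zero only when the predicted class frequencies in every tile agree with the low-resolution statistics, which is how coarse signals can constrain fine-grained labels without any pixel-level annotation.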
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
| Status | Active |
|---|---|
| Effective start/end date | 09/01/22 → 08/31/26 |
Funding
- National Science Foundation: $1,129,040.00