The microsatellite instability (MSI) status of a patient’s cancerous tissue can be used to determine the proper cancer treatment.
MSI status is an indicator of the extent of genetic mutations and can be used to predict whether immunotherapy (a new type of cancer treatment) will be effective.
Currently, the standard methods of determining MSI status are immunohistochemistry (IHC) analysis and polymerase chain reaction (PCR)-based assays. However, both methods require trained professionals to process the tissue samples and incur additional costs. As a result, many patients around the world are not given these tests in clinical practice and miss the opportunity to receive better treatment.
In order to make MSI profiling accessible to more patients, researchers have attempted to build prediction models using the standard hematoxylin and eosin (H&E)-stained histology images. Images of patients’ cancerous tissue samples, obtained using the inexpensive, common procedure of H&E staining, served as the input of machine learning models to predict the MSI status.
However, early deep learning models applied to H&E histology images were not very accurate in determining MSI status. A study using this method published in 2019 reported a 77% area under the receiver operating characteristic curve (AUC, a measure of ML model performance). In contrast, IHC is approximately 92% accurate when using PCR genotyping results as the ground truth.
Fast-forwarding to today, by using a more complex model and SambaNova’s Reconfigurable Dataflow Architecture™, an 89.4% AUC is achieved on the same data set. This boost in model performance is the combined result of using a more sophisticated model architecture and the fact that SambaNova’s Reconfigurable Dataflow Units™ (RDUs) can take large images as input that other chips cannot. As we support even larger image sizes in the future, we can expect the AUC percentage to further increase, which could potentially bring accurate, economic, and universal MSI status testing to all patients.
To demonstrate the effect of using larger images as input, we compare the AUC achieved by a model trained on 5760x5760 images with that achieved by two other models trained on 512x512 images (see next section for details). All three models use the same advanced model architecture (RescaleNet50) and run on the same hardware (RDUs). The results show that input image size has a significant impact on AUC.
Using high-resolution images as input, a feat achieved only by SambaNova, yields better results in machine learning applications.
Every hardware platform, be it GPU or RDU, has an input size limit. When working with images larger than that limit, there are two common techniques to overcome it.
The first technique is tiling: the parent image is split into an array of smaller images, each of which is used as an individual input. If the percentage of tiles that give a positive prediction exceeds a certain threshold, an overall positive prediction is made.
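The tile-and-vote scheme can be sketched in a few lines of Python. This is a minimal illustration, not SambaNova’s implementation: the names `tile_image` and `predict_image`, the toy brightness-based classifier, the 0.5 default threshold, and the choice to drop partial tiles at the image edges are all assumptions made for the example.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def tile_image(image: Image, tile_size: int) -> List[Image]:
    """Split an image into non-overlapping tile_size x tile_size tiles.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tiles.append([row[left:left + tile_size]
                          for row in image[top:top + tile_size]])
    return tiles


def predict_image(image: Image,
                  tile_size: int,
                  tile_classifier: Callable[[Image], bool],
                  threshold: float = 0.5) -> bool:
    """Return True if the fraction of positive tiles exceeds the threshold."""
    tiles = tile_image(image, tile_size)
    if not tiles:
        raise ValueError("image is smaller than one tile")
    positives = sum(1 for tile in tiles if tile_classifier(tile))
    return positives / len(tiles) > threshold


def toy_classifier(tile: Image) -> bool:
    """Hypothetical stand-in for a trained model: bright tiles are positive."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels) > 0.5


# A 4x4 toy image: only the top-left 2x2 tile is bright enough to be positive.
toy_image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

overall_low = predict_image(toy_image, 2, toy_classifier, threshold=0.2)
overall_high = predict_image(toy_image, 2, toy_classifier, threshold=0.5)
```

Here one tile in four is positive, so the overall prediction is positive with a threshold of 0.2 and negative with a threshold of 0.5. In a real pipeline the classifier would be the trained network, and the threshold would be tuned on validation data.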
When images of cancerous tissues were tiled into 5760x5760 patches, 89.4% AUC was achieved. Meanwhile, tiling them into 512x512 patches resulted in 83.5% AUC.
Generally, the smaller the tiles, the greater the information loss and, consequently, the lower the model accuracy. Consider the extreme case where each tile is only one pixel large. Individual pixels don’t tell us much about the image, do they? In contrast with other chips on the market, SambaNova’s RDUs support significantly larger tile sizes, bringing impressive improvements to model accuracy.
Alternatively, an image can be downsampled so that the size is below the limit.
However, just as a low-resolution image looks blurry to us, models trained on these downsampled images suffer from lower accuracy.
To demonstrate the effect of downsampling, we first break the entire image into tiles of 5760x5760 pixels and then downsample these tiles to 512x512 pixels.
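To give a feel for how much detail this step discards, the sketch below downsamples a square tile and computes the fraction of pixels that survive. The source does not say which resampling method was used; nearest-neighbor sampling is assumed here purely for simplicity, and `downsample_nearest` and `kept_fraction` are hypothetical helper names.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def downsample_nearest(image: Image, out_size: int) -> Image:
    """Downsample an image to out_size x out_size by nearest-neighbor sampling.

    Each output pixel copies the input pixel at the proportionally scaled
    coordinate, so non-integer scale factors (such as 5760 / 512 = 11.25)
    are handled by integer floor division.
    """
    in_height, in_width = len(image), len(image[0])
    rows = [(r * in_height) // out_size for r in range(out_size)]
    cols = [(c * in_width) // out_size for c in range(out_size)]
    return [[image[r][c] for c in cols] for r in rows]


def kept_fraction(in_size: int, out_size: int) -> float:
    """Fraction of the original pixel count that remains after downsampling."""
    return (out_size * out_size) / (in_size * in_size)


# Toy 4x4 tile reduced to 2x2: only every other row and column survives.
toy_tile = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
    [12.0, 13.0, 14.0, 15.0],
]
small_tile = downsample_nearest(toy_tile, 2)

# For the sizes used in this article: 5760x5760 reduced to 512x512.
fraction = kept_fraction(5760, 512)
```

For a 5760x5760 tile reduced to 512x512, `fraction` is about 0.0079, meaning fewer than 1% of the original pixels remain. Smoother resampling methods average neighboring pixels instead of dropping them, but the output still holds the same reduced number of values.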
Try running the simulation below. It shows a case in which using a large tile size at the original resolution produces the correct prediction, while using smaller tiles or downsampled large tiles produces an erroneous one.
Reference MSI Status Testing Result (obtained using PCR)
MSI Status: Microsatellite Stable (MSS)
Recommendation: Immunotherapy not recommended