Determining MSI Status for Effective Cancer Treatment


The microsatellite instability (MSI) status of a patient’s cancerous tissue can be used to determine the proper cancer treatment.

MSI status is an indicator of the extent of genetic mutations and can be used to predict whether immunotherapy (a new type of cancer treatment) will be effective.


Currently, the standard methods of determining MSI status are immunohistochemistry (IHC) analysis and polymerase chain reaction (PCR)-based assays. However, both methods incur additional costs and require trained professionals to process the tissue samples. As a result, many patients worldwide are not given these tests in clinical practice and miss the opportunity to receive better treatment.

[Figure: PCR & IHC]

In order to make MSI profiling accessible to more patients, researchers have attempted to build prediction models using the standard hematoxylin and eosin (H&E)-stained histology images. Images of patients’ cancerous tissue samples, obtained using the inexpensive, common procedure of H&E staining, served as the input of machine learning models to predict the MSI status.

[Figure: Histology Deep Learning]

However, early deep learning models based on H&E histology images were not very accurate at determining MSI status. A study using this method, published in 2019, reported an area under the receiver operating characteristic curve (AUC, a measure of ML model performance) of 77%. In contrast, IHC is approximately 92% accurate when PCR genotyping results are used as the ground truth.
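For readers unfamiliar with AUC, the following minimal sketch shows one common way to compute it: as the fraction of (positive, negative) sample pairs in which the model scores the positive sample higher. The labels and scores are made-up toy values for illustration only; they are not data from the study or from SambaNova's model.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = MSI, 0 = MSS; scores are invented model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every positive sample is ranked above every negative one, while 0.5 is what random guessing would produce.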

[Figure: AUC comparison, with PCR genotyping results as the ground truth]

Fast-forwarding to today, a more complex model running on SambaNova’s Reconfigurable Dataflow Architecture™ achieves an 89.4% AUC on the same data set. This boost in model performance is the combined result of a more sophisticated model architecture and the ability of SambaNova’s Reconfigurable Dataflow Units™ (RDUs) to take large images as input that other chips cannot. As we support even larger image sizes in the future, we can expect the AUC to increase further, which could potentially bring accurate, economical, and universal MSI status testing to all patients.

To demonstrate the effect of using larger images as input, we compare the AUC achieved by training a model using 5760x5760 images as the input versus that achieved by two other models trained using 512x512 images as the input (see next section for details). These models use the same advanced model architecture (RescaleNet50) and are run on the same hardware (RDUs). The result shows that input image size has a significant impact on AUC.

Why Input Image Size Matters in Machine Learning:
Using True Resolution Images

By using high resolution images as input in machine learning applications, a feat achieved only by SambaNova, better results can be attained.

Every hardware platform, be it GPU or RDU, has an input size limit. When working with images larger than this limit, there are two common techniques to work around it.

Technique 1:
Tiling

The parent image is tiled into an array of smaller images, which are used as individual input. If the percentage of tiles that gave a positive prediction exceeds a certain threshold, an overall positive prediction is made.
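The tiling and thresholding steps can be sketched as follows. This is a minimal illustration, not SambaNova's pipeline: the function names, the choice to drop partial edge tiles, the 0.4 threshold, and the per-tile predictions are all assumptions made for the example.

```python
import numpy as np


def tile_image(image, tile_size):
    """Split an H x W (or H x W x C) array into non-overlapping square tiles.

    Edge regions that do not fill a whole tile are dropped; this is a
    simplifying assumption of this sketch.
    """
    height, width = image.shape[:2]
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tiles.append(image[top:top + tile_size, left:left + tile_size])
    return tiles


def slide_prediction(tile_predictions, threshold):
    """Aggregate per-tile positive/negative calls into one overall call."""
    if not tile_predictions:
        raise ValueError("no tile predictions to aggregate")
    positive_fraction = sum(tile_predictions) / len(tile_predictions)
    return positive_fraction, positive_fraction > threshold


# A small placeholder image stands in for a real histology slide.
image = np.zeros((1024, 1536), dtype=np.uint8)
tiles = tile_image(image, tile_size=512)   # 2 rows x 3 columns of tiles

# Hypothetical per-tile outputs of some classifier (True = MSI predicted).
tile_predictions = [True, False, True, True, False, False]
fraction, is_positive = slide_prediction(tile_predictions, threshold=0.4)
print(len(tiles), fraction, is_positive)
```

The threshold is a tunable parameter; the source does not state the value used.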

[Figure: tiling comparison]

When images of cancerous tissues were tiled into 5760x5760 patches, 89.4% AUC was achieved. Meanwhile, tiling them into 512x512 patches resulted in 83.5% AUC.

Generally, the smaller the tiles, the more information is lost and, consequently, the lower the model accuracy. Consider the extreme case where each tile is a single pixel: individual pixels don’t tell you much about the image, do they? In contrast with other chips on the market, SambaNova’s RDUs support significantly larger tile sizes, bringing impressive improvements to model accuracy.

Technique 2:
Downsampling

Alternatively, an image can be downsampled so that the size is below the limit.

However, just as a low-resolution image looks blurry to us, models trained on these downsampled images suffer from lower accuracy.

[Figure: downsampling comparison]

To demonstrate the effect of downsampling, we first break the entire image into tiles of 5760x5760 pixels and then downsample these tiles to 512x512 pixels.
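To give a feel for how much detail this step discards, here is a minimal sketch of downsampling a square tile by nearest-neighbour sampling. The resampling method actually used is not stated in the source, so nearest-neighbour is an assumption chosen for simplicity (real pipelines often use area or anti-aliased interpolation), and the function name is hypothetical.

```python
import numpy as np


def downsample_nearest(tile, out_size):
    """Shrink a square tile to out_size x out_size by nearest-neighbour sampling."""
    in_size = tile.shape[0]
    if tile.shape[1] != in_size:
        raise ValueError("expected a square tile")
    # Map each output index to a source index. Integer arithmetic handles
    # non-integer ratios such as 5760 / 512 = 11.25.
    idx = (np.arange(out_size) * in_size) // out_size
    return tile[np.ix_(idx, idx)]


# Toy tile with the same 11.25:1 ratio as 5760 -> 512 (45 -> 4).
toy_tile = np.arange(45 * 45).reshape(45, 45)
small = downsample_nearest(toy_tile, 4)

# Fraction of the original pixels that survive a 5760 -> 512 downsample.
retained_fraction = (512 / 5760) ** 2
print(small.shape, f"{retained_fraction:.2%}")
```

Going from 5760x5760 to 512x512 keeps fewer than 1% of the original pixel positions, which is consistent with the accuracy loss described above.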

Compare the Different Methods

The simulation below demonstrates an example in which using large tiles at original resolution produces the correct prediction, while using smaller tiles or downsampled large tiles produces an erroneous prediction.

Reference MSI Status Testing Result (obtained using PCR)

MSI Status: Microsatellite Stable (MSS)

Recommendation: Immunotherapy not recommended

Downsampled large tiles
Each 5760x5760-pixel tile is downsampled to 512x512 pixels and then fed to the machine learning model. This technique yields the lowest accuracy of the three.

Small tiles
512x512-pixel tiles are fed to the machine learning model to acquire predictions. This takes more time and is less accurate than using larger tiles.

Large tiles at original resolution
5760x5760-pixel tiles are fed to the machine learning model to acquire predictions. Only supported by SambaNova.

With SambaNova’s true resolution vision ML technology, accurate universal MSI testing for cancer patients could be made possible.
What do you think you can achieve with this ground-breaking technology?