New Capabilities Lead to a Model Quality Breakthrough
A Partnership with Argonne National Laboratory
Using the capabilities of SambaNova’s DataScale system, researchers at the U.S. Department of Energy’s Argonne National Laboratory and SambaNova have together advanced state-of-the-art accuracy on an important neutrino physics image segmentation problem (see image below). The SambaNova DataScale system was recently deployed as part of the Argonne Leadership Computing Facility’s (ALCF) AI-Testbed – an infrastructure of next-generation AI accelerators for evaluating the usability and performance of machine-learning-based high-performance computing applications. Argonne researchers had previously trained their model on graphics processing unit (GPU) based platforms, but were fundamentally restricted by the image size those platforms could train on. In contrast, the reconfigurable dataflow architecture of SambaNova’s DataScale system seamlessly enables training on massive images. In partnership with Argonne, we are using this capability to advance model quality on many important and challenging image processing problems.
In this blog post, we detail how the SambaNova DataScale system enabled Argonne to improve model quality for the task of tagging cosmic pixels. While this post is a case study on a specific neutrino physics image processing problem, the techniques generalize to any convolutional neural network (CNN) on a SambaNova DataScale system. As high-resolution cameras and datasets become increasingly common, removing these legacy barriers to high-resolution image processing becomes crucial.
Beyond State of the Art – Cosmic Tagger
“Cosmic Background Removal with Deep Neural Networks in SBND” introduces a modified UResNet architecture optimized for removing cosmic backgrounds from liquid argon time projection chamber (LArTPC) images. The task is classic image segmentation: classify each input pixel into one of three classes – Cosmic, Muon, or Background. The original input images are 1280 pixels tall and 2048 pixels wide, with 3 channels. Because the images are so large, processing even a single batch exhausts memory on a GPU (V100).
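A back-of-envelope calculation shows why full-resolution training exhausts GPU memory. The 64-channel width below is an illustrative assumption, not the exact layer configuration from the paper:

```python
# Rough activation-memory estimate for one full-resolution LArTPC image.
# The 64-channel first convolution is an assumption for illustration,
# not the paper's exact UResNet configuration.
H, W = 1280, 2048          # original image height and width
bytes_per_float32 = 4

input_bytes = 3 * H * W * bytes_per_float32        # the input tensor itself
first_conv_bytes = 64 * H * W * bytes_per_float32  # one 64-channel feature map

print(f"input:      {input_bytes / 2**20:.0f} MiB")   # 30 MiB
print(f"first conv: {first_conv_bytes / 2**20:.0f} MiB")  # 640 MiB
# A deep U-Net keeps many such feature maps alive for the backward pass,
# so activation memory rapidly dwarfs a 16 GB V100 at full resolution.
```

The input itself is modest; it is the stack of intermediate feature maps retained for backpropagation that dominates memory.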
To work around this GPU memory limit, the authors had previously downsampled their input images to 50% resolution and trained the model on 3x640x1024 inputs. However, this discards information that is crucial to this problem and to other sensitive domains such as medical imaging and astronomy (see accuracy drop in figure).
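The information loss is easy to demonstrate with a toy sketch: a one-pixel-wide feature (such as a thin particle track) can vanish entirely under 2x downsampling by striding. The actual pipeline likely used interpolation rather than naive striding, but averaging similarly dilutes thin features:

```python
# Toy illustration: a 1-pixel-wide "track" on row 1 of a tiny image
# disappears under 2x nearest-neighbor downsampling (keeping every
# other row/column), because only even-indexed rows survive.
image = [[0, 0, 0, 0],
         [1, 1, 1, 1],   # thin track occupies row 1 only
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

downsampled = [row[::2] for row in image[::2]]  # 50% resolution
print(downsampled)  # [[0, 0], [0, 0]] -- the track is gone
```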
In contrast, the reconfigurable dataflow architecture of SambaNova’s DataScale system does not have these limitations. The Argonne and SambaNova team is able to seamlessly train CNNs on images beyond 50k x 50k resolution. We use the same model, configuration, and hyperparameters, except that images keep their original size, with no downsampling. To compare models, we use Mean Intersection over Union (MIoU) over only the non-background pixels as the evaluation metric. As the results below show, training on full-size images outperforms the existing state-of-the-art model by close to 6% MIoU.
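For readers unfamiliar with the metric, here is a minimal sketch of non-background MIoU. Treating class 0 as background is an assumed label convention for illustration:

```python
def non_background_miou(pred, target, num_classes, background=0):
    """Mean IoU over all classes except the background class.

    pred, target: flat sequences of integer class labels, one per pixel.
    Classes absent from both prediction and target are skipped.
    """
    ious = []
    for c in range(num_classes):
        if c == background:
            continue
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Tiny example with classes 0=Background, 1=Cosmic, 2=Muon:
pred   = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
print(non_background_miou(pred, target, num_classes=3))  # 0.5833...
```

Excluding background matters here because background pixels dominate the image, so including them would inflate the score and mask differences on the physically interesting classes.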
Even though the model on DataScale’s Reconfigurable Dataflow Unit (RDU) is trained at lower precision (bfloat16) than the GPU’s FP32, we are able to ensure stable convergence and achieve better results. Certain loss functions, such as focal loss, perform worse with a smaller batch size per replica. While GPUs (A100) can fit only one full-size image per replica, RDUs allow training with up to 32 samples per replica, further improving accuracy.
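Focal loss down-weights well-classified pixels so that training concentrates on hard ones, which makes its gradients noisier when a replica sees only a few samples. A sketch of the standard formulation follows; the gamma and alpha values are the common defaults from the original focal-loss paper, not necessarily Argonne's settings:

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=0.25):
    """Focal loss for a single pixel, given the predicted probability
    p_true of the correct class:

        FL = -alpha * (1 - p_true)**gamma * log(p_true)

    gamma=2.0 and alpha=0.25 are common defaults, assumed here
    for illustration only.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# A well-classified pixel contributes far less loss than a hard one,
# so a small per-replica batch may contain almost no effective signal:
easy = focal_loss(0.95)   # confidently correct prediction
hard = focal_loss(0.10)   # badly wrong prediction
print(easy, hard)
```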
With advances in sensor technology, we now have access to datasets whose images contain billions of pixels. This introduces new challenges in using deep learning and computer vision to process such abundant information. With minimal changes to the original code, SambaNova’s DataScale system provides a way to train deep CNN models on gigapixel images efficiently. Other computer vision tasks, such as classification and image super-resolution, would benefit greatly from the ability to train models without losing any information. This work is only a sneak peek at what is possible with high-resolution image training.
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.