

We call this model Turing Image Super-Resolution (T-ISR), another addition to the Microsoft Turing family of image and language models powering various AI experiences across the company. At Microsoft we have been steadily rolling out this capability in a variety of our products and have seen a strong positive response from our users:

In Bing Maps, we are using the model to improve the quality of our aerial imagery for our users around the world! The model is robust to all types of terrain, and in our side-by-side tests with users, the super-resolved aerial images are preferred over the original imagery 98% of the time. We have rolled out the model to most of the world's land area, benefiting the vast majority of our users across the globe. In addition, we are able to bring this experience not only to our Bing Maps users, but also to our customers on Azure who leverage Azure Maps satellite imagery for their own products and services.

In Microsoft Edge, we are starting to roll out the model to allow users to enhance the images they see on the web, with the goal of turning Microsoft Edge into the best browser for viewing images on the web. We have seen early promising feedback from our users and are continuing to improve the experience as it scales to serve all images on the internet!

In our exploration of the super-resolution problem space, there were four major findings that played a key role in the development of our state-of-the-art (SOTA) model:

Human eyes as the north star: The metrics widely used in industry and academic models, PSNR and SSIM, were not always aligned with the perception and preference of the human eye and also required a ground truth image to be computed (a minimal sketch of both metrics follows below). In response, we built a side-by-side evaluation tool that measured the preferences of human judges, and we used this tool as the north star metric to build and improve our model.
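To make the contrast concrete, here is a minimal sketch of how those two reference-based metrics are typically computed, assuming scikit-image and NumPy are available; the reference_metrics helper is a hypothetical name, and this is not the internal evaluation tooling described above.

```python
# Minimal sketch: reference-based quality metrics (PSNR and SSIM).
# Assumes scikit-image >= 0.19 and NumPy; illustrative only.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reference_metrics(ground_truth: np.ndarray, restored: np.ndarray) -> dict:
    """Score a restored image against its ground truth (both HxWx3 uint8 arrays)."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored, channel_axis=-1, data_range=255)
    return {"psnr_db": psnr, "ssim": ssim}
```

Both functions take the ground truth image as an argument, which is exactly the limitation noted above: for real low-quality images in the wild there is no ground truth to compare against, whereas a human side-by-side preference test needs none.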

Noise modeling: In the first version of our training recipe, we created low-resolution and high-resolution pairs for training by simply taking a large number of high-quality images and downscaling them. We then trained the model with the downscaled image as input and an objective of recovering the ground truth as closely as possible. While this approach produced good results in many cases, we found that it was not robust enough to handle many of the "true" low-resolution images we were testing with, like ones coming from the web or old cameras. We found that by randomly applying distortions to our training input images, such as blurring, compression, and Gaussian noise, our model learned how to recover details for a much wider set of low-quality images.
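As a rough illustration of this kind of noise modeling, the sketch below turns a high-quality image into a plausible low-quality training input by randomly applying blur, downscaling, JPEG compression, and Gaussian noise. It assumes Pillow and NumPy, and the specific distortions, probabilities, and parameter ranges are illustrative assumptions rather than the exact training recipe.

```python
# Sketch of a random degradation pipeline for building (low-quality, high-quality)
# training pairs; the distortion choices and ranges are illustrative assumptions.
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(hr: Image.Image, scale: int = 2) -> Image.Image:
    """Produce a degraded, downscaled copy of a high-quality RGB image."""
    img = hr.convert("RGB")
    if random.random() < 0.7:  # random blur
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))
    img = img.resize((img.width // scale, img.height // scale), Image.BICUBIC)  # downscale
    if random.random() < 0.7:  # JPEG compression artifacts
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 80))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    arr = np.asarray(img).astype(np.float32)
    if random.random() < 0.7:  # additive Gaussian noise
        arr += np.random.normal(0.0, random.uniform(2.0, 10.0), arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

# The pair (degrade(hr), hr) then serves as (model input, ground truth) during training.
```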

Given the Microsoft Turing team's expertise and success applying transformers in large language models, and our recent use of transformers in our multi-modal Turing Bletchley model, we experimented with transformers and found they had some compelling advantages (along with some disadvantages). When testing with very noisy images, like highly compressed photos or aerial photos taken from long-range satellites, we found that transformer-based models did a very good job at "cleaning" up the noise in a way tailored to what was in the image. For example, noise around a person's face was handled differently than the noise on a highly textured photo of a forest. We believe this is because of the large datasets we trained on and the superior long-range memory capabilities of transformer architectures. Ultimately, we incorporated the positive aspects of both architectures by breaking the problem space into two stages: 1) Enhance and 2) Zoom.

DeepEnhance – Cleaning and Enhancing Images

So as a first step, we use a sparse transformer, which we scaled up to support extremely large sequence lengths (to adequately process the large context of an image), to "enhance" the image. The result is a cleaner, crisper, and much more attractive output image whose size is the same as the input.
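To give a sense of why the Enhance stage needs a transformer scaled to extremely large sequence lengths, the back-of-the-envelope sketch below counts the patch tokens produced when an image is flattened into a sequence; the 8x8 patch size is an illustrative assumption, not the actual T-ISR configuration.

```python
# Why long sequences: token counts for images split into non-overlapping patches.
# The 8x8 patch size is an assumption for illustration only.
def patch_sequence_length(height: int, width: int, patch: int = 8) -> int:
    """Number of patch tokens when an image is flattened into a sequence."""
    return (height // patch) * (width // patch)

for h, w in [(256, 256), (1024, 1024), (4096, 4096)]:
    n = patch_sequence_length(h, w)
    # Dense self-attention compares every token with every other token (O(n^2) pairs),
    # which is why a sparse attention pattern is needed at these lengths.
    print(f"{h}x{w} image -> {n:,} patch tokens ({n * n:,} attention pairs if dense)")
```

Because the Enhance stage only cleans the image, its output keeps the input's spatial dimensions; upscaling is left to the separate Zoom stage.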
