It was a trope all too familiar in the 1990s — law enforcement in movies and TV taking a pixellated, blurry image, and hitting the magic “enhance” button to reveal suspects to be brought to justice. Creating data where there simply was none before was a great way to ruin immersion for anyone with a modicum of technical expertise, and spoiled many movies and TV shows.
Of course, technology marches on and what was once an utter impossibility often becomes trivial in due time. These days, it’s expected that a sub-$100 computer can easily differentiate between a banana, a dog, and a human, something that was unfathomable at the dawn of the microcomputer era. This capability is rooted in the technology of neural networks, which can be trained to do all manner of tasks formerly considered difficult for computers.
With neural networks and plenty of processing power at hand, there has been a flood of projects aiming to “enhance” everything from low-resolution human faces to old film footage, increasing resolution and filling in for the data that simply isn’t there. But what’s really going on behind the scenes, and is this technology really capable of accurately enhancing anything?
An Educated Guess
We’ve featured neural networks doing such feats before, such as the DAIN algorithm that interpolates footage up to 60 FPS. Others, like [Denis Shiryaev], combine a variety of tools to colorize old footage, smooth out frame rates, and upscale resolutions to 4K. Neural networks can do all this and more, and fundamentally, the method is the same in each case.
For example, to create a neural network to upscale footage to 4K resolution, it must first be trained. The network learns from image pairs, each consisting of a low-resolution picture and the corresponding high-resolution original. During training, it adjusts its parameters so that the output it produces from the low-resolution data corresponds as closely as possible to the high-resolution original. Once appropriately trained on a large enough number of images, the neural network can then be used to apply similar transformations to other material. The process is similar for increasing frame rates, and even colorization, too. Show a network color content, and then show it the black and white version. With enough training, it can learn to apply likely colors to other black and white footage.
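To make the idea of learning from low-resolution and high-resolution pairs concrete, here is a deliberately tiny sketch. It is not how any of the tools above are implemented: it works on one-dimensional “images” (lists of numbers), it uses a small linear model rather than a real neural network, and every name in it (downscale, make_pair, train and so on) is made up for illustration. The principle is the same, though: nudge the parameters until the upscaled output matches the known original, then apply the result to material the model has never seen.

```python
import math
import random

def downscale(signal):
    """Halve the resolution by averaging each pair of neighbouring samples."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def make_pair(rng, length=32):
    """Build one (low_res, high_res) training pair from a random smooth wave."""
    freq = rng.uniform(0.1, 0.4)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    high = [math.sin(freq * i + phase) for i in range(length)]
    return downscale(high), high

def window(low, i):
    """The low-res sample at i plus its two neighbours (clamped at the edges)."""
    last = len(low) - 1
    return [low[max(i - 1, 0)], low[i], low[min(i + 1, last)]]

def upscale(low, weights):
    """Predict two high-res samples for every low-res sample."""
    out = []
    for i in range(len(low)):
        w_in = window(low, i)
        for row in weights:  # one row of weights per output sample
            out.append(sum(w * x for w, x in zip(row, w_in)))
    return out

def mean_squared_error(pairs, weights):
    """Average squared difference between upscaled output and the originals."""
    total, count = 0.0, 0
    for low, high in pairs:
        for p, t in zip(upscale(low, weights), high):
            total += (p - t) ** 2
            count += 1
    return total / count

def train(pairs, epochs=50, lr=0.05):
    """Fit the weights by stochastic gradient descent on squared error."""
    # Start from plain sample doubling: each output copies the centre input.
    weights = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    for _ in range(epochs):
        for low, high in pairs:
            for i in range(len(low)):
                w_in = window(low, i)
                for k, row in enumerate(weights):
                    error = sum(w * x for w, x in zip(row, w_in)) - high[2 * i + k]
                    for j in range(3):
                        row[j] -= lr * error * w_in[j]
    return weights

rng = random.Random(0)
training_pairs = [make_pair(rng) for _ in range(40)]
unseen_pairs = [make_pair(rng) for _ in range(10)]

baseline = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # naive sample doubling
learned = train(training_pairs)

baseline_error = mean_squared_error(unseen_pairs, baseline)
learned_error = mean_squared_error(unseen_pairs, learned)
print(f"sample-doubling error:  {baseline_error:.5f}")
print(f"learned upscaler error: {learned_error:.5f}")
```

The learned weights only encode what was typical of the training waves. Feed the model something unlike its training data and it will still produce a confident answer, just not necessarily a correct one.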
The important thing to note about this technology is that it’s merely using a wide base of experience to produce what it thinks is appropriate. It’s not dissimilar from a human watching a movie, and guessing at the ending after having seen many similar tropes in other films before. There’s a high likelihood the guess will be in the ballpark, but no guarantee it’s 100% correct.
This is a common thread in using AI for upscaling, as explained by the team behind the PULSE facial imaging tool. The PULSE algorithm synthesizes an image based on a very low-resolution input of a human face. The algorithm takes its best guess at what the original face might have looked like, based on the data from its training set, checking its work by re-downscaling to see if the result matches the original low-resolution input. There’s no guarantee the face generated has any real resemblance to the real one, of course. The high-resolution output is merely a computer’s idea of a realistic human face that could have been the source of the low-resolution image.
The technique has even been applied to video game textures, but results can be mixed. A neural network doesn’t always get the guess right, and often, a human in the loop is required to refine the output for best results. Sometimes the results are amusing, however.
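The “check your work by re-downscaling” step that PULSE relies on is easy to illustrate, and doing so shows why passing that check proves very little. The sketch below is not the PULSE implementation, which searches the output space of a generative face model. Here a short hand-written list of candidate “images” stands in for that search, and all names and pixel values are invented for the example.

```python
def downscale(pixels, factor=2):
    """Average each group of `factor` neighbouring pixels into one."""
    return [sum(pixels[i:i + factor]) / factor
            for i in range(0, len(pixels), factor)]

def mismatch(candidate, low_res, factor=2):
    """How far a candidate's downscaled version is from the low-res input."""
    small = downscale(candidate, factor)
    return sum((a - b) ** 2 for a, b in zip(small, low_res))

def best_guess(low_res, candidates, factor=2):
    """Pick the candidate that best reproduces the input when downscaled."""
    return min(candidates, key=lambda c: mismatch(c, low_res, factor))

# The real scene, which the algorithm never gets to see.
true_image = [10, 30, 50, 70, 90, 110, 40, 20]
low_res = downscale(true_image)

# Hypothetical stand-ins for "realistic-looking" outputs of a generator.
candidates = [
    [20, 20, 60, 60, 100, 100, 30, 30],
    [0, 40, 60, 60, 120, 80, 30, 30],
    [15, 25, 55, 65, 95, 105, 35, 25],
    [50, 50, 50, 50, 50, 50, 50, 50],
]

guess = best_guess(low_res, candidates)
consistent = [c for c in candidates if mismatch(c, low_res) == 0]

print("low-res input:        ", low_res)
print("chosen high-res guess:", guess)
print("candidates that match the input exactly:", len(consistent))
print("guess equals the real image:", guess == true_image)
```

Three of the four candidates downscale to exactly the same low-resolution input, and none of them is the real image. The consistency check rules out guesses that are obviously wrong; it cannot single out the one that is right.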
It remains a universal truth that when working with low-resolution imagery, or black and white footage, it’s not possible to accurately fill in data that isn’t there. It just so happens that with the help of neural networks, we can make excellent guesses that may seem real to a casual observer. The limitations of this technology come up more often than you might think, too. Colorization, for example, can be very effective on things like city streets and trees, but performs very poorly on others, such as clothing. Leaves are usually some shade of green, while roads are generally grey. A hat, however, could be any color; while a rough idea of shade can be gleaned from a black and white image, the exact hue is lost forever. In these cases, neural networks can only take a stab in the dark.
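The hat problem can be shown with a few lines of arithmetic. The sketch below converts colors to grey using the common Rec. 601 luma weights (0.299, 0.587 and 0.114 for red, green and blue). Real black and white film responds to color differently, so treat the weights as an assumption, and the hat colors as made-up examples chosen to collide.

```python
def to_gray(rgb):
    """Convert an (R, G, B) color to one 8-bit grey level (Rec. 601 weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# Four very different hats, chosen for illustration.
hats = {
    "red": (200, 0, 0),
    "green": (0, 102, 0),
    "purple": (150, 0, 133),
    "grey": (60, 60, 60),
}

for name, rgb in hats.items():
    print(f"{name:>6} hat {rgb} -> grey level {to_gray(rgb)}")

# Every hat collapses to the same grey level, so the conversion
# cannot be reversed without guessing.
grey_levels = {to_gray(rgb) for rgb in hats.values()}
print("distinct grey levels:", len(grey_levels))
```

All four hats end up as the same shade of grey. Going the other way, a colorization network has nothing in the pixel itself to tell them apart and must fall back on whatever was most common in its training data.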
For these reasons, it’s important not to treat footage “enhanced” in this way as a faithful historical record. Nothing generated by such an algorithm can be definitively trusted to have a basis in truth. Take a colorized film of a political event as an example. The algorithm could change subtle details such as the color of a lapel pin or banner, creating the suggestion of an allegiance with no basis in fact. Upscaling algorithms could create faces with an uncanny resemblance to historical figures that may never have been present at all. Thus, archivists and those who work on restoring old footage eschew such tools as anathema to their cause of maintaining an accurate recording of history.
Perceived quality is also an issue. A 4K upscaled film from Paris in 1890 simply pales in comparison to footage shot with a genuine 1080p camera in New York in 1993. Even a powerful neural network’s best guess struggles to measure up against high quality raw data. Of course, one must account for over 100 years’ worth of improvement in camera technology as well, but regardless, neural networks won’t be replacing quality camera gear anytime soon. There’s simply no substitute for capturing good data in high quality.
Applications do exist for “enhancement” algorithms; one can imagine the interest from Hollywood in upscaling old footage for use in period works. However, the use of such techniques for purposes such as historical analysis or law enforcement is simply out of the question. Computer-fabricated data bears no guaranteed connection to reality, and thus can’t be used in such fields to seek the truth. This won’t necessarily stop anyone trying, however. Thus, having a strong understanding of the underlying concepts of how such tools work is key for anyone looking to see through the smoke and mirrors.