New AI colorizes video in real time. A new deep learning system lets editors colorize an entire video simply by changing the color of one frame in the scene. It is precise, efficient, and up to 50 times faster than previous methods. Videos contain a lot of redundant data between frames, so painstakingly colorizing every black-and-white frame takes a long time. These redundancies have been studied extensively in video encoding and compression, but they have received less attention in more sophisticated video processing tasks such as colorizing a clip.
To propagate data, a variety of algorithms (such as the bilateral CNN model, similarity-guided filtering, and optical flow-based warping) process local associations between consecutive frames. To model the similarities between frames and pixels, they may use apparent motion or pre-designed pixel-level properties.
These algorithms have several drawbacks, however: they cannot represent high-level relationships between frames, and they do not effectively reflect the picture’s structure. To address these issues, NVIDIA researchers developed a deep learning-based system that lets editors colorize a whole clip by colorizing a single frame in the scene.
To explicitly learn high-level similarity between consecutive frames, the researchers designed a temporal propagation network. It contains a propagation component that transfers the attributes (such as color) of one frame to another, using a linear transformation matrix controlled by a convolutional neural network (CNN).
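To make the idea of propagation as a linear transformation concrete, here is a minimal sketch in plain Python. It is not the researchers' implementation: in the real system a CNN produces the matrix, whereas the hypothetical `affinity_matrix` helper below stands in for it with a hand-built grayscale similarity. The function names, the `sigma` parameter, and the toy four-pixel frames are all assumptions made for illustration.

```python
import math


def affinity_matrix(gray_key, gray_target, sigma=0.1):
    """Stand-in for the CNN: build a row-normalized weight matrix.

    Row i describes how much target pixel i borrows from each key-frame pixel.
    (Assumption: similarity is taken from grayscale intensity alone.)
    """
    rows = []
    for t in gray_target:
        weights = [math.exp(-((t - k) ** 2) / (2 * sigma ** 2)) for k in gray_key]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def propagate(matrix, key_colors):
    """Apply the linear transformation: each target color is a weighted
    sum of the key-frame colors."""
    channels = len(key_colors[0])
    return [
        tuple(sum(w * color[c] for w, color in zip(row, key_colors))
              for c in range(channels))
        for row in matrix
    ]


# Toy frames flattened to 4 pixels, intensities in [0, 1].
gray_key = [0.1, 0.1, 0.9, 0.9]
# The editor colored the dark pixels red and the bright pixels blue.
key_colors = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
# The next frame shows the same content with the pixels rearranged.
gray_target = [0.9, 0.1, 0.1, 0.9]

G = affinity_matrix(gray_key, gray_target)
target_colors = propagate(G, key_colors)
```

In this toy case the bright target pixels come out blue and the dark ones red, because each row of `G` puts nearly all of its weight on key-frame pixels with matching intensity. A learned matrix would capture higher-level similarity than intensity alone.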
The CNN determines which colors from the colorized frame should be transmitted to the other black-and-white frames and fills them in. What sets this strategy apart from others is that better colorization can be achieved interactively: the editor annotates part of an image, and the system produces the finished result.
Researchers established two principles for learning propagation in the temporal domain. First, propagation between frames must be invertible. Second, the target element must be preserved throughout the procedure. They demonstrated that the proposed approach achieves results comparable to existing state-of-the-art methods without using any image-based segmentation methods.
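A small numeric sketch shows what the two principles can look like for a linear propagation step. It assumes the transformation matrix is orthogonal (here a 2x2 rotation), which is one construction that satisfies both properties. It also reads "preserved" as "the magnitude of the propagated values is unchanged". Both are illustrative assumptions, not a statement of the network's exact formulation, and the helper names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


# Assumed orthogonal propagation matrix: a rotation by an arbitrary angle.
theta = 0.7
G = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

key = [0.3, 0.8]                # property values on the key frame
target = matvec(G, key)         # propagate key -> target

# Principle 1 (invertibility): for an orthogonal matrix the transpose
# undoes the propagation, so target -> key recovers the original values.
recovered = matvec(transpose(G), target)

# Principle 2 (preservation, under the assumption above): the magnitude
# of the propagated values is the same before and after the step.
norm_before, norm_after = norm(key), norm(target)
```

Because the transpose of an orthogonal matrix is its inverse, propagating from the key frame to the target and back returns the starting values up to floating-point error, and the vector length is unchanged along the way.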
Researchers used NVIDIA Titan XP GPUs to train this network. It was trained on hundreds of clips from multiple datasets for high dynamic range, color, and mask propagation. Training relied on the ACT dataset, which contains around 600,000 frames in 7,260 video sequences.