October 28, 2024 by JooHyeon Heo, UNIST
Professor Jaejun Yoo and his research team from the Graduate School of Artificial Intelligence at UNIST recently presented their pioneering work on the future of artificial intelligence (AI) technology at the European Conference on Computer Vision (ECCV 2024).
ECCV serves as a gathering place for researchers from around the world to share their research results, exchange information, and discuss the future of computer vision industries and technologies. At this forum, the team showcased three significant research papers that highlight innovative achievements in enhancing AI performance, reducing model sizes, and automating design processes using multimodal AI techniques.
One of the major accomplishments involves the compression of generative adversarial networks (GANs) for image generation by an astounding factor of 323, all while maintaining performance quality. By employing knowledge distillation techniques, the researchers demonstrated the potential for efficient AI utilization even on edge devices or low-power computers, eliminating the need for high-performance computing resources.
Professor Yoo remarked, “Our research has proven that a GAN made 323 times smaller can still generate high-quality images comparable to existing models. This breakthrough paves the way for deploying high-performance AI in edge computing environments and on low-power devices.”
Yeo Sang-yeop, first author of the study “Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation,” posted to the arXiv preprint server, added, “We aim to significantly broaden the scope of AI applications by enabling the implementation of high-performance AI capabilities with limited resources.”
The team introduced two innovative techniques, Distribution Matching for Efficient compression (DiME) and Network Interactive Compression via Knowledge Exchange and Learning (NICKEL), designed to enhance model stability by comparing output distributions rather than evaluating images individually.
The NICKEL approach optimizes the interaction between the generator and the discriminator, enabling the lightweight model to maintain high performance. Combined, these techniques allowed the compressed GAN to continue producing high-quality images similar to those generated by its larger counterpart.
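The toy sketch below illustrates the general idea behind distribution-level distillation: the statistics of a batch of student outputs are matched to a batch of teacher outputs instead of comparing images one by one. It is not the authors' DiME/NICKEL implementation; the tiny generators and the MMD loss here are stand-ins chosen for brevity.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """Maximum Mean Discrepancy (RBF kernel) between two feature batches.

    Comparing whole batches -- i.e., distributions -- rather than matching
    images one-to-one is the core idea of distribution-level distillation.
    """
    def kernel(a, b):
        d = torch.cdist(a, b) ** 2              # pairwise squared distances
        return torch.exp(-d / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Toy stand-ins for a large teacher GAN generator and a compressed student.
teacher = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 128))
student = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(), torch.nn.Linear(32, 128))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
z = torch.randn(256, 64)                         # shared latent batch
with torch.no_grad():
    teacher_out = teacher(z)                     # teacher's output distribution
student_out = student(z)

loss = rbf_mmd(teacher_out, student_out)         # match distributions, not single samples
loss.backward()
optimizer.step()
```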
In another significant advancement, Professor Yoo and his team developed a hybrid video diffusion model, HVDM, capable of efficiently producing high-resolution videos even in environments with limited computational resources. By integrating a 2D triplane representation with a 3D wavelet transform, HVDM adeptly processes both the global context and the fine details of a video. This paper is also posted to the arXiv preprint server.
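As a rough illustration of the two representations named above (not the HVDM architecture itself, which relies on learned encoders), the sketch below computes a one-level 3D Haar wavelet decomposition of a toy video tensor alongside three axis-averaged "triplane" projections: the coarse wavelet band and the planes capture global structure, while the remaining sub-bands carry fine detail.

```python
import torch

def haar_1d(x, dim):
    """One-level Haar split along `dim`: returns (low, high) half-resolution bands."""
    even = x.index_select(dim, torch.arange(0, x.size(dim), 2))
    odd  = x.index_select(dim, torch.arange(1, x.size(dim), 2))
    return (even + odd) / 2.0, (even - odd) / 2.0

def haar_3d(video):
    """One-level 3D Haar wavelet transform over (T, H, W) of a (C, T, H, W) tensor.

    Returns 8 sub-bands; the first (LLL) is the coarse volume, the rest hold detail.
    """
    bands = [video]
    for dim in (1, 2, 3):                       # time, height, width
        bands = [b for band in bands for b in haar_1d(band, dim)]
    return bands

def triplane(video):
    """Project a (C, T, H, W) video onto three 2D planes by averaging one axis each."""
    return video.mean(2), video.mean(3), video.mean(1)   # (C,T,W), (C,T,H), (C,H,W)

video = torch.randn(3, 16, 64, 64)              # toy clip: C=3, T=16, H=W=64
subbands = haar_3d(video)                       # multi-scale wavelet representation
planes = triplane(video)                        # global-context triplane projections
print(subbands[0].shape, [p.shape for p in planes])
```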
While existing video generation models have relied heavily on high-performance computing resources, HVDM produces natural, high-quality videos, overcoming the limitations of conventional CNN-based autoencoders.
The researchers validated HVDM’s superiority through rigorous testing on benchmark video datasets, including UCF-101, SkyTimelapse, and TaiChi, where it consistently produced higher-quality videos with more realistic detail.
Professor Yoo emphasized, “HVDM represents a transformative model that can efficiently generate high-resolution videos, even in resource-constrained environments, with applications extending widely across industries such as video production and simulation.”
In a third paper, also posted to arXiv, the research team introduced a multimodal layout generation model designed to automate the production of advertising banners and web UI layouts with minimal data input. This model processes images and text simultaneously, generating appropriate layouts based solely on user input.
Previous models have struggled to adequately integrate text and visual information due to limited data resources. The new model addresses this limitation, significantly enhancing the practicality of advertising design and web UI creation. By maximizing the interaction between text and images, it automatically produces optimized designs that seamlessly reflect both visual and textual elements.
To enable this functionality, the team transformed layout information into HTML code. Leveraging extensive pre-training data from language models, they established an automated generation pipeline that yields exceptional results, even with sparse datasets. Benchmark evaluations revealed performance improvements of up to 2,800% compared to existing methodologies.
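A minimal illustration of the layout-as-HTML idea is sketched below: bounding boxes are serialized as absolutely positioned div elements, so a pretrained language model can read and emit layouts in a markup format it has already seen during pre-training. The element names and percentage coordinates are hypothetical.

```python
# Hypothetical layout elements: (type, x, y, width, height) in percent of the canvas.
elements = [
    ("logo",    5,  5, 20, 10),
    ("title",  10, 25, 80, 15),
    ("button", 35, 70, 30, 12),
]

def layout_to_html(elements):
    """Serialize a layout as absolutely positioned HTML divs.

    Rendering the layout as markup lets a language model reuse the HTML it has
    already seen during pre-training instead of learning a new coordinate format.
    """
    rows = [
        f'  <div class="{t}" style="left:{x}%;top:{y}%;width:{w}%;height:{h}%"></div>'
        for t, x, y, w, h in elements
    ]
    return '<div class="canvas">\n' + "\n".join(rows) + "\n</div>"

print(layout_to_html(elements))
```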
In the pre-training process, the team used an image-caption dataset and combined depth-map extraction with ControlNet techniques for data augmentation. This approach significantly improved the quality of layout generation and produced more natural designs by reducing distortions that can arise during data preprocessing.
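A depth-conditioned augmentation step of this kind might look like the sketch below, written against the Hugging Face diffusers library; the model checkpoints, prompt, and file names are assumptions rather than the team's actual pipeline. Conditioning on a depth map preserves the spatial structure of the original image, so existing layout annotations stay valid for the augmented sample.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Assumed checkpoints; any depth-conditioned ControlNet would serve the same role.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")                                     # requires a GPU

# A depth map extracted from an existing banner image (hypothetical file name).
depth_map = Image.open("banner_depth.png").convert("RGB")

# Generate a new background that keeps the original spatial structure intact.
augmented = pipe(
    prompt="clean product advertisement background, studio lighting",
    image=depth_map,
    num_inference_steps=30,
).images[0]
augmented.save("banner_augmented.png")
```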
“Our model outperforms existing solutions that require over 60,000 data points, showing effective results with as few as 5,000 samples,” noted Professor Yoo. “This innovation is accessible not only to experts but also to everyday users, signaling significant advancements in the automation of advertising banners and web UI design.”
More information: Sangyeop Yeo et al, Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation, arXiv (2024). DOI: 10.48550/arxiv.2405.11614
Kihong Kim et al, Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation, arXiv (2024). DOI: 10.48550/arxiv.2402.13729
Jaejung Seol et al, PosterLlama: Bridging Design Ability of Language Model to Contents-Aware Layout Generation, arXiv (2024). DOI: 10.48550/arxiv.2404.00995
Journal information: arXiv