Pruna AI, a European startup specializing in AI model compression, has announced that it is making its AI model optimization framework open source. The move aims to help developers enhance AI efficiency through methods like caching, pruning, quantization, and distillation.
Pruna AI’s framework simplifies the compression process by standardizing the saving, loading, and evaluation of optimized models. According to John Rachwan, co-founder and CTO of Pruna AI, the framework ensures that developers can gauge both quality retention and performance improvements after applying compression.
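The value of such standardization is that every technique exposes the same compress-then-evaluate contract, so quality retention and speedup can be measured the same way regardless of which method was applied. A toy sketch of that idea, with simple magnitude pruning standing in for the compression step (all function names here are invented for illustration and are not Pruna AI's actual API):

```python
# Toy sketch of a standardized compress/evaluate workflow.
# "compress" and "evaluate" are illustrative stand-ins, not Pruna AI's API.

def compress(weights, keep_ratio=0.5):
    # Toy magnitude pruning: zero out the smallest-magnitude weights.
    cutoff = sorted(abs(w) for w in weights)[int(len(weights) * keep_ratio)]
    return [w if abs(w) >= cutoff else 0.0 for w in weights]

def evaluate(original, compressed):
    # A standardized report: how much quality drifted, how much was saved.
    mean_error = sum(abs(a - b) for a, b in zip(original, compressed)) / len(original)
    sparsity = sum(1 for w in compressed if w == 0.0) / len(compressed)
    return {"mean_error": mean_error, "sparsity": sparsity}

weights = [0.9, -0.05, 0.6, 0.01, -0.7, 0.02]
pruned = compress(weights, keep_ratio=0.5)
report = evaluate(weights, pruned)
```

Because the report format is the same for pruning, quantization, or any other method, a developer can compare techniques directly instead of rebuilding an evaluation harness for each one.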

A Hugging Face for AI Efficiency?
Pruna AI’s approach mirrors what Hugging Face did for transformer and diffusion models: providing an accessible, standardized way to apply and evaluate efficiency techniques. Leading AI labs, including OpenAI, have long relied on distillation to build faster versions of their flagship models; GPT-4 Turbo is widely believed to be one such example.
Distillation involves training a smaller, more efficient AI model (the “student”) using knowledge extracted from a larger model (the “teacher”). This technique helps retain essential capabilities while significantly reducing computational costs.
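At its core, distillation trains the student to match the teacher's softened output distribution, typically by minimizing a KL-divergence loss at an elevated temperature. A minimal numeric sketch of that loss (illustrative only; real training combines this with a standard label loss and backpropagation):

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature softens the distribution, exposing the teacher's
    # knowledge about how similar the non-top classes are.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between softened teacher and student distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.2]   # raw outputs of the large "teacher" model
student = [3.5, 1.2, 0.1]   # raw outputs of the smaller "student" model
loss = distillation_loss(teacher, student)
```

The loss is zero only when the student reproduces the teacher's distribution exactly, which is what lets the smaller model retain most of the larger one's behavior at a fraction of the compute cost.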
Addressing a Major Gap in AI Model Optimization
While large companies develop in-house compression methods, open-source tools have traditionally focused on isolated techniques, such as single quantization or caching methods. Pruna AI aims to bridge this gap by offering an all-in-one optimization platform that integrates multiple compression techniques seamlessly.
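To make concrete what one such isolated technique looks like, here is a minimal sketch of symmetric int8 quantization, which maps floating-point weights onto 8-bit integers to shrink memory use (a generic illustration of the technique, not any particular library's implementation):

```python
def quantize_int8(weights):
    # Symmetric linear quantization: map floats onto the int8 range [-127, 127]
    # using a single scale factor derived from the largest magnitude.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by half the scale step.
    return [qi * scale for qi in q]

weights = [0.8, -1.27, 0.03, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight now needs one byte instead of four, a 4x size reduction, at the cost of a small, bounded rounding error; tools that combine this with pruning, caching, and distillation are exactly the gap Pruna AI says it is filling.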
The framework supports a variety of AI models, including large language models, diffusion models, speech-to-text systems, and computer vision models. However, the company is currently prioritizing applications in image and video generation.
AI Optimization Made Easy with Automation
One of Pruna AI’s most anticipated features is a compression agent, set to launch soon. This agent will allow developers to set optimization goals—such as improving model speed while maintaining accuracy—and automatically apply the best combination of compression techniques.
“The agent will just do its magic,” Rachwan explained. “You don’t have to do anything as a developer. It will find the best combination for you and return the optimized model.”
Enterprise Offerings and Monetization
While Pruna AI is providing an open-source version of its framework, it also offers a pro version with advanced optimization features, charged on an hourly basis—similar to renting GPU resources on cloud services like AWS.
Companies running AI models at scale stand to benefit significantly from Pruna AI’s framework. For example, the startup made a Llama model eight times smaller with minimal accuracy loss, yielding substantial inference cost savings.
Pruna AI recently secured $6.5 million in seed funding from investors such as EQT Ventures, Daphni, Motier Ventures, and Kima Ventures. With its innovative optimization framework, the company aims to position itself as a game-changer in AI model efficiency.