2023 Recap: JuiceFS’ Journey in AI and Data Storage

JuiceFS
Jan 9, 2024

In 2023, generative AI emerged as a transformative force, and JuiceFS saw substantial growth in the AI domain. This growth brought in new users and diverse application scenarios, which in turn prompted a series of changes across the product, ecosystem, and community.

Product

Throughout the year, JuiceFS Community Edition had 8 significant releases, the most noteworthy being v1.1 LTS (Long-Term Support). Fully compatible with v1.0, v1.1 introduced eagerly anticipated features such as directory statistics, quotas, and cloning. Following the Go release model, JuiceFS adopted a dual LTS maintenance approach: two LTS versions are supported at any given time, which keeps LTS users covered while allowing the product to evolve quickly. Learn more about v1.1 here.

To improve the user experience in Kubernetes environments, we made substantial optimizations to JuiceFS CSI Driver and released 10 versions. These updates included key features such as the JuiceFS CSI Dashboard and support for data migration.

Documentation is an indispensable resource, and we expanded and refined it continuously over the past year. We restructured the Command Reference document and added version cues for new features to help users find and use commands. We also added diagrams that explain the technical architecture and the read/write processes, making the underlying storage principles easier to understand. The JuiceFS documentation site currently receives 100,000+ visits per month.

The evolution of JuiceFS Community Edition is indebted to user feedback and contributions. Since its open-source debut in January 2021, our project has garnered 1,100+ issues, with 90% of them successfully resolved. More than 2,700 pull requests have been submitted, engaging 100+ contributors in the JuiceFS project.

Ecosystem

In the dynamic landscape of data management and storage solutions, we’ve strategically aligned ourselves with key projects and platforms, fostering collaborative partnerships and expanding our reach across diverse ecosystems:

  • TiKV, a CNCF graduated project, has become a popular choice for JuiceFS metadata engines.
  • An increasing number of users are adopting the CNCF Sandbox project Fluid+JuiceFS for managing and scheduling AI datasets.
  • JuiceFS extended support to the CNCF Incubating project Dragonfly, accelerating AI model distribution and deployment through P2P technology.
  • We’ve provided storage support for Byzer, simplifying data mining and AI modeling.
  • Integration with data lakes like Hudi, Iceberg, and Delta Lake allowed users to build unified storage solutions.

In 2023, JuiceFS Enterprise Edition underwent significant improvements to cater better to high-performance scenarios. The latest release, v5.0, introduced various features, including shared block devices to enhance high-load small file write performance, transparent cache acceleration for object storage, periodic file format conversion to objects, and numerous optimizations for distributed cache management. Explore more about JuiceFS Enterprise Edition 5.0 here.

Community development

JuiceFS Cloud Service was launched in 2017, and the Community Edition was open-sourced in January 2021. Over the past years, our user base has continued to grow. Currently, the JuiceFS repository has gained 9.1k stars on GitHub, with increasing discussions and use cases in various industries.

Reported statistics from Community Edition users indicate remarkable growth:

  • The number of JuiceFS file systems exceeded 3,400 (100% growth).
  • Active clients surpassed 35,000 (400% growth).
  • Data volume reached 138 PiB (180% growth).
  • File count totaled 69.7 billion (120% growth).
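
As a rough illustration of what these growth rates imply, the sketch below back-calculates the previous baseline for each metric. It assumes each percentage is year-over-year growth relative to the prior year's figure, which the list does not state explicitly, and it treats "exceeded" and "surpassed" values as exact. The dictionary keys and the `implied_baseline` helper are hypothetical names used only for this example.

```python
# Reported totals and growth rates taken from the list above.
# Assumption: each rate is year-over-year growth (1.00 means 100%).
reported = {
    "file_systems": (3_400, 1.00),
    "active_clients": (35_000, 4.00),
    "data_volume_pib": (138, 1.80),
    "file_count_billion": (69.7, 1.20),
}


def implied_baseline(current, growth):
    """Return the earlier value that grows to `current` at rate `growth`."""
    return current / (1 + growth)


# Back-calculate the implied earlier figure for every metric.
baselines = {
    name: implied_baseline(current, growth)
    for name, (current, growth) in reported.items()
}

for name, value in baselines.items():
    print(f"{name}: ~{value:,.1f}")
```

Under this assumption, 100% growth means the number of file systems doubled, while 400% growth means active clients grew to five times the earlier figure.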

JuiceFS found applications in the generative AI field. Users include MiniMax, which works on AI solutions similar to OpenAI’s ChatGPT. Stable Diffusion model sharing communities such as SeaArt and LiblibAI, along with SaaS services such as LeptonAI, BentoML, and Diffus, also run JuiceFS in their operations.

JuiceFS is also used in industries that apply AI, such as autonomous driving, financial quant trading, consumer electronics, biomedicine, and social platforms. Notable users include Momenta, vivo, Xiaomi, DP Technology, MemVerge, Xiaohongshu, and Zhihu. We are honored to serve these industry leaders.

Case studies and technical articles published during the year show how JuiceFS addresses challenges in two main areas: AI, machine learning, and deep learning; and big data.

Numerous JuiceFS articles were published on renowned technical platforms like InfoQ and DZone, highlighting our commitment to sharing valuable insights and expertise in the field. At the end of 2023, we received a case study submission from NAVER, South Korea’s largest search engine, showcasing JuiceFS as the storage foundation for their AI platform.

What’s next

Looking ahead, we remain committed to advancing our technology, fostering global partnerships, and serving as a reliable solution for data management needs. We appreciate the trust and support from our growing community and industry leaders, and we eagerly anticipate the exciting developments that lie ahead in our journey. Thank you for being a part of the JuiceFS story, and here’s to continued innovation and success in the future!

JuiceFS (https://github.com/juicedata/juicefs) is a distributed POSIX file system built on top of Redis and S3.