Our first anniversary of open source
JuiceFS, started in 2017, is a cloud-native distributed file system designed to help enterprises solve unstructured data management challenges in multi-cloud, cross-cloud, and hybrid cloud environments, including but not limited to data security and protection, big data architecture upgrades, large numbers of small files, and cloud-native storage. JuiceFS is available as a fully managed service on public clouds worldwide and is fully compatible with the POSIX, HDFS, and S3 access protocols as well as Kubernetes CSI. To build a storage product that developers love, we open-sourced JuiceFS on GitHub on January 11, 2021.
Today, JuiceFS has been open source for one year!
Last year on this date, we made JuiceFS open source on GitHub. The original intention was very simple: we hoped that more developers would get to know, understand, and use JuiceFS through the open source community. After all, the greatest value of software lies in being used. With JuiceFS open-sourced, users no longer need to worry about a black-box cloud service. Users can download the code themselves and explore the possibilities of JuiceFS; developers can review the code, understand it, become familiar with it, trust it from the ground up, and even take part in its development. We want to create a community with a culture of mutual respect, where users not only use JuiceFS but also share new use cases across industries, and where developers can discuss the architecture of JuiceFS and influence the future direction of the product.
The feedback from developers on open-sourcing JuiceFS also exceeded our expectations. The project appeared on GitHub Trending in its first week, and developer-centric media platforms, including but not limited to Hacker News and InfoQ, praised the move highly.
After a year, JuiceFS has made great progress in both the community and the product. We know how hard it is to persevere, and we will continue to forge ahead with an open and connected mindset.
Comprehensive product upgrade, more open ecosystem
When JuiceFS was first open-sourced, Redis was the only choice for the metadata engine. Because Redis keeps its data entirely in memory, it poses challenges for data reliability and scalability. We made the metadata engine code pluggable and introduced support for relational databases and transactional KV stores such as TiKV, which addresses the reliability and scalability issues and gives users more choices.
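To make the idea of a pluggable metadata engine concrete, here is a minimal Python sketch. It is not JuiceFS code (JuiceFS itself is not written in Python), and every name in it (`MetaEngine`, `MemoryEngine`, `SQLEngine`, `open_engine`, and the `memory://` and `sqlite3://` URL schemes) is a hypothetical chosen for illustration. It only shows how one interface can sit in front of an in-memory store and a relational database.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class MetaEngine(ABC):
    """Hypothetical metadata interface; the real JuiceFS interface is different."""

    @abstractmethod
    def put(self, path: str, size: int) -> None:
        """Create or update the recorded size of a file."""

    @abstractmethod
    def get(self, path: str) -> Optional[int]:
        """Return the recorded size, or None if the path is unknown."""


class MemoryEngine(MetaEngine):
    """Stand-in for an in-memory store: fast, but gone when the process exits."""

    def __init__(self) -> None:
        self._files = {}

    def put(self, path: str, size: int) -> None:
        self._files[path] = size

    def get(self, path: str) -> Optional[int]:
        return self._files.get(path)


class SQLEngine(MetaEngine):
    """Stand-in for a relational backend, using SQLite from the standard library."""

    def __init__(self, database: str) -> None:
        self._db = sqlite3.connect(database)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER NOT NULL)"
        )

    def put(self, path: str, size: int) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO files (path, size) VALUES (?, ?)", (path, size)
            )

    def get(self, path: str) -> Optional[int]:
        row = self._db.execute(
            "SELECT size FROM files WHERE path = ?", (path,)
        ).fetchone()
        return row[0] if row else None


# Registry that maps a URL scheme to a constructor; adding a backend is one entry.
ENGINES = {
    "memory": lambda rest: MemoryEngine(),
    "sqlite3": lambda rest: SQLEngine(rest or ":memory:"),
}


def open_engine(meta_url: str) -> MetaEngine:
    """Pick a backend from the URL scheme, e.g. 'sqlite3://:memory:'."""
    scheme, _, rest = meta_url.partition("://")
    if scheme not in ENGINES:
        raise ValueError(f"unsupported metadata engine: {scheme!r}")
    return ENGINES[scheme](rest)
```

Code written against `MetaEngine` does not change when the backend does: `open_engine("memory://")` and `open_engine("sqlite3://:memory:")` return objects that behave the same from the caller's point of view.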
We support more than 40 types of object storage as JuiceFS’s data persistence layer, covering the common types deployed in public cloud, edge cloud, private cloud, and other environments. If we have missed one, please open an issue on GitHub and we will support it as soon as possible. Broadening the JuiceFS ecosystem and making JuiceFS more open are goals we pursue unswervingly.
At the beginning, JuiceFS supported only the most widely used POSIX API. Since then, it has added support for HDFS, the S3 API, Kubernetes CSI, and the Windows operating system, and we will support more flexible access methods in the future. These protocols connect the dots into lines and weave the data islands scattered across an enterprise into a network. That helps enterprises open up the data held in heterogeneous business systems, integrate different technology stacks, connect multiple clouds, and build a more open data storage platform.
JuiceFS also provides metadata backup and import features, giving users more protection when “accidents” happen. Backups are written in JSON, which keeps the data readable and allows it to be moved between different metadata engines. Finally, JuiceFS provides a “recycle bin” where accidentally deleted data can be recovered.
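The value of a JSON backup for moving between engines can be sketched in a few lines of Python. This is an illustration only: the JSON layout, the `dump_meta` and `load_meta` helpers, and the `files` table are assumptions made for this example and do not reflect the actual JuiceFS dump format.

```python
import json
import sqlite3


def dump_meta(entries: dict) -> str:
    """Serialize {path: {"size": int, "mode": int}} as readable, sorted JSON."""
    return json.dumps({"version": 1, "entries": entries}, indent=2, sort_keys=True)


def load_meta(text: str, db: sqlite3.Connection) -> int:
    """Import a dump into a relational engine and return the number of rows loaded."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError("unsupported dump version")
    db.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, mode INTEGER)"
    )
    rows = [(path, attr["size"], attr["mode"]) for path, attr in doc["entries"].items()]
    with db:  # one transaction: either every row is imported or none is
        db.executemany("INSERT INTO files (path, size, mode) VALUES (?, ?, ?)", rows)
    return len(rows)


# Source "engine": a plain dict standing in for a key-value store.
source = {"/data/a.txt": {"size": 3, "mode": 0o644}}
backup = dump_meta(source)

# Target "engine": SQLite standing in for a relational database.
target = sqlite3.connect(":memory:")
imported = load_meta(backup, target)
```

Because the backup is plain text, it can be inspected or diffed before it is loaded, and the loader needs to know nothing about the engine that produced it.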
In addition to our continued investment in product openness, we also focus on making the documentation open and easy to use. We understand that documentation is an important link between users and the product. Since JuiceFS was open-sourced, we have held to the principle of delivering high-quality technology and high-quality documentation side by side. In 2021, we completed three full iterations of the documentation, moving it from “professional” to “universal” and then to “experiential”. The work of improving the documentation is ongoing, with the goal that new users can get started immediately and existing users can rely on it with peace of mind. Beyond documentation, JuiceFS has kept its data formats and communication protocols compatible throughout rapid version iteration, so that users can upgrade smoothly from one version to the next.
In the year since JuiceFS was open-sourced, the product has undergone tremendous changes, which has made us even more determined to follow the open source route. We believe it is the right one, because only an open ecosystem has lasting vitality.
Richer use cases, a co-built ecosystem
Over the past year, through our open source community, we have connected developers with code contributions and users with usage scenarios.
In just one year, more than 4,400 developers gave JuiceFS a thumbs up. These developers come not only from China but also from Europe, America, Africa, and the Middle East. Although the COVID-19 pandemic has cut off our physical connections, the open source community has brought us together to contribute to JuiceFS. Over the past year, more than 40 contributors have completed over 800 pull requests, which means 800 connections we have made with the developer community on GitHub. Through these 800 connections, JuiceFS has released 16 new versions. The users who have been quietly following JuiceFS behind the scenes have given us a great deal of motivation, along with added pressure.
Across the Slack and WeChat groups, a user community of more than 1,500 people has formed, and its members have taken part in 9 events. Starting from their own use of JuiceFS, they have given back 33 technical articles and practical case studies. Here, we connect scenarios with users. A file system is the cornerstone of a wide variety of applications, so combining well with other applications, delivering outstanding performance and a good experience, and forming an ecosystem is important work for the JuiceFS community. In the past year, JuiceFS has gained recognition in several areas and made good progress.
Big Data ecosystem
JuiceFS is fully HDFS-compatible and integrates seamlessly with the Hadoop ecosystem. Some customers have already replaced HDFS as part of an architecture upgrade that separates storage from compute.
- Apache Kylin 4.0 released a solution for building clusters using JuiceFS.
- Leveraging the data lifecycle features of ClickHouse and Elasticsearch, JuiceFS makes it easy to implement tiered data storage, increasing efficiency and reducing costs for users.
AI ecosystem
JuiceFS’s multi-protocol access support can eliminate much of the data migration and scheduling work in business processes, and JuiceFS is compatible with all mainstream machine learning and deep learning training frameworks.
- Megvii’s engineering team also contributed to the JuiceFS Python SDK to facilitate accessing JuiceFS data in Serverless environments.
- JuiceFS cache acceleration is the most popular feature for AI training scenarios, and PaddlePaddle has integrated JuiceFS into Paddle Operator for training acceleration.
- UniSound’s engineering team has contributed JuiceFSRuntime to the Fluid community.
- Milvus, a vector search engine, has also released a solution for building distributed clusters based on JuiceFS.
- The Byzer community has also integrated JuiceFS as a cloud-native file system into their solution.
The Kubernetes ecosystem
JuiceFS is well suited for use as a PersistentVolume (PV), providing container-native storage. The community-provided CSI driver and comprehensive documentation are available in the KubeSphere app store, and JuiceFS is equally easy to use in Rancher and cloud-hosted Kubernetes services.
If you are using JuiceFS, we hope you will share your experience and questions with the JuiceFS community. You will get support and help, and your experience will help many others, which is the value and charm of the open source community.
Verified in production across multiple industries, JuiceFS 1.0 is coming
For storage systems, reliability always comes first. JuiceFS stores metadata in mature databases and data in mature object storage, which guarantees reliability from the very beginning. This is why many technology companies were able to put JuiceFS into production and run it stably within half a year of its release. Just as important, by relying on standard access protocols, JuiceFS can use the open source community’s existing test suites to ensure compatibility and reliability, alongside unit tests, stress tests, chaos tests, and performance tests, so that the product can iterate rapidly while every release remains high quality.
In the year since JuiceFS was open-sourced, many users, including Xiaomi, Shopee, Li Auto, Zhihu, PIESAT, and Yaoxin, have deployed JuiceFS in production, where it has run stably for more than half a year.
- Xiaomi uses JuiceFS as the fundamental storage of the AI platform.
- Shopee uses JuiceFS as a cloud-native file storage platform that serves various lines of business, improving agility and solving business challenges.
- Li Auto uses JuiceFS to cut costs by separating storage and compute in its data warehouse.
- Zhihu uses JuiceFS to make its Flink stream computing platform start up 4 times faster.
JuiceFS has been running stably and continuously in the production environments of many Internet and AI enterprises. It not only reduces costs for customers but also improves the efficiency of data use and shortens the time needed to launch new business. The built-in data protection and encryption also give customers peace of mind. Over the past year, the number of JuiceFS clusters online each day has grown steadily, from a handful at the start to more than 500 now. This figure covers only the clusters we have recorded, and we believe there are many more users we are not yet aware of.
After comprehensive evaluation and validation in a wide range of use cases, the JuiceFS community is releasing JuiceFS v1.0-beta today. The release is backed by support, validation, and continuous feedback from many industries at home and abroad, including Internet services, autonomous driving, gene sequencing, fintech, and smart manufacturing, as well as from community developers. v1.0-GA will follow after improvements based on feedback.
Rethinking the open source license
When JuiceFS was first released in 2021, it only supported accessing data through POSIX after mounting. Applications accessed data through the kernel and did not need to interact with JuiceFS directly, so they were not affected by the GPL family of licenses. For that reason we adopted AGPL v3 at the time, from the GPL family that is the most widely used in the file storage world.
As JuiceFS continued to iterate, additional access protocols and SDKs were introduced (an S3-compatible HTTP protocol and an HDFS-compatible Java SDK), which affects users who build commercial products on them. At the same time, some open source communities and developers want to integrate JuiceFS into their projects as a storage foundation, but AGPL v3 is not very compatible with other open source licenses (such as the Apache license). This prevents more people from enjoying the benefits JuiceFS offers, such as multi-protocol interoperability and an efficient caching system.
Therefore, true to our original intention of building the storage product developers love most, the Juicedata team decided to change the license to Apache 2.0 starting with JuiceFS v1.0.
Redefining file storage for a promising future
The research and development of open source products requires continuous capital investment. The commercial services we have spent 4 years verifying are also growing rapidly, providing continuous and reliable financial support for the development of JuiceFS. Open source is our sea of stars, and commercialization protects it.
JuiceFS v1.0 is an important milestone: it signals that JuiceFS can be used with confidence in all kinds of production scenarios and is ready to take on increasingly demanding challenges. The community will keep increasing its investment to deliver more valuable features, such as the most requested ones: quota management, snapshots, and support for more data engines.
With the rapid growth of data volumes, distributed file systems are becoming more and more important. JuiceFS separates metadata storage from data storage, reuses existing mature database and object storage infrastructure as much as possible, and offers access protocols compatible with all mainstream interfaces. This significantly reduces the complexity and the barrier to entry of distributed file systems and redefines the way they are built. With one system and a choice of components, it can meet unstructured storage needs at different scales and in different scenarios. At the same time, JuiceFS has a fully cloud-native design that connects well with the cloud ecosystem, follows the general trend of cloud storage development, and has very broad application prospects.
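The separation of metadata and data described above can be pictured with a toy Python model. It is a sketch under stated assumptions, not a description of how JuiceFS lays out data: the `TinyFS` class, its key naming, and the 4-byte block size are all invented for illustration, and two dictionaries stand in for the database and the object storage bucket.

```python
class TinyFS:
    """Toy file system: file contents go to an object store, everything else to a metadata store."""

    def __init__(self, block_size: int = 4) -> None:
        self.meta = {}      # path -> {"length": int, "blocks": [object keys]}; stand-in for a database
        self.objects = {}   # object key -> bytes; stand-in for an object storage bucket
        self.block_size = block_size
        self._next_id = 0

    def write(self, path: str, data: bytes) -> None:
        """Split data into fixed-size blocks, store them as objects, then record the file.

        Overwriting a path leaves the old objects behind; a real system would
        garbage-collect them. This sketch does not.
        """
        file_id = self._next_id
        self._next_id += 1
        keys = []
        for offset in range(0, len(data), self.block_size):
            key = f"blocks/{file_id}_{offset // self.block_size}"
            self.objects[key] = data[offset:offset + self.block_size]
            keys.append(key)
        self.meta[path] = {"length": len(data), "blocks": keys}

    def read(self, path: str) -> bytes:
        """Look up the block list in the metadata store, then fetch each object."""
        entry = self.meta[path]
        return b"".join(self.objects[key] for key in entry["blocks"])

    def rename(self, old: str, new: str) -> None:
        """Rename touches metadata only; no object is copied or moved."""
        self.meta[new] = self.meta.pop(old)


fs = TinyFS(block_size=4)
fs.write("/logs/app.log", b"hello world")
fs.rename("/logs/app.log", "/archive/app.log")
content = fs.read("/archive/app.log")
```

The point of the model is the `rename` method: because names and structure live in the metadata store, an operation that would require copying data in a plain object store becomes a small metadata update.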
Although JuiceFS has done a lot of subtraction to avoid reinventing the wheel, building a mature and reliable storage product still requires a huge engineering investment. We have grown our engineering team further over the past year, with many members joining Juicedata from the JuiceFS community, and we welcome more like-minded friends to join us in creating a new era of distributed file storage.
Long and difficult as the journey may be, sustained action will take us to our destination!