JuiceFS is an open source, enterprise-grade distributed file system that uses object storage and a database as its storage layer. It supports almost all object storage services as well as databases such as Redis, MySQL, PostgreSQL, and TiKV. Any file written to JuiceFS is split into data blocks according to specific rules and stored in the object storage, and the corresponding metadata is stored in a separate database. There are no geographical or platform restrictions: any server with access to the object storage and the database can mount and use the storage via the JuiceFS client.
JuiceFS provides a variety of access interfaces, including POSIX, a Java SDK, a CSI Driver, and an S3 Gateway. Standard operating systems, the Hadoop ecosystem, Kubernetes container platforms, and web applications can all use JuiceFS to persist data. Simply put, JuiceFS reliably connects massive cloud storage to the local machine, providing nearly unlimited storage space. For systems and applications, using JuiceFS storage is indistinguishable from using a local disk.
Requirements
Amazon AWS is the world’s leading cloud computing platform, offering almost all types of cloud computing services. Thanks to the rich product line of AWS, users can choose JuiceFS components in a very flexible way.
As you can see from the previous architecture, JuiceFS consists of the following three components:
- a JuiceFS client installed on the server
- the object storage used to store data
- a database for storing metadata
1. Servers
Amazon EC2 Cloud Server is one of the most basic and widely used cloud services on the AWS platform. It offers more than 400 instance sizes and 81 availability zones in 25 data centers around the world, giving users the flexibility to choose and adjust the configuration of EC2 instances according to their actual needs.
New users don't need to think too much about JuiceFS configuration requirements, because even the lowest-spec EC2 instance can mount and use JuiceFS storage. Usually, you only need to consider the hardware requirements of your business system.
The JuiceFS client uses 1GB of disk as cache by default. When dealing with a large number of files, the client caches the data on disk first and then uploads it to the object storage asynchronously. Choosing a disk with higher IO performance and reserving a larger cache will give JuiceFS better performance.
2. Object Storage
Amazon S3 is the de facto standard for public cloud object storage services, and the object storage services provided by other major cloud platforms are usually compatible with the S3 API, which allows programs developed for S3 to freely switch between object storage services of other platforms.
JuiceFS fully supports Amazon S3 and all S3-like object storage services; see the documentation for all storage types supported by JuiceFS.
Amazon S3 offers a range of storage classes suitable for different use cases, the main ones being:
- Amazon S3 STANDARD: general-purpose storage for frequently accessed data
- Amazon S3 STANDARD_IA: for data that is needed for a long time but accessed less frequently
- S3 Glacier: for data that is archived over time
The STANDARD storage class should usually be used for JuiceFS. The other classes cost less for storage but incur additional charges when data is retrieved.
In addition, access to the object storage service requires user authentication via an access key and a secret key. To create them, refer to the document Controlling Access to Storage Buckets with User Policies. When accessing S3 from an EC2 cloud server, you can also assign an IAM role to the EC2 instance so that it can call the S3 API without keys.
3. Database
The ability of multiple hosts to access both data and metadata is key to a distributed file system. For the metadata generated by JuiceFS to be reachable over the network in the same way as the data in S3, the metadata store should also be a network-accessible database.
Amazon RDS and ElastiCache are two cloud database services provided by AWS, and both can be used directly for metadata storage in JuiceFS. Amazon RDS is a relational database service that supports various engines such as MySQL, MariaDB, and PostgreSQL. ElastiCache is a memory-based caching cluster service with two engines, of which the Redis engine is suitable for JuiceFS.
In addition, you can also build your own database on an EC2 cloud server for JuiceFS to store metadata.
4. Cautions
- JuiceFS is non-intrusive and will not affect the normal operation of existing systems.
- When selecting cloud services, it is recommended to choose all of them in the same region. This is equivalent to having all services on the same intranet, with the lowest latency and the fastest access between them. In addition, according to AWS billing rules, transferring data between basic cloud services in the same region is free. Conversely, if you select cloud services in different regions, for example EC2 in ap-east-1, ElastiCache in ap-southeast-1, and S3 in us-east-2, the traffic between the services will incur charges.
- JuiceFS does not require the object storage and the database to come from the same cloud platform; you can flexibly mix and match cloud services from different platforms as needed. For example, you can use EC2 to run the JuiceFS client with AliCloud's Redis database and Backblaze B2 object storage. Of course, JuiceFS storage composed of cloud services on the same platform and in the same region will perform even better.
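The same-region recommendation above can be checked before anything is created. The following is only a minimal sketch: the function names are made up for illustration, the region values are placeholders, and the parser assumes the bucket endpoint has the https://<bucket>.s3.<region>.amazonaws.com form used later in this article.

```shell
# Extract the region from an S3 endpoint of the (assumed) form
# https://<bucket>.s3.<region>.amazonaws.com
s3_endpoint_region() {
    printf '%s\n' "$1" |
        sed -n 's|^https://[^/]*\.s3\.\([a-z0-9-]*\)\.amazonaws\.com.*$|\1|p'
}

# Succeed only when every region passed as an argument is identical.
same_region() {
    first=$1
    for region in "$@"; do
        [ "$region" = "$first" ] || return 1
    done
}

# Placeholder values: replace them with the regions of your own services.
ec2_region="ap-southeast-1"
cache_region="ap-southeast-1"
bucket_region=$(s3_endpoint_region "https://herald-demo.s3.ap-southeast-1.amazonaws.com")

if same_region "$ec2_region" "$cache_region" "$bucket_region"; then
    echo "all services are in $ec2_region"
else
    echo "services span several regions; cross-region traffic may be billed"
fi
```

With the placeholder values above, the three regions match, so the first message is printed.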
Deployment and Usage
Next, we briefly describe how to install and use JuiceFS, using an EC2 cloud server, S3 object storage, and an ElastiCache cluster with the Redis engine, all in the same region, as an example.
1. Install the client
Here we are using a Linux system on the x86-64 architecture. Execute the following commands to download the latest version of the JuiceFS client.
$ JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')
$ wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"
After downloading, unzip the program into the juice folder.
$ mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice
Install the JuiceFS client to a directory in the system $PATH, e.g., /usr/local/bin.
$ sudo install juice/juicefs /usr/local/bin
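If you would rather pin a specific release than always fetch the latest one, the download, extract, and install steps above can be wrapped in a small script. This is a sketch only: the function names are hypothetical, the URL pattern is the one used by the wget command above, and 0.17.0 is simply the version shown in the help output of this article.

```shell
# Strip an optional leading "v" from a release tag, e.g. v0.17.0 -> 0.17.0.
normalize_tag() {
    printf '%s\n' "${1#v}"
}

# Build the download URL of the linux-amd64 tarball for a given release tag.
juicefs_tarball_url() {
    version=$(normalize_tag "$1")
    printf 'https://github.com/juicedata/juicefs/releases/download/v%s/juicefs-%s-linux-amd64.tar.gz\n' \
        "$version" "$version"
}

# Download, extract and install one release (needs network access and sudo).
install_juicefs() {
    url=$(juicefs_tarball_url "$1")
    wget "$url" &&
        mkdir -p juice &&
        tar -zxvf "$(basename "$url")" -C juice &&
        sudo install juice/juicefs /usr/local/bin
}

# Only print the URL here; run `install_juicefs 0.17.0` to actually install.
juicefs_tarball_url "v0.17.0"
```

The script accepts the tag with or without the leading "v", so both forms produce the same URL.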
Run juicefs without any arguments. If it prints the help message below, the client was installed successfully.
$ juicefs
NAME:
juicefs - A POSIX file system built on Redis and object storage.
USAGE:
juicefs [global options] command [command options] [arguments...]
VERSION:
0.17.0 (2021-09-24T04:17:26Z e115dc4)
COMMANDS:
format format a volume
mount mount a volume
umount unmount a volume
gateway S3-compatible gateway
sync sync between two storage
rmr remove directories recursively
info show internal information for paths or inodes
bench run benchmark to read/write/stat big/small files
gc collect any leaked objects
fsck Check consistency of file system
profile analyze access log
stats show runtime statistics
status show status of JuiceFS
warmup build cache for target directories/files
dump dump metadata into a JSON file
load load metadata from a previously dumped JSON file
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--verbose, --debug, -v enable debug log (default: false)
--quiet, -q only warning and errors (default: false)
--trace enable trace log (default: false)
--no-agent Disable pprof (:6060) and gops (:6070) agent (default: false)
--help, -h show help (default: false)
--version, -V print only the version (default: false)
COPYRIGHT:
AGPLv3
Hint: If you execute the juicefs command and the terminal returns command not found, it may be because the /usr/local/bin directory is not in the system's PATH. You can use the echo $PATH command to check the executable search path and reinstall the client to a directory listed there. You can also add /usr/local/bin to the PATH.
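The PATH check described in the hint can also be scripted. The sketch below uses a hypothetical helper, path_contains, and only changes PATH for the current shell session.

```shell
# Succeed when directory $1 is one of the colon-separated entries of $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains /usr/local/bin; then
    echo "/usr/local/bin is already in PATH"
else
    # Affects the current shell session only; add this line to your shell
    # profile (for example ~/.profile) to make the change permanent.
    export PATH="$PATH:/usr/local/bin"
    echo "/usr/local/bin appended to PATH for this session"
fi
```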
JuiceFS has good cross-platform compatibility and is supported on Linux, Windows, and macOS. If you need to know how to install it on other systems, please check the official documentation.
3. Create File System
The format subcommand of the JuiceFS client is used to create (format) the file system. Here we use S3 as the data store and ElastiCache as the metadata store. With the client installed on EC2, create the JuiceFS file system with a command of the following form.
$ juicefs format \
--storage s3 \
--bucket https://<bucket>.s3.<region>.amazonaws.com \
--access-key <access-key-id> \
--secret-key <access-key-secret> \
redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1 \
mystor
Option Description:
- --storage: Specify the type of object storage, here we use S3. For other object storage, please refer to the JuiceFS Supported Object Storage and Setup Guide.
- --bucket: Bucket domain for object storage.
- --access-key and --secret-key: The key pair used to access the S3 API.
For Redis 6.0 and above, authentication requires both a username and a password, and the address format is redis://username:password@redis-server-url:6379/1. For Redis 4.0 and 5.0, authentication requires only the password, and the username must be left blank in the Redis server address, for example: redis://:password@redis-server-url:6379/1
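To avoid typos when assembling the metadata address, the two forms described above can be generated by a small helper. This is an illustrative sketch: redis_meta_url is a hypothetical function, the host and credentials are placeholders, and no escaping of special characters in the password is attempted.

```shell
# Build a metadata URL of the form redis://[user]:password@host:port/db.
# An empty user yields the Redis 4.0/5.0 form redis://:password@host:port/db.
# Arguments: user, password, host, port (default 6379), db (default 1).
redis_meta_url() {
    printf 'redis://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "${4:-6379}" "${5:-1}"
}

# Redis 6.0 and above: username and password.
redis_meta_url "username" "password" "redis-server-url"

# Redis 4.0 and 5.0: password only, username left blank.
redis_meta_url "" "password" "redis-server-url"
```

The first call prints redis://username:password@redis-server-url:6379/1 and the second prints redis://:password@redis-server-url:6379/1, matching the two formats in the text.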
When an IAM role is bound to the EC2 instance, you only need to specify the --storage and --bucket options and do not need to provide the API access key. It is also possible to assign ElastiCache access to the IAM role; then, instead of providing Redis authentication information, you can simply enter the Redis URL, and the command can be rewritten as:
$ juicefs format \
--storage s3 \
--bucket https://herald-demo.s3.<region>.amazonaws.com \
redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1 \
mystor
Seeing output like the following means that the file system was created successfully.
2021/10/14 08:38:32.211044 juicefs[10391] <INFO>: Meta address: redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1
2021/10/14 08:38:32.216566 juicefs[10391] <INFO>: Ping redis: 383.789µs
2021/10/14 08:38:32.216915 juicefs[10391] <INFO>: Data use s3://herald-demo/mystor/
2021/10/14 08:38:32.412112 juicefs[10391] <INFO>: Volume is formatted as {Name:mystor UUID:21a2cafd-f5d8-4a76-ae4d-482c8e2d408d Storage:s3 Bucket:https://herald-demo.s3.ap-southeast-1.amazonaws.com AccessKey: SecretKey: BlockSize:4096 Compression:none Shards:0 Partitions:0 Capacity:0 Inodes:0 EncryptKey:}
4. Mount the file system
Creating the file system stores the object storage information, including the API keys, in the database, so you do not need to enter the bucket domain and the secret key of the object storage when mounting.
Use the mount subcommand of the JuiceFS client to mount the file system to the /mnt/jfs directory.
$ sudo juicefs mount -d redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1 /mnt/jfs
Note: When mounting the file system, only the database address is required, not the file system name. The default cache path is /var/jfsCache; please make sure the current user has sufficient read/write permissions on it.
You can optimize JuiceFS by adjusting the mount parameters, for example by using --cache-size to change the cache to 20GB.
$ sudo juicefs mount --cache-size 20480 -d redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1 /mnt/jfs
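Before choosing a cache size, it can help to confirm that the disk actually has that much free space. The sketch below converts the 20GB from the example into the 20480 passed to --cache-size and compares it with the free space under /var, the parent of the default cache path. The helper names are hypothetical, and the check is a rough guide rather than anything JuiceFS itself requires.

```shell
# Convert GiB to the MiB value passed to --cache-size (20 -> 20480).
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

# Print the free space, in MiB, of the file system that holds directory $1.
free_mib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed when directory $1 has at least $2 MiB free.
has_room_for_cache() {
    [ "$(free_mib "$1")" -ge "$2" ]
}

cache_mib=$(gib_to_mib 20)
cache_parent=/var   # parent of the default cache path /var/jfsCache

if has_room_for_cache "$cache_parent" "$cache_mib"; then
    echo "--cache-size $cache_mib fits on $cache_parent"
else
    echo "less than $cache_mib MiB free on $cache_parent; choose a smaller cache"
fi
```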
Seeing output like the following means the file system was mounted successfully.
2021/10/14 08:47:49.623814 juicefs[10601] <INFO>: Meta address: redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1
2021/10/14 08:47:49.628157 juicefs[10601] <INFO>: Ping redis: 426.127µs
2021/10/14 08:47:49.628941 juicefs[10601] <INFO>: Data use s3://herald-demo/mystor/
2021/10/14 08:47:49.629198 juicefs[10601] <INFO>: Disk cache (/var/jfsCache/21a2cafd-f5d8-4a76-ae4d-482c8e2d408d/): capacity (20480 MB), free ratio (10%), max pending pages (15)
2021/10/14 08:47:50.132003 juicefs[10601] <INFO>: OK, mystor is ready at /mnt/jfs
Using the df command, you can see how the file system is mounted.
$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
JuiceFS:mystor fuse.juicefs 1.0P 64K 1.0P 1% /mnt/jfs
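Scripts that depend on the mount can run the same check without parsing df output by hand. The following sketch looks for the fuse.juicefs type shown above in the kernel mount table; is_juicefs_mounted is a hypothetical helper, and reading /proc/mounts assumes a Linux host.

```shell
# Succeed when mount point $1 is listed with type fuse.juicefs in the mount
# table $2 (default /proc/mounts; its fields are device, mount point, type, ...).
is_juicefs_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 == "fuse.juicefs" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

if is_juicefs_mounted /mnt/jfs; then
    echo "JuiceFS is mounted at /mnt/jfs"
else
    echo "JuiceFS is not mounted at /mnt/jfs"
fi
```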
Once mounted, it can be used like a local disk. The data stored in the /mnt/jfs directory is coordinated by the JuiceFS client and eventually stored in the S3 object store.
Multi-Host Sharing: JuiceFS supports being mounted by multiple hosts at the same time. You can install the JuiceFS client on any cloud server on any platform and mount the file system with the same database address, for example redis://:<your-redis-password>@herald-sh-abc.redis.rds.aliyuncs.com:6379/1, to share it. You need to make sure that every host mounting the file system has proper access to the database and to the S3 bucket used with it.
5. Unmount JuiceFS Storage
The file system can be unmounted using the umount command provided by the JuiceFS client, e.g.
$ sudo juicefs umount /mnt/jfs
Note: Forced unmount of the file system in use may result in data corruption or loss, so please be sure to proceed with caution. For more information, please refer to the official documentation.
6. Auto-mount on boot
If you don’t want to re-mount JuiceFS storage manually every time you reboot your system, you can set up an automatic mount.
First, you need to rename the juicefs client to mount.juicefs and copy it to the /sbin/ directory.
$ sudo cp juice/juicefs /sbin/mount.juicefs
Edit the /etc/fstab configuration file and add a new record.
redis://[<redis-username>]:<redis-password>@<redis-url>:6379/1 /mnt/jfs juicefs _netdev,cache-size=20480 0 0
The mount option cache-size=20480 allocates 20GB of local disk space for the JuiceFS cache. Please decide the cache size based on your actual EBS disk capacity.
You can adjust the FUSE mount options in the above configuration as needed. For more details, please check the documentation.
Note: Please replace the Redis address, mount point, and mount options in the above configuration file with your actual information.
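Because a malformed /etc/fstab line can interfere with booting, it may be safer to generate the record and review it before adding it to the file. The helper below is a hypothetical sketch that reproduces the record format shown above; the Redis address is the example address from this article and should be replaced with your own.

```shell
# Print an /etc/fstab record for JuiceFS.
#   $1 metadata URL, $2 mount point, $3 cache size in MiB (default 20480)
juicefs_fstab_entry() {
    printf '%s %s juicefs _netdev,cache-size=%s 0 0\n' "$1" "$2" "${3:-20480}"
}

# Example address reused from this article; replace it with your own.
juicefs_fstab_entry \
    "redis://herald-demo.abcdefg.0001.apse1.cache.amazonaws.com:6379/1" \
    /mnt/jfs
```

After reviewing the printed line, you could append it with sudo tee -a /etc/fstab and run sudo mount -a to confirm that it mounts cleanly before the next reboot.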
Summary
This article introduces the deployment and usage of JuiceFS on AWS, from architecture to usage. It is a useful reference for users who need to expand storage space for applications on the cloud or need elastic storage space for data backup, archiving, and disaster recovery.
In addition to the use on standard operating systems introduced in this article, JuiceFS also supports mounting and use in the Hadoop big data ecosystem and on the Kubernetes container orchestration platform, which will be covered in subsequent articles.