Deploy and use JuiceFS to store data on DigitalOcean

JuiceFS
11 min read · Oct 18, 2021

If your business systems consistently produce large amounts of unstructured data and local storage is starting to look inadequate, consider using the open source JuiceFS to store your data.

What is JuiceFS

JuiceFS is an open source, enterprise-grade distributed file system that uses object storage and a database as its storage layer. It supports almost all object storage services as well as databases such as Redis, MySQL, PostgreSQL, and TiKV. Any file written to JuiceFS is split into data blocks according to specific rules and stored in the object storage, while the corresponding metadata is stored in a separate database. There are no geographical or platform restrictions: any server with access to the object storage and the database can mount and use the storage via the JuiceFS client.

JuiceFS provides a variety of access interfaces, including POSIX, a Java SDK, a CSI Driver, and an S3 Gateway. Standard operating systems, the Hadoop ecosystem, Kubernetes container platforms, and web applications can all use JuiceFS to persist data. Simply put, JuiceFS reliably connects massive cloud storage to the local machine, providing nearly unlimited storage space. For systems and applications, using JuiceFS storage is indistinguishable from using a local disk.

JuiceFS is designed for the cloud. Using a cloud platform’s out-of-the-box storage and database services, it can be configured and put into use in a few minutes. This article uses the DigitalOcean platform as an example to show how to quickly install and use JuiceFS on a cloud computing platform.

Requirements

JuiceFS is driven by a combination of storage and database, so you need to prepare:

1. Cloud Server

A cloud server on DigitalOcean is called a Droplet. You don’t need to purchase a new Droplet just to use JuiceFS. If you already have a Droplet that needs JuiceFS storage, just install the JuiceFS client on it.

Hardware

JuiceFS has no special hardware requirements, and it runs stably on Droplets of any size. However, it is recommended to choose a Droplet with a faster SSD and to reserve at least 1GB of disk space for the JuiceFS local cache.

Operating System

JuiceFS supports Linux, BSD, macOS and Windows. In this article, we use Ubuntu Server 20.04.

2. Object Storage

JuiceFS uses object storage to store all data. Using Spaces on DigitalOcean is the easiest solution. Spaces is an S3-compatible object storage service that works out of the box. It is recommended to create it in the same region as the Droplet, so that you get the best access speed and avoid additional traffic charges.

Of course, you can also use object storage services on other platforms, or build your own with Ceph or MinIO on a Droplet. In short, you are free to choose the object storage you want to use, as long as the JuiceFS client can access the object storage API.

Here, I created a Space named juicefs in the Singapore region (sgp1). Its access address is https://juicefs.sgp1.digitaloceanspaces.com, the same address passed to --bucket when the file system is created later in this article.

In addition, you need to create Spaces access keys in the API menu. JuiceFS needs them to access the Spaces API.

3. Database

Unlike a local file system, JuiceFS stores all the metadata corresponding to the data in an independent database. The benefit of this separation becomes more noticeable as the amount of stored data grows.

Currently, JuiceFS supports common databases such as Redis, TiKV, MySQL/MariaDB, PostgreSQL, and SQLite, and support for other databases is still being developed. If the database you need is not yet supported, please submit an issue with your feedback.

In terms of performance, scale, and reliability, each database has its own advantages and disadvantages, and you should choose according to actual scenarios.

Please don’t worry about the choice of database. The JuiceFS client supports metadata migration. You can easily export metadata from one database and migrate it to other databases.
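As a rough sketch of what such a migration involves, the client's dump and load subcommands (both appear in the command list later in this article) export metadata to a JSON file and import it into another database. The helper below only prints the two commands so they can be reviewed first. The function name, the file name meta.json, and both URLs are placeholders made up for illustration, and the target database is generally expected to be empty before loading.

```shell
# Print (do not run) the two commands for moving metadata between engines.
# $1 = source META-URL, $2 = target META-URL, $3 = dump file (hypothetical default).
plan_migration() {
    src=$1
    dst=$2
    file=${3:-meta.json}
    echo "juicefs dump $src $file"
    echo "juicefs load $dst $file"
}

# Placeholder URLs; substitute your own source and target databases.
plan_migration "redis://old-host:6379/1" "redis://new-host:6379/1"
```

Unmounting all clients before dumping is a reasonable precaution, so that the exported file reflects a consistent state.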

In this article, we use DigitalOcean’s managed Redis 6 database service, select the Singapore region, and select the same VPC private network as the existing Droplet. It takes about 5 minutes to create a Redis cluster. We follow the setup wizard to initialize the database cluster.

By default, the Redis cluster allows all inbound connections. For security reasons, in the security settings section of the setup wizard, use Add trusted sources to select the Droplets that should have access to the Redis cluster, so that only the selected hosts can connect to it.

For the eviction policy, it is recommended to select noeviction. With this policy, when memory is exhausted Redis only reports errors and does not evict any data.

Note: In order to ensure the safety and integrity of metadata, please do not select allkeys-lru or allkeys-random for the eviction policy.
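The rule can be written down as a small guard for a provisioning script. This is only a sketch: the function name is hypothetical, the policy value is assumed to be copied from the database settings page, and the check is deliberately stricter than the note above because it accepts nothing except noeviction.

```shell
# Returns 0 only for the recommended policy. Any evicting policy could
# drop file system metadata when Redis runs out of memory.
is_safe_eviction_policy() {
    case "$1" in
        noeviction) return 0 ;;
        *)          return 1 ;;
    esac
}

for policy in noeviction allkeys-lru allkeys-random; do
    if is_safe_eviction_policy "$policy"; then
        echo "$policy: ok for JuiceFS metadata"
    else
        echo "$policy: do not use"
    fi
done
```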

The access address of the Redis cluster can be found in the Connection Details section of the console. If all computing resources are in DigitalOcean, it is recommended to connect over the VPC private network, which maximizes security.

Installation and Use

1. Install JuiceFS client

I am currently using Ubuntu Server 20.04. Execute the following commands in sequence to install the latest version of the client.

Query the latest release tag on GitHub and store it in a temporary environment variable:

$ JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')
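To see what each stage of this pipeline does, it can be run offline against a single sample line. The sketch below assumes the API response contains a line shaped like the one shown. The sample value is illustrative, not captured output, and extract_tag is a helper name introduced here.

```shell
# Sample line in the shape the GitHub releases API returns (illustrative value).
sample='  "tag_name": "v0.17.0",'

# Same filter chain as the command above: keep the tag_name line, take the
# 4th field when splitting on double quotes, then delete every "v" character
# (in a version tag that is only the leading one).
extract_tag() {
    grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v'
}

JFS_LATEST_TAG=$(printf '%s\n' "$sample" | extract_tag)
echo "$JFS_LATEST_TAG"
```

If the variable ends up empty on a real run, the API request most likely failed or was rate limited, and the download URL in the next step will be malformed.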

Download the latest version of the client software package adapted to the current system:

$ wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"

Unzip the installation package:

$ mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice

Install the client to /usr/local/bin:

$ sudo install juice/juicefs /usr/local/bin

Run the juicefs command. If it prints its help information, the client has been installed successfully.

$ juicefs
NAME:
juicefs - A POSIX file system built on Redis and object storage.
USAGE:
juicefs [global options] command [command options] [arguments...]
VERSION:
0.17.0 (2021-09-24T04:17:26Z e115dc4)
COMMANDS:
format format a volume
mount mount a volume
umount unmount a volume
gateway S3-compatible gateway
sync sync between two storage
rmr remove directories recursively
info show internal information for paths or inodes
bench run benchmark to read/write/stat big/small files
gc collect any leaked objects
fsck Check consistency of file system
profile analyze access log
stats show runtime statistics
status show status of JuiceFS
warmup build cache for target directories/files
dump dump metadata into a JSON file
load load metadata from a previously dumped JSON file
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--verbose, --debug, -v enable debug log (default: false)
--quiet, -q only warning and errors (default: false)
--trace enable trace log (default: false)
--no-agent Disable pprof (:6060) and gops (:6070) agent (default: false)
--help, -h show help (default: false)
--version, -V print only the version (default: false)
COPYRIGHT:
AGPLv3
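For scripted setups, the same check can be done without reading the help text. A minimal sketch, where have_cmd is a helper name introduced here and --version is the global option listed in the help output above:

```shell
# Succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd juicefs; then
    juicefs --version
else
    echo "juicefs not found on PATH; check the install step above"
fi
```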

In addition, you can also visit the JuiceFS GitHub Releases page to select other versions for manual installation.

2. Create a file system

To create a file system, use the format subcommand. The syntax is:

juicefs format [command options] META-URL NAME

The following command creates a file system named mystor:

juicefs format \
--storage space \
--bucket https://juicefs.sgp1.digitaloceanspaces.com \
--access-key <your-access-key-id> \
--secret-key <your-access-key-secret> \
rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1 \
mystor

Parameter Description:

  • --storage: Specify the data storage engine, here space. See the JuiceFS documentation for all supported storage types.
  • --bucket: Specify the bucket access address.
  • --access-key and --secret-key: Specify the access key ID and secret key for accessing the object storage API.
  • The Redis cluster managed by DigitalOcean needs to be accessed with TLS/SSL encryption, so it needs to use the rediss:// protocol header. The /1 added at the end of the link represents the use of Redis's No. 1 database.
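The metadata URL is the part most likely to be mistyped, so it can help to assemble it from its pieces. The sketch below is only an illustration: build_meta_url is a hypothetical helper and every value passed to it is a placeholder. A password containing characters such as @ or / would likely need URL encoding, which this sketch does not do.

```shell
# Assemble a TLS Redis metadata URL from its parts.
# rediss:// (double "s") selects TLS; the trailing /N picks the Redis database number.
build_meta_url() {
    user=$1
    password=$2
    host=$3
    port=$4
    db=$5
    printf 'rediss://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$db"
}

# Placeholder values; substitute the details from your own console.
META_URL=$(build_meta_url default your-password redis.example.com 25061 1)
echo "$META_URL"
```

Keeping the URL in a variable also means the same value can be reused for the mount and status commands later in this article.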

If you see output similar to the following, it means that the file system is created successfully.

2021/08/23 16:36:28.450686 juicefs[2869028] <INFO>: Meta address: rediss://default@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:36:28.481251 juicefs[2869028] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly.
2021/08/23 16:36:28.481763 juicefs[2869028] <INFO>: Ping redis: 331.706µs
2021/08/23 16:36:28.482266 juicefs[2869028] <INFO>: Data uses space://juicefs/mystor/
2021/08/23 16:36:28.534677 juicefs[2869028] <INFO>: Volume is formatted as {Name:mystor UUID:6b0452fc-0502-404c-b163-c9ab577ec766 Storage:space Bucket:https://juicefs.sgp1.digitaloceanspaces.com AccessKey:7G7WQBY2QUCBQC5H2DGK SecretKey:removed BlockSize:4096 Compression:none Shards:0 Partitions:0 Capacity:0 Inodes:0 EncryptKey:}

3. Mount a file system

To mount a file system, use the mount subcommand, and use the -d parameter to mount it as a daemon. The following command mounts the newly created file system to the mnt directory under the current directory:

$ sudo juicefs mount -d \
rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1 mnt

The purpose of using sudo to perform the mount operation is to allow juicefs to have the authority to create a cache directory under /var/. Please note that when mounting the file system, you only need to specify the database address and the mount point, not the name of the file system.

If you see output similar to the following, it means that the file system is mounted successfully.

2021/08/23 16:39:14.202151 juicefs[2869081] <INFO>: Meta address: rediss://default@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:39:14.234925 juicefs[2869081] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly.
2021/08/23 16:39:14.235536 juicefs[2869081] <INFO>: Ping redis: 446.247µs
2021/08/23 16:39:14.236231 juicefs[2869081] <INFO>: Data use space://juicefs/mystor/
2021/08/23 16:39:14.236540 juicefs[2869081] <INFO>: Disk cache (/var/jfsCache/6b0452fc-0502-404c-b163-c9ab577ec766/): capacity (1024 MB), free ratio (10%), max pending pages (15)
2021/08/23 16:39:14.738416 juicefs[2869081] <INFO>: OK, mystor is ready at mnt

Use the df command to see the mounting status of the file system:

$ df -Th
Filesystem     Type          Size  Used Avail Use% Mounted on
JuiceFS:mystor fuse.juicefs 1.0P 64K 1.0P 1% /home/herald/mnt
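A script can make the same check by looking for the fuse.juicefs type shown in the df output. The sketch below reads a mounts table in the column layout of /proc/mounts from standard input. The function name and the sample line are illustrative assumptions, not captured output.

```shell
# Succeeds if the given mount point appears with type fuse.juicefs in a
# mounts table read from stdin (columns: device, mount point, type, ...).
is_juicefs_mount() {
    awk -v mp="$1" '$2 == mp && $3 == "fuse.juicefs" { found = 1 } END { exit !found }'
}

# Illustrative line; on a real host, feed /proc/mounts instead:
#   is_juicefs_mount /home/herald/mnt < /proc/mounts
sample='JuiceFS:mystor /home/herald/mnt fuse.juicefs rw,relatime,user_id=0,group_id=0 0 0'

if printf '%s\n' "$sample" | is_juicefs_mount /home/herald/mnt; then
    echo "mounted"
else
    echo "not mounted"
fi
```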

As you can see from the output of the mount command, JuiceFS sets the local cache to 1024 MB by default. A larger cache can give JuiceFS better performance. You can set the cache size (in MiB) with the --cache-size option when mounting a file system. For example, to set a local cache of roughly 20GB:

$ sudo juicefs mount -d --cache-size 20000 \
rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1 mnt
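Because --cache-size is given in MiB, a value such as 20000 is slightly less than 20 GiB. If an exact binary size is wanted, the conversion is a multiplication by 1024. A small sketch, with gib_to_mib being a helper name introduced here:

```shell
# Convert a whole number of GiB to MiB for use with --cache-size.
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

CACHE_MIB=$(gib_to_mib 20)
echo "--cache-size $CACHE_MIB"
```

The result for 20 GiB, 20480, is the value used in the /etc/fstab entry later in this article.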

After the file system is mounted, you can store data in the ~/mnt directory just like using a local hard disk.

4. File system status

Use the status subcommand to view the basic information and connection status of a file system. You only need to specify the database URL.

$ juicefs status rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:48:48.567046 juicefs[2869156] <INFO>: Meta address: rediss://default@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1
2021/08/23 16:48:48.597513 juicefs[2869156] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly.
2021/08/23 16:48:48.598193 juicefs[2869156] <INFO>: Ping redis: 491.003µs
{
"Setting": {
"Name": "mystor",
"UUID": "6b0452fc-0502-404c-b163-c9ab577ec766",
"Storage": "space",
"Bucket": "https://juicefs.sgp1.digitaloceanspaces.com",
"AccessKey": "7G7WQBY2QUCBQC5H2DGK",
"SecretKey": "removed",
"BlockSize": 4096,
"Compression": "none",
"Shards": 0,
"Partitions": 0,
"Capacity": 0,
"Inodes": 0
},
"Sessions": [
{
"Sid": 1,
"Heartbeat": "2021-08-23T16:46:14+08:00",
"Version": "0.16.2 (2021-08-25T04:01:15Z 29d6fee)",
"Hostname": "ubuntu-s-1vcpu-1gb-sgp1-01",
"MountPoint": "/home/herald/mnt",
"ProcessID": 2869091
},
{
"Sid": 2,
"Heartbeat": "2021-08-23T16:47:59+08:00",
"Version": "0.16.2 (2021-08-25T04:01:15Z 29d6fee)",
"Hostname": "ubuntu-s-1vcpu-1gb-sgp1-01",
"MountPoint": "/home/herald/mnt",
"ProcessID": 2869146
}
]
}
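Each entry under Sessions is one active mount, so counting the Sid fields gives the number of clients currently attached. The sketch below does this with grep on a trimmed sample in the same shape as the output above. count_sessions is a hypothetical helper name. For real use a JSON-aware tool would be more robust.

```shell
# Count session entries by counting lines that contain a "Sid" key.
count_sessions() {
    grep -c '"Sid"'
}

# Trimmed sample in the shape of the status output above. On a real host:
#   juicefs status "$META_URL" 2>&1 | count_sessions
sample=$(cat <<'EOF'
{
"Sessions": [
{
"Sid": 1,
"Hostname": "ubuntu-s-1vcpu-1gb-sgp1-01"
},
{
"Sid": 2,
"Hostname": "ubuntu-s-1vcpu-1gb-sgp1-01"
}
]
}
EOF
)

SESSION_COUNT=$(printf '%s\n' "$sample" | count_sessions)
echo "active sessions: $SESSION_COUNT"
```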

5. Unmount a file system

Use the umount subcommand to unmount a file system, for example:

$ sudo juicefs umount ~/mnt

Note: Forcibly unmounting a file system that is in use may cause data corruption or loss, so proceed with caution.

6. Auto-mount on boot

If you don’t want to re-mount JuiceFS storage manually every time you reboot your system, you can set up an automatic mount.

First, you need to rename the juicefs client to mount.juicefs and copy it to the /sbin/ directory.

$ sudo cp /usr/local/bin/juicefs /sbin/mount.juicefs

Edit the /etc/fstab configuration file and add a new record:

rediss://default:your-password@private-db-redis-sgp1-03138-do-user-2500071-0.b.db.ondigitalocean.com:25061/1    /home/herald/mnt       juicefs     _netdev,cache-size=20480     0  0

In the mount option, cache-size=20480 means to allocate 20GiB of local disk space as the local cache of JuiceFS. Please decide the allocated cache size according to the actual hardware. You can adjust the FUSE mount options in the above configuration according to your needs.
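The fields of that record can be generated rather than typed by hand, which makes it easier to keep the URL, mount point, and cache size consistent. This is a sketch under assumptions: fstab_entry is a hypothetical helper and the URL is a placeholder.

```shell
# Print one /etc/fstab record for a JuiceFS mount.
# $1 = META-URL, $2 = mount point, $3 = cache size in MiB.
fstab_entry() {
    meta_url=$1
    mount_point=$2
    cache_mib=$3
    printf '%s %s juicefs _netdev,cache-size=%s 0 0\n' "$meta_url" "$mount_point" "$cache_mib"
}

# Placeholder URL; review the printed line before adding it to /etc/fstab.
fstab_entry "rediss://default:your-password@redis.example.com:25061/1" /home/herald/mnt 20480
```

After editing /etc/fstab, running sudo mount -a is a common way to check the new record without rebooting.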

7. Multi-host shared

The JuiceFS file system can be mounted by multiple cloud servers at the same time, and there is no requirement on where those servers are located. This makes it easy to share data in real time between servers on the same platform, across cloud platforms, and between public and private clouds.

In addition, a shared JuiceFS mount provides a strong data consistency guarantee. When multiple servers mount the same file system, writes confirmed on the file system are visible in real time on all hosts.

To use a shared mount, the database and the object storage service that make up the file system must be accessible from every host that mounts it. In the demonstration environment of this article, the Spaces object storage is open to the entire Internet, and it can be read and written through the API as long as the correct access key is used. For the Redis database cluster managed by DigitalOcean, however, you need to configure the access policy appropriately so that hosts outside the platform have access.

To mount the same file system on multiple hosts, first create the file system on any one host, then install the JuiceFS client on every host and mount it with the mount command using the same database address. Note that the file system only needs to be created once. Do not repeat the creation step on the other hosts.
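The procedure can be sketched as a dry run that prints one mount command per host. Everything named here is hypothetical: the host names, the use of ssh to reach them, the placeholder URL, and the print_mount_plan helper. The commands are printed rather than executed so they can be reviewed first.

```shell
# Hypothetical host names; replace with your own servers.
HOSTS="droplet-a droplet-b droplet-c"
# Placeholder metadata URL shared by every host.
META_URL="rediss://default:your-password@redis.example.com:25061/1"

# Print the mount command for each host. The format subcommand is absent
# on purpose: the volume is created once, then only mounted elsewhere.
print_mount_plan() {
    for host in $HOSTS; do
        echo "ssh $host sudo juicefs mount -d '$META_URL' mnt"
    done
}

print_mount_plan
```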

Summary

This article introduces the basics of installing and using JuiceFS on DigitalOcean, using Spaces object storage and the platform-managed Redis database cluster to create and mount a file system.

If you are interested, you can also try creating file systems with object storage and cloud databases on different platforms. In addition, if you are worried about the reliability of Redis, you can try databases such as MySQL, TiKV, and PostgreSQL. Different databases will give you a noticeably different experience.

Open Source Contribution Guide

JuiceFS is an open source project, and it cannot grow without everyone’s support. An article, a page of documentation, an idea, a suggestion, a bug report, or a bug fix: each of these is a driving force behind the development of the project.

Things you can do for the community:

We invite everyone who loves open source to join our community and let’s make JuiceFS better together!

JuiceFS

JuiceFS (https://github.com/juicedata/juicefs) is a distributed POSIX file system built on top of Redis and S3.