Getting started with JuiceFS using TiKV database

JuiceFS
Aug 19, 2021

As a cloud-native distributed storage system, JuiceFS was designed with a pluggable architecture from the very beginning, so that new technologies can be continuously integrated into the JuiceFS ecosystem. Users can flexibly choose its two core components, the data storage engine and the metadata engine, according to their needs.

JuiceFS stores data mainly in object storage and supports almost all public and private cloud object storage services. It also supports KV storage, WebDAV, and local disks. The metadata engine supports databases such as Redis, MySQL, PostgreSQL, and SQLite.

The newly released JuiceFS v0.16 officially supports the TiKV key-value database as a metadata engine, which further meets the need for elastic scaling in high-performance, large-scale data storage.

This article shows how to use TiKV as the metadata engine for JuiceFS.

TiKV

TiKV is a distributed transactional key-value database with high scalability, low latency, and ease of use. It performs well and can handle data sets at the scale of petabytes and trillions of rows.

By design, TiKV scales horizontally, provides a distributed transaction interface with ACID guarantees, and uses the Raft protocol to keep multiple replicas consistent and highly available.

TiKV was developed by PingCAP and is a project hosted by the Cloud Native Computing Foundation (CNCF).

Install TiKV

PingCAP provides the TiUP package manager, which makes it easy to install TiKV and related products on Linux or macOS.

1. Install TiUP

$ curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

This command automatically detects the current system environment, downloads and installs the appropriate version, and adds the TiUP program path to the executable path of the terminal.

To make the settings take effect, open a new terminal, or reload the shell profile manually. For example, with bash:

$ source ~/.bash_profile

Now run the command tiup -v. If you see version information similar to the following, the installation was successful.

$ tiup -v 
1.5.4 tiup
Go Version: go1.16.6
Git Ref: v1.5.4
GitHash: b629670276269cd1518eb28f362a5180135cc985
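If you script the installation, it can help to check that tiup is actually on the PATH before going further. The following is a minimal sketch; require_cmd is a hypothetical helper name of my own, not something TiUP provides.

```shell
# Hypothetical helper (not part of TiUP): succeed only if a command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found in PATH; open a new terminal or reload your shell profile" >&2
    return 1
}

# Example usage (runs tiup -v only when tiup can be found):
#   require_cmd tiup && tiup -v
```

The same helper can be reused later for juicefs.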

2. Deploy TiKV cluster

This article uses the playground component provided by TiUP to deploy a minimal TiKV cluster in the local environment for testing purposes.

$ tiup playground --mode tikv-slim

After the deployment is successful, the terminal will display a message similar to the following:

PD client endpoints: [127.0.0.1:2379]
To view the Prometheus: http://127.0.0.1:9090
To view the Grafana: http://127.0.0.1:3000

Here, 127.0.0.1:2379 is the address of the Placement Driver (PD), the management node of the TiKV cluster. JuiceFS interacts with TiKV through this address. The other two addresses belong to the Prometheus and Grafana services, which are used for monitoring and data visualization of the TiKV cluster.
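Since the PD address is needed in every later JuiceFS command, you may want to pick it out of the playground output instead of copying it by hand. The sketch below assumes the "PD client endpoints: [...]" line format shown above; pd_endpoint is a hypothetical helper name.

```shell
# Hypothetical helper: read playground output on stdin and print the first PD endpoint.
# Assumes a line of the form "PD client endpoints: [host:port ...]".
pd_endpoint() {
    sed -n 's/^.*PD client endpoints: \[\([^] ,]*\).*$/\1/p' | head -n 1
}

# Example usage, assuming the output was saved to a file named playground.log:
#   PD_ADDR=$(pd_endpoint < playground.log)
```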

Note: The playground component of TiUP is mainly used to quickly build a minimal test cluster of TiDB and TiKV in the local environment. For production deployments, please refer to the official TiKV documentation.

Install JuiceFS

JuiceFS supports Linux, Windows, and macOS. You only need to download the client program for your system and place it in the executable path. For example, I am currently using a Linux distribution, so I can install the latest version of the client by executing the following commands in sequence.

Fetch the latest release tag from GitHub and store it in a temporary environment variable:

$ JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')

Download the latest version of the JuiceFS client package for Linux on amd64:

$ wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"
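The two commands above can be broken into small pieces to see what each part does. This is only a sketch: the function names are hypothetical, and the arm64 mapping assumes that a release file with that suffix exists for the version you want.

```shell
# Hypothetical helpers for illustration; they are not part of the JuiceFS tooling.

# Read a GitHub "latest release" JSON document on stdin and print the tag
# without the leading "v" (same pipeline as above; tr removes every "v").
parse_latest_tag() {
    grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v'
}

# Map `uname -m` output to the architecture suffix used in release file names.
# Only these two mappings are assumed; anything else is reported as an error.
release_arch() {
    case "$1" in
        x86_64|amd64) echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Build the Linux download URL for a given version and architecture.
jfs_download_url() {
    echo "https://github.com/juicedata/juicefs/releases/download/v$1/juicefs-$1-linux-$2.tar.gz"
}

# Example usage:
#   wget "$(jfs_download_url "$JFS_LATEST_TAG" "$(release_arch "$(uname -m)")")"
```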

Extract the package:

$ mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice

Install JuiceFS client to /usr/local/bin:

$ sudo install juice/juicefs /usr/local/bin

Run the juicefs command. If it returns the help information, the client was installed successfully.

$ juicefs

NAME:
   juicefs - A POSIX file system built on Redis and object storage.

USAGE:
   juicefs [global options] command [command options] [arguments...]

VERSION:
   0.16.1 (2021-08-16 2edcfc0)

COMMANDS:
   format   format a volume
   mount    mount a volume
   umount   unmount a volume
   gateway  S3-compatible gateway
   sync     sync between two storage
   rmr      remove directories recursively
   info     show internal information for paths or inodes
   bench    run benchmark to read/write/stat big/small files
   gc       collect any leaked objects
   fsck     Check consistency of file system
   profile  analyze access log
   stats    show runtime stats
   status   show status of JuiceFS
   warmup   build cache for target directories/files
   dump     dump metadata into a JSON file
   load     load metadata from a previously dumped JSON file
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --verbose, --debug, -v  enable debug log (default: false)
   --quiet, -q             only warning and errors (default: false)
   --trace                 enable trace log (default: false)
   --no-agent              Disable pprof (:6060) and gops (:6070) agent (default: false)
   --help, -h              show help (default: false)
   --version, -V           print only the version (default: false)

COPYRIGHT:
   AGPLv3

You can also visit the JuiceFS GitHub Releases page to select other versions for manual installation.

Usage

Following the JuiceFS Quick Start Guide, I set up a MinIO object storage instance locally. Its access address is http://127.0.0.1:9000, and both the Access Key ID and the Access Key Secret are minioadmin.

1. Create a file system

The following command uses the format subcommand provided by the JuiceFS client to create a file system named mystor. The TiKV database address follows the format described in the official documentation, that is, it uses the PD address of the TiKV cluster:

$ juicefs format \
    --storage minio \
    --bucket http://127.0.0.1:9000/mystor \
    --access-key minioadmin \
    --secret-key minioadmin \
    tikv://127.0.0.1:2379/mystor \
    mystor

Parameter Description:

  • --storage: Specify the data storage engine, here minio.
  • --bucket: Specify the bucket access URL. Here it is a bucket named mystor that I created in advance on MinIO.
  • --access-key and --secret-key: Specify the access key and secret key for the object storage service API.
  • tikv://127.0.0.1:2379/mystor: Specify the metadata engine address. When TiKV stores the metadata, this is the PD address of the cluster. When multiple file systems or applications share the same TiKV cluster, it is recommended to add a prefix; here the prefix mystor is appended to the PD address.
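To make the structure of the metadata address explicit, here is a small sketch that assembles it from a prefix and a list of PD addresses. The helper name tikv_meta_url is hypothetical, and joining several PD addresses with commas is my reading of the address format in the official documentation, so treat it as an assumption.

```shell
# Hypothetical helper: build a TiKV metadata URL from a prefix and one or more PD addresses.
# Usage: tikv_meta_url PREFIX PD_ADDR [PD_ADDR...]
tikv_meta_url() {
    prefix=$1
    shift
    addrs=""
    for a in "$@"; do
        # Join the addresses with commas (assumed multi-PD form).
        addrs="${addrs:+$addrs,}$a"
    done
    echo "tikv://${addrs}/${prefix}"
}

# Example usage: prints tikv://127.0.0.1:2379/mystor
#   tikv_meta_url mystor 127.0.0.1:2379
```

Using a different prefix for each file system keeps their keys apart inside a shared TiKV cluster.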

If you see output similar to the following, it means that the file system was created successfully:

2021/08/12 23:28:36.932241 juicefs[101222] <INFO>: Meta address: tikv://127.0.0.1:2379/mystor
[2021/08/12 23:28:36.932 +08:00] [INFO] [client.go:214] ["[pd] create pd client with endpoints"] [pd-address="[127.0.0.1:2379]"]
[2021/08/12 23:28:36.935 +08:00] [INFO] [base_client.go:346] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2379] [old-leader=]
[2021/08/12 23:28:36.935 +08:00] [INFO] [base_client.go:126] ["[pd] init cluster id"] [cluster-id=6995548759432331426]
[2021/08/12 23:28:36.935 +08:00] [INFO] [client.go:238] ["[pd] create tso dispatcher"] [dc-location=global]
2021/08/12 23:28:36.936892 juicefs[101222] <INFO>: Data uses minio://127.0.0.1:9000/mystor/mystor/
2021/08/12 23:28:36.976722 juicefs[101222] <INFO>: Volume is formatted as {Name:mystor UUID:0c9594a8-fe2c-463c-a4b6-eb815f38c843 Storage:minio Bucket:http://127.0.0.1:9000/mystor AccessKey:minioadmin SecretKey:removed BlockSize:4096 Compression:none Shards:0 Partitions:0 Capacity:0 Inodes:0 EncryptKey:}

2. Mount the file system

Use the mount subcommand to mount the file system to the jfs directory under the current user's home directory:

$ sudo juicefs mount -d tikv://127.0.0.1:2379/mystor ~/jfs

The sudo command is used here to mount the file system as the superuser, so that JuiceFS can create and use the /var/jfsCache directory to cache data.

If you see output similar to the following, it means that the file system is mounted successfully:

2021/08/12 23:34:44.288136 juicefs[101873] <INFO>: Meta address: tikv://127.0.0.1:2379/mystor
[2021/08/12 23:34:44.288 +08:00] [INFO] [client.go:214] ["[pd] create pd client with endpoints"] [pd-address="[127.0.0.1:2379]"]
[2021/08/12 23:34:44.291 +08:00] [INFO] [base_client.go:346] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2379] [old-leader=]
[2021/08/12 23:34:44.291 +08:00] [INFO] [base_client.go:126] ["[pd] init cluster id"] [cluster-id=6995548759432331426]
[2021/08/12 23:34:44.291 +08:00] [INFO] [client.go:238] ["[pd] create tso dispatcher"] [dc-location=global]
2021/08/12 23:34:44.296270 juicefs[101873] <INFO>: Data use minio://127.0.0.1:9000/mystor/mystor/
2021/08/12 23:34:44.296768 juicefs[101873] <INFO>: Disk cache (/var/jfsCache/0c9594a8-fe2c-463c-a4b6-eb815f38c843/): capacity (1024 MB), free ratio (10%), max pending pages (15)
2021/08/12 23:34:44.800551 juicefs[101873] <INFO>: OK, mystor is ready at /home/herald/jfs

Use the df command to see the mounting status of the file system:

$ df -Th
Filesystem     Type          Size  Used Avail Use% Mounted on
JuiceFS:mystor fuse.juicefs  1.0P   64K  1.0P   1% /home/herald/jfs

After mounting, you can store data in the ~/jfs directory just as you would on a local hard disk.
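A quick way to confirm that the mount point really accepts reads and writes is to write a small file and read it back. The following is a minimal sketch; smoke_test is a hypothetical helper, and it works on any writable directory, not only on JuiceFS.

```shell
# Hypothetical helper: write a small file under a directory, read it back, and clean up.
# Succeeds only if the content read back matches what was written.
smoke_test() {
    dir=$1
    file="$dir/.jfs-smoke-test.$$"
    expected="hello from juicefs"
    echo "$expected" > "$file" || return 1
    actual=$(cat "$file") || return 1
    rm -f "$file"
    [ "$actual" = "$expected" ]
}

# Example usage:
#   smoke_test ~/jfs && echo "mount point is readable and writable"
```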

3. View file system information

The status subcommand of the JuiceFS client shows the basic information and connection status of a file system.

$ juicefs status tikv://127.0.0.1:2379/mystor
{
  "Setting": {
    "Name": "mystor",
    "UUID": "9f50f373-a7ec-4d5b-b790-3defbf6d0509",
    "Storage": "minio",
    "Bucket": "http://127.0.0.1:9000/mystor",
    "AccessKey": "minioadmin",
    "SecretKey": "removed",
    "BlockSize": 4096,
    "Compression": "none",
    "Shards": 0,
    "Partitions": 0,
    "Capacity": 0,
    "Inodes": 0
  },
  "Sessions": [
    {
      "Sid": 2,
      "Heartbeat": "2021-08-13T10:43:35+08:00",
      "Version": "0.16-dev (2021-08-12 a871c3d)",
      "Hostname": "herald-manjaro",
      "MountPoint": "/home/herald/jfs",
      "ProcessID": 6309
    }
  ]
}

The output shows the data storage engine used by the file system and the status of the hosts that currently mount it.
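If you only need one value from this output in a script, a line-based filter in the same spirit as the grep and cut pipeline used during installation is often enough. The sketch below is not a JSON parser: json_string_field is a hypothetical helper that assumes one "key": "value" pair per line, as in the output above, and that the JSON arrives on standard output.

```shell
# Hypothetical helper: print the value of the first "KEY": "VALUE" string pair found on stdin.
# Line-based text matching only; numeric values and nested structures are ignored.
json_string_field() {
    sed -n 's/^.*"'"$1"'": *"\([^"]*\)".*$/\1/p' | head -n 1
}

# Example usage:
#   juicefs status tikv://127.0.0.1:2379/mystor | json_string_field MountPoint
```

For anything more involved, a real JSON tool is the safer choice.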

In addition, with v0.16 and above you can view the detailed configuration of the file system by reading the .config virtual file in the root directory of the mount point:

$ sudo cat ~/jfs/.config
{
  "Meta": {
    "Strict": true,
    "Retries": 10,
    "CaseInsensi": false,
    "ReadOnly": false,
    "OpenCache": 0,
    "MountPoint": "jfs",
    "Subdir": ""
  },
  "Format": {
    "Name": "myabc",
    "UUID": "e9d8373c-7ced-49d9-a033-75f6abb44854",
    "Storage": "minio",
    "Bucket": "http://127.0.0.1:9000/mystor",
    "AccessKey": "minioadmin",
    "SecretKey": "removed",
    "BlockSize": 4096,
    "Compression": "none",
    "Shards": 0,
    "Partitions": 0,
    "Capacity": 0,
    "Inodes": 0
  },
  "Chunk": {
    "CacheDir": "/var/jfsCache/e9d8373c-7ced-49d9-a033-75f6abb44854",
    "CacheMode": 384,
    "CacheSize": 1024,
    "FreeSpace": 0.1,
    "AutoCreate": true,
    "Compress": "none",
    "MaxUpload": 20,
    "Writeback": false,
    "Partitions": 0,
    "BlockSize": 4194304,
    "GetTimeout": 60000000000,
    "PutTimeout": 60000000000,
    "CacheFullBlock": true,
    "BufferSize": 314572800,
    "Readahead": 0,
    "Prefetch": 1
  },
  "Version": "0.16.1 (2021-08-16 2edcfc0)",
  "Mountpoint": "jfs"
}

Note: This article uses a local demonstration environment. If you need to mount the same JuiceFS file system on multiple hosts, make sure that the object storage and the TiKV cluster are accessible from all of them.

4. Unmount the file system

You can use the umount subcommand to unmount the file system, for example:

$ sudo juicefs umount ~/jfs

Warning: Forcibly unmounting a file system that is in use may cause data corruption or loss. Please proceed with caution.
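In a script, it can be useful to confirm that the directory is actually a mount point before calling umount. Here is a minimal sketch for Linux; is_mounted is a hypothetical helper that looks the directory up in /proc/mounts (or in another file passed as the second argument), and it does not handle paths containing spaces, which /proc/mounts stores in escaped form.

```shell
# Hypothetical helper: succeed if DIR appears as a mount point in a mounts table.
# Usage: is_mounted DIR [MOUNTS_FILE]; MOUNTS_FILE defaults to /proc/mounts (Linux).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example usage (unmounts only when the directory is mounted):
#   is_mounted "$HOME/jfs" && sudo juicefs umount "$HOME/jfs"
```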

5. Mount at boot

If you don't want to remount JuiceFS manually after every reboot, you can set up automatic mounting.

First, copy the juicefs client to the /sbin/ directory under the name mount.juicefs:

$ sudo cp /usr/local/bin/juicefs /sbin/mount.juicefs

Edit the /etc/fstab configuration file and add a new record:

tikv://127.0.0.1:2379/mystor    /home/herald/jfs       juicefs     _netdev,cache-size=20480     0  0

In the mount options, cache-size=20480 allocates 20 GiB (20480 MiB) of local disk space as the JuiceFS cache. Please choose the cache size according to your actual hardware configuration. Generally speaking, allocating more cache space to JuiceFS gives better performance.
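The fstab record and the cache size arithmetic can be sketched as a small function. jfs_fstab_entry is a hypothetical helper; it only prints the line and does not modify /etc/fstab, and it assumes the cache size is given in whole GiB and converted to MiB for the cache-size option.

```shell
# Hypothetical helper: print an /etc/fstab line for a JuiceFS volume.
# Usage: jfs_fstab_entry META_URL MOUNT_POINT CACHE_GIB
jfs_fstab_entry() {
    cache_mib=$(( $3 * 1024 ))   # convert GiB to MiB for cache-size
    printf '%s    %s    juicefs    _netdev,cache-size=%s    0  0\n' "$1" "$2" "$cache_mib"
}

# Example usage: prints a record equivalent to the one shown above
#   jfs_fstab_entry tikv://127.0.0.1:2379/mystor /home/herald/jfs 20
```

Review the printed line before appending it to /etc/fstab, since a malformed record can interfere with booting.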

You can adjust the FUSE mount options in the above configuration according to your needs.

Summary

For JuiceFS, support for TiKV is a milestone. It addresses the difficulty of scaling horizontally when Redis is used as the metadata engine, makes up for the performance shortcomings of SQL databases such as MySQL and PostgreSQL, and gives users a new option when choosing a metadata engine.

Open Source Contribution Guide

JuiceFS is an open source project released under the AGPLv3, and its development depends on everyone's support. An article, a page of documentation, an idea, a suggestion, a report, or a bug fix: every contribution, big or small, helps drive an open source project forward.

Things you can do for the community:

We sincerely invite everyone who loves open source to join our community. Let's make JuiceFS better together!

JuiceFS

JuiceFS (https://github.com/juicedata/juicefs) is a distributed POSIX file system built on top of Redis and S3.