Quobyte Introduces Fast File Metadata Queries

Quobyte
May 21, 2024

At Quobyte, we focus on making it easy to run very large storage clusters with high performance. When you have hundreds of petabytes and billions of files, understanding your data is crucial. That’s why we decided to complement Quobyte’s policy engine and analytics functionality with a distributed, high-performance File Query Engine.

Users and administrators can use the query engine to query file system metadata for a range of use cases:

  • AI/ML training can take advantage of the ability to query user-defined metadata (extended attributes and S3 custom metadata). Files can be labeled with the data directly instead of relying on tiny, hard-to-manage “metadata files.”
  • Admins can quickly answer questions like “Which cold files consume most of the space?” or “Where are all files owned by user Joe?”, and find the targets for tasks like “delete all files on scratch that are older than 30 days.”
  • Queries can replace slow file system tree walks (“find”), which can take hours or days on large volumes.

How the Quobyte File Query Engine Works

Quobyte stores metadata in a distributed and replicated key-value store. The File Query Engine takes advantage of this fast metadata storage. Unlike most other products, Quobyte’s File Query Engine doesn’t need an additional database layer. This makes it faster, means it never operates on outdated data, and saves significant resources: there is no copy of the file metadata to keep in sync and no separate database system to run.

The other advantage is that the File Query Engine leverages the distributed nature of the metadata store in Quobyte. Queries are executed in parallel across all metadata servers for fast scans across the entire cluster or selected volumes.
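To illustrate the idea of running one query in parallel across metadata servers, here is a minimal scatter-gather sketch in Python. It is a conceptual illustration only, not Quobyte’s implementation: the shards are plain in-memory lists, and scan_shard and parallel_query are hypothetical names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_shard(shard, predicate):
    """Scan one shard and return the records that match the predicate."""
    return [record for record in shard if predicate(record)]


def parallel_query(shards, predicate):
    """Run the same predicate against every shard concurrently and merge.

    Each shard is a hypothetical in-memory stand-in for the metadata held
    by one server. Results are merged in shard order.
    """
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: scan_shard(shard, predicate), shards)
        return [record for partial in partials for record in partial]
```

Because every shard is scanned independently, the wall-clock time of a full scan in this sketch is bounded by the slowest shard rather than by the sum of all shards.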

The File Query Engine streams results back to the application (qmgmt or via the API) and supports very large result sets with billions of files. It automatically adjusts to the speed of the consumer.
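One common way to make a producer adjust to the speed of its consumer is a bounded buffer. The sketch below shows that general technique in Python; it is an assumption about how such streaming can work in principle, not a description of Quobyte’s internals, and stream_results is a hypothetical name.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream_results(produce, buffer_size=4):
    """Yield items from produce() through a bounded buffer.

    The producer thread blocks on put() whenever the buffer is full, so it
    can never run more than buffer_size items ahead of the consumer.
    """
    buffer = queue.Queue(maxsize=buffer_size)

    def worker():
        try:
            for item in produce():
                buffer.put(item)  # blocks while the consumer is behind
        finally:
            buffer.put(_DONE)  # always signal the end, even on errors

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item
```

With this design, memory use stays bounded regardless of the size of the result set, because only buffer_size items are ever held in flight.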

The File Query Engine is part of Quobyte release 3.22 and is available automatically, without any configuration. As described above, it does not require any additional resources except while it is processing queries.

How to use the Quobyte File Query Engine

The most convenient option is to run file metadata queries with the command line tool “qmgmt”. Output can be generated in CSV format or JSON. The CSV format is ideal when piping the results to another application for processing.

The following query will find all jpeg or jpg files that have been modified in the last 10 minutes:
qmgmt query files --list-columns=volume_uuid_path,size "name~=.*(jpeg|jpg) AND mtime_age<10min"
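As a sketch of what processing piped CSV output could look like, the following Python function counts files and sums their sizes. It assumes the CSV output starts with a header row whose column names match the requested columns (here "size"); that layout is an assumption for illustration, so check the actual output of your qmgmt version. The function name summarize is hypothetical.

```python
import csv


def summarize(csv_stream):
    """Return (file_count, total_bytes) for CSV rows read from a text stream.

    Assumes a header row that includes a "size" column holding the file
    size in bytes as an integer; this layout is an assumption.
    """
    reader = csv.DictReader(csv_stream)
    count = 0
    total = 0
    for row in reader:
        count += 1
        total += int(row["size"])
    return count, total
```

In a script that receives the query output on standard input, this would be called as summarize(sys.stdin). Rows are processed one at a time, so large result sets do not need to fit in memory.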

When you have user-defined metadata stored in extended attributes (xattrs), you can query all files with matching xattrs. Quobyte supports dynamically typed xattrs, i.e. the query engine automatically converts them to the correct type. This example returns all files that have a user-defined attribute “origin” set to FR (France) and a width greater than or equal to 1024:
qmgmt query files 'xattr.origin="FR" AND xattr.width>=1024'

Via the Quobyte API, queries can be initiated with queryFiles("query string"), which returns a unique queryId. Call getQueryProgress(queryId) repeatedly to fetch results along with progress information.
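The start-then-poll pattern described here can be sketched in Python as follows. Only the two call names, queryFiles and getQueryProgress, come from the text above. The client object is a hypothetical wrapper, and the "results" and "done" response fields are assumed names rather than the documented response schema.

```python
import time


def fetch_all(client, query, poll_interval=0.5):
    """Start a query and yield results until the query reports completion.

    `client` is a hypothetical wrapper exposing queryFiles() and
    getQueryProgress(). The "results" and "done" response fields are
    assumed names, not the documented response schema.
    """
    query_id = client.queryFiles(query)
    while True:
        progress = client.getQueryProgress(query_id)
        results = progress.get("results", [])
        yield from results
        if progress.get("done"):
            return
        if not results:
            time.sleep(poll_interval)  # nothing new yet; avoid a busy loop
```

Because fetch_all is a generator, the caller consumes results batch by batch and never has to hold the full result set in memory.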

Originally posted on Quobyte’s blog on May 14, 2024.

Quobyte

Quobyte empowers customers by providing real software storage so that they can keep up with the ever-increasing amounts of data in today’s data-driven world.