Monday, November 25, 2024

Posit AI Blog: safetensors 0.1.0

safetensors is a new, simple, fast, and safe file format for storing tensors. The design of the file format and its original implementation are being led
by Hugging Face, and it’s being widely adopted in their popular ‘transformers’ framework. The safetensors R package is a pure-R implementation, allowing you to both read and write safetensors files.

The initial version (0.1.0) of safetensors is now on CRAN.

Motivation

The main motivation for safetensors in the Python community is security. As noted
in the official documentation:

The main rationale for this crate is to remove the need to use pickle on PyTorch which is used by default.

Pickle is considered an unsafe format, as the action of loading a Pickle file can
trigger the execution of arbitrary code. This has never been a concern for torch
for R users, since the Pickle parser that is included in LibTorch only supports a subset
of the Pickle format, which doesn’t include executing code.
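To make the Pickle risk concrete, here is a small, harmless Python sketch using Python's standard `pickle` module (not LibTorch's restricted parser). The `Payload` class is hypothetical and exists only for this illustration. An object's `__reduce__` method tells `pickle` which callable to invoke at load time, so simply loading the bytes runs that callable.

```python
import pickle


class Payload:
    """An object whose unpickling runs code chosen by whoever wrote the file."""

    def __reduce__(self):
        # pickle records "call eval with this argument" instead of the object's
        # state, so eval("6 * 7") runs when the bytes are loaded. A malicious
        # file could name any importable callable here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # the expression is evaluated during loading
print(result)  # 42
```

Nothing in `blob` is "data" in the usual sense. It is an instruction to call a function, which is exactly what a tensor-only format like safetensors avoids.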

However, the file format has additional advantages over other commonly used formats, including:

  • Support for lazy loading: You can choose to read a subset of the tensors stored in the file.

  • Zero copy: Reading the file does not require more memory than the file itself.
    (Technically, the current R implementation does make a single copy, but that can
    be optimized out if we really need it at some point.)

  • Simple: Implementing the file format is simple, and doesn’t require complex dependencies.
    This means that it’s a good format for exchanging tensors between ML frameworks and
    between different programming languages. For instance, you can write a safetensors file
    in R and load it in Python, and vice-versa.
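Lazy loading is cheap because of how the file is laid out. The following assumes the layout given in the safetensors specification: an 8-byte little-endian header size, then a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets`, then the raw bytes. A reader can parse just the header and then seek directly to the one tensor it needs. The Python sketch below illustrates this with only the standard library. `write_example` and `read_tensor` are hypothetical helpers, not functions from the safetensors R or Python packages, and only 32-bit floats (`"F32"`) are handled.

```python
import json
import os
import struct
import sys
import tempfile
from array import array


def write_example(path):
    """Write two small float32 tensors, "a" and "b", in safetensors layout."""
    a = array("f", [1.0, 2.0, 3.0])
    b = array("f", [10.0, 20.0])
    if sys.byteorder == "big":  # safetensors stores data little-endian
        a.byteswap()
        b.byteswap()
    raw_a, raw_b = a.tobytes(), b.tobytes()
    header = json.dumps({
        "a": {"dtype": "F32", "shape": [3], "data_offsets": [0, len(raw_a)]},
        "b": {"dtype": "F32", "shape": [2],
              "data_offsets": [len(raw_a), len(raw_a) + len(raw_b)]},
    }).encode("utf-8")
    with open(path, "wb") as f:
        # 8-byte little-endian header size, JSON header, then the tensor bytes
        f.write(struct.pack("<Q", len(header)) + header + raw_a + raw_b)


def read_tensor(path, name):
    """Read one F32 tensor by name, touching only the header and its own bytes."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))  # header size in bytes
        header = json.loads(f.read(n))         # parse the JSON header only
        info = header[name]
        if info["dtype"] != "F32":
            raise ValueError("this sketch only handles F32 tensors")
        begin, end = info["data_offsets"]      # relative to the start of the data
        f.seek(8 + n + begin)                  # skip straight to this tensor
        values = array("f")
        values.frombytes(f.read(end - begin))
    if sys.byteorder == "big":
        values.byteswap()
    return info["shape"], values.tolist()


path = os.path.join(tempfile.mkdtemp(), "example.safetensors")
write_example(path)
print(read_tensor(path, "b"))  # ([2], [10.0, 20.0])
```

The bytes of tensor `"a"` are never read when only `"b"` is requested. The cost of loading one tensor depends on the header size and that tensor's size, not on the whole file.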

There are further advantages compared to other file formats common in this space, and
the safetensors project documentation includes a comparison table.

Format

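The safetensors format is described in the figure below. It’s basically a JSON header
containing metadata for each tensor (its dtype, shape, and byte offsets), preceded by an
8-byte integer giving the header’s size and followed by the raw tensor buffers.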

Diagram describing the safetensors file format.
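The following is a rough, language-neutral illustration of that layout: a minimal Python writer and reader that use only the standard library. The function names `save_tensors` and `load_tensors` are hypothetical and are not the API of the safetensors R package (or the official Python one). The sketch only supports 32-bit float tensors and stores data little-endian. It also skips the optional trailing whitespace padding that the specification allows after the JSON header.

```python
import json
import os
import struct
import sys
import tempfile
from array import array


def save_tensors(path, tensors, metadata=None):
    """Write {name: (shape, list_of_floats)} as F32 tensors in safetensors layout."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        buf = array("f", values)
        if sys.byteorder == "big":  # tensor data is stored little-endian
            buf.byteswap()
        raw = buf.tobytes()
        header[name] = {"dtype": "F32", "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    if metadata is not None:
        header["__metadata__"] = metadata  # optional string-to-string map
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header size
        f.write(header_bytes)                           # JSON header
        f.write(b"".join(chunks))                       # raw buffers, back to back


def load_tensors(path):
    """Read every F32 tensor back as {name: (shape, list_of_floats)}."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n).decode("utf-8"))
        data = f.read()  # offsets in the header are relative to this buffer
    header.pop("__metadata__", None)
    out = {}
    for name, info in header.items():
        if info["dtype"] != "F32":
            raise ValueError("this sketch only handles F32 tensors")
        begin, end = info["data_offsets"]
        buf = array("f")
        buf.frombytes(data[begin:end])
        if sys.byteorder == "big":
            buf.byteswap()
        out[name] = (info["shape"], buf.tolist())
    return out


path = os.path.join(tempfile.mkdtemp(), "example.safetensors")
save_tensors(path, {"x": ([2, 2], [1.0, 2.0, 3.0, 4.0]),
                    "y": ([3], [0.5, 0.25, 0.125])},
             metadata={"note": "written by a sketch"})
print(load_tensors(path))
# {'x': ([2, 2], [1.0, 2.0, 3.0, 4.0]), 'y': ([3], [0.5, 0.25, 0.125])}
```

Reading the file needs nothing beyond decoding one integer and one JSON object. That is why the format is easy to implement in R, Python, or any other language.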

Basic usage

safetensors can be installed from CRAN using:

install.packages("safetensors")


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

Citation

For attribution, please cite this work as

Falbel (2023, June 15). Posit AI Blog: safetensors 0.1.0. Retrieved from https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/

BibTeX citation

@misc{safetensors,
  author = {Falbel, Daniel},
  title = {Posit AI Blog: safetensors 0.1.0},
  url = {https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/},
  year = {2023}
}
