How does Google Zanzibar language work?

Jul 22, 2024

There are a lot of posts and articles, which talk about the Google authorization system, Zanzibar. However, I couldn’t find any detail explanation of how the permissions are resolved using the Zanibar domain-specific language; paper is not very clear about it. The following is based on the paper, Zanzibar: Google’s Wide-Area Data Replication System.

ReBAC

Zanzibar is an implementation of Relationship-based access control (ReBAC). Basically, you represent permissions in a form of graph. Vertices are users/groups and hierarchical objects. Labeled edges represent relations between vertices. For example, alice and tom are users, and a group, friends, is used by Alice to give access to her books.

(alice:User)-[:member]->(friends:Group)

If Alice has a collection of books, books, with two book in it, we could represent it as two vertices and an edge,

(book1:Object)-[:member]->(books:Object)
(book2:Object)-[:member]->(books:Object)

If Alice has a collection of books, books, with two books in it, we could represent it as two vertices and an edge,

Now, we can give alice permission to write book1 with a write edge,

(alice:User)-[:write]->(book1:Object)

If we want to check whether Alice can write to book1, we look for a path between alice User and book1 with the write edge.

(alice:User)->...-[:write]->...->(book1:Object)

Alice can grant read permissions to all books in books to anyone in the group, friends, by adding a read edge between books and friends.

(friends:Group)-[:read]->(books:Object)

If, tom is a member of friends group, i.e. (tom:User)-[:member]->(friends:Group), then tom can read book1 because there is a path between tom and book1 via the group, fiends, with an edge read.

(tom:User) -[:member]-> (friends:Group) -[:read]-> (books:Object) -[:member]-> (book1:Object)

Using a graph and path search is one way to express ReBAC. A brute force implementation of ReBAC is to use a graph, where all the entities in your systems have a corresponding vertex, and edges define relations between the entities; one type of such relations are permissions.

The problem with this approach is that graphs are very general. Representing an authorization model in a form of graph can be complicated; one would have to define the type of vertices and edges as well as queries, which would capture a desired permission model for your application.

Zanzibar language

Zanzibar solves this problem with a domain-specific language and resolving applies rules of the permission model at the time a permission is checked.

Let’s start with the language definition as described in the paper.

<tuple> ::= <object>'#'<relation>‘@'<user>
<object> ::= <namespace>':'<object id>
<user> ::= <user id> | <userset>
<userset> ::= <object>'#'<relation>

where <namespace> and <relation> are predefined in client configurations, <object id> and <user id> are strings. The primary keys required to identify a relation tuple are <namespace>, <object id>, <relation>, and <user>. In our example, Alice is a user with alice being her <user_id>.

How are permissions stored?

Zanzibar stores permissions as tuples; think of a table with 3 columns, object, relation, user. Each row represents a tuple, which captures permissions as well as relations between objects and users.

How are permissions checked?

To check whether a user, alice, has permission to read an object, doc:readme, we look for a tuple in the database.

CHECK(doc:readme # read @ alice)

is equivalent to checking whether the tuple, doc:readme#read#alice, is in the database.

Zanzibar allows you to rewrite a tuple you are checking for. In such cases, a tuple can be transformed into a set of tuples, which can be represented as unions and intersections of subsets. For example, doc:readme#read# alice can be transform into a set

doc:readme#reader@alice U doc:readme#writer@alice

which is a union of a singleton, doc:readme#reader@alice and a singleton doc:readme#writer@alice

When checking for permissions a set is transformed to a logical expression replacing union with OR, and an intersection with AND.

So, check for doc:readme#reader@alice U ns#writer@alice would be a check whether there is either a tuple, doc:readme#reader@user1 or a tuple, doc:readme#writer@alice in the database.

Tuple Rewrites

To reduce the number of edges and vertices, Zanzibar allows you to rewrite a tuple you are checking for using a set of rules defined in a configuration for the desired permissioning model.

In Zanzibar, it is the <userset> that is rewritten. Why userset and not the whole tuple? When checking permission, a user is well defined by the request and Zanzibar is not designed to support impersonations. Also note that a tuple can be expressed in terms of <userset> and <user id>; by definition a tuple can be either <userset>@<user id> or <userset>@<userset>. If <user id> is fixed, only <userset>’s are rewritten.

For example, the following rewrite rule applies to any tuple with a relation, read, and rewrites it to a tuple with a relationship reader. Note that the missing fields in userset_rewrite are filled in from the context, which is the tuple that was matched.

relation {
  name: "read"
  userset_rewrite {
    tupleset {
      relation: "reader"
    }
  }
}

So, doc1#reader@alice is rewritten as <object>#read@<user> and completed from the context, i.e. is doc1 and <user> is alice.

doc:readme#reader@alice

RBAC

First, define roles, which are plain relations

relation {
  name: reader
}

relation {
  name: writer
}

relation {
  name: admin
}

Permissions are granted to a user by adding a tuple between the object and user with a desired role, e.g. ADD: doc:readme#writer@alice

Then we define actions, which you can check against, read, write, and manage. These actions are resolved to roles via a rewrite.

First, only the admin role can manage the object. Hence the rewrite of manage action swaps it to the admin role.

relation {
  name: manage
  userset_rewrite {
    tupleset {
      relation: "admin"
    }
  }
}

At the check time, doc:readme#read@user1 is rewritten to doc:readme#reader@user1.

The write action can be performed by admin and writer roles. Rewrite creates a union of two role relations as follows.

relation {
  name: write
  userset_rewrite {
    union {
      tupleset {
        relation: "admin"
      }
      tupleset {
        relation: "writer"
      }
    }
  }
}

At the check time, doc:readme#write@alice is rewritten to doc:readme#admin@alice ∪ doc:readme#writer@alice. The permission engine will check a logic statement: there is doc:readme#reader@alice tuple or there is doc:readme#writer@alice tuple. For alice to write in doc:readme, she must have either the admin or writer roles. If either of these tuples is in the database, CHECK( doc:readme#write@alice) resolves to True.

Finally, read action can be performed by all 3 roles. At the check time the rewrite is

relation {
  name: read
  userset_rewrite {
    union {
      child {
        tupleset {
          relation: "admin"
        }
      }
      child {
        tupleset {
          relation: "writer"
        }
      }
      child {
        tupleset {
          relation: "reader"
        }
      }
    }
  }
}

At the check time, doc:readme#read@alice is rewritten to doc:readme#admin@alice ∪ doc:readme#writer@alice ∪ doc:readme#reader@alice. The permission engine will check a logic statement: there is doc:readme#reader@alice tuple or there is doc:readme#writer@alice tuple, or there is doc:readme#writer@alice tuple. For alice to read the document doc:readme, she must have either the admin, writer, or reader roles. If either of these tuples is in the database, CHECK( doc:readme#read@alice) resolves to True.

Object hierarchy

Permissions in the hierarchy of objects can be formulated using the tuple_to_userset and computed_userset rewrite rules and recursiveness of expression rewrite.

The hierarchy of objects can be captured with a tuple and designated relation, e.g. parent. For example, folder:A is a parent of doc:readme can be expressed with the tuple, doc:readme#parent@folder:A#_. Notice that in this tuple, <user> is not <user id> but <userset>, which has a form <object#relation>. We leave relations blank as indicated by _.

Object hierarchy can be nested. For example, folder:B can be a parent of folder:A.

folderA#parent@folder:B#_

computed_userset

As described in the paper computed_userset computes a new tuple for a given input and then is replaced with the union of the computed tuples. Any element of the tuple that is not specified in computed_userset is taken from the context. For example, suppose an input tuple is obj#rel@xyz#abc, where <userset> is xyz#abc. And suppose the doc:readme#read@alice tuple we are checking for, is a context. The following rule

computed_userset {
  object: $TUPLE_USERSET_OBJECT # parent folder
  relation: "reader"
}

resolves $TUPLE_USERSET_OBJECT from the input tuple to xyz. Relation is specified as reader. User is missing and is taken from the context, alice. The end result is xyz#reader@alice.

tuple_to_userset

A tuple_to_userset rule looks up tuples in the database for a given object and relation. A new tuple is computed for each fetched tuple as an input to the computed_userset rule. All computed tuples are concatenated as a union in place of tuple_to_userset.

For example, given doc:readme#reader@alice as a context, the following rule

tuple_to_userset {
  # read tuples
  tupleset { relation: "parent" }
  # generate tuples
  computed_userset {
    object: $TUPLE_USERSET_OBJECT
    relation: "reader"
  }
}

First, we read tuples for a userset ( remember <userset>::=<object>'#'<relation>) defined by tupleset { relation: "parent" }. Since <object> is missing we are using the object from the context, doc:readme. Query for tuples starting with doc:readme#parent will return doc:readme#parent@folder:A#_, which is used as an input to the computed_userset rule. The expression $TUPLE_USERSET_OBJECT will evaluate to folder:A, i.e. object in the userset in the user element of a tuple. Relation is specified as reader. User is filled in from the context, alice. The rewritten tuple is folder:A#reader@alice.

The engine will take multiple passes over the expression until there are no more rules to apply. Only then a tuple set expression is converted to a logical expression and is evaluated using tuples in the database.

Recursive hierarchy

To capture parent relations we will take advantage of multiple rewrites. Let’s define a rewrite role, recursive_reader, such that:

relation {
  name: "recursive_reader"
  userset_rewrite {
    union {
      child {
        tuple_to_userset {
          tupleset { relation: "parent" }
          computed_userset {
            object: $TUPLE_USERSET_OBJECT
            relation: "reader"
          }
        }
      }
      child {
        tuple_to_userset {
          tupleset { relation: "parent" }
          computed_userset {
            object: $TUPLE_USERSET_OBJECT
            relation: "recursive_reader"
          }
        }
      }
    }
  }
}

And we add recursive_reader to read rewrite.

relation {
  name: read
  userset_rewrite {
    union {
      child {
        tupleset {
          relation: "reader"
        }
      }
      child {
        tupleset {
          relation: "recursive_reader"
        }
      }
    }
  }
}

Let’s give alice a reader role to folder:A by adding a tuple, folder:A#parent@alice.

In the database we have the following tuples:

doc:readme#parent@folder:A#_
folder:A#reader@alice

When checking for read permission for alice on doc:readme, the tuple is rewritten as follows:

CHECK(`doc:readme#read@alice`)

By read relation rewrite we get

=> `doc:readme#reader@alice U doc:readme#recursive_reader@alice`

By recursive_reader relation rewrite we get folder:A#reader@alice from the first tuple_to_userset and folder:A#recursive_reader@alice from the second tuple_to_userset.

=> doc:readme#reader@alice U doc:readme#reader@alice U folder:A#recursive_reader@alice

On the second pass, we need to resolve folder:A#recursive_reader@alice. Since there is no tuples starting with folder:A#parent prefix, both tuple_to_userset resolve to empty set, and thus, folder:A#recursive_reader@alice comes an empty set. If we had a deeply nested folder structure, this recursion would keep on going until the root folder is reached.

Final expression is,

=> doc:readme#reader@alice U doc:readme#reader@alice U folder:A#recursive_reader@alice

Hierarchy between users and groups work the same way.

Summary

To summarize, Zanzibar has a domain-specific language, which basically is a graph query language allowing for graph traversal. One could come up with the equivalent Cypher queries to replicate Zanzibar data model using a graph database.

Zanzibar language looks quite complicated and verbose. This is not a surprise given that Zanzibar is most likely using protos for the model configuration, which is a common practice at Google. I wonder what the Zanzibar model would look like in Cypher.

ReBAC implementation using a graph database would have hard time matching the performance of Zanzibar, which has advantage for a few reasons:

Storage is simple and efficient. i.e. a single table with 3 columns. A graph database would store vertices separate from edges.
Horizontal scaling is relatively straight forward. Shard a single table. Split can be performed by namespace and object id of the first element of a tuple.
Zanzibar can precompute some relations to speed up the lookup.
Finally, Zanzibar allows for capturing an object/sub-object relation using purely rewrite rules without having to create entities and relations for sub-objects. This significantly reduces the number of vertices and edges in the permission graph.