There are a lot of posts and articles, which talk about the Google authorization system, Zanzibar. However, I couldn’t find any detail explanation of how the permissions are resolved using the Zanibar domain-specific language; paper is not very clear about it. The following is based on the paper, Zanzibar: Google’s Wide-Area Data Replication System.
ReBAC
Zanzibar is an implementation of Relationship-based access control (ReBAC). Basically, you represent permissions in a form of graph. Vertices are users/groups and hierarchical objects. Labeled edges represent relations between vertices. For example, alice and tom are users, and a group, friends, is used by Alice to give access to her books.
(alice:User)-[:member]->(friends:Group)
If Alice has a collection of books, books, with two book in it, we could represent it as two vertices and an edge,
(book1:Object)-[:member]->(books:Object)
(book2:Object)-[:member]->(books:Object)
If Alice has a collection of books, books, with two books in it, we could represent it as two vertices and an edge,
Now, we can give alice permission to write book1 with a write edge,
(alice:User)-[:write]->(book1:Object)
If we want to check whether Alice can write to book1, we look for a path between alice User and book1 with the write edge.
(alice:User)->...-[:write]->...->(book1:Object)
Alice can grant read permissions to all books in books to anyone in the group, friends, by adding a read edge between books and friends.
(friends:Group)-[:read]->(books:Object)
If, tom is a member of friends group, i.e. (tom:User)-[:member]->(friends:Group), then tom can read book1 because there is a path between tom and book1 via the group, fiends, with an edge read.
(tom:User) -[:member]-> (friends:Group) -[:read]-> (books:Object) -[:member]-> (book1:Object)
Using a graph and path search is one way to express ReBAC. A brute force implementation of ReBAC is to use a graph, where all the entities in your systems have a corresponding vertex, and edges define relations between the entities; one type of such relations are permissions.
The problem with this approach is that graphs are very general. Representing an authorization model in a form of graph can be complicated; one would have to define the type of vertices and edges as well as queries, which would capture a desired permission model for your application.
Zanzibar language
Zanzibar solves this problem with a domain-specific language and resolving applies rules of the permission model at the time a permission is checked.
Let’s start with the language definition as described in the paper.
<tuple> ::= <object>'#'<relation>‘@'<user>
<object> ::= <namespace>':'<object id>
<user> ::= <user id> | <userset>
<userset> ::= <object>'#'<relation>
where <namespace> and <relation> are predefined in client configurations, <object id> and <user id> are strings. The primary keys required to identify a relation tuple are <namespace>, <object id>, <relation>, and <user>. In our example, Alice is a user with alice being her <user_id>.
How are permissions stored?
Zanzibar stores permissions as tuples; think of a table with 3 columns, object, relation, user. Each row represents a tuple, which captures permissions as well as relations between objects and users.
How are permissions checked?
- To check whether a user,
alice, has permission to read an object,doc:readme, we look for a tuple in the database.
CHECK(doc:readme # read @ alice)
is equivalent to checking whether the tuple, doc:readme#read#alice, is in the database.
Zanzibar allows you to rewrite a tuple you are checking for. In such cases, a tuple can be transformed into a set of tuples, which can be represented as unions and intersections of subsets. For example, doc:readme#read# alice can be transform into a set
doc:readme#reader@alice U doc:readme#writer@alice
which is a union of a singleton, doc:readme#reader@alice and a singleton doc:readme#writer@alice
When checking for permissions a set is transformed to a logical expression replacing union with OR, and an intersection with AND.
So, check for doc:readme#reader@alice U ns#writer@alice would be a check whether there is either a tuple, doc:readme#reader@user1 or a tuple, doc:readme#writer@alice in the database.
Tuple Rewrites
To reduce the number of edges and vertices, Zanzibar allows you to rewrite a tuple you are checking for using a set of rules defined in a configuration for the desired permissioning model.
In Zanzibar, it is the <userset> that is rewritten. Why userset and not the whole tuple? When checking permission, a user is well defined by the request and Zanzibar is not designed to support impersonations. Also note that a tuple can be expressed in terms of <userset> and <user id>; by definition a tuple can be either <userset>@<user id> or <userset>@<userset>. If <user id> is fixed, only <userset>’s are rewritten.
For example, the following rewrite rule applies to any tuple with a relation, read, and rewrites it to a tuple with a relationship reader. Note that the missing fields in userset_rewrite are filled in from the context, which is the tuple that was matched.
relation {
name: "read"
userset_rewrite {
tupleset {
relation: "reader"
}
}
}
So, doc1#reader@alice is rewritten as
<object>#read@<user> and completed from the context, i.e. is doc1 and <user> is alice.
doc:readme#reader@alice
RBAC
First, define roles, which are plain relations
relation {
name: reader
}
relation {
name: writer
}
relation {
name: admin
}
Permissions are granted to a user by adding a tuple between the object and user with a desired role, e.g.
ADD: doc:readme#writer@alice
Then we define actions, which you can check against, read, write, and manage. These actions are resolved to roles via a rewrite.
First, only the admin role can manage the object. Hence the rewrite of manage action swaps it to the admin role.
relation {
name: manage
userset_rewrite {
tupleset {
relation: "admin"
}
}
}
At the check time, doc:readme#read@user1 is rewritten to doc:readme#reader@user1.
The write action can be performed by admin and writer roles. Rewrite creates a union of two role relations as follows.
relation {
name: write
userset_rewrite {
union {
tupleset {
relation: "admin"
}
tupleset {
relation: "writer"
}
}
}
}
At the check time, doc:readme#write@alice is rewritten to doc:readme#admin@alice ∪ doc:readme#writer@alice. The permission engine will check a logic statement: there is doc:readme#reader@alice tuple or there is doc:readme#writer@alice tuple. For alice to write in doc:readme, she must have either the admin or writer roles. If either of these tuples is in the database, CHECK( doc:readme#write@alice) resolves to True.
Finally, read action can be performed by all 3 roles. At the check time the rewrite is
relation {
name: read
userset_rewrite {
union {
child {
tupleset {
relation: "admin"
}
}
child {
tupleset {
relation: "writer"
}
}
child {
tupleset {
relation: "reader"
}
}
}
}
}
At the check time, doc:readme#read@alice is rewritten to doc:readme#admin@alice ∪ doc:readme#writer@alice ∪ doc:readme#reader@alice. The permission engine will check a logic statement: there is doc:readme#reader@alice tuple or there is doc:readme#writer@alice tuple, or there is doc:readme#writer@alice tuple. For alice to read the document doc:readme, she must have either the admin, writer, or reader roles. If either of these tuples is in the database, CHECK( doc:readme#read@alice) resolves to True.
Object hierarchy
Permissions in the hierarchy of objects can be formulated using the tuple_to_userset and computed_userset rewrite rules and recursiveness of expression rewrite.
The hierarchy of objects can be captured with a tuple and designated relation, e.g. parent. For example, folder:A is a parent of doc:readme can be expressed with the tuple, doc:readme#parent@folder:A#_.
Notice that in this tuple, <user> is not <user id> but <userset>, which has a form <object#relation>. We leave relations blank as indicated by _.
Object hierarchy can be nested. For example, folder:B can be a parent of folder:A.
folderA#parent@folder:B#_
computed_userset
As described in the paper computed_userset computes a new tuple for a given input and then is replaced with the union of the computed tuples. Any element of the tuple that is not specified in computed_userset is taken from the context. For example, suppose an input tuple is obj#rel@xyz#abc, where <userset> is xyz#abc. And suppose the doc:readme#read@alice tuple we are checking for, is a context. The following rule
computed_userset {
object: $TUPLE_USERSET_OBJECT # parent folder
relation: "reader"
}
resolves $TUPLE_USERSET_OBJECT from the input tuple to xyz. Relation is specified as reader. User is missing and is taken from the context, alice. The end result is xyz#reader@alice.
tuple_to_userset
A tuple_to_userset rule looks up tuples in the database for a given object and relation. A new tuple is computed for each fetched tuple as an input to the computed_userset rule. All computed tuples are concatenated as a union in place of tuple_to_userset.
For example, given doc:readme#reader@alice as a context, the following rule
tuple_to_userset {
# read tuples
tupleset { relation: "parent" }
# generate tuples
computed_userset {
object: $TUPLE_USERSET_OBJECT
relation: "reader"
}
}
First, we read tuples for a userset ( remember <userset>::=<object>'#'<relation>) defined by
tupleset { relation: "parent" }. Since <object> is missing we are using the object from the context, doc:readme. Query for tuples starting with doc:readme#parent will return doc:readme#parent@folder:A#_, which is used as an input to the computed_userset rule. The expression $TUPLE_USERSET_OBJECT will evaluate to folder:A, i.e. object in the userset in the user element of a tuple. Relation is specified as reader. User is filled in from the context, alice. The rewritten tuple is folder:A#reader@alice.
The engine will take multiple passes over the expression until there are no more rules to apply. Only then a tuple set expression is converted to a logical expression and is evaluated using tuples in the database.
Recursive hierarchy
To capture parent relations we will take advantage of multiple rewrites. Let’s define a rewrite role, recursive_reader, such that:
relation {
name: "recursive_reader"
userset_rewrite {
union {
child {
tuple_to_userset {
tupleset { relation: "parent" }
computed_userset {
object: $TUPLE_USERSET_OBJECT
relation: "reader"
}
}
}
child {
tuple_to_userset {
tupleset { relation: "parent" }
computed_userset {
object: $TUPLE_USERSET_OBJECT
relation: "recursive_reader"
}
}
}
}
}
}
And we add recursive_reader to read rewrite.
relation {
name: read
userset_rewrite {
union {
child {
tupleset {
relation: "reader"
}
}
child {
tupleset {
relation: "recursive_reader"
}
}
}
}
}
Let’s give alice a reader role to folder:A by adding a tuple, folder:A#parent@alice.
In the database we have the following tuples:
doc:readme#parent@folder:A#_folder:A#reader@alice
When checking for read permission for alice on doc:readme, the tuple is rewritten as follows:
CHECK(`doc:readme#read@alice`)
By read relation rewrite we get
=> `doc:readme#reader@alice U doc:readme#recursive_reader@alice`
By recursive_reader relation rewrite we get
folder:A#reader@alice from the first tuple_to_userset and folder:A#recursive_reader@alice from the second tuple_to_userset.
=> doc:readme#reader@alice U doc:readme#reader@alice U folder:A#recursive_reader@alice
On the second pass, we need to resolve folder:A#recursive_reader@alice. Since there is no tuples starting with folder:A#parent prefix, both tuple_to_userset resolve to empty set, and thus, folder:A#recursive_reader@alice comes an empty set. If we had a deeply nested folder structure, this recursion would keep on going until the root folder is reached.
Final expression is,
=> doc:readme#reader@alice U doc:readme#reader@alice U folder:A#recursive_reader@alice
Hierarchy between users and groups work the same way.
Summary
To summarize, Zanzibar has a domain-specific language, which basically is a graph query language allowing for graph traversal. One could come up with the equivalent Cypher queries to replicate Zanzibar data model using a graph database.
Zanzibar language looks quite complicated and verbose. This is not a surprise given that Zanzibar is most likely using protos for the model configuration, which is a common practice at Google. I wonder what the Zanzibar model would look like in Cypher.
ReBAC implementation using a graph database would have hard time matching the performance of Zanzibar, which has advantage for a few reasons:
- Storage is simple and efficient. i.e. a single table with 3 columns. A graph database would store vertices separate from edges.
- Horizontal scaling is relatively straight forward. Shard a single table. Split can be performed by namespace and object id of the first element of a tuple.
- Zanzibar can precompute some relations to speed up the lookup.
- Finally, Zanzibar allows for capturing an object/sub-object relation using purely rewrite rules without having to create entities and relations for sub-objects. This significantly reduces the number of vertices and edges in the permission graph.