Introduction

Cedar is an open source policy language and evaluation engine. This book collects all the accepted RFCs (Request for Comments) for Cedar, providing a central repository for easier access and reference.

RFC Template (Fill in your title here)

  • Reference Issues: (fill in existing related issues, if any)
  • Implementation PR(s): (leave this empty)

Timeline

  • Started: (fill in with today's date, YYYY-MM-DD)
  • Accepted: TBD
  • Stabilized: TBD

Summary

Brief explanation of the feature.

Basic example

If the proposal involves a new or changed API, include a basic code example. Omit this section if it's not applicable.

Motivation

Why are we doing this? What use cases does it support? What is the expected outcome?

Please focus on explaining the motivation so that if this RFC is not accepted, the motivation could be used to develop alternative solutions. In other words, enumerate the constraints you are trying to solve without coupling them too closely to the solution you have in mind.

Detailed design

This is the bulk of the RFC. Explain the design in enough detail for somebody familiar with Cedar to understand, and for somebody familiar with the implementation to implement. This should get into specifics and corner-cases, and include examples of how the feature is used. Any new terminology should be defined here.

Drawbacks

Why should we not do this? Please consider:

  • implementation cost, both in terms of code size and complexity
  • integration of this feature with other existing and planned features
  • cost of migrating existing Cedar applications (is it a breaking change?)

There are tradeoffs to choosing any path. Attempt to identify them here.

Alternatives

What other designs have been considered? What is the impact of not doing this?

Unresolved questions

Optional, but suggested for first drafts. What parts of the design are still TBD?

is Operator

Timeline

  • Started: 2023-06-16
  • Accepted: 2023-07-28
  • Landed: 2023-11-08 on main
  • Released: 2023-12-15 in cedar-policy v3.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

This RFC proposes to add an is operator to the Cedar language that allows users to check the type of entities.

Basic example

Say that a user wants to allow any User to view files in the folder Folder::"public". With an is operator, the following policy achieves this goal.

permit( principal is User, action == Action::"viewFile", resource is File in Folder::"public" );

At evaluation time, principal is User will evaluate to true if the type of principal is User, and resource is File in Folder::"public" will evaluate to true if the type of resource is File and it is a member of Folder::"public".

Motivation

The is operator allows users to restrict the scope of their policies to particular entity types.

In many cases, this can also be done by adding restrictions to the schema. For example, the appliesTo field for each action in the schema allows users to specify which entity types can be used as principals and resources for that action. In the example above, the schema could specify that viewFile should only apply to principals of type User and resources of type File. However, this doesn't help if the user is not using a schema, the action is unconstrained, or the policy refers to an action that may act on multiple principal/resource types.

For example, here is one case where the action is unconstrained, taken from the Cedar document cloud example.

forbid(principal, action, resource)
when {
  resource has owner && principal != resource.owner &&
  resource has isPrivate && resource.isPrivate
};

This policy applies to any principal, action, resource pair, but requires that the resource has owner and isPrivate attributes. If the resource does not have these attributes, then the policy is a no-op.

Say that only one entity type (e.g., File) has the fields owner and isPrivate. Then this policy can be written more succinctly as follows.

forbid (principal, action, resource is File) when { principal != resource.owner && resource.isPrivate };

Detailed design

Changes required to parser/AST

  • Update the grammar and AST to support expressions of the form e is et where et is a Name (e.g., Namespace::EntityType).
  • Update the grammar and AST to support expressions of the form e1 is et in e2 (but not e1 in e2 is et, because this is more difficult to mentally parse).
  • Update the grammar to allow expressions like e is et and e1 is et in e2 in the policy scope for principals and resources. The action part of the scope will only allow expressions like action == Action::"foo" and action in [Action::"foo", Action::"bar"], as before.
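
For illustration, here is a sketch (using hypothetical User, File, and Group::"admins" declarations) of scope forms that the grammar would and would not accept:

// Accepted: is, optionally followed by in
permit(principal is User in Group::"admins", action, resource is File);

// Rejected: in followed by is is not part of the grammar
permit(principal in Group::"admins" is User, action, resource);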

Changes required to evaluator

e is et will evaluate to true iff et is a Name (e.g., Namespace::EntityType) and e evaluates to an entity of type et. If e is an entity-typed expression, but does not have type et, then e is et will evaluate to false. If e is a non-entity-typed expression, the evaluator will produce a type error.

Note that this design maintains the property that "conditions in the scope never error" because principal and resource are guaranteed to be entities, so the expression (principal|resource) is et will never error.

Namespaces are considered "part of" the entity type. So:

  • User::"alice" is User is true
  • Namespace::User::"alice" is User is false
  • Namespace::User::"alice" is Namespace::User is true
  • User::"alice" is Namespace::User is false

Changes required to validator

e is et will be typed as True if the validator can guarantee that expression e is an entity of type et, or False if the validator can guarantee that e is an entity not of type et. If the validator can guarantee that e is an entity, but not determine the exact type, then e is et is given type Bool. Otherwise, the validator produces a type error.

To improve typing precision, we can extend the current notion of "effects" to track information about entity types. Currently, the validator uses effects to know whether an expression is guaranteed to have an attribute. In particular, the validator knows that if e has a evaluates to true, then expression e is guaranteed to have attribute a. We can do something similar for is: when the expression e is et evaluates to true, expression e is guaranteed to be an entity of type et.
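
For example, effect tracking would let a policy like the following sketch validate, assuming a hypothetical schema in which the view action applies to several resource types and only File declares an owner attribute:

permit(principal, action == Action::"view", resource)
when { resource is File && resource.owner == principal };

Because the right-hand side of && is only reached when resource is File evaluates to true, the validator can treat resource.owner as an attribute access on a File.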

Other changes

The Dafny formalization of the Cedar language, corresponding proofs, and differential testing framework in cedar-spec will need to be updated to include the new is operator.

Drawbacks

  1. This is a substantial new feature that will require significant development effort.

  2. This change will break existing applications that depend on the structure of policies. For example, if we decide to allow is in the policy scope (as shown in the examples above), tools that perform basic "policy slicing" based on the scope will need to decide how to handle the new syntax.

  3. The is operator may encourage coding practices that go against our recommended "best practices" (more details below).

Best practice: Using "typed" actions

From the discussion on https://github.com/cedar-policy/cedar/issues/94:

Guidance we have given other customers is that actions should be "typed," in the sense that they include in their name the type of the resource they apply to. So instead of writing

permit( principal == User::"admin", action == Action::"view", resource is Photo );

you would write

permit( principal == User::"admin", action == Action::"viewPhoto", resource );

In your schema, you indicate that the "viewPhoto" action should always be accompanied by a Photo for the resource, so the policy doesn't need to say that explicitly (and doesn't need to perform a runtime check).
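
For illustration, here is a sketch of that constraint written in the schema syntax proposed later in this book (the JSON schema form expresses the same appliesTo restriction):

action viewPhoto appliesTo {
  principal: [User],
  resource: [Photo]
};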

If we support an is operator, then customers may be encouraged to use a single "view" action for multiple resource types, rather than a new "viewX" action for each resource type X.

Alternatives

  1. Another way to simulate is is to add an entity_type attribute to entities, and check the type of an entity using resource.entity_type == "File" (for example). However, this requires users to manually add an entity's type as an attribute, and leads to the possibility that an entity might be created with an attribute that doesn't actually match its type.

  2. The behavior of is can also be simulated using groups. For each entity type (e.g., File), a user could create a group (e.g., Type::"File") and add all entities of that type to the group. Then to check whether an entity has a particular type, users could use in (e.g., resource in Type::"File"). Like alternative (1), this requires some extra work on the part of the user (setting up the memberOf relation), and does not prevent entities from being added as members of the wrong group.

Resolved questions

  • Should we allow is in the policy scope? Currently we only allow == and in expressions in the scope, with the intention that these expressions can be used to represent RBAC constraints, whereas more general expressions in the policy condition can be used to express ABAC constraints.
    • Decision: We will allow is and is ... in in the scope.
    • Summary of discussion:
      • We feel that is is natural to write in the policy scope since it limits the "scope" of the policy.
      • Most requests we have received for this feature use an example with is in the scope, confirming our belief that this is the most natural location for its use.
      • Although is is not currently used for policy indexing, it could be in the future, and current indexing strategies can safely ignore is constraints in the meantime.
  • Should users be able to indicate that an entity may be of multiple types, e.g., principal is [User, Group]? Can this be combined with in (e.g., principal is [User, Group] in Group::"orgA", or even principal is [User, Group] in [Group::"orgA", Group::"orgB"])?
    • Decision: We will not support this. The syntax is more difficult to read and the same behavior can be achieved by writing multiple policies instead. We may reconsider this position based on user feedback.
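
For illustration, the effect of the rejected form principal is [User, Group] in Group::"orgA" can be obtained with two policies (using a hypothetical view action):

permit(principal is User in Group::"orgA", action == Action::"view", resource);
permit(principal is Group in Group::"orgA", action == Action::"view", resource);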

Disallow embedded whitespace in string representations of the EntityUID

Timeline

  • Started: 2023-06-19
  • Accepted: 2023-06-26
  • Landed: 2023-06-28 on main
  • Released: 2023-06-29 in cedar-policy v2.3.0

Summary

Disallow embedded whitespace, comments, and control characters in string representations of the EntityUID

Basic example

Currently, the following syntax is legal:

EntityTypeName::from_str("ExampleCo :: Photoflash :: User //comment");

This change would modify the function so that the string input must be normalized, i.e., with all extraneous whitespace, control characters, and comments removed:

EntityTypeName::from_str("ExampleCo::Photoflash::User");

Motivation

Like other programming languages such as Rust, Cedar is currently whitespace-agnostic. As a result, whitespace, control characters such as newlines, and comments can be embedded around the :: delimiter in an Entity UID.

For example, the following syntax is valid:

permit(
  principal == ExampleCo :: Photoflash ::   //This is a comment
               User::"alice",
  action,
  resource
);

permit(
  principal == ExampleCo::Photoflash::User:://comment
               "alice",
  action,
  resource
);

This capability was little known, even amongst many Cedar team members, and was only identified as a result of a typo in a Cedar demo. While this behavior may be OK inside policy statements, there is a risk when policy contents such as the EntityTypeName are used as strings outside the context of policy statements, such as in the JSON schema representation. For laypersons who are not Cedar experts, these values appear to be normalized strings rather than fragments of Cedar policy syntax, leading to bugs in application code when whitespace and comments can appear in the value.

Examples:

  1. The Cedar Schema format models the schema configuration under a JSON key for the namespace. Policy stores which index schemas by namespace are unlikely to recognize the need to normalize the value, leading to the possibility of storing duplicate schema definitions for "ExampleCo::Photoflash" and "ExampleCo :: Photoflash" and indeterminate behavior regarding which schema takes effect at runtime.
  2. Policy Stores can implement logic that relies on string comparisons against the EntityTypeName. In a real issue, an application using Cedar sought to preclude callers from passing Actions in the inline slice of entity definitions. It did so by checking if an EntityTypeName matched .*::Action. It presumed that :: Action was invalid syntax and would be rejected by the Cedar evaluator, the same as any other syntactically invalid input. This resulted in a bug, as it allowed callers to bypass the extra validation that the application sought to enforce.
  3. Customers are anticipated to build meta-permissions layers that restrict callers to manipulating policy store contents for only certain namespaces. This may lead to policies such as forbid(...) when {context.namespace == "ExampleCo::Photoflash"};. There is a risk that an unauthorized actor could bypass this restriction by using a namespace with embedded spaces.

While it is technically possible for applications to mitigate this risk by diligently using Cedar tooling to normalize the values, the little-known nature of this Cedar behavior implies that few will know they should normalize the value. As a point of reference, application developers who have worked with Cedar extensively for over a year were bitten by this bug in production. Hence, this is likely to result in bugs in many other Cedar-based implementations with similar logic, risking a perception that Cedar is fragile or unsafe.

Detailed design

In EntityTypeName::from_str(), whitespace would be disallowed throughout -- either before, after, or in the middle of the typename. The analogous change will also be made in EntityNamespace::from_str() and EntityUid::from_str().

Drawbacks

This is a breaking change. We do not generally want breaking changes in Cedar. It is being proposed only because it is believed the risk of breakage is low, and any potential short-term impacts to stability will pay long-term dividends in Cedar's perceived reliability.

The parties most at risk are those who accept ad hoc values of EntityTypeName from callers in a string format where the values may contain embedded whitespace. At the current time, the only known party who does this in a production setting is Amazon Verified Permissions. Due to the relative newness of Amazon Verified Permissions, this party is accepting of the change as it is not expected to break any production scenarios, and appears to be a valid risk to accept in order to benefit the Cedar community in the long-term.

Other parties at risk are those who use static, programmatically defined values of EntityTypeName where the code contains a typo in the value that accidentally includes additional whitespace. The belief is that this is likely to get caught in a test environment before going to production, and represents a typo that the impacted party may wish to fix, regardless.

The last parties at risk are those who allow customers to upload custom schema files where the namespace value may contain whitespace. The only known party who accepts ad hoc schema uploads from customers with namespaces is Amazon Verified Permissions, which is accepting of this change.

To test these assumptions, this change will be preceded by reach-outs to Cedar early adopters to highlight the risks and gain community buy-in on the approach.

Alternatives

An alternative is to modify the policy syntax to disallow embedded whitespace in entity UIDs across all parts of Cedar, including within policy statements. This could be done by treating the UID as a single identifier with special structure (rather than tokenizing it into a ::-separated sequence of identifiers).

One additional benefit of this approach is that it can mitigate attempts by pentesters and malicious actors who may seek ways to craft Cedar policy statements that look like they do one thing, but actually do another. The existence of little-known parsing behaviors provides an avenue for exploration by this audience, as demonstrated by the following example:

permit( principal == ExampleCo::Photoflash::User:://"alice" "bob", //TODO: Try various tricks to obfuscate this line from readers action, resource );

Despite the potential benefit, this alternative was rejected due to the high risk of breaking existing Cedar users. In the worst case scenario, a customer may have an existing forbid statement that relies on this behavior as illustrated below. By changing the behavior, the forbid statement will error at runtime and be skipped, and hence the system could fail open.

forbid(
  principal in ExampleCo::
               Photoflash::UserGroup::"someGroup", //Accidental typo, but valid syntax
  action == ExampleCo::
            Photoflash::Action::"write",
  resource
);

This risk is too great. Therefore, the suggested approach is a compromise that mitigates the known production bugs with fewer risks. Any concerns about pentesters and malicious actors crafting obfuscated policies will need to be addressed by other non-breaking means, such as linter warnings and syntax highlighting.

Standalone strict validation

Timeline

  • Started: 2023-07-11
  • Accepted: 2023-09-05
  • Landed: 2023-09-08 on main
  • Released: 2023-12-15 in cedar-policy v3.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

Internally, strict validation of a policy is implemented as 1/ checking and transforming the policy using a permissive validation mode; 2/ annotating the transformed policy with types, and 3/ checking the types against more restrictive rules. We would like to be able to at least explain strict mode independently of permissive mode, but there is no easy way to do that. This RFC proposes to separate strict mode from permissive mode, making the implementation simpler and more understandable/explainable, though somewhat more strict.

Motivation

Consider this policy, given in file policy.cedar, where principal is always of type User:

permit( principal, action == Action::"read", resource) when { (if context.sudo then Admin::"root" else principal) == resource.owner || resource.isPublic }

If we validate this policy against the schema schema.cedarschema.json, then resource.owner is always of type User. As a result, the policy is rejected by strict-mode validation, with two errors:

  1. the if/then/else is trying to return entity type Admin in the true branch, and entity type User on the false branch, and these two are not equal.
  2. since resource.owner has type User, it will have a type different than that returned by the if/then/else, which can sometimes return Admin.

But if we change "User" in schema.cedarschema.json on line 13 to "Org", then resource.owner is always of type Org. As a result, the policy is accepted! The two error conditions given above seem not to have changed -- the conditional returns different types, and those types may not equal the type of resource.owner -- so it's puzzling what changed to make the policy acceptable.

The reason is that strict-mode validation is dependent on permissive-mode validation. In particular, strict-mode validation is implemented in three steps: 1/ validate and transform the policy using permissive mode; 2/ annotate the transformed policy AST with types; and 3/ check the types against more restrictive rules.

In the first case, when resource.owner is always of type User, the expression (if ...) == resource.owner has type Boolean in permissive mode and as a result no transformation takes place in step 1. As a result, it is rejected in step 3 due to the errors given above.

In the second case (after we change the schema), resource.owner is always of type Org. Under permissive mode, the (if ...) == resource.owner expression has singleton type False. That’s because the (if ...) expression has type User|Admin ("User or Admin") and User|Admin entities can never be equal to Org entities. Both the singleton type False (which only validates an expression that always evaluates to false) and union types like User|Admin are features of permissive mode and are not present in strict mode. Because of these features, step 1 will replace the False-typed (if ...) == resource.owner with value false, transforming the policy to be permit(principal,action,resource) when { false || resource.isPublic }. Since this policy does not require union types and otherwise meets strict mode restrictions, it is accepted in step 3.

In sum: To see why the second case is acceptable, we have to understand the interaction between permissive mode and strict mode. This is subtle and difficult to explain. Instead, we want to implement strict mode as a standalone feature, which does not depend on features only present in permissive mode.

Detailed design

We propose to change strict-mode validation so that it has the following features taken from permissive mode:

  • Singleton boolean True/False types (which have 1/ special consideration of them with &&, ||, and conditionals, and 2/ subtyping between them and Boolean, as is done now for permissive mode).
  • Depth subtyping, but not width subtyping, to support examples like { foo: True } <: { foo: Boolean }.

Strict-mode validation will continue to apply the restriction it currently applies:

  • Arguments to == must have compatible types (i.e., an upper bound exists according to the strict, depth-only, definition of subtyping) unless we know that the comparison will always be false, in which case we can give the == type False. The same goes for contains, containsAll, and containsAny.
  • Arguments to an in expression must be compatible with the entity type hierarchy, or we must know that the comparison is always false.
  • Branches of a conditional must have compatible types unless we can simplify the conditional due to the guard having a True/False type.
  • The elements of a set must have compatible types.
  • As a consequence of the two prior restrictions, union types cannot be constructed.
  • Empty set literals may not exist in a policy because we do not currently infer the type of the elements of an empty set.
  • Extension type constructors may only be applied to literals.
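
The following expressions sketch a few of these restrictions, assuming entity types User and Admin and a hypothetical context attribute source_ip:

[User::"alice", Admin::"root"]   // rejected: the elements' least upper bound would be the union type User|Admin
[]                               // rejected: the element type of an empty set literal is not inferred
ip(context.source_ip)            // rejected: extension constructors may only be applied to literals
ip("192.168.0.1")                // accepted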

Validating Action entities and template slots ?principal and ?resource must be done differently so as not to rely on the "top" entity type AnyEntity, which is essentially an infinite-width union type, used internally. Next, we describe how we propose to handle these situations, in both permissive and strict validation. Importantly, these changes all retain or even improve precision compared to the status quo -- they will not be the reason that policies that typechecked under the old strict mode no longer do.

Actions

Permissive-mode validation considers different Actions to have distinct types, e.g., Action::"view" and Action::"update" don't have the same type. This is in anticipation of supporting distinct actions having distinct attribute sets. The least upper bound of any two actions is the type AnyEntity. Such a type supports expressions that are found in the policy scope such as action in [Action::"view", Action::"edit"]. This expression typechecks because the least upper bound of Action::"view" and Action::"edit" is AnyEntity, so the expression has type Set<AnyEntity>. Since AnyEntity cannot be used in strict validation, the current code special-cases the above expression, treating it as equivalent to action in Action::"view" || action in Action::"edit". However, this approach is unsatisfying as it only supports literal sets of Action entities.

It turns out we can resolve this issue by making all Action entities have one type -- Action -- in both strict and permissive validation. This change strictly adds precision to validation, compared to the status quo. Expressions like [Action::"view", Action::"edit"] have type Set<Action> and thus do not need AnyEntity to be well-typed. Expressions like if principal.isAdmin then Action::"edit" else Action::"editLimited" will typecheck in strict validation, since both branches have the same type (Action).

A possible drawback of this change is that it will not allow distinct actions to have distinct attribute sets (if indeed we add attributes to actions in the future). Rather, all actions must be typeable with the same record type. As a mitigation, for action entities that have an attribute that others do not, the attribute can be made optional. However, two action entities would not be able to have the same attribute at different types, e.g., { foo: String } vs. { foo: Long }. Arguably, this restriction is clearer and more intuitive than what would have been allowed before.

Template slots

There are no restrictions on what entity type can be used to instantiate a slot in a template, so the permissive typechecker also uses AnyEntity here. The only restriction on the type of a slot is that it must be instantiated with an entity type that exists in the schema. The strict validator today does not properly support validating templates, which is something we want to allow.

Because AnyEntity cannot be used in strict validation, we will modify both permissive and strict validation to precisely typecheck a template by extending the query type environment to include a type environment for template slots. Doing so will strictly improve precision for permissive validation; all templates that validated before will still do so after this change.

Here is how the change will work. In the same way that we typecheck a policy once for each possible binding for the types of principal and resource for a given action, we will also typecheck a template once for every possible binding for the type of any slots in that template.

Typechecking for every entity type may seem excessive, but we know that a template slot can only occur in an == or in expression in the policy scope where the principal or resource is compared to the slot. We will infer the type False for these expressions if the slot is an entity type that cannot be == or in the principal or resource entity type, short-circuiting the rest of typechecking for that type environment. Typechecking could be optimized by recognizing and ignoring type environments that will be False based only on the policy scope.

For example, using the schema from the motivating example, the following template will typecheck.

permit(principal == ?principal, action == Action::"read", resource);

Using the usual request type environment construction, there is one environment (User, Action::"read", Object). Because this is a template containing ?principal, we extend the environment with each possible type for the slot: User, Org, Admin and Object. When typechecking, the validator first sees principal == ?principal && ... in every environment; this is always an equality between two entity types. The principal is always a User while ?principal is one of the possible slot types depending on the type environment. In every environment where ?principal is not User, the equality has type False, causing the && to short-circuit to type False. The rest of the policy is typechecked as usual when ?principal is User.

This change will match the precision of the validator today, and will enable more precise typechecking if we expand where in a policy template slots may be used. In particular, if we allowed expressions like ?principal.foo == 2 in the condition of a policy, the above approach would allow us to know precisely that ?principal will always have a foo attribute, whereas the current approach using AnyEntity would not be able to.

Implementation details

The current way we implement strict mode will change to accommodate these additions. We propose the following:

  1. Change the Rust code to implement strict mode by reusing the logic of permissive mode and using a flag to perform more restrictive checks. For example, when given User and Admin entity types, the code performing the least-upper-bound computation will return union type User|Admin in permissive mode, and will fail in strict mode. And, the code for conditionals would force branches' types to match in strict mode, but not in permissive mode. Etc.
  2. When checking in strict mode, we construct a new AST that performs the needed transformations, as in step 2 today.
  3. Either way, we can add type annotations. For strict mode, they could be added during the transformation; for permissive mode, no transformation is needed. No checking is done after annotations are added, since the checking is always done, regardless of mode, prior to the transformation.

Drawbacks

The main drawback of this approach is that strict-mode validation will be more restrictive. This is by design: Policies like the one in the motivating example will be rejected where before they were accepted. Accepting this RFC means we are prioritizing understandability over precision under the assumption that the lost precision will not be missed.

Policies that validated before but will not validate now are quite odd -- they would rely on the use of width subtyping, unions, etc. to determine that an expression will always evaluate to true (or false) and thus can be transformed away. But such policies resemble the one in the motivating example at the start of this RFC, and are probably not what the user intended in the first place. Ultimately, flagging them as erroneous is safer.

A secondary drawback is that this change will result in a non-trivial, and backwards-incompatible code change. Mitigating this issue is that the backward incompatibility will be minimal in practice (per the above), and we will use verification-guided development to prove that the change is sound (i.e., will not introduce design or implementation mistakes).

Alternatives

The proposal is, we believe, the minimal change to make strict mode self-contained, but not too different from what was there before. It also should not result in pervasive changes to the existing code. Alternative designs we considered (e.g., allowing more permissive mode features in strict mode) would be more invasive.

Updates

  • 2023-11-07: The original text said that the handling of unspecified entities needed to be adjusted. However, although released implementations of Cedar type unspecified entities as AnyEntity, == and in expressions involving unspecified entities are typed as False (rather than Bool). This provides the same behavior as using a special Unspecified type, as originally proposed in this RFC.

Disallow duplicate keys in Cedar records

Timeline

  • Started: 2023-07-14
  • Accepted: 2023-08-04
  • Landed: 2023-10-24 on main
  • Released: 2023-12-15 in cedar-policy v3.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

Today, Cedar allows duplicate keys in record literals (and other record values, including context), with last-value-wins semantics. This RFC proposes to disallow duplicate keys in record values, making that an error.

Basic example

Today,

permit(principal, action, resource) when { resource.data == { foo: 2, foo: 3 } };

is a valid Cedar policy. The foo: 2 is dropped and foo will have the value 3 in the record literal. In other words, in Cedar,

{ foo: 2, foo: 3 } == { foo: 3 }

evaluates to true.

More precisely, Cedar's semantics does not just drop values prior to the last, but fully evaluates them before throwing them away. This has consequences in the presence of errors. For instance,

{ foo: 1 + "hello", foo: 3 }

does not evaluate to { foo: 3 } as you might expect based on the previous example, but rather, it results in an evaluation error.

This RFC proposes that all of the above expressions and policies would result in an evaluation error due to duplicate keys in a record.

Motivation

There are two main motivations for this change.

First, in the status quo, Cedar policy ASTs actually cannot losslessly roundtrip to EST and back. This is because the AST representation correctly allows duplicate keys in record literals, but the EST representation drops them (due to an oversight during EST design). In this way, the EST is unfaithful to Cedar semantics, and not all valid Cedar policies are expressible in the EST, which violates the EST's design goal of being lossless modulo whitespace/comments/etc.

More precisely, if a JSON EST contains duplicate record keys in a record literal, our JSON parser will take last-value-wins, and the internal EST representation will contain only the last value. This results in the correct behavior on examples like { foo: 2, foo: 3 }, but the incorrect behavior on examples containing errors, like { foo: 1 + "hello", foo: 3 }. It also means that roundtripping the AST representation to the EST representation and back necessarily loses information -- specifically, the values associated with all but the last instance of each key.

The second motivation is that the status quo makes things difficult for our Cedar formal specification and accompanying proofs. The proofs are required to reason about "well-formed" records as a pre- and post-condition, which could be avoided if records were structurally prohibited from having duplicate keys (e.g., by using a map type instead of an association list). The status quo also requires reasoning about the order keys appear in a record literal, meaning that the keys can't be naively sorted without preserving information about their original order. If duplicate keys were prohibited, the order of keys wouldn't matter (but see "Unresolved questions" below).

Two secondary, more minor motivations for this proposal are:

  • It would bring the Cedar language in line with our desired behavior on duplicate keys in other Cedar API surfaces. See cedar#159.
  • Cedar records resemble other languages' struct types more than they resemble map types. The current behavior would be unexpected for struct types, and the proposed behavior would be intuitive.

Detailed design

We'd do something very like cedar#169, except that instead of implicitly dropping all but the last value for each key (in .collect()), we'd be careful to emit an error when constructing an Expr if the record has duplicate keys.

In the case of record literals, the error could be caught in the CST->AST step, resulting in the user seeing a parse error rather than an evaluation error. In the case of context, the error would need to be caught when constructing the Context object. In the case of records in entity attributes, the error could be caught when constructing an Entity object. There are no other cases, because currently, Cedar has no other way to construct a record value: there are only record literals, and records preexisting in context or entity attribute values.

All of these cases would be caught prior to is_authorized() in Core, so we would not need a new category of evaluation error. No new evaluation errors would happen due to this proposal.

Another consequence of the above choices is that duplicate-key errors "take precedence" over (other) evaluation errors. The expression { one: 1 - "three", one: 1 } will throw a duplicate key error and not a type error.

I don't believe this change would have a significant effect on validation or width/depth subtyping of records, but please correct me if I'm wrong.

Drawbacks

This is a breaking change, not just to the Cedar API surface, but to the Cedar language (policy syntax) itself. That's a significant drawback not to be taken lightly. In this case, it's mitigated somewhat by the fact that we're turning previously-valid policies into errors, not changing them into a silently different behavior. It's also mitigated by the fact that most Cedar users are probably (citation needed) not writing policies in the class that will have a behavior change -- i.e., policies with duplicate keys in record literals. And finally, it's mitigated by the fact that Cedar's current behavior on duplicate keys was not documented in the public docs (e.g., in this section), so if any policies are relying on this behavior, it's probably by accident.

Alternatives

The main alternative is doing nothing, which of course has the reverse pros/cons of the proposal as explained in "Motivation" and "Drawbacks".

Another alternative, which would address the first motivation and not the second, would be to change the EST format so that it supports record literals with duplicate keys. I believe this would require a breaking change to the EST format, and record literals would not be expressible as JSON maps, which is unfortunate because I believe that's the most natural/ergonomic. (Devil's advocate: we could maybe find a way to support JSON maps in the common case with some escape syntax for expressing record literals that have duplicate keys. If we want to take this alternative seriously, I can investigate this.)

Unresolved questions

Order-independence for record keys

[Resolved: we won't make any further changes as part of this RFC, but may address some of this in a different RFC]

At first glance, one would hope that this change would be sufficient to ensure that the order of keys in a record literal (or more generally, any record value) does not matter. However, that's not quite true -- if keys are mapped to expressions with errors, the order does matter, because only the first error encountered will be returned. While we're making this change, should we also make a change so that we have true order-independence for Cedar record keys? I could think of at least three ways to do that:

  1. Always evaluate all values in a record, and return potentially multiple evaluation errors rather than just the first
  2. Provide no guarantee about which error the user will receive when multiple record keys have errors (nondeterministic error behavior) -- I think the formal spec currently guarantees the first error?
  3. Guarantee that record values will be evaluated in lexicographic order of their keys, regardless of the order the keys appear in the policy text

This issue is particularly hairy for records contained in attribute values in the entity store -- I believe these records go through a hashmap representation in our current datapath, so we lose ordering information and also duplicate keys (silently). Since the values in these records are restricted expressions, there are very few ways they can produce evaluation errors, but they still can, in the case of calls to extension functions. This is probably a potential source of difference between the errors produced in the Rust implementation and in the formal spec; I wonder why DRT hasn't caught this, particularly in the case where we drop a duplicate key that was mapped to an erroring expression. Maybe the Dafny JSON parser silently takes last-value-wins? or maybe we don't test extension values in attribute values?

Cedar Schema Syntax

Timeline

  • Started: 2023-07-24
  • Accepted: 2023-10-02
  • Landed: 2024-02-19 on main
  • Released: 2024-03-08 in cedar-policy v3.1.0

Note: These statuses are based on the first version of the RFC process.

Summary

This document proposes a custom syntax for Cedar schemas. The syntax was developed with the following goals in mind:

  • Use the same concepts in both the JSON-based and custom syntax, allowing them to be naturally inter-converted
  • Reuse Cedar policy syntax as much as possible, to help with intuition
  • When no Cedar policy syntax analogue exists, leverage ideas from other programming languages that favor readability
  • Use as few syntactic concepts as possible; similar ideas in the schema should look similar syntactically
  • Support converting back and forth between the JSON-based schema without losing information

This is not a proposal to replace the JSON-based syntax; the aim is to provide an additional syntax that better satisfies the goals above. The JSON syntax is appropriate in many circumstances, such as when constructing schemas systematically. Adopting this proposal would put schemas on the same footing as Cedar policies, which have a custom and a JSON syntax.

Basic example

Here is the proposed syntax for the TinyTodo schema, whose auto-formatted JSON version is 160 lines long.

entity Application;
entity User in [Team,Application] { name: String };
entity Team in [Team,Application];
entity List in [Application] {
  owner: User,
  name: String,
  readers: Team,
  editors: Team,
  tasks: Set<{name: String, id: Long, state: String}>
};

action CreateList, GetLists
  appliesTo { principal: [User], resource: [Application] };

action GetList, UpdateList, DeleteList, CreateTask, UpdateTask, DeleteTask, EditShares
  appliesTo { principal: [User], resource: [List] };

Motivation

Cedar schemas for non-toy applications are hard to read and write. The root cause of the difficulty is the use of JSON:

  • JSON has low information density. For three applications we’ve modeled — TinyTodo, Document Cloud, and GitHub — schemas span 4-5 pages.
  • JSON does not support comments. This means that any design intent for a schema cannot be expressed inline.

We believe that a custom syntax for Cedar schemas can help. It can be more information-dense, support comments, and leverage familiar syntactic constructs. We think customers will be happier with a custom syntax: we regularly hear favorable reviews of Cedar's custom policy syntax, as compared to a JSON-based syntax.

Detailed Design

In what follows we first present the syntax by example. A full grammar is given at the end.

Comments

Cedar schemas can use line-ending comments in the same style as Cedar, e.g., // this is a comment.

Namespace format

namespace My::Namespace { ... }

A schema can have zero or more namespace declarations. The contents of each (... above) comprise entity, action, and common-type declarations, all of which are unordered; i.e., they are all allowed to mutually refer to one another (but see discussion on common types below). The recommended style is to put entity declarations first, followed by action declarations, with common-type declarations interspersed as needed. Entity, action, and common types referenced in the same namespace in which they are defined need not be prefixed by that namespace.

Entity, action, and common-type declarations can appear on their own, outside of any namespace; in that case they are implicitly within the empty namespace.

Entity declarations

Here is an example illustrating the entity declaration syntax.

entity User in [Group] {
  personalGroup: Group,
  delegate?: User,
  blocked: Set<User>,
};
entity Document {
  owner: User,
  isPrivate: Boolean,
  publicAccess: String,
  viewACL: DocumentShare,
  modifyACL: DocumentShare,
  manageACL: DocumentShare
};
entity DocumentShare;
entity Public in [DocumentShare];

The first declaration indicates that User is an entity type, can be a member of the Group type, and has three attributes: personalGroup with entity type Group, an optional delegate with entity type User, and blocked with type Set<User>.

To allow syntactic parity with common type declarations, discussed below, we can optionally add = just before the { ... } defining an entity's attributes, e.g., entity Dog = { name: String } rather than entity Dog { name: String }. Allowing = also supports using a common type name as the entity's shape, and also allows a future extension in which an entity's shape is a type other than a record.

Entity declarations may or may not have the in ... component, and they may or may not have attributes. A singleton list can be expressed without brackets, e.g., entity User in Group = ... Optional attributes are specified in the style of Typescript, with a ? after their name. The last element of the attribute list can optionally be followed by a comma.

Entity types that share a definition can be declared together. For example, the following declares two entity types User and Team, both of which can be members of Team and both of which have a name attribute.

entity User, Team in [Team] { name: String };

Since the JSON-based format does not support groups, this creates a challenge for back-and-forth translation. Our suggested solution is to add an optional groupid attribute to an entity declaration in the JSON. Those entities that are in the same group will be put together when translating to the custom syntax (after confirming that they indeed have the same type), and when translating to JSON such a group ID will be automatically generated.

Action declarations

Here is an example illustrating the action declaration syntax.

action ReadActions;
action WriteActions;
action CreateDocument
  appliesTo { principal: [User], resource: [Drive] };
action DeleteDocument in [WriteActions]
  appliesTo { principal: [User], resource: [Document] };
action ViewDocument in [ReadActions]
  appliesTo {
    principal: [User,Public],
    resource: Document,
    context: { network: ipaddr, browser: String }
  };

A declared action indicates either none, one, or both of the following:

  1. What action groups it is a member of, using in
  2. What principal, resource, and context types it may be coupled with in a request (if any), using appliesTo

Specifying [] for the in component is equivalent to omitting in entirely. An action that has no appliesTo component essentially names an action group: having no possible principal and/or resource types means there’s no way to submit a legal request with the action.

The appliesTo specification uses record syntax, where the "attributes" of the record may be any combination of principal, resource, and context, with the following constraints.

  • If principal and/or resource is given, the accompanying type must be either
    • a single entity type, or
    • a non-empty list of entity types
  • If a principal or resource element is not given, that means that this request component is unspecified, i.e., corresponding to the None option in the principal or resource component of a Request.
  • If context is given, the accompanying type must be a record type. If it is not given, the type {} is assumed.
  • At least one of principal, resource, or context must be included if appliesTo is present; i.e., writing appliesTo { } is not allowed.

Here are two additional examples that follow these rules:

action ViewDocument in ReadActions appliesTo {
  resource: Document,
  context: {}
};

action Ping appliesTo {
  context: { source: ipaddr, dest: ipaddr }
};

Since actions are entities, action names are entity IDs, which can be arbitrary strings. Thus we admit the following more general syntax as well.

action "Delete Document $$" in ["Write Actions"] appliesTo { principal: [User], resource: [Document] };

We anticipate future support for attributes as part of action entities. If/when they are supported, we can specify action attributes after the appliesTo part as an inline record using Cedar syntax following an attributes keyword. For example, here’s an extension of ViewDocument with action attributes.

action ViewDocument in [ReadActions]
  appliesTo {
    principal: [User,Public],
    resource: Document,
    context: { network: ipaddr, browser: String }
  }
  attributes { version: 1, group: "meta" };

As with entity declarations, since the JSON-based format does not support groups, we can add an optional groupid attribute to an action declaration in the JSON. Those actions that are in the same group will be put together when translating to the custom syntax (after confirming that they indeed have the same type), and when translating to JSON such a group ID will be automatically generated.

Common types

You can define names for types (as in Typescript, OCaml, Haskell, and many other languages) and then use them in action and entity declarations. Here is an example:

type authcontext = {
  ip: ipaddr,
  is_authenticated: Boolean,
  timestamp: Long
};
entity Ticket {
  who: String,
  operation: Long,
  request: authcontext
};
action view appliesTo { context: authcontext };
action upload appliesTo { context: authcontext };

Note the use of = for type declarations, but the lack of (required) = for entity declarations; the = is needed because the defining type may not be a record, e.g., type time = Long.

As with entity and action declarations, the name implicitly includes the surrounding namespace as a prefix.

While common types can be declared anywhere within a namespace and safely referenced by entity and action declarations, they cannot refer to one another, to avoid introducing definitional cycles. (To allow mutual reference, we could make declaration order matter: entity, action, and other type declarations cannot refer to a common type before it is defined.)

Much later note, May 2024: In the fullness of time we relaxed the above restriction and allowed common types to refer to one another, as long as there are no definitional cycles. See cedar#766.
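
For illustration, under the original restriction a (hypothetical) definition like the following would be rejected, even though it contains no cycle; after cedar#766 it is accepted:

type email = String;
type contact = { primary: email, backup: email };  // refers to another common type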

Disambiguating Types in Custom Syntax

Cedar has four kinds of type name:

  1. Primitive types, e.g., Long, String, etc.
  2. Extension types, currently just ipaddr and decimal
  3. Entity types
  4. Common types

The first two are built in, and could potentially change in the future. The last two are schema-declarable; they have an associated namespace prefix, but references within the defining namespace can drop the prefix.

In the current JSON syntax, common types may not overlap with primitive types; e.g., declaring a common type Long is disallowed. However, common types are permitted to overlap with extension types; e.g., declaring common type ipaddr is allowed. This works in the JSON because specifying an extension type requires an explicit "type" designator, as in the following:

"addr": { "ip": { "type": "Extension", "name": "ipaddr" }, }

If we also had defined a common type ipaddr we would reference it by writing

"addr": { "myip": { "type": "ipaddr" }, }

Common types may also overlap with entity types, for similar reasons: To reference an entity type you must write { "type": "Entity", "name": "Foo" } while to reference a common type you'd just write { "type": "Foo" }.

The custom syntax does not require such designators, for readability, so we need a way to disambiguate type name references when necessary.

When two different kinds of type definitions are in separate namespaces, we rely on the namespace resolution rules to disambiguate. For example, suppose we defined the following schema:

type email_address = String;               // (1)
namespace Foo {
  entity email_address { addr: String };   // (2)
  entity User { email: email_address };    // (3)
}

Here, in the definition of User, the unqualified type email_address at (3) resolves to the entity definition at (2), because it is in the closest enclosing scope.

When two different kinds of type definitions are in the same namespace, we propose the following five rules:

  1. Issue a warning (for both syntaxes) when a schema defines the same typename twice (within a namespace)
  2. Disallow overlap of extension and primitive type names
  3. Resolve name references in a priority order
  4. Reserve __cedar as a namespace to disambiguate extension/primitive types from others
  5. Handle entity/common typename overlaps in translation

These rules aim to address several (additional) tenets

  • Conflicting type names are an exception to be discouraged, rather than a norm, for readability and clarity
  • Schemas should not break when new extension types are added to Cedar
  • Backward-incompatible changes should be minimized

Rule 1: Issue a warning (for both syntaxes) when a schema defines the same typename twice

Defining an entity type with the same name as a primitive or extension type, or an entity and common type with the same name (within the same namespace), is unlikely to be a good idea. Though they can be technically disambiguated in the JSON syntax, the potential for confusion is nontrivial. None of the rules that follow need to be understood, and conversion to/from JSON and custom syntax is assured, if defined types are all kept distinct.

Rule 2: Disallow overlap of extension and primitive type names

Right now, all primitive types have different names than extension types. We propose that it should always be that way. Though the flexibility is there in today’s JSON schema to allow overlap, we see no good reason to allow it. Because we are currently the sole purveyor of extension types, this rule is easy to implement.

Rule 3: Resolve name references in a priority order

When the schema references a type name (without a namespace prefix), it should resolve in the following order:

  1. common type
  2. entity type
  3. primitive
  4. extension type

For example, suppose we declared a common type with type ipaddr = Long. If the schema later referred to ipaddr, then that reference would be to this common type definition, not the extension type, since the common type is higher in the order. This ordering ensures that future-added extension types will not break existing schemas. For example, if we added a future extension type date, then schemas that define an entity or common type of the same name would have the same semantics as before: they would refer to the schema-defined type, not the new extension type.

In tabular form, this priority order is depicted below. We write n/a for the case that the two kinds of type can never share the same name (due to the status quo and rule 2).

|           | common | entity | primitive | extension |
|-----------|--------|--------|-----------|-----------|
| common    |        |        |           |           |
| entity    | common |        |           |           |
| primitive | n/a    | entity |           |           |
| extension | common | entity | n/a       |           |

The next two rules consider how to disambiguate conflicting types.

Rule 4: Reserve __cedar as a namespace to disambiguate extension/primitive types from others

If a schema defines a common or entity type that overlaps with an extension type, there is no way to refer to the extension type -- the common or entity type always takes precedence. This is particularly problematic if a new extension type overlaps with an existing definition, and an update to the schema might wish to use both.

To rectify this problem, we propose reserving the namespace __cedar and allowing it to be used as a prefix for both primitive and extension types. This way, an entity type ipaddr can be referred to directly, and the extension type ipaddr can be referred to as __cedar::ipaddr. Likewise, this allows one to (say) define an entity type String that can be referenced distinctly from __cedar::String (though this practice should be discouraged!).
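
Here is a sketch of how the reserved namespace would be used, assuming a schema that declares its own ipaddr entity type:

entity ipaddr;                  // shadows the ipaddr extension type
entity Host {
  node: ipaddr,                 // resolves to the entity type declared above
  addr: __cedar::ipaddr         // explicitly names the extension type
};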

Rule 5: Handle entity/common typename overlaps in translation

If a schema defines an entity type and a common type with the same name, the references to that name will resolve to the common type, and there will be no way to reference the entity type in the custom syntax. Per rule 1, users will be warned of this.

Custom syntax users can rectify the situation by changing either the common type or entity type to a different name (or placing them in different namespaces). JSON syntax users can refer to both types distinctly, but will be warned that translation to the custom syntax will not be possible without changing the name. Translation tools can easily make this name change automatically.

Note that translation from JSON to custom syntax is only problematic when a schema defines both a common and entity type of the same name. Translation tools can easily address this issue by renaming the common type (and offering to change the type name in the original JSON).

Evaluating these rules against our added tenets above, we can see:

  • Conflicting type names are discouraged by issuing a warning when there is overlap
  • Due to the choice in priority order, schemas will not break when new extension types are added
  • Only rule 4 (introducing the __cedar reserved namespace) is backward incompatible, and it is very unlikely to affect existing schemas.

Rejected alternative designs for disambiguation

One alternative design we considered was to use prefixes on typename references to indicate the kind of type, e.g., extension types could be optionally prefixed with +, e.g., +ipaddr and +decimal, and entity types could be optionally prefixed with &, e.g., &User and &My::Namespace::List. However, we rejected this design because it would be unfamiliar to users (especially the use of +), and doesn't discourage the use of overlapping type names.

Another alternative we considered was to forbid type-name overlaps entirely. However, we rejected this design because it is strongly backward-incompatible, and also because it could lead to surprise if a future definition of an extension type happened to match a common or entity type in use in an existing schema.

Mockup: Document Cloud

Here is a version of the Document Cloud schema.

namespace DocCloud {
  entity User in [Group] { personalGroup: Group, blocked: Set<User> };
  entity Group in [DocumentShare] { owner: User };
  entity Document {
    owner: User,
    isPrivate: Boolean,
    publicAccess: String,
    viewACL: DocumentShare,
    modifyACL: DocumentShare,
    manageACL: DocumentShare
  };
  entity DocumentShare;
  entity Public in [DocumentShare];
  entity Drive;
  action CreateDocument, CreateGroup
    appliesTo { principal: [User], resource: [Drive] };
  action ViewDocument
    appliesTo { principal: [User,Public], resource: [Document] };
  action DeleteDocument, ModifyDocument, EditIsPrivate, AddToShareACL, EditPublicAccess
    appliesTo { principal: [User], resource: [Document] };
  action ModifyGroup, DeleteGroup
    appliesTo { principal: [User], resource: [Group] };
}

Mockup: GitHub Model

Here is a version of the GitHub schema.

namespace GitHub {
    entity User in [UserGroup, Team];
    entity UserGroup in [UserGroup];
    entity Repository {
        readers: UserGroup,
        triagers: UserGroup,
        writers: UserGroup,
        maintainers: UserGroup,
        admins: UserGroup
    };
    entity Issue { repo: Repository, reporter: User };
    entity Org { members: UserGroup, owners: UserGroup, memberOfTypes: UserGroup };
    action pull, fork, push, add_reader, add_triager, add_writer, add_maintainer, add_admin
        appliesTo { principal: [User], resource: [Repository] };
    action delete_issue, edit_issue, assign_issue appliesTo { principal: [User], resource: [Issue] };
}

Grammar

Here is a grammar for the proposed schema syntax.

The grammar uses the following conventions. Capitalized words stand for grammar productions, and lexical tokens are given in all-caps. When productions or tokens match those in the Cedar policy grammar, we use the same names (e.g., IDENT and Path).

For grammar productions it uses | for alternatives, [] for optional productions, () for grouping, and {} for repetition of a form zero or more times.

Tokens are defined using regular expressions, where [] stands for character ranges; | stands for alternation; * , + , and ? stand for zero or more, one or more, and zero or one occurrences, respectively; ~ stands for complement; and - stands for difference. The grammar ignores whitespace and comments.

Schema := {Namespace}
Namespace := ('namespace' Path '{' {Decl} '}') | {Decl}
Decl := Entity | Action | TypeDecl
Entity := 'entity' Idents ['in' EntOrTyps] [['='] RecType] ';'
Action := 'action' Names ['in' (QualName | '[' [QualNames] ']')] [AppliesTo] [ActAttrs]';'
TypeDecl := 'type' IDENT '=' Type ';'
Type := PRIMTYPE | Path | SetType | RecType
EntType := Path
SetType := 'Set' '<' Type '>'
RecType := '{' [AttrDecls] '}'
AttrDecls := Name ['?'] ':' Type [',' | ',' AttrDecls]
AppliesTo := 'appliesTo' '{' AppDecls '}'
ActAttrs := 'attributes' '{' AttrDecls '}'
AppDecls := ('principal' | 'resource') ':' EntOrTyps [',' | ',' AppDecls] | 'context' ':' RecType [',' | ',' AppDecls]
QualName := Name | Path '::' STR
QualNames := QualName {',' QualName }
Path := IDENT {'::' IDENT}
EntTypes := Path {',' Path}
EntOrTyps := EntType | '[' [EntTypes] ']'
Name := IDENT | STR
Names := Name {',' Name}
Idents := IDENT {',' IDENT}
IDENT := ['_''a'-'z''A'-'Z']['_''a'-'z''A'-'Z''0'-'9']* - PRIMTYPE
STR := Fully-escaped Unicode surrounded by '"'s
PRIMTYPE := 'Long' | 'String' | 'Bool'
WHITESPC := Unicode whitespace
COMMENT := '//' ~NEWLINE* NEWLINE

Drawbacks

There are several reasons not to develop a custom syntax:

Multiple formats can raise cognitive burden

Adding another schema format raises the bar for what customers need to know. They may ignore one format at first, but eventually they may need to learn both. For example, if one team uses the JSON format and another uses the custom format, but then the teams merge, they will each end up having to read/update the other's format. They may also fight about which format to move to.

Mitigating this problem is that it's easy and predictable to convert between the two formats, since the syntactic concepts line up very closely. The features you lose when converting from the new format to the JSON one are 1/ any comments, and 2/ any intermingling of action, entity, and type declarations, since they must all be in their own sections in the JSON. Otherwise the features line up very closely and the only difference is syntax. We would expect to write conversion tools as part of implementing the new syntax (which are easily achieved by parsing in the new format and pretty-printing in JSON).

Another mitigating point is that it's early days for Cedar, and we can promote the custom syntax as the preferred choice. JSON should be used when writing tooling to auto-author or manipulate schemas, e.g., as part of a GUI. We have a JSON syntax for Cedar policies for similar reasons, but it's the custom syntax that is front and center.

Requirement of good tooling

As a practical matter, having a custom schema syntax will require that we develop high-quality tools to help authors write correct schemas.

Parse error messages need to be of good quality, and report common issues such as missing curly braces, missing semi-colons, incorrect keywords, etc. A formatting tool can help ensure standardized presentation. An IDE plugin can put these things together to provide interactive feedback. For the JSON syntax, IDE plugins already exist to flag parse errors and to format documents uniformly.

However, JSON plugins do not understand the semantics of Cedar schemas, so they cannot provide hints about semantic errors (e.g., that setting principalTypes to [] for an action effectively means the action cannot be used in a request). We could develop a linting tool that flags these issues, and it could be applied to both syntaxes.

Note that the problem of matching up curly braces in JSON-formatted schemas is more acute than in the custom syntax, since it's common to have deep nesting levels where matching is hard to eyeball. For example, in the TinyTodo JSON schema, we have up to seven levels of direct nesting of curly braces, whereas the custom syntax has only three, and these levels are more visually isolated because of the other syntax in between them.

Greater implementation cost

Supporting a new syntax is an extra implementation cost, including the new tools mentioned above, now and going forward. More code/tools means more potential for bugs.

Mitigating this problem: The Rust internals already parse the JSON schema to a data structure, so adding a new format only involves adding a parser to this data structure and not, for example, making changes to the validator. We can use property-based testing to ensure that both formats are interconvertible, and correctly round-trip, i.e., that parse(pretty-print(AST)) == AST.
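As a sketch of what such a round-trip check might look like in Rust (the helper functions parse_custom_schema, schema_to_json, and parse_json_schema, and the SchemaAst type, are hypothetical stand-ins for the real conversion entry points; a property-based version would generate the input schema rather than hard-coding it):

// Round-trip check: custom syntax -> shared AST -> JSON -> shared AST.
#[test]
fn custom_and_json_formats_round_trip() {
    let src = "entity User; action view appliesTo { principal: [User], resource: [User] };";
    let ast: SchemaAst = parse_custom_schema(src).expect("custom syntax should parse");
    let json = schema_to_json(&ast);
    let ast2: SchemaAst = parse_json_schema(&json).expect("converted JSON should parse");
    assert_eq!(ast, ast2); // parse(pretty-print(AST)) == AST
}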

Alternatives

One alternative would be to replace the current JSON-based syntax with the one in this proposal. Doing so would avoid the "cognitive burden" drawback mentioned above, but it would be a disruptive, backward-incompatible change, and it would lose the JSON format's benefits of existing tooling and easier programmatic schema construction.

Another alternative would be to adopt a YAML-based syntax. This approach would meet our goals of greater information density and support for comments, and it would come with some existing tooling (such as IDE extensions). A downside of YAML is that it provides more than we need, and its lack of conciseness can lead to confusion. We could write our own parser for the subset of YAML we wish to support for schemas, but that may lead to a confusing user experience. YAML's indentation-sensitive parsing also means that an indentation mistake can be silently accepted, again confusing users. Our custom syntax is whitespace-insensitive, and having total control over the grammar means we can give better context in error messages.

Appendix A - AppliesTo Conversion Table

The relationship between appliesTo clauses in JSON schemas and in this new syntax is non-obvious when unspecified entities or actions with no targets are involved. This appendix gives a conversion table for the purposes of specification.

Both Unspecified

The following JSON schemas all set principal and resource to be unspecified, and leave the context as the empty record. They all map to the same schema in the new syntax.

"actions" : { "read" : { } }
"actions" : { "read" : { "appliesTo" : null } }
"actions" : { "read" : { "appliesTo" : { } } }

This means that both principal and resource are unspecified and that context is the empty record. This becomes:

action read appliesTo { context : {} };

We must include the context : {} clause, since an appliesTo clause with an empty body (action read appliesTo { };) is not valid in the grammar.

Missing Both/Yes Context

"actions" : { "read" : { "appliesTo" : { "context" : A } } }

This means that both principal and resource are unspecified and that context has type A. This becomes:

action read appliesTo { context : A };

Missing principalTypes/No Context

"actions" : { "read" : { "appliesTo" : { "resourceTypes" : [A] } } }

This means that principal is unspecified, but resource is specified, and context is the empty record. This becomes the following (including an empty context clause would also be valid):

action read appliesTo { resource : [A] };

Missing principalTypes/Yes Context

"actions" : { "read" : { "appliesTo" : { "resourceTypes" : [A] "context" : B } } }

This means that principal is unspecified, but resource is specified, and context has type B. This becomes:

action read appliesTo { resource : [A], context : B };

Missing resourceTypes/No Context

"actions" : { "read" : { "appliesTo" : { "principalTypes" : [A] } } }

This means that resource is unspecified, but principal is specified, and context is the empty record. This becomes the following (including an empty context clause would also be valid):

action read appliesTo { principal: [A] };

Missing resourceTypes/Yes Context

"actions" : { "read" : { "applieTo" : { "principalTypes" : [A], "context" : B } } }

This means that resource is unspecified, but principal is specified, and context has type B. This becomes:

action read appliesTo { principal: [A], context: B };

Non-Applicable Actions

The following all collapse to the same schema object. They all describe actions that are not applicable to any request: specifying that an action applies to some principal type but to no resource types is equivalent to it applying to no principals or resources at all.

"actions" : { "read" : { "appliesTo" : { "principalTypes" : [], "resourceTypes" : [] } } }
"actions" : { "read" : { "appliesTo" : { "principalTypes" : [], "resourceTypes" : [A] } } }
"actions" : { "read" : { "appliesTo" : { "principalTypes" : [A], "resourceTypes" : [] } } }
"actions" : { "read" : { "appliesTo" : { "principalTypes" : [], "resourceTypes" : [], "context" : A } } }
"actions" : { "read" : { "appliesTo" : { "principalTypes" : [A], "resourceTypes" : [], "context" : B } } }
"actions" : { "read" : { "appliesTo" : { "principalTypes" : [], "resourceTypes" : [A], "context" : B } } }

All of the above map to the following:

action read;

Port Cedar formalization to Lean

Timeline

  • Started: 2023-10-12
  • Accepted: 2023-10-24
  • Landed: 2023-10-26 on main
  • Released: The Dafny formalization was deprecated in cedar-spec v3.1.0 (released 2024-03-08)

Note: These statuses are based on the first version of the RFC process.

Summary

This RFC proposes to port our current Cedar models and proofs (written in Dafny) to an alternative verification tool (Lean) that is better suited for meta-theory proofs, like Cedar’s validation soundness proof.

Note: The changes proposed in this RFC do not impact the cedar-policy/cedar repository or any of its associated Rust crates. The proposed changes are restricted to the formalization available in cedar-policy/cedar-spec.

Motivation

Following a verification-guided development process, we use proofs to gate the release of Cedar's core components. Before each release, the Rust implementation of each component is differentially tested against its Dafny model, which is formally verified against key security and correctness properties.

We chose Dafny for its balance of usability and its ability to automatically prove basic security properties of Cedar's authorizer. But Cedar's meta-theoretic properties (such as validator soundness) are less suited to Dafny's automation (e.g., see this issue and associated PRs). For these proofs to be robust in terms of performance and maintenance, they need to be highly detailed, which favors the use of an interactive theorem prover. We propose porting the Cedar models and proofs to Lean, an interactive theorem prover that is well suited to large-scale proofs about meta-theory.

Detailed design

Cedar consists of three core components: (1) the evaluator, (2) authorizer, and (3) validator. The evaluator interprets Cedar policies, and the authorizer combines the results of evaluating multiple policies into an authorization decision. Cedar’s security theorems prove basic properties of the authorizer (e.g., deny overrides allow). The validator is a tool for working with Cedar policies. It type checks a policy against a schema. To assure correctness, we prove a validator soundness theorem which guarantees that if the validator accepts a policy, evaluating that policy on a valid input can never cause a type error.

All core components of Cedar are modeled in Dafny and have proofs. There are ~6,000 lines of modeling code. The proofs for the authorizer and evaluator are small: the largest is ~200 LOC. The validation soundness proof is ~2,000 LOC.

For each component, we plan to port the model first and start differentially testing it against the Rust implementation, and then port the proofs. The Dafny and Lean code will co-exist while we work on the port. Once the Lean validator proofs are complete, we will archive the Cedar Dafny folder.

We believe it is best to do the port now while our development is relatively small, and before we begin working on proving other meta-theoretic properties, like soundness of the partial evaluator (which is not yet modeled in Dafny).

Drawbacks

We will need to support both Dafny and Lean proofs for a period of time, to ensure that Cedar remains sound and there are no gaps.

Precomputed Entity Attributes

  • Related RFC(s): This RFC is mostly directly taken from RFC 28; it splits out the changes from RFC 28 that relate to precomputed entity attributes, without the larger changes in RFC 28 regarding EntityDataSource. This RFC is also the successor to RFC 23.
  • Implementation PR(s): cedar#430

Timeline

  • Started: 2023-10-24 (predecessor #28 started on 2023-10-03, and predecessor #23 started on 2023-07-18)
  • Accepted: 2023-11-14
  • Landed: 2023-11-16 on main
  • Released: 2023-12-15 in cedar-policy v3.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

Internally, Cedar entity objects will store their attribute values as precomputed Values, rather than in RestrictedExpression form. This entails some minor breaking changes to the Cedar public API, but provides important efficiency gains.

Motivation

  1. Today, every is_authorized() request re-evaluates all of the attributes in the Entities; they are stored as restricted expressions and evaluated to Value on every call to is_authorized(). (See cedar#227.) This is inefficient.

  2. Today, Entities::evaluate() is implemented on main and provides a way to mitigate this issue, but it is opt-in and we'd rather have it be the default behavior. Notably, as of this writing, Entities::evaluate() is not released yet, so we can make changes in this area without breaking our users.

Detailed design

This RFC has two components. (A third component was severed and moved to Alternative A after discussion.)

Component 1: Store attributes as precomputed Values

We redefine Entities and Entity to hold attributes as Values instead of as restricted expressions. This addresses the first point in Motivation.

This change is breaking for the public API in two (small) ways:

  • Entity::new(), Entities::from_json_*(), and friends can take the same inputs, but will need to sometimes return an evaluation error due to errors evaluating the entity attributes (particularly, errors from evaluating extension functions for attribute values of type ipaddr or decimal). Today, Entity::new() never returns an error, and the Entities functions can return EntitiesError but not an evaluation error.
  • Conversely, Entity::attr() need not return an error anymore, as all of the attributes will already be evaluated. (If we are very concerned about backward compatibility, we could keep the current signature and just never return Err. Currently, this RFC is proposing we change the signature to not contain Result, since we're already making the related breaking change to Entity::new().)

Accepting these breaking changes allows us to give users the best-performing behavior by default. For alternatives that are less breaking, see Alternatives.
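To illustrate the first change, here is a sketch of entity construction after this RFC. It assumes the v3-style cedar-policy API (Entity::new taking a map of RestrictedExpressions, with EntityUid and RestrictedExpression both parseable from strings); the Host entity type and addr attribute are hypothetical.

use std::collections::{HashMap, HashSet};
use cedar_policy::{Entity, EntityUid, RestrictedExpression};

fn build_host_entity() -> Result<Entity, Box<dyn std::error::Error>> {
    let uid: EntityUid = r#"Host::"web-1""#.parse()?;
    let addr: RestrictedExpression = r#"ip("10.0.0.1")"#.parse()?;

    let mut attrs = HashMap::new();
    attrs.insert("addr".to_string(), addr);

    // Under this RFC, the ip(...) call is evaluated here, so Entity::new can
    // return an evaluation error instead of deferring it to is_authorized().
    let entity = Entity::new(uid, attrs, HashSet::new())?;
    Ok(entity)
}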

Component 2: Remove no-longer-necessary interface to precompute Values

We remove Entities::evaluate(), as all Entities are now stored in their evaluated form by default. This addresses the second point in Motivation.

This is not a breaking change because Entities::evaluate() has not yet been released in any Cedar release.

Drawbacks

  • This RFC represents some breaking changes for users upgrading to a new major version of Cedar. All breaking changes come with some cost in terms of user experience for existing users. This RFC contends that the benefit outweighs the cost in this case.

Alternatives

Alternative A

In addition to what is currently proposed in this RFC, we could add the following change, allowing users to construct entities using precomputed Values.

Motivation

Today, Entity::new() requires callers to pass in attribute values as RestrictedExpression. For some callers, evaluating these RestrictedExpressions is an unnecessary source of runtime evaluation errors (and performance overhead).

Detailed design

We add new constructors for Entities and Entity which take Values instead of RestrictedExpressions. These constructors would work without throwing evaluation errors, in contrast to the existing constructors after the changes described in the main part of this RFC.

This change requires that we expose Value in our public API in some manner (probably via a wrapper around the Core implementation, as we do for many other Core types). Note that today, EvalResult plays the role of Value in our public API, but conversion between EvalResult and Value is relatively expensive (O(n)). We could do one of the following:

  • expose Value but leave EvalResult alone. This provides maximum backwards compatibility, but might cause confusion, as Value and EvalResult would be conceptually similar, but slightly different and not interchangeable.
  • consolidate EvalResult and Value into a single representation, called Value and based on the Core Value. This provides maximum API cleanliness but requires the most breaking changes.
  • expose Value, but keep EvalResult as a type synonym for Value. This is almost the same as the above option, but keeps the EvalResult type name around, even though it makes breaking changes to the structure of EvalResult. This might reduce confusion for users who are migrating, as they only have to adapt to the changes to EvalResult and not rename EvalResult to Value in their own code; but, it might increase confusion for future users, who will see two names for the same thing.
  • not expose Value and just use EvalResult, even in the new constructors. This provides maximum backwards compatibility, like the first option, and also avoids the possible confusion generated by the first option, but comes at a notable performance cost (due to the inefficient EvalResult-to-Value conversion required under the hood) and might generate a different kind of confusion as EvalResult is an awkward name in the context of these constructors.

Commentary

The sub-proposal in this alternative is severable and could be its own RFC, or perhaps a change that doesn't rise to the level of requiring an RFC. We have the freedom to do it any time in the future (after the rest of the RFC is accepted and implemented) if we don't want to do it now.

@mwhicks1's opinion, which I tend to agree with, is that this alternative is probably more trouble than it's worth -- that it provides only marginal benefit to users at the cost of cluttering our API, both with additional constructors and with the public Value type.

If we don't take this alternative, users will still have to construct Entity and Entities via restricted expressions, using constructors which can throw errors. But we would avoid having to expose Value or make changes to EvalResult, at least for now.

Alternative B

In addition to what is currently proposed in the current RFC, we could provide the status-quo definitions of Entities and Entity under new names: e.g., UnevaluatedEntities and UnevaluatedEntity (naming of course TBD). This is very unlikely to be of any use to new users, but existing users really wanting the old error behavior could use these types and functions instead of Entities and Entity.

To me, this provides little benefit, at the expense of more confusion and more APIs to maintain. It also doesn't make any of the changes more back-compatible; for users, migrating their code to use these renamed types and functions may be just as much work as adopting the actual breaking changes proposed in this RFC.

Alternative C

Instead of changing Entities and Entity as suggested in the current RFC, we could provide new versions of these types -- say, EvaluatedEntities and EvaluatedEntity -- which have the behaviors proposed in the current RFC. This avoids making any breaking changes to existing types and functions.

The downsides of this alternative include:

  • Existing users don't get the improved performance by default (see Motivation). They have to opt-in by migrating their code to these new types and functions.
  • Whether or not we officially deprecate the existing Entities and Entity, new users may incorrectly assume that they should default to using those types, as they have the most-intuitive / least-qualified names, and types like EvaluatedEntities might appear to be for special cases.
  • If we do not officially deprecate the existing Entities and Entity, we are stuck maintaining more APIs indefinitely. These APIs are also somewhat redundant and provide multiple ways to do approximately the same thing, some more optimally than others.
  • If we do officially deprecate the existing Entities and Entity, we end up in a final state where we only have EvaluatedEntities and no other kind of Entities, which leaves an awkward and somewhat nonsensical API from a naming standpoint.

Annotations for Cedar Schemas

Timeline

  • Started: 2024-02-05
  • Accepted: 2024-11-07
  • Landed: 2024-12-16 on main
  • Released: 2025-01-21 in cedar-policy v4.3.0

Note: These statuses are based on the first version of the RFC process.

Summary

Like Cedar policies, users may want to associate arbitrary, machine-readable metadata with schema objects. We solved this problem in Cedar policies by allowing for annotations: arbitrary key/value pairs that are attachable to policies. This could be extended to Cedar schemas, allowing users to attach annotations to entity type, common type, and action declarations, as well as to attribute declarations.

Basic example

Here is a basic example for doc comments.

@doc("this is the namespace") namespace TinyTodo { @doc("a common type representing a task") type Task = { @doc("task id") "id": Long, "name": String, "state": String, }; @doc("a common type representing a set of tasks") type Tasks = Set<Task>; @doc("an entity type representing a list") @docComment("any entity type is a child of type `Application`") entity List in [Application] = { @doc("editors of a list") "editors": Team, "name": String, "owner": User, @doc("readers of a list") "readers": Team, "tasks": Tasks, }; @doc("actions that a user can operate on a list") action DeleteList, GetList, UpdateList appliesTo { principal: [User], resource: [List] }; }

The @doc("...") notation mirrors the @id("...") notation used for policy annotations.

Motivation

Users should be allowed to associate machine readable metadata with objects in a Schema. While we could create special syntax for associating particular kinds of metadata, we cannot predict all of the metadata uses that users will have. Thus providing a flexible system that users can adapt to their needs is preferable. This proposal re-uses the same syntax from Cedar Policies, creating a unified syntax.

Detailed design

Semantics

Annotations have no impact on validation decisions. Annotations are arbitrary key/value pairs where:

  • 'key' is a valid Cedar identifier
  • 'value' is a Cedar string

The Cedar spec takes no opinion or stance on the interpretation of annotations. The interpretation is entirely up to users of Cedar.

Cedar Schema Syntax

Annotations in Cedar schemas will mirror the syntax used for annotations in a policy: informally, that's @<key>("value"). Formally, the following rule is added to the Cedar schema grammar:

Annotation := '@' IDENT '(' STR ')'
Annotations := {Annotation}

An arbitrary number of annotations may be prepended to a namespace declaration, entity type declaration, common type declaration, action declaration, or attribute declaration.

Thus the full schema syntax becomes:

Schema := {Namespace}
Namespace := (Annotations 'namespace' Path '{' {Decl} '}') | {Decl}
Decl := Entity | Action | TypeDecl
Entity := Annotations 'entity' Idents ['in' EntOrTyps] [['='] RecType] ';'
Action := Annotations 'action' Names ['in' (Name | '[' [Names] ']')] [AppliesTo] [ActAttrs]';'
TypeDecl := Annotations 'type' IDENT '=' Type ';'
Type := PRIMTYPE | IDENT | SetType | RecType
EntType := Path
SetType := 'Set' '<' Type '>'
RecType := '{' [AttrDecls] '}'
AttrDecls := Annotations Name ['?'] ':' Type [',' | ',' AttrDecls]
AppliesTo := 'appliesTo' '{' AppDecls '}'
ActAttrs := 'attributes' '{' AttrDecls '}'
AppDecls := ('principal' | 'resource') ':' EntOrTyps [',' | ',' AppDecls] | 'context' ':' RecType [',' | ',' AppDecls]
Path := IDENT {'::' IDENT}
EntTypes := Path {',' Path}
EntOrTyps := EntType | '[' [EntTypes] ']'
Name := IDENT | STR
Names := Name {',' Name}
Idents := IDENT {',' IDENT}
Annotation := '@' IDENT '(' STR ')'
Annotations := {Annotation}
IDENT := ['_''a'-'z''A'-'Z']['_''a'-'z''A'-'Z''0'-'9']* - PRIMTYPE
STR := Fully-escaped Unicode surrounded by '"'s
PRIMTYPE := 'Long' | 'String' | 'Bool'
WHITESPC := Unicode whitespace
COMMENT := '//' ~NEWLINE* NEWLINE

JSON Syntax

None of the three top-level constructs (EntityTypes, Actions, CommonTypes) in JSON schemas allows arbitrary key/value pairs, which means a new key can be safely added while preserving backwards compatibility. The same is true of entity attribute declarations. This proposal reserves the annotations key at the top level of each of those constructs; its value is an object mapping each annotation key to its annotation string. The only oddness here is common types, whose top level is a regular type. While this is still backwards compatible, it will look a little odd to have annotations in some types and not in others.

A corresponding JSON schema for the above example is as follows.

{ "": { "annotations": { "doc": "this is the namespace" }, "commonTypes": { "Task": { "annotations": { "doc": "a common type representing a task" }, "type": "Record", "attributes": { "id": { "type": "Long", "annotations": { "doc": "task id" }, }, "name": { "type": "String" }, "state": { "type": "String" } } }, "Tasks": { "type": "Set", "element": { "type": "Task" } } }, "entityTypes": { "Application": {}, "List": { "annotations": { "doc": "an entity type representing a list", "docComment": "any entity type is a child of type `Application`" }, "memberOfTypes": [ "Application" ], "shape": { "type": "Record", "attributes": { "editors": { "type": "Team" }, "name": { "type": "String" }, "owner": { "type": "User" }, "readers": { "type": "Team" }, "tasks": { "type": "Tasks" } } } } }, "actions": { "CreateList": { "annotations": { "doc": "actions that a user can operate on a list" }, "appliesTo": { "resourceTypes": [ "Application" ], "principalTypes": [ "User" ] } } } } }

Drawbacks

  1. Complexity: adds more complexity to schema
  2. Oddness around syntax for Common Types in JSON form
  3. By not taking a stance on annotation meanings, it makes it harder for a standard to form around them (ex: for doc strings)
  4. Multi-line docstrings are technically valid but awkward.

Alternatives

Take a stance

Reverse our decision around annotations and start taking stances on what annotations mean. This lets us standardize certain annotations, like doc. This probably can't happen unless we also do this for policies, which we've said we don't want to do.

Doc Strings as comments

Instead of annotations, we could add "doc-strings" as a first-class feature. They could look like this:

/# Stop users from accessing a high security document unless:
/# A) The principal and user are at the same location
/# B) The principal has a job level greater than 4
forbid(principal, action, resource)
when { resource.security_level == "HIGH" }
unless {
    (resource.location == principal.location) ||
    (principal.job_level > 4)
};

This has nice and easy multi-line syntax, but is special cased and not as general.

Reserve Identifiers for Internal Use

Timeline

  • Started: 2024-02-14
  • Accepted: 2024-06-10
  • Landed: 2024-07-15 on main
  • Released: 2024-09-16 in cedar-policy v4.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

This RFC extends the reservation of the __cedar namespace in schemas from RFC24 and proposes reserving all namespaces and identifiers whose first element is __cedar, forbidding their use in Cedar policies, schemas, and entity data so that they can be used by future Cedar language features.

Basic example

Under this RFC the names __cedar, __cedar::Long, and __cedar::namespace::User will all become invalid in all contexts.

These would be rejected in a policy

permit(principal == __cedar::User::"alice", action, resource);

in a schema

namespace __cedar { entity User; }

and in entity data

[ { "uid": { "__entity": { "type": "__cedar::User", "id": "alice" }}, "attrs": {}, "parents": [] } ]

In this entity data example, the __entity JSON key does not demonstrate the new reserved-namespace mechanism. It is a place where we already assign special significance to some identifiers starting with __ in entity data.

Motivation

In designing RFC24 we found that we needed to disambiguate user-defined identifiers from built-in primitive and extension types. We accomplished this by reserving the __cedar namespace only inside of schemas, but we anticipate that we may need more reserved namespaces in more contexts in the future.

Detailed design

We will update the namespace grammar to the following to exclude namespaces and identifiers starting with __cedar.

Path ::= IDENT {'::' TRAILING_NAMESPACE_IDENT}
IDENT ::= TRAILING_NAMESPACE_IDENT - '__cedar'
TRAILING_NAMESPACE_IDENT ::= ['_''a'-'z''A'-'Z']['_''a'-'z''A'-'Z''0'-'9']* - RESERVED
RESERVED ::= 'true' | 'false' | 'if' | 'then' | 'else' | 'in' | 'like' | 'has'

This primarily applies to entity type names. In the case where an entity type isn't qualified with a namespace, the leading identifier is just the entity type name, so we would forbid an unqualified type __cedar. It also applies anywhere else namespaces may appear, including the top-level namespace element in a schema file, qualified references to common types in a schema file, and extension function names in policies.

It also applies where a standalone identifier is expected. For example, an entity attribute could not be __cedar. Reserving this single attribute is not on its own particularly useful, but we may later extend the attribute grammar to accept namespace paths.

Drawbacks

This is a significant breaking change. Anyone using the __cedar namespace will need to update their policies, schemas, and entity data to use an unreserved namespace. Fortunately, the nature of this breaking change allows for migration away from reserved namespaces before the change is released, and the break will manifest immediately on attempting to parse an affected policy, schema, or entity.

Alternatives

A larger breaking change

We could make a larger breaking change by reserving identifiers starting with __ instead of just __cedar. For example, we would additionally forbid referencing an entity __tiny_todo::User.

We don't currently see an immediate need for this more aggressive change, and it increases the likelihood a current user will be affected. We may also find that consumers of Cedar who expose the policy language to their users want to reserve namespaces for their own use. They could of course choose a namespace with any leading characters, but they may prefer the uniformity between __cedar::Long and, for example, __tiny_todo.

No breaking change

We can avoid a breaking change to Cedar by updating the grammar to allow for a new identifier that is currently invalid, and then reserving that identifier. For example, we could reserve $cedar or cedar$ (other options for characters include % or !). It would be an error to use this identifier in any way that isn't explicitly permitted by the Cedar spec.

Taking this option would require an update to the schema grammar proposed in RFC24.

Enumerated Entity Types

Timeline

  • Started: 2024-02-20
  • Accepted: 2024-03-20
  • Landed: 2025-02-05 on main
  • Released: TBD

Note: These statuses are based on the first version of the RFC process.

Summary

Extend schemas to support declared enumerations of entity-typed values, analogous to how schemas can currently be used to enumerate a finite list of Action-typed values.

Basic example

An enumerated entity type is declared as a normal entity type, but includes the keyword enum followed by the list of legal entity UIDs. Here is a simple example:

entity User; entity Color enum ["Red", "Blue", "Green"]; entity Task { owner: User, name: String, status: Color }; action UpdateTask appliesTo { principal: [User], resource: [Task] };

These data definitions could be used in policies such as

permit( principal, action == Action::"UpdateTask", resource) when { principal == resource.owner && resource.status != Color::"Red" };

Motivation

Enumerated types are useful when you have a fixed set of possible values, and the only thing you want to do with the values is compare them for equality. While you could effectively treat an entity type as an enumeration now, without declaring it in a schema, you gain some benefits by declaring it:

  • The validator can error on uses of illegal enumerated values, e.g., flagging the typo resource.status != Color::"red" in the when clause in the basic example.
  • When using a policy analyzer, it can always generate request and entity store instances where the enumerated entity type has declared-valid values, rather than random UIDs.
  • When using an IDE or web-based policy builder, the enumeration can inform auto-completion suggestions. For the basic example above, in an auto-completing IDE writing resource.status != ... would pop up the three options.

Detailed design

An enumerated entity Foo is declared by writing

entity Foo enum [ … ];

where [] is a non-empty list of allowed values, expressed as strings.

In the JSON format for schemas, you would write

"entityTypes": { ... "Foo": { "enum": [ … ] }, ... }

You can use an enumerated entity type anywhere you can use a normal entity type. Since an enumerated entity cannot have attributes, nor can it have ancestors in the entity hierarchy, all you can do with it is test it for equality (e.g., with == or contains).
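For example, with the Color and Task declarations from the basic example, a policy can compare an enumerated value for equality or test set membership with contains:

permit(principal, action == Action::"UpdateTask", resource)
when { [Color::"Red", Color::"Blue"].contains(resource.status) };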

The policy validator confirms that any enumerated entity literal in a policy is valid. The request validator does likewise. The entity store validator confirms that enumerated entities do not appear in the store, or if they do, they have no attributes or ancestors. It also confirms that the declared enumerated entity type has no invalid values, and references to entities of the enumerated type are valid. The schema-based entity parser likewise confirms that parsed-in enumerated entity values are valid.
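For instance, given the Color declaration from the basic example, the entity store validator would reject a (hypothetical) store entry like the following, both because "Purple" is not one of the declared values and because the entry gives the enumerated entity an attribute:

[ { "uid": { "__entity": { "type": "Color", "id": "Purple" } }, "attrs": { "hex": "#800080" }, "parents": [] } ]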

As another example, consider the following.

entity Application enum [ "TinyTodo" ]; entity User in [ Application ]; action CreateList appliesTo { principal: [User], resource: [Application] };

This is a generalization of our TinyTodo example from RFC 24, where we can refine the definition of Application to indicate that it has a single value, Application::"TinyTodo". This allows the validator to catch typos in policies, such as the following.

permit( principal, action in [Action::"CreateList"], resource == Application::"TinyTODO" );

Likewise the request validator would flag a request with Application::"TinyTODO" as its resource, and it would flag the passed-in entity store if it contained such an illegal value.

Notes

As a convention, our example enumerated entity names, like Color::"Red" or Application::"TinyTodo", all begin with an uppercase letter. We choose to consider this approach good style, but not to mandate it.

We require entity enumerations to be given as a non-empty list of strings, like ["Red", "Blue"], but we could also allow them to be specified as identifiers, like [ Red, Blue ]. Doing so would be similar to the handling of attributes, which can be specified as identifiers, principal.owner, or as strings, principal["owner"]. However, entity enumerations can only be referenced as strings, e.g., as Color::"Red" not Color::Red. Specifying them as strings, only, makes this connection a little stronger.

We do not permit declaring empty enumerations. Allowing them would add complication to policy analysis (to consider the exceptional case), but would be essentially useless: You could never create an entity of type Foo, where Foo is uninhabited, and while you could write expr is Foo, this expression is always false.

That an entity is an enumeration is specified as a refinement when declaring the entity type, e.g., writing entity Application; declares the Application entity type, while writing entity Application enum ["TinyTodo"]; declares the Application entity type and then refines it to say that only the "TinyTodo" entity ID is well defined. An alternative syntax that is more intuitive to some readers is entity enum Application ["TinyTodo"]. This syntax is similar to Java-style syntax, enum Application { TinyTodo }. However, this approach could create some confusion: enum is currently a valid entity type, so it's legal to write entity enum; in schemas today. Moreover, if we eventually take Alternative C, below, we may allow enum to be accompanied by other type refinements, such as in and attribute declarations. For example, we could one day be able to write entity Application in [Application] enum ["TinyTodo", "Office"] (or swapping their order, entity Application enum ["TinyTodo", "Office"] in [Application]), and might prefer the uniformity of that to entity enum Application ["TinyTodo", "Office"] in [Application].

Drawbacks

One reason not to do this is that it's not particularly full featured---you cannot do anything useful with an enumerated entity value in a policy other than compare it for equality. We consider more full-featured extensions in the alternatives below, but these have drawbacks of their own. The functionality could be easily extended later, depending on how things play out.

Alternatives

Alternative A: Enumerated primitive values

We previously proposed, in RFC 13, declaring a finite set of primitive values (strings, numbers, etc.) as an enumerated type. The killer objection to the RFC is that it introduces subtyping (you want to use the enumerated type of string as both the enumeration and the string, e.g., with like), which is a significant complication for policy analysis. The present proposal is not problematic for analysis as enumerated entities can be encoded like any entity, but with added ground constraints limiting what form their UIDs can take on.

Alternative B: Enumeration as a distinct concept

Rather than specify a particular entity type as an enumeration, we could define a new concept of enumeration as another kind of primitive type. Here is a notional schema:

enum type Color = Red | Blue | Green; entity Task { name: String, status: Color };

Here is a notional policy:

permit( principal, action == Action::"UpdateTask", resource) when { resource.status != Color.Red };

This syntax is similar to what's provided for Java enums.

The benefit of this approach is that it may feel a little more natural than representing a concept, like a color, as a set of legal entity values. It would also be easy to encode this approach in a policy analysis.

The main drawback of this approach is that it introduces a new primitive type to the language that does not make much sense without schemas. Recall that for Cedar, validation with a schema is optional. We'd allow users to include any random enumerated identifiers (like Scheme-style symbols) in policies which, without validation, would fail equality checks at run-time.

The proposed approach that lists particular entity values as an enumeration makes no change to the language, leveraging the existing entity type concept. So policies without schemas still make sense. With the added schema, there is additional benefit when validating and constructing reasonable policies.

Alternative C: Enumerated entities with hierarchy

Enumerated entities as proposed are limited in their functionality and specificity. We could extend them. For example:

entity Application enum [ "TinyTodo" ]; entity RequestEntity enum [ "Principal", "Resource" ] in [ Application ]; entity User in [ Application ]; action CreateList appliesTo { principal: [User], resource: [Application] }; action GetLists appliesTo { principal: [User, RequestEntity::"Principal"], resource: [Application]};

This differs from some earlier examples in the following ways:

  1. Enumerated entities can have parents that are enumerated entity values, and can be parents of other enumerated entity values; both cases are shown in the definition of RequestEntity
  2. Enumerated entities can appear as singleton types, e.g., as RequestEntity::"Principal" in the definition of action GetLists.

Both of these extensions are similar to what's available for Actions right now, but generalized to arbitrary entity types. You could also imagine enumerated entity types having attributes, as has been anticipated for Actions.

One drawback of this Alternative is that it creates added complication for both the validator (e.g., to handle singleton types) and the analyzer (e.g., to deal with the more general hierarchy constraints).

Another drawback is that it adds complication to the entity store: Once you add hierarchy constraints and attributes, you need to create actual entities to include with policy requests, by extracting them from the schema at request-time. The RFC as proposed does not require that.

The good news is that this Alternative is a strict generalization of the RFC as proposed, which means if we agree to the current proposal we can later upgrade to some or all of this Alternative without incurring a breaking change.

Explicit Unspecified Entities

Timeline

  • Started: 2024-02-27
  • Accepted: 2024-05-28
  • Landed: 2024-06-19 on main
  • Released: 2024-09-16 in cedar-policy v4.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

Cedar currently supports unspecified entities, which are entities of a special, unique type that have no attributes, and are not ancestors or descendants of any other entity in the store. Unspecified entities are intended to act as placeholders for entities that don’t influence authorization (see examples below).

The current implementation of unspecified entities results in several corner cases in our code (especially the validator), which have been difficult to maintain (see discussion on this recent PR). The concept of unspecified entities is (in our opinion) confusing for users, and will only become more confusing once we stabilize the partial evaluation experimental feature, which allows “unknown” entities that are subtly different from “unspecified” entities.

This RFC proposes to drop Cedar support for unspecified entities, instead recommending that users create their own application-specific entity types that act like unspecified entities.

Basic example

Consider a "createFile" action that can be used by any User, as encoded in the following policy:

permit(principal is User, action == Action::"createFile", resource);

When making an authorization request for this action, the principal should be a User, and the context can store information about the file to create, but it is unclear what the resource should be. In fact, given the policy above, the resource doesn't matter. In this case, a Cedar user may want to make the authorization request with a “dummy” resource; as of Cedar version 3.x, this is what unspecified entities are for.

Now consider a "createAccount" action. It's unclear what principal or resource should be used with this action. The approach above can be extended to this case too: make the action apply to an unspecified principal and resource. But now "unspecified" entities are being used to represent multiple distinct concepts: a file store, (unidentified) users creating an account, and a system that manages accounts.

This RFC proposes that users should instead explicitly create these "dummy" entities in an application-specific manner. For this example, the user should create entities of three types: FileSystem, UnauthenticatedUser, and AccountManager. The first will act as the resource for a "createFile" request, and the second and third will act as the principal and resource for a "createAccount" request.

This is (arguably) the approach we used in the TinyTodo application: we made a special Application::“TinyTodo” entity to act as the resource for the “CreateList” and “GetLists” actions. This alternative is also in line with our previous suggestion to use an application-specific Unauthenticated type to represent "unauthenticated" users, rather than using unspecified entities.

Motivation

Unspecified entities are a confusing concept, as evidenced by messy special casing in our own code and multiple RFCs to refine their behavior (RFC35, RFC47). We expect that this concept will only become more confusing once we stabilize the partial evaluation experimental feature, which allows “unknown” entities that are subtly different from "unspecified" entities, despite sharing similar names.

Unspecified entities also lead to two sharp corners in our public API:

  1. Omitting an appliesTo field in the schema (or setting it to null) means that an action applies only to unspecified entities, while using an empty list means that it applies to no entities. There is no way to say that an action applies to both unspecified entities and other types of entities. For more discussion, see RFC35. (Note: RFC35 has been closed in favor of the current RFC.)
  2. It is unclear how we should create Requests that use both unspecified and unknown entities. This has not been an issue so far since partial evaluation and its APIs are experimental, but we are looking to stabilize them soon. Currently the best (although not ideal) proposal is to use a builder pattern for Requests (e.g., request.principal(Some(User::"alice")) sets the principal field of a request). To set a field to be “unspecified” users could pass in None instead of Some(_); to set a field to be unknown users would not call the field constructor at all.

Aside: Unspecified vs. unknown

Unspecified and unknown entities are both pre-defined "dummy" entities that act like any other entity, as long as they are unused or only used in a trivial way (e.g., principal == principal). If a policy uses an unknown entity in a non-trivial way, then evaluation will produce a residual. If a policy uses an unspecified entity in a non-trivial way, then evaluation will produce an error.

Detailed design

Status quo

As of Cedar v3.x, the only way to construct an unspecified entity is by passing None as a component when building a Request. So a request for the "createFile" action above might look like:

{
    principal: Some(User::"alice"),
    action: Some(Action::"createFile"),
    resource: None,
    ...
}

In the schema, the only way to say that an action applies to an unspecified entity is by omitting the relevant appliesTo field. For example, the following schemas specify that the "createFile" action applies (only) to an unspecified resource.

Cedar schema syntax (RFC 24):

action createFile appliesTo { principal: [User] };

Cedar schema JSON syntax:

"createFile": { "appliesTo": { "principal": ["User"] } }

Proposal

The proposal is simple: stop allowing the syntax above. The Request constructor will no longer accept optional arguments and schemas will no longer support missing appliesTo fields. The Request constructor API will be changed to require an EntityUid (instead of Option<EntityUid>) for the principal/action/resource, requiring code changes in programs using Cedar's Rust APIs. The schema APIs will not change, but omitting an appliesTo field will result in a parse error.
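For illustration, here is a sketch of constructing a request under the proposed API. It assumes the signature described above (concrete EntityUids plus a Context and an optional Schema); the FileSystem resource type is the application-specific stand-in from the basic example.

use cedar_policy::{Context, EntityUid, Request};

fn create_file_request() -> Result<Request, Box<dyn std::error::Error>> {
    // Every component is now a concrete EntityUid; there is no None option.
    let principal: EntityUid = r#"User::"alice""#.parse()?;
    let action: EntityUid = r#"Action::"createFile""#.parse()?;
    let resource: EntityUid = r#"FileSystem::"default""#.parse()?;
    let request = Request::new(principal, action, resource, Context::empty(), None)?;
    Ok(request)
}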

Relation to RFC53

This RFC combines nicely with the (accepted) RFC53 (Enumerated Entity Types), which allows users to fix the instances for a particular entity type. This means that custom "unspecified" entities can be easily restricted to a small set of possible entities. For example, RFC53 allows specifying that only Application::"TinyTodo" is valid, and no other Application::"eid".

Upgrade strategy

We appreciate that this RFC proposes a breaking change for Cedar's users, and the upgrade path is non-trivial because it requires users to look carefully at their applications, and decide what they should use instead of unspecified entities. To help, while implementing this RFC, we plan to develop a blog post with examples of applications where you may have used an "unspecified" entity, and what we think you should use instead (suggestions for examples are welcome!).

We also plan to provide APIs (hidden behind a flag, like our experimental features) for the following operations, with the same API and behavior that was available in 3.x.

  • Parsing a schema in the Cedar schema format
  • Parsing a schema in the Cedar JSON schema format
  • Constructing a request

This will allow services that build on top of Cedar (e.g., Amazon Verified Permissions) to handle this upgrade for their customers invisibly. The implementation will follow the approach outlined in Alternative B. However, we will not maintain these APIs indefinitely so you will need to support the new changes eventually. Please reach out to us if there's a way that we can help.

Alternatives

Alternative A - Predefine a special "unspecified" entity type

Continue to support unspecified entities, but re-brand them as "default" entities, with the intuition that you should use them when you want some default value to pass into an authorization request. This alternative proposes to give an explicit name to the unspecified entity type (__cedar::Default) and to require this name to be used in requests and schemas that use this type.

Under this proposal, the schema definition for the "createFile" action from above would look like:

action createFile appliesTo { principal: [User], resource: [__cedar::Default] };

and a (pseudo-syntax) request for this action might be:

{
    principal: EntityUid(User::"alice"),
    action: EntityUid(Action::"createFile"),
    resource: Default,
    ...
}

where the type of the Request constructor is:

pub enum RequestEntity {
    Default,
    Unknown,
    EntityUid(EntityUid),
}

pub fn new(
    principal: RequestEntity,
    action: RequestEntity,
    resource: RequestEntity,
    context: Context,
    schema: Option<&Schema>,
) -> Result<Self, RequestValidationError>
...

(Note: the Unknown variant may be hidden behind the "partial-eval" experimental flag, depending on when this RFC is implemented.)

Users will not be allowed to include entities of type __cedar::Default in the store, which ensures that these entities will never have attributes or ancestors/descendants, which are properties we will rely on during validation. Users will not be allowed to reference the type __cedar::Default in their policies (so neither principal == __cedar::Default::"principal" nor principal is __cedar::Default are allowed). Users do not need to add __cedar::Default to their schemas (and it will be an error if they do) because __cedar::Default is defined automatically. Finally, users cannot create entity uids with type __cedar::Default, which prevents using default entities with the RequestEntity::EntityUid constructor.

Naming specifics

The names RequestEntity and __cedar::Default are both subject to change.

For the latter, we have agreed that we want to use the __cedar prefix (reserved as part of cedar#557 in v3.1), but the exact type name is up for debate. Options we've considered include:

  • None
  • Null
  • Ghost
  • Dummy
  • Empty
  • Irrelevant
  • Generic

Here are some options we’ve ruled out:

  • Any: potential to confuse the bare scope condition principal with principal is __cedar::Any. The first applies to any principal, while the latter applies only to the default entity.
  • Arbitrary: same issue as "any"
  • Unspecified: potential for confusion with partial evaluation "unknown"
  • Mock: implies that this entity type should be used for debugging/testing

Discussion notes

We considered this alternative seriously, and it even became the main proposal for a period of time. But ultimately we decided that this alternative feels like a temporary solution, and we would eventually want to get rid of unspecified/default entities anyways.

Here's a nice (lightly edited) summary from @max2me:

Both the main proposal and Alternative A are breaking changes, so I'd rather go a step forward and require customers to update schema, code, and policies once for the benefit of a brighter future (Alternative A would require customers to update just schema and code, yet all the same testing will have to occur).

My concerns with Alternative A:

  • Lack of symmetry -- you omit something in the policy and yet explicitly mention it in the authorization code and schema. This leads to a fragmented experience of working with Cedar as a technology.
  • Surprise factor -- I suspect that most customers who leave principal/resource unscoped in policies will not realize that in addition to all the entities they reference in their code, there is a hidden default type.

In my opinion, occasional duplication of policies and the need to define your own default type is a small cost to pay for the overall simplicity and cohesiveness of the system.

Alternative B - Maintain the status quo, but clean up the implementation

Pre-define the __cedar::Default entity type as in Alternative A, but do not allow users to reference it directly. Instead, update the underlying implementation to use the new type while leaving the current API as-is. We took this approach in the Lean model to avoid special handling for unspecified entities, and it seems to have worked out well. Differential testing has given us confidence that it leads to the same behavior as our current implementation.

Note: This alternative should result in no changes from the user’s perspective. If it does result in changes (e.g., to validation results) then this was unintentional behavior of our current implementation, and should be treated as a bug fix.

General multiplication in Cedar

Timeline

  • Started: 2024-03-07
  • Accepted: 2024-03-15
  • Landed: 2024-03-20 on main
  • Released: 2024-03-29 in cedar-policy v3.1.2

Note: These statuses are based on the first version of the RFC process.

Summary

Allow multiplication of arbitrary expressions (that evaluate to Long), not just multiplication of an expression by a constant.

Basic example

permit(principal, action, resource) when { context.foo * principal.bar >= 100 };

Motivation

Today, Cedar only supports multiplication of two operands if (at least) one operand is a constant. In the more general n-ary (chained) multiplication case, Cedar requires that at most one operand is a non-constant, but allows the constants and non-constant to appear in any order.

  • This creates some confusion for Cedar users. No other popular programming language has this restriction on multiplication.
  • This leads to bugs and confusing behavior. To enforce this restriction, Cedar's internal AST node for multiplication only supports multiplying an expression by a constant. But this requires the Cedar parser to re-associate (and re-order) operands in a chained multiplication, which leads to confusion and problems as described in RFC 50.
  • This limits the power of Cedar by preventing users from writing policies involving multiplication where neither operand is known at policy-authoring time.

This RFC proposes to relax all of these restrictions and allow multiplication of arbitrary expressions (that evaluate to Long). In other words, multiplication will be treated just like addition: * will be valid in all positions and situations where + is valid today.

Detailed design

This will actually simplify the evaluator code, validator code, and internal AST. Multiplication will become an ordinary BinaryOp, and no longer need a special AST node. The evaluator and validator can handle multiplication in the same way as addition and in the same code paths.

With this RFC, the policy parser becomes strictly more permissive: All policies that currently parse will still parse, plus additional policies that used to produce parse errors will now parse as well. There are edge cases where evaluations that do not overflow today could result in overflows after this RFC -- see Drawbacks below.
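
For illustration, a policy like the following (with hypothetical attribute names, all assumed to have type Long) produces a parse error today because it multiplies two non-constant operands, but would parse and validate under this RFC:

permit(principal, action, resource)
when { principal.discountRate * resource.basePrice <= context.budget };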

Drawbacks

One motivation for the original restriction on multiplication was to make SMT analysis of Cedar policies easier. However, presuming that the SMT analysis uses bitvector theory to analyze operations on Cedar Longs, multiplication by a general expression is not much more expensive than multiplication by an arbitrary constant (ignoring special cases such as 0 or constant powers of 2). Some multiplication expressions will be expensive to analyze (computationally), but that is already true for some other kinds of Cedar expressions that don't involve multiplication. In general, we'd prefer to use a linter to warn users about (these and other) expensive expressions, rather than use restrictions on the Cedar language.

This RFC firmly closes the door on changing Cedar's default numeric type to bignum. However, that door is already basically closed. Also, we could still introduce bignums as an extension type (with their own separate multiplication operation) in the future.

This RFC has essentially no interaction with any other currently pending or active RFC, except for RFC 50 which it replaces and makes obsolete.

There are edge cases where evaluations that do not overflow today could result in overflows after this RFC. One example is an expression CONST * CONST * context.value for large CONST when context.value == 0. Today's parser will produce the internal AST (mul (mul context.value CONST) CONST) which will not overflow when context.value == 0, while this RFC would produce the internal AST (mul (mul CONST CONST) context.value) which would overflow even when context.value == 0. This is because, like RFC 50, this RFC proposes to left-associate multiplication and not to reorder operands, in contrast to the current Cedar parser.

This edge case seems very unlikely for users to hit (as it requires a multiplication by two large constants), and this RFC's proposed behavior is undoubtedly more intuitive, as multiplication is left-associative in most programming languages.

Alternatives

RFC 50 is one alternative which fixes some issues with chained multiplication without (significantly) relaxing Cedar's restrictions on multiplication operations. This RFC's proposal leads to simpler evaluation and validation code than RFC 50, and also makes the Cedar language simpler and easier to understand for users.

Extended has Operator

Timeline

  • Started: 2024-04-05
  • Accepted: 2024-05-21
  • Landed: 2024-11-27 on main
  • Released: 2025-01-21 in cedar-policy v4.3.0

Note: These statuses are based on the first version of the RFC process.

Summary

This RFC proposes to extend the syntax of the has operator to check for the presence of all attributes in an access path.

Basic example

Suppose that entities of type User have an optional contactInfo attribute of type {email: String, address?: {street?: String, zip?: String, country: String}}. To safely access the zip field of a user in a policy, we write a chain of has checks as follows:

permit( principal is User, action == Action::"preview", resource == Movie::"Blockbuster" ) when { principal has contactInfo && principal.contactInfo has address && principal.contactInfo.address has zip && principal.contactInfo.address.zip == "90210" };

This RFC proposes to extend the syntax of the has operator so that this chain of checks can be written more succinctly as follows:

permit( principal is User, action == Action::"preview", resource == Movie::"Blockbuster" ) when { principal has contactInfo.address.zip && principal.contactInfo.address.zip == "90210" };

The expression principal has contactInfo.address.zip evaluates to true when all the attributes in the given access path are present.

Motivation

Long chains of has checks are tedious to write and hard to read. The extended has operator replaces these chains with a single constraint that is easier to read and write.

Detailed design

This RFC proposes to desugar the new has syntax into the corresponding chain of basic has checks in the CST -> AST and EST -> AST converter. This requires extending the parser, CST, and EST to support the new syntax; and changing the CST -> AST and EST -> AST converters to implement the desugaring. It should be possible to share the core desugaring code between the two converters.

No other components need to change. In particular, validator, evaluator, and Lean models remain the same.

Extending the syntax

We extend the grammar as follows:

Relation ::= ... | Add 'has' (IDENT['.' IDENT]* | STR) | ...

Note that this extension works only for attribute names that are valid Cedar identifiers; it cannot be used with attribute names that require string syntax. For example, we cannot write principal has "contact info".address.zip; this check has to be expressed as principal has "contact info" && principal["contact info"] has address.zip. Arbitrary attribute names can be supported in the future if there is demand for it.

Desugaring the extended syntax

The following Lean code shows how to perform the desugaring given an expression, an attribute, and a list of attributes:

def desugarHasChainStep (acc : Expr × Expr) (attr : Attr) : Expr × Expr :=
  (Expr.getAttr acc.fst attr,
   Expr.and acc.snd (Expr.hasAttr acc.fst attr))

def desugarHasChain (x : Expr) (attr : Attr) (attrs : List Attr) : Expr :=
  (attrs.foldl desugarHasChainStep (Expr.getAttr x attr, Expr.hasAttr x attr)).snd

Here are some examples of desugared ASTs:

#eval desugarHasChain (Expr.var .principal) "contactInfo" []

Expr.hasAttr (Expr.var .principal) "contactInfo"

#eval desugarHasChain (Expr.var .principal) "contactInfo" ["address"]

Expr.and
  (Expr.hasAttr (Expr.var .principal) "contactInfo")
  (Expr.hasAttr (Expr.getAttr (Expr.var .principal) "contactInfo") "address")

#eval desugarHasChain (Expr.var .principal) "contactInfo" ["address", "zip"]

Expr.and
  (Expr.and
    (Expr.hasAttr (Expr.var .principal) "contactInfo")
    (Expr.hasAttr (Expr.getAttr (Expr.var .principal) "contactInfo") "address"))
  (Expr.hasAttr (Expr.getAttr (Expr.getAttr (Expr.var .principal) "contactInfo") "address") "zip")

Drawbacks

  1. This feature requires extending the grammar, CST, EST, CST -> AST, and CST -> EST converters. This extension will be a breaking change to the EST. We have to either modify the EST hasAttr node to include an additional List argument, or introduce a separate node to represent the extended syntax. Either change may break code that operates on the EST.

  2. If we choose not to modify the EST, and implement a CST -> EST conversion using the above algorithm, then the EST loses information. It will no longer be possible to recover the original policy syntax from the EST.

Alternatives

We considered several variants of the extended syntax that would support arbitrary attribute names. For example, principal has contactInfo."primary address".zip or principal has ["contactInfo"]["primary address"]["zip"].

We decided that the alternatives were less intuitive and readable. If needed, we can support arbitrary attribute names in the future without causing any further breaking changes to the EST.

Disallow shadowing definitions in the empty namespace

  • Reference Issues: Successor to RFC 64
  • Implementation PR(s): #1147

Timeline

  • Started: 2024-06-26
  • Accepted: 2024-08-01
  • Landed: 2024-08-26 on main
  • Released: 2024-09-16 in cedar-policy v4.0.0

Note: These statuses are based on the first version of the RFC process.

Summary

In schemas, disallow definitions of entity types, common types, and actions that would shadow definitions of other entity types, common types, or actions in the empty namespace.

Basic example

Borrowing and slightly tweaking the example from RFC 64:

entity User {
  email: String,
};
namespace Demo {
  entity User {
    id: String,
  };
  entity Account {
    owner: User,
  };
}

Today, this schema is accepted, and Account.owner is assumed to be Demo::User and not the empty-namespace User. As noted in RFC 64, there is no way for the user to indicate that they intend Account.owner to be the empty-namespace User. Rather than add syntax to enable this (as proposed in RFC 64), this RFC proposes to make this situation an error -- disallowing declaring a User type in the Demo namespace when one already exists in the empty namespace.

Motivation

Name shadowing is complicated and confusing; schemas using name shadowing may look like they are doing one thing when they are actually doing another. For Cedar policy and schema authors, this can cause misunderstandings and make it more difficult to write correct schemas and policies. By disallowing shadowing in the most egregiously ambiguous cases -- when the name being shadowed was declared in the empty namespace -- we nudge users to make the schema clearer, for instance by renaming a type, or moving a conflicting empty-namespace declaration into a new namespace.
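
For example, either of the following rewrites of the schema above would avoid the error (a sketch; the names Member and Accounts are illustrative):

// Option 1: rename the type in the Demo namespace
entity User { email: String, };
namespace Demo {
  entity Member { id: String, };
  entity Account { owner: Member, };
}

// Option 2: move the empty-namespace declaration into its own namespace
namespace Accounts {
  entity User { email: String, };
}
namespace Demo {
  entity User { id: String, };
  entity Account { owner: Accounts::User, };  // unambiguous fully-qualified reference
}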

Detailed design

We disallow shadowing a type or action in the empty namespace with a new declaration in a nonempty namespace.

  • For entity and common types, we disallow this shadowing regardless of whether the shadowed declaration is an entity or common type, and regardless of whether the shadowing declaration is an entity or common type.
  • For actions, we only disallow shadowing other actions. We continue to allow an action to have the same name as an entity or common type (in the same namespace or in the empty namespace). That is not "shadowing", as there is no place in a schema where it's ambiguous whether you're referring to an action or an entity/common type.

This RFC does not disallow shadowing altogether; in particular, it does not disallow shadowing an entity typename with a common typename in the same namespace, as originally proposed in RFC 24 (see "Disambiguating Types in Custom Syntax", and in particular Rule 5 in that section):

// Allowed today, and still allowed after this RFC
namespace NS {
  entity User {
    id: String,
  };
  type User = User;
  // or, more confusingly, type User = String;
}

This RFC also does not disallow shadowing the names of primitive or extension types, in which case you can still refer to the shadowed type using __cedar, again as explained in RFC 24. Conceptually, primitive and extension types are not declared in the empty namespace; they are declared in the __cedar namespace, and implicitly imported as part of a prelude. That is how we could explain (e.g. in the Cedar docs) why the rule is different for shadowing entity/common types and shadowing primitive/extension types. (Internally, another reason for continuing to allow shadowing primitive/extension types is that we want to be able to introduce new extension types without any possibility of breaking existing schemas.)

This RFC does not change anything about cedar#579. We still plan to implement cedar#579, which allows users to refer to types in the empty namespace, in cases where they are not shadowed.

This RFC does apply the new restriction uniformly in both the human and JSON schema syntaxes. The motivation and situation are the same in both syntaxes for the case of an entity typename shadowing an entity typename, and for the case of a common typename shadowing a common typename. For the case of an entity typename shadowing a common typename or vice versa, this RFC still proposes to disallow it in both syntaxes, even though these situations are not ambiguous in the JSON syntax, which distinguishes between entity type and common type references. See this RFC discussion thread.

Drawbacks

Drawback 1

The motivation section for RFC 64 claimed that

Users justifiably expect that they can refer to any declared typename (in any namespace) from any position in the schema, by writing a fully qualified name. The fact that RFC 24 did not provide a way to refer to typenames in the empty namespace by using some kind of fully qualified name seems like an oversight, and should be corrected for consistency.

This RFC does not address that motivation, and effectively counterclaims that users don't actually expect this.

Drawback 2

If RFC 69 or something similar is eventually accepted, the error in this RFC may not be so easily avoided if the empty-namespace definition is part of a third-party library being imported by the user. In that case, the user couldn't easily rename the empty-namespace type or move it into a nonempty namespace.

To address this drawback, we should guide library authors to not declare library types or actions in the empty namespace (and instead, e.g., declare them in a namespace named after the library). We could even codify that rule, requiring that all libraries have a name and all declarations in the library live in the corresponding namespace, or alternately, implicitly prefixing all library declarations with the library name as a toplevel namespace. Many existing package systems have similar rules or mechanisms.

Alternatives

RFC 64 proposed instead introducing new schema syntax to allow users to refer to entity and common types in the empty namespace whose names are shadowed by declarations in the current namespace. (RFC 64 could be naturally extended to actions as well.)

Introducing new schema syntax is a major change, and in the discussion on RFC 64 we have been, to date as of this writing, unable to agree on a syntax for this which is clear and acceptable to everyone. On the RFC thread and in offline discussions, the leading candidates for such a syntax are currently (as of this writing) ::User, root User, or __cedar::Root::User (reusing the __cedar namespace reserved in RFC 52). But it seems apparent that none of these options are immediately intuitive to all users; if we were to introduce such a syntax, we run the risk of complicating the schema syntax only to not actually clarify the situation because the new syntax is still unclear.

Finally, this shadowing situation is viewed as an unlikely edge case in general, and certainly one that can easily be avoided by users (by renaming types/actions or moving empty-namespace declarations into a namespace). It seems expedient to resolve this in the easiest, clearest way possible. This RFC proposes that disallowing the premise is easier and clearer than designing and implementing new schema syntax that will only be used in an unlikely edge case.

Trailing Commas

  • Reference Issues: cedar#1024
  • Implementation PR(s):

Timeline

  • Started: 2024-06-26
  • Accepted: 2024-09-10
  • Landed: TBD
  • Released: TBD

Note: These statuses are based on the first version of the RFC process.

Summary

The Cedar grammar (both policies and schemas) should accept trailing commas whenever possible.

Basic example

Currently, this is a parse error:

permit(principal, action, resource) when { { foo : 3, bar : 2, } == context };

This RFC would make it not a parse error. It would also change this in several places beyond records.

Motivation

In general, allowing trailing commas makes editing easier. Re-arranging, deleting, and copy-pasting code into/from a comma separated grammar element will no longer require adjusting commas.

Example:

{ "managers": ["Dave", "Shelly", "Chris"], "projects" : { "Alpha" : { "zones" : ["A", "B"] }, "Beta" : { "zones" : ["D", "F"] } } }

The Beta project cannot simply be deleted from this record, and likewise, new fields cannot be simply pasted into any level of the objects, without adjusting commas.

Many languages allow for trailing commas, including:

  • Rust
  • Python
  • Java
  • Go
  • JavaScript
  • OCaml

(Frustratingly: not JSON!)

In addition, it will make our schema grammar and policy grammar more consistent, as the schema grammar allows trailing commas in many places.

Detailed design

Cedar Policy Grammar

We want to allow trailing commas in the following expressions:

  1. scopes: permit(principal, action, resource, )
  2. sets: [1,2,3,]
  3. records: { foo : 1, bar : 3, }
  4. all method/function calls: .contains(3,), ip("10.10.10.10",)
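
Putting these together, the following (hypothetical) policy would parse under this RFC, with trailing commas in the scope, set, record, and function-call positions:

permit(principal, action, resource, )
when {
  [1, 2, 3, ].contains(context.count) &&
  { foo: 1, bar: 2, } == context.record &&
  context.srcIp.isInRange(ip("10.0.0.0/8", ))
};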

In the grammar:

The following grammar rules change:

Scope ::= Principal ',' Action ',' Resource

becomes:

Scope ::= Principal ',' Action ',' Resource ','?

EntList ::= Entity {',' Entity}

becomes:

EntList ::= Entity {',' | ',' Entity} ','?

ExprList ::= Expr {',' Expr}

becomes:

ExprList ::= Expr {',' | ',' Expr} ','?

RecInits ::= (IDENT | STR) ':' Expr {',' (IDENT | STR) ':' Expr}

becomes:

RecInits ::= (IDENT | STR) ':' Expr {',' | ',' (IDENT | STR) ':' Expr}

Cedar Schema Grammar

The schema grammar already allows trailing commas in several places (such as record types and entity lists), but not everywhere:

We want to allow trailing commas in the following expressions:

  1. membership lists: entity Photo in [Account, Album,];
  2. Multi-entity declarations: entity Photo,Video;
  3. Record types: { foo : Long, bar : String, } (currently allowed)
  4. Multi-action declarations: action foo,bar,;
  5. Apply specs: appliesTo { principal : ["A"], resource: ["B"], } (currently allowed)

The following grammar rules change:

EntTypes := Path {',' Path}

becomes:

EntTypes := Path {',' | ',' Path}

Names := Name {',' Name}

becomes:

Names := Name {',' | ',' Name} ','?

Idents := IDENT {',' IDENT}

becomes:

Idents := IDENT {',' | ',' IDENT}

Backwards Compatibility

This change is fully backwards compatible.

Drawbacks

  • Trailing commas are potentially confusing.

Alternatives

  • Keep the status quo: this is a non-critical change

Unresolved questions

Entity Slice Validation

Timeline

  • Started: 2024-08-01
  • Accepted: 2024-10-03
  • Landed: 2024-10-03 on main
  • Released: 2024-10-07 in cedar-policy 4.2.0 as an experimental feature

Note: These statuses are based on the first version of the RFC process.

Summary

This RFC introduces "Entity Slicing Validation" (ESV), which consists of

  1. An additional validation check, and
  2. A generic entity slicing algorithm

The two are connected by the concept of a "level," which defines the depth from root variables (like principal and resource) at which entity dereferencing operations may safely take place in policies. The validation check ensures that policies never dereference beyond the given level. The slicing algorithm determines what entities are needed to safely decide an authorization request when evaluating policies valid at a given level.

Basic example

Consider the following policies, taken from the TinyTodo example application:

// Policy 1: A User can perform any action on a List they own
permit (principal, action, resource is List)
when { resource.owner == principal };

// Policy 2: A User can see a List if they are either a reader or editor
permit (principal, action == Action::"GetList", resource)
when { principal in resource.readers || principal in resource.editors };

// Policy 4: Admins can perform any action on any resource
permit (principal in Team::"Admin", action, resource);

We suppose (per the schema, not shown) that principal always has entity type User, and resource always has entity type List. These policies are valid at level 1, meaning that variables principal, action, resource and context are only directly dereferenced, i.e., any entities they reference are not themselves dereferenced.

For example, in policy 1, the expression resource.owner == principal directly dereferences resource by accessing the contents of its owner attribute. In policy 2, expression principal in resource.readers directly dereferences resource to retrieve the contents of its readers attribute, and also directly dereferences principal to retrieve its ancestors, to see whether one might be equal to the contents of resource.readers. Policy 4 also directly dereferences principal to access its ancestors, to see if one is Team::"Admin".

Because these policies are valid at level 1, applications submitting authorization requests only need to provide entity data for entities provided as variables, and nothing else. For example, say we want to evaluate the request (User::"Aaron", Action::"GetList", List::"Objectives", {}). Then we would only need to provide entity data (direct attributes and ancestors) for User::"Aaron", Action::"GetList", and List::"Objectives". There is no need to provide data for entities referenced from these entities. For example, suppose the readers attribute of List::"Objectives" is Team::"interns"; there is no need to provide data for Team::"interns" because we know it will never, itself, be dereferenced when evaluating these policies. There is also no need to provide entity data for other entities directly mentioned in policies, e.g., Team::"Admin".
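
Concretely, the slice for this request could be expressed in Cedar's JSON entities format as just three records, one per request entity (the attribute and ancestor data shown here is hypothetical, for illustration only):

[
  { "uid": { "type": "User", "id": "Aaron" },
    "attrs": {},
    "parents": [ { "type": "Team", "id": "interns" } ] },
  { "uid": { "type": "Action", "id": "GetList" },
    "attrs": {},
    "parents": [] },
  { "uid": { "type": "List", "id": "Objectives" },
    "attrs": {
      "owner": { "__entity": { "type": "User", "id": "Bob" } },
      "readers": { "__entity": { "type": "Team", "id": "interns" } },
      "editors": { "__entity": { "type": "Team", "id": "admins" } }
    },
    "parents": [] }
]

Note that Team::"interns" appears only as a reference; its own attributes and ancestors need not be included.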

Now suppose we wanted to extend our TinyTodo policies with the following:

// Policy 6: No access if not high rank and at location DEF,
// or at resource's owner's location
forbid (principal, action, resource is List)
unless {
  principal.joblevel > 6 && principal.location like "DEF*" ||
  principal.location == resource.owner.location
};

This policy does not validate at level 1. That is because the expression resource.owner.location is only valid at level 2: It requires dereferencing resource to retrieve the entity in the owner attribute, and then requires dereferencing that entity to obtain the contents of its location attribute. As a result, if we only provided the principal and resource entity data, as described above, evaluating policy 6 could produce an evaluation error since data may not be provided for resource.owner.

All of the Cedar example policy sets validate with level-based validation, either at level 1 or 2; see the Appendix for details.

Motivation

Currently, Cedar provides no support for slicing. Users are required to either be inefficient and produce their entire entity store as a slice, or derive their own ad-hoc slicing algorithm. We'd like to provide an out-of-the-box solution that addresses many users' needs. Entity slicing in general is a problem closely tied to the specifics of application/deployment. It's likely impossible to have a fully general solution. This RFC is attempting to provide a solution for entity slicing that works for the 80% case, while acknowledging that it won't work for everyone.

Additionally, there are some application scenarios where the actors writing the policies and the people performing slicing are not the same people, and thus the slicers need to impose some kind of restrictions on policy authors.

This RFC has a close connection to RFC 74: Entity Slicing using Data Tries, which also seeks to provide a sound foundation for entity slicing. We discuss the relationship between them in the Alternatives below; ultimately, we believe they are complementary and should coexist.

Detailed design

Entity dereferences

An "Entity Dereference" is any operation that requires directly accessing an entity’s data. The following is an exhaustive list of entity dereferencing operations

  1. e1 in e2 dereferences e1 (but not e2)
  2. e1.foo dereferences e1 iff e1 is an entity
  3. e1 has foo dereferences e1 iff e1 is an entity

Note that testing equality is not a dereferencing operation. Therefore expressions like e1 == e2, e1.contains(e2) and e3 in e2 do not dereference e1 or e2.

Also note that by “dereference” above, we are only referring to the given operation, not about any operations that might be executed to evaluate subexpressions. For example, while e1 in e2 only dereferences e1 (per rule 1 above), if e2 is principal.foo, then of course evaluating e2 dereferences principal (per rule 2 above).

Entity roots

The variables principal, action, and resource are entity roots. (Entity literals are not entity roots, and operations on such literals are limited, as described below. Slots and the Unknown entity are treated as entity literals because slots will be replaced by literals and Unknown cannot be dereferenced.) They directly name entities from a request that could be dereferenced in a policy. A context is not an entity root directly; rather, any attributes it (transitively) contains that have entity type are considered roots. For example, consider the following schema:

entity User {
  manager: User,
  age: Long,
};
action getDetails appliesTo {
  principal: User,
  resource: User,
  context: {
    admin: User,
    building: { location: Long, ITDeptHead: User },
    foo: { inner: User },
    bar: { inner: User }
  }
};

Here, principal, action, and resource are entity roots (as usual), as are:

  • context.admin
  • context.building.ITDeptHead
  • context.foo.inner , and
  • context.bar.inner

Note that context, context.building, and context.building.location are not entity roots because they do not have entity type. Moreover, context.admin.manager is not an entity root because manager is not an attribute of context itself.

Dereference levels

During validation, we extend entity types with a level, which is either ∞ or a natural number. We call these leveled entity types. The level expresses how many entity dereferences are permitted starting from an expression having that type. Levels on types are only used internally; policy/schema writers never see them. When the level is ∞, validation is unchanged from the status quo.

Specifying the level

The maximum level that policies may validate at is specified as a parameter to the validator. Schemas are unchanged. Just as Cedar currently supports strict (the default) and permissive validation, it would now support both modes with an additional parameter specifying the level at which to validate. As discussed in the alternatives below, a drawback of this approach is that policy writers cannot look in one place (the schema) to know the limits on the policies they can write. But in a sense, due to the dichotomy of strict vs. permissive validation, that's already the case.

Validation algorithm

Validation works essentially as today (see our OOPSLA’24 paper, page 10), but using leveled types. To validate a policy c at level N, we use a request environment which maps principal, resource, and context to leveled types with level N. For example, for the schema given above, to validate at level 2 our request environment would map principal to User(2), resource to User(2), and context to the type given in the schema annotated with 2, i.e., { admin: User(2), building: { location: Long, ITDeptHead: User(2) },... }.

The rules for expression e1 in e2 (line (3) in Fig. 7(a) in the paper) are extended to require that the entity type of e1 is E1(n) where n > 0. This requirement indicates that e1 must be an entity from which you can dereference at least one level (to access its ancestors).

The rules for expression e has f and e.f (line (6) in Fig. 7(a) in the paper) are extended so that if e has entity type, then its type must be E(n) where n > 0. The rule for e.f is additionally extended so that if e.f has entity type F (per the schema), then the final type's level is one less than n; e.g., if e has level n, then the final expression has type F(n-1). (Rules for e has f and e.f are unchanged when e has record type.)

The rule for entity literals (not shown in the paper) always ascribes them level 0 unless we are validating at level ∞. For example, the type of expression User::"Alice" is User(0). This is because we do not permit dereferencing literals in policies when using leveled validation. (See the Alternatives below for a relaxation of this restriction.)

We extend the subtyping judgment (Fig 7(b) in the paper) with the following rule: E(n) <: E(n-1). That is, it is always safe to treat a leveled entity type at level n as if it had fewer levels. This rule is important when multiple expressions are required to have the same level. For example:

if (principal.age < 25) then principal.manager else principal

Suppose we are validating at level 2; then this expression will have type User(1), where we use subtyping to give the expression principal (in the else branch) type User(1) rather than User(2). Similarly, if we were defining a set of entities, all of them must be given the same level. For example, we could give [ principal.manager, principal ] the type Set<User(1)>. Note that for records, individual attributes may have entities with different levels; e.g., { a: principal, b: principal.manager } would have type { a: User(2), b: User(1) }.

Slicing

Once a schema has been validated against a non-infinite level, we can use the following procedure at runtime to compute the entity slice:

/// Takes:
///   request    : The request object
///   level      : The level the schema was validated at
///   get-entity : A function for looking up entity data given an EUID
function slice(request, level, get-entity) {
  // This will be our entity slice
  let mut es : Map<EUID, EntityData> = {};
  let mut worklist : Set<EUID> = {principal, action, resource, all root euids in context};
  for euid in worklist {
    es.insert(euid, get-entity(euid));
  }
  let mut current-level = 0;
  while current-level <= level && !worklist.empty? {
    // `t` will become our new worklist
    let t : Set<EUID> = {};
    for euid in worklist {
      if !es.contains(euid) {
        es.insert(euid, get-entity(euid));
      }
      // Set of all euids mentioned in this entity's data
      let new-euids = filter(not-in-es?, all-euids(es.get(euid)));
      t = t.union(new-euids);
    }
    // Set `t` to be the new worklist
    worklist = t;
    // Increment the current level
    current-level = current-level + 1;
  }
  return es;
}

Soundness

Leveled types permit us to generalize the soundness statement we use today. Currently, soundness assumes that expressions like principal.manager will never error, i.e., that the proper entity information is available. Now we assume dereferences are only available up to a certain level. In particular, we define a well-formedness judgment μ ⊢ v : τ where μ is an entity store, v is a value, and τ is a possibly leveled type. This judgment is standard for values having non-entity type, e.g.,

mu |- i : Long        mu |- s : String

mu |- v1 : T1   ...   mu |- vn : Tn
-------------------------------------------------------
mu |- { f1: v1, ..., fn: vn } : { f1: T1, ..., fn: Tn }

There are two rules for entity literals of type E::s. The first says level 0 is always allowed. The second says a literal can have a level n only if n dereferences are possible starting from E::s:

mu |- E::s : E(0)

mu(E::s) = (v, h)   where v = { f1: v1, ..., fn: vn } and h is an ancestor set
mu |- v1 : T1 ... mu |- vn : Tn   where level(T1) >= n-1 ... level(Tn) >= n-1
------------------------------------------------------------------------------------
mu |- E::s : E(n)   where n > 0

The auxiliary function level(T) identifies the lower-bound of the level of type T. It’s defined as follows:

  • level(E(n)) = n
  • level({ f1: T1, ..., fn: Tn }) = m where m = min(level(T1),...,level(Tn))
  • level(Long) = level(String) = level(Boolean) = level(Set(T)) = (etc.) = ∞ for all other types

Thus the soundness statement says that

  1. If policy set p is validated against a schema s with level l, where l < ∞ .
  2. If entity store μ is validated against schema s
  3. If request σ is validated against schema s
  4. If μ ⊢ σ(principal): E(l), where E is the expected type of the principal for this particular action, and similarly the proper leveling holds for action, resource, and context.
  5. Then:
    1. isAuthorized(sigma, p, mu) is either a successful result value or an integer overflow error
      1. All errors except integer overflow + missing entity are prevented by the current validator
      2. missing entity errors are prevented by the new validator + slicer

We extend the proof of soundness to leverage premise #4 above: This justifies giving principal a type with level l in the initial request context, and it justifies the extensions to the rules that perform dereferencing, as they decrease the leveled result by 1, just as the μ⊢v:τ rule does, which we are given.

We also want to prove the soundness of the slice(request, level, get-entity) procedure defined above: If slice(σ, l, get-entity) = μ and request σ is validated against schema s and level l, then μ validates against schema s at level l.

Drawbacks

Level-based slicing is coarse-grained: all attributes up to the required level, and all ancestors, for each required entity must be included in the slice. Much of this information may be unnecessary, though, depending on the particulars of the request. For example, for the TinyTodo policies in our detailed example, if the action in the request was Action::"CreateList", then only policies 1 and 4 can be satisfied. For these policies, no ancestor data for principal and resource is needed, and only the contents of resource.owner, and not (say) resource.readers and resource.editors, is required. For some applications, the cost to retrieve all of an entity’s data and then pass it to the Cedar authorizer could be nontrivial, so a finer-grained approach may be preferred. Some ideas are sketched under alternatives.

Implementing levels is a non-trivial effort: We must update the Rust code for the validator of policies and entities, and the Lean code and proofs. We also must provide code to perform level-based slicing.

Alternatives

Alternative: Entity Slicing using Data Tries

RFC 74, Entity Slicing using Data Tries proposes the idea of an entity manifest to ease the task of entity slicing. This manifest is a trie-based data structure that expresses which entity attributes and ancestors are needed for each possible action. This data structure is strictly more precise than the concept of level, meaning that it may require less entity data to be provided. For example, for the first part of our Basic Example in this RFC (policies 1, 2, and 4 of TinyTodo), the computed entity manifest would indicate that for the GetList action you would need to provide resource.owner, resource.readers, resource.editors, and the ancestors of principal. This is less data than would be retrieved by the slicing algorithm proposed here: These are level-1 valid policies, so the generic algorithm would acquire the same data given in the manifest but also more, e.g., the ancestors of resource and the attributes of principal.

Entity manifests are less effective as a prescriptive mechanism that aims to limit the shape of allowable policies. For example, a Cedar policy and entity storage service might like to disallow uploading policies whose entity data is expensive to retrieve in the worst case. With levels, such a service can upper-bound this work by blocking schemas with a level greater than a threshold, say 3. Validation will then ensure that uploaded policies respect this upper bound. Manifests are unwieldy as a specification mechanism, since they are specialized to particular entity definitions and name particular types and attributes, of which there could be many. They'd also have to be updated as types evolved over time, while a level-based specification is less likely to change.

We believe that ultimately level-based validation and entity manifests should coexist. Level-based validation is used to bound entity retrieval work and prevent pathological policies, while entity manifests are used to more efficiently define the needed entity data. We could imagine a deployment strategy in which we accept this RFC and perform generic entity slicing as described here, and then later implement entity manifests, and start performing slicing using those instead.

Alternative: Level in the schema, not as a validation parameter

Instead of specifying the validation level as a parameter to the validation algorithm, we could specify it directly in schemas. The benefit of this is that policy writers can see the expected form of policies in one place. For example, we could extend our schema from the initial example as follows:

level = 1;
entity User {
  manager: User,
  age: Long,
};
action getDetails appliesTo { ... }

Policy writers know, by seeing level = 1, that they can write principal.manager in policies, but not principal.manager.manager.

On the other hand, you could imagine that different applications might like to use the same schema file but have different restrictions on entity retrieval, owing to the efficiency of a chosen storage strategy. For example, in a graph database, retrieving data may be efficient no matter what the level is, whereas in a relational database, each level might necessitate another join query, which could be very inefficient.

Specifying levels in schemas also adds a complication when multiple schema files are used during validation. When a user specifies multiple files, they are effectively concatenated together by the validator. What should happen if multiple files specify level and the levels differ? One approach is for the validator to allow at most one file to have a level specification, and report an error otherwise. Doing so serves the expected use-case in which one of those files is for the application, specifying the action types and the level, while any others contain only general entity and type definitions, which are not application-specific and thus need not concern themselves with entity slicing.

Alternative: Per-entity levels, rather than a global level

We might refine in-schema level to not apply to all entities, but rather to particular entity types. For example, for User entities (bound to principal) we might specify level 2 but for any other entity type we specify level as 0, as per the following schema:

@level(2)
entity Issue = {
  repo: Repository
};
@level(0)
entity Repository = {
  owner: User,
  readers: Set<User>,
};
@level(0)
entity User = {
  manager: User
};
action ReadIssue appliesTo { principal: User, resource: Issue };

Per-entity levels allow you to reduce the entities you provide in the slice — given a request, you get the types of the principal and resource entities, look up their levels in the schema, and then provide data up to that level for each. Validation for per-entity levels is also a straightforward generalization of the algorithm sketched above.

Note that per-entity levels adds some complication to understandability, in particular for nested attributes. The question is whether the performance benefits are worth it. Suppose I have a policy

permit(principal, action, resource) when { resource.repo.owner.manager == principal };

This is allowed because Issue is labeled with level 2. That means that we are allowed to write resource.repo.owner.manager (two levels of dereference beyond the first, allowed one by level 0). But this is a little confusing because we wrote @level(0) on Repository, and yet we are accessing one of its attributes. The reason that's okay is that Repository is never actually bound to principal or resource, so its level is irrelevant.

Alternative: Full support for entity literals

Entity literals cannot be dereferenced. That means that, at all levels, they can only appear in == expressions or on the RHS of in. This captures use-cases that we know about. However, if you wanted to use them on the LHS of in or access their attributes, one approach to doing this would be to extend enumerated entity types to normal entity type definitions. For example,

entity User in [Team] enum { "default" @level } = { manager: User, organization: Org };

This would say that there is a User entity whose ID is default which should always be present in the entity slice, e.g., along with other entities for the principal, resource, etc. That presence is at the level specified globally, e.g., if level was 1, then just the User::"default" and its direct attributes would be included, but not the entities those attributes point to.

Alternatives: Extensions to reduce ancestor requirements

One drawback of level is that it gets all of an entity’s ancestors when it is included in a slice. This could be very expensive in the case that the entity has many such ancestors. For example, in TinyTodo, each User has Team-typed ancestors that correspond to the viewers or editors of the List entities which they can access. This could be 1000s of lists, which are expensive to acquire and to include in the request (especially if calling AVP). Worse, this work is apparently wasted because the policies can only ever look at some of these ancestors, notably those reachable from the editors or viewers of the resource also provided in the request.

Instead of passing in a principal in a request with all of its ancestors, we could just as well pass in only those that are possibly referenced by policy operations. For TinyTodo, it would be those equal to resource.viewers or resource.editors (if they match). Since this invariant is not something the application necessarily knows for certain (it might be true of policies today, but those policies could change in the future), we want a way to reduce the provided ancestor data while being robust to how policy changes in the future. Here are three possible approaches to addressing this issue.

Alternative 1: Leverage partial evaluation

We can use partial evaluation to solve this problem: Specify a concrete request with all of the needed attribute data, after entity selection, but set the ancestor data as unknown. Partially evaluating such a request will produce results that have incomplete entity checks. These checks can then be handled by the calling application. This might be a nice feature we can package up and give to customers as a client-side library.

A drawback of this approach is that it could slow down evaluation times. This is because the entity hierarchy is used to perform policy slicing. In the worst case, you’d have to consider all possible policies. It might be possible to provide at least some of the hierarchy with the request to reduce the set of policies that are partially evaluated.

Alternative 2: Ancestor allow-lists

When validating static policies, it’s possible to see which ancestors are potentially needed. Could we do something similar to level for policy validation, but for ancestors instead?

For TinyTodo policies above, we see the following expressions in policies:

  • principal in resource.viewers
  • principal in resource.editors
  • principal in Team::"admin"

For the other examples, the situation is similar. We also see multi-level access, e.g., with GitHub policies principal in resource.repo.readers, and we see recursive access through sets, e.g., for Hotel Chains we see resource in principal.viewPermissions.hotelReservations where hotelReservations is a set containing entities, rather than being a single entity.

Approach: Add an annotation @ancestors to the in declarations around entities. When I write entity User in [Team], add an annotation that enumerates the Team-typed expressions that can be on the RHS of in in policies. Here’s an example of the annotated TinyTodo schema:

@ancestors(resource.viewers, resource.editors, Team::"admin", Team::"interns")
entity User in [Team, Application] = {
  "joblevel": Long,
  "location": String,
};

@ancestors()
entity Team in [Team, Application];

@ancestors()
entity List in [Application] = {
  "editors": Team,
  "name": String,
  "owner": User,
  "readers": Team,
  "tasks": Tasks,
};

@ancestors()
entity Application;

...

Notice that we annotate List, Team, and Application with @ancestors(), since entities of these types never appear on the LHS of in. An entity type with no annotation at all is assumed to require every potential ancestor.

Validation: The validator will check for syntactic equality against the expressions in the annotation. In particular, when it sees an expression E in F in a policy, if E has an entity type that is annotated with the set of expressions S in the schema, then the validator will check that F ∈ S, using syntactic equality. With the above schema, all current TinyTodo policies would validate, while the following would not:

// Fails because a Team-typed entity is on the LHS of `in`,
// but entity Team has no legal RHS expressions per the schema
permit(principal, action == Action::"GetList", resource)
when { resource.editors in Team::"admin" };

// Fails because Team::"temp" is not a legal RHS expression for User
permit(principal in Team::"temp", action, resource);

Entity selection: When collecting ancestors during entity selection, the application only gets ancestors per the expressions in the annotation. To do that, it would evaluate these expressions for the current request, and provide the ancestors present in the result, if present. For example, suppose our TinyTodo request was

principal = User::"Alice" action = Action::"GetList" resource = List::"123" context = {}

Since TinyTodo policies are level 2, we would provide entity data for both User::"Alice" and List::"123". Suppose that the latter’s data is

{ id = "123", owner = User::"Bob", viewers = Team::"123viewers", editors = Team::"123editors" }

The principal has type User, which is annotated in the schema with the following expressions, shown with what they evaluate to:

  • resource.viewers evaluates to List::"123".viewers which evaluates to Team::"123viewers"
  • resource.editors evaluates to List::"123".editors which evaluates to Team::"123editors"
  • Team::"admin" (itself)
  • Team::"interns" (itself)

Thus the application can determine in advance whether User::"Alice" (the principal) has any of these four ancestors, and if so, include them with User::"Alice"’s entity data. No other ancestor data needs to be provided.

Note: The above algorithm is not very helpful for templates containing the expressions in ?resource or in ?principal (templates with only == ?principal and == ?resource in them are fine, since they don’t check the ancestors). For these, you are basically forced to not annotate the entity type, because ?resource and ?principal are not legal expressions. This makes sense inasmuch as a template with constraint principal in ?principal tells us nothing about the possible values to which ?principal could be linked — the values could include potentially any entity, so in the worst case we’d need to include all possible ancestors. It may be that you could use further contextual data (like the action) to narrow the ones you provide.

Note: Interestingly, no policy-set example ever has an expression other than principal or resource on the LHS of in. The scheme above is not leveraging this fact.

Note: This approach composes nicely with the per-entity @level approach sketched above.

Alternative: Reduce attribute requirements

The per-entity level approach is one way to reduce attribute requirements, but only across levels — for a particular entity you still must provide all of its attributes if you are providing its data. We might like to provide only some attributes, similar to the alternative that requires only some, not all, ancestors.

As discussed earlier, partial evaluation can be used to help here. There could also be some sort of allow-list for attributes. Probably it would make sense to associate allow-lists with actions.

Appendix: Accepted policy sets

All existing policies in published example Cedar applications can be validated using levels. In particular, the following policy sets validate at level 1:

  • Tags and roles: https://github.com/cedar-policy/cedar-examples/tree/main/cedar-example-use-cases/tags_n_roles
  • Sales orgs: https://github.com/cedar-policy/cedar-examples/tree/main/cedar-example-use-cases/sales_orgs
  • Hotel chains: https://github.com/cedar-policy/cedar-examples/tree/main/cedar-example-use-cases/hotel_chains

And the following validate at level 2:

  • TinyTodo: https://github.com/cedar-policy/cedar-examples/blob/main/tinytodo/policies.cedar and also https://github.com/cedar-policy/cedar-examples/blob/main/tinytodo/policies-templates.cedar
    • Policy 6 includes expression resource.owner.location
  • Tax Preparer: https://github.com/cedar-policy/cedar-examples/blob/main/cedar-example-use-cases/tax_preprarer/policies.cedar
    • The access rule involves checking something to the effect of principal.assigned_orgs.contains(resource.owner.organization).
  • Document cloud: https://github.com/cedar-policy/cedar-examples/blob/main/cedar-example-use-cases/document_cloud/policies.cedar
    • This has policy expression (resource.owner.blocked.contains(principal) || principal.blocked.contains(resource.owner)). (It would not make sense to duplicated the blocked
  • GitHub: https://github.com/cedar-policy/cedar-examples/blob/main/cedar-example-use-cases/github_example/policies.cedar
    • Expressions like principal in resource.repo.readers are common: A resource like an issue or pull request inherits the permissions of the repository it is a part of, so the rules chase a pointer to that repository.

datetime extension

Timeline

  • Started: 2024-08-08
  • Accepted: 2024-09-11
  • Landed: 2024-11-13 on main
  • Released: 2025-01-21 in cedar-policy v4.3.0 as an experimental feature

Note: These statuses are based on the first version of the RFC process.

Summary

Cedar currently supports extension functions for IP addresses and decimal values. A popular request has been support for dates and times as well, but there are several complexities here (e.g., timezones and leap years) that have delayed this. Recall that anything we add to Cedar, we want to be able to formally model in Lean using our verification-guided development approach (see this blog post). We also want to support a decidable SMT-based analysis (see this paper). The goal of this RFC is to narrow in on a set of date/time related features that are useful in practice, but still feasible to implement in Cedar given these constraints.

Basic example

This RFC would support a policy like the following, which allows a user to access materials in the "device_prototypes" folder only if they have a sufficiently high job level and a tenure of more than one year.

permit( principal is User, action == Action::"view", resource in Folder::"device_prototypes" ) when { principal.department == "HardwareEngineering" && principal.jobLevel >= 10 && context.now.timestamp.durationSince(principal.hireDate) > duration("365d") };

Motivation

Previously, for applications that determine authorization based on date/time, our suggestion has been to use Unix time (see Workaround A) or to pass the result of time computations in the context (see Workaround B). But these are not ideal solutions because Cedar policies are intended to be expressive, readable, and auditable. Unix timestamps fail the readability criterion: an expression like context.now < 1723000000 is difficult to read and understand without additional tooling to decode the timestamp. Unix timestamps are also indistinguishable from any other integral value, providing no additional help in expressing the abstraction of time. Passing pre-computed values in the context (e.g., a field like context.isWorkingHours) makes authorization logic difficult to audit because it moves this logic outside of Cedar policies and into the calling application. It is also difficult for the calling application to predict the pre-computed values that a policy writer requires for their intended purpose. These may change over time, and may also differ depending on the principal, action, and/or resource.

Additional examples

The examples below illustrate the types of policies that this proposal would enable. It assumes that the application adds a Cedar Record named now to context with the following fields:

  • timestamp: A value of type datetime (defined below) representing the current time at policy evaluation
  • dayOfWeek: A long value representing the current day of the week (Sunday = 1, ... Saturday = 7)
  • day: A long representing the current day of the month
  • month: A long representing the current month (January = 1)
  • year: A long representing the current year

Only allow user "alice" to view JPEG photos for one week after creation.

permit( principal == User::"alice", action == PhotoOp::"view", resource is Photo ) when { resource.fileType == "JPEG" && context.now.timestamp.durationSince(resource.creationTime) <= duration("7d") };

Allow access from a certain IP address only between 9am and 6pm UTC.

permit( principal, action == Action::"access", resource ) when { context.srcIp.isInRange(ip("192.168.1.0/24")) && context.workdayStart <= context.now.timestamp && context.now.timestamp < context.workdayEnd };

Prevent employees from accessing work documents on the weekend.

forbid( principal, action == Action::"access", resource is Document ) when { [1,7].contains(context.now.dayOfWeek) };

Permit access to a special opportunity for persons born on Leap Day.

permit( principal, action == Action::"redeem", resource is Prize ) when { principal.birthDate.day == 29 && principal.birthDate.month == 2 && context.now.day == 29 && context.now.month == 2 };

Forbid access to EU resources after Brexit

forbid( principal, action, resource ) when { context.now.timestamp > datetime("2020-01-31T23:00:00Z") && context.location.countryOfOrigin == "GB" && resource.owner == "EU" };

Detailed design

This RFC proposes supporting two new extension types: datetime, which represents a particular instant of time, up to millisecond accuracy, and duration which represents a duration of time. To construct and manipulate these types we will provide the functions listed below. All of this functionality will be hidden behind a datetime feature flag (analogous to the current decimal and IP extensions), allowing users to opt-out.

Instants of Time (datetime)

The datetime(string) function constructs a datetime value. As with other extension function constructors, strict validation requires string to be a string literal, although evaluation/authorization support any string-typed expression. The string must be in one of the following forms; regardless of the timezone offset, the value is always normalized to UTC:

  • "YYYY-MM-DD" (date only)
  • "YYYY-MM-DDThh:mm:ssZ" (UTC)
  • "YYYY-MM-DDThh:mm:ss.SSSZ" (UTC with millisecond precision)
  • "YYYY-MM-DDThh:mm:ss(+/-)hhmm" (With timezone offset in hours and minutes)
  • "YYYY-MM-DDThh:mm:ss.SSS(+/-)hhmm" (With timezone offset in hours and minutes and millisecond precision)

All formats must adhere to these additional restrictions:

  • The month field (MM) must be in the range 01-12
  • The day field (DD) must be in the range 01-31
  • The hour field (hh) must be in the range 00-23
  • The minute field (mm) must be in the range 00-59
  • The seconds field (ss) must be in the range 00-59 (leap seconds are not supported)

Note that the restrictions on the hour and minute fields also apply to timezone offsets. This implies that timezone offsets must have an absolute value less than 24 hours (i.e., between -2359 and +2359).

The datetime type does not provide a way for a policy author to create a datetime from a numeric timestamp. One of the readable formats listed above must be used, instead.

Values of type datetime have the following methods:

  • .offset(duration) returns a new datetime, offset by duration.
  • .durationSince(DT2) returns the difference between DT and DT2 as a duration. (Note that the inverse of durationSince is DT2.offset(duration)). An invariant for DT1.durationSince(DT2) is that when DT1 is before DT2 the resulting duration is negative.
  • .toDate() returns a new datetime, truncating to the day, such that printing the datetime would have 00:00:00 as the time.
  • .toTime() returns a new duration, removing the days, such that only milliseconds since .toDate() are left. This is equivalent to DT.durationSince(DT.toDate())
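
For example, combining these methods with the comparison operators described next, a policy could restrict access to working hours in UTC; the attribute context.now.timestamp follows the convention used in the examples above, and the bounds are illustrative:

permit(principal, action == Action::"access", resource)
when {
  // .toTime() yields the duration since midnight UTC of the current day
  context.now.timestamp.toTime() >= duration("9h") &&
  context.now.timestamp.toTime() <= duration("17h30m")
};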

Values of type datetime can be used with comparison operators:

  • DT1 < DT2 returns true when DT1 is before DT2
  • DT1 <= DT2 returns true when DT1 is before or equal to DT2
  • DT1 > DT2 returns true when DT1 is after DT2
  • DT1 >= DT2 returns true when DT1 is after or equal to DT2
  • DT1 == DT2 returns true when DT1 is equal to DT2
  • DT1 != DT2 returns true when DT1 is not equal to DT2

Equality is based on the underlying representation (see below) so, for example, datetime("2024-08-21") == datetime("2024-08-21T00:00:00.000Z") is true. This behavior is consistent with the decimal extension function, where decimal("1.0") == decimal("1.0000") is also true.

Representation

The datetime type is internally represented as a long containing a Unix Time in milliseconds, i.e., the number of milliseconds (excluding leap seconds) that have passed since 1970-01-01T00:00:00Z. A negative Unix Time represents the number of milliseconds before 1970-01-01T00:00:00Z. Unix Time days are always 86,400 seconds and handle leap seconds by absorbing them at the start of the day. By using Unix Time, and by not providing a "current time" function, Cedar avoids the complexities of leap second handling, pushing them to the system and application.
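
As an illustration of this representation (values worked out by hand from the definition above):

datetime("2024-08-21T00:00:00Z")   // stored as 1724198400000 (1,724,198,400 seconds after the epoch)
datetime("1969-12-31T23:59:59Z")   // stored as -1000 (one second before the epoch)
datetime("2024-08-21")             // stored as 1724198400000, the same value as the first example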

Durations of Time (duration)

The duration(string) function constructs a duration value from a duration string. Strict validation requires the argument to be a literal, although evaluation/authorization support any appropriately-typed expressions. The string is a concatenated sequence of quantity-unit pairs. For example, "1d2h3m4s5ms" is a valid duration string.

The quantity part is a natural number. The unit is one of the following:

  • d: days
  • h: hours
  • m: minutes
  • s: seconds
  • ms: milliseconds

Duration strings are required to be ordered from largest unit to smallest unit, and contain one quantity per unit. Units with zero quantity may be omitted. "1h", "-10h", "5d3ms", and "3h5m" are all valid duration strings.

A duration may be negative. Negative duration strings must begin with -.
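For illustration:

duration("3h30m")     // three and a half hours
duration("1d12h")     // one and a half days
duration("-1m30s")    // a negative duration of ninety seconds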

Values of type duration have the following methods:

  • .toMilliseconds() returns a long describing the number of milliseconds in this duration (i.e., the underlying long value itself).
  • .toSeconds() returns a long describing the number of seconds in this duration. (.toMilliseconds() / 1000)
  • .toMinutes() returns a long describing the number of minutes in this duration. (.toSeconds() / 60)
  • .toHours() returns a long describing the number of hours in this duration. (.toMinutes() / 60)
  • .toDays() returns a long describing the number of days in this duration. (.toHours() / 24)
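For example (a hypothetical illustration; for positive durations the divisions above simply discard any sub-unit remainder):

duration("1d2h").toHours()       // 26
duration("1d2h").toDays()        // 1
duration("1m30s").toSeconds()    // 90
duration("1m30s").toMinutes()    // 1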

Values with type duration can also be used with comparison operators:

  • DUR1 < DUR2 returns true when DUR1 is shorter than DUR2
  • DUR1 <= DUR2 returns true when DUR1 is shorter than or equal to DUR2
  • DUR1 > DUR2 returns true when DUR1 is longer than DUR2
  • DUR1 >= DUR2 returns true when DUR1 is longer than or equal to DUR2
  • DUR1 == DUR2 returns true when DUR1 is equal to DUR2
  • DUR1 != DUR2 returns true when DUR1 is not equal to DUR2

Comparisons take the sign of a duration into account; for example, duration("-1d") < duration("1s").

Equality is based on the underlying representation (see below) so, for example, duration("1d") == duration("24h") is true.

Representation

The duration type is internally represented as a quantity of milliseconds as a long, which can be positive, negative, or zero.

A negative duration may be useful when a user wants to use .offset() to shift a date backwards. For example: context.now.offset(duration("-3d")) expresses "three days before the current date".
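As a sketch of this pattern (hypothetical attributes and action), assuming context.now and resource.deadline are datetime values, the following policy permits a submit action only during the three days leading up to the deadline:

permit(principal, action == Action::"submit", resource)
when {
  context.now >= resource.deadline.offset(duration("-3d")) &&
  context.now <= resource.deadline
};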

Errors

All the extension functions proposed in this RFC will throw a type error at authorization time if called with the wrong type of arguments. Additionally, the datetime and duration constructors will return an error if the input string does not match the expected format, or if the internal representation of the value (a 64-bit signed int) would overflow. The datetime constructor will also return an error if the input string refers to a non-existent calendar date (e.g., "2025-02-31"). Unlike some datetime libraries that perform overflow normalization, this extension explicitly rejects such invalid dates. .offset(duration) will return an error if the resulting datetime would overflow.
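For illustration, each of the following constructor calls results in an error:

datetime("2025-02-31")              // non-existent calendar date
datetime("2024-01-01T25:00:00Z")    // hour field out of range
duration("2h1d")                    // units out of order (largest to smallest required)
duration("1d1d")                    // repeated unit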

As noted above, strict validation will require passing literals to the duration and datetime constructors, and it will raise an error if those strings are malformed. Otherwise, validation is straightforward.

JSON Encoding

Cedar supports a JSON format for policies, schemas, entities, and request contexts. The JSON representation of the datetime and duration functions will match the precedents set by the existing IP address and decimal extensions.

For example, here is the JSON encoding of the expression datetime("2020-01-31T23:00:00Z"), which might occur in a policy condition:

"datetime": [ { "Value": "2020-01-31T23:00:00Z" } ]

And here is the JSON encoding for this value when it occurs in entity or context data:

{ "__extn": { "fn": "datetime", "arg": "2020-01-31T23:00:00Z" } }

Finally, here is the JSON encoding of the datetime type in a schema:

{ "type": "Extension", "name": "datetime" }

Out of scope

  • Conversion between UTC and epochs: This will be particularly difficult to model and verify in Lean (although it's technically possible, see this paper which does something similar in the Coq proof assistant). Since it will likely require input-dependent loops, it is unlikely that this can be reasoned about efficiently with SMT.

  • Conversion between UTC and other time zones: Time zones are a notoriously complex system that evolves rapidly. We avoid this complexity by offering datetime.offset(duration). Policy authors who require "local time" can either provide an additional datetime in context or provide a duration to the context and call .offset() to shift the time.

  • Function to get current time: A natural extension of this proposal would be a function currentTime() that provides the current time instead of it being passed in through context. However, currentTime() is stateful, i.e. not pure, and cannot be modeled in SMT. It's also not useful: currentTime() == currentTime() may return false, and Cedar does not provide local bindings (e.g. let t = currentTime()). For testing purposes, Cedar would also need to provide some way to override currentTime. These problems all go away if currentTime() is not supported.

  • Leap seconds and leap years: Cedar does not have a clock, and this proposal does not add one. Instead, applications pass the current time through context or an entity and let the system / application handle complexities like leap seconds and leap years. This means that Cedar cannot provide utilities like datetime.dayOfWeek() or datetime.dayOfMonth(). Cedar applications that wish to define policies based on these ideas should pass pre-computed properties through entities or through the context.

Support for date/time in other authorization systems

AWS IAM supports date condition operators, which can check relationships between date/time values in the ISO 8601 date format or Unix time. You can find an example of an IAM policy using date/time here.

Open Policy Agent provides a Time API with nanosecond precision and extensive time zone support. During policy evaluation, the current timestamp can be returned, and date/time arithmetic can be performed. The diff (equivalent to our proposed durationSince operator on datetime) returns an array of positional time unit components, instead of a value typed similarly to our proposed duration.

Alternatives

Both alternatives described here are "do nothing" options: they show how to encode date/time using existing Cedar types and functionality.

Workaround A: Represent Unix time with a Long

Cedar has long suggested a workaround for date/time functionality: use the comparison and arithmetic operators with Unix timestamps provided through the context.

Here are the previous examples rewritten to use Unix Time.

Only allow experienced, tenured persons from the Hardware Engineering department to see prototypes.

permit( principal is User, action == Action::"view", resource in Folder::"device_prototypes" ) when { principal.department == "HardwareEngineering" && principal.jobLevel >= 10 && (context.currentTime - (365 * 24 * 60 * 60)) >= principal.hireDate };

Only allow user "alice" to view JPEG photos for one week after creation.

permit( principal == User::"alice", action == PhotoOp::"view", resource is Photo ) when { resource.fileType == "JPEG" && resource.creationDate <= (context.currentTime - (7 * 24 * 60 * 60)) };

Allow access from a certain IP address only between 9am and 6pm UTC.

Cedar does not currently support arithmetic division (/) or remainder (%), so this example is not expressible today.

permit( principal, action == Action::"access", resource ) when { context.srcIp.isInRange(ip("192.168.1.0/24")) && 9 <= ((context.currentTime / (60 * 60)) % 24) && ((context.currentTime / (60 * 60)) % 24) < 18 };

Note that the localized version of this example, with timezone offset, could be supported using the + or - operators on context.currentTime.

Prevent employees from accessing work documents on the weekend.

With Unix Time, this requires / and % operators to compute the dayOfWeek, which isn't currently expressible in Cedar.

Permit access to a special opportunity for persons born on Leap Day.

With Unix Time, this requires / and % operators to compute whether or not it is a leap year, which isn't currently expressible in Cedar.

Forbid access to EU resources after Brexit

forbid( principal, action, resource ) when { context.currentTime > 1580511600 && context.location.countryOfOrigin == "GB" && resource.owner == "EU" };

Workaround B: Pass results of time checks in the context

Another workaround we have suggested is to simply handle date/time logic outside of Cedar, and pass the results of checks in the context. For example, you could pass in fields like context.isWorkingHours or context.dayOfTheWeek.

Here are the previous examples rewritten to use additional context.

Only allow experienced, tenured persons from the Hardware Engineering department to see prototypes.

permit( principal is User, action == Action::"view", resource in Folder::"device_prototypes" ) when { principal.department == "HardwareEngineering" && principal.jobLevel >= 10 && principal.hireDate <= context.minimumHiringDateForAccess };

Note: assumes principal.hireDate and context.minimumHiringDateForAccess are long values (e.g., Unix time).

Only allow user "alice" to view JPEG photos for one week after creation.

permit( principal == User::"alice", action == PhotoOp::"view", resource is Photo ) when { resource.fileType == "JPEG" && resource.creationDate >= context.oldestViewableDate };

Note: assumes resource.creationDate and context.oldestViewableDate are long values (e.g., Unix time).

Allow access from a certain IP address only between 9am and 6pm UTC.

permit( principal, action == Action::"access", resource ) when { context.srcIp.isInRange(ip("192.168.1.0/24")) && context.isWorkingHours };

Prevent employees from accessing work documents on the weekend.

forbid( principal, action == Action::"access", resource is Document ) when { context.isTheWeekend };

Permit access to a special opportunity for persons born on Leap Day.

permit( principal, action == Action::"redeem", resource is Prize ) when { context.isMyBirthday && context.isLeapDay };

Forbid access to EU resources after Brexit

forbid( principal, action, resource ) when { context.afterBrexit && context.location.countryOfOrigin == "GB" && resource.owner == "EU" };

Discussion notes

Local time

The current proposal has no direct support for local time, or time zones outside of UTC. Providing robust time zone support would add significant complications to the formal models and ruin the project's SMT-based analysis goals. Policy authors wishing to use local time can simulate it by:

  • providing offsets, of the form +/-hhmm, to the time strings used by datetime(). (Note: no time zone information is retained. The time will be converted to UTC)
  • utilizing the datetime.offset() method with duration values, or values passed through entities and/or context.

Consider the policy below that checks if a principal's local time is between 09:00 and 17:00.

permit( principal, action == Action::"access", resource ) when { context.now.timestamp.offset(principal.timeZoneOffset).toTime() >= duration("9h") && context.now.timestamp.offset(principal.timeZoneOffset).toTime() <= duration("17h") };

Operator overloading

This RFC proposes to use operators <, <=, >, and >= for comparing datetime and duration objects. Currently in Cedar, these operations are only supported for long-typed values. For other extension types with similar operations, Cedar instead uses extension functions (e.g., .lessThan() for decimal values).

This RFC proposes to reverse this decision, and instead allow using builtin operators for extension functions, as appropriate. This will add some implementation complexity (at least in the primary Rust implementation), but it will make policies that use these operations easier to read and easier to write.

Millisecond precision

The current proposal supports milliseconds. The ISO 8601 format does not specify a maximum precision, so we could technically allow any number of S characters after the . in YYYY-MM-DDThh:mm:ss.SSSZ. Based on this blog post, it appears that JavaScript supports milliseconds (3 digits), Python supports microseconds (6 digits), and Rust and Go support nanoseconds (9 digits). Assuming nanosecond accuracy, the maximum (signed) 64-bit number (2^63 - 1) represents April 11, 2262. This date seems far enough out that any of these choices (milliseconds, microseconds, or nanoseconds) seems reasonable.

During discussion, we decided that sub-second accuracy was potentially useful, but we did not have a use case in mind for sub-millisecond accuracy. So in the end we landed on milliseconds. Note that this is the backwards compatible option (at least with respect to allowable date/time strings) because we can add precision later, but not remove it without a breaking change.

Additional restrictions on datetime strings

The current proposal sets additional restrictions on most fields of datetime string formats. These restrictions are common in datetime frameworks for popular programming languages like Python and TypeScript (more details here), and they help us avoid discrepancies between different Cedar implementations.

Overflow handling

The current proposal rejects non-existent dates like 2025-02-31. However, some programming languages and datetime libraries accept such dates and handle excess days through overflow normalization, resulting in a different valid calendar date. For instance, 2025-02-31 is interpreted as being 3 days later than 2025-02-28, thus normalizing to 2025-03-03. The standard Date type in TypeScript reflects this behavior, as shown in this comparison. We argue that this normalization behavior, while convenient for general programming, may be too implicit and potentially surprising for an authorization system.

Entity tags with dedicated syntax and semantics

Timeline

  • Started: 2024-09-11
  • Accepted: 2024-09-19
  • Landed: 2024-09-18 on main
  • Released: 2024-09-30 in cedar-policy v4.1.0 as an experimental feature
  • Stabilized: 2024-10-07 in cedar-policy v4.2.0

Note: These statuses are based on the first version of the RFC process.

Summary

This RFC proposes to extend the Cedar language, type system, and symbolic analysis to include full-featured entity tags for entity types. Tags are a mechanism used by cloud services to attach key-value pairs to resources. Cedar will allow them to be attached to any entities (not just resources).

For evaluation purposes, entity tags are accessed using the methods expr1.hasTag(expr2) and expr1.getTag(expr2), where expr1 evaluates to an entity and expr2 to a string. These are the only ways to access the tags on an entity. The policy syntax, semantics, type system, and symbolic analysis are extended to support these two methods. The schema syntax and the entity JSON format are extended to support defining entity tags and their types.

Entity tags strictly increase the expressiveness of Cedar because they allow computed keys, unlike records and entities. Specifically, tag keys need not be literals, so it is possible to write resource.hasTag(context.tag) and resource.getTag(context.tag).

This proposal is backward-compatible in the sense that existing policies, schemas, entities, etc., do not need to change to accommodate the addition of tags to the language.

This RFC has gone through several revisions, with different revisions proposing alternative designs. The alternatives are discussed at the end of this document.

Basic example

Here is a schema defining two entities, each of which contains entity tags.

entity User = { jobLevel: Long, } tags Set<String>; entity Document = { owner: User, } tags Set<String>;

The User and Document entities each have entity tags, denoted by tags Set<String>, implementing tags whose values are sets of strings. The declaration tags Set<String> indicates an unspecified number of optional tags that all have values of type Set<String>. (Of course, any type could be written, for instance Long, not just Set<String>.) Note that the set of tag keys does not have to be defined in the schema, but all tag values must have the same type (here, Set<String>); you cannot have tag foo with value 2 and tag bar with value "spam". Note also that there is no way to name the key-value map representing tags: you cannot treat it as a Cedar record or even compare it for equality to the tag-map on a different entity; you can only compare the values of individual keys.

Here's a policy that conforms to this schema:

permit ( principal is User, action == Action::"writeDoc", resource is Document) when { resource.owner == principal || (principal.jobLevel > 6 && resource.hasTag("write") && principal.hasTag("write") && resource.getTag("write").containsAny(principal.getTag("write"))) };

This policy states that for a User to carry out the writeDoc action on a Document, the user must own the document, or else the user's job level must be greater than 6 and the document and the user must each have a write tag, where at least one of the user's write-tag values must be present in the document's write-tag values.

Detailed design

To support entity tags, we need to extend the JSON format for entities, the JSON and natural syntax for schemas, the policy syntax, as well as the evaluator, validator, and symbolic compiler.

Policy syntax and semantics

Tags support two binary operators, where e and k are Cedar expressions:

e.hasTag(k) e.getTag(k)

The hasTag operator tests if the entity e has a tag with the key k. This operator errors if e is not an entity, or k is not a string. It returns false if e doesn't have any tags at all to match the behavior of the has operator on entities and records. In other words, having no tags is equivalent to having an empty tags-map. The getTag operator retrieves the value for the key k from the tags for the entity e. This operator errors under the same conditions as hasTag and additionally, if e has no tag with the key k.
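For illustration, suppose Document::"readme" is a hypothetical entity whose tags map is { "write": ["alice"] }:

Document::"readme".hasTag("write")          // true
Document::"readme".hasTag("read")           // false (also false if the entity had no tags at all)
Document::"readme".getTag("write")          // ["alice"]
Document::"readme".getTag("read")           // error: no tag with key "read"
Document::"readme".hasTag(context.tagName)  // keys may be computed expressions, not just literals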

At the code level, we will extend the Cedar CST, AST, and EST to include the new operators (either as separate nodes in the relevant trees or as two extra binary operators), and we'll extend the evaluator to implement the above semantics.

JSON entity format

The JSON entity format is extended to support optionally specifying tags separately from entity attributes:

[ { "uid": {}, "parents": [], "attrs": {}, "tags": {} }, { ... } ]

Schema

We extend schemas as follows to support the declaration of tags. Here's the natural syntax:

Entity := 'entity' Idents ['in' EntOrTyps] [['='] EntityRecType] ['tags' Type] ';'

The new syntax simply extends the Entity production to enable optionally specifying the type of attached tags.

The JSON syntax for schemas specifies tags as a separate tags field that specifies the type of the tag values. Here's our introductory example schema in this format:

"entityTypes": { "User" : { "shape" : { "type" : "Record", "attributes" : { "jobLevel" : { "type" : "Long" }, } }, "tags" : { "type" : "Set", "element": { "type": "String" } } }, "Document" : { "shape" : { "type" : "Record", "attributes" : { "owner" : { "type" : "Entity", "name" : "User" }, } }, "tags" : { "type" : "Set", "element": { "type": "String" } } } }

Validating policies

We extend the policy validator to handle the hasTag and getTag operators analogously to how it handles the has and . operators on records and entities.

Capabilities

Background: While typechecking an expression involving records or entities with optional attributes, the validator tracks capabilities. These represent the attribute-accessing expressions that are sure to succeed. If principal has type User and User has an optional Boolean attribute sudo, then the expression principal.sudo only validates if principal.sudo is present in the current capability set. Capabilities are added to that set by a preceding expression of the form principal has sudo, e.g., as in principal has sudo && principal.sudo.

Capability tracking must be generalized to support tags. In particular, tags T should be treated as a record with optional attributes, and hasTag checks on its keys should update the capability set. For our introductory example, consider the following expression:

resource.hasTag("write") && // (1) principal.hasTag("write") && // (2) resource.getTag("write").containsAny(principal.getTag("write")) // (3)

After subexpression (1), resource.hasTag("write") should be added to the current capability set. After subexpression (2), principal.hasTag("write") should be added to it. Finally, when validating subexpression (3), the expression resource.getTag("write") will be considered valid since resource.hasTag("write") is in the current capability set, and it will be given type Set<String>, since the tags for resource have type Set<String>. The expression principal.getTag("write") is handled similarly. If either of the hasTag subexpressions (1) or (2) were omitted, subexpression (3) would not validate due to the missing capability set entries.

For entity types with no tags declaration in the schema, the validator gives hasTag the type False (to support short-circuiting), and rejects getTag.

Validating and parsing entities

The Cedar authorizer's is_authorized function can be asked to validate that entities in a request are consistent with a provided schema. Extending this validation to work with tags is straightforward: we check that every value in the tags map for an entity e has type T, given that e's type is declared to have tags T. If e's type does not declare tags in the schema, then specifying a tags field for e is an error.

Similarly, schema-based parsing considers schemas when parsing entities, and it can confirm that the contents of the tags field in the JSON have the appropriate shape. If the type T has different behavior under schema-based parsing (for instance, if it is an extension type), then schema-based parsing can interpret the tags appropriately.

Symbolic compilation

Tags as defined here have an efficient logical representation as uninterpreted binary functions. In particular, for each declaration entity E ... tags T, we introduce an uninterpreted function f_E of type E -> String -> Option T. With this representation, we can straightforwardly translate the hasTag and getTag operations as applications of the function f_E, where the absence of a key is denoted by binding it to the value (none T).
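A minimal Lean-style sketch of this idea (not the actual cedar-spec definitions, just an illustration of the shape of the encoding) is:

-- Tags for an entity type E with tag value type T are modeled as an
-- uninterpreted function from entities and string keys to optional values.
abbrev TagFn (E T : Type) := E → String → Option T

-- hasTag/getTag then translate to applications of that function,
-- where an absent key is bound to none.
def hasTag {E T : Type} (f : TagFn E T) (e : E) (k : String) : Bool :=
  (f e k).isSome

def getTag {E T : Type} (f : TagFn E T) (e : E) (k : String) : Option T :=
  f e k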

Drawbacks

Entity types are permitted to have a single tags declaration tags T, which eliminates the possibility of attaching multiple tags-maps to a single entity (unlike RFC 68). This RFC also does not support tags containing other tags-maps, comparing tags-maps with ==, and the like (as discussed above).

The use cases that we are aware of do not suffer due to these limitations. If you want tags-maps to contain other tags-maps, want to store tags-maps in the context record, or want to attach multiple tags-maps to a single entity, you can create specific entity types with those tags. For example:

entity User { roles: Set<String>, accessLevels: IntTags } tags StringTags; entity StringTags { } tags String; entity IntTags { } tags Long;

In effect, the User's tags declaration is equivalent to tags (tags String), but we have added a level of indirection by expressing the inner (tags String) value as an entity with tags String. Similarly, the User's accessLevels attribute is equivalent to (tags Long). So, we can express policies such as:

permit(principal is User, action, resource is Document) when { principal.roles.contains(context.role) && principal.accessLevels.hasTag(context.role) && principal.accessLevels.getTag(context.role) > 3 }; permit(principal is User, action, resource is Document) when { principal.hasTag("clearance") && principal.getTag("clearance").hasTag("level") && principal.getTag("clearance").getTag("level") == "top secret" };

Implementation effort

The implementation effort for adding tags is non-trivial, as it will require changing the Rust code, the Lean models and proofs, and the differential test generators. It will affect most components of Cedar, including the policy parser, CST→AST conversion, CST→EST conversion, (schema-based) entity parser, (partial) evaluator, validator, and the symbolic compiler.

While they affect most components of Cedar, these changes are relatively localized and straightforward, as outlined above.

Alternatives and workarounds

This feature has gone through several revisions and iterations. We compare this proposal to its predecessor, RFC 68, which discusses other alternatives and workarounds in detail.

RFC 68 proposes to add embedded attribute maps (EA-maps) to Cedar as a second-class, validation-time construct that behaves like a record with limited capabilities. Specifically, an entity attribute can have an EA-map type, but EA-maps cannot occur anywhere else (e.g., it is not possible to have a set of EA-maps). EA-maps are treated as special second-class values during validation. But they are indistinguishable from records during evaluation.

In RFC 68, the validator enforces the second-class status of EA-map records by rejecting (some) expressions that attempt to treat EA-maps as first class values. For example, the validator would reject principal.authTags == resource.authTags, where the authTags attribute of principal and resource is an EA-map of type { ? : Set<String> }. It would also reject (if principal.isAdmin then principal.authTags else principal.authTags).write. However, it would accept (if true then principal.authTags else resource.authTags).write due to how capabilities work in Cedar's type system. This behavior is confusing in that EA-maps don't appear to consistently act as second class values, and understanding when they do requires understanding the subtleties of how the Cedar type system works.

In RFC 68, the dynamic semantics is unaware of EA-maps. That is, the concept of EA-maps doesn't exist at evaluation time; they are just records. This leads to lower implementation burden (e.g., evaluator doesn't need to change), but it widens the gap between the dynamic and static (type-aware) semantics of Cedar. Specifically, EA-maps are first-class values during evaluation but not during validation.

EA-maps, as proposed in RFC 68, do not add any new expressiveness to Cedar. Since EA-maps are just records at evaluation time, computed keys (such as resource.getKey(context.tag)) are disallowed. Adding computed keys (Alternative A in RFC 68) requires roughly the same implementation effort as implementing this proposal. But a downside of adding computed keys to RFC 68 is that doing so would further widen the gap between the dynamic and static semantics, since computed keys would be rejected by the validator but not the evaluator on records that aren't EA-maps.

Finally, after modeling RFC 68 in Lean, we found ourselves in an engineering corner, where the type-based enforcement of second-class status of EA-maps makes it difficult to prove a key property of the validator (every policy that is accepted by the validator is also accepted by the symbolic compiler), and also to model and reason about EA-maps in every component that depends on Cedar types.

The core issue is that RFC 68 does not distinguish, at the syntax and semantics level, between records and EA-maps. This distinction must be enforced at the type level. In particular, we need to introduce a first-class EA-map type, which must then be treated as second-class in specific expressions during type checking and in proofs. The special-casing must be done for ==, if ... then ... else, hasAttr, getAttr, contains, containsAll, containsAny, set, and record expressions.

This special-casing will have to be repeated in every component that gives a static (type-aware) semantics to Cedar, such as the symbolic compiler and, potentially, a type-aware partial evaluator. It will also need to be propagated into specifications and proofs of type-aware components. This is expected to increase the overall implementation and proof burden by thousands of lines of Lean code compared to this proposal. (In the symbolic compiler, the burden is further magnified because the special-casing of EA-map types has to be propagated into the underlying Term language too.)

This proposal avoids the issues of RFC 68 by introducing dedicated syntactic and semantic constructs for working with tags. It is impossible, at the syntax level, to write a Cedar expression that evaluates to a (first-class) tags-map value. Rather, the contents of a tags-map are accessed by applying the hasTag and getTag operators to the entity that has the tags. These two operators are explicit in the AST, and they can be handled locally in the Lean model and proofs, without changing the handling of any other operator. This proposal also reflects tags in both the static and dynamic semantics, keeping their treatment consistent between the two.

Compared to RFC 68, this proposal represents a more substantial addition to the language, and it requires a significantly larger up-front implementation effort. The tradeoff is that investing in the implementation now means that we won't have to consistently pay an implementation and proof penalty for every new type-aware component we add to Cedar. And adopting the bigger language change now lets us keep the dynamic and static semantics consistent, freeing users from having to reason about the subtleties of Cedar's type system.