Semantic Subtyping in Luau - Roblox Blog

Luau is the primary programming language to place the ability of semantic subtyping in the palms of thousands and thousands of creators.

Minimizing false positives

One of the problems with sort error reporting in instruments just like the Script Analysis widget in Roblox Studio is false positives. These are warnings which are artifacts of the evaluation, and don’t correspond to errors which may happen at runtime. For instance, this system

native x = CFrame.new()
native y
if (math.random()) then
  y = CFrame.new()
else
  y = Vector3.new()
finish
native z = x * y

reviews a kind error which can’t occur at runtime, since CFrame helps multiplication by each Vector3 and CFrame. (Its sort is ((CFrame, CFrame) -> CFrame) & ((CFrame, Vector3) -> Vector3).)

False positives are particularly poor for onboarding new customers. If a kind-curious creator switches on typechecking and is straight away confronted with a wall of spurious purple squiggles, there’s a sturdy incentive to instantly swap it off once more.

Inaccuracies in sort errors are inevitable, since it’s inconceivable to determine forward of time whether or not a runtime error might be triggered. Type system designers have to decide on whether or not to stay with false positives or false negatives. In Luau that is decided by the mode: strict mode errs on the aspect of false positives, and nonstrict mode errs on the aspect of false negatives.

While inaccuracies are inevitable, we attempt to take away them every time doable, since they outcome in spurious errors, and imprecision in sort-pushed tooling like autocomplete or API documentation.

Subtyping as a supply of false positives

One of the sources of false positives in Luau (and plenty of different related languages like TypeScript or Flow) is subtyping. Subtyping is used every time a variable is initialized or assigned to, and every time a perform is named: the sort system checks that the kind of the expression is a subtype of the kind of the variable. For instance, if we add varieties to the above program

native x : CFrame = CFrame.new()
native y : Vector3 | CFrame
if (math.random()) then
  y = CFrame.new()
else
  y = Vector3.new()
finish
native z : Vector3 | CFrame = x * y

then the sort system checks that the kind of CFrame multiplication is a subtype of (CFrame, Vector3 | CFrame) -> (Vector3 | CFrame).

Subtyping is a really helpful function, and it helps wealthy sort constructs like sort union (T | U) and intersection (T & U). For instance, quantity? is carried out as a union sort (quantity | nil), inhabited by values which are both numbers or nil.

Unfortunately, the interplay of subtyping with intersection and union varieties can have odd outcomes. A easy (however slightly synthetic) case in older Luau was:

native x : (quantity?) & (string?) = nil
native y : nil = nil
y = x -- Type '(quantity?) & (string?)' couldn't be transformed into 'nil'
x = y

This error is brought on by a failure of subtyping, the outdated subtyping algorithm reviews that (quantity?) & (string?) will not be a subtype of nil. This is a false optimistic, since quantity & string is uninhabited, so the one doable inhabitant of (quantity?) & (string?) is nil.

This is a man-made instance, however there are actual points raised by creators brought on by the issues, for instance https://devforum.roblox.com/t/luau-recap-july-2021/1382101/5. Currently, these points largely have an effect on creators making use of refined sort system options, however as we make sort inference extra correct, union and intersection varieties will develop into extra widespread, even in code with no sort annotations.

This class of false positives not happens in Luau, as now we have moved from our outdated method of syntactic subtyping to another referred to as semantic subtyping.

Syntactic subtyping

AKA “what we did before.”

Syntactic subtyping is a syntax-directed recursive algorithm. The attention-grabbing instances to take care of intersection and union varieties are:

Reflexivity: T is a subtype of T
Intersection L: (T₁ & … & Tⱼ) is a subtype of U every time a number of the Tᵢ are subtypes of U
Union L: (T₁ | … | Tⱼ) is a subtype of U every time the entire Tᵢ are subtypes of U
Intersection R: T is a subtype of (U₁ & … & Uⱼ) every time T is a subtype of the entire Uᵢ
Union R: T is a subtype of (U₁ | … | Uⱼ) every time T is a subtype of a number of the Uᵢ.

For instance:

By Reflexivity: nil is a subtype of nil
so by Union R: nil is a subtype of quantity?
and: nil is a subtype of string?
so by Intersection R: nil is a subtype of (quantity?) & (string?).

Yay! Unfortunately, utilizing these guidelines:

quantity isn’t a subtype of nil
so by Union L: (quantity?) isn’t a subtype of nil
and: string isn’t a subtype of nil
so by Union L: (string?) isn’t a subtype of nil
so by Intersection L: (quantity?) & (string?) isn’t a subtype of nil.

This is typical of syntactic subtyping: when it returns a “yes” outcome, it’s appropriate, however when it returns a “no” outcome, it is likely to be flawed. The algorithm is a conservative approximation, and since a “no” outcome can result in sort errors, it is a supply of false positives.

Semantic subtyping

AKA “what we do now.”

Rather than pondering of subtyping as being syntax-directed, we first take into account its semantics, and later return to how the semantics is carried out. For this, we undertake semantic subtyping:

The semantics of a kind is a set of values.
Intersection varieties are considered intersections of units.
Union varieties are considered unions of units.
Subtyping is considered set inclusion.

For instance:

Type	Semantics
`quantity`	{ 1, 2, 3, … }
`string`	{ “foo”, “bar”, … }
`nil`	{ nil }
`quantity?`	{ nil, 1, 2, 3, … }
`string?`	{ nil, “foo”, “bar”, … }
`(quantity?) & (string?)`	{ nil, 1, 2, 3, … } ∩ { nil, “foo”, “bar”, … } = { nil }

and since subtypes are interpreted as set inclusions:

Subtype	Supertype	Because
`nil`	`quantity?`	{ nil } ⊆ { nil, 1, 2, 3, … }
`nil`	`string?`	{ nil } ⊆ { nil, “foo”, “bar”, … }
`nil`	`(quantity?) & (string?)`	{ nil } ⊆ { nil }
`(quantity?) & (string?)`	`nil`	{ nil } ⊆ { nil }

So in response to semantic subtyping, (quantity?) & (string?) is equal to nil, however syntactic subtyping solely helps one course.

This is all superb and good, but when we need to use semantic subtyping in instruments, we want an algorithm, and it seems checking semantic subtyping is non-trivial.

Semantic subtyping is tough

NP-exhausting to be exact.

We can scale back graph coloring to semantic subtyping by coding up a graph as a Luau sort such that checking subtyping on varieties has the identical outcome as checking for the impossibility of coloring the graph

For instance, coloring a 3-node, two colour graph may be accomplished utilizing varieties:

sort Red = "red"
sort Blue = "blue"
sort Color = Red | Blue
sort Coloring = (Color) -> (Color) -> (Color) -> boolean
sort Uncolorable = (Color) -> (Color) -> (Color) -> false

Then a graph may be encoded as an overload perform sort with subtype Uncolorable and supertype Coloring, as an overloaded perform which returns false when a constraint is violated. Each overload encodes one constraint. For instance a line has constraints saying that adjoining nodes can’t have the identical colour:

sort Line = Coloring
  & ((Red) -> (Red) -> (Color) -> false)
  & ((Blue) -> (Blue) -> (Color) -> false)
  & ((Color) -> (Red) -> (Red) -> false)
  & ((Color) -> (Blue) -> (Blue) -> false)

A triangle is analogous, however the finish factors additionally can’t have the identical colour:

sort Triangle = Line
  & ((Red) -> (Color) -> (Red) -> false)
  & ((Blue) -> (Color) -> (Blue) -> false)

Now, Triangle is a subtype of Uncolorable, however Line will not be, for the reason that line may be 2-coloured. This may be generalized to any finite graph with any finite variety of colours, and so subtype checking is NP-exhausting.

We take care of this in two methods:

we cache varieties to cut back reminiscence footprint, and
surrender with a “Code Too Complex” error if the cache of varieties will get too giant.

Hopefully this doesn’t come up in follow a lot. There is sweet proof that points like this don’t come up in follow from expertise with sort techniques like that of Standard ML, which is EXPTIME-full, however in follow it’s important to exit of your solution to code up Turing Machine tapes as varieties.

Type normalization

The algorithm used to determine semantic subtyping is sort normalization. Rather than being directed by syntax, we first rewrite varieties to be normalized, then verify subtyping on normalized varieties.

A normalized sort is a union of:

a normalized nil sort (both by no means or nil)
a normalized quantity sort (both by no means or quantity)
a normalized boolean sort (both by no means or true or false or boolean)
a normalized perform sort (both by no means or an intersection of perform varieties) and many others

Once varieties are normalized, it’s simple to verify semantic subtyping.

Every sort may be normalized (sigh, with some technical restrictions round generic sort packs). The vital steps are:

eradicating intersections of mismatched primitives, e.g. quantity & bool is changed by by no means, and
eradicating unions of capabilities, e.g. ((quantity?) -> quantity) | ((string?) -> string) is changed by (nil) -> (quantity | string).

For instance, normalizing (quantity?) & (string?) removes quantity & string, so all that’s left is nil.

Our first try at implementing sort normalization utilized it liberally, however this resulted in dreadful efficiency (complicated code went from typechecking in lower than a minute to operating in a single day). The motive for that is annoyingly easy: there’s an optimization in Luau’s subtyping algorithm to deal with reflexivity (T is a subtype of T) that performs an inexpensive pointer equality verify. Type normalization can convert pointer-similar varieties into semantically-equal (however not pointer-similar) varieties, which considerably degrades efficiency.

Because of those efficiency points, we nonetheless use syntactic subtyping as our first verify for subtyping, and solely carry out sort normalization if the syntactic algorithm fails. This is sound, as a result of syntactic subtyping is a conservative approximation to semantic subtyping.

Pragmatic semantic subtyping

Off-the-shelf semantic subtyping is barely totally different from what’s carried out in Luau, as a result of it requires fashions to be set-theoretic, which requires that inhabitants of perform varieties “act like functions.” There are two the explanation why we drop this requirement.

Firstly, we normalize perform varieties to an intersection of capabilities, for instance a horrible mess of unions and intersections of capabilities:

((quantity?) -> quantity?) | (((quantity) -> quantity) & ((string?) -> string?))

normalizes to an overloaded perform:

((quantity) -> quantity?) & ((nil) -> (quantity | string)?)

Set-theoretic semantic subtyping doesn’t help this normalization, and as an alternative normalizes capabilities to disjunctive regular kind (unions of intersections of capabilities). We don’t do that for ergonomic causes: overloaded capabilities are idiomatic in Luau, however DNF will not be, and we don’t need to current customers with such non-idiomatic varieties.

Our normalization depends on rewriting away unions of perform varieties:

((A) -> B) | ((C) -> D)   →   (A & C) -> (B | D)

This normalization is sound in our mannequin, however not in set-theoretic fashions.

Secondly, in Luau, the kind of a perform utility f(x) is B if f has sort (A) -> B and x has sort A. Unexpectedly, this isn’t at all times true in set-theoretic fashions, on account of uninhabited varieties. In set-theoretic fashions, if x has sort by no means then f(x) has sort by no means. We don’t need to burden customers with the concept perform utility has a particular nook case, particularly since that nook case can solely come up in useless code.

In set-theoretic fashions, (by no means) -> A is a subtype of (by no means) -> B, it doesn’t matter what A and B are. This will not be true in Luau.

For these two causes (that are largely about ergonomics slightly than something technical) we drop the set-theoretic requirement, and use pragmatic semantic subtyping.

Negation varieties

The different distinction between Luau’s sort system and off-the-shelf semantic subtyping is that Luau doesn’t help all negated varieties.

The widespread case for wanting negated varieties is in typechecking conditionals:

-- initially x has sort T
if (sort(x) == "string") then
  --  in this department x has sort T & string
else
  -- in this department x has sort T & ~string
finish

This makes use of a negated sort ~string inhabited by values that aren’t strings.

In Luau, we solely permit this type of typing refinement on take a look at varieties like string, perform, Part and so forth, and not on structural varieties like (A) -> B, which avoids the widespread case of basic negated varieties.

Prototyping and verification

During the design of Luau’s semantic subtyping algorithm, there have been modifications made (for instance initially we thought we have been going to have the ability to use set-theoretic subtyping). During this time of fast change, it was vital to have the ability to iterate rapidly, so we initially carried out a prototype slightly than leaping straight to a manufacturing implementation.

Validating the prototype was vital, since subtyping algorithms can have surprising nook instances. For this motive, we adopted Agda because the prototyping language. As effectively as supporting unit testing, Agda helps mechanized verification, so we’re assured in the design.

The prototype doesn’t implement all of Luau, simply the purposeful subset, however this was sufficient to find delicate function interactions that will in all probability have surfaced as tough-to-repair bugs in manufacturing.

Prototyping will not be excellent, for instance the principle points that we hit in manufacturing have been about efficiency and the C++ customary library, that are by no means going to be caught by a prototype. But the manufacturing implementation was in any other case pretty simple (or a minimum of as simple as a 3kLOC change may be).

Next steps

Semantic subtyping has eliminated one supply of false positives, however we nonetheless have others to trace down:

Overloaded perform functions and operators
Property entry on expressions of complicated sort
Read-only properties of tables
Variables that change sort over time (aka typestates)

The quest to take away spurious purple squiggles continues!

Acknowledgments

Thanks to Giuseppe Castagna and Ben Greenman for useful feedback on drafts of this publish.

Alan coordinates the design and implementation of the Luau sort system, which helps drive most of the options of growth in Roblox Studio. Dr. Jeffrey has over 30 years of expertise with analysis in programming languages, has been an lively member of quite a few open-supply software program initiatives, and holds a DPhil from the University of Oxford, England.

Source link