Syntax Overview
Tokens
Layout
A colon begins a layout block, which contains every token or bracketed item indented at least as much as the first token following the colon.
foo:
  bar
  baz
  quux
// ===
foo {
  bar
  baz
  quux
}

foo: bar
     baz
     quux
// ===
foo { bar
      baz
      quux }
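The trailing-colon form of the layout rule can be modeled as a small line-based rewriter. The following is an illustrative Python sketch, not Kitten's actual lexer: the names `indent_of` and `desugar_layout` are hypothetical, and the model assumes spaces-only indentation, no blank lines, and that a colon opens a block only when it ends a line.

```python
def indent_of(line):
    """Column of the first non-space character."""
    return len(line) - len(line.lstrip(" "))


def desugar_layout(lines):
    """Rewrite trailing-colon layout blocks into explicit braces.

    Simplified model: a block opened by a colon at the end of a line
    contains every following line indented at least as far as the
    first line after the colon.
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].rstrip()
        if line.endswith(":") and i + 1 < len(lines):
            block_col = indent_of(lines[i + 1])
            j = i + 1
            while j < len(lines) and indent_of(lines[j]) >= block_col:
                j += 1
            out.append(line[:-1].rstrip() + " {")
            out.extend(desugar_layout(lines[i + 1:j]))  # handle nested blocks
            out.append(" " * indent_of(line) + "}")
            i = j
        else:
            out.append(line)
            i += 1
    return out
```

Nested blocks are handled by the recursive call, so an inner colon block is closed before the outer one.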
Comments
// Single-line comment.
/*
Multi-line
comment.
*/
/*
Nested
/* multi-line */
comment.
*/
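Nested multi-line comments cannot be recognized by looking for the first `*/`; a scanner has to track nesting depth. A minimal Python sketch of that idea follows. The function name `strip_comments` is hypothetical, and the sketch deliberately ignores text and character literals, which a real lexer would have to skip over.

```python
def strip_comments(src):
    """Remove // comments and nested /* */ comments from src."""
    out = []
    depth = 0  # current nesting depth of /* */ comments
    i = 0
    while i < len(src):
        two = src[i:i + 2]
        if two == "/*":
            depth += 1
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a multi-line comment: discard the character
        elif two == "//":
            # single-line comment: skip up to, but not including, the newline
            while i < len(src) and src[i] != "\n":
                i += 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```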
Literals
Integers
Decimal integer literals consist of one or more decimal digits. Binary, octal, and hexadecimal literals begin with the prefix 0b, 0o, or 0x, respectively:
0
1
123
0b1100 // 12
0o777 // 511
0xFF // 255
0xabcd // 43981
By default, integer literals have type Int32; the type of a literal is neither inferred nor polymorphic. You can select a different type with a suffix consisting of i (for signed integers) or u (for unsigned) followed by a number of bits. A Kitten implementation is guaranteed to support 8-, 16-, 32-, and 64-bit integer sizes, but may support additional sizes such as 128-bit or arbitrary-precision integers.
1 // Int32
1i8 // Int8
1i16 // Int16
1i32 // Int32
1i64 // Int64
1u8 // UInt8
1u16 // UInt16
1u32 // UInt32
1u64 // UInt64
Integer literals may be prefixed with a sign character: +, -, or − (U+2212 MINUS SIGN).
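The integer literal rules (sign, base prefix, digits, type suffix) can be summarized as a small parser. This is a Python sketch for illustration only: `parse_int_literal` is a hypothetical name, the sketch accepts any bit width in the suffix, and it performs no range checking against the selected type.

```python
import re

_INT_LITERAL = re.compile(
    r"([+\-\u2212]?)"                            # optional sign
    r"(0b[01]+|0o[0-7]+|0x[0-9A-Fa-f]+|[0-9]+)"  # digits, with optional base prefix
    r"(?:([iu])([0-9]+))?"                       # optional type suffix
)

_BASES = {"0b": 2, "0o": 8, "0x": 16}


def parse_int_literal(text):
    """Return (value, type_name) for an integer literal."""
    m = _INT_LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not an integer literal: {text!r}")
    sign, digits, kind, bits = m.groups()
    base = _BASES.get(digits[:2], 10)
    value = int(digits[2:] if base != 10 else digits, base)
    if sign in ("-", "\u2212"):
        value = -value
    if kind is None:
        type_name = "Int32"  # default type of an unsuffixed literal
    else:
        type_name = ("Int" if kind == "i" else "UInt") + bits
    return value, type_name
```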
Floating-point Numbers
A floating-point literal consists of decimal digits containing a decimal point, optionally followed by a decimal exponent in scientific notation. As the examples show, the digits on one side of the point may be omitted.
1.0
0.5
.5 // 0.5f64
1. // 1.0f64
3.14
1.0e+6 // 1000000.0
The default type of a floating-point literal is Float64. You can select a different type with a suffix of f32 or f64; implementations may support additional types such as f80.
1.0 // Float64
1.0f64 // Float64
1.0f32 // Float32
Like integer literals, floating-point literals (and their exponent parts) may be prefixed with a sign character.
Characters
A character literal consists of a single Unicode code point or character escape, surrounded by single quotes (apostrophes). Character literals have type Char.
'a'
'é'
'\n'
Valid character escapes include:
\a    BEL (U+0007)
\b    BS (U+0008)
\f    FF (U+000C)
\n    LF (U+000A)
\r    CR (U+000D)
\t    TAB (U+0009)
\v    VT (U+000B)
\'    ' (U+0027)
\"    " (U+0022)
\\    \ (U+005C)
You can use ‘ (U+2018 LEFT SINGLE QUOTATION MARK) and ’ (U+2019 RIGHT SINGLE QUOTATION MARK) instead of apostrophes.
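The escape table and the two quoting styles can be expressed as a short decoder. This Python sketch is illustrative: `decode_char_literal`, `ESCAPES`, and `QUOTE_PAIRS` are hypothetical names, and the sketch rejects any escape not listed above even though the list is introduced with "include" and may not be exhaustive.

```python
ESCAPES = {
    "a": "\u0007", "b": "\u0008", "f": "\u000C", "n": "\u000A",
    "r": "\u000D", "t": "\u0009", "v": "\u000B",
    "'": "\u0027", '"': "\u0022", "\\": "\u005C",
}

# Opening quote -> required closing quote.
QUOTE_PAIRS = {"'": "'", "\u2018": "\u2019"}


def decode_char_literal(literal):
    """Return the single character denoted by a character literal."""
    if len(literal) < 3 or QUOTE_PAIRS.get(literal[0]) != literal[-1]:
        raise ValueError(f"not a character literal: {literal!r}")
    body = literal[1:-1]
    if body.startswith("\\"):
        if len(body) == 2 and body[1] in ESCAPES:
            return ESCAPES[body[1]]
        raise ValueError(f"unknown escape: {body!r}")
    if len(body) != 1:
        raise ValueError("a character literal holds exactly one code point")
    return body
```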
Lists
A list literal consists of a series of comma-separated terms between square brackets. A trailing comma is allowed.
[] // <T> List<T>
[1] // List<Int32>
[1.0, 2.0] // List<Float64>
[
'a',
'b',
'c',
] // List<Char>
Text
A text literal consists of a series of zero or more Unicode characters or escapes surrounded by double quotes. It has type List<Char>.
""
"meow"
"foo\n\tbar\n"
You can use “ (U+201C LEFT DOUBLE QUOTATION MARK) and ” (U+201D RIGHT DOUBLE QUOTATION MARK) instead of ASCII double quotes. Unlike ASCII double quotes, the Unicode quotes can be nested, as long as they match; that is, every opening quote has a corresponding closing quote before the end of the literal:
“”
“meow”
“He said, “Meow”.”
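Matching nested Unicode quotes amounts to counting depth until the opening quote is closed. A minimal Python sketch, assuming a hypothetical helper `read_curly_text` that receives source text starting at the opening quote and ignores escapes:

```python
OPEN_QUOTE = "\u201C"   # LEFT DOUBLE QUOTATION MARK
CLOSE_QUOTE = "\u201D"  # RIGHT DOUBLE QUOTATION MARK


def read_curly_text(src):
    """Return the contents of the curly-quoted literal at the start of src."""
    if not src.startswith(OPEN_QUOTE):
        raise ValueError("expected an opening curly quote")
    depth = 0
    for i, ch in enumerate(src):
        if ch == OPEN_QUOTE:
            depth += 1
        elif ch == CLOSE_QUOTE:
            depth -= 1
            if depth == 0:
                return src[1:i]  # inner quotes stay part of the text
    raise ValueError("unterminated text literal")
```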
Paragraphs
A paragraph literal is a text literal that may span multiple lines. It begins with three double quotation marks (""") followed by a newline, and ends with three double quotation marks as well. All lines in a paragraph literal must begin with the same whitespace prefix, or be blank. This indentation is stripped from the resulting text. Paragraph literals are often used for documentation.
about foo:
  docs: """
    This is a paragraph literal.

    It spans multiple lines.
    """
// "This is a paragraph literal.\n\nIt spans multiple lines."

"""
This one ends with a newline.

"""
// "This one ends with a newline.\n"
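The indentation stripping can be sketched in Python as follows. The function name `paragraph_text` is hypothetical, and two details are assumptions inferred from the results shown in the comments above rather than stated rules: the line break immediately before the closing quotes is dropped, and the common prefix is taken from the first non-blank line.

```python
def paragraph_text(literal):
    """Return the text denoted by a paragraph literal."""
    opener, closer = '"""\n', '"""'
    well_formed = (
        len(literal) >= len(opener) + len(closer)
        and literal.startswith(opener)
        and literal.endswith(closer)
    )
    if not well_formed:
        raise ValueError("not a paragraph literal")
    lines = literal[len(opener):-len(closer)].split("\n")
    lines.pop()  # whatever precedes the closing quotes on their own line

    # Assumption: the shared prefix is the indentation of the first non-blank line.
    nonblank = [line for line in lines if line.strip()]
    prefix = ""
    if nonblank:
        first = nonblank[0]
        prefix = first[:len(first) - len(first.lstrip())]

    result = []
    for line in lines:
        if not line.strip():
            result.append("")  # blank lines are allowed without the prefix
        elif line.startswith(prefix):
            result.append(line[len(prefix):])
        else:
            raise ValueError("inconsistent indentation")
    return "\n".join(result)
```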
Quotations
A quotation is a series of zero or more terms surrounded by curly brackets, or preceded by a colon and delimited by indentation. It may allocate a reference-counted closure.
{} // <R...> (R... -> R...)
{ 1 } // <R...> (R... -> R..., Int32)
// <R...> (R... -> R..., Int32, Int32 +IO)
:
  1
  2
  "foo" say
Terms
Words
A word is an identifier that refers to a function, such as say or map. There are a number of built-in words:
call: invokes a closure. Its full type is <R..., S..., +P> (R..., (R... -> S... +P) -> S... +P); with the permission variable left implicit, this is <R…, S…> (R…, (R… → S…) → S…).
jump: tail-calls a closure.
return: jumps to the end of the current definition.
loop: jumps to the start of the current definition.
Locals
Local variable introductions switch from function-level programming to data-level programming by moving values from the stack into named local variables. An introduction consists of a rightward arrow (->) followed by a comma-separated list of one or more names, terminated with a semicolon.
"foo" -> name;
name say
1 2 3 -> x, y, z;
(x + y + z) say
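The effect of a local variable introduction on the stack can be modeled in a few lines of Python. This is a sketch with a hypothetical name, `bind_locals`; the stack is a Python list whose last element is the top. The binding order (the last name receives the top of the stack) is inferred from the swap definition in the Type Signatures section of this page.

```python
def bind_locals(stack, names):
    """Pop len(names) values off the stack and bind them to names.

    Names are bound in stack order, so the last name gets the top value.
    """
    count = len(names)
    if count > len(stack):
        raise IndexError("stack underflow")
    split = len(stack) - count
    values = stack[split:]
    del stack[split:]  # the bound values leave the stack
    return dict(zip(names, values))
```

Under this model, `1 2 3 -> x, y, z;` leaves the stack empty and binds x, y, and z to 1, 2, and 3.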
Instead of a semicolon, a block may be specified; this is syntactic sugar for a quotation beginning with local variable introductions.
-> x { x + 1 }
// ===
{ -> x; x + 1 }
This is intended to combine with case branches and do blocks:
match (something_optional)
case some -> x:
  x say
…
// ===
match (something_optional)
case some:
  -> x;
  x say
…
do (map) -> x:
  x + 1
// ===
{ -> x; x + 1 } map
do
do allows higher-order functions to be used as prefix control-flow syntax.
do (f) { g }
// ===
{ g } f
match
match is the inverse of a constructor: while a constructor takes fields from the stack and constructs an instance of an algebraic data type (ADT), match takes an ADT value from the stack, dispatches on its tag, and expands the fields back onto the stack.
match (scrutinee)
case constructor_1 -> field_1, field_2, …:
  …
case constructor_2:
  …
else:
  …
If no else branch is specified, the default branch calls abort, making the match expression require the +Fail permission. If all constructors are covered by case branches, then the else branch is redundant and +Fail is not required.
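The dispatch performed by match, in the form without an else branch, can be modeled in Python. Everything here is a hypothetical illustration: an ADT value is represented as a `(tag, fields)` tuple, branches are functions that operate on the stack, and `MatchFailure` stands in for the call to abort.

```python
class MatchFailure(Exception):
    """Stands in for Kitten's abort in this model."""


def run_match(stack, cases):
    """Pop an ADT value, dispatch on its tag, and run the matching branch.

    cases maps constructor names to functions taking the stack.
    """
    tag, fields = stack.pop()
    branch = cases.get(tag)
    if branch is None:
        # No case covers this constructor and there is no else branch.
        raise MatchFailure(f"no case for constructor {tag!r}")
    stack.extend(fields)  # expand the fields back onto the stack
    branch(stack)
```

If `cases` covers every constructor of the type, the failure path is unreachable, which corresponds to +Fail not being required.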
if
if evaluates a Boolean condition. If the condition is true, it evaluates the true branch. Otherwise, it evaluates the else branch, if one is present. If no else branch is specified, it’s equivalent to else {}. You can specify multiple conditions by adding elif (condition) { block } clauses. if is syntactic sugar for match.
if (condition_1) {
branch_1
} elif (condition_2) {
branch_2
} else {
false_branch
}
// ===
condition_1 match
case true {
branch_1
} case false {
condition_2 match
case true {
branch_2
} case false {
false_branch
}
}
The condition of an if may be drawn from the stack.
condition
if {
true_branch
} else {
false_branch
}
condition
if {
true_branch
}
Program Elements
Vocabularies
A vocabulary is a group of related names, introduced with the vocab keyword followed by a name and a block. Vocabularies may be nested.
vocab math:
  define successor (Int32 -> Int32):
    (+ 1)
  vocab experimental:
    define predecessor (Int32 -> Int32):
      (- 1)
// ===
define math::successor (Int32 -> Int32):
  (+ 1)
define math::experimental::predecessor (Int32 -> Int32):
  (- 1)
To reduce nesting, the block may be replaced with a semicolon, in which case all following code until the next vocab element is placed in that vocabulary.
vocab math;
define successor (Int32 -> Int32):
  (+ 1)
vocab math::experimental;
define predecessor (Int32 -> Int32):
  (- 1)
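The way nested vocabularies produce qualified names can be sketched in Python. The representation is hypothetical: a vocabulary is a dict, a nested dict is a nested vocabulary, and any other value stands for a definition body. The names `qualify` and `flatten` are illustrative only.

```python
def qualify(vocab_path, name):
    """Join the enclosing vocabulary names and a definition name with '::'."""
    return "::".join([*vocab_path, name])


def flatten(tree, path=()):
    """Yield the qualified name of every definition in a vocabulary tree."""
    for name, child in tree.items():
        if isinstance(child, dict):
            # A nested vocabulary extends the qualification path.
            yield from flatten(child, (*path, name))
        else:
            yield qualify(path, name)
```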
Word Definitions
A word is a user-defined name for a function or infix operator.
define double (Int32 -> Int32):
  2 (*)
A definition begins with the define keyword, followed by a name, a type signature, and a block.
Metadata
An about block contains a set of key-value pairs, where the keys are Kitten identifiers and the values are untyped terms. It’s intended to subsume the special syntax, pragmas, and magic comments that other languages use for denoting metadata.
about +:
  docs: """
    The operation of an additive monoid
    with `zero` as the identity.
    """
  operator:
    left 6
  inline:
    always
Types
A type describes a user-defined shape of data constructed with sums and products of primitive types. There are three basic kinds of types, using the same notation:
// Enumerations
type Bool:
  case false
  case true
// Structures
type Pair<A, B>:
  case pair (A, B)
// Tagged Unions
type Optional<T>:
  case none
  case some (T)
It’s possible to explicitly specify that a constructor takes no arguments.
type Bool:
  case false ()
  case true ()
type Optional<T>:
  case none ()
  case some (T)
Permissions
A permission definition introduces a word that grants a permission to a closure that requires that permission in order to run.
permission Locked
  <R..., S...> (R..., (R... -> S... +Locked) -> S...)
{
  take_lock
  call
  release_lock
}
Traits and Instances
A trait definition declares a generic function, which may have different implementations for different concrete types:
trait show<T> (T -> List<Char>)
These implementations are called instances of the trait, and the type of an instance must be an instance of the signature of its parent trait.
instance show (List<Char> -> List<Char>)
instance show<T> (List<T> -> List<Char>)
Note: generic instances are not yet implemented.
Synonyms
A synonym is an alias for an existing name.
synonym name (existing_name)
Type Signatures
Type signatures are one of the most complex areas of Kitten’s syntax, but they use familiar conventions from other languages and provide syntactic sugar to remain readable.
All definitions denote functions, so function types are among the first types you will use. They are written with a rightwards arrow (-> or Unicode →), with the types of the inputs and outputs written as comma-separated lists on the left and right sides, respectively.
// Basic function with one input and output
define inc (Int32 -> Int32):
  (+ 1)
// Multiple inputs
define add (Int32, Int32 -> Int32):
  (+)
// Multiple outputs
define inc_dec (Int32 -> Int32, Int32):
  -> x;
  (x + 1) (x - 1)
// No inputs
define two (-> Int32):
  2
// No outputs
define drop_int (Int32 ->):
  -> _;
A function type accepts a set of permissions, which represent actions the function is allowed to take. Permissions are written after a function’s return types as a plus sign followed by a permission name, such as +IO:
define yell (List<Char> -> +IO):
  (+ "!") say
define ask (List<Char> -> List<Char> +IO +Fail):
  print get_line -> answer;
  if (answer empty):
    "no input" fail
  else:
    answer
Words in Kitten are often generic, able to operate on values of any type. A generic type begins with a quantifier, consisting of a comma-separated list of type variable names in angle brackets (<>). By convention, type variables are named with single capital letters.
// Duplicate a value of any type.
define dup<T> (T -> T, T):
  -> x; x x
// Swap two values of any types.
define swap<A, B> (A, B -> B, A):
  -> x, y; y x
The quantifier part <T> is written with no space between it and the word name (e.g., dup<T>) in order to mimic the generic type syntax of other programming languages. However, it’s important to understand that the quantifier is associated with the type, not the name: it’s dup and <T> (T -> T, T), not dup<T> and (T -> T, T).
There are a few basic kinds of type variables. Value type variables (such as T, A, and B above) are those that refer to a type like Int32 or List<Char>, which are inhabited by values. Stack type variables are suffixed with an ellipsis (... or Unicode …), and refer to a series of types on the stack. For example, the type of the call word is:
<R..., S...> (R..., (R... -> S...) -> S...)
This means that call takes some stack R..., with a closure of type R... -> S... on top, and applies the closure to the stack to produce S..., the result stack. All functions are generic in the part of the stack that they don’t touch, so a type like Int32 -> Int32, Int32 is syntactic sugar for <S...> (S..., Int32 -> S..., Int32, Int32); a function with this type takes any stack S... with an Int32 on top, and returns the same stack S... with two Int32 values on top.
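The stack-variable sugar is mechanical enough to write down as a string transformation. The following Python sketch uses a hypothetical function, `with_stack_variable`, that takes the input and output types of a plain function type as lists and produces the desugared form with an explicit stack type variable; it does not handle permissions or nested function types.

```python
def with_stack_variable(inputs, outputs, var="S"):
    """Spell out the implicit stack type variable of a function type."""
    rest = f"{var}..."
    left = ", ".join([rest, *inputs])    # the untouched stack, then the inputs
    right = ", ".join([rest, *outputs])  # the same stack, then the outputs
    return f"<{rest}> ({left} -> {right})"
```

For the type Int32 -> Int32, Int32, calling `with_stack_variable(["Int32"], ["Int32", "Int32"])` yields the desugared signature given in the text.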
Finally, permission type variables are prefixed with a plus sign (+), and refer to sets of permissions such as +IO or +Unsafe +Fail. All functions are generic in the permissions that they don’t use, so a type like Int32 -> Int32, Int32 is syntactic sugar for <+P> (Int32 -> Int32, Int32 +P). By default, all function types in the same type signature are given the same implicit permission type variable. Take the type of map:
<A, B> (List<A>, (A -> B) -> List<B>)
This is syntactic sugar for:
<A, B, +P> (List<A>, (A -> B +P) -> List<B> +P)
This means that map requires the same set of permissions +P as the function you pass to it, because map calls that function.