Reference

Note
This is very much still work in progress and does not necessarily describe the language as implemented in the GitHub repo.
Syntax definitions use an extended BNF, with ? ... ? blocks denoting regular expressions.

Lexical structure

Yo source code is written in ASCII. Some UTF-8 codepoints will probably work in identifiers and string literals, but there’s no proper handling for characters outside the ASCII character set.

Comments

There are two kinds of comments:

Tokens

The Yo lexer differentiates between the following kinds of tokens: keywords, identifiers, punctuation and literals.

Keywords

Yo reserves the following keywords:

break       else    impl    match       switch    while
continue    fn      in      operator    unless
decltype    for     let     return      use
defer       if      mut     struct      var

Identifiers

An identifier is a sequence of one or more letters or digits. The first element must not be a digit.

digit   = ? 0-9 ?
letter  = ? a-zA-Z_ ?
ident   = letter { letter | digit }

A sequence of characters that satisfies the ident pattern above and is not a reserved keyword is assumed to be an identifier.
All identifiers with two leading underscores are reserved and should be considered internal.
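
The identifier rules above can be modelled as a small classifier. This is a C++ sketch, not the Yo lexer: `classify` and `TokenKind` are hypothetical names, and the keyword list is copied from the keyword table.

```cpp
#include <algorithm>
#include <iterator>
#include <string_view>

// Hypothetical token categories for this sketch.
enum class TokenKind { Keyword, Identifier, ReservedIdentifier, Invalid };

// Keywords reserved by Yo, as listed in the keyword table.
constexpr std::string_view kKeywords[] = {
    "break", "continue", "decltype", "defer", "else", "fn", "for",
    "if", "impl", "in", "let", "match", "mut", "operator",
    "return", "struct", "switch", "unless", "use", "var", "while"};

// letter = ? a-zA-Z_ ?
constexpr bool is_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// digit = ? 0-9 ?
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

TokenKind classify(std::string_view s) {
    // ident = letter { letter | digit }
    if (s.empty() || !is_letter(s[0])) return TokenKind::Invalid;
    for (char c : s.substr(1))
        if (!is_letter(c) && !is_digit(c)) return TokenKind::Invalid;
    // A reserved keyword is never an identifier.
    if (std::find(std::begin(kKeywords), std::end(kKeywords), s) != std::end(kKeywords))
        return TokenKind::Keyword;
    // Two leading underscores: reserved for internal use.
    if (s.size() >= 2 && s[0] == '_' && s[1] == '_') return TokenKind::ReservedIdentifier;
    return TokenKind::Identifier;
}
```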

Operators and punctuation

The following character sequences represent operators and punctuation:

+    &    &&    ==    |>    (    )
-    |    ||    !=    =     {    }
*    ^          <     !     [    ]
/    <<         <=          .    ;
%    >>         >           ,    :
                >=

Literals

Numeric literals

An integer literal is a sequence of digits. Depending on the prefix, the literal is interpreted as base 2, 8, 10 or 16.

Prefix   Base
0b       binary
0o       octal
0x       hexadecimal
none     decimal

bin_digit       = "0" | "1"
oct_digit       = ? 0-7 ?
dec_digit       = ? 0-9 ?
hex_digit       = ? 0-9a-f ?

bin_literal     = "0b" bin_digit { bin_digit }
oct_literal     = "0o" oct_digit { oct_digit }
dec_literal     = dec_digit { dec_digit }
hex_literal     = "0x" hex_digit { hex_digit }

flt_literal     = dec_literal "." dec_literal
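
The integer literal grammar above can be illustrated with a small C++ parser that picks the base from the prefix and rejects digits outside it. This is a sketch only: `parse_int_literal` is a hypothetical name, overflow is not checked, and floating-point literals are not handled.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Returns the value of an integer literal, or std::nullopt if the text does
// not match the grammar. Overflow is not checked in this sketch.
std::optional<std::uint64_t> parse_int_literal(const std::string& text) {
    int base = 10;
    std::string digits = text;
    if (text.size() > 2 && text[0] == '0') {
        switch (text[1]) {
            case 'b': base = 2;  break;
            case 'o': base = 8;  break;
            case 'x': base = 16; break;
        }
        if (base != 10) digits = text.substr(2);  // strip the prefix
    }
    if (digits.empty()) return std::nullopt;
    std::uint64_t value = 0;
    for (char c : digits) {
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;  // hex_digit = ? 0-9a-f ? (lowercase only)
        else return std::nullopt;
        if (d >= base) return std::nullopt;  // e.g. '2' in a binary literal
        value = value * base + d;
    }
    return value;
}
```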

Character literal

A character literal is a single valid ASCII codepoint enclosed in single quotes.

String literals

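A string literal is a sequence of valid ASCII codepoints enclosed in double quotes.
There are multiple kinds of string literals: regular string literals ("text") of type String, raw string literals (r"text"), in which escape sequences are not processed, and bytestring literals (b"text") of type *i8.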

The b and r prefixes can be combined to create a raw bytestring.

Literal     Characters     Type
"a\nb"      a, \n, b       String
r"a\nb"     a, \, n, b     String
b"a\nb"     a, \n, b       *i8
br"a\nb"    a, \, n, b     *i8
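
The difference between raw and non-raw literals in the table above can be sketched as a decoder for the text between the quotes. This C++ function is hypothetical, and the set of escape sequences it recognises (\n, \t, \\, \") is an assumption; the b prefix only changes the literal's type, so it does not appear here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes the characters between the quotes of a string literal.
// Raw literals (r"..." and br"...") keep every character verbatim.
// The recognised escapes are an assumption made for this sketch.
std::string decode_string_body(const std::string& body, bool raw) {
    if (raw) return body;
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') { out += body[i]; continue; }
        if (++i == body.size()) throw std::runtime_error("dangling backslash");
        switch (body[i]) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case '\\': out += '\\'; break;
            case '"':  out += '"';  break;
            default: throw std::runtime_error("unknown escape sequence");
        }
    }
    return out;
}
```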

Modules

Yo source code is organized in modules. Every .yo file is considered a module, uniquely identified by its absolute path.

The use keyword, followed by a string literal, imports a module:

use "<path>";

Note that this works essentially like C++’s #include directive, in that the parser simply inserts the contents of the imported module at the location of the use statement. If a module has already been imported, subsequent imports of the same module have no effect.
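
The include-once behaviour described above can be sketched in C++ as a recursive textual expansion. `Module` and `expand` are hypothetical names, and for simplicity the sketch assumes every `use` statement appears before the rest of a module's source.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory module: the paths it imports (assumed to appear at
// the top of the file) followed by the rest of its source text.
struct Module {
    std::vector<std::string> imports;
    std::string body;
};

// Textually expands `path`: each import is replaced by the expanded contents
// of the imported module. A module already in `seen` expands to nothing, so
// repeated imports of the same module have no effect.
void expand(const std::map<std::string, Module>& modules,
            const std::string& path,
            std::set<std::string>& seen,
            std::string& out) {
    if (!seen.insert(path).second) return;  // already imported
    const Module& m = modules.at(path);
    for (const std::string& dep : m.imports) expand(modules, dep, seen, out);
    out += m.body;
}
```

With main.yo importing b.yo and c.yo, and b.yo also importing c.yo, the contents of c.yo end up in the output exactly once.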

Path resolution

Import paths are resolved relative to the directory of the module containing the use statement.

Builtin modules

The /stdlib folder contains several builtin modules. These can be imported by prefixing the import name with a colon.

Builtin modules are bundled with the compiler, so the actual stdlib/ files need not be present.
However, the -stdlib-root flag can be used to specify the base directory against which all imports prefixed with : are resolved.

Example: importing a builtin module

use ":runtime/core";
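
The two resolution rules (relative to the importing module, or relative to the stdlib root for : imports) can be sketched with std::filesystem. `resolve_import` is a hypothetical helper; it only models the case where a stdlib root directory is given (as with -stdlib-root), not the bundled modules, and it leaves file-extension handling out.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Resolves the path of a `use` statement (hypothetical helper).
//  - ":name" imports resolve against stdlib_root;
//  - all other imports resolve relative to the importing module's directory.
fs::path resolve_import(const fs::path& importing_module,
                        const std::string& import_path,
                        const fs::path& stdlib_root) {
    if (!import_path.empty() && import_path[0] == ':')
        return (stdlib_root / import_path.substr(1)).lexically_normal();
    return (importing_module.parent_path() / import_path).lexically_normal();
}
```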

Types

Primitive types

Yo defines the following primitive types:

Typename   Size (bytes)   Description             Values
void       0              the void type           n/a
u{N}       N/8            unsigned integer type   0 ... 2^N-1
i{N}       N/8            signed integer type     -2^(N-1) ... 2^(N-1)-1
bool       1              the boolean type        true, false
f32        4              IEEE-754 binary32       see Wikipedia
f64        8              IEEE-754 binary64       see Wikipedia

Integer types

For integer types u{N} and i{N}, valid sizes are: N = 8, 16, 32, 64.
An integer type’s signedness is indicated by its prefix: i8 is a signed integer, u8 an unsigned integer.
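
The size and value-range formulas from the primitive types table can be checked in C++. The helper names are hypothetical; the shifts start from UINT64_MAX so that N = 64 does not overflow.

```cpp
#include <cstdint>

// Valid widths for u{N} and i{N}.
constexpr bool is_valid_width(unsigned n) { return n == 8 || n == 16 || n == 32 || n == 64; }

// Size in bytes: N/8.
constexpr unsigned size_in_bytes(unsigned n) { return n / 8; }

// Maximum of u{N}: 2^N - 1.
constexpr std::uint64_t unsigned_max(unsigned n) { return UINT64_MAX >> (64 - n); }

// Maximum of i{N}: 2^(N-1) - 1.
constexpr std::int64_t signed_max(unsigned n) {
    return static_cast<std::int64_t>(UINT64_MAX >> (65 - n));
}

// Minimum of i{N}: -2^(N-1), written as -max - 1 to avoid overflow.
constexpr std::int64_t signed_min(unsigned n) { return -signed_max(n) - 1; }

static_assert(unsigned_max(8) == 255);
static_assert(signed_min(8) == -128 && signed_max(8) == 127);
static_assert(signed_min(64) == INT64_MIN);
```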

Floating-point types

There are two types for floating-point values: f32 and f64.
The f64 type represents a 64-bit wide IEEE-754 floating point value, the f32 type a 32-bit wide IEEE-754 floating point value.

Pointer types

A pointer to a value of type T is expressed by prefixing the type with an asterisk (*).

For example, the type *i32 denotes a pointer to an i32 value.
A pointer type’s base type must be of size > 0. Yo’s equivalent of a C void * is *i8.

Reference types

A reference to a base type T is expressed by prefixing the type with an ampersand (&).

See the lvalue references section for more info.

Function types

A function type represents all functions with the same parameter and result types:

        () -> void  // a function that has no parameters and returns nothing
(i32, i32) -> i64   // a function that takes two `i32` values and returns an `i64` value

A function type (i.e., a function’s signature) contains only the parameter types and the return type; it does not include the names of the individual parameters or any attributes the actual function declaration might have.

decltype

decltype(<expr>)

The decltype construct can be used whenever the compiler would expect a type. It takes a single argument - an expression - and yields the type that expression would evaluate to. The expression is not evaluated.

decltype is useful in situations where it would otherwise be difficult or impossible to declare a type, for example when dealing with types that depend on template parameters.

Example

fn add<T, U>(x: T, y: U) -> decltype(x + y) {
    return x + y;
}
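
C++ has a decltype construct with the same two properties described above (it yields the expression's type, and the expression is not evaluated), so the Yo example has a close C++ counterpart. This is an analogy, not Yo code; `counted` is a hypothetical helper used to show that the operand is never called.

```cpp
#include <type_traits>

// Result type deduced from the expression x + y, as in the Yo example.
template <typename T, typename U>
auto add(T x, U y) -> decltype(x + y) {
    return x + y;
}

int calls = 0;
int counted() { return ++calls; }

// decltype(counted()) is int, but counted() is never invoked.
using CountedResult = decltype(counted());

static_assert(std::is_same_v<decltype(add(1, 2.5)), double>);
static_assert(std::is_same_v<CountedResult, int>);
```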

Typealias

use A = B;

The use keyword, followed by an identifier, = and a type, introduces a typealias: A becomes an alternative name for the type B.

Functions

Function declaration

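A function is declared using the fn keyword. A function declaration consists of an optional list of attributes, the fn keyword, the function’s name (or operator followed by the overloaded operator), an optional template parameter list, the parameter list, an optional return type introduced by ->, and the function body.

A function’s return type may be omitted, in which case it defaults to void.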

Example

// A simple function declaration
fn add(x: i64, y: i64) -> i64 {
    return x + y;
}

Function template

In the case of a function template declaration, the template parameters are listed in angle brackets, immediately before the function’s parameter list.

Example

// The identity function
fn id<T>(arg: T) -> T {
    return arg;
}

// The add function from above, as a function template
fn add<T>(x: T, y: T) -> T {
    return x + y;
}

See the templates section for more info.

Operator declaration

Since most operators are implemented as functions, they can be overloaded for a specific signature. An operator overload is declared as a function with the name operator, followed by the operator being overloaded.

The following operators may be overloaded:

+    &     &&    ==    ()
-    |     ||    !=    []
*    ^           <
/    <<          >
%    >>          <=
                 >=

Example: overloading the addition operator for a custom type:

fn operator + (x: Foo, y: Foo) -> Foo {
    // some custom addition logic
}

Overloading the call and subscript operators
A struct type may overload the call and subscript operators. These overloads must be defined in one of the type’s impl blocks and must satisfy the signature requirements for instance methods.

The () operator may accept an arbitrary number of parameters. The [] operator must always accept exactly one parameter.

// Example: overloading the subscript operator
impl String {
    fn operator [] (self: &String, index: i64) -> &i8 {
        return self.data[index];
    }
}

Note: When overloading comparison operators, implementing just == and < is sufficient, since the other comparison operators have default implementations defined in terms of these two.

Overload resolution

When generating code for a function call, the compiler collects a set of potential targets for the call. From that set, it selects the overload that most closely matches the supplied arguments, based on a scoring system. A tie (i.e., two or more equally good targets) results in a compile-time error.
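
The scoring system itself is not specified here. The C++ sketch below uses an invented one purely to illustrate the selection-and-tie logic: an exact type match scores 2, an implicit conversion of a numeric literal (see Type conversions) scores 1, any other mismatch disqualifies the candidate, and a shared best score is treated as an error. `Arg`, `Candidate`, `score` and `resolve` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Arg {
    std::string type;         // static type of the argument expression
    bool is_numeric_literal;  // literals may be converted implicitly
};

struct Candidate {
    std::string name;                 // label identifying the overload
    std::vector<std::string> params;  // parameter types
};

bool is_numeric(const std::string& t) {
    static const std::vector<std::string> numeric = {
        "i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64", "f32", "f64"};
    return std::find(numeric.begin(), numeric.end(), t) != numeric.end();
}

// Hypothetical scoring: exact match = 2, literal conversion = 1,
// anything else makes the candidate non-viable.
std::optional<int> score(const Candidate& c, const std::vector<Arg>& args) {
    if (c.params.size() != args.size()) return std::nullopt;
    int total = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].type == c.params[i]) total += 2;
        else if (args[i].is_numeric_literal && is_numeric(c.params[i])) total += 1;
        else return std::nullopt;
    }
    return total;
}

// Returns the unique best candidate, or std::nullopt if no candidate is
// viable or the best score is shared (both would be compile-time errors).
std::optional<std::string> resolve(const std::vector<Candidate>& candidates,
                                   const std::vector<Arg>& args) {
    std::optional<std::string> best;
    int best_score = -1;
    bool tie = false;
    for (const auto& c : candidates) {
        auto s = score(c, args);
        if (!s) continue;
        if (*s > best_score) { best = c.name; best_score = *s; tie = false; }
        else if (*s == best_score) tie = true;
    }
    if (tie) return std::nullopt;
    return best;
}
```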

Structs

Struct declaration

Custom types can be defined using the struct keyword. All struct types are uniquely identified by their name. A struct type can have properties and a set of member functions (methods) associated with it. Member functions are declared in one or multiple impl blocks.

Example: declaring a struct with properties and member functions

struct Person {
    name: String,
    age: i8
}

impl Person {
    // no `self` parameter -> static method
    fn me() -> Person {
        return Person("Lukas", 20);
    }

    // `self` parameter -> instance method
    fn increaseAge(self: &Self) {
        self.age += 1;
    }
}

Static methods

A static method is a function which can be called on the type itself. All function declarations in an impl block that are not instance methods are static methods.

Example

struct Foo {}

impl Foo {
    fn bar() -> i64 {
        return 123;
    }
}

Foo::bar();     // <- 123

Instance methods

An instance method is a function defined in a type’s impl block which takes a reference to the type (self: &Self) as its first parameter.

struct Number {
    value: i32
}

impl Number {
    fn increment(self: &Self) {
        self.value += 1;
    }

    fn getValue(self: &Self) -> i32 {
        return self.value;
    }
}

let number = Number(10);
number.increment();
number.increment();
number.getValue();      // <- 12

Struct initialization

Unless explicitly disabled via the no_init attribute, the compiler synthesizes two initialization functions for a struct type: a memberwise initializer and a copy initializer.

An initializer is an instance method named init which returns void. A type can define custom initializers simply by overloading init for different signatures.

Constructor
A type’s constructor is invoked simply by calling the type as if it were a function:

let array = Array<Int>();

Based on the arguments passed to the constructor, the compiler will forward the call to one of the type’s initializers.

Memberwise initializer
The synthesized memberwise initializer takes the same arguments as the type’s member fields, and simply sets the respective values. Note that if a type’s member is a reference, the compiler-generated memberwise initializer is the only option to initialize this reference (assigning to a reference member in a custom initializer assigns to the referenced object, rather than rebinding the reference itself).
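
C++ reference members behave the same way, which makes for a compact illustration of the rule above. This is a C++ analogy, not Yo code; `Holder` and `assign_through_reference` are hypothetical names.

```cpp
// A C++ struct with a reference member. As with Yo's memberwise
// initializer, initialization is the only point where the reference is bound.
struct Holder {
    int& ref;
};

// Returns the value of `a` after assigning through the reference member.
int assign_through_reference() {
    int a = 1;
    int b = 7;
    Holder h{a};  // binds h.ref to a
    h.ref = b;    // copies 7 into a; h.ref still refers to a, not b
    return a;
}
```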

Copy initializer
The copy initializer takes two references to the current type (self and another object). It is used to initialize an object from another instance of the same type, for example for constructing a copy when passing an object by-value to a function.

Struct destruction

A type may implement a dealloc instance method, which will be invoked by the compiler when destructing that instance. This method must take just the self parameter and return void.

Note: Custom dealloc methods are not invoked for types that specify the no_init attribute.

Expressions

Every expression evaluates to a value of a specific type, which must be known at compile time.

Literals

Literal                        Type     Example
Integer literal                i64      12
Floating point literal         f64      12.0
Character literal              i8       'a'
String literal                 String   "text"
String literal (bytestring)    *i8      b"text"

Operators

Note: Since most binary operators are implemented as functions, they can be overloaded (see Operator declaration).

Type conversions

All type conversions must be explicit: attempting to pass an i64 to a function that expects a u64 results in a compilation error.

Implicit conversions
The sole exception to this rule is numeric literals: Even though numeric literals by default evaluate to values of type i64, you may use a literal in an expression that expects a different numeric type, and the compiler will implicitly cast the literal.

Explicit conversions
There are two intrinsics for converting a value from one type to another:

Example

fn foo() -> i32 {
    let x = 0; // x has the deduced type i64
    return x;  // this fails: x is an i64, but the function is declared to return an i32
}

fn bar() -> i32 {
    return 0; // this will work fine since the compiler is allowed to insert an implicit static_cast<i32>
}

Lambdas

A lambda expression constructs an anonymous function.

Like a “normal” function, a lambda has a fixed set of inputs and an output.
In addition, a lambda can also capture variables from outside its own scope (these captures must be explicitly declared in the lambda’s capture list).

There is no uniform type for lambda objects, instead the compiler will generate an anonymous type for each lambda expression.

Syntax

lambda_expr        = capture_list [tmpl_params] signature fn_body
capture_list       = "[" "]" | "[" capture_list_elem { "," capture_list_elem } "]"
capture_list_elem  = ["&"] ident [ "=" expr ]

Example

// a noop lambda: no input, no output, does nothing
let f1 = []() {};

// a lambda which adds two integers
let f2 = [](x: i64, y: i64) -> i64 {
    return x + y;
};

// a lambda which adds two values of the same type
let f3 = []<T>(x: T, y: T) -> T {
    return x + y;
};

// a lambda which captures an object by reference, and increments it
let x = 0;
let f4 = [&x](inc: i64) {
    x += inc;
};
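
Yo's lambda syntax closely mirrors C++, where each lambda expression likewise gets its own anonymous closure type. The C++ counterparts of the four examples above are sketched below as an analogy; `run_lambdas` is a hypothetical name. Note that the generic C++ lambda uses auto parameters, so unlike the Yo version it does not force both arguments to share one type.

```cpp
#include <cstdint>

// Runs C++ counterparts of the four Yo lambdas and returns the captured counter.
std::int64_t run_lambdas() {
    auto f1 = []() {};                                          // no input, no output
    auto f2 = [](std::int64_t x, std::int64_t y) -> std::int64_t { return x + y; };
    auto f3 = [](auto x, auto y) { return x + y; };             // generic lambda
    std::int64_t x = 0;
    auto f4 = [&x](std::int64_t inc) { x += inc; };             // captures x by reference
    f1();
    f4(f2(1, 2));  // x becomes 3
    f4(f3(4, 5));  // x becomes 12
    return x;
}
```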

Attributes

Attributes can be used to provide the compiler with additional knowledge about a declaration.

Syntax

attr_list = "#[" attr { "," attr } "]"
attr      = ident [ "=" attr_val ]
attr_val  = ident | string

A declaration that can have attributes can be preceded by one or multiple attribute lists. Splitting multiple attributes up into multiple separate attribute lists is semantically equivalent to putting them all in a single list.

Note: Specifying the same attribute multiple times with different values is considered undefined behaviour.
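
The equivalence between split and combined attribute lists can be illustrated with a small C++ parser that merges every list into one map. The parser is a simplified, hypothetical sketch (`AttrParser` and `Attrs` are invented names): it stores a bare attribute as "true" and does not handle escapes in strings. The specification leaves conflicting duplicates undefined; this sketch simply rejects them.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Attribute name -> value. A bare attribute such as `extern` is stored as "true".
using Attrs = std::map<std::string, std::string>;

class AttrParser {
public:
    explicit AttrParser(const std::string& src) : s_(src) {}

    // Parses every `#[ ... ]` list in the input and merges them into one map.
    Attrs parse() {
        Attrs out;
        skip_ws();
        while (pos_ < s_.size()) {       // one iteration per attribute list
            expect('#');
            expect('[');
            skip_ws();
            while (peek() != ']') {      // attr { "," attr }
                std::string name = ident();
                std::string value = "true";
                skip_ws();
                if (peek() == '=') {     // attr_val = ident | string
                    ++pos_;
                    skip_ws();
                    value = (peek() == '"') ? quoted() : ident();
                    skip_ws();
                }
                auto [it, inserted] = out.emplace(name, value);
                if (!inserted && it->second != value)
                    throw std::runtime_error("conflicting values for attribute " + name);
                if (peek() != ',') break;
                ++pos_;
                skip_ws();
            }
            expect(']');
            skip_ws();
        }
        return out;
    }

private:
    char peek() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }
    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    void expect(char c) {
        if (peek() != c) throw std::runtime_error(std::string("expected ") + c);
        ++pos_;
    }
    std::string ident() {
        std::size_t start = pos_;
        while (pos_ < s_.size() &&
               (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_'))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected identifier");
        return s_.substr(start, pos_ - start);
    }
    std::string quoted() {  // string value without escape handling
        expect('"');
        std::size_t start = pos_;
        while (pos_ < s_.size() && s_[pos_] != '"') ++pos_;
        std::string v = s_.substr(start, pos_ - start);
        expect('"');
        return v;
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```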

Attribute Types

Function Attributes

Name            Type     Description
extern          bool     C linkage
inline          bool     Function may be inlined
always_inline   bool     Function should always be inlined
intrinsic       bool     (internal) declares a compile-time intrinsic
no_mangle       bool     Don’t mangle the function’s name
mangle          string   Override a function’s mangled name
startup         bool     Causes the function to be called before execution enters main
shutdown        bool     Causes the function to be called after main returns

Example

// Forward-declaring a function with external C linkage.
#[extern]
fn strcmp(*i8, *i8) -> i32;

// A function with an explicitly set mangled name
#[mangle="bar"]
fn foo() -> void { ... }

Struct Attributes

Name      Type   Description
no_init   bool   The compiler should not generate a default initializer for the type

Intrinsics

A function declared with the intrinsic attribute is considered a compile-time intrinsic. Calls to intrinsic functions will receive special handling by the compiler. All intrinsic functions are declared in the :runtime/intrinsics module.

An intrinsic function may be overloaded with a custom implementation; in this case, the overload must not declare the intrinsic attribute.

LValue references

todo

Templates

Templates provide a way to declare a generic implementation of a struct or function.

Syntax

tmpl_params = "<" tmpl_param { "," tmpl_param } ">"
tmpl_param  = ident [ "=" type ]

Template Parameters

A template parameter list consists of one or more template parameters.
In its simplest form, a template parameter is just an identifier, which is bound to the template argument used for the instantiation within the scope of the template declaration. A parameter can also specify a default type, written after = (see the tmpl_param rule above).

Example: a simple identity function

fn id<T>(arg: T) -> T {
    return arg;
}

Template Arguments

In order to instantiate a function or struct template, all template arguments must be known. This is achieved by either explicitly specifying the arguments in the template instantiation (see the tmpl_args rule), or, in the case of calls to function templates, by letting the compiler deduce the argument types from context.

Note: Template argument deduction is not supported for calls to the constructor of a struct template. In this case all template arguments need to be explicitly specified (with the possible exception of template parameters which define a default value).

Template Argument Deduction

If a template parameter’s value is explicitly specified in the template instantiation, that argument will be used, regardless of a possible default value, or other information that might be deduced from context.

For each template parameter P which is not explicitly specified in the instantiation, the compiler will attempt to deduce the template argument from context, using the call’s arguments.

The following rules and adjustments apply during deduction:

Every template parameter P that was not deduced, did not produce a deduction failure, and specifies a default type T is deduced as T.

Examples

// Consider the following function, specifying one template parameter T
fn add<T>(x: T, y: T) -> T {
    return x + y;
}

// Explicit template arguments:
add<i64>(1, 2); // No deduction, T = i64

// Deduced template arguments:
add(1, 2);      // T deduced as i64

let x: i32 = 1;
let y: i64 = 2;
add(1, x);      // T deduced as i32 (initially deduced as i64 from the literal, then overwritten by the non-literal argument x)
add(x, y);      // T fails to deduce (initially deduced as i32, then again deduced to incompatible type i64)
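
The behaviour shown in these examples can be modelled for a single parameter T that types every argument, as in add<T>(x: T, y: T). This C++ sketch is inferred from the examples above and is not the compiler's actual algorithm; `CallArg` and `deduce_T` are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <vector>

struct CallArg {
    std::string type;  // static type of the argument (literals: i64 or f64)
    bool is_literal;
};

// Deduces T from the call's arguments; std::nullopt means deduction failed.
std::optional<std::string> deduce_T(const std::vector<CallArg>& args,
                                    const std::optional<std::string>& default_type = std::nullopt) {
    std::optional<std::string> deduced;
    bool only_literals = false;  // true while `deduced` came from literals only
    for (const auto& a : args) {
        if (!deduced) {
            deduced = a.type;
            only_literals = a.is_literal;
        } else if (a.type == *deduced) {
            if (!a.is_literal) only_literals = false;
        } else if (a.is_literal) {
            continue;  // a literal adapts to the type deduced so far
        } else if (only_literals) {
            deduced = a.type;  // a non-literal argument overrides a literal-based guess
            only_literals = false;
        } else {
            return std::nullopt;  // two incompatible non-literal types
        }
    }
    return deduced ? deduced : default_type;  // fall back to the default, if any
}
```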

Template Codegen

Templates don’t exist “on their own”: No code is generated when you only declare, but never instantiate a template.
When the compiler encounters an instantiation of a struct template or a call to a function template, it generates a specialized version for the supplied generic arguments.

// A function template
fn add<T>(x: T, y: T) -> T {
    return x + y;
}

Function specializations can be declared simply by overloading the function for a specific signature.
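
C++ offers a close analogue of both points above: a template body is only instantiated for the types it is actually called with, and a non-template overload for a specific signature is preferred over the template on an exact match. This is a C++ illustration, not Yo's codegen.

```cpp
#include <string>

// Generic version: only instantiated for the types it is actually called with.
template <typename T>
T add(T x, T y) { return x + y; }

// Overload for one specific signature. On an exact match, C++ prefers a
// non-template function over a template, so the template body is never
// instantiated with T = const char* (where x + y would not compile).
std::string add(const char* x, const char* y) { return std::string(x) + y; }
```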

Memory Management

Yo implements C++-style RAII. Types can define a custom copy initializer, which will be invoked when constructing a copy of an object, and a dealloc method, which will be invoked when an object goes out of scope.
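
Since the model is C++-style RAII, the C++ version makes the timing concrete: the copy hook runs when an object is passed by value, and the destruction hook runs for every object that goes out of scope. `Resource`, `take_by_value` and `demo` are hypothetical names; the constructor and destructor stand in for Yo's copy initializer and dealloc.

```cpp
// Counters recording how often each hook runs.
int copies = 0;
int deallocs = 0;

struct Resource {
    Resource() = default;
    Resource(const Resource&) { ++copies; }  // analogue of a custom copy initializer
    ~Resource() { ++deallocs; }              // analogue of dealloc
};

void take_by_value(Resource) {}  // the parameter is a copy of the argument

void demo() {
    Resource r;         // constructed
    take_by_value(r);   // copy constructed into the parameter, destroyed after the call
}                       // r goes out of scope and is destroyed here
```

After one call to demo(), the copy hook has run once and the destructor twice (once for the parameter, once for r).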

Full Syntax

The syntax grammars in this document use extended BNF, with the following modifications:

(* regular expressions *)
? P ? := all values matched by a regular expression with the pattern P

(* syntax shorthand for a (possibly empty) comma separated list *)
L(R) := [ R { "," R } ]

(* syntax shorthand for a repeated rule which may not be omitted *)
{R}+ := R {R}

Syntax describing the Yo programming language:

digit  = ? 0-9 ?
letter = ? a-zA-Z_ ?
ident  = letter {letter | digit}

string = ["b" | "r" | "br"] '"' {char} '"'

dec_literal = {? 0-9 ?}+

number =
      "0b" {"0" | "1"}+
    | "0o" {? 0-7 ?}+
    | dec_literal
    | "0x" {? 0-9a-f ?}+
    | dec_literal "." dec_literal
    | "true" | "false"


import    = "use" string ";"
typealias = "use" ident "=" type ";"

attr_list = "#[" L(attr) "]"
attr      = ident ["=" (ident | string)]

tmpl_params = "<" L(tmpl_param) ">"
tmpl_param  = ident ["=" type]
tmpl_args   = "<" L(type) ">"

type =
      ident [tmpl_args]
    | "*" type
    | "&" type
    | "(" L(type) ")" "->" type

param_list = L(ident ":" type)

struct_decl = {attr_list} "struct" ident [tmpl_params] "{" param_list "}"
impl_block = "impl" ident "{" {fn_decl} "}"

fn_sig = [tmpl_params] "(" param_list ")" ["->" type]

fn_decl = {attr_list} "fn" ("operator" op | ident) fn_sig compound_stmt



(* Expressions *)

capture_elem = ["&"] ident ["=" expr]
lambda = "[" L(capture_elem) "]" fn_sig compound_stmt

expr =
      number
    | string
    | ident
    | lambda
    | expr "." ident
    | expr "[" expr "]"
    | expr [tmpl_args] "(" L(expr) ")"
    | unop expr
    | expr binop expr

unop = "-" | "~" | "!" | "&"

binop =
      "+" | "-" | "*" | "/" | "%"
    | "&" | "|" | "^" | "<<" | ">>"
    | "&&" | "||"
    | "==" | "!=" | "<" | "<=" | ">" | ">="
    | "|>"



(* Local Statements *)

compound_stmt = "{" {local_stmt} "}"
var_decl      = "let" ["&"] ident [":" type] ["=" expr] ";"
if_stmt       = "if" expr compound_stmt {"else" "if" expr compound_stmt} ["else" compound_stmt]
while_stmt    = "while" expr compound_stmt
for_stmt      = "for" ["&"] ident "in" expr compound_stmt
expr_stmt     = expr ";"
return_stmt   = "return" [expr] ";"
assignment    = expr "=" expr ";"

local_stmt =
      compound_stmt
    | var_decl
    | assignment
    | if_stmt
    | while_stmt
    | for_stmt
    | expr_stmt
    | return_stmt



(* Program *)

top_level_stmt =
      import
    | typealias
    | struct_decl
    | impl_block
    | fn_decl

program = {top_level_stmt}