elipl/project.md
Kacper Marzecki 748f87636a checkpoint
2025-06-13 23:48:07 +02:00


# Typed Lisp to Elixir Compiler (codename Tilly)
## Project Goals
The goal is to build a Lisp dialect with a strong, expressive type system that compiles to readable Elixir code. The type system will support advanced features such as type inference, union, intersection, negation, and refinement types, and elements of dependent typing. The long-term vision includes compiling to other target languages.
## Features
- **Core Language:**
- Lisp syntax and semantics.
- Basic data structures (lists, atoms, numbers, etc.).
- numbers: `1 2 3 4`
- strings: `''` `'string'` `'other string'`
- lists: `[]` `[1 2 3 4]`
- tuples: `{}` `{1 2 3 4}`
- maps: `m{}` `m{:a 1 :b 2}`
- Functions:
- only fixed arity functions, no variadic arg lists
- structure of the function definition
```
(defn
name
(list of parameters with the return type at the end)
'optional docstring'
...forms for the function body
)
```
- function definition where one parameter doesn't have a type -> it's to be inferred from usage inside function_body
`(defn name (arg1 (arg2 type_2) return_type) function_body)`
- function definition where one parameter doesn't have a type -> it's to be inferred, but it is pattern-matched in the function head
`(defn name (m{:a a_field} (arg2 type_2) return_type) function_body)`
- function definition where one parameter doesn't have a type -> it's to be inferred, but it is pattern-matched and bound to a name
`(defn name ((= arg1 m{:a a_field :b (= b_list [])}) (arg2 type_2) return_type) function_body)`
- function definition where some parameters are generic (universally quantified)
`(defn map_head ( (coll (list ~a)) (mapper (function ~a ~b)) (union ~b nil)) function_body)`
- function definition guards, similar to Elixir (additional `where` s-expr after the return type)
`(defn map_head ( (coll (list ~a)) (mapper (function ~a ~b)) ~b (where (some_guard coll))) function_body)`
- Lambdas:
- lone lambda form
`(fn (elem) (+ elem 1))`
- lambda used as a parameter
`(Enum.map collection (fn (elem) (+ elem 1)))`
- **Type System:** (See "Type Representation" under "Key Implementation Areas" for detailed structures)
- **Type Inference:** Automatically deduce types where possible.
- **Union Types:** `A | B` (e.g., `%{type_kind: :union, types: MapSet.new([type_A, type_B])}`).
- **Intersection Types:** `A & B` (e.g., `%{type_kind: :intersection, types: MapSet.new([type_A, type_B])}`).
- **Negation Types:** `!A` (e.g., `%{type_kind: :negation, negated_type: type_A}`).
- **Refinement Types:** Types refined by predicates (e.g., `%{type_kind: :refinement, base_type: %{type_kind: :primitive, name: :integer}, var_name: :value, predicate_expr_id: <node_id>}`).
- **Dependent Types (Elements of):** Types that can depend on values.
- **Length-Indexed Data Structures:**
- e.g., A list of 3 integers: `%{type_kind: :list, element_type: %{type_kind: :primitive, name: :integer}, length: 3}`.
- A tuple of specific types: `%{type_kind: :tuple, element_types: [type_A, type_B, type_C]}`.
- Benefits: Enables safer operations like `nth`, `take`, `drop`, and can ensure arity for functions expecting fixed-length lists/tuples.
- **Types Dependent on Literal Values:**
- Function return types or argument types can be specialized based on a *literal value* argument by using `%{type_kind: :literal, value: <actual_value>}` in type rules.
- **Refinement Types (as a key form of dependent type):**
- e.g., `%{type_kind: :refinement, base_type: %{type_kind: :primitive, name: :integer}, var_name: :value, predicate_expr_id: <node_id_of_gt_0_expr>}`.
- Initial implementation would focus on simple, evaluable predicates.
- **Value-Types (Typed Literals):**
- e.g., `:some_atom` has type `%{type_kind: :literal, value: :some_atom}`.
- `42` has type `%{type_kind: :literal, value: 42}`.
- **Heterogeneous List/Tuple Types:**
- Handled by `%{type_kind: :tuple, element_types: [type_A, type_B, ...]}`.
- **Structural Map Types:**
- **Key-Specific Types:** Defined via the `known_elements` field in the map type representation. Example: `%{type_kind: :map, known_elements: %{:name => %{value_type: %{type_kind: :primitive, name: :string}, optional: false}, :age => %{value_type: %{type_kind: :primitive, name: :integer}, optional: false}}, index_signature: nil}`.
- **Optional Keys:** Indicated by `optional: true` for an entry in `known_elements`.
- **Key/Value Constraints (Open Maps):** Defined by the `index_signature` field. Example: `%{type_kind: :map, known_elements: %{}, index_signature: %{key_type: %{type_kind: :primitive, name: :atom}, value_type: %{type_kind: :primitive, name: :any}}}`.
- **Type Transformation/Refinement:** To be handled by type system rules for map operations.
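The benefit claimed above for length-indexed lists (safer `nth`) can be illustrated with a small bounds check. This is a minimal sketch, assuming a hypothetical `NthSketch.check_nth/2` helper and a literal integer index; neither the module nor the result tuples come from the Til code base.

```elixir
defmodule NthSketch do
  # Hypothetical check: decides whether a literal index is provably in bounds
  # for a list type whose length is statically known.
  def check_nth(%{type_kind: :list, length: len, element_type: elem}, index)
      when is_integer(len) and is_integer(index) do
    if index >= 0 and index < len do
      {:ok, elem}
    else
      {:error, {:index_out_of_bounds, index, len}}
    end
  end

  # Unknown length (nil or a type variable): the element may be absent at runtime.
  def check_nth(%{type_kind: :list, element_type: elem}, index) when is_integer(index) do
    {:maybe, elem}
  end
end

int = %{type_kind: :primitive, name: :integer}
three_ints = %{type_kind: :list, element_type: int, length: 3}

IO.inspect(NthSketch.check_nth(three_ints, 2))
IO.inspect(NthSketch.check_nth(three_ints, 3))
IO.inspect(NthSketch.check_nth(%{three_ints | length: nil}, 0))
```

With a known length the out-of-bounds access can be rejected at compile time; with `length: nil` the checker can only report that the element might be missing.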
- **Compilation:**
- **Target: Elixir:** Generate readable and idiomatic Elixir code.
- **Future Targets:** Design with extensibility in mind for other languages (e.g., JavaScript).
- **Tooling:**
- Clear error messages from the type checker and compiler.
### Key Implementation Areas
1. **Parser:**
- Implement an S-expression parser for the Lisp dialect.
2. **Type System Core:**
- **Type Representation:** Types are represented as Elixir maps, each with a `:type_kind` atom (in `:snake_case`) and other fields specific to that kind.
- **Primitive Types:**
- `%{type_kind: :primitive, name: <atom>}`
- Examples: `%{type_kind: :primitive, name: :any}`, `%{type_kind: :primitive, name: :nothing}`, `%{type_kind: :primitive, name: :integer}`, `%{type_kind: :primitive, name: :float}`, `%{type_kind: :primitive, name: :number}`, `%{type_kind: :primitive, name: :boolean}`, `%{type_kind: :primitive, name: :string}`, `%{type_kind: :primitive, name: :atom}`.
- **Literal Types (Value Types):**
- `%{type_kind: :literal, value: <any_elixir_literal>}`
- Examples: `%{type_kind: :literal, value: 42}`, `%{type_kind: :literal, value: :my_atom}`, `%{type_kind: :literal, value: "hello"}`.
- In Tilly source, strings start and end with a single quote, e.g. `'string'`.
- **Union Types:**
- `%{type_kind: :union, types: MapSet.new([<type_map>])}`
- `types`: A set of type maps.
- **Intersection Types:**
- `%{type_kind: :intersection, types: MapSet.new([<type_map>])}`
- `types`: A set of type maps.
- **Negation Types:**
- `%{type_kind: :negation, negated_type: <type_map>}`
- **Function Types:**
- `%{type_kind: :function, arg_types: [<type_map>], return_type: <type_map>, rest_arg_type: <type_map> | nil}`
- `arg_types`: Ordered list of type maps.
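- `rest_arg_type`: Type map for variadic arguments, or `nil`. Since only fixed-arity functions are supported and variadic functions are not planned, this is expected to be `nil`.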
- **List Types:**
- `%{type_kind: :list, element_type: <type_map>, length: <non_neg_integer | type_variable_map | nil>}`
- `length`: `nil` for any length, an integer for fixed length, or a type variable map for generic/inferred length.
- **Tuple Types:**
- `%{type_kind: :tuple, element_types: [<type_map>]}`
- `element_types`: Ordered list of type maps.
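Because the representations listed so far are plain Elixir maps, they can be built with small constructor functions. The following is a minimal sketch, assuming a hypothetical `TypeSketch` module (not part of the Til code base) and `MapSet` as the set implementation for union members.

```elixir
defmodule TypeSketch do
  # Hypothetical helpers for illustration; the names are not taken from the Til code base.

  def primitive(name) when is_atom(name), do: %{type_kind: :primitive, name: name}

  def literal(value), do: %{type_kind: :literal, value: value}

  # Members are kept in a MapSet, so duplicates collapse and order is irrelevant.
  def union(types) when is_list(types), do: %{type_kind: :union, types: MapSet.new(types)}

  # `len` is nil for any length or a non-negative integer for a fixed length.
  def list(element_type, len \\ nil),
    do: %{type_kind: :list, element_type: element_type, length: len}

  def tuple(element_types) when is_list(element_types),
    do: %{type_kind: :tuple, element_types: element_types}
end

int = TypeSketch.primitive(:integer)
str = TypeSketch.primitive(:string)

# Both unions contain the same members, so the resulting type maps compare equal.
a = TypeSketch.union([int, str])
b = TypeSketch.union([str, int, int])
IO.inspect(a == b)

# A list of exactly three integers.
IO.inspect(TypeSketch.list(int, 3))
```

Storing union members in a set means two unions with the same members are structurally equal, which keeps later comparison and interning simple.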
- **Map Types (Structural):**
Maps in Tilly Lisp are inherently open, meaning they can contain any keys beyond those explicitly known at compile time. The type system aims to provide as much precision as possible for known keys while defining a general pattern for all other keys.
- **Representation:**
- **Raw Form (before interning):**
`%{type_kind: :map, known_elements: KE_raw, index_signature: IS_raw}`
- `known_elements` (KE_raw): An Elixir map where keys are literal Elixir terms (e.g., `:name`, `"id"`) and values are `%{value_type: <type_map_for_value>, optional: <boolean>}`.
- `index_signature` (IS_raw): Always present. A map `%{key_type: <type_map_for_key>, value_type: <type_map_for_value>}` describing the types for keys not in `known_elements`.
- **Interned Form (stored in `nodes_map`):**
`%{type_kind: :map, id: <unique_type_key>, known_elements: KE_interned, index_signature: IS_interned}`
- `id`: A unique atom key identifying this canonical map type definition (e.g., `:type_map_123`).
- `known_elements` (KE_interned): An Elixir map where keys are literal Elixir terms and values are `%{value_type_id: <type_key_for_value>, optional: <boolean>}`.
- `index_signature` (IS_interned): Always present. A map `%{key_type_id: <type_key_for_general_keys>, value_type_id: <type_key_for_general_values>}`.
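The step from the raw form to the interned form can be sketched as a recursive pass that interns child types first and then stores the parent under a key. This is an illustration under stated assumptions: `InternSketch`, the `:type_N` key scheme, and the linear search for an existing definition are all hypothetical, and the actual logic in `Til.Typer.Interner` may differ.

```elixir
defmodule InternSketch do
  # Hypothetical interner: `store` maps type keys to interned type definitions.

  def intern(store, %{type_kind: :map, known_elements: known, index_signature: sig}) do
    # Intern children first so the parent refers only to type keys.
    {known_interned, store} =
      Enum.reduce(known, {%{}, store}, fn {key, %{value_type: t, optional: opt}}, {acc, st} ->
        {id, st} = intern(st, t)
        {Map.put(acc, key, %{value_type_id: id, optional: opt}), st}
      end)

    {key_id, store} = intern(store, sig.key_type)
    {value_id, store} = intern(store, sig.value_type)

    put_canonical(store, %{
      type_kind: :map,
      known_elements: known_interned,
      index_signature: %{key_type_id: key_id, value_type_id: value_id}
    })
  end

  # Leaf types (primitives, literals) are stored as they are.
  def intern(store, type), do: put_canonical(store, type)

  # Reuse the key of a structurally equal definition, or allocate a new one.
  defp put_canonical(store, definition) do
    case Enum.find(store, fn {_key, stored} -> Map.delete(stored, :id) == definition end) do
      {key, _stored} ->
        {key, store}

      nil ->
        key = :"type_#{map_size(store) + 1}"
        {key, Map.put(store, key, Map.put(definition, :id, key))}
    end
  end
end

any = %{type_kind: :primitive, name: :any}
str = %{type_kind: :primitive, name: :string}

raw = %{
  type_kind: :map,
  known_elements: %{name: %{value_type: str, optional: false}},
  index_signature: %{key_type: any, value_type: any}
}

{id1, store} = InternSketch.intern(%{}, raw)
{id2, store} = InternSketch.intern(store, raw)

# Interning the same raw type twice yields the same key.
IO.inspect(id1 == id2)
IO.inspect(store[id1].known_elements)
```

Once every type has a canonical key, comparing two types reduces to comparing their keys.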
- **Use Case Scenarios & Typing Approach:**
1. **Map Literals:**
- Example: `m{:a 'hello' :b 1}`
- Inferred Type:
- `known_elements`: `%{ :a => %{value_type: <type_for_'hello'>, optional: false}, :b => %{value_type: <type_for_1>, optional: false} }`
- `index_signature`: Defaults to `%{key_type: <any_type>, value_type: <any_type>}`. This signifies that any other keys of any type can exist and map to values of any type.
- Keys in map literals must be literal values to contribute to `known_elements`.
2. **Type Annotations:**
- Example: `(the (map string integer) my-var)`
- The type `(map string integer)` resolves to:
- `known_elements`: `%{}` (empty, as the annotation describes a general pattern, not specific known keys).
- `index_signature`: `%{key_type: <string_type>, value_type: <integer_type>}`.
3. **Core Map Operations (Language Constructs):**
The type system will define rules for the following fundamental runtime operations:
- `(map-get map key)`:
- If `key` is a literal (e.g., `:a`):
- If `:a` is in `map_type.known_elements`: Result is `map_type.known_elements[:a].value_type_id`. If optional, result is union with `nil_type`.
- If `:a` is not in `known_elements` but matches `map_type.index_signature.key_type_id`: Result is `map_type.index_signature.value_type_id` unioned with `nil_type` (as the specific key might not exist at runtime).
- If `key`'s type is general (e.g., `atom`):
- Collect types from all matching `known_elements` (e.g., if `:a` and `:b` are known atom keys).
- Include `map_type.index_signature.value_type_id` if `atom` is a subtype of `map_type.index_signature.key_type_id`.
- Union all collected types with `nil_type`.
- Example: `(map-get m{:a 's' :b 1} some_atom_var)` could result in type `(union string integer nil)`.
- `(map-put map key value)`:
- If `key` is a literal (e.g., `:a`):
- Resulting map type updates or adds `:a` to `known_elements` with `value`'s type. `index_signature` is generally preserved.
- Example: `(map-put m{:b 1} :a 's')` results in type for `m{:a 's' :b 1}`.
- If `key`'s type is general (e.g., `atom`):
- `known_elements` of the input `map` type remain unchanged.
- The `index_signature` of the resulting map type may become more general. E.g., if `map` is `(map string any)` and we `map-put` with `key_type=atom` and `value_type=integer`, the new `index_signature` might be `%{key_type: (union string atom), value_type: (union any integer)}`. This is complex and requires careful rule definition.
- `(map-delete map key)`:
- If `key` is a literal (e.g., `:a`):
- Resulting map type removes `:a` from `known_elements`. `index_signature` is preserved.
- Example: `(map-delete m{:a 's' :b 1} :a)` results in type for `m{:b 1}`.
- If `key`'s type is general (e.g., `atom`):
- This is complex. Deleting by a general key type doesn't easily translate to a precise change in `known_elements`. The `index_signature` might remain, or the operation might be disallowed or result in a very general map type. (Further thought needed for precise semantics).
- `(map-merge map1 map2)`:
- Resulting map type combines `known_elements` from `map1` and `map2`.
- For keys only in `map1` or `map2`, they are included as is.
- For keys in both: The value type from `map2` takes precedence (last-one-wins semantics for types).
- The `index_signature` of the resulting map type will be the most general combination of `map1.index_signature` and `map2.index_signature`. (e.g., union of key types, union of value types).
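The literal-key rules for the four operations above can be sketched directly on raw map types. This is a hedged illustration, not the planned implementation: `MapOpsSketch` is a hypothetical module, `nil_type` is assumed to be the literal type of `nil`, and general (non-literal) key types are deliberately left out.

```elixir
defmodule MapOpsSketch do
  # Hypothetical sketch over *raw* map types (value_type, not value_type_id).

  # Assumption: nil_type is represented as the literal type of nil.
  @nil_type %{type_kind: :literal, value: nil}

  def union(types), do: %{type_kind: :union, types: MapSet.new(types)}

  # (map-get map key) with a literal key.
  def map_get(%{type_kind: :map, known_elements: known, index_signature: sig}, key) do
    case Map.fetch(known, key) do
      {:ok, %{value_type: t, optional: false}} -> t
      {:ok, %{value_type: t, optional: true}} -> union([t, @nil_type])
      # Assumes the key matches index_signature.key_type; a real rule would check that.
      :error -> union([sig.value_type, @nil_type])
    end
  end

  # (map-put map key value) with a literal key: add or overwrite the known element.
  def map_put(%{type_kind: :map, known_elements: known} = map_type, key, value_type) do
    %{map_type | known_elements: Map.put(known, key, %{value_type: value_type, optional: false})}
  end

  # (map-delete map key) with a literal key.
  def map_delete(%{type_kind: :map, known_elements: known} = map_type, key) do
    %{map_type | known_elements: Map.delete(known, key)}
  end

  # (map-merge map1 map2): map2 wins for shared keys; index signatures are unioned.
  def map_merge(%{type_kind: :map} = m1, %{type_kind: :map} = m2) do
    %{
      type_kind: :map,
      known_elements: Map.merge(m1.known_elements, m2.known_elements),
      index_signature: %{
        key_type: union([m1.index_signature.key_type, m2.index_signature.key_type]),
        value_type: union([m1.index_signature.value_type, m2.index_signature.value_type])
      }
    }
  end
end

any = %{type_kind: :primitive, name: :any}
str = %{type_kind: :primitive, name: :string}
int = %{type_kind: :primitive, name: :integer}

# Type of m{:b 1}, using primitive value types instead of literal types for brevity.
m = %{
  type_kind: :map,
  known_elements: %{b: %{value_type: int, optional: false}},
  index_signature: %{key_type: any, value_type: any}
}

m2 = MapOpsSketch.map_put(m, :a, str)
IO.inspect(MapOpsSketch.map_get(m2, :a))
IO.inspect(MapOpsSketch.map_delete(m2, :a) == m)
```

The unions produced by `map_merge` are not simplified here; a fuller version would collapse cases such as `(union any integer)` to `any`.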
- **Known Limitations:**
- Typing `map-put` and `map-delete` with non-literal (general type) keys precisely is challenging and may result in less specific types or require advanced type system features not yet planned (e.g., negation types for keys).
- Duplicate literal keys in a map literal: The parser/typer will likely adopt a "last one wins" semantic for the value and its type.
- **Required Future Steps & Prerequisites:**
1. **Parser Support for `(map K V)` Annotations:**
- Modify `Til.Typer.ExpressionTyper.resolve_type_specifier_node` to parse S-expressions like `(map <key-type-spec> <value-type-spec>)` into a raw map type definition.
- Prerequisite: Basic type specifier resolution (for K and V).
2. **Typing Map Literals:**
- In `Til.Typer.infer_type_for_node_ast` (for `:map_expression`):
- Construct `known_elements` from literal keys and inferred value types.
- Assign a default `index_signature` (e.g., `key_type: any, value_type: any`).
- Prerequisite: Recursive typing of child nodes (values in the map literal).
3. **Interning Map Types:**
- In `Til.Typer.Interner.get_or_intern_type`:
- Add logic for `type_kind: :map`.
- Recursively intern types within `known_elements` values and `index_signature` key/value types to create a canonical, interned map type definition.
- Store/retrieve these canonical map types.
- Prerequisite: Interning for primitive and other relevant types.
4. **Subtyping Rules for Maps:**
- In `Til.Typer.SubtypeChecker.is_subtype?`:
- Implement rules for `map_subtype <?: map_supertype`. This involves checking:
- Compatibility of `known_elements` (required keys in super must be present and non-optional in sub, with compatible value types).
- Compatibility of `index_signatures` (contravariant key types, covariant value types).
- Keys in `sub.known_elements` not in `super.known_elements` must conform to `super.index_signature`.
- Prerequisite: Interned map type representation.
5. **Typing Map Operations:**
- In `Til.Typer.ExpressionTyper` (or a new `MapExpressionTyper` module):
- Define S-expression forms for `(map-get map key)`, `(map-put map key value)`, `(map-delete map key)`, `(map-merge map1 map2)`.
- Implement type inference rules for each of these operations based on the principles outlined in "Use Case Scenarios".
- Prerequisite: Subtyping rules, interned map types, ability to type map literals and resolve map type annotations.
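The three map subtyping checks from step 4 can be sketched over raw map types as follows. This is only an illustration: `MapSubtypeSketch` is a hypothetical module, the `check` argument stands in for the general subtype function, and the treatment of a key that is optional in the supertype but absent from the subtype is an assumption rather than a rule stated in this document.

```elixir
defmodule MapSubtypeSketch do
  # Hypothetical sketch. `check` is a caller-supplied function
  # (sub_type, super_type -> boolean) used for all non-map comparisons.

  def map_subtype?(sub, sup, check) do
    known_ok?(sub, sup, check) and extra_ok?(sub, sup, check) and index_ok?(sub, sup, check)
  end

  # Every key known to the supertype must be matched by the subtype.
  defp known_ok?(sub, sup, check) do
    Enum.all?(sup.known_elements, fn {key, %{value_type: sup_t, optional: sup_opt}} ->
      case Map.fetch(sub.known_elements, key) do
        {:ok, %{value_type: sub_t, optional: sub_opt}} ->
          # A key required by the supertype may not be optional in the subtype.
          (sup_opt or not sub_opt) and check.(sub_t, sup_t)

        :error ->
          # Assumption: an absent key is acceptable only if the supertype marks it optional.
          sup_opt
      end
    end)
  end

  # Keys known only to the subtype must conform to the supertype's index signature.
  defp extra_ok?(sub, sup, check) do
    sub.known_elements
    |> Enum.reject(fn {key, _entry} -> Map.has_key?(sup.known_elements, key) end)
    |> Enum.all?(fn {key, %{value_type: t}} ->
      check.(%{type_kind: :literal, value: key}, sup.index_signature.key_type) and
        check.(t, sup.index_signature.value_type)
    end)
  end

  # Contravariant key types, covariant value types.
  defp index_ok?(sub, sup, check) do
    check.(sup.index_signature.key_type, sub.index_signature.key_type) and
      check.(sub.index_signature.value_type, sup.index_signature.value_type)
  end
end

any = %{type_kind: :primitive, name: :any}
int = %{type_kind: :primitive, name: :integer}

# Toy checker for the demo: everything is a subtype of `any`, otherwise types must be equal.
check = fn t, s -> s == any or t == s end

open = %{type_kind: :map, known_elements: %{}, index_signature: %{key_type: any, value_type: any}}
with_a = %{open | known_elements: %{a: %{value_type: int, optional: false}}}

IO.inspect(MapSubtypeSketch.map_subtype?(with_a, open, check))
IO.inspect(MapSubtypeSketch.map_subtype?(open, with_a, check))
```

A map that is known to contain `:a` can be used where a fully open map is expected, but not the other way round, because the open map gives no guarantee that `:a` is present.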
- **Advanced Map Typing and Row Polymorphism Considerations:**
While the current map type system provides flexibility with `known_elements` and an `index_signature`, a future enhancement could be the introduction of **row polymorphism**. This would allow for more precise typing of functions that operate on maps with a common set of known fields while allowing other fields to vary.
- **Conceptualization:** In Tilly's context, a row variable in a map type (e.g., `m{ :name string | r }`) could represent "the rest of the map fields." Given Tilly's rich map keys (literals of various types, not just simple labels) and the existing `index_signature` concept, the row variable `r` could itself be considered a placeholder for another Tilly map type. This means `r` could have its own `known_elements` and `index_signature`, making it more expressive than traditional record-based row polymorphism.
- **Requirements for Implementation:**
- **Type Representation:** Extend the map type definition to include a row variable (e.g., `%{type_kind: :map, known_elements: KE, row_variable_id: <type_key_for_row_var>}`). The interaction between a row variable and the `index_signature` would need careful definition; they might be mutually exclusive or complementary.
- **Unification for Rows:** Develop a unification algorithm capable of solving constraints involving row variables (e.g., `m{a: T1 | r1} = m{a: T1, b: T2 | r2}` implies `r1` must unify with or extend `m{b: T2 | r2}`).
- **Subtyping for Rows:** Define subtyping rules (e.g., `m{a: T1, b: T2}` is a subtype of `m{a: T1 | r}` if `r` can be instantiated with `m{b: T2}`).
- **Generalization & Instantiation:** Implement mechanisms for generalizing functions over row variables and instantiating them at call sites.
- **Syntax:** Design user-facing syntax for map types with row variables (e.g., `(map :key1 type1 ... | row_var_name)`).
Implementing full row polymorphism is a significant undertaking and would build upon the existing map typing foundations. It is currently not in the immediate plan but represents a valuable direction for enhancing the type system's expressiveness for structural data.
- **Refinement Types:**
- `%{type_kind: :refinement, base_type: <type_map>, var_name: <atom>, predicate_expr_id: <integer_node_id>}`
- `var_name`: Atom used to refer to the value within the predicate.
- `predicate_expr_id`: AST node ID of the predicate expression.
- **Type Variables:**
- `%{type_kind: :type_variable, id: <any_unique_id>, name: <String.t | nil>}`
- `id`: Unique identifier.
- `name`: Optional human-readable name.
- **Alias Types (Named Types):**
- `%{type_kind: :alias, name: <atom_alias_name>, parameters: [<atom_param_name>], definition: <type_map>}`
- `name`: The atom for the alias (e.g., `:positive_integer`).
- `parameters`: List of atoms for generic type parameter names (e.g., `[:T]`).
- `definition`: The type map this alias expands to (may contain type variables from `parameters`).
- **Function Type Syntax:**
- `(fn (arg_1_type arg_2_type) return_type)`
- **Type Checking Algorithm:** Develop the core logic for verifying type correctness. This will likely involve algorithms for:
- Unification.
- Subtyping (e.g., `%{type_kind: :primitive, name: :integer}` is a subtype of `%{type_kind: :primitive, name: :number}`).
- Constraint solving for inference and refinement types.
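The subtyping part of the checker can be sketched for primitives, literals, and unions. This is a minimal illustration, assuming a hypothetical `SubtypeSketch` module and a primitive hierarchy in which `integer` and `float` sit below `number`; `Til.Typer.SubtypeChecker` is the real implementation and may use different rules.

```elixir
defmodule SubtypeSketch do
  # Assumed primitive hierarchy: integer and float are subtypes of number.
  @primitive_parents %{integer: :number, float: :number}

  # Reflexivity, top type (any), and bottom type (nothing).
  def subtype?(t, t), do: true
  def subtype?(_t, %{type_kind: :primitive, name: :any}), do: true
  def subtype?(%{type_kind: :primitive, name: :nothing}, _t), do: true

  # A union is a subtype of T when every member is.
  def subtype?(%{type_kind: :union, types: members}, sup),
    do: Enum.all?(members, &subtype?(&1, sup))

  # T is a subtype of a union when it is a subtype of some member.
  def subtype?(sub, %{type_kind: :union, types: members}),
    do: Enum.any?(members, &subtype?(sub, &1))

  # A literal is a subtype of whatever the primitive type of its value is a subtype of.
  def subtype?(%{type_kind: :literal, value: v}, %{type_kind: :primitive} = sup),
    do: subtype?(%{type_kind: :primitive, name: primitive_of(v)}, sup)

  def subtype?(%{type_kind: :primitive, name: a}, %{type_kind: :primitive, name: b}),
    do: Map.get(@primitive_parents, a) == b

  def subtype?(_sub, _sup), do: false

  # Only scalar literal values are handled in this sketch.
  defp primitive_of(v) when is_integer(v), do: :integer
  defp primitive_of(v) when is_float(v), do: :float
  defp primitive_of(v) when is_boolean(v), do: :boolean
  defp primitive_of(v) when is_binary(v), do: :string
  defp primitive_of(v) when is_atom(v), do: :atom
end

int = %{type_kind: :primitive, name: :integer}
num = %{type_kind: :primitive, name: :number}
str = %{type_kind: :primitive, name: :string}

IO.inspect(SubtypeSketch.subtype?(int, num))
IO.inspect(SubtypeSketch.subtype?(%{type_kind: :literal, value: 42}, num))
IO.inspect(SubtypeSketch.subtype?(%{type_kind: :union, types: MapSet.new([int, str])}, num))
```

The clause order matters: the union-on-the-left rule runs before the union-on-the-right rule, so a union is compared member by member against the supertype.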
- **Type Inference Engine:** Implement the mechanism to infer types of expressions and definitions.
- **Bidirectional Type Inference:**
To enhance type inference capabilities, reduce the need for explicit annotations, and provide more precise error messages, the type system could be evolved to use bidirectional type inference. This approach distinguishes between two modes of operation:
- **Synthesis Mode (`=>`):** Infers or synthesizes the type of an expression from its constituent parts. For example, `infer(e)` yields type `T`.
- **Checking Mode (`<=`):** Checks if an expression conforms to an *expected type* provided by its context. For example, `check(e, T_expected)` verifies `e` has type `T_expected`.
- **Benefits:**
- More precise type error reporting (e.g., "expected type X, got type Y in context Z").
- Reduced annotation burden, as types can flow top-down into expressions.
- Better handling of polymorphic functions and complex type constructs.
- **Implementation Requirements:**
- **Explicit Modes:** The core typing algorithm (currently in `Til.Typer`) would need to be refactored to explicitly support and switch between synthesis and checking modes.
- **Top-Down Type Flow:** The `expected_type` must be propagated downwards during AST traversal in checking mode.
- **Dual Typing Rules:** Each language construct (literals, variables, function calls, conditionals, `lambda`s, etc.) would require distinct typing rules for both synthesis and checking. For instance:
- A `lambda` expression, when checked against an expected function type `(TA -> TR)`, can use `TA` for its parameter types and then check its body against `TR`. In synthesis mode, parameter annotations might be required.
- A function application `(f arg)` would typically synthesize `f`'s type, then check `arg` against the expected parameter type, and the function's return type becomes the synthesized type of the application.
- **Integration with Polymorphism:** Rules for instantiating polymorphic types (when checking) and generalizing types (e.g., for `let`-bound expressions in synthesis) are crucial.
Adopting bidirectional type inference would be a significant architectural evolution of the `Til.Typer` module, moving beyond the current primarily bottom-up synthesis approach.
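The two modes can be illustrated on a toy expression format. This is a sketch under stated assumptions: `BidiSketch` and the tuple-based expressions (`{:int, n}`, `{:var, name}`, `{:fn, params, body}`, `{:app, f, args}`) are hypothetical and are not the Til AST, and subtyping is reduced to equality to keep the example short.

```elixir
defmodule BidiSketch do
  @int %{type_kind: :primitive, name: :integer}

  # Synthesis mode: compute a type from the expression alone.
  def infer(_env, {:int, _n}), do: {:ok, @int}

  def infer(env, {:var, name}) do
    case Map.fetch(env, name) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, {:unbound, name}}
    end
  end

  def infer(env, {:app, f, args}) do
    case infer(env, f) do
      {:ok, %{type_kind: :function, arg_types: arg_types, return_type: ret}}
      when length(args) == length(arg_types) ->
        # Arguments are checked against the parameter types (checking mode).
        with :ok <- check_all(env, Enum.zip(args, arg_types)), do: {:ok, ret}

      {:ok, _other} ->
        {:error, :not_callable_or_arity_mismatch}

      error ->
        error
    end
  end

  # An unannotated lambda has no parameter types to synthesize from.
  def infer(_env, {:fn, _params, _body}), do: {:error, :annotation_required}

  # Checking mode: push an expected type down into the expression.
  def check(env, {:fn, params, body}, %{type_kind: :function, arg_types: arg_types, return_type: ret})
      when length(params) == length(arg_types) do
    # Parameter types come from the expected type, so no annotation is needed.
    body_env = Map.merge(env, Map.new(Enum.zip(params, arg_types)))
    check(body_env, body, ret)
  end

  # Fallback: synthesize, then compare with the expected type.
  def check(env, expr, expected) do
    case infer(env, expr) do
      {:ok, ^expected} -> :ok
      {:ok, actual} -> {:error, {:type_mismatch, expected, actual}}
      error -> error
    end
  end

  defp check_all(env, pairs) do
    Enum.reduce_while(pairs, :ok, fn {expr, type}, :ok ->
      case check(env, expr, type) do
        :ok -> {:cont, :ok}
        error -> {:halt, error}
      end
    end)
  end
end

int = %{type_kind: :primitive, name: :integer}
int_to_int = %{type_kind: :function, arg_types: [int], return_type: int}
env = %{"inc" => int_to_int}

# (inc 1) synthesizes integer.
IO.inspect(BidiSketch.infer(env, {:app, {:var, "inc"}, [{:int, 1}]}))

# (fn (x) (inc x)) cannot be synthesized, but checks against integer -> integer.
lambda = {:fn, ["x"], {:app, {:var, "inc"}, [{:var, "x"}]}}
IO.inspect(BidiSketch.infer(env, lambda))
IO.inspect(BidiSketch.check(env, lambda, int_to_int))
```

The lambda case shows the annotation saving: the expected function type supplies the parameter type for `x`, which synthesis alone could not determine.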
- **Environment Management:** Handle scopes and bindings of names to types (type maps).
- **Function Types, `defn`, and `fn` Implementation Plan:**
This section outlines the plan for introducing function types, user-defined functions (`defn`), and lambdas (`fn`) into Tilly.
**1. Type Representation for Functions:**
* **Structure:** `%{type_kind: :function, arg_types: [<type_map_key>], return_type: <type_map_key>, type_params: [<type_variable_key>] | nil}`
* `arg_types`: An ordered list of type keys for each argument.
* `return_type`: A type key for the return value.
* `type_params`: (Optional, for polymorphic functions) An ordered list of type keys for universally quantified type variables (e.g., for `~a`, `~b`). Initially `nil` for monomorphic functions.
* **Note:** Variadic functions (`rest_arg_type`) are not planned, aligning with Elixir's fixed arity.
**2. Parser Modifications (`Til.Parser`):**
* **`defn` (User-Defined Function):**
* **Syntax:** `(defn name (arg_spec1 arg_spec2 ... return_type_spec) 'optional_docstring' body_forms...)`
* The `return_type_spec` is the last element in the parameter S-expression.
* `'optional_docstring'` is a string literal between the parameter S-expression and the body.
* **AST Node (`:defn_expression`):**
* `name_node_id`: ID of the function name symbol.
* `params_and_return_s_expr_id`: ID of the S-expression node `(arg_spec1 ... return_type_spec)`.
* `arg_spec_node_ids`: List of IDs of argument specifier nodes (derived from children of `params_and_return_s_expr_id`, excluding the last).
* `return_type_spec_node_id`: ID of the return type specifier node (last child of `params_and_return_s_expr_id`).
* `docstring_node_id`: Optional ID of the docstring node.
* `body_node_ids`: List of IDs for body expressions.
* **`fn` (Lambda):**
* **Syntax:** `(fn (arg_spec1 ...) body_forms...)`
* **AST Node (`:lambda_expression`):**
* `params_s_expr_id`: ID of the S-expression node `(arg_spec1 ...)`.
* `arg_spec_node_ids`: List of IDs of argument specifier nodes.
* `body_node_ids`: List of IDs for body expressions.
* **Argument Specifications (`arg_spec`):**
* Initially, `arg_spec` nodes will represent simple symbols (for lambda arguments) or `(symbol type_spec)` (for `defn` arguments).
* More complex patterns (`m{:key val}`, `(= symbol pattern)`) and type variables (`~a`) will be introduced in later phases.
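Deriving the `:defn_expression` fields from the child ids can be sketched as two list splits. The sketch assumes a hypothetical `DefnShapeSketch` module, and its rule that a leading string literal counts as a docstring only when at least one body form follows is an assumption, not something this plan specifies.

```elixir
defmodule DefnShapeSketch do
  # Splits the children of the `(arg_spec1 ... return_type_spec)` node into
  # argument specifiers and the trailing return type specifier.
  def split_params([]), do: {:error, :missing_return_type}

  def split_params(child_ids) when is_list(child_ids) do
    # A negative count makes Enum.split/2 count from the back of the list.
    {arg_spec_ids, [return_type_id]} = Enum.split(child_ids, -1)
    {:ok, %{arg_spec_node_ids: arg_spec_ids, return_type_spec_node_id: return_type_id}}
  end

  # Separates an optional docstring from the body forms.
  def split_body([first | [_ | _] = rest], nodes_map) do
    case nodes_map[first] do
      %{ast_node_type: :literal_string} -> %{docstring_node_id: first, body_node_ids: rest}
      _ -> %{docstring_node_id: nil, body_node_ids: [first | rest]}
    end
  end

  def split_body(body_ids, _nodes_map), do: %{docstring_node_id: nil, body_node_ids: body_ids}
end

# Node ids are made up for the demo.
nodes = %{20 => %{ast_node_type: :literal_string}, 21 => %{ast_node_type: :s_expression}}

IO.inspect(DefnShapeSketch.split_params([10, 11, 12]))
IO.inspect(DefnShapeSketch.split_body([20, 21], nodes))
IO.inspect(DefnShapeSketch.split_body([20], nodes))
```

In the last call the single string is treated as the function body, so a function whose only form is a string literal still returns that string.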
**3. Phased Implementation Plan:**
* **Phase 1: Core Function Type Representation & Interning. (Completed)**
* Defined the `%{type_kind: :function, ...}` structure.
* Implemented interning logic for this type in `Til.Typer.Interner`.
* **Phase 2: Basic Lambdas (`fn`). (In Progress)**
* Parser: Implement parsing for `(fn (arg_name1 ...) body_forms...)`. Argument specs are simple symbols.
* Typer (`infer_type_for_node_ast`): For `:lambda_expression`, argument types default to `any`. Infer return type from the last body expression. Construct raw function type for interning.
* **Lambda Argument Typing Strategy:** Defaulting to `any` initially. Later, with bidirectional type inference, argument types will be inferred more precisely from usage within the body (potentially using type intersection for multiple constraints) or from the context in which the lambda is used (checking mode).
* **Phase 3: Basic Monomorphic Function Calls.**
* Typer (`ExpressionTyper.infer_s_expression_type`): Handle S-expressions where the operator's type is a function type. Perform arity checks and subtype checks for arguments. The S-expression's type is the function's return type.
* **Phase 4: Monomorphic `defn`.**
* Parser: Implement parsing for `(defn name ((arg1 type_spec1) ... return_type_spec) 'optional_docstring' body_forms...)`. Require explicit `(symbol type_spec)` for arguments.
* Typer (`infer_type_for_node_ast` and `Environment`): For `:defn_expression`, resolve explicit types, construct/intern the function type, update environment for recursion, type body in new lexical scope, and validate return type. The `:defn_expression` node's type is its interned function type. `Til.Typer.Environment.update_env_from_node` will add the function to the environment.
* **Phase 5: Introduce Polymorphism (Type Variables, `~a`).**
* Update type representations, parser, interner, and typer for type variables and polymorphic function types. Implement unification for function calls.
* **Phase 6: Advanced Argument/Inference Features.**
* Allow `defn` arguments as `symbol` (type to be inferred).
* More sophisticated type inference for lambda arguments.
* (Later) Pattern matching in arguments, typing for `where` guards.
3. **Compiler Backend (Elixir):**
- **AST Transformation:** Transform the Lisp AST (potentially type-annotated) into an Elixir-compatible AST or directly to Elixir code.
- **Mapping Lisp Constructs:** Define how Lisp functions, data structures, control flow, and type information translate to Elixir equivalents.
- **Code Generation:** Produce Elixir source files.
- **Interop:** Consider how the Lisp code will call Elixir code and vice-versa.
4. **Standard Library:**
- Define and implement a basic set of core functions and their types (e.g., list operations, arithmetic, type predicates).
5. **Error Reporting Infrastructure:**
- Design a system for collecting and presenting type errors, compiler errors, and runtime errors (if applicable during compilation phases).
6. **Testing Framework:**
- Develop a comprehensive suite of tests covering:
- Parser correctness.
- Type checker correctness (valid and invalid programs).
- Compiler output (comparing generated Elixir against expected output or behavior).
7. **CLI / Build Tool Integration (Future):**
- A command-line interface for the compiler.
- Potential integration with build tools like Mix.
## Main Data Structure: Node Maps
The core data structure for representing code throughout the parsing, type checking, and transpiling phases will be a collection of "Node Maps." Each syntactic element or significant semantic component of the source code will be represented as an Elixir map.
**Structure of a Node Map:**
Each node map will contain a set of common fields and a set of fields specific to the kind of AST element it represents.
* **Common Fields (present in all node maps, based on `lib/til/parser.ex`):**
* `id`: A unique integer (generated by `System.unique_integer([:monotonic, :positive])`) for this node.
* `type_id`: Initially `nil`. After type checking/inference, this field will store or reference the type map (as defined in "Type Representation") associated with this AST node.
* `parent_id`: The `id` of the parent node in the AST, or `nil` if it's a root node or an orphaned element (e.g. an element of an unclosed collection).
* `file`: A string indicating the source file name (defaults to "unknown").
* `location`: A list: `[start_offset, start_line, start_col, end_offset, end_line, end_col]`.
* `raw_string`: The literal string segment from the source code that corresponds to this node.
* `ast_node_type`: An atom identifying the kind of AST node.
* `parsing_error`: `nil` if parsing was successful for this node, or a string message if an error occurred specific to this node (e.g., "Unclosed string literal"). For collection nodes, this can indicate issues like being unclosed.
* **AST-Specific Fields & Node Types (current implementation in `lib/til/parser.ex`):**
* `ast_node_type: :literal_integer`
* `value`: The integer value (e.g., `42`).
* `ast_node_type: :symbol`
* `name`: The string representation of the symbol (e.g., `"my-symbol"`).
* `ast_node_type: :literal_string`
* `value`: The processed string content (escape sequences are not yet handled, but leading whitespace on subsequent lines is stripped based on the opening quote's column).
* `parsing_error`: Can be `"Unclosed string literal"`.
* `ast_node_type: :s_expression`
* `children`: A list of `id`s of the child nodes within the S-expression.
* `parsing_error`: Can be `"Unclosed S-expression"`.
* `ast_node_type: :list_expression` (parsed from `[...]`)
* `children`: A list of `id`s of the child nodes within the list.
* `parsing_error`: Can be `"Unclosed list"`.
* `ast_node_type: :map_expression` (parsed from `m{...}`)
* `children`: A list of `id`s of the child nodes (key-value pairs) within the map.
* `parsing_error`: Can be `"Unclosed map"`.
* `ast_node_type: :tuple_expression` (parsed from `{...}`)
* `children`: A list of `id`s of the child nodes within the tuple.
* `parsing_error`: Can be `"Unclosed tuple"`.
* `ast_node_type: :unknown` (used for tokens that couldn't be parsed into a more specific type, or for unexpected characters)
* `parsing_error`: A string describing the error (e.g., "Unexpected ')'", "Unknown token").
* `ast_node_type: :file`
* `children`: A list of `id`s of the top-level expression nodes in the file, in order of appearance.
* `raw_string`: The entire content of the parsed file.
* `parsing_error`: Typically `nil` for the file node itself; errors are reported on child nodes or on the specific structures being parsed.
* **Note on `children` field:** For collection types (`:s_expression`, `:list_expression`, `:map_expression`, `:tuple_expression`, `:file`), this field holds a list of child node `id`s in the order they appear in the source.
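The indentation handling mentioned for `:literal_string` values can be sketched as follows. The exact rule is an assumption: this version removes at most `quote_col - 1` leading spaces from every line after the first, where `quote_col` is the 1-based column of the opening quote, and `StringIndentSketch` is a hypothetical module; the parser in `lib/til/parser.ex` may behave differently.

```elixir
defmodule StringIndentSketch do
  # Strips indentation from continuation lines of a multiline string literal.
  def strip_indent(content, quote_col) when is_binary(content) and quote_col >= 1 do
    [first | rest] = String.split(content, "\n")
    max_strip = quote_col - 1
    Enum.join([first | Enum.map(rest, &strip_leading(&1, max_strip))], "\n")
  end

  # Removes up to `n` leading spaces; deeper indentation is preserved.
  defp strip_leading(" " <> rest, n) when n > 0, do: strip_leading(rest, n - 1)
  defp strip_leading(line, _n), do: line
end

# Opening quote at column 5: four spaces are removed from each continuation line.
IO.inspect(StringIndentSketch.strip_indent("a\n    b\n      c", 5))
```

Lines indented further than the opening quote keep their extra indentation, which preserves relative layout inside the string.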
* **Pseudo-code example of a parsed integer node:**
```elixir
%{
id: 1,
type_id: nil,
parent_id: nil, # Assuming it's a top-level expression
file: "input.til",
location: [0, 1, 1, 2, 1, 3], # [offset_start, line_start, col_start, offset_end, line_end, col_end]
raw_string: "42",
ast_node_type: :literal_integer,
value: 42,
parsing_error: nil
}
```
* **Pseudo-code example of a parsed S-expression node:**
```elixir
%{
id: 2,
type_id: nil,
parent_id: nil,
file: "input.til",
location: [4, 1, 5, 13, 1, 14], # Location spans the entire 9-character "(add 1 2)"
raw_string: "(add 1 2)",
ast_node_type: :s_expression,
children: [3, 4, 5], # IDs of :symbol "add", :literal_integer 1, :literal_integer 2
parsing_error: nil
}
```
* **Pseudo-code example of an unclosed string node:**
```elixir
%{
id: 6,
type_id: nil,
parent_id: nil,
file: "input.til",
location: [17, 2, 1, 26, 2, 10], # Spans from opening ' to end of consumed input for the error
raw_string: "'unclosed",
ast_node_type: :literal_string,
value: "unclosed", # The content parsed so far
parsing_error: "Unclosed string literal"
}
```
**Intended Use:**
This collection of interconnected node maps forms a graph (specifically, a tree for the basic AST structure, with additional edges for type references, variable bindings, etc.).
1. **Parsing:** The parser will transform the source code into this collection of node maps.
2. **Type Checking/Inference:** The type system will operate on these node maps. Type information (`type_id`) will be populated or updated. Constraints for type inference can be associated with node `id`s. The immutability of Elixir maps means that updating a node's type information creates a new version of that node map, facilitating the tracking of changes during constraint resolution.
3. **Transpiling:** The transpiler will traverse this graph of node maps (potentially enriched with type information) to generate the target Elixir code.
A central registry or context (e.g., a map of `id => node_map_data`) might be used to store and access all node maps, allowing for efficient lookup and modification (creation of new versions) of individual nodes during various compiler phases.
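Such a registry can be sketched as a plain map keyed by node `id`. The helper names below (`NodesSketch.put_type/3`, `NodesSketch.children/2`) and the `:type_integer` key are hypothetical and only illustrate how lookups and immutable updates might work.

```elixir
defmodule NodesSketch do
  # Returns a new registry in which the node carries the given type reference.
  def put_type(nodes_map, node_id, type_id) do
    Map.update!(nodes_map, node_id, &Map.put(&1, :type_id, type_id))
  end

  # Resolves the `children` ids of a collection node to node maps, in source order.
  def children(nodes_map, node_id) do
    nodes_map
    |> Map.fetch!(node_id)
    |> Map.get(:children, [])
    |> Enum.map(&Map.fetch!(nodes_map, &1))
  end
end

# A trimmed-down registry for "(add 1 2)"; most common fields are omitted.
nodes_map = %{
  2 => %{id: 2, ast_node_type: :s_expression, children: [3, 4, 5], type_id: nil},
  3 => %{id: 3, ast_node_type: :symbol, name: "add", type_id: nil},
  4 => %{id: 4, ast_node_type: :literal_integer, value: 1, type_id: nil},
  5 => %{id: 5, ast_node_type: :literal_integer, value: 2, type_id: nil}
}

typed = NodesSketch.put_type(nodes_map, 4, :type_integer)

# The original registry is untouched; `typed` is a new version.
IO.inspect(nodes_map[4].type_id)
IO.inspect(typed[4].type_id)
IO.inspect(Enum.map(NodesSketch.children(typed, 2), & &1.ast_node_type))
```

Because each update returns a new map, earlier versions of the registry stay available, which matches the note above about tracking changes during constraint resolution.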