Typed Lisp to Elixir Compiler (codename Tilly)
Project Goals
To build a Lisp dialect with a strong, expressive type system that compiles to readable Elixir code. The type system will support advanced features like type inference, union, intersection, negation, refinement, and elements of dependent typing. The long-term vision includes support for compiling to other target languages.
Features
-
Core Language:
- Lisp syntax and semantics.
- Basic data structures (lists, atoms, numbers, etc.).
- numbers 1 2 3 4
- strings '' 'string' 'other string'
- lists [] [1 2 3 4]
- tuples {} {1 2 3 4}
- maps m{} m{:a 1 :b 2}
- Functions:
- only fixed arity functions, no variadic arg lists
- Function definition forms:
  - General structure:
    (defn name (list of parameters with the return type at the end) 'optional docstring' ...forms for the function body)
  - One parameter has no type annotation, so its type is inferred from usage inside function_body:
    (defn name (arg1 (arg2 type_2) return_type) function_body)
  - One parameter has no type annotation, so its type is inferred, but it is pattern-matched in the function head:
    (defn name (m{:a a_field} (arg2 type_2) return_type) function_body)
  - One parameter has no type annotation, so its type is inferred, but it is pattern-matched and bound to a name:
    (defn name ((= arg1 m{:a a_field :b (= b_list [])}) (arg2 type_2) return_type) function_body)
  - Some parameters are generic (universally quantified):
    (defn map_head ((coll (list ~a)) (mapper (function ~a ~b)) (union ~b nil)) function_body)
  - Guards, similar to Elixir (an additional where-expression after the return type):
    (defn map_head ((coll (list ~a)) (mapper (function ~a ~b)) ~b (where (some_guard coll))) function_body)
- Lambdas:
  - Lone lambda form:
    (fn (elem) (+ elem 1))
  - Lambda used as a parameter:
    (Enum.map collection (fn (elem) (+ elem 1)))
-
Type System: (See "Type Representation" under "Key Implementation Areas" for detailed structures)
- Type Inference: Automatically deduce types where possible.
- Union Types: A | B (e.g., %{type_kind: :union, types: MapSet.new([type_A, type_B])}).
- Intersection Types: A & B (e.g., %{type_kind: :intersection, types: MapSet.new([type_A, type_B])}).
- Negation Types: !A (e.g., %{type_kind: :negation, negated_type: type_A}).
- Refinement Types: Types refined by predicates (e.g., %{type_kind: :refinement, base_type: %{type_kind: :primitive, name: :integer}, var_name: :value, predicate_expr_id: <node_id>}).
- Dependent Types (Elements of): Types that can depend on values.
  - Length-Indexed Data Structures:
    - e.g., a list of 3 integers: %{type_kind: :list, element_type: %{type_kind: :primitive, name: :integer}, length: 3}.
    - A tuple of specific types: %{type_kind: :tuple, element_types: [type_A, type_B, type_C]}.
    - Benefits: Enables safer operations like nth, take, drop, and can ensure arity for functions expecting fixed-length lists/tuples.
  - Types Dependent on Literal Values:
    - Function return types or argument types can be specialized based on a literal value argument by using %{type_kind: :literal, value: <actual_value>} in type rules.
  - Refinement Types (as a key form of dependent type):
    - e.g., %{type_kind: :refinement, base_type: %{type_kind: :primitive, name: :integer}, var_name: :value, predicate_expr_id: <node_id_of_gt_0_expr>}.
    - Initial implementation would focus on simple, evaluable predicates.
- Value-Types (Typed Literals):
  - e.g., :some_atom has type %{type_kind: :literal, value: :some_atom}.
  - 42 has type %{type_kind: :literal, value: 42}.
- Heterogeneous List/Tuple Types:
  - Handled by %{type_kind: :tuple, element_types: [type_A, type_B, ...]}.
- Structural Map Types:
  - Key-Specific Types: Defined via the known_elements field in the map type representation. Example: %{type_kind: :map, known_elements: %{:name => %{value_type: %{type_kind: :primitive, name: :string}, optional: false}, :age => %{value_type: %{type_kind: :primitive, name: :integer}, optional: false}}, index_signature: nil}.
  - Optional Keys: Indicated by optional: true for an entry in known_elements.
  - Key/Value Constraints (Open Maps): Defined by the index_signature field. Example: %{type_kind: :map, known_elements: %{}, index_signature: %{key_type: %{type_kind: :primitive, name: :atom}, value_type: %{type_kind: :primitive, name: :any}}}.
  - Type Transformation/Refinement: To be handled by type system rules for map operations.
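The benefit claimed above for length-indexed structures (safer nth) can be illustrated with a small sketch. The module name NthSketch, the function nth_type/2, zero-based indexing, the error tuples, and the "element or nil" fallback for lists of unknown length are all assumptions made for illustration; none of them is taken from Tilly's code.

```elixir
defmodule NthSketch do
  # Hypothetical typing rule for (nth coll index) with a literal index.
  # Assumes zero-based indexing.

  @nil_type %{type_kind: :literal, value: nil}

  # Fixed-length list: the index can be bounds-checked at compile time.
  def nth_type(%{type_kind: :list, element_type: elem, length: len}, %{type_kind: :literal, value: i})
      when is_integer(len) and is_integer(i) do
    if i >= 0 and i < len do
      {:ok, elem}
    else
      {:error, {:index_out_of_bounds, i, len}}
    end
  end

  # Tuple: each position has its own type.
  def nth_type(%{type_kind: :tuple, element_types: types}, %{type_kind: :literal, value: i})
      when is_integer(i) and i >= 0 do
    case Enum.fetch(types, i) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, {:index_out_of_bounds, i, length(types)}}
    end
  end

  # List of unknown length or non-literal index (assumption): element or nil.
  def nth_type(%{type_kind: :list, element_type: elem}, _index) do
    {:ok, %{type_kind: :union, types: MapSet.new([elem, @nil_type])}}
  end

  def nth_type(_type, _index), do: {:error, :unsupported}
end
```

With a list type of length 3, a literal index of 3 is rejected before the program runs, while a list of unknown length can only promise "element or nil".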
-
Compilation:
- Target: Elixir: Generate readable and idiomatic Elixir code.
- Future Targets: Design with extensibility in mind for other languages (e.g., JavaScript).
-
Tooling:
- Clear error messages from the type checker and compiler.
Key Implementation Areas
-
Parser:
- Implement an S-expression parser for the Lisp dialect.
-
Type System Core:
-
Type Representation: Types are represented as Elixir maps, each with a :type_kind atom (in :snake_case) and other fields specific to that kind.
- Primitive Types: %{type_kind: :primitive, name: <atom>}
  - Examples: %{type_kind: :primitive, name: :any}, %{type_kind: :primitive, name: :nothing}, %{type_kind: :primitive, name: :integer}, %{type_kind: :primitive, name: :float}, %{type_kind: :primitive, name: :number}, %{type_kind: :primitive, name: :boolean}, %{type_kind: :primitive, name: :string}, %{type_kind: :primitive, name: :atom}.
- Literal Types (Value Types): %{type_kind: :literal, value: <any_elixir_literal>}
  - Examples: %{type_kind: :literal, value: 42}, %{type_kind: :literal, value: :my_atom}, %{type_kind: :literal, value: "hello"}.
  - In Tilly source, strings start and end with a single quote, e.g. 'string'.
- Union Types: %{type_kind: :union, types: MapSet.new([<type_map>])}
  - types: A set of type maps.
- Intersection Types: %{type_kind: :intersection, types: MapSet.new([<type_map>])}
  - types: A set of type maps.
- Negation Types: %{type_kind: :negation, negated_type: <type_map>}
- Function Types: %{type_kind: :function, arg_types: [<type_map>], return_type: <type_map>, rest_arg_type: <type_map> | nil}
  - arg_types: Ordered list of type maps.
  - rest_arg_type: Type map for variadic arguments, or nil.
- List Types: %{type_kind: :list, element_type: <type_map>, length: <non_neg_integer | type_variable_map | nil>}
  - length: nil for any length, an integer for fixed length, or a type variable map for generic/inferred length.
- Tuple Types: %{type_kind: :tuple, element_types: [<type_map>]}
  - element_types: Ordered list of type maps.
-
Map Types (Structural): Maps in Tilly Lisp are inherently open, meaning they can contain any keys beyond those explicitly known at compile time. The type system aims to provide as much precision as possible for known keys while defining a general pattern for all other keys.
-
Representation:
- Raw Form (before interning): %{type_kind: :map, known_elements: KE_raw, index_signature: IS_raw}
  - known_elements (KE_raw): An Elixir map where keys are literal Elixir terms (e.g., :name, "id") and values are %{value_type: <type_map_for_value>, optional: <boolean>}.
  - index_signature (IS_raw): Always present. A map %{key_type: <type_map_for_key>, value_type: <type_map_for_value>} describing the types for keys not in known_elements.
- Interned Form (stored in nodes_map): %{type_kind: :map, id: <unique_type_key>, known_elements: KE_interned, index_signature: IS_interned}
  - id: A unique atom key identifying this canonical map type definition (e.g., :type_map_123).
  - known_elements (KE_interned): An Elixir map where keys are literal Elixir terms and values are %{value_type_id: <type_key_for_value>, optional: <boolean>}.
  - index_signature (IS_interned): Always present. A map %{key_type_id: <type_key_for_general_keys>, value_type_id: <type_key_for_general_values>}.
-
Use Case Scenarios & Typing Approach:
- Map Literals:
  - Example: m{:a "hello" :b 1}
  - Inferred Type:
    - known_elements: %{ :a => %{value_type: <type_for_"hello">, optional: false}, :b => %{value_type: <type_for_1>, optional: false} }
    - index_signature: Defaults to %{key_type: <any_type>, value_type: <any_type>}. This signifies that any other keys of any type can exist and map to values of any type.
  - Keys in map literals must be literal values to contribute to known_elements.
- Type Annotations:
  - Example: (the (map string integer) my-var)
  - The type (map string integer) resolves to:
    - known_elements: %{} (empty, as the annotation describes a general pattern, not specific known keys).
    - index_signature: %{key_type: <string_type>, value_type: <integer_type>}.
- Core Map Operations (Language Constructs): The type system will define rules for the following fundamental runtime operations:
  - (map-get map key):
    - If key is a literal (e.g., :a):
      - If :a is in map_type.known_elements: Result is map_type.known_elements[:a].value_type_id. If the entry is optional, the result is unioned with nil_type.
      - If :a is not in known_elements but matches map_type.index_signature.key_type_id: Result is map_type.index_signature.value_type_id unioned with nil_type (as the specific key might not exist at runtime).
    - If key's type is general (e.g., atom):
      - Collect types from all matching known_elements (e.g., if :a and :b are known atom keys).
      - Include map_type.index_signature.value_type_id if atom is a subtype of map_type.index_signature.key_type_id.
      - Union all collected types with nil_type.
      - Example: (map-get m{:a "s" :b 1} some_atom_var) could result in type (union string integer nil).
  - (map-put map key value):
    - If key is a literal (e.g., :a):
      - Resulting map type updates or adds :a to known_elements with value's type. index_signature is generally preserved.
      - Example: (map-put m{:b 1} :a "s") results in the type for m{:a "s" :b 1}.
    - If key's type is general (e.g., atom):
      - known_elements of the input map type remain unchanged.
      - The index_signature of the resulting map type may become more general. E.g., if map is (map string any) and we map-put with key_type=atom and value_type=integer, the new index_signature might be %{key_type: (union string atom), value_type: (union any integer)}. This is complex and requires careful rule definition.
  - (map-delete map key):
    - If key is a literal (e.g., :a):
      - Resulting map type removes :a from known_elements. index_signature is preserved.
      - Example: (map-delete m{:a "s" :b 1} :a) results in the type for m{:b 1}.
    - If key's type is general (e.g., atom):
      - This is complex. Deleting by a general key type doesn't easily translate to a precise change in known_elements. The index_signature might remain, or the operation might be disallowed or result in a very general map type. (Further thought needed for precise semantics.)
  - (map-merge map1 map2):
    - Resulting map type combines known_elements from map1 and map2.
      - Keys present in only map1 or only map2 are included as is.
      - For keys in both: The value type from map2 takes precedence (last-one-wins semantics for types).
    - The index_signature of the resulting map type will be the most general combination of map1.index_signature and map2.index_signature (e.g., union of key types, union of value types).
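The literal-key rule for map-get and the map-merge rule described above can be sketched on the raw (non-interned) map type form. This is a minimal illustration under stated assumptions: the module MapOpsSketch, its function names, the representation of nil_type as a literal type, the caller-supplied subtype_fun predicate, and the result for a key that matches nothing are all hypothetical and not part of Tilly.

```elixir
defmodule MapOpsSketch do
  # Assumption: nil_type is modelled as the literal type for nil.
  @nil_type %{type_kind: :literal, value: nil}

  # Type of (map-get map key) when `key` is a literal value.
  # `subtype_fun` is a hypothetical two-argument predicate (sub, super).
  def get_literal(map_type, key, subtype_fun) do
    case Map.fetch(map_type.known_elements, key) do
      {:ok, %{value_type: value_type, optional: false}} ->
        value_type

      {:ok, %{value_type: value_type, optional: true}} ->
        union([value_type, @nil_type])

      :error ->
        signature = map_type.index_signature
        key_type = %{type_kind: :literal, value: key}

        if subtype_fun.(key_type, signature.key_type) do
          # The key may be absent at runtime, hence the union with nil.
          union([signature.value_type, @nil_type])
        else
          # Assumption: a key matching nothing can only yield nil.
          @nil_type
        end
    end
  end

  # Type of (map-merge map1 map2): map2 wins for shared known keys,
  # and the index signatures are generalized by union.
  def merge(map1, map2) do
    %{
      type_kind: :map,
      known_elements: Map.merge(map1.known_elements, map2.known_elements),
      index_signature: %{
        key_type: union([map1.index_signature.key_type, map2.index_signature.key_type]),
        value_type: union([map1.index_signature.value_type, map2.index_signature.value_type])
      }
    }
  end

  defp union(types), do: %{type_kind: :union, types: MapSet.new(types)}
end
```

The sketch does not simplify unions (for example, a union with any is kept as written); a real implementation would likely normalize them during interning.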
-
-
-
Known Limitations:
- Typing map-put and map-delete with non-literal (general type) keys precisely is challenging and may result in less specific types or require advanced type system features not yet planned (e.g., negation types for keys).
- Duplicate literal keys in a map literal: The parser/typer will likely adopt a "last one wins" semantic for the value and its type.
-
Required Future Steps & Prerequisites:
- Parser Support for (map K V) Annotations:
  - Modify Til.Typer.ExpressionTyper.resolve_type_specifier_node to parse S-expressions like (map <key-type-spec> <value-type-spec>) into a raw map type definition.
  - Prerequisite: Basic type specifier resolution (for K and V).
- Typing Map Literals:
  - In Til.Typer.infer_type_for_node_ast (for :map_expression):
    - Construct known_elements from literal keys and inferred value types.
    - Assign a default index_signature (e.g., key_type: any, value_type: any).
  - Prerequisite: Recursive typing of child nodes (values in the map literal).
- Interning Map Types:
  - In Til.Typer.Interner.get_or_intern_type:
    - Add logic for type_kind: :map.
    - Recursively intern types within known_elements values and index_signature key/value types to create a canonical, interned map type definition.
    - Store/retrieve these canonical map types.
  - Prerequisite: Interning for primitive and other relevant types.
- Subtyping Rules for Maps:
  - In Til.Typer.SubtypeChecker.is_subtype?:
    - Implement rules for map_subtype <?: map_supertype. This involves checking:
      - Compatibility of known_elements (required keys in super must be present and non-optional in sub, with compatible value types).
      - Compatibility of index_signatures (contravariant key types, covariant value types).
      - Keys in sub.known_elements not in super.known_elements must conform to super.index_signature.
  - Prerequisite: Interned map type representation.
- Typing Map Operations:
  - In Til.Typer.ExpressionTyper (or a new MapExpressionTyper module):
    - Define S-expression forms for (map-get map key), (map-put map key value), (map-delete map key), (map-merge map1 map2).
    - Implement type inference rules for each of these operations based on the principles outlined in "Use Case Scenarios".
  - Prerequisite: Subtyping rules, interned map types, ability to type map literals and resolve map type annotations.
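The three map subtyping checks listed above can be sketched as follows, again on the raw map type form. The module MapSubtypeSketch, the function map_subtype?/3, and the caller-supplied sub_fun predicate are hypothetical. The treatment of a key that the supertype marks optional but the subtype does not know at all is a simplifying assumption, since the text above does not specify it.

```elixir
defmodule MapSubtypeSketch do
  # `sub_fun` is a hypothetical predicate: sub_fun.(a, b) is true when a <: b.
  def map_subtype?(sub, sup, sub_fun) do
    # 1. Every known key of the supertype must be satisfied by the subtype.
    known_ok =
      Enum.all?(sup.known_elements, fn {key, %{value_type: sup_vt, optional: sup_opt}} ->
        case Map.fetch(sub.known_elements, key) do
          {:ok, %{value_type: sub_vt, optional: sub_opt}} ->
            # A key required by the supertype must be non-optional in the subtype.
            (sup_opt or not sub_opt) and sub_fun.(sub_vt, sup_vt)

          :error ->
            # Simplifying assumption: a missing key is fine only if optional in super.
            sup_opt
        end
      end)

    # 2. Index signatures: contravariant key types, covariant value types.
    signature_ok =
      sub_fun.(sup.index_signature.key_type, sub.index_signature.key_type) and
        sub_fun.(sub.index_signature.value_type, sup.index_signature.value_type)

    # 3. Extra known keys of the subtype must conform to super's index signature.
    extra_ok =
      sub.known_elements
      |> Enum.reject(fn {key, _entry} -> Map.has_key?(sup.known_elements, key) end)
      |> Enum.all?(fn {key, %{value_type: value_type}} ->
        sub_fun.(%{type_kind: :literal, value: key}, sup.index_signature.key_type) and
          sub_fun.(value_type, sup.index_signature.value_type)
      end)

    known_ok and signature_ok and extra_ok
  end
end
```

Passing the element-level subtype predicate as an argument keeps the sketch independent of how primitive, literal, and union subtyping end up being implemented.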
-
Advanced Map Typing and Row Polymorphism Considerations:
While the current map type system provides flexibility with known_elements and an index_signature, a future enhancement could be the introduction of row polymorphism. This would allow for more precise typing of functions that operate on maps with a common set of known fields while allowing other fields to vary.
- Conceptualization: In Tilly's context, a row variable in a map type (e.g., m{ :name string | r }) could represent "the rest of the map fields." Given Tilly's rich map keys (literals of various types, not just simple labels) and the existing index_signature concept, the row variable r could itself be considered a placeholder for another Tilly map type. This means r could have its own known_elements and index_signature, making it more expressive than traditional record-based row polymorphism.
- Requirements for Implementation:
  - Type Representation: Extend the map type definition to include a row variable (e.g., %{type_kind: :map, known_elements: KE, row_variable_id: <type_key_for_row_var>}). The interaction between a row variable and the index_signature would need careful definition; they might be mutually exclusive or complementary.
  - Unification for Rows: Develop a unification algorithm capable of solving constraints involving row variables (e.g., m{a: T1 | r1} = m{a: T1, b: T2 | r2} implies r1 must unify with or extend m{b: T2 | r2}).
  - Subtyping for Rows: Define subtyping rules (e.g., m{a: T1, b: T2} is a subtype of m{a: T1 | r} if r can be instantiated with m{b: T2}).
  - Generalization & Instantiation: Implement mechanisms for generalizing functions over row variables and instantiating them at call sites.
  - Syntax: Design user-facing syntax for map types with row variables (e.g., (map :key1 type1 ... | row_var_name)).
Implementing full row polymorphism is a significant undertaking and would build upon the existing map typing foundations. It is not currently in the immediate plan but represents a valuable direction for enhancing the type system's expressiveness for structural data.
-
-
Refinement Types: %{type_kind: :refinement, base_type: <type_map>, var_name: <atom>, predicate_expr_id: <integer_node_id>}
  - var_name: Atom used to refer to the value within the predicate.
  - predicate_expr_id: AST node ID of the predicate expression.
- Type Variables: %{type_kind: :type_variable, id: <any_unique_id>, name: <String.t | nil>}
  - id: Unique identifier.
  - name: Optional human-readable name.
- Alias Types (Named Types): %{type_kind: :alias, name: <atom_alias_name>, parameters: [<atom_param_name>], definition: <type_map>}
  - name: The atom for the alias (e.g., :positive_integer).
  - parameters: List of atoms for generic type parameter names (e.g., [:T]).
  - definition: The type map this alias expands to (may contain type variables from parameters).
- Function type specifier syntax:
  - (fn (arg_1_type arg_2_type) return_type)
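To make the type-map representation above concrete, here is a minimal sketch of constructor helpers that build these maps. The module TypeSketch and all of its function names are hypothetical conveniences, not part of Tilly. The union helper additionally flattens nested unions and collapses a one-member union, which is a design assumption rather than something the representation requires.

```elixir
defmodule TypeSketch do
  # Hypothetical constructors for the type maps described above.
  def primitive(name) when is_atom(name), do: %{type_kind: :primitive, name: name}

  def literal(value), do: %{type_kind: :literal, value: value}

  def tuple(element_types) when is_list(element_types),
    do: %{type_kind: :tuple, element_types: element_types}

  # `length` is nil for "any length" or an integer for a fixed length.
  def list(element_type, length \\ nil),
    do: %{type_kind: :list, element_type: element_type, length: length}

  # Builds a union, flattening nested unions and collapsing a
  # single-member union to that member (an assumption of this sketch).
  def union(types) when is_list(types) do
    members =
      types
      |> Enum.flat_map(fn
        %{type_kind: :union, types: inner} -> MapSet.to_list(inner)
        other -> [other]
      end)
      |> MapSet.new()

    case MapSet.to_list(members) do
      [single] -> single
      _ -> %{type_kind: :union, types: members}
    end
  end
end
```

Because the members live in a MapSet, two unions built from the same types in a different order compare equal with ==, which is convenient when canonicalizing types.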
-
-
Type Checking Algorithm: Develop the core logic for verifying type correctness. This will likely involve algorithms for:
- Unification.
- Subtyping (e.g., %{type_kind: :primitive, name: :integer} is a subtype of %{type_kind: :primitive, name: :number}).
- Constraint solving for inference and refinement types.
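The subtyping example above (integer is a subtype of number) can be extended into a small structural check. The module SubtypeSketch is hypothetical and is not the project's Til.Typer.SubtypeChecker. Beyond the integer/number case, the rules below (float under number, nothing as bottom, any as top, literals under their primitive, and the usual union rules) are assumptions chosen for illustration.

```elixir
defmodule SubtypeSketch do
  # Reflexivity: every type is a subtype of itself.
  def subtype?(t, t), do: true
  # Assumption: :nothing is the bottom type and :any is the top type.
  def subtype?(%{type_kind: :primitive, name: :nothing}, _sup), do: true
  def subtype?(_sub, %{type_kind: :primitive, name: :any}), do: true

  # integer <: number (from the text); float <: number is an assumption.
  def subtype?(%{type_kind: :primitive, name: n}, %{type_kind: :primitive, name: :number})
      when n in [:integer, :float],
      do: true

  # A literal type is a subtype of whatever its primitive is a subtype of.
  def subtype?(%{type_kind: :literal, value: v}, %{type_kind: :primitive} = sup),
    do: subtype?(primitive_of(v), sup)

  # A union is a subtype when all of its members are.
  def subtype?(%{type_kind: :union, types: members}, sup),
    do: Enum.all?(members, &subtype?(&1, sup))

  # A type is a subtype of a union when it fits at least one member.
  def subtype?(sub, %{type_kind: :union, types: members}),
    do: Enum.any?(members, &subtype?(sub, &1))

  def subtype?(_sub, _sup), do: false

  defp primitive_of(v) when is_integer(v), do: prim(:integer)
  defp primitive_of(v) when is_float(v), do: prim(:float)
  defp primitive_of(v) when is_boolean(v), do: prim(:boolean)
  defp primitive_of(v) when is_atom(v), do: prim(:atom)
  defp primitive_of(v) when is_binary(v), do: prim(:string)
  defp primitive_of(_other), do: prim(:any)

  defp prim(name), do: %{type_kind: :primitive, name: name}
end
```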
-
Type Inference Engine: Implement the mechanism to infer types of expressions and definitions.
-
Bidirectional Type Inference: To enhance type inference capabilities, reduce the need for explicit annotations, and provide more precise error messages, the type system could be evolved to use bidirectional type inference. This approach distinguishes between two modes of operation:
- Synthesis Mode (=>): Infers or synthesizes the type of an expression from its constituent parts. For example, infer(e) yields type T.
- Checking Mode (<=): Checks if an expression conforms to an expected type provided by its context. For example, check(e, T_expected) verifies e has type T_expected.
- Benefits:
  - More precise type error reporting (e.g., "expected type X, got type Y in context Z").
  - Reduced annotation burden, as types can flow top-down into expressions.
  - Better handling of polymorphic functions and complex type constructs.
- Implementation Requirements:
  - Explicit Modes: The core typing algorithm (currently in Til.Typer) would need to be refactored to explicitly support and switch between synthesis and checking modes.
  - Top-Down Type Flow: The expected_type must be propagated downwards during AST traversal in checking mode.
  - Dual Typing Rules: Each language construct (literals, variables, function calls, conditionals, lambdas, etc.) would require distinct typing rules for both synthesis and checking. For instance:
    - A lambda expression, when checked against an expected function type (TA -> TR), can use TA for its parameter types and then check its body against TR. In synthesis mode, parameter annotations might be required.
    - A function application (f arg) would typically synthesize f's type, then check arg against the expected parameter type, and the function's return type becomes the synthesized type of the application.
  - Integration with Polymorphism: Rules for instantiating polymorphic types (when checking) and generalizing types (e.g., for let-bound expressions in synthesis) are crucial.
Adopting bidirectional type inference would be a significant architectural evolution of the Til.Typer module, moving beyond the current primarily bottom-up synthesis approach.
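The two modes and the lambda and application rules described above can be sketched on a toy expression language. Everything here is an assumption for illustration: the module BidirSketch, the tuple-based AST ({:int, n}, {:var, name}, {:lambda, params, body}, {:app, f, args}), the error tuples, and the use of plain equality where a real checker would use subtyping. It is not how Til.Typer is structured.

```elixir
defmodule BidirSketch do
  @int %{type_kind: :primitive, name: :integer}

  # Synthesis mode: derive a type from the expression itself.
  def synth({:int, _n}, _env), do: {:ok, @int}

  def synth({:var, name}, env) do
    case Map.fetch(env, name) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, {:unbound, name}}
    end
  end

  # Application: synthesize the operator, then check each argument.
  def synth({:app, f, args}, env) do
    case synth(f, env) do
      {:ok, %{type_kind: :function, arg_types: arg_types, return_type: ret}}
      when length(args) == length(arg_types) ->
        with :ok <- check_all(args, arg_types, env), do: {:ok, ret}

      {:ok, %{type_kind: :function}} ->
        {:error, :arity_mismatch}

      {:ok, other} ->
        {:error, {:not_a_function, other}}

      error ->
        error
    end
  end

  # Without annotations, a lambda cannot be synthesized in this sketch.
  def synth({:lambda, _params, _body}, _env), do: {:error, :lambda_needs_expected_type}

  # Checking mode: the expected function type supplies the parameter types.
  def check(
        {:lambda, params, body},
        %{type_kind: :function, arg_types: arg_types, return_type: ret},
        env
      )
      when length(params) == length(arg_types) do
    body_env = params |> Enum.zip(arg_types) |> Enum.into(env)
    check(body, ret, body_env)
  end

  # Fallback: synthesize, then compare (equality stands in for subtyping).
  def check(expr, expected, env) do
    with {:ok, actual} <- synth(expr, env) do
      if actual == expected, do: :ok, else: {:error, {:expected, expected, :got, actual}}
    end
  end

  defp check_all(args, types, env) do
    args
    |> Enum.zip(types)
    |> Enum.reduce_while(:ok, fn {arg, type}, :ok ->
      case check(arg, type, env) do
        :ok -> {:cont, :ok}
        error -> {:halt, error}
      end
    end)
  end
end
```

The same lambda that fails in synthesis mode succeeds in checking mode, because the expected function type flows top-down into its parameters.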
-
-
Environment Management: Handle scopes and bindings of names to types (type maps).
-
Function Types, defn, and fn Implementation Plan: This section outlines the plan for introducing function types, user-defined functions (defn), and lambdas (fn) into Tilly.

1. Type Representation for Functions:
- Structure: %{type_kind: :function, arg_types: [<type_map_key>], return_type: <type_map_key>, type_params: [<type_variable_key>] | nil}
  - arg_types: An ordered list of type keys for each argument.
  - return_type: A type key for the return value.
  - type_params: (Optional, for polymorphic functions) An ordered list of type keys for universally quantified type variables (e.g., for ~a, ~b). Initially nil for monomorphic functions.
- Note: Variadic functions (rest_arg_type) are not planned, aligning with Elixir's fixed arity.

2. Parser Modifications (Til.Parser):
- defn (User-Defined Function):
  - Syntax: (defn name (arg_spec1 arg_spec2 ... return_type_spec) 'optional_docstring' body_forms...)
    - The return_type_spec is the last element in the parameter S-expression.
    - 'optional_docstring' is a string literal between the parameter S-expression and the body.
  - AST Node (:defn_expression):
    - name_node_id: ID of the function name symbol.
    - params_and_return_s_expr_id: ID of the S-expression node (arg_spec1 ... return_type_spec).
    - arg_spec_node_ids: List of IDs of argument specifier nodes (derived from children of params_and_return_s_expr_id, excluding the last).
    - return_type_spec_node_id: ID of the return type specifier node (last child of params_and_return_s_expr_id).
    - docstring_node_id: Optional ID of the docstring node.
    - body_node_ids: List of IDs for body expressions.
- fn (Lambda):
  - Syntax: (fn (arg_spec1 ...) body_forms...)
  - AST Node (:lambda_expression):
    - params_s_expr_id: ID of the S-expression node (arg_spec1 ...).
    - arg_spec_node_ids: List of IDs of argument specifier nodes.
    - body_node_ids: List of IDs for body expressions.
- Argument Specifications (arg_spec):
  - Initially, arg_spec nodes will represent simple symbols (for lambda arguments) or (symbol type_spec) (for defn arguments).
  - More complex patterns (m{:key val}, (= symbol pattern)) and type variables (~a) will be introduced in later phases.

3. Phased Implementation Plan:
- Phase 1: Core Function Type Representation & Interning. (Completed)
  - Defined the %{type_kind: :function, ...} structure.
  - Implemented interning logic for this type in Til.Typer.Interner.
- Phase 2: Basic Lambdas (fn). (In Progress)
  - Parser: Implement parsing for (fn (arg_name1 ...) body_forms...). Argument specs are simple symbols.
  - Typer (infer_type_for_node_ast): For :lambda_expression, argument types default to any. Infer return type from the last body expression. Construct raw function type for interning.
  - Lambda Argument Typing Strategy: Defaulting to any initially. Later, with bidirectional type inference, argument types will be inferred more precisely from usage within the body (potentially using type intersection for multiple constraints) or from the context in which the lambda is used (checking mode).
- Phase 3: Basic Monomorphic Function Calls.
  - Typer (ExpressionTyper.infer_s_expression_type): Handle S-expressions where the operator's type is a function type. Perform arity checks and subtype checks for arguments. The S-expression's type is the function's return type.
- Phase 4: Monomorphic defn.
  - Parser: Implement parsing for (defn name (arg_spec1 type_spec1 ... return_type_spec) 'optional_docstring' body_forms...). Require explicit (symbol type_spec) for arguments.
  - Typer (infer_type_for_node_ast and Environment): For :defn_expression, resolve explicit types, construct/intern the function type, update environment for recursion, type body in new lexical scope, and validate return type. The :defn_expression node's type is its interned function type. Til.Typer.Environment.update_env_from_node will add the function to the environment.
- Phase 5: Introduce Polymorphism (Type Variables, ~a).
  - Update type representations, parser, interner, and typer for type variables and polymorphic function types. Implement unification for function calls.
- Phase 6: Advanced Argument/Inference Features.
  - Allow defn arguments as symbol (type to be inferred).
  - More sophisticated type inference for lambda arguments.
  - (Later) Pattern matching in arguments, typing for where guards.
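The interning step from Phase 1 can be pictured as turning a raw function type (whose fields hold type maps) into a canonical definition whose fields hold type keys. The sketch below is an illustration only: the module InternSketch, the intern/2 signature, the store shape, and the :type_N key scheme are hypothetical and do not describe the actual Til.Typer.Interner.

```elixir
defmodule InternSketch do
  # Returns {type_key, updated_store}. `store` maps type keys to
  # canonical type definitions (hypothetical shape).
  def intern(store, %{type_kind: :function} = raw) do
    # Intern argument types left to right, threading the store through.
    {arg_keys, store} =
      Enum.map_reduce(raw.arg_types, store, fn arg, acc -> intern(acc, arg) end)

    {ret_key, store} = intern(store, raw.return_type)

    canonical = %{
      type_kind: :function,
      arg_types: arg_keys,
      return_type: ret_key,
      type_params: nil
    }

    lookup_or_add(store, canonical)
  end

  # Non-function types are stored as they are in this sketch.
  def intern(store, type), do: lookup_or_add(store, type)

  # Linear search keeps the sketch short; a real interner would likely
  # index definitions for constant-time lookup.
  defp lookup_or_add(store, canonical) do
    case Enum.find(store, fn {_key, existing} -> Map.delete(existing, :id) == canonical end) do
      {key, _existing} ->
        {key, store}

      nil ->
        key = :"type_#{map_size(store)}"
        {key, Map.put(store, key, Map.put(canonical, :id, key))}
    end
  end
end
```

Interning the same raw type twice yields the same key and leaves the store unchanged, so type equality can be decided by comparing keys.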
-
-
Compiler Backend (Elixir):
- AST Transformation: Transform the Lisp AST (potentially type-annotated) into an Elixir-compatible AST or directly to Elixir code.
- Mapping Lisp Constructs: Define how Lisp functions, data structures, control flow, and type information translate to Elixir equivalents.
- Code Generation: Produce Elixir source files.
- Interop: Consider how the Lisp code will call Elixir code and vice-versa.
-
Standard Library:
- Define and implement a basic set of core functions and their types (e.g., list operations, arithmetic, type predicates).
-
Error Reporting Infrastructure:
- Design a system for collecting and presenting type errors, compiler errors, and runtime errors (if applicable during compilation phases).
-
Testing Framework:
- Develop a comprehensive suite of tests covering:
- Parser correctness.
- Type checker correctness (valid and invalid programs).
- Compiler output (comparing generated Elixir against expected output or behavior).
- Develop a comprehensive suite of tests covering:
-
CLI / Build Tool Integration (Future):
- A command-line interface for the compiler.
- Potential integration with build tools like Mix.
Main Data Structure: Node Maps
The core data structure for representing code throughout the parsing, type checking, and transpiling phases will be a collection of "Node Maps." Each syntactic element or significant semantic component of the source code will be represented as an Elixir map.
Structure of a Node Map:
Each node map will contain a set of common fields and a set of fields specific to the kind of AST element it represents.
-
Common Fields (present in all node maps, based on lib/til/parser.ex):
  - id: A unique integer (generated by System.unique_integer([:monotonic, :positive])) for this node.
  - type_id: Initially nil. After type checking/inference, this field will store or reference the type map (as defined in "Type Representation") associated with this AST node.
  - parent_id: The id of the parent node in the AST, or nil if it's a root node or an orphaned element (e.g. an element of an unclosed collection).
  - file: A string indicating the source file name (defaults to "unknown").
  - location: A list: [start_offset, start_line, start_col, end_offset, end_line, end_col].
  - raw_string: The literal string segment from the source code that corresponds to this node.
  - ast_node_type: An atom identifying the kind of AST node.
  - parsing_error: nil if parsing was successful for this node, or a string message if an error occurred specific to this node (e.g., "Unclosed string literal"). For collection nodes, this can indicate issues like being unclosed.
- AST-Specific Fields & Node Types (current implementation in lib/til/parser.ex):
  - ast_node_type: :literal_integer
    - value: The integer value (e.g., 42).
  - ast_node_type: :symbol
    - name: The string representation of the symbol (e.g., "my-symbol").
  - ast_node_type: :literal_string
    - value: The processed string content (escape sequences are not yet handled, but leading whitespace on subsequent lines is stripped based on the opening quote's column).
    - parsing_error: Can be "Unclosed string literal".
  - ast_node_type: :s_expression
    - children: A list of ids of the child nodes within the S-expression.
    - parsing_error: Can be "Unclosed S-expression".
  - ast_node_type: :list_expression (parsed from [...])
    - children: A list of ids of the child nodes within the list.
    - parsing_error: Can be "Unclosed list".
  - ast_node_type: :map_expression (parsed from m{...})
    - children: A list of ids of the child nodes (key-value pairs) within the map.
    - parsing_error: Can be "Unclosed map".
  - ast_node_type: :tuple_expression (parsed from {...})
    - children: A list of ids of the child nodes within the tuple.
    - parsing_error: Can be "Unclosed tuple".
  - ast_node_type: :unknown (used for tokens that couldn't be parsed into a more specific type, or for unexpected characters)
    - parsing_error: A string describing the error (e.g., "Unexpected ')'", "Unknown token").
  - ast_node_type: :file
    - children: A list of ids of the top-level expression nodes in the file, in order of appearance.
    - raw_string: The entire content of the parsed file.
    - parsing_error: Typically nil for the file node itself; errors would be on child nodes or during parsing of specific structures.
  - Note on the children field: For collection types (:s_expression, :list_expression, :map_expression, :tuple_expression, :file), this field holds a list of child node ids in the order they appear in the source.
  - Pseudo-code example of a parsed integer node:

        %{
          id: 1,
          type_id: nil,
          parent_id: nil, # Assuming it's a top-level expression
          file: "input.til",
          # [offset_start, line_start, col_start, offset_end, line_end, col_end]
          location: [0, 1, 1, 2, 1, 3],
          raw_string: "42",
          ast_node_type: :literal_integer,
          value: 42,
          parsing_error: nil
        }

  - Pseudo-code example of a parsed S-expression node:

        %{
          id: 2,
          type_id: nil,
          parent_id: nil,
          file: "input.til",
          # Location spans the entire "(add 1 2)" (9 characters, end exclusive)
          location: [4, 1, 5, 13, 1, 14],
          raw_string: "(add 1 2)",
          ast_node_type: :s_expression,
          # IDs of :symbol "add", :literal_integer 1, :literal_integer 2
          children: [3, 4, 5],
          parsing_error: nil
        }

  - Pseudo-code example of an unclosed string node:

        %{
          id: 6,
          type_id: nil,
          parent_id: nil,
          file: "input.til",
          # Spans from the opening ' to the end of consumed input for the error
          location: [17, 2, 1, 26, 2, 10],
          raw_string: "'unclosed",
          ast_node_type: :literal_string,
          value: "unclosed", # The content parsed so far
          parsing_error: "Unclosed string literal"
        }
-
Intended Use:
This collection of interconnected node maps forms a graph (specifically, a tree for the basic AST structure, with additional edges for type references, variable bindings, etc.).
- Parsing: The parser will transform the source code into this collection of node maps.
- Type Checking/Inference: The type system will operate on these node maps. Type information (type_id) will be populated or updated. Constraints for type inference can be associated with node ids. The immutability of Elixir maps means that updating a node's type information creates a new version of that node map, facilitating the tracking of changes during constraint resolution.
- Transpiling: The transpiler will traverse this graph of node maps (potentially enriched with type information) to generate the target Elixir code.
A central registry or context (e.g., a map of id => node_map_data) might be used to store and access all node maps, allowing for efficient lookup and modification (creation of new versions) of individual nodes during various compiler phases.
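A minimal sketch of such a registry, assuming a plain Elixir map keyed by node id. The module NodeStoreSketch and its functions are hypothetical, and the location value is a placeholder rather than real parser output.

```elixir
defmodule NodeStoreSketch do
  # Builds a node map with the common fields; `extra` carries the
  # AST-specific fields such as :value or :children.
  def new_node(ast_node_type, raw_string, extra \\ %{}) do
    Map.merge(
      %{
        id: System.unique_integer([:monotonic, :positive]),
        type_id: nil,
        parent_id: nil,
        file: "unknown",
        location: [0, 1, 1, 0, 1, 1], # placeholder, not computed here
        raw_string: raw_string,
        ast_node_type: ast_node_type,
        parsing_error: nil
      },
      extra
    )
  end

  # Registers a node under its id.
  def put(nodes_map, node), do: Map.put(nodes_map, node.id, node)

  # Returns a new registry in which the node carries the given type_id.
  # The original registry is left untouched.
  def set_type(nodes_map, node_id, type_id),
    do: Map.update!(nodes_map, node_id, &Map.put(&1, :type_id, type_id))
end
```

Because set_type/3 returns a new map, an earlier version of the registry stays available, which matches the "new version of that node map" behaviour described above.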