elipl/project.md
Kacper Marzecki 748f87636a checkpoint
2025-06-13 23:48:07 +02:00

Typed Lisp to Elixir Compiler (codename Tilly)

Project Goals

To build a Lisp dialect with a strong, expressive type system that compiles to readable Elixir code. The type system will support advanced features like type inference, union, intersection, negation, refinement, and elements of dependent typing. The long-term vision includes support for compiling to other target languages.

Features

  • Core Language:

    • Lisp syntax and semantics.
    • Basic data structures (lists, atoms, numbers, etc.).
      • numbers 1 2 3 4
      • strings '' 'string' 'other string'
      • lists [] [1 2 3 4]
      • tuples {} {1 2 3 4}
      • maps m{} m{:a 1 :b 2}
    • Functions:
      • only fixed arity functions, no variadic arg lists
      • structure of the function definition
      (defn 
        name 
        (list of parameters with the return type at the end)
        'optional docstring'
        ...forms for the function body 
      )
      
      • function definition where one parameter doesn't have a type -> it is to be inferred from usage inside function_body: (defn name (arg1 (arg2 type_2) return_type) function_body)
      • function definition where one parameter doesn't have a type -> it is to be inferred, but it is pattern-matched in the function head: (defn name (m{:a a_field} (arg2 type_2) return_type) function_body)
      • function definition where one parameter doesn't have a type -> it is to be inferred, but it is pattern-matched and bound to a name: (defn name ((= arg1 m{:a a_field :b (= b_list [])}) (arg2 type_2) return_type) function_body)
      • function definition where some parameters are generic (universally quantified): (defn map_head ((coll (list ~a)) (mapper (function ~a ~b)) (union ~b nil)) function_body)
      • function definition guards, similar to Elixir (an additional where S-expression after the return type): (defn map_head ((coll (list ~a)) (mapper (function ~a ~b)) ~b (where (some_guard coll))) function_body)
    • Lambdas:
      • lone lambda form: (fn (elem) (+ elem 1))
      • lambda used as a parameter: (Enum.map collection (fn (elem) (+ elem 1)))
  • Type System: (See "Type Representation" under "Key Implementation Areas" for detailed structures)

    • Type Inference: Automatically deduce types where possible.
    • Union Types: A | B (e.g., %{type_kind: :union, types: MapSet.new([type_A, type_B])}).
    • Intersection Types: A & B (e.g., %{type_kind: :intersection, types: MapSet.new([type_A, type_B])}).
    • Negation Types: !A (e.g., %{type_kind: :negation, negated_type: type_A}).
    • Refinement Types: Types refined by predicates (e.g., %{type_kind: :refinement, base_type: %{type_kind: :primitive, name: :integer}, var_name: :value, predicate_expr_id: <node_id>}).
    • Dependent Types (Elements of): Types that can depend on values.
      • Length-Indexed Data Structures:
        • e.g., A list of 3 integers: %{type_kind: :list, element_type: %{type_kind: :primitive, name: :integer}, length: 3}.
        • A tuple of specific types: %{type_kind: :tuple, element_types: [type_A, type_B, type_C]}.
        • Benefits: Enables safer operations like nth, take, drop, and can ensure arity for functions expecting fixed-length lists/tuples.
      • Types Dependent on Literal Values:
        • Function return types or argument types can be specialized based on a literal value argument by using %{type_kind: :literal, value: <actual_value>} in type rules.
      • Refinement Types (as a key form of dependent type):
        • e.g., %{type_kind: :refinement, base_type: %{type_kind: :primitive, name: :integer}, var_name: :value, predicate_expr_id: <node_id_of_gt_0_expr>}.
        • Initial implementation would focus on simple, evaluable predicates.
    • Value-Types (Typed Literals):
      • e.g., :some_atom has type %{type_kind: :literal, value: :some_atom}.
      • 42 has type %{type_kind: :literal, value: 42}.
    • Heterogeneous List/Tuple Types:
      • Handled by %{type_kind: :tuple, element_types: [type_A, type_B, ...]}.
    • Structural Map Types:
      • Key-Specific Types: Defined via the known_elements field in the map type representation. Example: %{type_kind: :map, known_elements: %{:name => %{value_type: %{type_kind: :primitive, name: :string}, optional: false}, :age => %{value_type: %{type_kind: :primitive, name: :integer}, optional: false}}, index_signature: nil}.
      • Optional Keys: Indicated by optional: true for an entry in known_elements.
      • Key/Value Constraints (Open Maps): Defined by the index_signature field. Example: %{type_kind: :map, known_elements: %{}, index_signature: %{key_type: %{type_kind: :primitive, name: :atom}, value_type: %{type_kind: :primitive, name: :any}}}.
      • Type Transformation/Refinement: To be handled by type system rules for map operations.
  • Compilation:

    • Target: Elixir: Generate readable and idiomatic Elixir code.
    • Future Targets: Design with extensibility in mind for other languages (e.g., JavaScript).
  • Tooling:

    • Clear error messages from the type checker and compiler.

Key Implementation Areas

  1. Parser:

    • Implement an S-expression parser for the Lisp dialect.
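As a rough illustration of the core of such a parser, the sketch below handles only parentheses, integers, and symbols. The module `SexprSketch` and its output shape (nested lists with `{:symbol, name}` tuples) are assumptions made for this example. The actual parser described later in this document produces node maps and records problems in `parsing_error` instead of aborting.

```elixir
defmodule SexprSketch do
  # Hypothetical, simplified parser: strings, [], {} and m{} are left out.
  # Returns {:ok, forms} or {:error, message}.
  def parse(source) do
    tokens = Regex.scan(~r/[()]|[^\s()]+/, source) |> List.flatten()

    case parse_forms(tokens, []) do
      {forms, []} -> {:ok, forms}
      {_forms, [")" | _]} -> {:error, "Unexpected ')'"}
    end
  catch
    {:error, _message} = error -> error
  end

  # Collects forms until the tokens run out or a ")" closes the current form.
  defp parse_forms([], acc), do: {Enum.reverse(acc), []}
  defp parse_forms([")" | _] = tokens, acc), do: {Enum.reverse(acc), tokens}

  defp parse_forms(["(" | rest], acc) do
    {children, after_children} = parse_forms(rest, [])

    case after_children do
      [")" | remaining] -> parse_forms(remaining, [children | acc])
      [] -> throw({:error, "Unclosed S-expression"})
    end
  end

  defp parse_forms([token | rest], acc), do: parse_forms(rest, [atom(token) | acc])

  # Integers become Elixir integers; every other token is treated as a symbol.
  defp atom(token) do
    case Integer.parse(token) do
      {number, ""} -> number
      _ -> {:symbol, token}
    end
  end
end
```

For example, `SexprSketch.parse("(add 1 2)")` yields a single form whose elements are the symbol `add` and the integers 1 and 2.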
  2. Type System Core:

    • Type Representation: Types are represented as Elixir maps, each with a :type_kind atom (in :snake_case) and other fields specific to that kind.

      • Primitive Types:

        • %{type_kind: :primitive, name: <atom>}
        • Examples: %{type_kind: :primitive, name: :any}, %{type_kind: :primitive, name: :nothing}, %{type_kind: :primitive, name: :integer}, %{type_kind: :primitive, name: :float}, %{type_kind: :primitive, name: :number}, %{type_kind: :primitive, name: :boolean}, %{type_kind: :primitive, name: :string}, %{type_kind: :primitive, name: :atom}.
      • Literal Types (Value Types):

        • %{type_kind: :literal, value: <any_elixir_literal>}
        • Examples: %{type_kind: :literal, value: 42}, %{type_kind: :literal, value: :my_atom}, %{type_kind: :literal, value: "hello"}.
        • strings are started and ended with single quote e.g. 'string'
      • Union Types:

        • %{type_kind: :union, types: MapSet.new([<type_map>])}
        • types: A set of type maps.
      • Intersection Types:

        • %{type_kind: :intersection, types: MapSet.new([<type_map>])}
        • types: A set of type maps.
      • Negation Types:

        • %{type_kind: :negation, negated_type: <type_map>}
      • Function Types:

        • %{type_kind: :function, arg_types: [<type_map>], return_type: <type_map>, rest_arg_type: <type_map> | nil}
        • arg_types: Ordered list of type maps.
        • rest_arg_type: Type map for variadic arguments, or nil. Because only fixed-arity functions are planned, this is expected to be nil.
      • List Types:

        • %{type_kind: :list, element_type: <type_map>, length: <non_neg_integer | type_variable_map | nil>}
        • length: nil for any length, an integer for fixed length, or a type variable map for generic/inferred length.
      • Tuple Types:

        • %{type_kind: :tuple, element_types: [<type_map>]}
        • element_types: Ordered list of type maps.
      • Map Types (Structural): Maps in Tilly Lisp are inherently open, meaning they can contain any keys beyond those explicitly known at compile time. The type system aims to provide as much precision as possible for known keys while defining a general pattern for all other keys.

        • Representation:

          • Raw Form (before interning): %{type_kind: :map, known_elements: KE_raw, index_signature: IS_raw}
            • known_elements (KE_raw): An Elixir map where keys are literal Elixir terms (e.g., :name, "id") and values are %{value_type: <type_map_for_value>, optional: <boolean>}.
            • index_signature (IS_raw): Always present. A map %{key_type: <type_map_for_key>, value_type: <type_map_for_value>} describing the types for keys not in known_elements.
          • Interned Form (stored in nodes_map): %{type_kind: :map, id: <unique_type_key>, known_elements: KE_interned, index_signature: IS_interned}
            • id: A unique atom key identifying this canonical map type definition (e.g., :type_map_123).
            • known_elements (KE_interned): An Elixir map where keys are literal Elixir terms and values are %{value_type_id: <type_key_for_value>, optional: <boolean>}.
            • index_signature (IS_interned): Always present. A map %{key_type_id: <type_key_for_general_keys>, value_type_id: <type_key_for_general_values>}.
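The step from the raw form to the interned form could look roughly like the sketch below. It is not the project's Til.Typer.Interner: the module `InternSketch`, the plain map used as a store, and the `:type_N` key scheme are assumptions chosen to keep the example self-contained.

```elixir
defmodule InternSketch do
  # Hypothetical helper. `store` maps a type key (atom) to its canonical definition.

  # Interns a raw type map and returns {type_key, updated_store}.
  def intern(%{type_kind: :map} = raw, store) do
    # Intern the value type of every known element first.
    {known, store} =
      Enum.reduce(raw.known_elements, {%{}, store}, fn {key, entry}, {acc, st} ->
        {value_id, st} = intern(entry.value_type, st)
        {Map.put(acc, key, %{value_type_id: value_id, optional: entry.optional}), st}
      end)

    {key_id, store} = intern(raw.index_signature.key_type, store)
    {value_id, store} = intern(raw.index_signature.value_type, store)

    body = %{
      type_kind: :map,
      known_elements: known,
      index_signature: %{key_type_id: key_id, value_type_id: value_id}
    }

    put_canonical(body, store)
  end

  # Primitives and literals contain no nested types in this sketch.
  def intern(%{type_kind: kind} = raw, store) when kind in [:primitive, :literal] do
    put_canonical(raw, store)
  end

  # Reuses the key of a structurally equal definition, or allocates a new one.
  defp put_canonical(body, store) do
    case Enum.find(store, fn {_key, existing} -> Map.delete(existing, :id) == body end) do
      {key, _existing} ->
        {key, store}

      nil ->
        key = :"type_#{map_size(store) + 1}"
        {key, Map.put(store, key, Map.put(body, :id, key))}
    end
  end
end
```

Interning the same raw map type twice returns the same key and leaves the store unchanged, which is the property that makes interned types comparable by key.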
        • Use Case Scenarios & Typing Approach:

          1. Map Literals:

            • Example: m{:a 'hello' :b 1}
            • Inferred Type:
              • known_elements: %{ :a => %{value_type: <type_for_"hello">, optional: false}, :b => %{value_type: <type_for_1>, optional: false} }
              • index_signature: Defaults to %{key_type: <any_type>, value_type: <any_type>}. This signifies that any other keys of any type can exist and map to values of any type.
            • Keys in map literals must be literal values to contribute to known_elements.
          2. Type Annotations:

            • Example: (the (map string integer) my-var)
            • The type (map string integer) resolves to:
              • known_elements: %{} (empty, as the annotation describes a general pattern, not specific known keys).
              • index_signature: %{key_type: <string_type>, value_type: <integer_type>}.
          3. Core Map Operations (Language Constructs): The type system will define rules for the following fundamental runtime operations:

            • (map-get map key):

              • If key is a literal (e.g., :a):
                • If :a is in map_type.known_elements: Result is map_type.known_elements[:a].value_type_id. If optional, result is union with nil_type.
                • If :a is not in known_elements but matches map_type.index_signature.key_type_id: Result is map_type.index_signature.value_type_id unioned with nil_type (as the specific key might not exist at runtime).
              • If key's type is general (e.g., atom):
                • Collect types from all matching known_elements (e.g., if :a and :b are known atom keys).
                • Include map_type.index_signature.value_type_id if atom is a subtype of map_type.index_signature.key_type_id.
                • Union all collected types with nil_type.
                • Example: (map-get m{:a 's' :b 1} some_atom_var) could result in type (union string integer nil).
            • (map-put map key value):

              • If key is a literal (e.g., :a):
                • Resulting map type updates or adds :a to known_elements with value's type. index_signature is generally preserved.
                • Example: (map-put m{:b 1} :a 's') results in type for m{:a 's' :b 1}.
              • If key's type is general (e.g., atom):
                • known_elements of the input map type remain unchanged.
                • The index_signature of the resulting map type may become more general. E.g., if map is (map string any) and we map-put with key_type=atom and value_type=integer, the new index_signature might be %{key_type: (union string atom), value_type: (union any integer)}. This is complex and requires careful rule definition.
            • (map-delete map key):

              • If key is a literal (e.g., :a):
                • Resulting map type removes :a from known_elements. index_signature is preserved.
                • Example: (map-delete m{:a 's' :b 1} :a) results in type for m{:b 1}.
              • If key's type is general (e.g., atom):
                • This is complex. Deleting by a general key type doesn't easily translate to a precise change in known_elements. The index_signature might remain, or the operation might be disallowed or result in a very general map type. (Further thought needed for precise semantics).
            • (map-merge map1 map2):

              • Resulting map type combines known_elements from map1 and map2.
                • For keys only in map1 or map2, they are included as is.
                • For keys in both: The value type from map2 takes precedence (last-one-wins semantics for types).
              • The index_signature of the resulting map type will be the most general combination of map1.index_signature and map2.index_signature. (e.g., union of key types, union of value types).
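The literal-key case of (map-get map key) described above can be sketched as follows. The module `MapGetSketch` is hypothetical and works on raw (non-interned) type maps. Two things are assumptions of this sketch: `key_matches?/2` is a tiny stand-in for a real subtype check, and a key that fits neither known_elements nor the index signature is given the type of the nil literal, a case the rules above do not cover.

```elixir
defmodule MapGetSketch do
  # Hypothetical helper operating on raw (non-interned) type maps.
  @nil_type %{type_kind: :literal, value: nil}

  # Types `(map-get map key)` when `key` is a literal Elixir term.
  def type_of_get(%{type_kind: :map} = map_type, literal_key) do
    case Map.fetch(map_type.known_elements, literal_key) do
      {:ok, %{value_type: type, optional: false}} ->
        type

      {:ok, %{value_type: type, optional: true}} ->
        union([type, @nil_type])

      :error ->
        if key_matches?(literal_key, map_type.index_signature.key_type) do
          # The key may be absent at runtime, so nil is always possible.
          union([map_type.index_signature.value_type, @nil_type])
        else
          # Assumption: a key the map type cannot contain always reads as nil.
          @nil_type
        end
    end
  end

  defp union(types), do: %{type_kind: :union, types: MapSet.new(types)}

  # Deliberately small stand-in for "the key's literal type is a subtype of key_type".
  defp key_matches?(_key, %{type_kind: :primitive, name: :any}), do: true
  defp key_matches?(key, %{type_kind: :primitive, name: :atom}), do: is_atom(key)
  defp key_matches?(key, %{type_kind: :primitive, name: :string}), do: is_binary(key)
  defp key_matches?(key, %{type_kind: :primitive, name: :integer}), do: is_integer(key)
  defp key_matches?(_key, _type), do: false
end
```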
        • Known Limitations:

          • Typing map-put and map-delete with non-literal (general type) keys precisely is challenging and may result in less specific types or require advanced type system features not yet planned (e.g., negation types for keys).
          • Duplicate literal keys in a map literal: The parser/typer will likely adopt a "last one wins" semantic for the value and its type.
        • Required Future Steps & Prerequisites:

          1. Parser Support for (map K V) Annotations:
            • Modify Til.Typer.ExpressionTyper.resolve_type_specifier_node to parse S-expressions like (map <key-type-spec> <value-type-spec>) into a raw map type definition.
            • Prerequisite: Basic type specifier resolution (for K and V).
          2. Typing Map Literals:
            • In Til.Typer.infer_type_for_node_ast (for :map_expression):
              • Construct known_elements from literal keys and inferred value types.
              • Assign a default index_signature (e.g., key_type: any, value_type: any).
            • Prerequisite: Recursive typing of child nodes (values in the map literal).
          3. Interning Map Types:
            • In Til.Typer.Interner.get_or_intern_type:
              • Add logic for type_kind: :map.
              • Recursively intern types within known_elements values and index_signature key/value types to create a canonical, interned map type definition.
              • Store/retrieve these canonical map types.
            • Prerequisite: Interning for primitive and other relevant types.
          4. Subtyping Rules for Maps:
            • In Til.Typer.SubtypeChecker.is_subtype?:
              • Implement rules for map_subtype <?: map_supertype. This involves checking:
                • Compatibility of known_elements (required keys in super must be present and non-optional in sub, with compatible value types).
                • Compatibility of index_signatures (contravariant key types, covariant value types).
                • Keys in sub.known_elements not in super.known_elements must conform to super.index_signature.
            • Prerequisite: Interned map type representation.
          5. Typing Map Operations:
            • In Til.Typer.ExpressionTyper (or a new MapExpressionTyper module):
              • Define S-expression forms for (map-get map key), (map-put map key value), (map-delete map key), (map-merge map1 map2).
              • Implement type inference rules for each of these operations based on the principles outlined in "Use Case Scenarios".
            • Prerequisite: Subtyping rules, interned map types, ability to type map literals and resolve map type annotations.
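The three map subtyping checks listed in step 4 can be sketched over raw type maps as shown below. `MapSubtypeSketch` is a hypothetical module, not Til.Typer.SubtypeChecker. The primitive rules are reduced to what the example needs, and the handling of keys that are optional in the supertype is an assumption, since only required keys are specified above.

```elixir
defmodule MapSubtypeSketch do
  # Hypothetical sketch over raw (non-interned) type maps.
  def subtype?(same, same), do: true
  def subtype?(_sub, %{type_kind: :primitive, name: :any}), do: true
  def subtype?(%{type_kind: :primitive, name: :nothing}, _sup), do: true

  def subtype?(%{type_kind: :primitive, name: name}, %{type_kind: :primitive, name: :number}),
    do: name in [:integer, :float]

  def subtype?(%{type_kind: :literal, value: v}, %{type_kind: :primitive, name: name}) do
    case name do
      :integer -> is_integer(v)
      :float -> is_float(v)
      :number -> is_number(v)
      :string -> is_binary(v)
      :atom -> is_atom(v)
      :boolean -> is_boolean(v)
      _ -> false
    end
  end

  def subtype?(%{type_kind: :map} = sub, %{type_kind: :map} = sup) do
    known_ok?(sub, sup) and extra_keys_ok?(sub, sup) and
      index_signature_ok?(sub.index_signature, sup.index_signature)
  end

  def subtype?(_sub, _sup), do: false

  # Required keys of `sup` must be present and non-optional in `sub`,
  # with compatible value types. Optional keys may be missing (assumption).
  defp known_ok?(sub, sup) do
    Enum.all?(sup.known_elements, fn {key, sup_entry} ->
      case Map.fetch(sub.known_elements, key) do
        {:ok, sub_entry} ->
          (sup_entry.optional or not sub_entry.optional) and
            subtype?(sub_entry.value_type, sup_entry.value_type)

        :error ->
          sup_entry.optional
      end
    end)
  end

  # Keys known only to `sub` must conform to `sup`'s index signature.
  defp extra_keys_ok?(sub, sup) do
    sub.known_elements
    |> Enum.reject(fn {key, _entry} -> Map.has_key?(sup.known_elements, key) end)
    |> Enum.all?(fn {key, entry} ->
      subtype?(%{type_kind: :literal, value: key}, sup.index_signature.key_type) and
        subtype?(entry.value_type, sup.index_signature.value_type)
    end)
  end

  # Key types are contravariant, value types are covariant.
  defp index_signature_ok?(sub_sig, sup_sig) do
    subtype?(sup_sig.key_type, sub_sig.key_type) and
      subtype?(sub_sig.value_type, sup_sig.value_type)
  end
end
```

With these rules, a map type with known keys :name (string) and :age (integer) is a subtype of one that only requires :age (number), but not the other way around.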
          • Advanced Map Typing and Row Polymorphism Considerations:

            While the current map type system provides flexibility with known_elements and an index_signature, a future enhancement could be the introduction of row polymorphism. This would allow for more precise typing of functions that operate on maps with a common set of known fields while allowing other fields to vary.

            • Conceptualization: In Tilly's context, a row variable in a map type (e.g., m{ :name string | r }) could represent "the rest of the map fields." Given Tilly's rich map keys (literals of various types, not just simple labels) and the existing index_signature concept, the row variable r could itself be considered a placeholder for another Tilly map type. This means r could have its own known_elements and index_signature, making it more expressive than traditional record-based row polymorphism.

            • Requirements for Implementation:

              • Type Representation: Extend the map type definition to include a row variable (e.g., %{type_kind: :map, known_elements: KE, row_variable_id: <type_key_for_row_var>}). The interaction between a row variable and the index_signature would need careful definition; they might be mutually exclusive or complementary.
              • Unification for Rows: Develop a unification algorithm capable of solving constraints involving row variables (e.g., m{a: T1 | r1} = m{a: T1, b: T2 | r2} implies r1 must unify with or extend m{b: T2 | r2}).
              • Subtyping for Rows: Define subtyping rules (e.g., m{a: T1, b: T2} is a subtype of m{a: T1 | r} if r can be instantiated with m{b: T2}).
              • Generalization & Instantiation: Implement mechanisms for generalizing functions over row variables and instantiating them at call sites.
              • Syntax: Design user-facing syntax for map types with row variables (e.g., (map :key1 type1 ... | row_var_name)).

            Implementing full row polymorphism is a significant undertaking and would build upon the existing map typing foundations. It is currently not in the immediate plan but represents a valuable direction for enhancing the type system's expressiveness for structural data.

      • Refinement Types:

        • %{type_kind: :refinement, base_type: <type_map>, var_name: <atom>, predicate_expr_id: <integer_node_id>}
        • var_name: Atom used to refer to the value within the predicate.
        • predicate_expr_id: AST node ID of the predicate expression.
      • Type Variables:

        • %{type_kind: :type_variable, id: <any_unique_id>, name: <String.t | nil>}
        • id: Unique identifier.
        • name: Optional human-readable name.
      • Alias Types (Named Types):

        • %{type_kind: :alias, name: <atom_alias_name>, parameters: [<atom_param_name>], definition: <type_map>}
        • name: The atom for the alias (e.g., :positive_integer).
        • parameters: List of atoms for generic type parameter names (e.g., [:T]).
        • definition: The type map this alias expands to (may contain type variables from parameters).
      • Function types

        • (fn (arg_1_type arg_2_type) return_type)
    • Type Checking Algorithm: Develop the core logic for verifying type correctness. This will likely involve algorithms for:

      • Unification.
      • Subtyping (e.g., %{type_kind: :primitive, name: :integer} is a subtype of %{type_kind: :primitive, name: :number}).
      • Constraint solving for inference and refinement types.
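A minimal sketch of how subtyping might extend to union types, assuming the usual rules: a union is a subtype of T when every member is, and T is a subtype of a union when it is a subtype of at least one member. The module `UnionSubtypeSketch` is hypothetical and covers only the primitive facts needed for the example.

```elixir
defmodule UnionSubtypeSketch do
  # Hypothetical sketch; clause order matters (union on the left is tried first).
  def subtype?(same, same), do: true
  def subtype?(_sub, %{type_kind: :primitive, name: :any}), do: true
  def subtype?(%{type_kind: :primitive, name: :nothing}, _sup), do: true

  def subtype?(%{type_kind: :primitive, name: name}, %{type_kind: :primitive, name: :number}),
    do: name in [:integer, :float]

  # A union is a subtype of `sup` when every member is.
  def subtype?(%{type_kind: :union, types: members}, sup),
    do: Enum.all?(members, &subtype?(&1, sup))

  # `sub` is a subtype of a union when it is a subtype of at least one member.
  def subtype?(sub, %{type_kind: :union, types: members}),
    do: Enum.any?(members, &subtype?(sub, &1))

  def subtype?(_sub, _sup), do: false
end
```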
    • Type Inference Engine: Implement the mechanism to infer types of expressions and definitions.

      • Bidirectional Type Inference: To enhance type inference capabilities, reduce the need for explicit annotations, and provide more precise error messages, the type system could be evolved to use bidirectional type inference. This approach distinguishes between two modes of operation:

        • Synthesis Mode (=>): Infers or synthesizes the type of an expression from its constituent parts. For example, infer(e) yields type T.

        • Checking Mode (<=): Checks if an expression conforms to an expected type provided by its context. For example, check(e, T_expected) verifies e has type T_expected.

        • Benefits:

          • More precise type error reporting (e.g., "expected type X, got type Y in context Z").
          • Reduced annotation burden, as types can flow top-down into expressions.
          • Better handling of polymorphic functions and complex type constructs.
        • Implementation Requirements:

          • Explicit Modes: The core typing algorithm (currently in Til.Typer) would need to be refactored to explicitly support and switch between synthesis and checking modes.
          • Top-Down Type Flow: The expected_type must be propagated downwards during AST traversal in checking mode.
          • Dual Typing Rules: Each language construct (literals, variables, function calls, conditionals, lambdas, etc.) would require distinct typing rules for both synthesis and checking. For instance:
            • A lambda expression, when checked against an expected function type (TA -> TR), can use TA for its parameter types and then check its body against TR. In synthesis mode, parameter annotations might be required.
            • A function application (f arg) would typically synthesize f's type, then check arg against the expected parameter type, and the function's return type becomes the synthesized type of the application.
          • Integration with Polymorphism: Rules for instantiating polymorphic types (when checking) and generalizing types (e.g., for let-bound expressions in synthesis) are crucial.

        Adopting bidirectional type inference would be a significant architectural evolution of the Til.Typer module, moving beyond the current primarily bottom-up synthesis approach.
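To make the two modes concrete, here is a small sketch over a toy expression language. Everything in it is an assumption for illustration: the module `BidirSketch`, the tuple-based expression forms (not Tilly's node maps), and the use of plain equality where a real checker would use subtyping.

```elixir
defmodule BidirSketch do
  # Toy expression forms (hypothetical):
  #   {:int, 42} | {:var, :x} | {:lambda, [:x], body} | {:app, fun, [arg, ...]}
  @integer %{type_kind: :primitive, name: :integer}

  # Synthesis mode: compute a type from the expression itself.
  def synth({:int, _n}, _env), do: {:ok, @integer}

  def synth({:var, name}, env) do
    case Map.fetch(env, name) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, {:unbound, name}}
    end
  end

  def synth({:app, fun, args}, env) do
    case synth(fun, env) do
      {:ok, %{type_kind: :function, arg_types: arg_types, return_type: ret}}
      when length(arg_types) == length(args) ->
        # Arguments are checked against the parameter types (checking mode).
        with :ok <- check_all(args, arg_types, env), do: {:ok, ret}

      {:ok, other} ->
        {:error, {:not_applicable, other}}

      {:error, _reason} = error ->
        error
    end
  end

  # Without an expected type, an unannotated lambda cannot be synthesized here.
  def synth({:lambda, _params, _body}, _env), do: {:error, :annotation_required}

  # Checking mode: a lambda takes its parameter types from the expected type.
  def check({:lambda, params, body}, %{type_kind: :function, arg_types: arg_types, return_type: ret}, env)
      when length(params) == length(arg_types) do
    body_env = Map.merge(env, Map.new(Enum.zip(params, arg_types)))
    check(body, ret, body_env)
  end

  # Fallback: synthesize, then compare (equality stands in for subtyping).
  def check(expr, expected, env) do
    case synth(expr, env) do
      {:ok, ^expected} -> :ok
      {:ok, actual} -> {:error, {:expected, expected, :got, actual}}
      {:error, _reason} = error -> error
    end
  end

  defp check_all(args, types, env) do
    Enum.zip(args, types)
    |> Enum.reduce_while(:ok, fn {arg, type}, :ok ->
      case check(arg, type, env) do
        :ok -> {:cont, :ok}
        error -> {:halt, error}
      end
    end)
  end
end
```

The same lambda that fails in synthesis mode for lack of annotations is accepted in checking mode once an expected function type supplies its parameter types.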

    • Environment Management: Handle scopes and bindings of names to types (type maps).

    • Function Types, defn, and fn Implementation Plan: This section outlines the plan for introducing function types, user-defined functions (defn), and lambdas (fn) into Tilly.

      1. Type Representation for Functions:

      • Structure: %{type_kind: :function, arg_types: [<type_map_key>], return_type: <type_map_key>, type_params: [<type_variable_key>] | nil}
        • arg_types: An ordered list of type keys for each argument.
        • return_type: A type key for the return value.
        • type_params: (Optional, for polymorphic functions) An ordered list of type keys for universally quantified type variables (e.g., for ~a, ~b). Initially nil for monomorphic functions.
      • Note: Variadic functions (rest_arg_type) are not planned, aligning with Elixir's fixed arity.

      2. Parser Modifications (Til.Parser):

      • defn (User-Defined Function):
        • Syntax: (defn name (arg_spec1 arg_spec2 ... return_type_spec) 'optional_docstring' body_forms...)
          • The return_type_spec is the last element in the parameter S-expression.
          • 'optional_docstring' is a string literal between the parameter S-expression and the body.
        • AST Node (:defn_expression):
          • name_node_id: ID of the function name symbol.
          • params_and_return_s_expr_id: ID of the S-expression node (arg_spec1 ... return_type_spec).
          • arg_spec_node_ids: List of IDs of argument specifier nodes (derived from children of params_and_return_s_expr_id, excluding the last).
          • return_type_spec_node_id: ID of the return type specifier node (last child of params_and_return_s_expr_id).
          • docstring_node_id: Optional ID of the docstring node.
          • body_node_ids: List of IDs for body expressions.
      • fn (Lambda):
        • Syntax: (fn (arg_spec1 ...) body_forms...)
        • AST Node (:lambda_expression):
          • params_s_expr_id: ID of the S-expression node (arg_spec1 ...).
          • arg_spec_node_ids: List of IDs of argument specifier nodes.
          • body_node_ids: List of IDs for body expressions.
      • Argument Specifications (arg_spec):
        • Initially, arg_spec nodes will represent simple symbols (for lambda arguments) or (symbol type_spec) (for defn arguments).
        • More complex patterns (m{:key val}, (= symbol pattern)) and type variables (~a) will be introduced in later phases.
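One way the :defn_expression fields listed above could be derived from an already parsed S-expression is sketched below. `DefnShapeSketch.defn_fields/2` is a hypothetical helper that takes the child ids of the (defn ...) form and a map of id => node. Treating a string literal as a docstring only when at least one body form follows it is an assumption of this sketch.

```elixir
defmodule DefnShapeSketch do
  # Hypothetical helper: `children` are the child ids of a `(defn ...)` S-expression,
  # `nodes` is a map of id => node map.
  def defn_fields([_defn_id, name_id, params_id | rest], nodes) do
    param_children = nodes[params_id].children
    # The last child of the parameter S-expression is the return type specifier.
    {arg_specs, [return_spec]} = Enum.split(param_children, -1)

    {docstring_id, body_ids} =
      case rest do
        [first | [_ | _] = body] ->
          if nodes[first].ast_node_type == :literal_string,
            do: {first, body},
            else: {nil, rest}

        _ ->
          {nil, rest}
      end

    %{
      name_node_id: name_id,
      params_and_return_s_expr_id: params_id,
      arg_spec_node_ids: arg_specs,
      return_type_spec_node_id: return_spec,
      docstring_node_id: docstring_id,
      body_node_ids: body_ids
    }
  end
end
```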

      3. Phased Implementation Plan:

      • Phase 1: Core Function Type Representation & Interning. (Completed)
        • Defined the %{type_kind: :function, ...} structure.
        • Implemented interning logic for this type in Til.Typer.Interner.
      • Phase 2: Basic Lambdas (fn). (In Progress)
        • Parser: Implement parsing for (fn (arg_name1 ...) body_forms...). Argument specs are simple symbols.
        • Typer (infer_type_for_node_ast): For :lambda_expression, argument types default to any. Infer return type from the last body expression. Construct raw function type for interning.
        • Lambda Argument Typing Strategy: Defaulting to any initially. Later, with bidirectional type inference, argument types will be inferred more precisely from usage within the body (potentially using type intersection for multiple constraints) or from the context in which the lambda is used (checking mode).
      • Phase 3: Basic Monomorphic Function Calls.
        • Typer (ExpressionTyper.infer_s_expression_type): Handle S-expressions where the operator's type is a function type. Perform arity checks and subtype checks for arguments. The S-expression's type is the function's return type.
      • Phase 4: Monomorphic defn.
        • Parser: Implement parsing for (defn name ((arg_name1 type_spec1) ... return_type_spec) 'optional_docstring' body_forms...). Require explicit (symbol type_spec) for arguments.
        • Typer (infer_type_for_node_ast and Environment): For :defn_expression, resolve explicit types, construct/intern the function type, update environment for recursion, type body in new lexical scope, and validate return type. The :defn_expression node's type is its interned function type. Til.Typer.Environment.update_env_from_node will add the function to the environment.
      • Phase 5: Introduce Polymorphism (Type Variables, ~a).
        • Update type representations, parser, interner, and typer for type variables and polymorphic function types. Implement unification for function calls.
      • Phase 6: Advanced Argument/Inference Features.
        • Allow defn arguments as symbol (type to be inferred).
        • More sophisticated type inference for lambda arguments.
        • (Later) Pattern matching in arguments, typing for where guards.
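Phases 2 and 3 can be illustrated with a short sketch: building the raw function type of a lambda whose arguments default to any, and typing a call by checking arity and argument types. The module `FunctionPhasesSketch`, the `subtype_fun` parameter, and the choice of the nil literal type for an empty body are assumptions of this example, not the project's API.

```elixir
defmodule FunctionPhasesSketch do
  @any %{type_kind: :primitive, name: :any}
  @nil_type %{type_kind: :literal, value: nil}

  # Phase 2: argument types default to `any`; the return type is the type of the
  # last body expression. `body_types` are the already inferred body types, in order.
  def lambda_type(arg_names, body_types) do
    %{
      type_kind: :function,
      arg_types: Enum.map(arg_names, fn _name -> @any end),
      return_type: List.last(body_types, @nil_type),
      type_params: nil
    }
  end

  # Phase 3: check arity, then every argument, and return the function's return type.
  # `subtype_fun` is passed in so the sketch does not depend on a real checker.
  def call_type(%{type_kind: :function} = fun, arg_types, subtype_fun) do
    cond do
      length(fun.arg_types) != length(arg_types) ->
        {:error, {:arity_mismatch, length(fun.arg_types), length(arg_types)}}

      not Enum.all?(Enum.zip(arg_types, fun.arg_types), fn {arg, param} ->
        subtype_fun.(arg, param)
      end) ->
        {:error, :argument_type_mismatch}

      true ->
        {:ok, fun.return_type}
    end
  end
end
```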
  3. Compiler Backend (Elixir):

    • AST Transformation: Transform the Lisp AST (potentially type-annotated) into an Elixir-compatible AST or directly to Elixir code.
    • Mapping Lisp Constructs: Define how Lisp functions, data structures, control flow, and type information translate to Elixir equivalents.
    • Code Generation: Produce Elixir source files.
    • Interop: Consider how the Lisp code will call Elixir code and vice-versa.
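As one possible shape for the AST transformation, the sketch below lowers a few node kinds to Elixir quoted expressions and renders them with Macro.to_string/1. The mapping itself is an assumption: this document does not yet define how Lisp constructs translate, and `CodegenSketch` treats every S-expression as a local function call named by its first symbol. Symbol names that are not valid Elixir identifiers (such as my-symbol) would need a renaming step that is not shown.

```elixir
defmodule CodegenSketch do
  # Hypothetical lowering from node maps (id => node) to Elixir quoted expressions.
  def to_quoted(id, nodes) do
    case nodes[id] do
      %{ast_node_type: :literal_integer, value: value} ->
        value

      %{ast_node_type: :literal_string, value: value} ->
        value

      %{ast_node_type: :symbol, name: name} ->
        {String.to_atom(name), [], nil}

      %{ast_node_type: :list_expression, children: ids} ->
        Enum.map(ids, &to_quoted(&1, nodes))

      %{ast_node_type: :tuple_expression, children: ids} ->
        {:{}, [], Enum.map(ids, &to_quoted(&1, nodes))}

      %{ast_node_type: :s_expression, children: [head | args]} ->
        # Raises a MatchError if the head is not a symbol; fine for a sketch.
        %{ast_node_type: :symbol, name: name} = nodes[head]
        {String.to_atom(name), [], Enum.map(args, &to_quoted(&1, nodes))}
    end
  end

  # Renders the quoted expression as Elixir source text.
  def to_source(id, nodes), do: id |> to_quoted(nodes) |> Macro.to_string()
end
```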
  4. Standard Library:

    • Define and implement a basic set of core functions and their types (e.g., list operations, arithmetic, type predicates).
  5. Error Reporting Infrastructure:

    • Design a system for collecting and presenting type errors, compiler errors, and runtime errors (if applicable during compilation phases).
  6. Testing Framework:

    • Develop a comprehensive suite of tests covering:
      • Parser correctness.
      • Type checker correctness (valid and invalid programs).
      • Compiler output (comparing generated Elixir against expected output or behavior).
  7. CLI / Build Tool Integration (Future):

    • A command-line interface for the compiler.
    • Potential integration with build tools like Mix.

Main Data Structure: Node Maps

The core data structure for representing code throughout the parsing, type checking, and transpiling phases will be a collection of "Node Maps." Each syntactic element or significant semantic component of the source code will be represented as an Elixir map.

Structure of a Node Map:

Each node map will contain a set of common fields and a set of fields specific to the kind of AST element it represents.

  • Common Fields (present in all node maps, based on lib/til/parser.ex):

    • id: A unique integer (generated by System.unique_integer([:monotonic, :positive])) for this node.
    • type_id: Initially nil. After type checking/inference, this field will store or reference the type map (as defined in "Type Representation") associated with this AST node.
    • parent_id: The id of the parent node in the AST, or nil if it's a root node or an orphaned element (e.g. an element of an unclosed collection).
    • file: A string indicating the source file name (defaults to "unknown").
    • location: A list: [start_offset, start_line, start_col, end_offset, end_line, end_col].
    • raw_string: The literal string segment from the source code that corresponds to this node.
    • ast_node_type: An atom identifying the kind of AST node.
    • parsing_error: nil if parsing was successful for this node, or a string message if an error occurred specific to this node (e.g., "Unclosed string literal"). For collection nodes, this can indicate issues like being unclosed.
  • AST-Specific Fields & Node Types (current implementation in lib/til/parser.ex):

    • ast_node_type: :literal_integer

      • value: The integer value (e.g., 42).
    • ast_node_type: :symbol

      • name: The string representation of the symbol (e.g., "my-symbol").
    • ast_node_type: :literal_string

      • value: The processed string content (escape sequences are not yet handled, but leading whitespace on subsequent lines is stripped based on the opening quote's column).
      • parsing_error: Can be "Unclosed string literal".
    • ast_node_type: :s_expression

      • children: A list of ids of the child nodes within the S-expression.
      • parsing_error: Can be "Unclosed S-expression".
    • ast_node_type: :list_expression (parsed from [...])

      • children: A list of ids of the child nodes within the list.
      • parsing_error: Can be "Unclosed list".
    • ast_node_type: :map_expression (parsed from m{...})

      • children: A list of ids of the child nodes (key-value pairs) within the map.
      • parsing_error: Can be "Unclosed map".
    • ast_node_type: :tuple_expression (parsed from {...})

      • children: A list of ids of the child nodes within the tuple.
      • parsing_error: Can be "Unclosed tuple".
    • ast_node_type: :unknown (used for tokens that couldn't be parsed into a more specific type, or for unexpected characters)

      • parsing_error: A string describing the error (e.g., "Unexpected ')'", "Unknown token").
    • ast_node_type: :file

      • children: A list of ids of the top-level expression nodes in the file, in order of appearance.
      • raw_string: The entire content of the parsed file.
      • parsing_error: Typically nil for the file node itself; errors are recorded on child nodes or arise while parsing specific structures.
    • Note on children field: For collection types (:s_expression, :list_expression, :map_expression, :tuple_expression, :file), this field holds a list of child node ids in the order they appear in the source.

    • Pseudo-code example of a parsed integer node:

      %{
        id: 1,
        type_id: nil,
        parent_id: nil, # Assuming it's a top-level expression
        file: "input.til",
        location: [0, 1, 1, 2, 1, 3], # [offset_start, line_start, col_start, offset_end, line_end, col_end]
        raw_string: "42",
        ast_node_type: :literal_integer,
        value: 42,
        parsing_error: nil
      }
      
    • Pseudo-code example of a parsed S-expression node:

      %{
        id: 2,
        type_id: nil,
        parent_id: nil,
        file: "input.til",
        location: [4, 1, 5, 13, 1, 14], # Location spans the entire 9-character "(add 1 2)"
        raw_string: "(add 1 2)",
        ast_node_type: :s_expression,
        children: [3, 4, 5], # IDs of :symbol "add", :literal_integer 1, :literal_integer 2
        parsing_error: nil
      }
      
    • Pseudo-code example of an unclosed string node:

      %{
        id: 6,
        type_id: nil,
        parent_id: nil,
        file: "input.til",
        location: [17, 2, 1, 26, 2, 10], # Spans from opening ' to end of consumed input for the error
        raw_string: "'unclosed", 
        ast_node_type: :literal_string,
        value: "unclosed", # The content parsed so far
        parsing_error: "Unclosed string literal"
      }
      

Intended Use:

This collection of interconnected node maps forms a graph (specifically, a tree for the basic AST structure, with additional edges for type references, variable bindings, etc.).

  1. Parsing: The parser will transform the source code into this collection of node maps.
  2. Type Checking/Inference: The type system will operate on these node maps. Type information (type_id) will be populated or updated. Constraints for type inference can be associated with node ids. The immutability of Elixir maps means that updating a node's type information creates a new version of that node map, facilitating the tracking of changes during constraint resolution.
  3. Transpiling: The transpiler will traverse this graph of node maps (potentially enriched with type information) to generate the target Elixir code.

A central registry or context (e.g., a map of id => node_map_data) might be used to store and access all node maps, allowing for efficient lookup and modification (creation of new versions) of individual nodes during various compiler phases.
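A minimal sketch of such a registry, assuming a plain Elixir map keyed by node id. The module `NodeRegistrySketch` and its function names are hypothetical, and the default location is only a placeholder. The point is that setting a node's type_id returns a new registry while the previous version stays intact.

```elixir
defmodule NodeRegistrySketch do
  # Builds a node map with the common fields; `extra` carries AST-specific fields.
  def new_node(ast_node_type, raw_string, extra \\ %{}) do
    Map.merge(
      %{
        id: System.unique_integer([:monotonic, :positive]),
        type_id: nil,
        parent_id: nil,
        file: "unknown",
        # Placeholder; a real parser fills in actual offsets, lines, and columns.
        location: [0, 1, 1, 0, 1, 1],
        raw_string: raw_string,
        ast_node_type: ast_node_type,
        parsing_error: nil
      },
      extra
    )
  end

  # Adds a node to the registry (a map of id => node map).
  def put(nodes, node), do: Map.put(nodes, node.id, node)

  # Returns a new registry in which the node carries the given type id.
  def set_type(nodes, id, type_id) do
    Map.update!(nodes, id, &Map.put(&1, :type_id, type_id))
  end

  # Visits a node and its descendants in source order, returning their ids.
  def preorder(nodes, id) do
    children = Map.get(nodes[id], :children, [])
    [id | Enum.flat_map(children, &preorder(nodes, &1))]
  end
end
```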