elipl/set_types.md
Kacper Marzecki 748f87636a checkpoint
checkpoint

failing test

after fixing tests

checkpoint

checkpoint

checkpoint

re-work

asd

checkpoint

checkpoint

checkpoint

mix proj

checkpoint mix

first parser impl

checkpoint

fix tests

re-org parser

checkpoint strings

fix multiline strings

tuples

checkpoint maps

checkpoint

checkpoint

checkpoint

checkpoint

fix weird eof expression parse error

checkpoint before typing

checkpoint

checpoint

checkpoint

checkpoint

checkpoint ids in primitive types

checkpoint

checkpoint

fix tests

initial annotation

checkpoint

checkpoint

checkpoint

union subtyping

conventions

refactor - split typer

typing tuples

checkpoint test refactor

checkpoint test refactor

parsing atoms

checkpoint atoms

wip lists

checkpoint typing lists

checkopint

checkpoint

wip fixing

correct list typing

map discussion

checkpoint map basic typing

fix tests checkpoint

checkpoint

checkpoint

checkpoint

fix condition typing

fix literal keys in map types

checkpoint union types

checkpoint union type

checkpoint row types discussion & bidirectional typecheck

checkpoint

basic lambdas

checkpoint lambdas typing application

wip function application

checkpoint

checkpoint

checkpoint cduce

checkpoint

checkpoint

checkpoint

checkpoint

checkpoint

checkpoint

checkpoint
2025-06-13 23:48:07 +02:00

16 KiB

Tilly Set-Theoretic Type System: Design and CDuce Reference

Objective:

Implement a CDuce-style set-theoretic type system in Elixir for the Tilly compiler. This document outlines the core principles adapted from CDuce and the planned Elixir implementation strategy.

Okay, let's lay out a revised and extensive implementation plan for adapting Tilly to a set-theoretic type system.

Overarching Concern:

  • Bare Maps for Extensibility: We will prioritize using bare Elixir maps for representations like Type Descriptors (Descr) instead of structs. This approach maintains flexibility for future extensions, allowing new fields to be added without altering struct definitions. BDD node structures themselves will be represented by simple Elixir terms (atoms or tuples).

I. Foundational BDD and Type Descriptor Infrastructure (New Modules)

This phase focuses on building the core components for the set-theoretic type system. These will primarily be new modules. The typing_ctx map will be a central piece of data, passed through most functions in this layer, containing the necessary stores and configurations.

  1. Tilly.BDD.Node (Conceptual Definition & Basic Helpers):

    • Node Structures: We define the canonical Elixir terms representing BDD nodes:
      • true: The atom representing the universal set BDD.
      • false: The atom representing the empty set BDD.
      • {:leaf, leaf_value_id}: Represents a leaf node. leaf_value_id can be any term, its interpretation depends on the ops_module (e.g., a boolean for simple sets, or another BDD node ID for nested structures).
      • {:split, element_id, positive_child_id, ignore_child_id, negative_child_id}: Represents an internal decision node.
        • element_id: The value being split upon.
        • positive_child_id, ignore_child_id, negative_child_id: IDs of other BDD nodes in the store.
    • Smart Constructors (Low-Level): This module might contain very basic helper functions for creating these structural tuples if needed, but the primary "smart" construction (with simplification and interning) will occur in Tilly.BDD.Ops and use Tilly.BDD.get_or_intern_node.
    • Purpose: To establish the fundamental building blocks of BDDs.
  2. Tilly.BDD (BDD Store Management):

    • BDD Store (typing_ctx.bdd_store): This will be a map within the typing_ctx responsible for hash-consing BDD nodes. It will contain:
      • nodes_by_structure: A map where keys are {logical_structure_tuple, ops_module_atom} and values are unique_node_id (integer). The ops_module_atom is crucial for canonicalization, ensuring structurally identical nodes intended for different operational semantics are distinct.
      • structures_by_id: A map where keys are unique_node_id and values are %{structure: logical_structure_tuple, ops_module: ops_module_atom}. This allows retrieval of both the node's structure and its associated operations module.
      • next_node_id: An integer for generating new unique node IDs.
    • Key Function: get_or_intern_node(typing_ctx, logical_structure_tuple, ops_module_atom):
      • Takes the current typing_ctx, the node's structural representation (e.g., {:split, ...}), and the atom identifying its operations module (e.g., :integer_bool_bdd_ops).
      • Checks typing_ctx.bdd_store.nodes_by_structure for an existing node.
      • If not found, creates a new unique_node_id, updates both nodes_by_structure and structures_by_id, and increments next_node_id.
      • Returns {new_typing_ctx, node_id}.
    • Helper Functions:
      • get_node_data(typing_ctx, node_id): Returns %{structure: ..., ops_module: ...}.
      • is_false_node?(node_id, typing_ctx), is_true_node?(node_id, typing_ctx).
      • Functions to extract parts of a node structure if needed (e.g., get_element_from_split_node_id(typing_ctx, node_id)).
  3. BDD Operations Modules (e.g., Tilly.BDD.IntegerBoolOps, Tilly.BDD.AtomSetOps, Tilly.BDD.RecursivePairOps):

    • Purpose: These modules provide the specific logic for how BDD elements are compared/hashed and how leaf values are combined/negated for a particular kind of BDD. They are the "plugins" for the generic BDD algorithms.
    • Interface Convention (Informal or via Elixir behaviour): Each module will implement a consistent set of functions:
      • compare_elements(elem1, elem2): Returns :lt, :eq, or :gt.
      • equal_element?(elem1, elem2): Returns boolean.
      • hash_element(elem): Returns an integer.
      • empty_leaf(): Returns the leaf value for emptiness.
      • any_leaf(): Returns the leaf value for universality.
      • is_empty_leaf?(leaf_val): Returns boolean.
      • union_leaves(leaf1, leaf2, typing_ctx): Combines leaves for union. May need typing_ctx for recursive BDD operations if leaves are BDD IDs.
      • intersection_leaves(leaf1, leaf2, typing_ctx): Combines for intersection.
      • negation_leaf(leaf, typing_ctx): Negates a leaf.
    • Example (Tilly.BDD.IntegerBoolOps):
      • compare_elements/2: Uses standard integer comparison.
      • empty_leaf/0: Returns false.
      • union_leaves/3: fn l1, l2, _ctx -> l1 or l2 end.
  4. Tilly.BDD.Ops (Generic BDD Algorithms):

    • Core Operations: Implements union_bdds(node1_id, node2_id, typing_ctx), intersection_bdds/3, negation_bdd/2, difference_bdd/3.
      • These functions take BDD node IDs and the typing_ctx.
      • They fetch node data (structure and ops_module) using Tilly.BDD.get_node_data/2.
      • They use apply(ops_module, :function_name, args) to invoke element/leaf operations from the appropriate ops_module.
      • They return a node_id for the resulting BDD.
    • Smart Constructors (Algebraic): Implements functions like leaf(leaf_value, ops_module, typing_ctx) and split(element, p_id, i_id, n_id, ops_module, typing_ctx).
      • These are the CDuce-style smart constructors that apply simplification rules (e.g., if p_id == n_id in split, simplify to p_id ++ i_id).
      • After simplification, they call Tilly.BDD.get_or_intern_node/3 to get/create the canonical node ID.
      • Example split/6:
        def split(x, p_id, i_id, n_id, ops_module, ctx) do
          # Apply simplification rules based on CDuce bdd.ml, e.g.:
          # if i_id represents True for ops_module, return True node_id.
          # if p_id == n_id, result is union_bdds(p_id, i_id, ctx_with_ops_module).
          # ... other rules ...
          # If no simplification, create logical_structure:
          logical_structure = {:split, x, p_id, i_id, n_id}
          Tilly.BDD.get_or_intern_node(ctx, logical_structure, ops_module)
        end
        
  5. Tilly.Type (Type Descriptor Definition):

    • Descr Map Structure: Defines the structure of a Type Descriptor map. This map holds IDs of canonical BDDs for each basic kind of type. Example:
      %{
        atoms_bdd_id: node_id,    # BDD for specific atoms (elements: atoms, leaves: booleans)
        integers_bdd_id: node_id, # BDD for specific integers (elements: integers, leaves: booleans)
        strings_bdd_id: node_id,  # BDD for specific strings
        tuples_bdd_id: node_id,    # BDD for pairs/tuples. Elements might be type vars or first component types; leaves might be BDD IDs for rest of tuple.
        maps_bdd_id: node_id,  # BDD for maps. Elements might be labels or key types; leaves might be BDD IDs for value types or record structures.
        functions_bdd_id: node_id,# BDD for functions.
        # ... other kinds as needed, e.g., for abstract types ...
        absent_marker_bdd_id: node_id # BDD representing if 'absent' or 'nil' is part of the type
      }
      
    • Helper Functions: May include functions like empty_descr(typing_ctx) which creates a Descr map where all *_bdd_id fields point to the false BDD node ID, and any_descr(typing_ctx) where they point to true BDDs (configured with appropriate ops_module for each field).
  6. Tilly.Type.Store (Type Descriptor Interning):

    • Type Store (typing_ctx.type_store): A map within typing_ctx for interning Descr maps.
      • descrs_by_structure: Maps a canonical Descr map (where all BDD IDs are already canonical) to a unique_descr_id (integer).
      • structures_by_id: Maps unique_descr_id back to the canonical Descr map.
      • next_descr_id: For generating new unique Descr IDs.
    • Key Function: get_or_intern_descr(typing_ctx, descr_map):
      • Ensures all *_bdd_id values in descr_map are canonical.
      • Looks up descr_map in descrs_by_structure.
      • If not found, creates a new ID, updates stores, and increments next_descr_id.
      • Returns {new_typing_ctx, descr_id}.
  7. Tilly.Type.Ops (Operations on Type Descriptors):

    • Set Operations: Implements union_types(typing_ctx, descr1_id, descr2_id), intersection_types/3, negation_type/2.
      • These operate on Descr IDs.
      • Example union_types/3:
        1. Fetch descr_map1 and descr_map2 from typing_ctx.type_store using their IDs.
        2. For each corresponding field (e.g., atoms_bdd_id):
          • Call Tilly.BDD.Ops.union_bdds(descr_map1.atoms_bdd_id, descr_map2.atoms_bdd_id, typing_ctx). The typing_ctx is passed along; the ops_module for these BDDs is retrieved from the bdd_store by union_bdds.
        3. Construct a result_descr_map using the new BDD IDs for each field.
        4. Intern this result_descr_map using Tilly.Type.Store.get_or_intern_descr(typing_ctx, result_descr_map) to get the final result_descr_id.
        5. Return {new_typing_ctx, result_descr_id}.
    • Construction Helpers: Functions to create specific types, e.g., create_integer_type(typing_ctx, integer_value) or create_atom_type(typing_ctx, atom_value). These would construct the appropriate Descr map (with a BDD representing only that literal) and intern it.

II. Adapting Existing Typer Modules

This phase involves refactoring existing modules to use the new BDD-based type infrastructure. The typing_ctx (initialized with empty stores) will be threaded through all relevant type-checking functions.

  1. Til.Typer.Types (Major Refactor):

    • Functions like get_primitive_type(:integer) will now use Tilly.Type.Ops or dedicated constructors (e.g., Tilly.Type.integer_singleton(typing_ctx, value) or Tilly.Type.integer_any(typing_ctx)) to get/create an interned Descr ID.
      • For example, get_primitive_type(:integer) would return the Descr ID for the type "all integers". This Descr would have its integers_bdd_id pointing to a true BDD (interned with an ops_module like :integer_bool_bdd_ops), and other fields like atoms_bdd_id pointing to false BDDs.
    • The concept of predefined type maps (as Elixir maps with :type_kind) is replaced by predefined canonical Descr IDs, likely initialized once and stored or retrieved via helper functions.
  2. Til.Typer.Interner (Major Refactor/Integration):

    • Its current role of interning Elixir map-based type representations is entirely replaced by Tilly.BDD.get_or_intern_node and Tilly.Type.Store.get_or_intern_descr.
    • populate_known_types will be responsible for initializing the typing_ctx by creating and interning the Descr IDs for fundamental base types (e.g., any, nothing, the general integer type, atom type). This involves creating the necessary BDDs (often just true or false BDDs for the respective fields in the Descr) with their appropriate ops_modules.
  3. Til.Typer and Til.Typer.ExpressionTyper (Significant Refactor):

    • typing_ctx Threading: All type inference and checking functions will now take typing_ctx as an argument and return an updated typing_ctx along with their results.
    • infer_type_for_node_ast (or similar functions):
      • For literals (e.g., 123, :foo): Will call functions like Tilly.Type.Ops.create_integer_literal_type(typing_ctx, 123) which constructs a Descr map where integers_bdd_id points to a BDD representing only 123 (and other fields are empty BDDs), interns it, and returns the {new_ctx, descr_id}.
      • For operations that combine types (e.g., an if expression's branches have types descr_id_A and descr_id_B): Will use Tilly.Type.Ops.union_types(typing_ctx, descr_id_A, descr_id_B) to get the {new_ctx, result_descr_id}.
    • resolve_type_specifier_node:
      • For basic type names (Integer, Atom): Will fetch their canonical Descr IDs (e.g., from helper functions in Til.Typer.Types that retrieve pre-interned IDs).
      • For type expressions like (or TypeA TypeB): Will recursively call itself to get descr_id_A and descr_id_B, then use Tilly.Type.Ops.union_types(typing_ctx, descr_id_A, descr_id_B). Similarly for (and ...) and (not ...).
    • The type_id field in AST nodes will now store the Descr ID (an integer) obtained from Tilly.Type.Store.
  4. Til.Typer.SubtypeChecker (Complete Rewrite):

    • is_subtype?(typing_ctx, descr_a_id, descr_b_id):
      1. {typing_ctx, descr_not_b_id} = Tilly.Type.Ops.negation_type(typing_ctx, descr_b_id).
      2. {typing_ctx, intersection_id} = Tilly.Type.Ops.intersection_types(typing_ctx, descr_a_id, descr_not_b_id).
      3. Tilly.Type.Ops.is_empty_type?(typing_ctx, intersection_id).
    • Tilly.Type.Ops.is_empty_type?(typing_ctx, descr_id):
      1. Fetches descr_map from typing_ctx.type_store.
      2. Checks if all *_bdd_id fields in descr_map point to the canonical false BDD node ID (by calling Tilly.BDD.is_false_node?/2 for each).

III. Handling Specific Type Kinds from project.md

The existing type kinds in project.md will be mapped to specific configurations of Descr maps and their constituent BDDs.

  • Primitive Types: %{type_kind: :primitive, name: :integer} becomes a canonical Descr ID. For "all integers", its integers_bdd_id points to a true BDD (interned with ops_module: :integer_bool_bdd_ops), and other fields (like atoms_bdd_id) point to false BDDs.
  • Literal Types: %{type_kind: :literal, value: 42} becomes a Descr ID. Its integers_bdd_id points to a BDD representing only the value 42 (e.g., a single Split node leading to a true leaf, or a more optimized representation if available), and other fields are false BDDs.
  • Union/Intersection/Negation Types: These are no longer explicit type_kinds in the Descr map itself. They are the result of applying Tilly.Type.Ops.union_types, intersection_types, or negation_type to other Descr IDs. The resulting Descr ID represents the union/intersection/negation.
  • Function, List, Tuple, Map Types:
    • These will be represented by Descr maps where the corresponding BDD field is non-empty and structured appropriately. For example, a list of integers would have a non-empty pairs_bdd_id.
    • The BDDs for these types will be more complex:
      • Lists/Tuples (pairs_bdd_id): The BDD might split on the type of the first element, with leaves being BDD IDs representing the type of the rest of the list/tuple. The ops_module for this BDD would handle element types and BDD ID leaves (e.g., :recursive_pair_ops).
      • Maps/Records (records_bdd_id): This BDD would split on record labels or key types. Leaves could be BDD IDs for value types or more complex record structures. This is a significant area requiring careful design inspired by CDuce's record BDDs. The project.md details on known_elements and index_signature will need to be encoded into the BDD structure and its ops_module.
      • Functions (functions_bdd_id): BDDs might split on argument types or arity, with leaves representing return types (possibly as BDD IDs).
    • This is an advanced stage. Initially, constructing these might involve specific helper functions in Tilly.Type.Ops (e.g., create_list_type(typing_ctx, element_descr_id)).