# Tilly Set-Theoretic Type System: Design and CDuce Reference ## Objective: Implement a CDuce-style set-theoretic type system in Elixir for the Tilly compiler. This document outlines the core principles adapted from CDuce and the planned Elixir implementation strategy. Okay, let's lay out a revised and extensive implementation plan for adapting Tilly to a set-theoretic type system. **Overarching Concern:** * **Bare Maps for Extensibility:** We will prioritize using bare Elixir maps for representations like Type Descriptors (`Descr`) instead of structs. This approach maintains flexibility for future extensions, allowing new fields to be added without altering struct definitions. BDD node structures themselves will be represented by simple Elixir terms (atoms or tuples). **I. Foundational BDD and Type Descriptor Infrastructure (New Modules)** This phase focuses on building the core components for the set-theoretic type system. These will primarily be new modules. The `typing_ctx` map will be a central piece of data, passed through most functions in this layer, containing the necessary stores and configurations. 1. **`Tilly.BDD.Node` (Conceptual Definition & Basic Helpers):** * **Node Structures:** We define the canonical Elixir terms representing BDD nodes: * `true`: The atom representing the universal set BDD. * `false`: The atom representing the empty set BDD. * `{:leaf, leaf_value_id}`: Represents a leaf node. `leaf_value_id` can be any term, its interpretation depends on the `ops_module` (e.g., a boolean for simple sets, or another BDD node ID for nested structures). * `{:split, element_id, positive_child_id, ignore_child_id, negative_child_id}`: Represents an internal decision node. * `element_id`: The value being split upon. * `positive_child_id`, `ignore_child_id`, `negative_child_id`: IDs of other BDD nodes in the store. * **Smart Constructors (Low-Level):** This module might contain very basic helper functions for creating these structural tuples if needed, but the primary "smart" construction (with simplification and interning) will occur in `Tilly.BDD.Ops` and use `Tilly.BDD.get_or_intern_node`. * **Purpose:** To establish the fundamental building blocks of BDDs. 2. **`Tilly.BDD` (BDD Store Management):** * **BDD Store (`typing_ctx.bdd_store`):** This will be a map within the `typing_ctx` responsible for hash-consing BDD nodes. It will contain: * `nodes_by_structure`: A map where keys are `{logical_structure_tuple, ops_module_atom}` and values are `unique_node_id` (integer). The `ops_module_atom` is crucial for canonicalization, ensuring structurally identical nodes intended for different operational semantics are distinct. * `structures_by_id`: A map where keys are `unique_node_id` and values are `%{structure: logical_structure_tuple, ops_module: ops_module_atom}`. This allows retrieval of both the node's structure and its associated operations module. * `next_node_id`: An integer for generating new unique node IDs. * **Key Function: `get_or_intern_node(typing_ctx, logical_structure_tuple, ops_module_atom)`:** * Takes the current `typing_ctx`, the node's structural representation (e.g., `{:split, ...}`), and the atom identifying its operations module (e.g., `:integer_bool_bdd_ops`). * Checks `typing_ctx.bdd_store.nodes_by_structure` for an existing node. * If not found, creates a new `unique_node_id`, updates both `nodes_by_structure` and `structures_by_id`, and increments `next_node_id`. * Returns `{new_typing_ctx, node_id}`. * **Helper Functions:** * `get_node_data(typing_ctx, node_id)`: Returns `%{structure: ..., ops_module: ...}`. * `is_false_node?(node_id, typing_ctx)`, `is_true_node?(node_id, typing_ctx)`. * Functions to extract parts of a node structure if needed (e.g., `get_element_from_split_node_id(typing_ctx, node_id)`). 3. **BDD Operations Modules (e.g., `Tilly.BDD.IntegerBoolOps`, `Tilly.BDD.AtomSetOps`, `Tilly.BDD.RecursivePairOps`):** * **Purpose:** These modules provide the specific logic for how BDD elements are compared/hashed and how leaf values are combined/negated for a particular kind of BDD. They are the "plugins" for the generic BDD algorithms. * **Interface Convention (Informal or via Elixir `behaviour`):** Each module will implement a consistent set of functions: * `compare_elements(elem1, elem2)`: Returns `:lt`, `:eq`, or `:gt`. * `equal_element?(elem1, elem2)`: Returns boolean. * `hash_element(elem)`: Returns an integer. * `empty_leaf()`: Returns the leaf value for emptiness. * `any_leaf()`: Returns the leaf value for universality. * `is_empty_leaf?(leaf_val)`: Returns boolean. * `union_leaves(leaf1, leaf2, typing_ctx)`: Combines leaves for union. May need `typing_ctx` for recursive BDD operations if leaves are BDD IDs. * `intersection_leaves(leaf1, leaf2, typing_ctx)`: Combines for intersection. * `negation_leaf(leaf, typing_ctx)`: Negates a leaf. * **Example (`Tilly.BDD.IntegerBoolOps`):** * `compare_elements/2`: Uses standard integer comparison. * `empty_leaf/0`: Returns `false`. * `union_leaves/3`: `fn l1, l2, _ctx -> l1 or l2 end`. 4. **`Tilly.BDD.Ops` (Generic BDD Algorithms):** * **Core Operations:** Implements `union_bdds(node1_id, node2_id, typing_ctx)`, `intersection_bdds/3`, `negation_bdd/2`, `difference_bdd/3`. * These functions take BDD node IDs and the `typing_ctx`. * They fetch node data (structure and `ops_module`) using `Tilly.BDD.get_node_data/2`. * They use `apply(ops_module, :function_name, args)` to invoke element/leaf operations from the appropriate `ops_module`. * They return a `node_id` for the resulting BDD. * **Smart Constructors (Algebraic):** Implements functions like `leaf(leaf_value, ops_module, typing_ctx)` and `split(element, p_id, i_id, n_id, ops_module, typing_ctx)`. * These are the CDuce-style smart constructors that apply simplification rules (e.g., if `p_id == n_id` in `split`, simplify to `p_id ++ i_id`). * After simplification, they call `Tilly.BDD.get_or_intern_node/3` to get/create the canonical node ID. * Example `split/6`: ```elixir def split(x, p_id, i_id, n_id, ops_module, ctx) do # Apply simplification rules based on CDuce bdd.ml, e.g.: # if i_id represents True for ops_module, return True node_id. # if p_id == n_id, result is union_bdds(p_id, i_id, ctx_with_ops_module). # ... other rules ... # If no simplification, create logical_structure: logical_structure = {:split, x, p_id, i_id, n_id} Tilly.BDD.get_or_intern_node(ctx, logical_structure, ops_module) end ``` 5. **`Tilly.Type` (Type Descriptor Definition):** * **`Descr` Map Structure:** Defines the structure of a Type Descriptor map. This map holds IDs of canonical BDDs for each basic kind of type. Example: ```elixir %{ atoms_bdd_id: node_id, # BDD for specific atoms (elements: atoms, leaves: booleans) integers_bdd_id: node_id, # BDD for specific integers (elements: integers, leaves: booleans) strings_bdd_id: node_id, # BDD for specific strings tuples_bdd_id: node_id, # BDD for pairs/tuples. Elements might be type vars or first component types; leaves might be BDD IDs for rest of tuple. maps_bdd_id: node_id, # BDD for maps. Elements might be labels or key types; leaves might be BDD IDs for value types or record structures. functions_bdd_id: node_id,# BDD for functions. # ... other kinds as needed, e.g., for abstract types ... absent_marker_bdd_id: node_id # BDD representing if 'absent' or 'nil' is part of the type } ``` * **Helper Functions:** May include functions like `empty_descr(typing_ctx)` which creates a `Descr` map where all `*_bdd_id` fields point to the `false` BDD node ID, and `any_descr(typing_ctx)` where they point to `true` BDDs (configured with appropriate `ops_module` for each field). 6. **`Tilly.Type.Store` (Type Descriptor Interning):** * **Type Store (`typing_ctx.type_store`):** A map within `typing_ctx` for interning `Descr` maps. * `descrs_by_structure`: Maps a canonical `Descr` map (where all BDD IDs are already canonical) to a `unique_descr_id` (integer). * `structures_by_id`: Maps `unique_descr_id` back to the canonical `Descr` map. * `next_descr_id`: For generating new unique `Descr` IDs. * **Key Function: `get_or_intern_descr(typing_ctx, descr_map)`:** * Ensures all `*_bdd_id` values in `descr_map` are canonical. * Looks up `descr_map` in `descrs_by_structure`. * If not found, creates a new ID, updates stores, and increments `next_descr_id`. * Returns `{new_typing_ctx, descr_id}`. 7. **`Tilly.Type.Ops` (Operations on Type Descriptors):** * **Set Operations:** Implements `union_types(typing_ctx, descr1_id, descr2_id)`, `intersection_types/3`, `negation_type/2`. * These operate on `Descr` IDs. * Example `union_types/3`: 1. Fetch `descr_map1` and `descr_map2` from `typing_ctx.type_store` using their IDs. 2. For each corresponding field (e.g., `atoms_bdd_id`): * Call `Tilly.BDD.Ops.union_bdds(descr_map1.atoms_bdd_id, descr_map2.atoms_bdd_id, typing_ctx)`. The `typing_ctx` is passed along; the `ops_module` for these BDDs is retrieved from the `bdd_store` by `union_bdds`. 3. Construct a `result_descr_map` using the new BDD IDs for each field. 4. Intern this `result_descr_map` using `Tilly.Type.Store.get_or_intern_descr(typing_ctx, result_descr_map)` to get the final `result_descr_id`. 5. Return `{new_typing_ctx, result_descr_id}`. * **Construction Helpers:** Functions to create specific types, e.g., `create_integer_type(typing_ctx, integer_value)` or `create_atom_type(typing_ctx, atom_value)`. These would construct the appropriate `Descr` map (with a BDD representing only that literal) and intern it. **II. Adapting Existing Typer Modules** This phase involves refactoring existing modules to use the new BDD-based type infrastructure. The `typing_ctx` (initialized with empty stores) will be threaded through all relevant type-checking functions. 1. **`Til.Typer.Types` (Major Refactor):** * Functions like `get_primitive_type(:integer)` will now use `Tilly.Type.Ops` or dedicated constructors (e.g., `Tilly.Type.integer_singleton(typing_ctx, value)` or `Tilly.Type.integer_any(typing_ctx)`) to get/create an interned `Descr` ID. * For example, `get_primitive_type(:integer)` would return the `Descr` ID for the type "all integers". This `Descr` would have its `integers_bdd_id` pointing to a `true` BDD (interned with an `ops_module` like `:integer_bool_bdd_ops`), and other fields like `atoms_bdd_id` pointing to `false` BDDs. * The concept of predefined type *maps* (as Elixir maps with `:type_kind`) is replaced by predefined canonical `Descr` IDs, likely initialized once and stored or retrieved via helper functions. 2. **`Til.Typer.Interner` (Major Refactor/Integration):** * Its current role of interning Elixir map-based type representations is entirely replaced by `Tilly.BDD.get_or_intern_node` and `Tilly.Type.Store.get_or_intern_descr`. * `populate_known_types` will be responsible for initializing the `typing_ctx` by creating and interning the `Descr` IDs for fundamental base types (e.g., `any`, `nothing`, the general `integer` type, `atom` type). This involves creating the necessary BDDs (often just `true` or `false` BDDs for the respective fields in the `Descr`) with their appropriate `ops_module`s. 3. **`Til.Typer` and `Til.Typer.ExpressionTyper` (Significant Refactor):** * **`typing_ctx` Threading:** All type inference and checking functions will now take `typing_ctx` as an argument and return an updated `typing_ctx` along with their results. * `infer_type_for_node_ast` (or similar functions): * For literals (e.g., `123`, `:foo`): Will call functions like `Tilly.Type.Ops.create_integer_literal_type(typing_ctx, 123)` which constructs a `Descr` map where `integers_bdd_id` points to a BDD representing only `123` (and other fields are empty BDDs), interns it, and returns the `{new_ctx, descr_id}`. * For operations that combine types (e.g., an `if` expression's branches have types `descr_id_A` and `descr_id_B`): Will use `Tilly.Type.Ops.union_types(typing_ctx, descr_id_A, descr_id_B)` to get the `{new_ctx, result_descr_id}`. * `resolve_type_specifier_node`: * For basic type names (`Integer`, `Atom`): Will fetch their canonical `Descr` IDs (e.g., from helper functions in `Til.Typer.Types` that retrieve pre-interned IDs). * For type expressions like `(or TypeA TypeB)`: Will recursively call itself to get `descr_id_A` and `descr_id_B`, then use `Tilly.Type.Ops.union_types(typing_ctx, descr_id_A, descr_id_B)`. Similarly for `(and ...)` and `(not ...)`. * The `type_id` field in AST nodes will now store the `Descr` ID (an integer) obtained from `Tilly.Type.Store`. 4. **`Til.Typer.SubtypeChecker` (Complete Rewrite):** * `is_subtype?(typing_ctx, descr_a_id, descr_b_id)`: 1. `{typing_ctx, descr_not_b_id} = Tilly.Type.Ops.negation_type(typing_ctx, descr_b_id)`. 2. `{typing_ctx, intersection_id} = Tilly.Type.Ops.intersection_types(typing_ctx, descr_a_id, descr_not_b_id)`. 3. `Tilly.Type.Ops.is_empty_type?(typing_ctx, intersection_id)`. * `Tilly.Type.Ops.is_empty_type?(typing_ctx, descr_id)`: 1. Fetches `descr_map` from `typing_ctx.type_store`. 2. Checks if all `*_bdd_id` fields in `descr_map` point to the canonical `false` BDD node ID (by calling `Tilly.BDD.is_false_node?/2` for each). **III. Handling Specific Type Kinds from `project.md`** The existing type kinds in `project.md` will be mapped to specific configurations of `Descr` maps and their constituent BDDs. * **Primitive Types:** `%{type_kind: :primitive, name: :integer}` becomes a canonical `Descr` ID. For "all integers", its `integers_bdd_id` points to a `true` BDD (interned with `ops_module: :integer_bool_bdd_ops`), and other fields (like `atoms_bdd_id`) point to `false` BDDs. * **Literal Types:** `%{type_kind: :literal, value: 42}` becomes a `Descr` ID. Its `integers_bdd_id` points to a BDD representing *only* the value `42` (e.g., a single `Split` node leading to a `true` leaf, or a more optimized representation if available), and other fields are `false` BDDs. * **Union/Intersection/Negation Types:** These are no longer explicit `type_kind`s in the `Descr` map itself. They are the *result* of applying `Tilly.Type.Ops.union_types`, `intersection_types`, or `negation_type` to other `Descr` IDs. The resulting `Descr` ID represents the union/intersection/negation. * **Function, List, Tuple, Map Types:** * These will be represented by `Descr` maps where the corresponding BDD field is non-empty and structured appropriately. For example, a list of integers would have a non-empty `pairs_bdd_id`. * The BDDs for these types will be more complex: * **Lists/Tuples (`pairs_bdd_id`):** The BDD might split on the type of the first element, with leaves being BDD IDs representing the type of the rest of the list/tuple. The `ops_module` for this BDD would handle element types and BDD ID leaves (e.g., `:recursive_pair_ops`). * **Maps/Records (`records_bdd_id`):** This BDD would split on record labels or key types. Leaves could be BDD IDs for value types or more complex record structures. This is a significant area requiring careful design inspired by CDuce's record BDDs. The `project.md` details on `known_elements` and `index_signature` will need to be encoded into the BDD structure and its `ops_module`. * **Functions (`functions_bdd_id`):** BDDs might split on argument types or arity, with leaves representing return types (possibly as BDD IDs). * This is an advanced stage. Initially, constructing these might involve specific helper functions in `Tilly.Type.Ops` (e.g., `create_list_type(typing_ctx, element_descr_id)`). ---