checkpoint failing test after fixing tests checkpoint checkpoint checkpoint re-work asd checkpoint checkpoint checkpoint mix proj checkpoint mix first parser impl checkpoint fix tests re-org parser checkpoint strings fix multiline strings tuples checkpoint maps checkpoint checkpoint checkpoint checkpoint fix weird eof expression parse error checkpoint before typing checkpoint checpoint checkpoint checkpoint checkpoint ids in primitive types checkpoint checkpoint fix tests initial annotation checkpoint checkpoint checkpoint union subtyping conventions refactor - split typer typing tuples checkpoint test refactor checkpoint test refactor parsing atoms checkpoint atoms wip lists checkpoint typing lists checkopint checkpoint wip fixing correct list typing map discussion checkpoint map basic typing fix tests checkpoint checkpoint checkpoint checkpoint fix condition typing fix literal keys in map types checkpoint union types checkpoint union type checkpoint row types discussion & bidirectional typecheck checkpoint basic lambdas checkpoint lambdas typing application wip function application checkpoint checkpoint checkpoint cduce checkpoint checkpoint checkpoint checkpoint checkpoint checkpoint checkpoint
16 KiB
Tilly Set-Theoretic Type System: Design and CDuce Reference
Objective:
Implement a CDuce-style set-theoretic type system in Elixir for the Tilly compiler. This document outlines the core principles adapted from CDuce and the planned Elixir implementation strategy.
Okay, let's lay out a revised and extensive implementation plan for adapting Tilly to a set-theoretic type system.
Overarching Concern:
- Bare Maps for Extensibility: We will prioritize using bare Elixir maps for representations like Type Descriptors (
Descr) instead of structs. This approach maintains flexibility for future extensions, allowing new fields to be added without altering struct definitions. BDD node structures themselves will be represented by simple Elixir terms (atoms or tuples).
I. Foundational BDD and Type Descriptor Infrastructure (New Modules)
This phase focuses on building the core components for the set-theoretic type system. These will primarily be new modules. The typing_ctx map will be a central piece of data, passed through most functions in this layer, containing the necessary stores and configurations.
-
Tilly.BDD.Node(Conceptual Definition & Basic Helpers):- Node Structures: We define the canonical Elixir terms representing BDD nodes:
true: The atom representing the universal set BDD.false: The atom representing the empty set BDD.{:leaf, leaf_value_id}: Represents a leaf node.leaf_value_idcan be any term, its interpretation depends on theops_module(e.g., a boolean for simple sets, or another BDD node ID for nested structures).{:split, element_id, positive_child_id, ignore_child_id, negative_child_id}: Represents an internal decision node.element_id: The value being split upon.positive_child_id,ignore_child_id,negative_child_id: IDs of other BDD nodes in the store.
- Smart Constructors (Low-Level): This module might contain very basic helper functions for creating these structural tuples if needed, but the primary "smart" construction (with simplification and interning) will occur in
Tilly.BDD.Opsand useTilly.BDD.get_or_intern_node. - Purpose: To establish the fundamental building blocks of BDDs.
- Node Structures: We define the canonical Elixir terms representing BDD nodes:
-
Tilly.BDD(BDD Store Management):- BDD Store (
typing_ctx.bdd_store): This will be a map within thetyping_ctxresponsible for hash-consing BDD nodes. It will contain:nodes_by_structure: A map where keys are{logical_structure_tuple, ops_module_atom}and values areunique_node_id(integer). Theops_module_atomis crucial for canonicalization, ensuring structurally identical nodes intended for different operational semantics are distinct.structures_by_id: A map where keys areunique_node_idand values are%{structure: logical_structure_tuple, ops_module: ops_module_atom}. This allows retrieval of both the node's structure and its associated operations module.next_node_id: An integer for generating new unique node IDs.
- Key Function:
get_or_intern_node(typing_ctx, logical_structure_tuple, ops_module_atom):- Takes the current
typing_ctx, the node's structural representation (e.g.,{:split, ...}), and the atom identifying its operations module (e.g.,:integer_bool_bdd_ops). - Checks
typing_ctx.bdd_store.nodes_by_structurefor an existing node. - If not found, creates a new
unique_node_id, updates bothnodes_by_structureandstructures_by_id, and incrementsnext_node_id. - Returns
{new_typing_ctx, node_id}.
- Takes the current
- Helper Functions:
get_node_data(typing_ctx, node_id): Returns%{structure: ..., ops_module: ...}.is_false_node?(node_id, typing_ctx),is_true_node?(node_id, typing_ctx).- Functions to extract parts of a node structure if needed (e.g.,
get_element_from_split_node_id(typing_ctx, node_id)).
- BDD Store (
-
BDD Operations Modules (e.g.,
Tilly.BDD.IntegerBoolOps,Tilly.BDD.AtomSetOps,Tilly.BDD.RecursivePairOps):- Purpose: These modules provide the specific logic for how BDD elements are compared/hashed and how leaf values are combined/negated for a particular kind of BDD. They are the "plugins" for the generic BDD algorithms.
- Interface Convention (Informal or via Elixir
behaviour): Each module will implement a consistent set of functions:compare_elements(elem1, elem2): Returns:lt,:eq, or:gt.equal_element?(elem1, elem2): Returns boolean.hash_element(elem): Returns an integer.empty_leaf(): Returns the leaf value for emptiness.any_leaf(): Returns the leaf value for universality.is_empty_leaf?(leaf_val): Returns boolean.union_leaves(leaf1, leaf2, typing_ctx): Combines leaves for union. May needtyping_ctxfor recursive BDD operations if leaves are BDD IDs.intersection_leaves(leaf1, leaf2, typing_ctx): Combines for intersection.negation_leaf(leaf, typing_ctx): Negates a leaf.
- Example (
Tilly.BDD.IntegerBoolOps):compare_elements/2: Uses standard integer comparison.empty_leaf/0: Returnsfalse.union_leaves/3:fn l1, l2, _ctx -> l1 or l2 end.
-
Tilly.BDD.Ops(Generic BDD Algorithms):- Core Operations: Implements
union_bdds(node1_id, node2_id, typing_ctx),intersection_bdds/3,negation_bdd/2,difference_bdd/3.- These functions take BDD node IDs and the
typing_ctx. - They fetch node data (structure and
ops_module) usingTilly.BDD.get_node_data/2. - They use
apply(ops_module, :function_name, args)to invoke element/leaf operations from the appropriateops_module. - They return a
node_idfor the resulting BDD.
- These functions take BDD node IDs and the
- Smart Constructors (Algebraic): Implements functions like
leaf(leaf_value, ops_module, typing_ctx)andsplit(element, p_id, i_id, n_id, ops_module, typing_ctx).- These are the CDuce-style smart constructors that apply simplification rules (e.g., if
p_id == n_idinsplit, simplify top_id ++ i_id). - After simplification, they call
Tilly.BDD.get_or_intern_node/3to get/create the canonical node ID. - Example
split/6:def split(x, p_id, i_id, n_id, ops_module, ctx) do # Apply simplification rules based on CDuce bdd.ml, e.g.: # if i_id represents True for ops_module, return True node_id. # if p_id == n_id, result is union_bdds(p_id, i_id, ctx_with_ops_module). # ... other rules ... # If no simplification, create logical_structure: logical_structure = {:split, x, p_id, i_id, n_id} Tilly.BDD.get_or_intern_node(ctx, logical_structure, ops_module) end
- These are the CDuce-style smart constructors that apply simplification rules (e.g., if
- Core Operations: Implements
-
Tilly.Type(Type Descriptor Definition):DescrMap Structure: Defines the structure of a Type Descriptor map. This map holds IDs of canonical BDDs for each basic kind of type. Example:%{ atoms_bdd_id: node_id, # BDD for specific atoms (elements: atoms, leaves: booleans) integers_bdd_id: node_id, # BDD for specific integers (elements: integers, leaves: booleans) strings_bdd_id: node_id, # BDD for specific strings tuples_bdd_id: node_id, # BDD for pairs/tuples. Elements might be type vars or first component types; leaves might be BDD IDs for rest of tuple. maps_bdd_id: node_id, # BDD for maps. Elements might be labels or key types; leaves might be BDD IDs for value types or record structures. functions_bdd_id: node_id,# BDD for functions. # ... other kinds as needed, e.g., for abstract types ... absent_marker_bdd_id: node_id # BDD representing if 'absent' or 'nil' is part of the type }- Helper Functions: May include functions like
empty_descr(typing_ctx)which creates aDescrmap where all*_bdd_idfields point to thefalseBDD node ID, andany_descr(typing_ctx)where they point totrueBDDs (configured with appropriateops_modulefor each field).
-
Tilly.Type.Store(Type Descriptor Interning):- Type Store (
typing_ctx.type_store): A map withintyping_ctxfor interningDescrmaps.descrs_by_structure: Maps a canonicalDescrmap (where all BDD IDs are already canonical) to aunique_descr_id(integer).structures_by_id: Mapsunique_descr_idback to the canonicalDescrmap.next_descr_id: For generating new uniqueDescrIDs.
- Key Function:
get_or_intern_descr(typing_ctx, descr_map):- Ensures all
*_bdd_idvalues indescr_mapare canonical. - Looks up
descr_mapindescrs_by_structure. - If not found, creates a new ID, updates stores, and increments
next_descr_id. - Returns
{new_typing_ctx, descr_id}.
- Ensures all
- Type Store (
-
Tilly.Type.Ops(Operations on Type Descriptors):- Set Operations: Implements
union_types(typing_ctx, descr1_id, descr2_id),intersection_types/3,negation_type/2.- These operate on
DescrIDs. - Example
union_types/3:- Fetch
descr_map1anddescr_map2fromtyping_ctx.type_storeusing their IDs. - For each corresponding field (e.g.,
atoms_bdd_id):- Call
Tilly.BDD.Ops.union_bdds(descr_map1.atoms_bdd_id, descr_map2.atoms_bdd_id, typing_ctx). Thetyping_ctxis passed along; theops_modulefor these BDDs is retrieved from thebdd_storebyunion_bdds.
- Call
- Construct a
result_descr_mapusing the new BDD IDs for each field. - Intern this
result_descr_mapusingTilly.Type.Store.get_or_intern_descr(typing_ctx, result_descr_map)to get the finalresult_descr_id. - Return
{new_typing_ctx, result_descr_id}.
- Fetch
- These operate on
- Construction Helpers: Functions to create specific types, e.g.,
create_integer_type(typing_ctx, integer_value)orcreate_atom_type(typing_ctx, atom_value). These would construct the appropriateDescrmap (with a BDD representing only that literal) and intern it.
- Set Operations: Implements
II. Adapting Existing Typer Modules
This phase involves refactoring existing modules to use the new BDD-based type infrastructure. The typing_ctx (initialized with empty stores) will be threaded through all relevant type-checking functions.
-
Til.Typer.Types(Major Refactor):- Functions like
get_primitive_type(:integer)will now useTilly.Type.Opsor dedicated constructors (e.g.,Tilly.Type.integer_singleton(typing_ctx, value)orTilly.Type.integer_any(typing_ctx)) to get/create an internedDescrID.- For example,
get_primitive_type(:integer)would return theDescrID for the type "all integers". ThisDescrwould have itsintegers_bdd_idpointing to atrueBDD (interned with anops_modulelike:integer_bool_bdd_ops), and other fields likeatoms_bdd_idpointing tofalseBDDs.
- For example,
- The concept of predefined type maps (as Elixir maps with
:type_kind) is replaced by predefined canonicalDescrIDs, likely initialized once and stored or retrieved via helper functions.
- Functions like
-
Til.Typer.Interner(Major Refactor/Integration):- Its current role of interning Elixir map-based type representations is entirely replaced by
Tilly.BDD.get_or_intern_nodeandTilly.Type.Store.get_or_intern_descr. populate_known_typeswill be responsible for initializing thetyping_ctxby creating and interning theDescrIDs for fundamental base types (e.g.,any,nothing, the generalintegertype,atomtype). This involves creating the necessary BDDs (often justtrueorfalseBDDs for the respective fields in theDescr) with their appropriateops_modules.
- Its current role of interning Elixir map-based type representations is entirely replaced by
-
Til.TyperandTil.Typer.ExpressionTyper(Significant Refactor):typing_ctxThreading: All type inference and checking functions will now taketyping_ctxas an argument and return an updatedtyping_ctxalong with their results.infer_type_for_node_ast(or similar functions):- For literals (e.g.,
123,:foo): Will call functions likeTilly.Type.Ops.create_integer_literal_type(typing_ctx, 123)which constructs aDescrmap whereintegers_bdd_idpoints to a BDD representing only123(and other fields are empty BDDs), interns it, and returns the{new_ctx, descr_id}. - For operations that combine types (e.g., an
ifexpression's branches have typesdescr_id_Aanddescr_id_B): Will useTilly.Type.Ops.union_types(typing_ctx, descr_id_A, descr_id_B)to get the{new_ctx, result_descr_id}.
- For literals (e.g.,
resolve_type_specifier_node:- For basic type names (
Integer,Atom): Will fetch their canonicalDescrIDs (e.g., from helper functions inTil.Typer.Typesthat retrieve pre-interned IDs). - For type expressions like
(or TypeA TypeB): Will recursively call itself to getdescr_id_Aanddescr_id_B, then useTilly.Type.Ops.union_types(typing_ctx, descr_id_A, descr_id_B). Similarly for(and ...)and(not ...).
- For basic type names (
- The
type_idfield in AST nodes will now store theDescrID (an integer) obtained fromTilly.Type.Store.
-
Til.Typer.SubtypeChecker(Complete Rewrite):is_subtype?(typing_ctx, descr_a_id, descr_b_id):{typing_ctx, descr_not_b_id} = Tilly.Type.Ops.negation_type(typing_ctx, descr_b_id).{typing_ctx, intersection_id} = Tilly.Type.Ops.intersection_types(typing_ctx, descr_a_id, descr_not_b_id).Tilly.Type.Ops.is_empty_type?(typing_ctx, intersection_id).
Tilly.Type.Ops.is_empty_type?(typing_ctx, descr_id):- Fetches
descr_mapfromtyping_ctx.type_store. - Checks if all
*_bdd_idfields indescr_mappoint to the canonicalfalseBDD node ID (by callingTilly.BDD.is_false_node?/2for each).
- Fetches
III. Handling Specific Type Kinds from project.md
The existing type kinds in project.md will be mapped to specific configurations of Descr maps and their constituent BDDs.
- Primitive Types:
%{type_kind: :primitive, name: :integer}becomes a canonicalDescrID. For "all integers", itsintegers_bdd_idpoints to atrueBDD (interned withops_module: :integer_bool_bdd_ops), and other fields (likeatoms_bdd_id) point tofalseBDDs. - Literal Types:
%{type_kind: :literal, value: 42}becomes aDescrID. Itsintegers_bdd_idpoints to a BDD representing only the value42(e.g., a singleSplitnode leading to atrueleaf, or a more optimized representation if available), and other fields arefalseBDDs. - Union/Intersection/Negation Types: These are no longer explicit
type_kinds in theDescrmap itself. They are the result of applyingTilly.Type.Ops.union_types,intersection_types, ornegation_typeto otherDescrIDs. The resultingDescrID represents the union/intersection/negation. - Function, List, Tuple, Map Types:
- These will be represented by
Descrmaps where the corresponding BDD field is non-empty and structured appropriately. For example, a list of integers would have a non-emptypairs_bdd_id. - The BDDs for these types will be more complex:
- Lists/Tuples (
pairs_bdd_id): The BDD might split on the type of the first element, with leaves being BDD IDs representing the type of the rest of the list/tuple. Theops_modulefor this BDD would handle element types and BDD ID leaves (e.g.,:recursive_pair_ops). - Maps/Records (
records_bdd_id): This BDD would split on record labels or key types. Leaves could be BDD IDs for value types or more complex record structures. This is a significant area requiring careful design inspired by CDuce's record BDDs. Theproject.mddetails onknown_elementsandindex_signaturewill need to be encoded into the BDD structure and itsops_module. - Functions (
functions_bdd_id): BDDs might split on argument types or arity, with leaves representing return types (possibly as BDD IDs).
- Lists/Tuples (
- This is an advanced stage. Initially, constructing these might involve specific helper functions in
Tilly.Type.Ops(e.g.,create_list_type(typing_ctx, element_descr_id)).
- These will be represented by