diff --git a/test.exs b/test.exs index 2296cdf..a0a0d9c 100644 --- a/test.exs +++ b/test.exs @@ -1,10 +1,220 @@ defmodule Tdd do - # --- Existing code from your previous version --- - # (init_tdd_system, get_state, update_state, make_node, get_node_details, - # variable definitions, basic type constructors, apply, sum, print_tdd) - # Ensure your `apply/4` function is the corrected version from the MatchError fix. @moduledoc """ - Ternary decision diagram for set-theoretic types. + # Ternary Decision Diagram (TDD) for Set-Theoretic Types in Elixir + + ## 1. Introduction + + This document outlines the design and implementation of a Ternary Decision Diagram (TDD) based system for representing and manipulating set-theoretic types, inspired by systems like CDuce. The goal is to create a robust way to perform type checking, type inference, and other type-level computations for a rich set of datatypes, similar to those found in Elixir. + + A TDD is a directed acyclic graph (DAG) used to represent a function `f(v1, v2, ..., vn) -> {true, false, dont_care}`. In our context, it represents a characteristic function for a type: given a value, the TDD determines if the value belongs to the type (`true`), does not belong (`false`), or if the specific predicates tested so far are insufficient or irrelevant for this particular type operation (`dont_care`). + + The TDDs are kept **ordered** and **reduced** to ensure a canonical representation for each type, making type equivalence checks (and other operations) efficient. + - **Ordered**: Variables (predicates) appear in the same fixed global order on all paths from the root to a terminal. + - **Reduced**: Isomorphic subgraphs are merged (shared), and nodes whose children would make the test redundant under certain TDD algebra rules are eliminated or simplified. + + ## 2. Core TDD Structure and Operations + + ### 2.1. Nodes + + There are two kinds of nodes: + + 1. **Terminal Nodes**: + * `TRUE_TERMINAL` (ID: `1`): Represents the universal set (type `any`). A path ending here means the value (or part of it) satisfies the type constraints along that path. + * `FALSE_TERMINAL` (ID: `0`): Represents the empty set (type `none`). A path ending here means the value fails the type constraints. + + 2. **Variable Nodes**: + * Represented as a tuple: `{variable_identifier, yes_child_id, no_child_id, dc_child_id}`. + * `variable_identifier`: A unique, globally ordered term identifying the predicate being tested at this node (e.g., "is the value an atom?", "is the integer value < 10?"). + * `yes_child_id`: The ID of the next TDD node if the predicate is true. + * `no_child_id`: The ID of the next TDD node if the predicate is false. + * `dc_child_id` (Don't Care): The ID of the next TDD node if the predicate is irrelevant for the current type or operation. The semantic interpretation of `dc` is crucial and aligns with common TDD usage (e.g., for a union operation, `dc(A | B) = dc(A) | dc(B)`). + + ### 2.2. Node Management (`Tdd` module state) + + The `Tdd` module maintains global state (currently via `Process.put/get` for simplicity, ideally a `GenServer`): + + * `@nodes`: A map from `node_tuple ({variable, yes_id, no_id, dc_id})` to `node_id`. This ensures that structurally identical nodes are shared (part of the "reduced" property). + * `@node_by_id`: A map from `node_id` to its `node_tuple` or a terminal symbol (`:true_terminal`, `:false_terminal`). + * `@next_id`: The next available integer ID for a new node. + * `@op_cache`: A map for memoizing results of operations like `apply` (binary ops), `negate`, and `simplify_with_constraints`. Keys are typically `{{op_name, id1, id2}, result_id}` or `{{op_name, id1}, result_id}`. + + ### 2.3. Variable Ordering + + A strict global total order of all possible `variable_identifier`s is essential. This is achieved by defining variable identifiers as Elixir tuples, which have a natural sort order. + The proposed structure for variable identifiers is: + `{category_integer, predicate_type_atom, specific_value_or_nested_id}` + + Example categories: + * `0`: Primary type discriminators (e.g., `is_atom`, `is_integer`, `is_list`). + * `1`: Atom-specific predicates (e.g., `value == :foo`). + * `2`: Integer-specific predicates (e.g., `value < 10`). + * `4`: Tuple-specific predicates (e.g., `size == 2`, `element 0 has_type X`). + * And so on for other types. + + ### 2.4. Core Operations + + 1. **`make_node_raw(variable, yes_id, no_id, dc_id)`**: + * The fundamental private function for creating or retrieving unique structural nodes. + * Implements structural sharing via the `@nodes` table. + * Implements a basic reduction rule: if `yes_id == no_id == dc_id`, the node is redundant, and that common child ID is returned. + + 2. **`check_assumptions_consistency(assumptions_map)`**: + * A private helper function crucial for semantic reduction. + * Takes a map `%{variable_id => value (true/false/:dc)}` representing current path assumptions. + * Returns `:consistent` or `:contradiction` based on predefined semantic rules of the type system (e.g., `is_atom=true` AND `is_tuple=true` is a contradiction). + * This function will be expanded as more types and predicates are added. + + 3. **`simplify_with_constraints(tdd_id, assumptions_map)`**: + * A private, memoized, recursive function that takes a `tdd_id` and an `assumptions_map`. + * It produces a new `tdd_id` that is semantically equivalent to the input `tdd_id` under the given assumptions, but potentially simpler. + * **Crucial Behavior**: If `check_assumptions_consistency(assumptions_map)` returns `:contradiction` at any point, `simplify_with_constraints` immediately returns `@false_node_id`. + * If the TDD's variable is already in `assumptions_map`, it follows the constrained path. + * Otherwise, it recursively simplifies children, adding the current node's variable assignment to the assumptions for those deeper calls, and rebuilds the node using `make_node_raw`. + + 4. **`apply_raw(op_name, op_lambda, u1_id, u2_id)`**: + * The private, memoized, recursive Shannon expansion algorithm for binary set operations (union, intersection). + * `op_lambda` defines the operation on terminal nodes. + * It selects the `top_var` based on the global variable order. + * Recursively calls `apply_raw` on the children. + * Uses `make_node_raw` to construct result nodes. + * This function computes the *structural* result of the operation. + + 5. **Public API Operations (`sum/2`, `intersect/2`, `negate/1`)**: + * These functions orchestrate the operation: + 1. Call the respective `_raw` version (e.g., `apply_raw` for `sum`/`intersect`, `negate_raw` for `negate`). + 2. Take the `raw_result_id` from step 1. + 3. Return `simplify_with_constraints(raw_result_id, %{})`. This final step ensures that all TDDs exposed through the public API are not only structurally canonical (via `make_node_raw` and `apply_raw`) but also *semantically canonical* (i.e., known impossible paths or contradictions are resolved to `@false_node_id`). + + 6. **Type Constructors (e.g., `type_atom()`, `type_atom_literal(:foo)`)**: + * These public functions build the TDD for a specific type. + * They use `make_node_raw` to define the basic structure. + * They then return `simplify_with_constraints(raw_id, %{})` to ensure the constructed type is in its simplest semantic form. + + 7. **`is_subtype(sub_id, super_id)`**: + * Defined as `simplify_with_constraints(intersect(sub_id, negate(super_id)), %{}) == @false_node_id`. + * Since `intersect` and `negate` now return semantically simplified TDDs, if `A ∩ ¬B` represents an empty set, the result of the intersection will be `@false_node_id`. + + ## 3. Datatype Representation Details + + This section outlines how various Elixir-like datatypes are (or will be) represented using TDD variables and constructors. All constructors ensure the final TDD is passed through `simplify_with_constraints(raw_id, %{})`. + + ### 3.1. Atoms + + * **Variables**: + * `@v_is_atom = {0, :is_atom}`: Primary type check. + * `v_atom_eq_A = {1, :value, A}`: Checks if the atom's value is `A`. Order by `A`. + * **Constructors**: + * `type_atom()`: Represents any atom. TDD: `make_node_raw(@v_is_atom, @true_node_id, @false_node_id, @false_node_id)`. + * `type_atom_literal(val)`: Represents a specific atom. TDD: `make_node_raw(@v_is_atom, node_for_val_eq, @false_node_id, @false_node_id)` where `node_for_val_eq = make_node_raw(v_atom_eq_A, @true_node_id, @false_node_id, @false_node_id)`. + * **Semantic Constraints for `check_assumptions_consistency`**: + * If `assumptions_map` contains `{{0, :is_atom}, true}` and `{{0, other_primary_type}, true}` -> contradiction. + * If `assumptions_map` contains `{{1, :value, A}, true}` and `{{1, :value, B}, true}` where `A != B` -> contradiction. + + ### 3.2. Tuples + + * **Variables**: + * `@v_is_tuple = {0, :is_tuple}`: Primary type check. + * `v_tuple_size_eq_N = {4, :size, N}`: Checks if tuple size is `N`. Order by `N`. + * `v_tuple_elem_I_PRED = {4, :element, Index_I, NESTED_PREDICATE_ID}`: Predicate for element at `Index_I`. `NESTED_PREDICATE_ID` is a variable from the global order, applied to the element. (e.g., `{4, :element, 0, {0, :is_atom}}` checks if element 0 is an atom). Order by `Index_I`, then by `NESTED_PREDICATE_ID`. + * **Constructors**: + * `type_tuple()`: Any tuple. + * `type_empty_tuple()`: The tuple `{}`. + * `type_tuple_sized_any(size)`: Any tuple of a given size. + * `type_tuple_specific(element_type_ids_list)`: e.g., for `{atom(), integer()}`. This will involve creating nodes for size, then for each element, applying the TDD for that element's type. + * **Semantic Constraints**: + * `is_tuple=true` vs. other primary types. + * If `{{4, :size, N}, true}` and `{{4, :size, M}, true}` where `N != M` -> contradiction. + * If `{{4, :size, N}, true}` and a predicate `{{4, :element, I, _}, _}` exists where `I >= N` -> potential contradiction or path simplification (element doesn't exist). + + ### 3.3. Integers (Next to Implement) + + * **Variables**: + * `@v_is_integer = {0, :is_integer}` (or a new category, e.g., `2` for integer properties). + * `v_int_eq_N = {INT_CAT, :eq, N}`. + * `v_int_lt_N = {INT_CAT, :lt, N}` (value < N). + * `v_int_gt_N = {INT_CAT, :gt, N}` (value > N). + * *(Consider also: `lte` (less than or equal), `gte` (greater than or equal) to simplify some range logic, or derive them).* + * **Constructors**: + * `type_integer()`: Any integer. + * `type_int_eq(n)`: A specific integer value. + * `type_int_lt(n)`, `type_int_gt(n)`. + * `type_int_range(min, max, min_inclusive, max_inclusive)`: Integers within a specific range. + * **Semantic Constraints**: + * `is_integer=true` vs. other primary types. + * `eq(N)` and `eq(M)` with `N != M` -> contradiction. + * `eq(N)` and `lt(M)` if `N >= M` -> contradiction. + * `eq(N)` and `gt(M)` if `N <= M` -> contradiction. + * `lt(N)` and `gt(M)` if `N <= M+1` (or `N <= M` if `gt` means `>=`) -> contradiction. (e.g., `x < 5` and `x > 4` has no integer solution). + * *Strategy for complex integer constraints*: Maintain a "current allowed interval" `[min_assumed, max_assumed]` based on `assumptions_map`. If this interval becomes empty or invalid, it's a contradiction. Each new integer assumption (`lt, gt, eq`) refines this interval. + + ### 3.4. Lists (Planned) + + * **Variables**: + * `@v_is_list = {0, :is_list}`. + * `v_list_is_empty = {LIST_CAT, :is_empty}`. + * *If not empty*: + * `v_list_head_VAR = {LIST_CAT, :head, NESTED_PREDICATE_ID}`: Applies a global predicate to the head. + * `v_list_tail_VAR = {LIST_CAT, :tail, NESTED_PREDICATE_ID_FOR_TAIL_LIST}`: Applies a global predicate (usually list predicates) to the tail. + * *(Alternative for fixed-length lists/known structure: `{LIST_CAT, :elem, Index_I, NESTED_PREDICATE_ID}` similar to tuples).* + * **Constructors**: + * `type_list_any()`. + * `type_empty_list()`. + * `type_cons(head_type_id, tail_type_id)`. + * `type_list_of(element_type_id)`: e.g., `list(integer())`. + * **Semantic Constraints**: + * `is_list=true` vs. other primary types. + * If `is_empty=true`, then any predicate about `head` or non-empty `tail` structure is contradictory if it implies existence. + + ### 3.5. Strings & Binaries (Planned) + + * **Variables**: + * `@v_is_binary = {0, :is_binary}`. + * `@v_is_string = {0, :is_string}` (can be a check after `is_binary` or a distinct primary type if model demands). + * Size/length predicates: `v_binary_size_eq_N`, `v_string_length_eq_N`. + * Content predicates: `v_string_eq_S`, `v_string_prefix_P`, `v_string_suffix_S`, `v_string_matches_regex_R`. + * **Semantic Constraints**: Size vs content (e.g., `size=1` and `prefix="foo"` is a contradiction). `eq(S1)` and `eq(S2)` if `S1 != S2`. + + ### 3.6. Maps (Planned - Complex) + + * **Variables**: + * `@v_is_map = {0, :is_map}`. + * `v_map_size_eq_N`. + * `v_map_has_key_K`: (K is a canonical representation of an Elixir term). + * *If `has_key_K` is true*: + * `v_map_key_K_value_VAR = {MAP_CAT, :key_value, K, NESTED_PREDICATE_ID}`: Applies a global predicate to the value associated with key K. + * For `%{pattern_key => pattern_value}` types: + * This requires careful thought. Might involve predicates like `v_map_all_keys_matching_TYPE_X_have_values_matching_TYPE_Y`. + * **Semantic Constraints**: `is_map` vs. others. Size vs. `has_key` interactions. Contradictory type requirements for the same key's value. + + ### 3.7. Functions (Planned - Very Complex) + + * Representation of function types (`fun((Arg1Type, Arg2Type, ...) -> ReturnType)`) is a significant challenge for TDDs. + * **Variables (Tentative)**: + * `@v_is_function = {0, :is_function}`. + * `v_fun_arity_eq_A`. + * Predicates for argument types at specific positions (e.g., `v_fun_arg_I_type_VAR`). + * Predicates for return type (e.g., `v_fun_return_type_VAR`). + * Intersection and union of function types involve concepts like contravariance of arguments and covariance of return types. This may require specialized logic beyond simple TDD operations or a very elaborate variable scheme. Often, function types are handled with auxiliary structures in type systems. + + ## 4. Current Status & Next Steps + + * **Implemented**: Atoms, basic Tuples (any, empty, sized_any). Core TDD operations (`sum`, `intersect`, `negate`, `is_subtype`) with semantic simplification framework (`simplify_with_constraints` and `check_assumptions_consistency`). + * **Passing Tests**: A suite of tests for atom/tuple interactions, unions, intersections, negations, and subtyping, including resolution of contradictions like `atom & tuple == none`. + * **Next Immediate Step**: Implement **Integer types** as outlined in section 3.3. This will involve: + 1. Defining integer-specific predicates and their global order. + 2. Creating integer type constructors. + 3. Significantly expanding `check_assumptions_consistency` to handle integer comparisons (`eq`, `lt`, `gt`) and their interactions. + 4. Adding comprehensive tests for integers. + + ## 5. Future Considerations + + * **Performance**: For very large TDDs or complex types, the number of nodes and cache sizes can grow. Investigate optimizations if needed. + * **Generality of `check_assumptions_consistency`**: Designing this to be easily extensible and correct for many interacting predicates is challenging. A rule-based system or a more abstract way to define predicate interactions might be beneficial. + * **"Don't Care" (`dc`) branch semantics**: Ensure the `dc` branch is consistently and correctly handled in all operations, especially `simplify_with_constraints` if assumptions can make a variable "don't care". Currently, `simplify_with_constraints` assumes `true/false/:dc` values in the `assumptions_map` if a variable is already constrained. + * **Type Inference**: Using the TDD operations to infer types or solve type constraints. + * **Polymorphism**: Representing and operating on types with free type variables. Typically, free variables are treated as `any` or involve substitution before TDD construction. + + This document provides a snapshot of the current TDD system and a roadmap for its extension. The core principle is the combination of structurally canonical ROBDDs (via `make_node_raw` and `apply_raw`) with a semantic simplification layer (`simplify_with_constraints`) that embeds knowledge of the type system's rules. """ # --- Terminal Node IDs ---