document

2025-06-15 21:51:51 +02:00 · 2025-06-15 21:51:51 +02:00 · 385ec666aa
commit 385ec666aa
parent ef0ffd2f08
1 changed files with 215 additions and 5 deletions
--- a/test.exs
+++ b/test.exs
@ -1,10 +1,220 @@
 defmodule Tdd do
  # --- Existing code from your previous version ---
  # (init_tdd_system, get_state, update_state, make_node, get_node_details,
  #  variable definitions, basic type constructors, apply, sum, print_tdd)
  # Ensure your `apply/4` function is the corrected version from the MatchError fix.
  @moduledoc """
-  Ternary decision diagram for set-theoretic types.
+  # Ternary Decision Diagram (TDD) for Set-Theoretic Types in Elixir
  ## 1. Introduction
  This document outlines the design and implementation of a Ternary Decision Diagram (TDD) based system for representing and manipulating set-theoretic types, inspired by systems like CDuce. The goal is to create a robust way to perform type checking, type inference, and other type-level computations for a rich set of datatypes, similar to those found in Elixir.
  A TDD is a directed acyclic graph (DAG) used to represent a function `f(v1, v2, ..., vn) -> {true, false, dont_care}`. In our context, it represents a characteristic function for a type: given a value, the TDD determines if the value belongs to the type (`true`), does not belong (`false`), or if the specific predicates tested so far are insufficient or irrelevant for this particular type operation (`dont_care`).
  The TDDs are kept **ordered** and **reduced** to ensure a canonical representation for each type, making type equivalence checks (and other operations) efficient.
  -   **Ordered**: Variables (predicates) appear in the same fixed global order on all paths from the root to a terminal.
  -   **Reduced**: Isomorphic subgraphs are merged (shared), and nodes whose children would make the test redundant under certain TDD algebra rules are eliminated or simplified.
  ## 2. Core TDD Structure and Operations
  ### 2.1. Nodes
  There are two kinds of nodes:
  1.  **Terminal Nodes**:
    *   `TRUE_TERMINAL` (ID: `1`): Represents the universal set (type `any`). A path ending here means the value (or part of it) satisfies the type constraints along that path.
    *   `FALSE_TERMINAL` (ID: `0`): Represents the empty set (type `none`). A path ending here means the value fails the type constraints.
  2.  **Variable Nodes**:
    *   Represented as a tuple: `{variable_identifier, yes_child_id, no_child_id, dc_child_id}`.
    *   `variable_identifier`: A unique, globally ordered term identifying the predicate being tested at this node (e.g., "is the value an atom?", "is the integer value < 10?").
    *   `yes_child_id`: The ID of the next TDD node if the predicate is true.
    *   `no_child_id`: The ID of the next TDD node if the predicate is false.
    *   `dc_child_id` (Don't Care): The ID of the next TDD node if the predicate is irrelevant for the current type or operation. The semantic interpretation of `dc` is crucial and aligns with common TDD usage (e.g., for a union operation, `dc(A | B) = dc(A) | dc(B)`).
  ### 2.2. Node Management (`Tdd` module state)
  The `Tdd` module maintains global state (currently via `Process.put/get` for simplicity, ideally a `GenServer`):
  *   `@nodes`: A map from `node_tuple ({variable, yes_id, no_id, dc_id})` to `node_id`. This ensures that structurally identical nodes are shared (part of the "reduced" property).
  *   `@node_by_id`: A map from `node_id` to its `node_tuple` or a terminal symbol (`:true_terminal`, `:false_terminal`).
  *   `@next_id`: The next available integer ID for a new node.
  *   `@op_cache`: A map for memoizing results of operations like `apply` (binary ops), `negate`, and `simplify_with_constraints`. Keys are typically `{{op_name, id1, id2}, result_id}` or `{{op_name, id1}, result_id}`.
  ### 2.3. Variable Ordering
  A strict global total order of all possible `variable_identifier`s is essential. This is achieved by defining variable identifiers as Elixir tuples, which have a natural sort order.
  The proposed structure for variable identifiers is:
  `{category_integer, predicate_type_atom, specific_value_or_nested_id}`
  Example categories:
  *   `0`: Primary type discriminators (e.g., `is_atom`, `is_integer`, `is_list`).
  *   `1`: Atom-specific predicates (e.g., `value == :foo`).
  *   `2`: Integer-specific predicates (e.g., `value < 10`).
  *   `4`: Tuple-specific predicates (e.g., `size == 2`, `element 0 has_type X`).
  *   And so on for other types.
  ### 2.4. Core Operations
  1.  **`make_node_raw(variable, yes_id, no_id, dc_id)`**:
    *   The fundamental private function for creating or retrieving unique structural nodes.
    *   Implements structural sharing via the `@nodes` table.
    *   Implements a basic reduction rule: if `yes_id == no_id == dc_id`, the node is redundant, and that common child ID is returned.
  2.  **`check_assumptions_consistency(assumptions_map)`**:
    *   A private helper function crucial for semantic reduction.
    *   Takes a map `%{variable_id => value (true/false/:dc)}` representing current path assumptions.
    *   Returns `:consistent` or `:contradiction` based on predefined semantic rules of the type system (e.g., `is_atom=true` AND `is_tuple=true` is a contradiction).
    *   This function will be expanded as more types and predicates are added.
  3.  **`simplify_with_constraints(tdd_id, assumptions_map)`**:
    *   A private, memoized, recursive function that takes a `tdd_id` and an `assumptions_map`.
    *   It produces a new `tdd_id` that is semantically equivalent to the input `tdd_id` under the given assumptions, but potentially simpler.
    *   **Crucial Behavior**: If `check_assumptions_consistency(assumptions_map)` returns `:contradiction` at any point, `simplify_with_constraints` immediately returns `@false_node_id`.
    *   If the TDD's variable is already in `assumptions_map`, it follows the constrained path.
    *   Otherwise, it recursively simplifies children, adding the current node's variable assignment to the assumptions for those deeper calls, and rebuilds the node using `make_node_raw`.
  4.  **`apply_raw(op_name, op_lambda, u1_id, u2_id)`**:
    *   The private, memoized, recursive Shannon expansion algorithm for binary set operations (union, intersection).
    *   `op_lambda` defines the operation on terminal nodes.
    *   It selects the `top_var` based on the global variable order.
    *   Recursively calls `apply_raw` on the children.
    *   Uses `make_node_raw` to construct result nodes.
    *   This function computes the *structural* result of the operation.
  5.  **Public API Operations (`sum/2`, `intersect/2`, `negate/1`)**:
    *   These functions orchestrate the operation:
        1.  Call the respective `_raw` version (e.g., `apply_raw` for `sum`/`intersect`, `negate_raw` for `negate`).
        2.  Take the `raw_result_id` from step 1.
        3.  Return `simplify_with_constraints(raw_result_id, %{})`. This final step ensures that all TDDs exposed through the public API are not only structurally canonical (via `make_node_raw` and `apply_raw`) but also *semantically canonical* (i.e., known impossible paths or contradictions are resolved to `@false_node_id`).
  6.  **Type Constructors (e.g., `type_atom()`, `type_atom_literal(:foo)`)**:
    *   These public functions build the TDD for a specific type.
    *   They use `make_node_raw` to define the basic structure.
    *   They then return `simplify_with_constraints(raw_id, %{})` to ensure the constructed type is in its simplest semantic form.
  7.  **`is_subtype(sub_id, super_id)`**:
    *   Defined as `simplify_with_constraints(intersect(sub_id, negate(super_id)), %{}) == @false_node_id`.
    *   Since `intersect` and `negate` now return semantically simplified TDDs, if `A ∩ ¬B` represents an empty set, the result of the intersection will be `@false_node_id`.
  ## 3. Datatype Representation Details
  This section outlines how various Elixir-like datatypes are (or will be) represented using TDD variables and constructors. All constructors ensure the final TDD is passed through `simplify_with_constraints(raw_id, %{})`.
  ### 3.1. Atoms
  *   **Variables**:
    *   `@v_is_atom = {0, :is_atom}`: Primary type check.
    *   `v_atom_eq_A = {1, :value, A}`: Checks if the atom's value is `A`. Order by `A`.
  *   **Constructors**:
    *   `type_atom()`: Represents any atom. TDD: `make_node_raw(@v_is_atom, @true_node_id, @false_node_id, @false_node_id)`.
    *   `type_atom_literal(val)`: Represents a specific atom. TDD: `make_node_raw(@v_is_atom, node_for_val_eq, @false_node_id, @false_node_id)` where `node_for_val_eq = make_node_raw(v_atom_eq_A, @true_node_id, @false_node_id, @false_node_id)`.
  *   **Semantic Constraints for `check_assumptions_consistency`**:
    *   If `assumptions_map` contains `{{0, :is_atom}, true}` and `{{0, other_primary_type}, true}` -> contradiction.
    *   If `assumptions_map` contains `{{1, :value, A}, true}` and `{{1, :value, B}, true}` where `A != B` -> contradiction.
  ### 3.2. Tuples
  *   **Variables**:
    *   `@v_is_tuple = {0, :is_tuple}`: Primary type check.
    *   `v_tuple_size_eq_N = {4, :size, N}`: Checks if tuple size is `N`. Order by `N`.
    *   `v_tuple_elem_I_PRED = {4, :element, Index_I, NESTED_PREDICATE_ID}`: Predicate for element at `Index_I`. `NESTED_PREDICATE_ID` is a variable from the global order, applied to the element. (e.g., `{4, :element, 0, {0, :is_atom}}` checks if element 0 is an atom). Order by `Index_I`, then by `NESTED_PREDICATE_ID`.
  *   **Constructors**:
    *   `type_tuple()`: Any tuple.
    *   `type_empty_tuple()`: The tuple `{}`.
    *   `type_tuple_sized_any(size)`: Any tuple of a given size.
    *   `type_tuple_specific(element_type_ids_list)`: e.g., for `{atom(), integer()}`. This will involve creating nodes for size, then for each element, applying the TDD for that element's type.
  *   **Semantic Constraints**:
    *   `is_tuple=true` vs. other primary types.
    *   If `{{4, :size, N}, true}` and `{{4, :size, M}, true}` where `N != M` -> contradiction.
    *   If `{{4, :size, N}, true}` and a predicate `{{4, :element, I, _}, _}` exists where `I >= N` -> potential contradiction or path simplification (element doesn't exist).
  ### 3.3. Integers (Next to Implement)
  *   **Variables**:
    *   `@v_is_integer = {0, :is_integer}` (or a new category, e.g., `2` for integer properties).
    *   `v_int_eq_N = {INT_CAT, :eq, N}`.
    *   `v_int_lt_N = {INT_CAT, :lt, N}` (value < N).
    *   `v_int_gt_N = {INT_CAT, :gt, N}` (value > N).
    *   *(Consider also: `lte` (less than or equal), `gte` (greater than or equal) to simplify some range logic, or derive them).*
  *   **Constructors**:
    *   `type_integer()`: Any integer.
    *   `type_int_eq(n)`: A specific integer value.
    *   `type_int_lt(n)`, `type_int_gt(n)`.
    *   `type_int_range(min, max, min_inclusive, max_inclusive)`: Integers within a specific range.
  *   **Semantic Constraints**:
    *   `is_integer=true` vs. other primary types.
    *   `eq(N)` and `eq(M)` with `N != M` -> contradiction.
    *   `eq(N)` and `lt(M)` if `N >= M` -> contradiction.
    *   `eq(N)` and `gt(M)` if `N <= M` -> contradiction.
    *   `lt(N)` and `gt(M)` if `N <= M+1` (or `N <= M` if `gt` means `>=`) -> contradiction. (e.g., `x < 5` and `x > 4` has no integer solution).
    *   *Strategy for complex integer constraints*: Maintain a "current allowed interval" `[min_assumed, max_assumed]` based on `assumptions_map`. If this interval becomes empty or invalid, it's a contradiction. Each new integer assumption (`lt, gt, eq`) refines this interval.
  ### 3.4. Lists (Planned)
  *   **Variables**:
    *   `@v_is_list = {0, :is_list}`.
    *   `v_list_is_empty = {LIST_CAT, :is_empty}`.
    *   *If not empty*:
        *   `v_list_head_VAR = {LIST_CAT, :head, NESTED_PREDICATE_ID}`: Applies a global predicate to the head.
        *   `v_list_tail_VAR = {LIST_CAT, :tail, NESTED_PREDICATE_ID_FOR_TAIL_LIST}`: Applies a global predicate (usually list predicates) to the tail.
    *   *(Alternative for fixed-length lists/known structure: `{LIST_CAT, :elem, Index_I, NESTED_PREDICATE_ID}` similar to tuples).*
  *   **Constructors**:
    *   `type_list_any()`.
    *   `type_empty_list()`.
    *   `type_cons(head_type_id, tail_type_id)`.
    *   `type_list_of(element_type_id)`: e.g., `list(integer())`.
  *   **Semantic Constraints**:
    *   `is_list=true` vs. other primary types.
    *   If `is_empty=true`, then any predicate about `head` or non-empty `tail` structure is contradictory if it implies existence.
  ### 3.5. Strings & Binaries (Planned)
  *   **Variables**:
    *   `@v_is_binary = {0, :is_binary}`.
    *   `@v_is_string = {0, :is_string}` (can be a check after `is_binary` or a distinct primary type if model demands).
    *   Size/length predicates: `v_binary_size_eq_N`, `v_string_length_eq_N`.
    *   Content predicates: `v_string_eq_S`, `v_string_prefix_P`, `v_string_suffix_S`, `v_string_matches_regex_R`.
  *   **Semantic Constraints**: Size vs content (e.g., `size=1` and `prefix="foo"` is a contradiction). `eq(S1)` and `eq(S2)` if `S1 != S2`.
  ### 3.6. Maps (Planned - Complex)
  *   **Variables**:
    *   `@v_is_map = {0, :is_map}`.
    *   `v_map_size_eq_N`.
    *   `v_map_has_key_K`: (K is a canonical representation of an Elixir term).
    *   *If `has_key_K` is true*:
        *   `v_map_key_K_value_VAR = {MAP_CAT, :key_value, K, NESTED_PREDICATE_ID}`: Applies a global predicate to the value associated with key K.
    *   For `%{pattern_key => pattern_value}` types:
        *   This requires careful thought. Might involve predicates like `v_map_all_keys_matching_TYPE_X_have_values_matching_TYPE_Y`.
  *   **Semantic Constraints**: `is_map` vs. others. Size vs. `has_key` interactions. Contradictory type requirements for the same key's value.
  ### 3.7. Functions (Planned - Very Complex)
  *   Representation of function types (`fun((Arg1Type, Arg2Type, ...) -> ReturnType)`) is a significant challenge for TDDs.
  *   **Variables (Tentative)**:
    *   `@v_is_function = {0, :is_function}`.
    *   `v_fun_arity_eq_A`.
    *   Predicates for argument types at specific positions (e.g., `v_fun_arg_I_type_VAR`).
    *   Predicates for return type (e.g., `v_fun_return_type_VAR`).
  *   Intersection and union of function types involve concepts like contravariance of arguments and covariance of return types. This may require specialized logic beyond simple TDD operations or a very elaborate variable scheme. Often, function types are handled with auxiliary structures in type systems.
  ## 4. Current Status & Next Steps
  *   **Implemented**: Atoms, basic Tuples (any, empty, sized_any). Core TDD operations (`sum`, `intersect`, `negate`, `is_subtype`) with semantic simplification framework (`simplify_with_constraints` and `check_assumptions_consistency`).
  *   **Passing Tests**: A suite of tests for atom/tuple interactions, unions, intersections, negations, and subtyping, including resolution of contradictions like `atom & tuple == none`.
  *   **Next Immediate Step**: Implement **Integer types** as outlined in section 3.3. This will involve:
    1.  Defining integer-specific predicates and their global order.
    2.  Creating integer type constructors.
    3.  Significantly expanding `check_assumptions_consistency` to handle integer comparisons (`eq`, `lt`, `gt`) and their interactions.
    4.  Adding comprehensive tests for integers.
  ## 5. Future Considerations
  *   **Performance**: For very large TDDs or complex types, the number of nodes and cache sizes can grow. Investigate optimizations if needed.
  *   **Generality of `check_assumptions_consistency`**: Designing this to be easily extensible and correct for many interacting predicates is challenging. A rule-based system or a more abstract way to define predicate interactions might be beneficial.
  *   **"Don't Care" (`dc`) branch semantics**: Ensure the `dc` branch is consistently and correctly handled in all operations, especially `simplify_with_constraints` if assumptions can make a variable "don't care". Currently, `simplify_with_constraints` assumes `true/false/:dc` values in the `assumptions_map` if a variable is already constrained.
  *   **Type Inference**: Using the TDD operations to infer types or solve type constraints.
  *   **Polymorphism**: Representing and operating on types with free type variables. Typically, free variables are treated as `any` or involve substitution before TDD construction.
  This document provides a snapshot of the current TDD system and a roadmap for its extension. The core principle is the combination of structurally canonical ROBDDs (via `make_node_raw` and `apply_raw`) with a semantic simplification layer (`simplify_with_constraints`) that embeds knowledge of the type system's rules.
  """
  # --- Terminal Node IDs ---