```
defmodule Tdd do
  @moduledoc """
  Ternary decision diagram, used for representing set-theoretic types, akin to cduce.
  There are 2 types of nodes:
  - terminal nodes (true, false)
  - variable nodes

  variable nodes consist of:
  - the variable being tested
  - yes: id of the node if the result of the test is true
  - no: id of the node if the result of the test is false
  - dc: id of the node if the result of the test is irrelevant for the current operation

  the TDD needs to be ordered and reduced (ROBDD)
  - 'ordered' if different variables appear in the same order on all paths from the root.
  - 'reduced' if the following two rules have been applied to its graph:
    - Merge any isomorphic subgraphs.
    - Eliminate any node whose two children are isomorphic.

  Working notes:
  - structure of the ordered variables:
    I'm thinking of structuring all possible types inside 1 TDD, in contrast to cduce,
    which uses a `descr` structure that contains several TDDs (one for each domain,
    like ints, atoms, functions, etc.), and descr is a union between them.
    For this, I need to come up with a variable structure that'll be ordered.
    My set types will need to represent types like: atoms, strings, ints, maps, tuples, functions, kinds?
    Moreover, those types themselves consist of smaller subsets of types like:
    - int < 10
    - int in [1, 2, 3]
    - string > "prefix_"
    - atom == false
    - atom == false or atom == true or atom == nil
    - map == %{"id" => string} and %{string => any | nil}
    - etc.
    Don't know how to represent them and make them ordered.
  - node cache:
    I don't yet know what it should contain, I suspect ids of nodes (TDDs) after reduction.
    This way a comparison between 2 types is just a pointer (id) check in the node cache.
    But not yet sure.
  - reduction rules: not sure how to approach them
  """

  def node(elem, yes, no, dc = _dont_care) do
  end

  def sum(one, two) do
  end

  def intersect(one, two) do
  end

  def negate(one, two) do
  end
end
```

Please help me refine the theory for implementing set-theoretic types with this

========================================================================================================================

Okay, this is a fascinating and ambitious project! Building a TDD-based type system like CDuce is a significant undertaking. Let's refine the theory.

You've got the core ideas right: terminals, variable nodes, and the need for ordering and reduction. The `dc` (don't care) child is the main differentiator from BDDs and key to TDDs' power in certain applications.

Let's break down your working notes and expand on them.

**1. Structure of Ordered Variables (The Core Challenge)**

Your intuition to have a single TDD for all types is interesting. CDuce uses one TDD *per domain* (atoms, integers, etc.) and then a `descr` (descriptor) which is a union of these. A single TDD means your variable ordering must be global and able to discriminate across all these domains first, then within them.

**Proposed Hierarchical Variable Ordering:**

The variables in your TDD will represent *predicates* about a value. The order is crucial.

* **Level 0: Base Type Discriminators:** These variables test the fundamental Erlang/Elixir type of a value.
    * `is_atom?`
    * `is_integer?`
    * `is_float?`
    * `is_string?` (Elixir strings are UTF-8 binaries)
    * `is_binary?` (General binaries)
    * `is_list?`
    * `is_tuple?`
    * `is_map?`
    * `is_function?`
    * `is_pid?`
    * `is_reference?`
    * *Order*: Define a canonical order for these (e.g., alphabetical, or by commonality). Let's say `is_atom` < `is_binary` < `is_float` < ...
* **Level 1+: Type-Specific Predicates:** Once a base type is determined (e.g., the `yes` branch of `is_integer?` is taken), subsequent variables test properties specific to that type.
    * **For Atoms:**
        * `value_is(specific_atom)`: e.g., `value_is(true)`, `value_is(false)`, `value_is(nil)`, `value_is(:foo)`.
        * *Order*: Canonical order of the atoms themselves (e.g., `false` < `nil` < `true` < other atoms alphabetically).
        * This implies a potentially large number of variables if you want to distinguish many specific atoms. Often, you care about a few (booleans, nil) and "any other atom".
    * **For Integers:**
        * `value < N`, `value == N` (less common as a direct variable, usually derived), `value_in_range(A,B)`.
        * CDuce uses a finite partitioning of the integer line. For example, variables could be `value < 0`, `value < 10`, `value < MAX_INT_SMALL_ENOUGH_FOR_OPTIMIZATION`.
        * *Order*: `value < N1` < `value < N2` if `N1 < N2`.
    * **For Strings/Binaries:**
        * `length_is(L)` or `length_in_range(L1,L2)`
        * `prefix_is(prefix_string)`
        * `matches_regex(regex_pattern)` (Regexes can be complex to integrate directly as simple TDD variables due to their expressive power. CDuce has specific handling for regular expression types on sequences).
        * *Order*: By length, then by prefix lexicographically, etc.
    * **For Tuples:**
        * `arity_is(N)`
        * `element_at(Index)_is_type(TypeRef)`: This is recursive. `TypeRef` would be the ID of another TDD representing the type of the element.
        * *Order*: `arity_is(0)` < `arity_is(1)` < ... then `element_at(0)_is_type(T)` < `element_at(1)_is_type(T')`. The `TypeRef` itself would need to be part of the variable ordering if you allow different type tests for the same element index.
    * **For Maps:** This is the most complex.
        * `has_key(KeyAtomOrString)`
        * `value_for_key(Key)_is_type(TypeRef)`
        * `all_keys_are_type(KeyTypeRef)`
        * `all_values_are_type(ValueTypeRef)`
        * CDuce has specialized structures for map patterns. You might need variables like `matches_map_pattern(PatternID)`.
        * *Order*: By key name, then by tests on those keys.
    * **For Functions:**
        * `arity_is(N)`
        * `accepts_args_types(ArgTypesTupleRef)`
        * `returns_type(ReturnTypeRef)`
        * *Order*: By arity, then by argument types, then return type.

**The "Variable" Itself:**

A "variable" in your TDD node won't be just a name; it'll be an identifier that maps to a specific predicate function and its parameters. E.g., `var_id = 101` might mean `fn val -> is_integer(val) and val < 10 end`. You'll need a global, ordered registry of these predicate variables.

**2. Node Cache (`unique_table`)**

Yes, this is crucial for the "reduced" property.

* It's a hash table mapping `(variable_id, yes_node_id, no_node_id, dc_node_id)` to a unique `node_id`.
* **Terminals:**
    * `true_node_id` (e.g., 1): Represents the "any" type or universal set.
    * `false_node_id` (e.g., 0): Represents the "empty" type or bottom type.

  These are the leaves of your TDD.
* When you create a node, you first check this table. If an identical node (same variable and children) exists, you reuse its ID. Otherwise, you create a new one, assign it a new ID, and store it.
* This ensures that any isomorphic subgraphs are merged (Reduction Rule 1).

**3. Reduction Rules**

You've listed the ROBDD rules. Let's adapt them for TDDs:

* **Merge any isomorphic subgraphs:** Handled by the `unique_table` as described above. Two subgraphs are isomorphic if they are structurally identical (same variables in same places leading to same terminals or further isomorphic subgraphs).
* **Eliminate any node whose three children are isomorphic:** If `yes_id == no_id == dc_id` for a variable node `v`, then the test `v` is irrelevant for distinguishing the type further down this path. The node `(v, child_id, child_id, child_id)` should be replaced by `child_id`. This rule is applied *before* looking up/inserting into the `unique_table`.

**The Role of `dc` (Don't Care):**

The `dc` child is what makes it a TDD.
It typically means the variable being tested is irrelevant or its value doesn't affect membership for a particular path or operation.

* **In Type Representation:**
    * If you're representing the type "integer", and your current variable is `is_atom?`:
        * `yes` child would be `false_node_id` (an integer is not an atom).
        * `no` child would lead to further tests for integers (or `true_node_id` if "integer" is all you're specifying).
        * `dc` child: This is tricky for pure representation. If the variable `is_atom?` *must* be evaluated, then `dc` might also be `false_node_id`.
    * However, `dc` becomes powerful when combining types or dealing with partially specified types. If a type expression doesn't constrain variable `v`, its `dc` branch relative to `v` might be taken.
    * Consider the type `any`. For any variable `v`, the node would be `(v, true_node_id, true_node_id, true_node_id)`. By reduction rule 2, this simplifies to just `true_node_id`.
    * Consider the type `none` (empty set). For any variable `v`, node is `(v, false_node_id, false_node_id, false_node_id)`, reduces to `false_node_id`.
* **In Operations (`sum`, `intersect`, `negate`):** The `dc` child is heavily used in the `apply` algorithm. When `apply(op, tdd1, tdd2)`:
    * If `var(tdd1) == var(tdd2)`: Recurse on `(op, tdd1.yes, tdd2.yes)`, `(op, tdd1.no, tdd2.no)`, `(op, tdd1.dc, tdd2.dc)`.
    * If `var(tdd1) < var(tdd2)` (in global order): `tdd2` doesn't explicitly test `var(tdd1)`. So, `var(tdd1)` is effectively "don't care" for `tdd2`. The operation proceeds by testing `var(tdd1)`:
        * `yes_child = apply(op, tdd1.yes, tdd2)`
        * `no_child = apply(op, tdd1.no, tdd2)`
        * `dc_child = apply(op, tdd1.dc, tdd2)` (This is where TDD differs significantly from BDD's apply).

      The new node is `mknode(var(tdd1), yes_child, no_child, dc_child)`.
    * Symmetric case for `var(tdd2) < var(tdd1)`.

**4. `mknode(variable_id, yes_id, no_id, dc_id)` function:**

This is your core constructor.
```elixir
defmodule Tdd do
  # Node IDs are integers; 0 and 1 are reserved for the terminals.
  # Shared state (held in an Agent here purely for simplicity; a GenServer
  # or an ETS table would be the usual choice in a real system):
  #   nodes:        %{node_id => {var_id, yes_id, no_id, dc_id}}
  #   unique_table: %{{var_id, yes_id, no_id, dc_id} => node_id}
  # Variable definitions would live in a separate registry, e.g.
  #   %{var_id => {predicate_fun, predicate_params, global_order_index}}

  @true_id 1
  @false_id 0

  # Example variable IDs (you'd have a system for this)
  # @var_is_atom 10
  # @var_is_integer 11
  # @var_int_lt_10 20

  # Must be called once before node/4 is used.
  def start_link do
    initial = %{nodes: %{}, unique_table: %{}, next_id: 2}
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  def get_true(), do: @true_id
  def get_false(), do: @false_id

  # mknode, clause 1. Reduction Rule 2: if all three children are the same,
  # this variable provides no discrimination for this path, so the node is
  # redundant and the child is returned directly.
  def node(_variable_id, child_id, child_id, child_id), do: child_id

  # mknode, clause 2. Reduction Rule 1: consult the unique table so that
  # structurally identical nodes always share one ID.
  def node(variable_id, yes_id, no_id, dc_id) do
    entry = {variable_id, yes_id, no_id, dc_id}

    Agent.get_and_update(__MODULE__, fn state ->
      case Map.fetch(state.unique_table, entry) do
        {:ok, existing_id} ->
          {existing_id, state}

        :error ->
          new_id = state.next_id
          nodes = Map.put(state.nodes, new_id, entry)
          unique_table = Map.put(state.unique_table, entry, new_id)
          {new_id, %{state | nodes: nodes, unique_table: unique_table, next_id: new_id + 1}}
      end
    end)
  end
end
```

**5. Operations (`sum`, `intersect`, `negate`)**

These are built using a generic `apply` function, often memoized (using a `computed_table`).

`apply(op_lambda, tdd1_id, tdd2_id)`:

1. **Handle terminal cases:**
    * `op_lambda.(@false_id, @false_id)`, `op_lambda.(@false_id, @true_id)`, etc.
    * Some ops have identities/annihilators:
        * Union: `T | true = true`, `T | false = T`.
        * Intersection: `T & false = false`, `T & true = T`.
2. **Check `computed_table`:** `cache_key = {op_lambda_id, tdd1_id, tdd2_id}`.
   If result exists, return it.
3. **Recursive step:**
    * Fetch node data for `tdd1_id` and `tdd2_id`. Let them be `N1`, `N2`.
    * Let `v1 = N1.variable_id`, `v2 = N2.variable_id`.
    * Compare `v1` and `v2` based on global variable order.
    * **If `v1 == v2`**:
        * `res_yes = apply(op_lambda, N1.yes, N2.yes)`
        * `res_no = apply(op_lambda, N1.no, N2.no)`
        * `res_dc = apply(op_lambda, N1.dc, N2.dc)`
        * `result = mknode(v1, res_yes, res_no, res_dc)`
    * **If `v1 < v2` (in order)**:
        * `res_yes = apply(op_lambda, N1.yes, tdd2_id)`
        * `res_no = apply(op_lambda, N1.no, tdd2_id)`
        * `res_dc = apply(op_lambda, N1.dc, tdd2_id)`
        * `result = mknode(v1, res_yes, res_no, res_dc)`
    * **If `v1 > v2` (in order)**: (symmetric to above)
        * `res_yes = apply(op_lambda, tdd1_id, N2.yes)`
        * `res_no = apply(op_lambda, tdd1_id, N2.no)`
        * `res_dc = apply(op_lambda, tdd1_id, N2.dc)`
        * `result = mknode(v2, res_yes, res_no, res_dc)`
4. Store `result` in `computed_table` for `cache_key`.
5. Return `result`.

**`negate(tdd_id)`:**

This is simpler, an `apply_unary` operation.

1. If `tdd_id == @true_id`, return `@false_id`.
2. If `tdd_id == @false_id`, return `@true_id`.
3. Check `computed_table_for_negation`.
4. Fetch node `N`.
    * `res_yes = negate(N.yes)`
    * `res_no = negate(N.no)`
    * `res_dc = negate(N.dc)`
    * `result = mknode(N.variable_id, res_yes, res_no, res_dc)`
5. Store and return `result`.

**Concrete Example: Type `atom(:foo) | atom(:bar)`**

Variables (simplified order): `is_atom`, `atom_is_foo`, `atom_is_bar`.

1. TDD for `atom(:foo)`:
    * `v_is_atom = mknode(:is_atom, v_atom_is_foo, @false_id, @false_id)`
    * `v_atom_is_foo = mknode(:atom_is_foo, @true_id, @false_id, @false_id)`

   (Assuming if not atom, or atom but not foo, it's false. `dc` is false as these tests are always relevant here).
2. TDD for `atom(:bar)`:
    * `v_is_atom_2 = mknode(:is_atom, v_atom_is_bar, @false_id, @false_id)`
    * `v_atom_is_bar = mknode(:atom_is_bar, @true_id, @false_id, @false_id)`

Now, `sum(tdd_foo, tdd_bar)` using logical OR (`||`):

Top variable is `:is_atom` for both.
* `res_yes = apply(||, v_atom_is_foo, v_atom_is_bar)`
* `res_no = apply(||, @false_id, @false_id)` -> `@false_id`
* `res_dc = apply(||, @false_id, @false_id)` -> `@false_id`

Consider `apply(||, v_atom_is_foo, v_atom_is_bar)`:

Variables are `:atom_is_foo` and `:atom_is_bar`. Assume `:atom_is_foo < :atom_is_bar`. Current var is `:atom_is_foo` (from `v_atom_is_foo`).

* `ry = apply(||, v_atom_is_foo.yes, v_atom_is_bar)` `= apply(||, @true_id, v_atom_is_bar)` -> `@true_id` (since `true || X = true`)
* `rn = apply(||, v_atom_is_foo.no, v_atom_is_bar)` `= apply(||, @false_id, v_atom_is_bar)` -> `v_atom_is_bar` (since `false || X = X`)
* `rd = apply(||, v_atom_is_foo.dc, v_atom_is_bar)` `= apply(||, @false_id, v_atom_is_bar)` -> `v_atom_is_bar`

So, `res_yes = mknode(:atom_is_foo, @true_id, v_atom_is_bar, v_atom_is_bar)`.

This node effectively means: "If it's :foo, then true. Else (if not :foo), then it depends on whether it's :bar (from `v_atom_is_bar`). If :atom_is_foo is don't care, it also depends on :bar."

A reduction rule for `no_id == dc_id` isn't standard for BDDs but might make sense for TDDs if "not X" and "X is irrelevant" lead to the same sub-condition. CDuce's BDDs don't have this `dc` child directly in nodes; it's more implicit in their algorithm. If we stick to standard ROBDD `apply` ideas slightly modified for three children, `dc` handling when variables differ is key.

**Key Data Structures to Manage:**

1. **Node Storage:** A list, array, or map to store the actual nodes. An ID is an index/key into this.
    * `nodes = %{0 => {:terminal, :false}, 1 => {:terminal, :true}, 2 => {:node, var_id1, yes2, no2, dc2}, ...}`
2. **Unique Table:** `(variable_id, yes_id, no_id, dc_id) -> node_id`. This is for structural sharing and reduction rule 1.
    * `unique_table = %{{var_id1, yes2, no2, dc2} => 2, ...}`
3. **Variable Definitions:** A way to map `variable_id` to its actual predicate logic and its position in the global order.
    * `variables = %{var_id1 => %{name: :is_atom, order_idx: 0, predicate: &is_atom/1}, ...}`
4. **Computed Tables (Memoization for Operations):**
    * For binary ops: `{(op_name, id1, id2) -> result_id}`
    * For unary ops: `{(op_name, id1) -> result_id}`

**Summary of Refinements:**

1. **Global Hierarchical Variable Ordering:** Discriminate base types first, then type-specific properties. All variables (predicates) need a unique ID and a place in this global order.
2. **`mknode` is Central:** It implements reduction rule 2 (all children same -> return child) and uses the `unique_table` (reduction rule 1).
3. **`apply` Algorithm:** This is the workhorse for `sum`, `intersect`, etc. Its logic for handling cases where variables differ across operands is where the TDD structure (and `dc` child) truly shows its behavior. The `dc` child of a node `N1=(v,y,n,d)` would be used in `apply(op, N1.dc, N2)` if `v < var(N2)` and the `dc` path for `v` is taken.
4. **`dc` Child Interpretation:** In the `apply` algorithm, if `var(T1) < var(T2)`, `T2` doesn't test `var(T1)`. So, `T2` is passed "as is" when considering `T1.yes`, `T1.no`, and `T1.dc`. The resulting node for `var(T1)` will have its children determined by these recursive calls. The `dc` child in a node structure `(v, Y, N, D)` represents the type if the predicate `v` is "skipped" or considered irrelevant for that path.
5. **State Management:** You'll need a robust way to manage the shared state of nodes, unique table, and computed tables (e.g., a GenServer or ETS tables).

This is a complex system. Start simple: implement the `mknode` with unique table for just atoms and booleans. Then add `negate`. Then `sum` or `intersect`. Gradually add more complex variable types.

Good luck!
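
As a possible starting point for that "start simple" step, here is a minimal sketch of `mknode`, `sum` and `negate` as described above. Several things in it are assumptions made only for illustration: the module name `TddSketch`, passing the store explicitly instead of using a GenServer or ETS table, using plain integers as variable IDs (ordered with `<`), and leaving out the `computed_table` memoization. It replays the `atom(:foo) | atom(:bar)` example with `0 = is_atom`, `1 = atom_is_foo`, `2 = atom_is_bar`.

```elixir
defmodule TddSketch do
  # Hypothetical sketch: the store is passed explicitly instead of living in
  # a GenServer/ETS table, and variables are integers ordered by `<`.
  @false_id 0
  @true_id 1

  def new, do: %{nodes: %{}, unique: %{}, next_id: 2}

  # Reduction rule 2: all three children identical, so the test is redundant.
  def mknode(store, _var, child, child, child), do: {child, store}

  # Reduction rule 1: reuse an existing node with the same variable and children.
  def mknode(store, var, yes, no, dc) do
    key = {var, yes, no, dc}

    case Map.fetch(store.unique, key) do
      {:ok, id} ->
        {id, store}

      :error ->
        id = store.next_id
        nodes = Map.put(store.nodes, id, key)
        unique = Map.put(store.unique, key, id)
        {id, %{store | nodes: nodes, unique: unique, next_id: id + 1}}
    end
  end

  # Union: terminal shortcuts first, then the three-way recursion.
  def sum(store, @true_id, _b), do: {@true_id, store}
  def sum(store, _a, @true_id), do: {@true_id, store}
  def sum(store, @false_id, b), do: {b, store}
  def sum(store, a, @false_id), do: {a, store}

  def sum(store, a, b) do
    {va, ya, na, da} = Map.fetch!(store.nodes, a)
    {vb, yb, nb, db} = Map.fetch!(store.nodes, b)

    cond do
      va == vb -> sum_children(store, va, {ya, yb}, {na, nb}, {da, db})
      va < vb -> sum_children(store, va, {ya, b}, {na, b}, {da, b})
      true -> sum_children(store, vb, {a, yb}, {a, nb}, {a, db})
    end
  end

  defp sum_children(store, var, {y1, y2}, {n1, n2}, {d1, d2}) do
    {yes, store} = sum(store, y1, y2)
    {no, store} = sum(store, n1, n2)
    {dc, store} = sum(store, d1, d2)
    mknode(store, var, yes, no, dc)
  end

  # Negation: flip the terminals and rebuild every node on the way up.
  def negate(store, @true_id), do: {@false_id, store}
  def negate(store, @false_id), do: {@true_id, store}

  def negate(store, id) do
    {var, y, n, d} = Map.fetch!(store.nodes, id)
    {yes, store} = negate(store, y)
    {no, store} = negate(store, n)
    {dc, store} = negate(store, d)
    mknode(store, var, yes, no, dc)
  end
end

# Variables: 0 = is_atom, 1 = atom_is_foo, 2 = atom_is_bar.
store = TddSketch.new()
{foo_leaf, store} = TddSketch.mknode(store, 1, 1, 0, 0)
{foo, store} = TddSketch.mknode(store, 0, foo_leaf, 0, 0)
{bar_leaf, store} = TddSketch.mknode(store, 2, 1, 0, 0)
{bar, store} = TddSketch.mknode(store, 0, bar_leaf, 0, 0)
{union, store} = TddSketch.sum(store, foo, bar)
IO.inspect(Map.fetch!(store.nodes, union), label: "atom(:foo) | atom(:bar)")
```

Threading the store through every call keeps the sketch free of processes and easy to test; once the semantics settle, the same functions could sit behind a GenServer or ETS table, with a memo table added to `sum` and `negate`.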