elipl/g1

```
defmodule Tdd do
  @moduledoc """
  Ternary decision diagram, used for representing set-theoritic types, akin to cduce.
  There are 2 types of nodes:
  - terminal nodes (true, false)
  - variable nodes

  variable nodes consist of:
  - the variable being tested
  - yes: id of the node if the result of the test is true
  - no: id of the node if the result of the test is false
  - dc: id of the node if the result of the test is irrelevant for the current operation

  the TDD needs to be ordered and reduced (ROBDD)
  - 'ordered' if different variables appear in the same order on all paths from the root.
  - 'reduced' if the following two rules have been applied to its graph:
    - Merge any isomorphic subgraphs.
    - Eliminate any node whose two children are isomorphic.

  Working notes:
  - structure of the ordered variables:
    Im thinking of structuring all possible types inside 1 TDD, in contrast to cduce, which uses a `desrc` structure that contains several TDDs (one for each domain, like ints, atoms, functions, etc.), and descr is a union between them.
    For this, I need to come up with a variable structure that'll be ordered.
    My set types will need to represent types like: atoms, strings, ints, maps, tuples, functions, kinds?
    Moreso, those types themselves consist of smaller subsets of types like:
    - int < 10
    - int in [1, 2, 3]
    - string > "prefix_"
    - atom == false
    - atom == false or atom == true or atom == nil
    - map == %{"id" => string} and %{string => any | nil}
    - etc.
    Dont know how to represent them and make them ordered.
  - node cache:
     I don't yet know what it should contain, I suspect ids of nodes (TDDs) after reduction. This way a comparison between 2 types is just a pointer (id) check in the node cache. But not yet sure.
  - reduction rules: not sure how to approach them

  """

  def node(elem, yes, no, dc = _dont_care) do
  end

  def sum(one, two) do
  end

  def intersect(one, two) do
  end

  def negate(one, two) do
  end
end
```
Please help me refine the theory for impementing set-theoritic types with this


========================================================================================================================


Okay, this is a fascinating and ambitious project! Building a TDD-based type system like CDuce is a significant undertaking. Let's refine the theory.

You've got the core ideas right: terminals, variable nodes, and the need for ordering and reduction. The `dc` (don't care) child is the main differentiator from BDDs and key to TDDs' power in certain applications.

Let's break down your working notes and expand on them.

**1. Structure of Ordered Variables (The Core Challenge)**

Your intuition to have a single TDD for all types is interesting. CDuce uses one TDD *per domain* (atoms, integers, etc.) and then a `descr` (descriptor) which is a union of these. A single TDD means your variable ordering must be global and able to discriminate across all these domains first, then within them.

**Proposed Hierarchical Variable Ordering:**

The variables in your TDD will represent *predicates* about a value. The order is crucial.

*   **Level 0: Base Type Discriminators:** These variables test the fundamental Erlang/Elixir type of a value.
    *   `is_atom?`
    *   `is_integer?`
    *   `is_float?`
    *   `is_string?` (Elixir strings are UTF-8 binaries)
    *   `is_binary?` (General binaries)
    *   `is_list?`
    *   `is_tuple?`
    *   `is_map?`
    *   `is_function?`
    *   `is_pid?`
    *   `is_reference?`
    *   *Order*: Define a canonical order for these (e.g., alphabetical, or by commonality). Let's say `is_atom` < `is_binary` < `is_float` < ...

*   **Level 1+: Type-Specific Predicates:** Once a base type is determined (e.g., the `yes` branch of `is_integer?` is taken), subsequent variables test properties specific to that type.

    *   **For Atoms:**
        *   `value_is(specific_atom)`: e.g., `value_is(true)`, `value_is(false)`, `value_is(nil)`, `value_is(:foo)`.
        *   *Order*: Canonical order of the atoms themselves (e.g., `false` < `nil` < `true` < other atoms alphabetically).
        *   This implies a potentially large number of variables if you want to distinguish many specific atoms. Often, you care about a few (booleans, nil) and "any other atom".

    *   **For Integers:**
        *   `value < N`, `value == N` (less common as a direct variable, usually derived), `value_in_range(A,B)`.
        *   CDuce uses a finite partitioning of the integer line. For example, variables could be `value < 0`, `value < 10`, `value < MAX_INT_SMALL_ENOUGH_FOR_OPTIMIZATION`.
        *   *Order*: `value < N1` < `value < N2` if `N1 < N2`.

    *   **For Strings/Binaries:**
        *   `length_is(L)` or `length_in_range(L1,L2)`
        *   `prefix_is(prefix_string)`
        *   `matches_regex(regex_pattern)` (Regexes can be complex to integrate directly as simple TDD variables due to their expressive power. CDuce has specific handling for regular expression types on sequences).
        *   *Order*: By length, then by prefix lexicographically, etc.

    *   **For Tuples:**
        *   `arity_is(N)`
        *   `element_at(Index)_is_type(TypeRef)`: This is recursive. `TypeRef` would be the ID of another TDD representing the type of the element.
        *   *Order*: `arity_is(0)` < `arity_is(1)` < ... then `element_at(0)_is_type(T)` < `element_at(1)_is_type(T')`. The `TypeRef` itself would need to be part of the variable ordering if you allow different type tests for the same element index.

    *   **For Maps:** This is the most complex.
        *   `has_key(KeyAtomOrString)`
        *   `value_for_key(Key)_is_type(TypeRef)`
        *   `all_keys_are_type(KeyTypeRef)`
        *   `all_values_are_type(ValueTypeRef)`
        *   CDuce has specialized structures for map patterns. You might need variables like `matches_map_pattern(PatternID)`.
        *   *Order*: By key name, then by tests on those keys.

    *   **For Functions:**
        *   `arity_is(N)`
        *   `accepts_args_types(ArgTypesTupleRef)`
        *   `returns_type(ReturnTypeRef)`
        *   *Order*: By arity, then by argument types, then return type.

**The "Variable" Itself:**
A "variable" in your TDD node won't be just a name; it'll be an identifier that maps to a specific predicate function and its parameters. E.g., `var_id = 101` might mean `fn val -> is_integer(val) and val < 10 end`. You'll need a global, ordered registry of these predicate variables.

**2. Node Cache (`unique_table`)**

Yes, this is crucial for the "reduced" property.
*   It's a hash table mapping `(variable_id, yes_node_id, no_node_id, dc_node_id)` to a unique `node_id`.
*   **Terminals:**
    *   `true_node_id` (e.g., 1): Represents the "any" type or universal set.
    *   `false_node_id` (e.g., 0): Represents the "empty" type or bottom type.
    These are the leaves of your TDD.
*   When you create a node, you first check this table. If an identical node (same variable and children) exists, you reuse its ID. Otherwise, you create a new one, assign it a new ID, and store it.
*   This ensures that any isomorphic subgraphs are merged (Reduction Rule 1).

**3. Reduction Rules**

You've listed the ROBDD rules. Let's adapt them for TDDs:

*   **Merge any isomorphic subgraphs:** Handled by the `unique_table` as described above. Two subgraphs are isomorphic if they are structurally identical (same variables in same places leading to same terminals or further isomorphic subgraphs).
*   **Eliminate any node whose three children are isomorphic:** If `yes_id == no_id == dc_id` for a variable node `v`, then the test `v` is irrelevant for distinguishing the type further down this path. The node `(v, child_id, child_id, child_id)` should be replaced by `child_id`. This rule is applied *before* looking up/inserting into the `unique_table`.

**The Role of `dc` (Don't Care):**

The `dc` child is what makes it a TDD. It typically means the variable being tested is irrelevant or its value doesn't affect membership for a particular path or operation.

*   **In Type Representation:**
    *   If you're representing the type "integer", and your current variable is `is_atom?`:
        *   `yes` child would be `false_node_id` (an integer is not an atom).
        *   `no` child would lead to further tests for integers (or `true_node_id` if "integer" is all you're specifying).
        *   `dc` child: This is tricky for pure representation. If the variable `is_atom?` *must* be evaluated, then `dc` might also be `false_node_id`.
    *   However, `dc` becomes powerful when combining types or dealing with partially specified types. If a type expression doesn't constrain variable `v`, its `dc` branch relative to `v` might be taken.
    *   Consider the type `any`. For any variable `v`, the node would be `(v, true_node_id, true_node_id, true_node_id)`. By reduction rule 2, this simplifies to just `true_node_id`.
    *   Consider the type `none` (empty set). For any variable `v`, node is `(v, false_node_id, false_node_id, false_node_id)`, reduces to `false_node_id`.

*   **In Operations (`sum`, `intersect`, `negate`):** The `dc` child is heavily used in the `apply` algorithm.
    When `apply(op, tdd1, tdd2)`:
    *   If `var(tdd1) == var(tdd2)`: Recurse on `(op, tdd1.yes, tdd2.yes)`, `(op, tdd1.no, tdd2.no)`, `(op, tdd1.dc, tdd2.dc)`.
    *   If `var(tdd1) < var(tdd2)` (in global order): `tdd2` doesn't explicitly test `var(tdd1)`. So, `var(tdd1)` is effectively "don't care" for `tdd2`.
        The operation proceeds by testing `var(tdd1)`:
        *   `yes_child = apply(op, tdd1.yes, tdd2)`
        *   `no_child = apply(op, tdd1.no, tdd2)`
        *   `dc_child = apply(op, tdd1.dc, tdd2)` (This is where TDD differs significantly from BDD's apply).
        The new node is `mknode(var(tdd1), yes_child, no_child, dc_child)`.
    *   Symmetric case for `var(tdd2) < var(tdd1)`.

**4. `mknode(variable_id, yes_id, no_id, dc_id)` function:**

This is your core constructor.

```elixir
defmodule Tdd do
  # Store nodes in an array/map, IDs are indices/keys
  # @nodes %{0 => :false_terminal, 1 => :true_terminal, ...}
  # @unique_table %{{var_id, yes_id, no_id, dc_id} => node_id}
  # @variables %{var_id => {predicate_fun, predicate_params, global_order_index}}

  @true_id 1
  @false_id 0

  # Internal state for unique table, nodes, variable definitions
  # This would typically be in a GenServer or an ETS table in a real system
  # For simplicity, let's imagine it's passed around or module attributes are used carefully.

  # Example variable IDs (you'd have a system for this)
  # @var_is_atom 10
  # @var_is_integer 11
  # @var_int_lt_10 20

  def get_true(), do: @true_id
  def get_false(), do: @false_id

  # Pseudo-code for mknode
  def node(variable_id, yes_id, no_id, dc_id) do
    # Reduction Rule 2: Eliminate node whose children are isomorphic
    if yes_id == no_id and no_id == dc_id do
      # If all children are the same, this variable provides no discrimination
      # for this path. The node is redundant.
      return yes_id
    end

    # Check unique_table (conceptually)
    # entry = {variable_id, yes_id, no_id, dc_id}
    # if Map.has_key?(@unique_table, entry) do
    #   return @unique_table[entry]
    # else
    #   new_node_id = generate_new_id()
    #   @nodes = Map.put(@nodes, new_node_id, {variable_id, yes_id, no_id, dc_id})
    #   @unique_table = Map.put(@unique_table, entry, new_node_id)
    #   return new_node_id
    # end
    # Actual implementation would use a GenServer or ETS for state
    :not_implemented_yet # Placeholder
  end
end
```

**5. Operations (`sum`, `intersect`, `negate`)**

These are built using a generic `apply` function, often memoized (using a `computed_table`).

`apply(op_lambda, tdd1_id, tdd2_id)`:
1.  **Handle terminal cases:**
    *   `op_lambda.(@false_id, @false_id)`, `op_lambda.(@false_id, @true_id)`, etc.
    *   Some ops have identities/annihilators:
        *   Union: `T | true = true`, `T | false = T`.
        *   Intersection: `T & false = false`, `T & true = T`.
2.  **Check `computed_table`:** `cache_key = {op_lambda_id, tdd1_id, tdd2_id}`. If result exists, return it.
3.  **Recursive step:**
    *   Fetch node data for `tdd1_id` and `tdd2_id`. Let them be `N1`, `N2`.
    *   Let `v1 = N1.variable_id`, `v2 = N2.variable_id`.
    *   Compare `v1` and `v2` based on global variable order.
        *   **If `v1 == v2`**:
            `res_yes = apply(op_lambda, N1.yes, N2.yes)`
            `res_no  = apply(op_lambda, N1.no, N2.no)`
            `res_dc  = apply(op_lambda, N1.dc, N2.dc)`
            `result = mknode(v1, res_yes, res_no, res_dc)`
        *   **If `v1 < v2` (in order)**:
            `res_yes = apply(op_lambda, N1.yes, tdd2_id)`
            `res_no  = apply(op_lambda, N1.no, tdd2_id)`
            `res_dc  = apply(op_lambda, N1.dc, tdd2_id)`
            `result = mknode(v1, res_yes, res_no, res_dc)`
        *   **If `v1 > v2` (in order)**: (symmetric to above)
            `res_yes = apply(op_lambda, tdd1_id, N2.yes)`
            `res_no  = apply(op_lambda, tdd1_id, N2.no)`
            `res_dc  = apply(op_lambda, tdd1_id, N2.dc)`
            `result = mknode(v2, res_yes, res_no, res_dc)`
4.  Store `result` in `computed_table` for `cache_key`.
5.  Return `result`.

**`negate(tdd_id)`:**
This is simpler, an `apply_unary` operation.
1.  If `tdd_id == @true_id`, return `@false_id`.
2.  If `tdd_id == @false_id`, return `@true_id`.
3.  Check `computed_table_for_negation`.
4.  Fetch node `N`.
    `res_yes = negate(N.yes)`
    `res_no  = negate(N.no)`
    `res_dc  = negate(N.dc)`
    `result = mknode(N.variable_id, res_yes, res_no, res_dc)`
5.  Store and return `result`.

**Concrete Example: Type `atom(:foo) | atom(:bar)`**

Variables (simplified order): `is_atom`, `atom_is_foo`, `atom_is_bar`.
1.  TDD for `atom(:foo)`:
    *   `v_is_atom = mknode(:is_atom, v_atom_is_foo, @false_id, @false_id)`
    *   `v_atom_is_foo = mknode(:atom_is_foo, @true_id, @false_id, @false_id)`
    (Assuming if not atom, or atom but not foo, it's false. `dc` is false as these tests are always relevant here).
2.  TDD for `atom(:bar)`:
    *   `v_is_atom_2 = mknode(:is_atom, v_atom_is_bar, @false_id, @false_id)`
    *   `v_atom_is_bar = mknode(:atom_is_bar, @true_id, @false_id, @false_id)`

Now, `sum(tdd_foo, tdd_bar)` using logical OR (`||`):
Top variable is `:is_atom` for both.
`res_yes = apply(||, v_atom_is_foo, v_atom_is_bar)`
`res_no  = apply(||, @false_id, @false_id)` -> `@false_id`
`res_dc  = apply(||, @false_id, @false_id)` -> `@false_id`

Consider `apply(||, v_atom_is_foo, v_atom_is_bar)`:
Variables are `:atom_is_foo` and `:atom_is_bar`. Assume `:atom_is_foo < :atom_is_bar`.
Current var is `:atom_is_foo` (from `v_atom_is_foo`).
`ry = apply(||, v_atom_is_foo.yes, v_atom_is_bar)`
     `= apply(||, @true_id, v_atom_is_bar)` -> `@true_id` (since `true || X = true`)
`rn = apply(||, v_atom_is_foo.no, v_atom_is_bar)`
     `= apply(||, @false_id, v_atom_is_bar)` -> `v_atom_is_bar` (since `false || X = X`)
`rd = apply(||, v_atom_is_foo.dc, v_atom_is_bar)`
     `= apply(||, @false_id, v_atom_is_bar)` -> `v_atom_is_bar`

So, `res_yes = mknode(:atom_is_foo, @true_id, v_atom_is_bar, v_atom_is_bar)`.
This node effectively means: "If it's :foo, then true. Else (if not :foo), then it depends on whether it's :bar (from `v_atom_is_bar`). If :atom_is_foo is don't care, it also depends on :bar."
The reduction rule for `no_id == dc_id` isn't standard for BDDs but might make sense for TDDs if "not X" and "X is irrelevant" lead to the same sub-condition. CDuce's BDDs don't have this `dc` child directly in nodes, it's more implicit in their algorithm.
If we stick to standard ROBDD `apply` ideas slightly modified for three children, `dc` handling when variables differ is key.

**Key Data Structures to Manage:**

1.  **Node Storage:** A list, array, or map to store the actual nodes. An ID is an index/key into this.
    *   `nodes = %{0 => {:terminal, :false}, 1 => {:terminal, :true}, 2 => {:node, var_id1, yes2, no2, dc2}, ...}`
2.  **Unique Table:** `(variable_id, yes_id, no_id, dc_id) -> node_id`. This is for structural sharing and reduction rule 1.
    *   `unique_table = %{{var_id1, yes2, no2, dc2} => 2, ...}`
3.  **Variable Definitions:** A way to map `variable_id` to its actual predicate logic and its position in the global order.
    *   `variables = %{var_id1 => %{name: :is_atom, order_idx: 0, predicate: &is_atom/1}, ...}`
4.  **Computed Tables (Memoization for Operations):**
    *   For binary ops: `{(op_name, id1, id2) -> result_id}`
    *   For unary ops: `{(op_name, id1) -> result_id}`

**Summary of Refinements:**

1.  **Global Hierarchical Variable Ordering:** Discriminate base types first, then type-specific properties. All variables (predicates) need a unique ID and a place in this global order.
2.  **`mknode` is Central:** It implements reduction rule 2 (all children same -> return child) and uses the `unique_table` (reduction rule 1).
3.  **`apply` Algorithm:** This is the workhorse for `sum`, `intersect`, etc. Its logic for handling cases where variables differ across operands is where the TDD structure (and `dc` child) truly shows its behavior. The `dc` child of a node `N1=(v,y,n,d)` would be used in `apply(op, N1.dc, N2)` if `v < var(N2)` and the `dc` path for `v` is taken.
4.  **`dc` Child Interpretation:** In the `apply` algorithm, if `var(T1) < var(T2)`, `T2` doesn't test `var(T1)`. So, `T2` is passed "as is" when considering `T1.yes`, `T1.no`, and `T1.dc`. The resulting node for `var(T1)` will have its children determined by these recursive calls. The `dc` child in a node structure `(v, Y, N, D)` represents the type if the predicate `v` is "skipped" or considered irrelevant for that path.
5.  **State Management:** You'll need a robust way to manage the shared state of nodes, unique table, and computed tables (e.g., a GenServer or ETS tables).

This is a complex system. Start simple: implement the `mknode` with unique table for just atoms and booleans. Then add `negate`. Then `sum` or `intersect`. Gradually add more complex variable types. Good luck!