319 lines
18 KiB
Plaintext
319 lines
18 KiB
Plaintext
```
|
|
defmodule Tdd do
|
|
@moduledoc """
|
|
Ternary decision diagram, used for representing set-theoritic types, akin to cduce.
|
|
There are 2 types of nodes:
|
|
- terminal nodes (true, false)
|
|
- variable nodes
|
|
|
|
variable nodes consist of:
|
|
- the variable being tested
|
|
- yes: id of the node if the result of the test is true
|
|
- no: id of the node if the result of the test is false
|
|
- dc: id of the node if the result of the test is irrelevant for the current operation
|
|
|
|
the TDD needs to be ordered and reduced (ROBDD)
|
|
- 'ordered' if different variables appear in the same order on all paths from the root.
|
|
- 'reduced' if the following two rules have been applied to its graph:
|
|
- Merge any isomorphic subgraphs.
|
|
- Eliminate any node whose two children are isomorphic.
|
|
|
|
Working notes:
|
|
- structure of the ordered variables:
|
|
Im thinking of structuring all possible types inside 1 TDD, in contrast to cduce, which uses a `desrc` structure that contains several TDDs (one for each domain, like ints, atoms, functions, etc.), and descr is a union between them.
|
|
For this, I need to come up with a variable structure that'll be ordered.
|
|
My set types will need to represent types like: atoms, strings, ints, maps, tuples, functions, kinds?
|
|
Moreso, those types themselves consist of smaller subsets of types like:
|
|
- int < 10
|
|
- int in [1, 2, 3]
|
|
- string > "prefix_"
|
|
- atom == false
|
|
- atom == false or atom == true or atom == nil
|
|
- map == %{"id" => string} and %{string => any | nil}
|
|
- etc.
|
|
Dont know how to represent them and make them ordered.
|
|
- node cache:
|
|
I don't yet know what it should contain, I suspect ids of nodes (TDDs) after reduction. This way a comparison between 2 types is just a pointer (id) check in the node cache. But not yet sure.
|
|
- reduction rules: not sure how to approach them
|
|
|
|
"""
|
|
|
|
def node(elem, yes, no, dc = _dont_care) do
|
|
end
|
|
|
|
def sum(one, two) do
|
|
end
|
|
|
|
def intersect(one, two) do
|
|
end
|
|
|
|
def negate(one, two) do
|
|
end
|
|
end
|
|
```
|
|
Please help me refine the theory for impementing set-theoritic types with this
|
|
|
|
|
|
========================================================================================================================
|
|
|
|
|
|
Okay, this is a fascinating and ambitious project! Building a TDD-based type system like CDuce is a significant undertaking. Let's refine the theory.
|
|
|
|
You've got the core ideas right: terminals, variable nodes, and the need for ordering and reduction. The `dc` (don't care) child is the main differentiator from BDDs and key to TDDs' power in certain applications.
|
|
|
|
Let's break down your working notes and expand on them.
|
|
|
|
**1. Structure of Ordered Variables (The Core Challenge)**
|
|
|
|
Your intuition to have a single TDD for all types is interesting. CDuce uses one TDD *per domain* (atoms, integers, etc.) and then a `descr` (descriptor) which is a union of these. A single TDD means your variable ordering must be global and able to discriminate across all these domains first, then within them.
|
|
|
|
**Proposed Hierarchical Variable Ordering:**
|
|
|
|
The variables in your TDD will represent *predicates* about a value. The order is crucial.
|
|
|
|
* **Level 0: Base Type Discriminators:** These variables test the fundamental Erlang/Elixir type of a value.
|
|
* `is_atom?`
|
|
* `is_integer?`
|
|
* `is_float?`
|
|
* `is_string?` (Elixir strings are UTF-8 binaries)
|
|
* `is_binary?` (General binaries)
|
|
* `is_list?`
|
|
* `is_tuple?`
|
|
* `is_map?`
|
|
* `is_function?`
|
|
* `is_pid?`
|
|
* `is_reference?`
|
|
* *Order*: Define a canonical order for these (e.g., alphabetical, or by commonality). Let's say `is_atom` < `is_binary` < `is_float` < ...
|
|
|
|
* **Level 1+: Type-Specific Predicates:** Once a base type is determined (e.g., the `yes` branch of `is_integer?` is taken), subsequent variables test properties specific to that type.
|
|
|
|
* **For Atoms:**
|
|
* `value_is(specific_atom)`: e.g., `value_is(true)`, `value_is(false)`, `value_is(nil)`, `value_is(:foo)`.
|
|
* *Order*: Canonical order of the atoms themselves (e.g., `false` < `nil` < `true` < other atoms alphabetically).
|
|
* This implies a potentially large number of variables if you want to distinguish many specific atoms. Often, you care about a few (booleans, nil) and "any other atom".
|
|
|
|
* **For Integers:**
|
|
* `value < N`, `value == N` (less common as a direct variable, usually derived), `value_in_range(A,B)`.
|
|
* CDuce uses a finite partitioning of the integer line. For example, variables could be `value < 0`, `value < 10`, `value < MAX_INT_SMALL_ENOUGH_FOR_OPTIMIZATION`.
|
|
* *Order*: `value < N1` < `value < N2` if `N1 < N2`.
|
|
|
|
* **For Strings/Binaries:**
|
|
* `length_is(L)` or `length_in_range(L1,L2)`
|
|
* `prefix_is(prefix_string)`
|
|
* `matches_regex(regex_pattern)` (Regexes can be complex to integrate directly as simple TDD variables due to their expressive power. CDuce has specific handling for regular expression types on sequences).
|
|
* *Order*: By length, then by prefix lexicographically, etc.
|
|
|
|
* **For Tuples:**
|
|
* `arity_is(N)`
|
|
* `element_at(Index)_is_type(TypeRef)`: This is recursive. `TypeRef` would be the ID of another TDD representing the type of the element.
|
|
* *Order*: `arity_is(0)` < `arity_is(1)` < ... then `element_at(0)_is_type(T)` < `element_at(1)_is_type(T')`. The `TypeRef` itself would need to be part of the variable ordering if you allow different type tests for the same element index.
|
|
|
|
* **For Maps:** This is the most complex.
|
|
* `has_key(KeyAtomOrString)`
|
|
* `value_for_key(Key)_is_type(TypeRef)`
|
|
* `all_keys_are_type(KeyTypeRef)`
|
|
* `all_values_are_type(ValueTypeRef)`
|
|
* CDuce has specialized structures for map patterns. You might need variables like `matches_map_pattern(PatternID)`.
|
|
* *Order*: By key name, then by tests on those keys.
|
|
|
|
* **For Functions:**
|
|
* `arity_is(N)`
|
|
* `accepts_args_types(ArgTypesTupleRef)`
|
|
* `returns_type(ReturnTypeRef)`
|
|
* *Order*: By arity, then by argument types, then return type.
|
|
|
|
**The "Variable" Itself:**
|
|
A "variable" in your TDD node won't be just a name; it'll be an identifier that maps to a specific predicate function and its parameters. E.g., `var_id = 101` might mean `fn val -> is_integer(val) and val < 10 end`. You'll need a global, ordered registry of these predicate variables.
|
|
|
|
**2. Node Cache (`unique_table`)**
|
|
|
|
Yes, this is crucial for the "reduced" property.
|
|
* It's a hash table mapping `(variable_id, yes_node_id, no_node_id, dc_node_id)` to a unique `node_id`.
|
|
* **Terminals:**
|
|
* `true_node_id` (e.g., 1): Represents the "any" type or universal set.
|
|
* `false_node_id` (e.g., 0): Represents the "empty" type or bottom type.
|
|
These are the leaves of your TDD.
|
|
* When you create a node, you first check this table. If an identical node (same variable and children) exists, you reuse its ID. Otherwise, you create a new one, assign it a new ID, and store it.
|
|
* This ensures that any isomorphic subgraphs are merged (Reduction Rule 1).
|
|
|
|
**3. Reduction Rules**
|
|
|
|
You've listed the ROBDD rules. Let's adapt them for TDDs:
|
|
|
|
* **Merge any isomorphic subgraphs:** Handled by the `unique_table` as described above. Two subgraphs are isomorphic if they are structurally identical (same variables in same places leading to same terminals or further isomorphic subgraphs).
|
|
* **Eliminate any node whose three children are isomorphic:** If `yes_id == no_id == dc_id` for a variable node `v`, then the test `v` is irrelevant for distinguishing the type further down this path. The node `(v, child_id, child_id, child_id)` should be replaced by `child_id`. This rule is applied *before* looking up/inserting into the `unique_table`.
|
|
|
|
**The Role of `dc` (Don't Care):**
|
|
|
|
The `dc` child is what makes it a TDD. It typically means the variable being tested is irrelevant or its value doesn't affect membership for a particular path or operation.
|
|
|
|
* **In Type Representation:**
|
|
* If you're representing the type "integer", and your current variable is `is_atom?`:
|
|
* `yes` child would be `false_node_id` (an integer is not an atom).
|
|
* `no` child would lead to further tests for integers (or `true_node_id` if "integer" is all you're specifying).
|
|
* `dc` child: This is tricky for pure representation. If the variable `is_atom?` *must* be evaluated, then `dc` might also be `false_node_id`.
|
|
* However, `dc` becomes powerful when combining types or dealing with partially specified types. If a type expression doesn't constrain variable `v`, its `dc` branch relative to `v` might be taken.
|
|
* Consider the type `any`. For any variable `v`, the node would be `(v, true_node_id, true_node_id, true_node_id)`. By reduction rule 2, this simplifies to just `true_node_id`.
|
|
* Consider the type `none` (empty set). For any variable `v`, node is `(v, false_node_id, false_node_id, false_node_id)`, reduces to `false_node_id`.
|
|
|
|
* **In Operations (`sum`, `intersect`, `negate`):** The `dc` child is heavily used in the `apply` algorithm.
|
|
When `apply(op, tdd1, tdd2)`:
|
|
* If `var(tdd1) == var(tdd2)`: Recurse on `(op, tdd1.yes, tdd2.yes)`, `(op, tdd1.no, tdd2.no)`, `(op, tdd1.dc, tdd2.dc)`.
|
|
* If `var(tdd1) < var(tdd2)` (in global order): `tdd2` doesn't explicitly test `var(tdd1)`. So, `var(tdd1)` is effectively "don't care" for `tdd2`.
|
|
The operation proceeds by testing `var(tdd1)`:
|
|
* `yes_child = apply(op, tdd1.yes, tdd2)`
|
|
* `no_child = apply(op, tdd1.no, tdd2)`
|
|
* `dc_child = apply(op, tdd1.dc, tdd2)` (This is where TDD differs significantly from BDD's apply).
|
|
The new node is `mknode(var(tdd1), yes_child, no_child, dc_child)`.
|
|
* Symmetric case for `var(tdd2) < var(tdd1)`.
|
|
|
|
**4. `mknode(variable_id, yes_id, no_id, dc_id)` function:**
|
|
|
|
This is your core constructor.
|
|
|
|
```elixir
|
|
defmodule Tdd do
|
|
# Store nodes in an array/map, IDs are indices/keys
|
|
# @nodes %{0 => :false_terminal, 1 => :true_terminal, ...}
|
|
# @unique_table %{{var_id, yes_id, no_id, dc_id} => node_id}
|
|
# @variables %{var_id => {predicate_fun, predicate_params, global_order_index}}
|
|
|
|
@true_id 1
|
|
@false_id 0
|
|
|
|
# Internal state for unique table, nodes, variable definitions
|
|
# This would typically be in a GenServer or an ETS table in a real system
|
|
# For simplicity, let's imagine it's passed around or module attributes are used carefully.
|
|
|
|
# Example variable IDs (you'd have a system for this)
|
|
# @var_is_atom 10
|
|
# @var_is_integer 11
|
|
# @var_int_lt_10 20
|
|
|
|
def get_true(), do: @true_id
|
|
def get_false(), do: @false_id
|
|
|
|
# Pseudo-code for mknode
|
|
def node(variable_id, yes_id, no_id, dc_id) do
|
|
# Reduction Rule 2: Eliminate node whose children are isomorphic
|
|
if yes_id == no_id and no_id == dc_id do
|
|
# If all children are the same, this variable provides no discrimination
|
|
# for this path. The node is redundant.
|
|
return yes_id
|
|
end
|
|
|
|
# Check unique_table (conceptually)
|
|
# entry = {variable_id, yes_id, no_id, dc_id}
|
|
# if Map.has_key?(@unique_table, entry) do
|
|
# return @unique_table[entry]
|
|
# else
|
|
# new_node_id = generate_new_id()
|
|
# @nodes = Map.put(@nodes, new_node_id, {variable_id, yes_id, no_id, dc_id})
|
|
# @unique_table = Map.put(@unique_table, entry, new_node_id)
|
|
# return new_node_id
|
|
# end
|
|
# Actual implementation would use a GenServer or ETS for state
|
|
:not_implemented_yet # Placeholder
|
|
end
|
|
end
|
|
```
|
|
|
|
**5. Operations (`sum`, `intersect`, `negate`)**
|
|
|
|
These are built using a generic `apply` function, often memoized (using a `computed_table`).
|
|
|
|
`apply(op_lambda, tdd1_id, tdd2_id)`:
|
|
1. **Handle terminal cases:**
|
|
* `op_lambda.(@false_id, @false_id)`, `op_lambda.(@false_id, @true_id)`, etc.
|
|
* Some ops have identities/annihilators:
|
|
* Union: `T | true = true`, `T | false = T`.
|
|
* Intersection: `T & false = false`, `T & true = T`.
|
|
2. **Check `computed_table`:** `cache_key = {op_lambda_id, tdd1_id, tdd2_id}`. If result exists, return it.
|
|
3. **Recursive step:**
|
|
* Fetch node data for `tdd1_id` and `tdd2_id`. Let them be `N1`, `N2`.
|
|
* Let `v1 = N1.variable_id`, `v2 = N2.variable_id`.
|
|
* Compare `v1` and `v2` based on global variable order.
|
|
* **If `v1 == v2`**:
|
|
`res_yes = apply(op_lambda, N1.yes, N2.yes)`
|
|
`res_no = apply(op_lambda, N1.no, N2.no)`
|
|
`res_dc = apply(op_lambda, N1.dc, N2.dc)`
|
|
`result = mknode(v1, res_yes, res_no, res_dc)`
|
|
* **If `v1 < v2` (in order)**:
|
|
`res_yes = apply(op_lambda, N1.yes, tdd2_id)`
|
|
`res_no = apply(op_lambda, N1.no, tdd2_id)`
|
|
`res_dc = apply(op_lambda, N1.dc, tdd2_id)`
|
|
`result = mknode(v1, res_yes, res_no, res_dc)`
|
|
* **If `v1 > v2` (in order)**: (symmetric to above)
|
|
`res_yes = apply(op_lambda, tdd1_id, N2.yes)`
|
|
`res_no = apply(op_lambda, tdd1_id, N2.no)`
|
|
`res_dc = apply(op_lambda, tdd1_id, N2.dc)`
|
|
`result = mknode(v2, res_yes, res_no, res_dc)`
|
|
4. Store `result` in `computed_table` for `cache_key`.
|
|
5. Return `result`.
|
|
|
|
**`negate(tdd_id)`:**
|
|
This is simpler, an `apply_unary` operation.
|
|
1. If `tdd_id == @true_id`, return `@false_id`.
|
|
2. If `tdd_id == @false_id`, return `@true_id`.
|
|
3. Check `computed_table_for_negation`.
|
|
4. Fetch node `N`.
|
|
`res_yes = negate(N.yes)`
|
|
`res_no = negate(N.no)`
|
|
`res_dc = negate(N.dc)`
|
|
`result = mknode(N.variable_id, res_yes, res_no, res_dc)`
|
|
5. Store and return `result`.
|
|
|
|
**Concrete Example: Type `atom(:foo) | atom(:bar)`**
|
|
|
|
Variables (simplified order): `is_atom`, `atom_is_foo`, `atom_is_bar`.
|
|
1. TDD for `atom(:foo)`:
|
|
* `v_is_atom = mknode(:is_atom, v_atom_is_foo, @false_id, @false_id)`
|
|
* `v_atom_is_foo = mknode(:atom_is_foo, @true_id, @false_id, @false_id)`
|
|
(Assuming if not atom, or atom but not foo, it's false. `dc` is false as these tests are always relevant here).
|
|
2. TDD for `atom(:bar)`:
|
|
* `v_is_atom_2 = mknode(:is_atom, v_atom_is_bar, @false_id, @false_id)`
|
|
* `v_atom_is_bar = mknode(:atom_is_bar, @true_id, @false_id, @false_id)`
|
|
|
|
Now, `sum(tdd_foo, tdd_bar)` using logical OR (`||`):
|
|
Top variable is `:is_atom` for both.
|
|
`res_yes = apply(||, v_atom_is_foo, v_atom_is_bar)`
|
|
`res_no = apply(||, @false_id, @false_id)` -> `@false_id`
|
|
`res_dc = apply(||, @false_id, @false_id)` -> `@false_id`
|
|
|
|
Consider `apply(||, v_atom_is_foo, v_atom_is_bar)`:
|
|
Variables are `:atom_is_foo` and `:atom_is_bar`. Assume `:atom_is_foo < :atom_is_bar`.
|
|
Current var is `:atom_is_foo` (from `v_atom_is_foo`).
|
|
`ry = apply(||, v_atom_is_foo.yes, v_atom_is_bar)`
|
|
`= apply(||, @true_id, v_atom_is_bar)` -> `@true_id` (since `true || X = true`)
|
|
`rn = apply(||, v_atom_is_foo.no, v_atom_is_bar)`
|
|
`= apply(||, @false_id, v_atom_is_bar)` -> `v_atom_is_bar` (since `false || X = X`)
|
|
`rd = apply(||, v_atom_is_foo.dc, v_atom_is_bar)`
|
|
`= apply(||, @false_id, v_atom_is_bar)` -> `v_atom_is_bar`
|
|
|
|
So, `res_yes = mknode(:atom_is_foo, @true_id, v_atom_is_bar, v_atom_is_bar)`.
|
|
This node effectively means: "If it's :foo, then true. Else (if not :foo), then it depends on whether it's :bar (from `v_atom_is_bar`). If :atom_is_foo is don't care, it also depends on :bar."
|
|
The reduction rule for `no_id == dc_id` isn't standard for BDDs but might make sense for TDDs if "not X" and "X is irrelevant" lead to the same sub-condition. CDuce's BDDs don't have this `dc` child directly in nodes, it's more implicit in their algorithm.
|
|
If we stick to standard ROBDD `apply` ideas slightly modified for three children, `dc` handling when variables differ is key.
|
|
|
|
**Key Data Structures to Manage:**
|
|
|
|
1. **Node Storage:** A list, array, or map to store the actual nodes. An ID is an index/key into this.
|
|
* `nodes = %{0 => {:terminal, :false}, 1 => {:terminal, :true}, 2 => {:node, var_id1, yes2, no2, dc2}, ...}`
|
|
2. **Unique Table:** `(variable_id, yes_id, no_id, dc_id) -> node_id`. This is for structural sharing and reduction rule 1.
|
|
* `unique_table = %{{var_id1, yes2, no2, dc2} => 2, ...}`
|
|
3. **Variable Definitions:** A way to map `variable_id` to its actual predicate logic and its position in the global order.
|
|
* `variables = %{var_id1 => %{name: :is_atom, order_idx: 0, predicate: &is_atom/1}, ...}`
|
|
4. **Computed Tables (Memoization for Operations):**
|
|
* For binary ops: `{(op_name, id1, id2) -> result_id}`
|
|
* For unary ops: `{(op_name, id1) -> result_id}`
|
|
|
|
**Summary of Refinements:**
|
|
|
|
1. **Global Hierarchical Variable Ordering:** Discriminate base types first, then type-specific properties. All variables (predicates) need a unique ID and a place in this global order.
|
|
2. **`mknode` is Central:** It implements reduction rule 2 (all children same -> return child) and uses the `unique_table` (reduction rule 1).
|
|
3. **`apply` Algorithm:** This is the workhorse for `sum`, `intersect`, etc. Its logic for handling cases where variables differ across operands is where the TDD structure (and `dc` child) truly shows its behavior. The `dc` child of a node `N1=(v,y,n,d)` would be used in `apply(op, N1.dc, N2)` if `v < var(N2)` and the `dc` path for `v` is taken.
|
|
4. **`dc` Child Interpretation:** In the `apply` algorithm, if `var(T1) < var(T2)`, `T2` doesn't test `var(T1)`. So, `T2` is passed "as is" when considering `T1.yes`, `T1.no`, and `T1.dc`. The resulting node for `var(T1)` will have its children determined by these recursive calls. The `dc` child in a node structure `(v, Y, N, D)` represents the type if the predicate `v` is "skipped" or considered irrelevant for that path.
|
|
5. **State Management:** You'll need a robust way to manage the shared state of nodes, unique table, and computed tables (e.g., a GenServer or ETS tables).
|
|
|
|
This is a complex system. Start simple: implement the `mknode` with unique table for just atoms and booleans. Then add `negate`. Then `sum` or `intersect`. Gradually add more complex variable types. Good luck!
|