elipl/g2.md at 61e95be62264a05e0151a3f8da5da8ee143e4b05

Kacper Marzecki 4fbfed4db6 checkpoint new tdd

2025-06-15 18:29:13 +02:00

13 KiB


defmodule Tdd do
  @moduledoc """
  Ternary decision diagram, used for representing set-theoretic types, akin to CDuce.
  There are 2 types of nodes: 
  - terminal nodes (true, false) 
  - variable nodes 

  variable nodes consist of:
  - the variable being tested 
  - yes: id of the node if the result of the test is true 
  - no: id of the node if the result of the test is false 
  - dc: id of the node if the result of the test is irrelevant for the current operation

  the TDD needs to be ordered and reduced (ROBDD)
  - 'ordered' if different variables appear in the same order on all paths from the root. 
  - 'reduced' if the following two rules have been applied to its graph:
    - Merge any isomorphic subgraphs.
    - Eliminate any node whose two children are isomorphic.

  Working notes:
  - structure of the ordered variables:
    I'm thinking of structuring all possible types inside 1 TDD, in contrast to CDuce, which uses a `descr` structure that contains several TDDs (one for each domain, like ints, atoms, functions, etc.), and descr is a union between them.
    For this, I need to come up with a variable structure that'll be ordered. 
    My set types will need to represent types like: atoms, strings, ints, maps, tuples, functions, kinds, free polymorphic variables 
    Moreover, those types themselves consist of smaller subsets of types like:
    - int < 10 
    - int in [1, 2, 3]
    - string > "prefix_" 
    - atom == false
    - atom == false or atom == true or atom == nil
    - map == %{"id" => string} and %{string => any | nil}
    - polymorphic types ???
    - etc.
    I don't know how to represent them and make them ordered.
  - node cache:
    I suspect nodes (TDDs) should be cached by id after reduction (hash-consing). This way a comparison between 2 types is just a pointer (id) check in the node cache. But I'm not sure yet.
  - reduction rules: not sure how to approach them 

  """

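  # Builds a variable node testing `elem`; `dc` is the "don't care" child.
  def node(elem, yes, no, dc) do
  end

  # Union of two TDDs.
  def sum(one, two) do
  end

  # Intersection of two TDDs.
  def intersect(one, two) do
  end

  # Complement of a TDD. Negation is unary, so it takes a single argument.
  def negate(one) do
  end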
end

Please help me refine the theory for implementing set-theoretic types with this. Let's focus on the ordering of variables in the TDD.

========================================================================================================================

Okay, this is a fascinating and ambitious goal! Building a single, ordered, reduced TDD for all Elixir/Erlang types is a significant challenge, especially with the rich structure of types like maps and functions. Let's break down the ordering of variables.

The core idea for ordering in a BDD/TDD is to have a total global order on all possible "questions" (predicates) you can ask about a value.

High-Level Strategy for Ordering Variables

Primary Type Discriminators First: The most fundamental question is "What basic kind of thing is this?"
General Properties Next: After knowing the primary type, ask about general properties (e.g., size, length).
Specific Value/Content Properties: Then, delve into specific values or structural content.
Recursive Structure for Nested Types: For types like tuples and maps, the predicates about their elements/values will effectively be recursive, but the ordering system must flatten this into a global order.

Proposed Variable Ordering Scheme

Let's define "variables" as unique identifiers for predicates. We need a way to sort these identifiers. A good way is to use tuples, where Elixir's natural tuple sorting provides the order.
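One caveat about relying on the built-in sort: Erlang term order compares tuples by size first and only then element by element, so a 2-tuple id always sorts before a 3-tuple id, whatever its category. A minimal sketch of a workaround, assuming ids are nested tuples as in the scheme below (the module name `Tdd.VarOrder` is hypothetical): compare ids element by element, and plug the comparator into `Enum.sort/2`, which accepts a module exporting `compare/2` (Elixir 1.10+).

```elixir
defmodule Tdd.VarOrder do
  # Hypothetical comparator for variable ids. With plain Enum.sort, every
  # 2-tuple such as {6, :is_empty} precedes every 3-tuple such as
  # {1, :value, false}, because tuples are compared by size first.
  # Comparing element by element makes the category number the coarsest key.
  # (Like Erlang's `<`, it still treats 1 and 1.0 as equal.)

  # Returns :lt | :eq | :gt, so the module can be passed to Enum.sort/2.
  def compare(a, b) do
    {ka, kb} = {key(a), key(b)}

    cond do
      ka < kb -> :lt
      ka > kb -> :gt
      true -> :eq
    end
  end

  # Recursively turn tuples into tagged lists; lists compare element by
  # element, and the tag keeps converted tuples distinct from real lists.
  defp key(t) when is_tuple(t), do: {:tuple, t |> Tuple.to_list() |> Enum.map(&key/1)}
  defp key(l) when is_list(l), do: Enum.map(l, &key/1)
  defp key(other), do: other
end

vars = [{1, :value, false}, {6, :is_empty}, {0, :is_atom}]
Enum.sort(vars)
# => [{0, :is_atom}, {6, :is_empty}, {1, :value, false}]
Enum.sort(vars, Tdd.VarOrder)
# => [{0, :is_atom}, {1, :value, false}, {6, :is_empty}]
```

Any fixed total order keeps the diagram ordered; the point of the element-wise order is that the category number really does decide first, including inside nested ids.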

Category 0: Primary Type Discriminators

These are the most fundamental. They will have the lowest sort order. Order them alphabetically by the type name.

v_is_atom = {0, :is_atom}
v_is_binary = {0, :is_binary}
v_is_float = {0, :is_float}
v_is_function = {0, :is_function}
v_is_integer = {0, :is_integer}
v_is_list = {0, :is_list}
v_is_map = {0, :is_map}
v_is_pid = {0, :is_pid}
v_is_port = {0, :is_port}
v_is_reference = {0, :is_reference}
v_is_string = {0, :is_string} (Note: Elixir strings are UTF-8 binaries, so {0, :is_string} and {0, :is_binary} are not mutually exclusive. You might treat strings as a subtype of binary or as a distinct primary type in your model. For simplicity, assume distinct for now; the subtype alternative would instead add a refining predicate such as {0, :is_binary_utf8} after {0, :is_binary}.)
v_is_tuple = {0, :is_tuple}
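To make the Category 0 variables concrete, here is a hypothetical evaluator (`Tdd.Primary`, `holds?/2` and `true_vars/1` are illustrative names, not part of the original code) that answers each discriminator for a runtime value. It leaves out {0, :is_string}, since a string is also a binary and would overlap with {0, :is_binary}.

```elixir
defmodule Tdd.Primary do
  # Hypothetical evaluator for the Category 0 discriminators on a concrete
  # value. {0, :is_string} is left out on purpose: a string is a UTF-8
  # binary, so it would overlap with {0, :is_binary}.
  @vars [
    {0, :is_atom}, {0, :is_binary}, {0, :is_float}, {0, :is_function},
    {0, :is_integer}, {0, :is_list}, {0, :is_map}, {0, :is_pid},
    {0, :is_port}, {0, :is_reference}, {0, :is_tuple}
  ]

  def vars, do: @vars

  def holds?({0, :is_atom}, v), do: is_atom(v)
  def holds?({0, :is_binary}, v), do: is_binary(v)
  def holds?({0, :is_float}, v), do: is_float(v)
  def holds?({0, :is_function}, v), do: is_function(v)
  def holds?({0, :is_integer}, v), do: is_integer(v)
  def holds?({0, :is_list}, v), do: is_list(v)
  def holds?({0, :is_map}, v), do: is_map(v)
  def holds?({0, :is_pid}, v), do: is_pid(v)
  def holds?({0, :is_port}, v), do: is_port(v)
  def holds?({0, :is_reference}, v), do: is_reference(v)
  def holds?({0, :is_tuple}, v), do: is_tuple(v)

  # Discriminators that hold for `v`. For ordinary values this is exactly
  # one (nil and booleans are atoms, structs are maps); a non-binary
  # bitstring such as <<1::1>> matches none of them.
  def true_vars(v), do: Enum.filter(@vars, &holds?(&1, v))
end
```

`@vars` is written in sorted order, so it can double as the Category 0 slice of the global order.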

Category 1: Atom-Specific Predicates

If is_atom is true, what specific atom is it? Order by the atom itself. Atoms compare alphabetically by name, so the list below is in sorted order.

v_atom_eq_specific_A = {1, :value, :an_atom} (e.g., :an_atom comes before false, because "an_atom" < "false")
v_atom_eq_false = {1, :value, false}
v_atom_eq_nil = {1, :value, nil}
v_atom_eq_true = {1, :value, true}
... (all known/relevant atoms in your system, ordered canonically)
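Because atoms compare alphabetically by name, the position of a literal like :an_atom is easy to get wrong by hand. A small sketch (the module `Tdd.AtomVars` is hypothetical) that turns a finite union of atom literals into its Category 1 variables in sorted, duplicate-free order:

```elixir
defmodule Tdd.AtomVars do
  # Hypothetical helper: Category 1 variables for a finite union of atom
  # literals, deduplicated and in the global order. Atoms compare
  # alphabetically by name, so :an_atom sorts before false, nil and true.
  def for_union(atoms) when is_list(atoms) do
    atoms
    |> Enum.uniq()
    |> Enum.sort()
    |> Enum.map(&{1, :value, &1})
  end
end

Tdd.AtomVars.for_union([true, :an_atom, nil, false, nil])
# => [{1, :value, :an_atom}, {1, :value, false}, {1, :value, nil}, {1, :value, true}]
```

All these ids have the same size and the same first two elements, so the default sort and an element-wise comparator agree on them.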

Category 2: Integer-Specific Predicates

If is_integer is true, you need a canonical way to represent integer conditions.

Equality: v_int_eq_N = {2, :eq, N} (e.g., {2, :eq, 0}, {2, :eq, 10})
- Order by N.
Less than: v_int_lt_N = {2, :lt, N} (e.g., {2, :lt, 0}, {2, :lt, 10})
- Order by N.
Greater than: v_int_gt_N = {2, :gt, N} (e.g., {2, :gt, 0}, {2, :gt, 10})
- Order by N.
Set membership for finite sets: v_int_in_SET = {2, :in, Enum.sort(SET)} (e.g. {2, :in, [1,2,3]})
- Order by the canonical (sorted list) representation of SET.
- This gets complex. Often, BDDs for integers use bit-level tests, but for set-theoretic types, range/specific value tests are more natural. You might limit this to a predefined, finite set of "interesting" integer predicates.
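A sketch of what the Category 2 variables mean on an integer value, plus a canonical constructor for the :in variable (the module `Tdd.IntPred` is hypothetical). It deduplicates before sorting, an assumption beyond the Enum.sort(SET) above, so that [3, 1, 2, 1] and [1, 2, 3] yield the same variable.

```elixir
defmodule Tdd.IntPred do
  # Hypothetical semantics of the Category 2 variables on an integer n.
  def holds?({2, :eq, m}, n) when is_integer(n), do: n == m
  def holds?({2, :lt, m}, n) when is_integer(n), do: n < m
  def holds?({2, :gt, m}, n) when is_integer(n), do: n > m
  # Outside guards, `in` with a runtime list calls Enum.member?/2.
  def holds?({2, :in, set}, n) when is_integer(n), do: n in set

  # Canonical id for a finite set: deduplicated and sorted.
  def in_var(ints), do: {2, :in, ints |> Enum.uniq() |> Enum.sort()}
end

Tdd.IntPred.in_var([3, 1, 2, 1])
# => {2, :in, [1, 2, 3]}
```

Note that the predicate atom is compared before N, so all :eq variables precede all :gt variables, which precede :in, which precedes :lt (alphabetical order of the atoms).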

Category 3: String-Specific Predicates

If is_string is true:

Equality: v_string_eq_S = {3, :eq, S} (e.g., {3, :eq, "foo"})
- Order by S lexicographically.
Length: v_string_len_eq_L = {3, :len_eq, L}
- Order by L.
Prefix: v_string_prefix_P = {3, :prefix, P}
- Order by P lexicographically.
(Suffix, regex match, etc., can be added with consistent ordering rules)
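A hypothetical evaluator for the Category 3 variables (`Tdd.StrPred` is an illustrative name). Whether "length" means graphemes or bytes is not specified above; this sketch assumes graphemes via `String.length/1`.

```elixir
defmodule Tdd.StrPred do
  # Hypothetical semantics of the Category 3 variables on a string value.
  def holds?({3, :eq, s}, v) when is_binary(v), do: v == s
  # Assumption: "length" counts graphemes (String.length/1), not bytes.
  def holds?({3, :len_eq, l}, v) when is_binary(v), do: String.length(v) == l
  def holds?({3, :prefix, p}, v) when is_binary(v), do: String.starts_with?(v, p)
end

Tdd.StrPred.holds?({3, :prefix, "prefix_"}, "prefix_id")
# => true
```

These variables are not independent: if {3, :eq, "foo"} holds, then {3, :prefix, "f"} must hold too. The diagram does not know that, so a path with eq "foo" = yes and prefix "f" = no denotes the empty set without being reduced to the false_terminal.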

Category 4: Tuple-Specific Predicates

If is_tuple is true:

Size first:
- v_tuple_size_eq_N = {4, :size, N} (e.g., {4, :size, 0}, {4, :size, 2})
  - Order by N.
Element types (recursive structure in variable identifier): For a tuple of a given size, we then check its elements. The predicate for an element will re-use the entire variable ordering scheme but scoped to that element.
- v_tuple_elem_I_PRED = {4, :element, index_I, NESTED_PREDICATE_ID}
  - Order by index_I first.
  - Then order by NESTED_PREDICATE_ID (which itself is one of these {category, type, value} tuples).
- Example: Is element 0 an atom? v_el0_is_atom = {4, :element, 0, {0, :is_atom}}
- Example: Is element 0 the atom :foo? v_el0_is_foo = {4, :element, 0, {1, :value, :foo}}
- Example: Is element 1 an integer? v_el1_is_int = {4, :element, 1, {0, :is_integer}}

This ensures that all questions about element 0 come before those about element 1, and that for each element the standard hierarchy of questions is asked.
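The recursive element variables can be sketched as a small evaluator (the module `Tdd.TuplePred` is hypothetical and covers only a few variable shapes): `at/2` lifts any variable to an element position, and evaluation descends into that element.

```elixir
defmodule Tdd.TuplePred do
  # Hypothetical recursive evaluator covering a few variable shapes, to
  # show how {4, :element, i, nested} scopes `nested` to element i.
  def holds?({0, :is_atom}, v), do: is_atom(v)
  def holds?({0, :is_integer}, v), do: is_integer(v)
  def holds?({1, :value, a}, v), do: v === a
  def holds?({4, :size, n}, v) when is_tuple(v), do: tuple_size(v) == n

  def holds?({4, :element, i, nested}, v) when is_tuple(v) and i < tuple_size(v),
    do: holds?(nested, elem(v, i))

  # Non-tuples and out-of-range indices make a Category 4 variable false.
  def holds?({4, _, _}, _v), do: false
  def holds?({4, _, _, _}, _v), do: false

  # Lift a variable so that it asks about element i of a tuple.
  def at(i, nested), do: {4, :element, i, nested}
end

Tdd.TuplePred.holds?(Tdd.TuplePred.at(0, {1, :value, :foo}), {:foo, 1})
# => true
```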

Category 5: Map-Specific Predicates

If is_map is true: this is the most complex case.

Size (optional, but can be useful):
- v_map_size_eq_N = {5, :size, N}
  - Order by N.
Key Presence:
- v_map_has_key_K = {5, :has_key, K} (e.g., {5, :has_key, "id"}, {5, :has_key, :name})
  - Order by K in Erlang term order: numbers before atoms before strings (binaries), with atoms compared alphabetically and strings byte by byte.
Key Value Types (recursive structure): For a given key K that is present:
- v_map_key_K_value_PRED = {5, :key_value, K, NESTED_PREDICATE_ID}
  - Order by K (canonically).
  - Then order by NESTED_PREDICATE_ID.
- Example: Does map have key :id and is its value a string?
  - First variable: v_map_has_id = {5, :has_key, :id}
  - If yes, next variable: v_map_id_val_is_str = {5, :key_value, :id, {0, :is_string}}
Predicates for "all other keys" / "pattern keys": This is needed for types like %{String.t() => integer()}.
- v_map_pattern_key_PRED_value_PRED = {5, :pattern_key, KEY_TYPE_PREDICATE_ID, VALUE_TYPE_PREDICATE_ID}
- Example: For %{String.t() => integer()}:
  - Key type predicate: {0, :is_string}
  - Value type predicate for such keys: {0, :is_integer}
  - Variable ID: {5, :pattern_key, {0, :is_string}, {0, :is_integer}}
- These pattern key predicates come after the specific key predicates automatically, since :pattern_key sorts after :has_key and :key_value. The ordering of KEY_TYPE_PREDICATE_ID values still needs careful thought (e.g. {0, :is_atom} before {0, :is_string}).
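A sketch of the map variables evaluated on a concrete map (the module `Tdd.MapPred` is hypothetical). The nested evaluator only knows a few Category 0 variables, and reading :pattern_key as "every key satisfying the key predicate maps to a value satisfying the value predicate" is an assumption about what %{String.t() => integer()} should mean.

```elixir
defmodule Tdd.MapPred do
  # Hypothetical evaluator for Category 5 variables; defined for maps only.

  # Nested variables this sketch understands (a small Category 0 subset).
  defp base?({0, :is_atom}, v), do: is_atom(v)
  defp base?({0, :is_integer}, v), do: is_integer(v)
  defp base?({0, :is_string}, v), do: is_binary(v) and String.valid?(v)

  def holds?({5, :size, n}, m) when is_map(m), do: map_size(m) == n
  def holds?({5, :has_key, k}, m) when is_map(m), do: Map.has_key?(m, k)

  def holds?({5, :key_value, k, pred}, m) when is_map(m) do
    case Map.fetch(m, k) do
      {:ok, v} -> base?(pred, v)
      :error -> false
    end
  end

  # Assumed meaning: every key satisfying `kpred` has a value satisfying `vpred`.
  def holds?({5, :pattern_key, kpred, vpred}, m) when is_map(m) do
    Enum.all?(m, fn {k, v} -> not base?(kpred, k) or base?(vpred, v) end)
  end
end

Tdd.MapPred.holds?({5, :pattern_key, {0, :is_string}, {0, :is_integer}}, %{"a" => 1, b: :x})
# => true
```

With the default term order, {5, :has_key, :name} sorts before {5, :has_key, "id"}, because atoms precede binaries.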

Category 6: List-Specific Predicates

If is_list is true:

Is Empty:
- v_list_is_empty = {6, :is_empty}
Head/Tail Structure (if not empty, recursive): This mirrors how types like nonempty_list(H, T) are defined.
- v_list_head_PRED = {6, :head, NESTED_PREDICATE_ID}
- v_list_tail_PRED = {6, :tail, NESTED_PREDICATE_ID} (Note: NESTED_PREDICATE_ID for tail would again be list predicates like {6, :is_empty} or {6, :head, ...})
- Example: Head is an atom: {6, :head, {0, :is_atom}}
- Example: Tail is an empty list: {6, :tail, {6, :is_empty}}
- All head predicates come before all tail predicates.
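A hypothetical evaluator for the list variables (`Tdd.ListPred` is an illustrative name), with two Category 0 variables so that nested head predicates have something to test:

```elixir
defmodule Tdd.ListPred do
  # Hypothetical evaluator for Category 6 variables, plus two Category 0
  # variables so nested predicates have something to test.
  def holds?({0, :is_atom}, v), do: is_atom(v)
  def holds?({0, :is_integer}, v), do: is_integer(v)
  def holds?({6, :is_empty}, v), do: v == []
  def holds?({6, :head, pred}, [h | _]), do: holds?(pred, h)
  def holds?({6, :tail, pred}, [_ | t]), do: holds?(pred, t)
  # head/tail of an empty list (or of a non-list) is false.
  def holds?({6, _, _}, _v), do: false
end

Tdd.ListPred.holds?({6, :tail, {6, :head, {0, :is_atom}}}, [1, :b])
# => true
```

One limitation shows up quickly: a type like list(integer()) constrains every element, which would need {6, :head, ...}, {6, :tail, {6, :head, ...}} and so on without bound. A finite set of variables can only describe list prefixes, so recursive types likely need a recursive representation (CDuce, for instance, lets product atoms refer back to type nodes) rather than ever-deeper variable ids.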

Category 7: Function-Specific Predicates

If is_function is true:

Arity:
- v_fun_arity_eq_A = {7, :arity, A}
  - Order by A.
Argument Types (very complex, may need simplification for TDDs):
- v_fun_arg_I_PRED = {7, :arg, index_I, NESTED_PREDICATE_ID}
Return Type (also complex):
- v_fun_return_PRED = {7, :return, NESTED_PREDICATE_ID}
- Function types are often represented by separate structures or simplified in TDDs due to their higher-order nature. Full function type checking within this TDD variable scheme would be extremely elaborate.
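Of the Category 7 variables, only arity can be checked against a function value at runtime; argument and return types cannot be observed, which is one reason function types are usually handled by a separate structure. A minimal sketch (the module `Tdd.FunPred` is hypothetical) using `is_function/2` and `Function.info/2`:

```elixir
defmodule Tdd.FunPred do
  # Hypothetical: of the Category 7 variables, only arity is observable
  # on a concrete function value.
  def holds?({7, :arity, a}, f), do: is_function(f, a)
end

Tdd.FunPred.holds?({7, :arity, 2}, fn x, y -> x + y end)
# => true
Function.info(fn x -> x end, :arity)
# => {:arity, 1}
```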

Binary, Float, Pid, Port, Reference Predicates: These would get their own categories (e.g., 8, 9, 10...).

Floats: {0, :is_float} -> {X, :eq, F}, {X, :lt, F}, etc.
Binaries: {0, :is_binary} -> {Y, :size, S}, {Y, :matches_pattern, Pat} (e.g. <<h::size(8), _::binary>>)

Polymorphic Variables (alpha, beta, etc.)

Polymorphic variables are part of the type language, not properties of concrete values. A TDD represents a set of concrete values.

When you construct a TDD for a type like list(alpha), where alpha is free, the simplest option is to treat alpha as any. Predicates concerning list elements then all go to their dc (don't care) branches, ultimately leading to true. This is an approximation: it makes (alpha -> alpha) indistinguishable from (any -> any).
If alpha is bound (e.g., in (alpha -> alpha) where alpha = integer), you first resolve alpha to integer and then build the TDD for (integer -> integer).
So, in this scheme, free polymorphic variables don't become TDD variables themselves. They influence which branches are taken during TDD construction for types containing them, often mapping to any or dc paths. Polymorphic set-theoretic type systems in the CDuce line take the other route and keep type variables as atoms of the decision diagram, which preserves the link between occurrences of the same variable.

"Kinds" The "kind" of a type (e.g., Type.Atom, Type.Integer) is meta-information. The TDD represents a type. The variables within the TDD are about properties of values that belong to that type. You wouldn't have a TDD variable is_kind_atom?. Instead, the TDD for the type atom would start with the v_is_atom = {0, :is_atom} variable.

The dc (Don't Care) Branch

The dc branch of a node testing variable v holds the part of the type whose membership does not depend on the outcome of v, given the path taken to reach this node.

For type any, every test is irrelevant; after reduction, any is just the true_terminal.
For type integer, the test v_is_atom:
- yes branch -> false_terminal (an atom is not an integer)
- no branch -> continues to test v_is_integer, etc.
- dc branch: This is the tricky part in TDDs for types. In the CDuce formulation, a node (v, yes, dc, no) denotes (v and yes) or dc or (not v and no), so dc is an extra union rather than the union of yes and no. For integer, the dc branch is simply the false_terminal. A type that truly doesn't care about v_is_atom can put everything in dc, or point yes and no at the same child.
- For reduction with dc: if yes_child == no_child == dc_child, the node is redundant and equals that child. If yes_child == no_child, the node equals the union of yes_child and dc_child, so it collapses to yes_child only when dc_child is the false_terminal; otherwise the parent needs the computed union. If dc_child is the true_terminal, the node is the true_terminal. Standard BDD reduction (if yes_child == no_child, eliminate the node) assumes only two children, so you'll need these TDD-specific rules.
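A sketch of a node constructor with a node cache (hash-consing) and the cheap reduction rules, assuming the CDuce-style reading of a node {var, yes, dc, no} as (var and yes) or dc or (not var and no). `Tdd.Store` and `mk/5` are hypothetical names; ids 0 and 1 stand for the false_terminal and the true_terminal.

```elixir
defmodule Tdd.Store do
  # Hypothetical hash-consed node store ("node cache"). Terminal ids:
  # 0 = false_terminal, 1 = true_terminal. A node {var, yes, dc, no} is
  # read as (var and yes) or dc or (not var and no).
  defstruct nodes: %{}, unique: %{}, next: 2

  @false_id 0
  @true_id 1

  # Returns {id, store}. Cheap reduction rules first:
  # dc is true, so the node covers everything.
  def mk(store, _var, _yes, @true_id, _no), do: {@true_id, store}

  # yes == no and dc is false or equal to them: the test is irrelevant.
  def mk(store, _var, same, dc, same) when dc == @false_id or dc == same,
    do: {same, store}

  # Otherwise reuse the id of an identical node, or allocate a new one.
  def mk(store, var, yes, dc, no) do
    key = {var, yes, dc, no}

    case Map.fetch(store.unique, key) do
      {:ok, id} ->
        {id, store}

      :error ->
        id = store.next

        store = %{
          store
          | nodes: Map.put(store.nodes, id, key),
            unique: Map.put(store.unique, key, id),
            next: id + 1
        }

        {id, store}
    end
  end
end
```

With this store, equal ids always mean equal types, which is the pointer check the working notes hope for. The converse needs more normalization: a node such as {v, x, d, x} with d not false is kept even though it denotes the union of x and d, so a differently built diagram for the same set can end up with a different id.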

Summary of Ordering Logic:

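Use tuples as variable identifiers: {category_num, predicate_type_atom, value_or_nested_id}.
Sort these identifiers element by element. Elixir's default term order does not quite do that: it compares tuples by size first, so every 2-tuple such as {6, :is_empty} would sort before every 3-tuple such as {1, :value, false}, regardless of category. Any fixed total order keeps the TDD ordered, but only an element-wise comparator (or identifiers padded to a single size) gives the hierarchy below.
category_num provides the coarsest level of ordering.
Within a category, predicate_type_atom provides the next level.
Finally, value_or_nested_id provides the finest level, potentially recursively applying the same ordering logic for nested structures.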

This structured approach to variable identifiers is crucial for the "ordered" property of your ROBDD. When implementing binary apply operations (union, intersection, difference), you always compare the variable IDs at the roots of the two TDDs and branch on the one that comes first in the global order. Negation is unary, so it only walks a single diagram.
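Here is a sketch of that root-comparison step for union on plain, unshared diagrams, under the CDuce-style reading of a node {var, yes, dc, no} as (var and yes) or dc or (not var and no). `Tdd.Union`, `union/3` and `member?/2` are hypothetical names, and `cmp` is any comparator on variable ids returning :lt, :eq or :gt. Union is the easy case because a subdiagram that does not test the current variable can simply join the dc branch; intersection has to fold dc into the other branches.

```elixir
defmodule Tdd.Union do
  # Hypothetical union on plain (unshared, unreduced) diagrams: a diagram is
  # true, false, or {var, yes, dc, no}, read as
  # (var and yes) or dc or (not var and no).
  def union(true, _b, _cmp), do: true
  def union(_a, true, _cmp), do: true
  def union(false, b, _cmp), do: b
  def union(a, false, _cmp), do: a

  def union({va, ya, da, na} = a, {vb, yb, db, nb} = b, cmp) do
    case cmp.(va, vb) do
      # Same variable at both roots: merge child by child.
      :eq -> {va, union(ya, yb, cmp), union(da, db, cmp), union(na, nb, cmp)}
      # va comes first: b does not test va, so b joins a's dc branch.
      :lt -> {va, ya, union(da, b, cmp), na}
      :gt -> {vb, yb, union(db, a, cmp), nb}
    end
  end

  # Membership for a value described by `env`, a function answering each
  # variable with true or false.
  def member?(true, _env), do: true
  def member?(false, _env), do: false

  def member?({v, y, d, n}, env) do
    member?(d, env) or if(env.(v), do: member?(y, env), else: member?(n, env))
  end
end

cmp = fn x, y ->
  cond do
    x < y -> :lt
    x > y -> :gt
    true -> :eq
  end
end

atom = {{0, :is_atom}, true, false, false}
int = {{0, :is_integer}, true, false, false}
Tdd.Union.union(atom, int, cmp)
# => {{0, :is_atom}, true, {{0, :is_integer}, true, false, false}, false}
```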
