Motivation for Architectural Evolution of the TDD-based Type System
To: Project Stakeholders & Technical Team
From: System Architect
Date: October 26, 2023
Subject: A Principled Path Forward for Stability, Extensibility, and Polymorphism
1. Introduction & Current Success
The existing Ternary Decision Diagram (TDD) implementation has been remarkably successful in proving the core value of a set-theoretic approach to types. By representing types as canonical graphs, we have achieved powerful and efficient operations for union, intersection, negation, and subtyping. The system's ability to automatically simplify complex type expressions like (atom | tuple) & (tuple | :foo) into a canonical form (tuple | :foo) is a testament to the power of this model.
However, as we scale the system to represent more complex, real-world data structures—specifically recursive types like list(X)—a foundational architectural issue has been identified. This issue currently compromises the system's correctness and severely limits its future potential.
This document motivates a series of architectural changes designed to address this issue, creating a robust foundation for future growth, including the long-term goal of supporting polymorphism.
2. The Core Challenge: The Unstable Variable Problem
The correctness of an Ordered TDD relies on a fixed, global, total order of all predicate variables. Our current implementation for recursive types like list_of(X) violates this principle.
- The Flaw: We currently construct predicate variables by embedding the integer TDD ID of the element type `X`. For instance, `list_of(atom)` might use the variable `{5, :all_elements, 4}` if `type_atom()` resolves to TDD ID `4`.
- The Consequence: TDD IDs are artifacts of construction order; they are not stable. If `type_integer()` is created before `type_atom()`, their IDs will swap, and the global variable order will change. This means that semantically identical types can produce structurally different TDDs, breaking the canonical representation that is the central promise of our system. This is not a theoretical concern; it is a critical bug that invalidates equivalence checks and subtyping operations under different run conditions.
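The failure mode can be reproduced with a toy model of the node table. `ToyTdd` below is purely illustrative (it is not the real `Tdd` interning logic), but it assigns IDs in construction order the same way:

```elixir
# Toy model of the TDD node table: IDs are handed out in construction order.
defmodule ToyTdd do
  def new_table, do: %{next_id: 0, ids: %{}}

  # Interning a type descriptor assigns the next free integer ID.
  def intern(table, descriptor) do
    case table.ids do
      %{^descriptor => id} ->
        {id, table}

      _ ->
        id = table.next_id
        {id, %{table | next_id: id + 1, ids: Map.put(table.ids, descriptor, id)}}
    end
  end
end

# Run A: atom is interned first, then integer.
{atom_a, t} = ToyTdd.intern(ToyTdd.new_table(), :atom)
{_int_a, _} = ToyTdd.intern(t, :integer)

# Run B: integer is interned first, then atom.
{_int_b, t2} = ToyTdd.intern(ToyTdd.new_table(), :integer)
{atom_b, _} = ToyTdd.intern(t2, :atom)

# The "same" variable for list_of(atom) now embeds different integers:
#   run A: {5, :all_elements, 0}   (atom_a == 0)
#   run B: {5, :all_elements, 1}   (atom_b == 1)
# Two different global variable orders for one and the same type.
```

Because the variable order differs between the two runs, the resulting TDDs for `list_of(atom)` are structurally different even though the type is identical.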
This instability is a symptom of a deeper issue: we are conflating a type's logical description with its compiled, set-theoretic representation.
3. The Proposed Solution: A Separation of Concerns
To solve this, we will introduce a principled separation between the "what" and the "how" of our types.
- `TypeSpec`: The Logical "What". We will introduce a new, stable, declarative data structure called a `TypeSpec`. A `TypeSpec` is a structural description of a type (e.g., `{:list_of, :atom}`, `{:union, [:integer, :atom]}`). This becomes the new public API and canonical language for defining types. It is human-readable, stable, and completely independent of the TDD implementation.
- TDD ID: The Compiled "How". The TDD graph and its ID will be treated as a compiled artifact. It is the powerful, efficient, set-theoretic representation of a concrete `TypeSpec`. The `Tdd` module will evolve into a low-level "backend" or "compiler engine."
This separation yields immediate and long-term benefits:
- Guaranteed Correctness & Stability: By using the stable `TypeSpec` within our TDD variables (e.g., `{5, :all_elements, {:base, :atom}}`), we restore a fixed global variable order. This guarantees that our TDDs are once again truly canonical, ensuring the reliability of all type operations.
- Clean Architecture & API: The public-facing API will be vastly improved, operating on clear, declarative `TypeSpec`s instead of opaque integer IDs. The internal complexity of the TDD machinery is encapsulated behind a new `Compiler` module, improving maintainability.
- Unlocking Future Capabilities: This architecture is not just a bug fix; it is an enabling refactor. A system that "compiles" a `TypeSpec` to a TDD is naturally extensible to polymorphism. The `TypeSpec` for `list(A)` can be represented as `{:list_of, {:type_var, :A}}`. A higher-level type inference engine can then operate on these specs, substituting variables and using our TDD compiler for concrete subtype checks. This provides a clear, principled path to our goal of supporting polymorphic functions and data structures.
4. Phased Implementation Plan
We will roll out these changes in three deliberate phases to manage complexity and deliver value incrementally.
- Phase 1: Stabilize the Core. Introduce `TypeSpec` and the `Compiler` to fix the unstable variable problem. This is the highest priority, as it ensures the correctness of our existing feature set.
- Phase 2: Refine the Public API. Build a new, high-level `TypeSystem` module that operates exclusively on `TypeSpec`s, providing a clean and intuitive interface for users of our library.
- Phase 3: Introduce Polymorphism. With the stable foundation in place, implement the unification and substitution logic required to handle `TypeSpec`s containing type variables, enabling the checking of polymorphic function schemas.
5. Conclusion
The current TDD system is a powerful proof of concept that is on the verge of becoming a robust, general-purpose tool. By addressing the foundational "unstable variable" problem through a principled separation of a type's specification from its implementation, we not only fix a critical bug but also pave a clear and logical path toward our most ambitious goals. This architectural evolution is a necessary investment in the long-term stability, correctness, and extensibility of our type system.
Phase 1: Fix the Core Instability & Introduce TypeSpec
This phase is non-negotiable. It makes your current system sound and stable.
1. Introduce the TypeSpec Data Structure:
This will be the new "source of truth" for defining types. It's a stable, structural representation. This becomes the primary vocabulary for your public API.
```elixir
# In a new file, e.g., typespec.ex
defmodule Tdd.TypeSpec do
  @typedoc """
  A stable, structural representation of a type.
  This is the primary way users and the higher-level system will describe types.
  """
  @type t ::
          # Base Types
          :any
          | :none
          | :atom
          | :integer
          # any tuple
          | :tuple
          # any list
          | :list
          # Literal Types (canonical form of a value)
          | {:literal, term()}
          # Set-theoretic Combinators
          | {:union, [t()]}
          | {:intersect, [t()]}
          | {:negation, t()}
          # Parameterized Structural Types
          # a tuple with specific element types
          | {:tuple, [t()]}
          # a non-empty list [H | T]
          | {:cons, t(), t()}
          # a list of any length where all elements are of a type
          | {:list_of, t()}
          # Integer Range (example of specific properties)
          | {:integer_range, integer() | :neg_inf, integer() | :pos_inf}
          # Polymorphic Variable (for Phase 3)
          | {:type_var, atom()}
end
```
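The compiler below calls a `Tdd.TypeSpec.normalize/1` helper that is not spelled out anywhere in this document. A minimal sketch of one reasonable implementation (the rule set here is an assumption; the real normalizer may simplify more aggressively):

```elixir
# Added to Tdd.TypeSpec — a sketch of normalize/1.
# Order-insensitive combinators are sorted and deduplicated so that
# semantically equal specs compare (and therefore cache) equal.
def normalize({:union, specs}),
  do: {:union, specs |> Enum.map(&normalize/1) |> Enum.uniq() |> Enum.sort()}

def normalize({:intersect, specs}),
  do: {:intersect, specs |> Enum.map(&normalize/1) |> Enum.uniq() |> Enum.sort()}

def normalize({:negation, spec}), do: {:negation, normalize(spec)}
def normalize({:list_of, spec}), do: {:list_of, normalize(spec)}
def normalize({:cons, head, tail}), do: {:cons, normalize(head), normalize(tail)}
def normalize({:tuple, elems}) when is_list(elems), do: {:tuple, Enum.map(elems, &normalize/1)}
def normalize(base), do: base
```

With this in place, `normalize({:union, [:integer, :atom, :atom]})` and `normalize({:union, [:atom, :integer]})` produce the same term, `{:union, [:atom, :integer]}`, so both hit the same cache entry.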
2. Create a "Compiler": Tdd.Compiler.spec_to_id/1
This is the new heart of your system. It's a memoized function that takes a TypeSpec and returns a canonical TDD ID. Your existing Tdd module becomes the "backend" for this compiler.
```elixir
# In a new file, e.g., compiler.ex
defmodule Tdd.Compiler do
  # This module manages the cache from spec -> id. A simple process-dictionary
  # cache is used here; an ETS table would also work.
  def spec_to_id(spec) do
    # 1. Normalize the spec (e.g., sort unions) to improve cache hits.
    normalized_spec = Tdd.TypeSpec.normalize(spec)

    # 2. Check the cache for `normalized_spec`. If found, return the ID.
    cache = Process.get(:spec_to_id_cache, %{})

    case cache do
      %{^normalized_spec => id} ->
        id

      _ ->
        # 3. If not found, compile it by calling private helpers.
        id =
          case normalized_spec do
            :atom -> Tdd.type_atom()
            :integer -> Tdd.type_integer()
            {:literal, val} -> build_literal_tdd(val)

            {:union, specs} ->
              specs
              |> Enum.map(&spec_to_id/1)
              |> Enum.reduce(Tdd.type_none(), &Tdd.sum/2)

            # ... and so on for the remaining spec shapes ...
            {:list_of, element_spec} ->
              build_list_of_tdd(element_spec)
          end

        # 4. Cache and return the new ID.
        Process.put(:spec_to_id_cache, Map.put(cache, normalized_spec, id))
        id
    end
  end

  # The key fix is here:
  defp build_list_of_tdd(element_spec) do
    # A) Compile the element spec to get its ID for semantic checks.
    element_id = spec_to_id(element_spec)

    # B) Create the TDD variable using the STABLE element_spec.
    #    This is the critical change.
    stable_var = Tdd.v_list_all_elements_are(element_spec)

    # C) Use the Tdd module to build the node, passing both the stable
    #    variable and the compiled ID for the consistency checker.
    #    (This requires a small change in the Tdd module.)
    Tdd.type_list_of_internal(stable_var, element_id)
  end
end
```
3. Modify the Tdd Module to be a "Low-Level Backend":
The public API of Tdd shrinks. Its main users are now Tdd.Compiler and Tdd.TypeSystem (Phase 2).
- Change Variable Constructors: `v_list_all_elements_are` now takes a `TypeSpec`.

```elixir
# In the Tdd module.
# The variable now contains the stable TypeSpec, not a volatile ID.
def v_list_all_elements_are(element_spec), do: {5, :all_elements, element_spec}
```

- Update `check_assumptions_consistency`: this function now needs access to the compiler to handle the new variable format.

```elixir
# In the Tdd module, inside the consistency checker,
# when it encounters a `v_list_all_elements_are` assumption:
{{5, :all_elements, element_spec}, true} ->
  # It needs to know the TDD ID for this spec to do subtyping checks.
  ambient_type_for_head = Tdd.Compiler.spec_to_id(element_spec)
  # ... the rest of the logic proceeds as before ...
```

  This creates a circular dependency (`Compiler` -> `Tdd` -> `Compiler`), which is a sign of deep coupling. It is acceptable for now, but it suggests that this logic might eventually belong in the compiler itself.

- Refactor Type Constructors: your old `type_list_of`, `type_tuple`, etc., are either removed or renamed with an `_internal` suffix and used only by the new `Compiler`.
Phase 2: Build the High-Level Type System API
This is the new public-facing interface. It operates on TypeSpecs and uses the Compiler and Tdd modules as its engine.
```elixir
# In a new file, e.g., type_system.ex
defmodule TypeSystem do
  alias Tdd.TypeSpec, as: Spec

  # --- Public Type Constructors (using TypeSpecs) ---
  def type_atom, do: :atom
  def type_integer, do: :integer
  def type_union(specs), do: {:union, specs}
  def type_list_of(spec), do: {:list_of, spec}
  # ... and so on ...

  # --- Public Operations ---

  @doc """
  Checks if spec1 is a subtype of spec2.
  """
  def is_subtype(spec1, spec2) do
    # 1. Compile both specs to their canonical TDD IDs.
    id1 = Tdd.Compiler.spec_to_id(spec1)
    id2 = Tdd.Compiler.spec_to_id(spec2)

    # 2. Use the fast, low-level TDD check.
    Tdd.is_subtype(id1, id2)
  end

  def intersect(spec1, spec2) do
    # This is a design choice:
    # Option A (easy): just return {:intersect, [spec1, spec2]} and let the
    #   compiler handle it lazily when `spec_to_id` is called.
    # Option B (better): do some immediate simplifications here before
    #   compiling, e.g., intersect(:atom, :integer) -> :none
    {:intersect, [spec1, spec2]}
  end
end
```
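A hypothetical session against the Phase 2 API (the result shown is the intended semantics under `list_of` covariance, not output from an actual run):

```elixir
list_of_atoms = TypeSystem.type_list_of(TypeSystem.type_atom())
# {:list_of, :atom}

list_of_atoms_or_ints =
  TypeSystem.type_list_of(TypeSystem.type_union([:atom, :integer]))
# {:list_of, {:union, [:atom, :integer]}}

# list(atom) <: list(atom | integer), so this check should hold:
TypeSystem.is_subtype(list_of_atoms, list_of_atoms_or_ints)
# expected: true
```

Note that the caller never sees a TDD ID: both specs are compiled to canonical IDs inside `is_subtype/2`, and identical specs always compile to the same ID regardless of construction order.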
At the end of Phase 2, you have a sound, stable, and extensible set-theoretic type checker. The "god function" problem still exists in check_assumptions_consistency, but the system is no longer fundamentally broken.
Phase 3: Introduce Polymorphism
Now you leverage the TypeSpec architecture.
1. Define Polymorphic Function Schemas: These are data structures, living outside the TDD world.
```elixir
# A function schema
@type fun_schema ::
        {:function,
         # list of quantified type variable names
         forall: [atom()],
         # argument types (can use quantified vars)
         args: [Spec.t()],
         # return type (can use quantified vars)
         return: Spec.t()}

# Example: Enum.map/2
# forall A, B. (list(A), (A -> B)) -> list(B)
map_schema =
  {:function,
   forall: [:A, :B],
   args: [
     {:list_of, {:type_var, :A}},
     {:function, forall: [], args: [{:type_var, :A}], return: {:type_var, :B}}
   ],
   return: {:list_of, {:type_var, :B}}}
```
2. Implement a Unification/Substitution Engine:
This is a new, separate module that operates only on TypeSpecs and schemas.
```elixir
defmodule Type.Unify do
  # Takes a spec with variables and a concrete spec; returns a substitution map
  # or :error.
  # unify({:list_of, {:type_var, :A}}, {:list_of, :integer}) -> {:ok, %{A: :integer}}
  def unify(polymorphic_spec, concrete_spec) do
    # ... recursive pattern-matching logic ...
  end

  # Applies a substitution map to a spec.
  # substitute({:list_of, {:type_var, :A}}, %{A: :integer}) -> {:list_of, :integer}
  def substitute(spec, substitution_map) do
    # ... recursive substitution logic ...
  end
end
```
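The elided recursion can be sketched concretely. This toy version handles only type variables, single- and two-argument combinators, and exact matches on base types; a full unifier would also need to walk lists of specs (for `:union`, `:tuple`, and argument lists) and consult the TDD backend for subtype-aware matching:

```elixir
defmodule Type.Unify.Sketch do
  # unify/2: walk both specs in lockstep, binding type variables as we go.
  def unify(polymorphic_spec, concrete_spec), do: do_unify(polymorphic_spec, concrete_spec, %{})

  defp do_unify({:type_var, name}, concrete, subs) do
    case subs do
      %{^name => ^concrete} -> {:ok, subs}       # consistent re-binding
      %{^name => _conflict} -> :error            # same var bound to two types
      _ -> {:ok, Map.put(subs, name, concrete)}  # fresh binding
    end
  end

  defp do_unify({tag, a}, {tag, b}, subs) when not is_list(a) and not is_list(b),
    do: do_unify(a, b, subs)

  defp do_unify({tag, a1, a2}, {tag, b1, b2}, subs) do
    with {:ok, subs} <- do_unify(a1, b1, subs), do: do_unify(a2, b2, subs)
  end

  # Identical base types unify with no new bindings; anything else fails.
  defp do_unify(same, same, subs), do: {:ok, subs}
  defp do_unify(_, _, _), do: :error

  # substitute/2: replace bound variables, leave everything else untouched.
  def substitute({:type_var, name}, subs), do: Map.get(subs, name, {:type_var, name})
  def substitute({tag, a}, subs) when not is_list(a), do: {tag, substitute(a, subs)}
  def substitute({tag, a, b}, subs), do: {tag, substitute(a, subs), substitute(b, subs)}
  def substitute(base, _subs), do: base
end

Type.Unify.Sketch.unify({:list_of, {:type_var, :A}}, {:list_of, :integer})
# => {:ok, %{A: :integer}}
```

The substitution map is an ordinary Elixir map from variable name to spec, so occurs-checks and variable-to-variable bindings can be layered on later without changing the interface.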
3. Build the Type Checker for Function Calls: This is the final piece that ties everything together.
```elixir
# In the TypeSystem module (assumes `alias Type.Unify` at the top).
# Note: the schema is the 2-tuple {:function, keyword_list} from step 1,
# so its fields are read with Keyword access rather than dot syntax.
def check_function_call({:function, schema}, concrete_arg_specs) do
  # 1. Unify the schema's arg specs with the concrete arg specs.
  #    This produces a substitution map, e.g., %{A: :integer, B: :atom}.
  {:ok, substitution_map} = Unify.unify(schema[:args], concrete_arg_specs)

  # 2. Check that the concrete args are valid subtypes.
  #    (This check is folded into unification itself.)

  # 3. Infer the concrete return type by substituting into the schema's
  #    return spec.
  concrete_return_spec = Unify.substitute(schema[:return], substitution_map)
  {:ok, concrete_return_spec}
end
```
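Walking the `Enum.map/2` schema from step 1 through this checker looks like the following (hypothetical: the concrete argument specs and the bindings shown are illustrative, not output from a running system):

```elixir
# Checking a call shaped like Enum.map(list_of_integers, integer_to_atom_fun):
concrete_args = [
  {:list_of, :integer},
  {:function, forall: [], args: [:integer], return: :atom}
]

# Unification should bind %{A: :integer, B: :atom}, so substituting into the
# schema's return spec {:list_of, {:type_var, :B}} should infer:
{:ok, return_spec} = TypeSystem.check_function_call(map_schema, concrete_args)
# expected: return_spec == {:list_of, :atom}
```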
This phased approach solves your immediate problem correctly, provides a clean public API, and builds the exact foundation needed to add a polymorphic layer on top without requiring another major architectural upheaval.