Learning Epistemology

Version 2 Storyboard

1. Introduction

In scientific inquiry, we face a fundamental problem. Our goal is to develop theories that accurately describe the world. In order to do so, we are forced to generalize beyond our immediate experience to account for every other possible experience.


Generalizing beyond our experience is a problem because infinitely many alternative theories are consistent with our finite data set. Each alternative agrees with our observed data but makes different predictions about unobserved data. How should a scientist choose the correct theory from among the infinite alternatives? We need a reasonable strategy to guide our theory choice.


Scientists use the principle of Ockham's Razor as their guide. Ockham's Razor states that when multiple consistent theories are under consideration, we should choose the simplest one.


Simple theories have an intuitive appeal, but that is not a justification of Ockham's Razor as a principle of reasoning.


A justification should demonstrate that preferring the simplest compatible theory is better at finding the truth than any other competing strategy.


Why is it necessary to justify using simplicity to guide theory choice? Well, if you already know that the true theory is simple, you don't need Ockham's Razor as a principle of reasoning to guide your choice. It follows that whenever the use of Ockham's Razor genuinely needs justification, the truth may be arbitrarily complex.


And if you admit that the truth may be arbitrarily complex, how can a systematic preference for simple theories help find the true one? In that case, simplicity couldn't possibly track the truth. If the truth turns out to be complex, then the simple theory will be the wrong choice.


Put another way, if Ockham's Razor is supposed to work like a compass and always point in the direction of truth, then its needle is stuck on one heading. Nobody could navigate like that, and Ockham's Razor couldn't help like that either. So, if Ockham's Razor provides any help at all, it has to be in some way other than pointing to the true theory.


Let's recap what we need to demonstrate. A successful justification must possess these three properties:

To achieve this goal, we will first examine a toy problem of inquiry that is intuitive to analyze. We will demonstrate how preferring the simplest compatible theory helps find the true one. The results for this specific problem will guide our development of a general theory of Ockham's Razor.

Next, we will define simplicity and Ockham's Razor without reference to a specific problem of inquiry. Using these general definitions, we will demonstrate how Ockham's Razor helps find true theories. Since our definitions won't refer to any specific case of inquiry, our results will apply to all cases. This general demonstration will finally justify Ockham's Razor as a principle of reasoning.


2. Strategy for a Response

We stated before that simplicity alone cannot indicate which theory is true, but to be fair, nothing can do this. No matter how you choose a general theory, it can always be proven false by future unexpected discoveries. Since it is impossible to prove that a general theory is true, what can we do instead? Well, if our theory is true, then for each new datum we observe, our theory will accurately account for it. Stated another way, if we never have to retract our theory in response to incorrect predictions, then it is true.


Imagine if we could establish a necessary limit on how many retractions we would have to make during our inquiry. For any given problem, we'd have a lower bound on the number of retractions any strategy could be forced to make in the worst case. Sometimes you might get lucky and use fewer retractions than this minimum. But in the worst case, we would know that every strategy uses at least that many retractions. Returning to the compass analogy, it would be like establishing a limit on the number of times you would have to change headings.


If we could establish this worst case retraction lower bound, we would have a goal for our strategies. Any strategy that uses more retractions than the lower bound would be doing so unnecessarily. To avoid this, we should design our strategies to only retract within this lower bound. That way, we know that each retraction is a step towards the truth, instead of just pointless vacillating among theories.


Let's think about how we could establish a lower bound on retractions for a problem and design an efficient solving strategy. What makes it difficult to choose the true theory when the truth is complex? The difficulty is that we can't differentiate between worlds with all simple data and ones with future unobserved complex data.


If the world does have hidden complexity, it can be revealed at any time during the course of our inquiry. When it finally is revealed, we have to retract from our false simple theory to a more complex one. So even though assuming simplicity led to a retraction, it was only in response to new complex evidence.


But now suppose that we act on an assumption of hidden complexity and adopt a complex theory. As we wait arbitrarily long to observe the complex evidence, we face a new problem.

If the expected revelation never occurs, our complex theory is false. In this situation, we are forced to eventually retract to the simple theory, or else never find the truth.


Once we have reverted to the simple theory, if new complex evidence ever does appear, it will force us back to the original complex theory. Notice what has happened here: an initial gamble on hidden complexity led to unnecessary retractions. In such a worst-case scenario, Ockham's Razor would have found the truth more efficiently than the non-Ockham strategy.


So perhaps Ockham's Razor is just the assurance we're looking for. It looks like non-Ockham strategies retract unnecessarily in the worst-case, while Ockham's Razor doesn't. Always assuming that the world is simple leads us to the truth efficiently no matter how complex the world turns out to be. If this is so, we'll have a justification of Ockham's Razor that satisfies all three requirements:

To explore this idea, let's examine Ockham's Razor applied to a toy problem. The problem should be easy to think about, yet still capture all the fundamental details. Doing so will help us solidify our intuitions before we develop a general theory of Ockham's Razor.


3. Counting Marbles

Let's think about the counting problem. As scientists, we often find ourselves having to count things. We will be in a situation where we don't know when the things will appear, but we do know that there will only be a finite number of things.

We face the counting problem when we want to:
Determine the order of an effect
Find the edges in a causal graph
Establish the number of free parameters in a model
And many other similar situations.

We're going to examine a very simple example of the counting problem. It will retain all of the problem's fundamental elements, but will be easy to think about. That way we can discover how Ockham's Razor applies to the counting problem without dealing with irrelevant details.


We want to study this marble making machine. Specifically, we'd like to determine how many marbles it will ever make, that is, the complete count. Marble makers can only produce a finite number of marbles, but each marble can appear at any time. Our question is an instance of the counting problem. So, how does Ockham's Razor suggest we proceed with our inquiry?


Ockham's Razor instructs us to choose the simplest theory compatible with our evidence. For guessing the complete count, those instructions translate to this: always guess the current count, that is, that the marbles observed so far are all the marbles there will ever be.

Each marble is an anomaly, a distinct surprise in our experience, so the more marbles that appear, the more complex the truth is. The appearance of n marbles is consistent with any number of marbles greater than or equal to n. But, guessing more than n incurs additional worst-case retractions, forcing our performance to be worse than Ockham's. Let's see how it works.


We're going to compare different strategies for solving the counting problem. A strategy is a way of guessing theories in response to evidence. At each stage of inquiry, a strategy examines the available evidence, then returns either a natural number which indicates a guess at the complete count, or a question mark which indicates a refusal to guess at that stage. A strategy solves the counting problem if and only if it converges to the true complete count, regardless of how many objects there are or when they appear. We're only interested in evaluating strategies that solve the given problem.


First, let's establish the Ockham strategy's worst-case retractions.

When no marbles have appeared, Ockham will guess "0".

After the guess, a new marble is free to appear.

If one does, Ockham will retract and guess the current count.

This can continue until all the marbles have appeared.

So in the worst case, the Ockham strategy commits one retraction per marble.

That means that for every total number of marbles n, Ockham's worst-case retractions are equal to n.
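
To make this concrete, here is a minimal sketch in Python of the Ockham counting strategy and a retraction counter. The encoding (evidence as a list of 0/1 inputs) and the names ockham_strategy and retractions are our own illustrative assumptions, not part of the storyboard's formal apparatus.

    # A sketch of the Ockham counting strategy and a retraction counter.
    # Evidence is a list of stage inputs: 1 if a marble appeared, else 0.

    def ockham_strategy(evidence):
        # Guess the simplest compatible answer: the current count.
        return sum(evidence)

    def retractions(strategy, inputs):
        # Count how often the strategy drops a previously announced guess.
        count, previous, evidence = 0, None, []
        for datum in inputs:
            evidence.append(datum)
            guess = strategy(evidence)
            if previous is not None and previous != "?" and guess != previous:
                count += 1
            previous = guess
        return count

    # Worst case for n = 3: nature waits for each guess, then releases
    # another marble. Ockham retracts exactly once per marble.
    print(retractions(ockham_strategy, [0, 1, 0, 1, 0, 1]))  # -> 3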


Now let's see how a non-Ockham strategy commits more worst-case retractions.
A non-Ockham strategy is just any deviation from the Ockham strategy.
We're interested in the stage of inquiry where the non-Ockham strategy deviates from Ockham.
So, suppose its very first guess is "1" in anticipation of the first marble.

The first marble, however, is free to wait arbitrarily long before it appears.
As evidence accumulates with no marble appearances, the non-Ockham strategy must make a choice: either keep the wrong answer "1" and never solve the problem, or else retract to "0".

Since the strategies we're considering must solve the problem, it must eventually retract.

After the retraction, a new marble is free to appear.

If one does, the strategy must eventually retract again to the current count in order to solve the problem.

This can continue until all the marbles have appeared.

So in the worst case, the non-Ockham strategy will commit more than one retraction per marble after the point of deviation from Ockham's Razor.
That means that for every total number of marbles n after the deviation, the non-Ockham strategy's worst case retractions will be greater than n.
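
Nature's forcing play can be sketched the same way: withhold the next marble until the strategy concedes the current count, then release one. The eager_strategy below is a hypothetical non-Ockham deviation (it guesses "1" up front and falls back to the current count after three quiet stages); against it, nature forces two retractions where Ockham needs only one.

    # Nature's forcing play against a strategy (a finite sketch; solutions
    # must converge, which is what guarantees termination in the text).

    def nature_forces(strategy, n_marbles):
        evidence, guesses, released = [], [], 0
        while True:
            evidence.append(0)                  # withhold the next marble
            guesses.append(strategy(evidence))
            if guesses[-1] == released:         # strategy concedes the count
                if released == n_marbles:
                    return guesses
                evidence.append(1)              # release the next marble
                released += 1
                guesses.append(strategy(evidence))

    def eager_strategy(evidence):
        # A non-Ockham deviation: guess "1" before any marble appears,
        # then fall back to the current count after three quiet stages.
        if sum(evidence) == 0 and len(evidence) < 3:
            return 1
        return sum(evidence)

    def retraction_count(guesses):
        previous, count = None, 0
        for g in guesses:
            if previous is not None and previous != "?" and g != previous:
                count += 1
            previous = g
        return count

    print(retraction_count(nature_forces(eager_strategy, 1)))    # -> 2
    print(retraction_count(nature_forces(lambda e: sum(e), 1)))  # -> 1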


Notice that we are comparing retraction efficiency from the first point of deviation from Ockham's Razor. In our comparison, we hold fixed both the evidence and guess history up to that point. In doing so, we compare performance only in situations that present the same initial evidence sequence, and only strategies that make the same initial guesses. Let's look at another example to see why this is important.

Consider a non-Ockham strategy that refrains from guessing while waiting for evidence to accumulate.
During this time, two marbles happen to appear.
But when it eventually makes a guess, it overcounts and guesses "3".
Finally, it corrects its guess back to "2".

At first glance, it seems as if the non-Ockham strategy commits fewer retractions than Ockham, but that is only the case if efficiency is measured over the whole course of inquiry. When a strategy decides to deviate from Ockham's Razor, we take as given both the evidence and guess history up to the point of deviation. Therefore, we hold them both fixed for the efficiency comparisons.

To do this, we construct a hybrid strategy that mimics the non-Ockham strategy until the deviation point, and then follows Ockham's Razor. Comparing the strategies at the deviation point reveals the inefficiency. As before, both strategies will commit an equal number of retractions up until the deviation. But afterwards, for every total number of marbles n beyond the deviation, the mimic-Ockham hybrid strategy will commit only n additional worst-case retractions. The non-Ockham strategy will commit greater than n additional worst-case retractions.
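
A sketch of that hybrid construction, using the same illustrative encoding as before (the deviation_stage parameter is a hypothetical way to mark the deviation point):

    def hybrid(non_ockham, deviation_stage):
        # Mimic the non-Ockham strategy before the deviation stage, then
        # follow Ockham's Razor (guess the current count) from then on.
        def strategy(evidence):
            if len(evidence) < deviation_stage:
                return non_ockham(evidence)
            return sum(evidence)
        return strategy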


As we saw in the previous example, repeating a guess can be useful for avoiding retractions in non-worst case scenarios. But, as we also saw, any guess besides the current count or a repetition of the previous guess will result in extra worst-case retractions.

Let's modify the Ockham strategy to include the possibility of repeating the previous answer, and call it the lagged Ockham strategy. A lagged Ockham strategy will only do two things: guess the current count, or repeat its previous guess.


We just saw how guessing more than the current count incurs extra worst case retractions. The same extra retraction argument applies if a strategy ever drops the simplest compatible theory while it is still simplest, which would be a violation of the retention principle. The retention principle states that, once adopted, a theory should only be abandoned in response to empirical evidence, not on general skeptical grounds.

In our setup, that would amount to guessing a theory, then subsequently changing the guess while it is still consistent. Just to be clear, while changing from a refusal to guess to a guess is not a retraction, changing from a guess to a refusal to guess is a retraction, since a previously held positive theory is no longer being advocated.

Comparing worst-case retraction efficiency at the point where the retention principle is violated, we find the same unnecessary retraction sequence.

Dropping the current guess incurs a retraction.
If no more marbles appear, the strategy must eventually go back to the abandoned guess in order to solve the problem.
Once this has happened, a new marble is free to appear.
If it does, the strategy must eventually retract in order to solve the problem.
This can continue until all the marbles appear.
So in the worst case, violating the retention principle causes the strategy to commit more than one retraction per marble after the violation point.
That means that for every total number of marbles n after the violation, the strategy's worst-case retractions will be greater than n.

Let's call strategies that never violate the retention principle stalwart, since they persist with an answer until it runs into real empirical problems. So a stalwart lagged Ockham strategy will only: repeat its previous guess while it remains consistent with the evidence, or guess the current count once its previous guess has been refuted.
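
Here is a minimal sketch of such a strategy in the running encoding; the lag parameter and the recursive replay of the guess history are our own illustrative choices:

    def stalwart_lagged_ockham(evidence, lag=2):
        # Refuse to guess for the first `lag` stages; afterwards, keep the
        # previous guess while it is consistent (stalwartness), and move to
        # the current count only when a new marble refutes it.
        if len(evidence) < lag:
            return "?"
        previous = stalwart_lagged_ockham(evidence[:-1], lag)
        current = sum(evidence)
        if previous == "?" or previous < current:  # first guess, or refuted
            return current
        return previous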


We've seen that violating either Ockham's Razor or the retention principle causes a strategy to commit more than n worst-case retractions for every possible answer n after the violation. We've also seen that the stalwart lagged Ockham strategy commits n worst-case retractions for every possible answer n. That means that for each answer after the violation, stalwart lagged Ockham will commit fewer worst-case retractions than a non-Ockham or non-stalwart strategy. When this situation occurs, we'll say that the non-Ockham strategy is strongly beaten by the Ockham strategy over worst-case retractions.

This definition is opposed to being weakly beaten. A strategy is weakly beaten if another strategy does at least as well in every answer and strictly better in some answer. Of course, being strongly beaten implies that a strategy is also weakly beaten, but not conversely.


Finally, let's say that a strategy is efficient if it is never weakly beaten. If a strategy is never weakly beaten, there is never a situation where it performs worse than a competing strategy. So if a strategy is weakly beaten, then it is also inefficient.


The final step of our justification is to show that the stalwart lagged Ockham strategy is efficient, meaning that no other convergent strategy does better in any answer. To do this, we show that each stalwart, lagged Ockham solution s (e.g., the strategy that always returns the current count) is efficient at each finite evidence sequence e, in the sense that over each answer compatible with e, solution s matches the worst-case retraction performance of any solution agreeing with s along e-.

First, let's consider that we're at some arbitrary stage of inquiry t.
We'll suppose that there is some arbitrary strategy in the same situation, meaning that it observed the same evidence and made the same guesses as the stalwart, lagged Ockham up until now.

From this, we know that both strategies have committed the same number of retractions prior to t, and both strategies' guess at t - 1 was the current count n at t - 1.

Now, if the Ockham strategy's guess at t is a retraction, it follows from the stalwartness property that a new marble must have appeared, so its guess is the count at t, n + 1.

Since the arbitrary strategy solves the problem, it must eventually retract from n to n + 1.
So in answer n + 1, both strategies have equal worst-case retractions.

Finally, we know that the Ockham strategy retracts no more than once for each additional marble appearance.
So for any number of marbles m beyond n, Ockham's worst-case retractions will be at most n + m.

We also know that the arbitrary strategy can be forced to commit at least one retraction per additional marble, simply by not releasing the next marble until it guesses the current count.
So for any number of additional marbles m beyond n, the arbitrary strategy's worst-case retractions are at least n + m.

We've just shown that for each possible answer, no strategy can do better in worst-case retractions than the stalwart lagged Ockham strategy. This means that Ockham is never weakly beaten, which in turn means that Ockham is efficient.


Let's take stock of what we've demonstrated.

For any strategy solving the counting problem, and for each finite initial evidence sequence e,
Since being strongly beaten implies being weakly beaten, and since being weakly beaten implies inefficiency, the following also holds.
If a strategy solves the counting problem, and the cost is worst case retractions, then the following are equivalent:
So of all the possible strategies for solving the counting problem, there are two kinds: those that are efficient and those that are strongly beaten. The efficient ones are all the stalwart lagged Ockham strategies, and the strongly beaten ones are everything else. With this final proof, we've done what we set out to do for at least the counting problem. We wanted to demonstrate how Ockham's Razor performs better than other strategies at finding the truth. We showed how it does so by minimizing worst-case retractions, regardless of how complex the truth turns out to be. Furthermore, nothing in our argument assumed that simplicity indicates truth. Even though simplicity alone guides Ockham's guesses, there is always the possibility that each guess will be proven wrong in the future.

4. Issues with Developing a General Demonstration of Ockham's Razor

We've demonstrated how systematically preferring the simplest compatible answer helps solve the counting problem efficiently. Even though many cases of scientific inquiry are instances of the counting problem, there are many others which are not. In order to fully justify Ockham's Razor, we need to give a general demonstration of its truth-finding efficiency, one that is applicable to every case of scientific inquiry. To accomplish this goal, we need a way to talk about simplicity, problems, evidence, and answers without referring to the specifics of any particular case of scientific inquiry. That's a tall order, but our previous results will help guide our general demonstration's development. First, let's consider some issues that our general demonstration will need to address.


In the marble counting problem, it was intuitive how to rank the simplicity of each possible answer. The higher the count, the more complex the answer. Therefore, the simplest compatible answer is always the current count. This intuition corresponds to many classically cited attributes of simplicity, such as:

However, while these are all qualities of simple answers, they are not a basis on which to define what simplicity is. We want our general definition of simplicity to correspond to something objective about the problem, not just arbitrarily name one answer simple and another complex.


To understand this concern, consider if instead of marbles, we wanted to count "firstbles". A firstble is just like a marble unless the marble appears on the first stage of inquiry, in which case the marble is not a firstble. Similarly, a non-marble appearance on the first stage counts as a firstble.

Even though firstbles may seem odd, they are no more odd than marbles since each can be defined in terms of the other in an identical fashion. As far as each definition's syntactic complexity is concerned, they are symmetrical.

It seems like counting marbles should present the same difficulty as counting firstbles, but each problem description identifies a different uniformity. Clearly, the most uniform marble world is not the most uniform firstble world, and vice versa. The same issue arises with the other classical definitions of simplicity. So if these attributes have anything to do with simplicity, they must be an artifact of some deeper structure. We want a definition based on this deeper structure.


The marble-versus-firstble issue has been raised as an objection to the possibility of a general definition of simplicity. Since each world's complexity ranking is dependent on the description of the problem, critics suppose that there can be no objectivity about the matter. But rather than being an objection to simplicity itself, this insight is merely a fact about simplicity's relationship to efficient inquiry. Efficiency is always measured relative to a problem as posed, so any efficiency-based connection between simplicity and truth must be dependent on the problem definition. For example, a strategy that is efficient at counting marbles need not be efficient at counting firstbles and vice versa. When we pose a well-formed problem, we fix the descriptions that concern us, which in turn fixes the structure of the problem. Our defense of Ockham's Razor does not dictate which problem is under investigation, only the manner in which to solve it efficiently. Therefore, in order to be general, our definition of simplicity should be designed to apply to as broad a class of problem descriptions as possible.



In light of these observations, let's re-examine the marble counting problem to characterize simplicity in this structure-dependent fashion. Before, we said that each marble appearance was an anomaly, and that simplicity is determined by the number of anomalies. Let's define what an anomaly is in a structural way without reference to marbles or marble makers. How does the structure of the problem change with each anomaly (in this case, with each marble)?

Whenever a marble appears, the lowest complete count consistent with the evidence prior to the marble's appearance is no longer consistent. Restating this in general terms, whenever an anomaly occurs, the simplest consistent answer prior to the anomaly is no longer consistent. So anomalies change the structure of the potential answers relative to what is currently known.

As we saw before, strategies can be forced to guess an answer by presenting the same evidence repeatedly. An answer pattern is a series of answers without immediate repetition that can be forced in this manner, one after the other, like forcing "0", then "1", then "2" and so on. Since anomalies make the current simplest answer inconsistent, in effect, they truncate a term off the beginning of the answer pattern.

This is the generic characterization of anomalies and simplicity that we've been looking for. An anomaly occurs when evidence truncates an answer off the beginning of a forcible answer pattern. An answer's simplicity can be ranked by the number of anomalies that can be forced before it.
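
For marble counting, this structural characterization is easy to compute. A sketch follows, with hypothetical helper names; forcible_patterns lists finite prefixes of the backwards-maximally forcible patterns.

    def forcible_patterns(evidence, depth=4):
        # In marble counting, the backwards-maximally forcible patterns at e
        # are ( ), (c), (c, c+1), ..., where c is the current count.
        c = sum(evidence)
        return [tuple(range(c, c + k)) for k in range(depth)]

    def is_anomaly(evidence):
        # An anomaly truncates the head answer off every forcible pattern:
        # here, exactly when the latest input is a marble.
        old_head = sum(evidence[:-1])
        return all(p[0] != old_head for p in forcible_patterns(evidence) if p)

    print(is_anomaly([0, 0]))  # -> False: patterns still begin with "0"
    print(is_anomaly([0, 1]))  # -> True: "0" was truncated off every pattern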


There are many problems that have a different structure than the counting problem. Our general justification needs to account for them. Let's examine some different kinds of problem structures. In our marble maker example, marbles can only appear one at a time. This means that only one anomaly can occur at each stage of inquiry. But what if we modified the problem to allow multiple marbles to appear at once? That would mean that several anomalies could occur simultaneously. Let's be sure to account for these kinds of problems in our general justification.


In the counting problem, there is always a unique simplest answer, the current count. However, some problems can present multiple equally simple answers simultaneously. For example, consider this variation on marble counting: we know that a marble maker will produce at least one marble, but those marbles could be either all white or all black. We want to guess the complete count and the color of the marbles. At the start of our inquiry, these answer patterns are forcible:

Both "1 black" and "1 white" are equally simple, in that nothing can be forced before them. However, if we guess one color instead of the other before the appearance of the first marble, we can be forced to retract if the opposite color appears. Until one answer has been singled out as uniquely simplest, efficiency requires that we refrain from guessing. Let's say that a strategy respects symmetry if it refrains from guessing when multiple equally simple answers are consistent with our evidence. Our general justification will have to account for problems with symmetrical structures.


Finally, we'll need to address the issue of repeated answers in a problem structure. In the counting problem, answers are only forcible once during the course of inquiry. But consider this variation: we want to guess the complete count of a normal marble maker, but only if the complete count is odd. If the complete count turns out to be even, we must only report the answer "even". In this case, the answer "even" can be forced an arbitrary number of times during the course of inquiry.

In the last section, we assessed retraction efficiency over worlds that satisfy a particular answer. That technique won't work in this even-count-the-odd problem, because in the worst case, the number of retractions an arbitrary strategy would need to converge to "even" is unbounded. To accommodate these problem structures, we're going to instead assess efficiency over worlds belonging to a particular anomaly complexity class. An anomaly complexity class is just the group of worlds that satisfy an answer while also sharing the same initial answer pattern.

In regular marble counting, using complexity classes doesn't change anything, since each answer is unique. But in the even-count-the-odd problem, using complexity classes means we assess worlds with zero anomalies satisfying "even" separately from those with two anomalies satisfying "even", and so on. In this manner, repeated answers can be treated individually whenever they appear in the problem structure. Our general account will assess efficiency over anomaly complexity classes in this fashion.
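
A small sketch of the even-count-the-odd answer map and of grouping worlds into complexity classes; the encoding of worlds as finite sets of marble-appearance stages is our own illustrative assumption.

    def answer_even_odd(w):
        # Report the exact count when it is odd, otherwise just "even".
        n = len(w)
        return n if n % 2 == 1 else "even"

    def complexity_class(n, e, candidates):
        # Worlds compatible with e that present exactly n more anomalies
        # (here: n more marbles) after e.
        return [w for w in candidates
                if all((1 if t in w else 0) == x for t, x in enumerate(e))
                and sum(1 for t in w if t >= len(e)) == n]

    worlds = [frozenset(), frozenset({2, 5}), frozenset({2, 5, 7, 9})]
    # All three worlds answer "even", but they fall in different classes:
    print([complexity_class(n, [], worlds) for n in (0, 2, 4)])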


Taking these three possible problem structures into account:

we can represent an extremely broad class of problems with our general justification. In our justification, we will:

In the next section, we will develop a formal language that expresses these ideas with clarity and precision. In the section after that, we will use this formal language to demonstrate Ockham's Razor's truth-finding efficiency for an arbitrary problem of inquiry, thereby justifying its use as a principle of scientific reasoning.


5. Formal Definitions

In this next section, we're going to develop a formal language with which to precisely express the ideas we've been developing. It is imperative to become fluent with these definitions before proceeding on to the actual justification.


An empirical problem consists of two objects, W and A.

W is a set of infinite input sequences. Each infinite input sequence in W is called a potential world, and represents the infinite series of evidence that world would present. We represent a single potential world as w.

A is a partition over the worlds in W. Each cell of A is called a potential answer, and contains all the worlds in which that answer is true. We represent a single potential answer as a.

We represent an empirical problem as the pair (W, A).


As evidence becomes available, some potential worlds become incompatible. To represent this formally, we let the finite input sequence e represent the available evidence at any stage of inquiry.

We call a potential world compatible with e when e is identical to the world's initial inputs. More broadly, we say that W_e represents the subset of worlds in W that are compatible with e.

Similarly, we say that A_e represents the subset of answers in A that contain an element of W_e.

Finally, we say that e is compatible with W just in case e is compatible with some world in W.

For our justification, we will have to refer to a finite input sequence e with the last input truncated off. We represent this truncated input sequence as e-.
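
As an illustration, here is one way to encode these objects for the marble counting problem; the representation of worlds as finite sets of appearance stages, and names like world_input, are our own assumptions.

    # A world is identified by the finite set of stages at which marbles
    # appear; its infinite input sequence is 1 at those stages, 0 elsewhere.

    def world_input(w, stage):
        return 1 if stage in w else 0

    def compatible(w, e):
        # w is compatible with e when e matches w's initial inputs.
        return all(world_input(w, t) == x for t, x in enumerate(e))

    def answer(w):
        # The answer cell a world belongs to: its complete count.
        return len(w)

    # W_e and A_e over a finite stock of candidate worlds:
    candidates = [frozenset(), frozenset({3}), frozenset({1, 4})]
    e = (0, 1)
    W_e = [w for w in candidates if compatible(w, e)]
    A_e = {answer(w) for w in W_e}
    print(W_e, A_e)  # -> [frozenset({1, 4})] {2}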


For the rest of the section, all definitions are relative to a given empirical problem (W, A).

We're going to represent scientific inquiry as a formal game between the scientist and nature. First, let's define what a strategy is for both players.

A scientist strategy is just a mapping from finite input sequences to either potential answers, or "?" indicating a refusal to guess.

A nature strategy is just a mapping from finite answer sequences (including "?") to individual inputs.

We say that a scientist strategy is a solution to the problem if it converges to the true answer for each world.


When a finite answer sequence contains no two adjacent answers identical to each other, we call it an answer pattern.

Given both a finite input sequence e compatible with W and an answer pattern g, the pattern forcing game is played as follows.

In the limit, the two players determine an infinite play sequence, which we call p.

p_scientist represents the scientist's subsequence of responses, while p_nature represents nature's subsequence of responses.

Nature wins the game if and only if the following conditions are met.

Otherwise, the scientist wins.

A winning strategy is one that wins against an arbitrary opponent strategy.

We call a pattern forcing game determined just in case either the scientist or nature has a winning strategy.

An answer pattern g is forcible at e if and only if nature has a winning strategy in the pattern forcing game given g and e.
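
A finite sketch of the game loop may help fix ideas. Real forcing concerns the infinite limit, so this truncated version, with hypothetical names, only approximates nature's winning condition.

    def play(scientist, nature, e, rounds=10):
        # From evidence e, alternate: the scientist answers, nature replies
        # with the next input; record the answer subsequence p_scientist.
        evidence, answers = list(e), []
        for _ in range(rounds):
            answers.append(scientist(evidence))
            evidence.append(nature(answers))
        return answers

    def runs_through(answers, g):
        # Check that the answers pass through pattern g in order (a finite
        # stand-in for nature's limiting win condition).
        it = iter(answers)
        return all(any(a == target for a in it) for target in g)

    ockham = lambda e: sum(e)
    # Nature forces the pattern (0, 1): release one marble the first time
    # the scientist answers "0", then withhold forever.
    nature = lambda answers: 1 if answers[-1] == 0 and answers.count(0) == 1 else 0
    print(runs_through(play(ockham, nature, []), (0, 1)))  # -> True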


It turns out that not every pattern forcing game is determined. However, for the purposes of our justification, we are going to restrict our attention to the ones that are.

Restriction 1: Pattern forcing games must be determined

For typical applications, though, this restriction is already satisfied. Typical empirical problems can be expressed as a statement using a series of "there is" and "for all" clauses, followed by a final condition. For example, the marble counting problem can be expressed as the statement

Problems of this kind are called Borel, because they describe a W with the properties of a Borel set.

Furthermore, typical empirical problems are solvable, meaning that there is some strategy that solves the problem. That is, there is some strategy that converges to the true answer in every potential world.

Specifically, if problem (W, A) is solvable, W is a Borel set, and e is a finite input sequence,
then for all answer patterns g, the g-answer pattern game for (W, A) is determined.


While any answer pattern may be forcible, our demonstration is going to make heavy use of backwards-maximal answer patterns. An answer pattern is backwards-maximal at e if it describes a complete series of answers that could have been forced since e. For example, at the onset of the marble counting problem, the answer pattern (0, 1, 2, 3) is backwards-maximal. On the other hand, the answer pattern (0, 2, 3) is not backwards-maximal because "1" could have been forced before "2".

We express this formally by saying that an answer pattern g is backwards-maximally forcible at e if and only if

This definition ensures that all backwards-maximally forcible answer patterns are synchronized with each other from the end of e.

G_e represents the set of all backwards-maximally forcible answer patterns at e. For example, in the marble counting problem, if e were the empty input sequence, then G_e would be:


Some problems do not admit backwards-maximally forcible answer patterns to be constructed. The formal demonstration we develop here will be restricted to problems that do admit this. Specifically:

Restriction 2: Backwards-maximal sequences must exist
The following results are restricted to problems such that if answer pattern g is forcible at e,
then there exists an answer pattern g′ where


We will next formally define anomalies. Recall that we established that anomalies occur when a previously simplest answer becomes incompatible with the available evidence.

Given two answer patterns g and g′, we say that g * g′ is the concatenation of g to the beginning of g′. For example

Similarly, we say that g * G_e represents the set of all patterns g * g′ such that g′ is in G_e.

A finite non-empty input sequence e compatible with W presents an anomaly if and only if
there exists a finite non-empty answer pattern a * g such that

a * g represents the answers that can no longer be forced. By requiring the answer pattern a * g to be non-empty, we ensure that anomalies only occur when all the patterns in G_e are truncated.

For example, suppose the first marble appears at stage e in the marble counting problem. In this case, a is the answer "0", and g is the empty pattern. This input counts as an anomaly according to our definition, as the following table shows.

a * g * G_e  =  (0) * ( ) * { ( ), (1), (1, 2), (1, 2, 3), ... }
             =  { (0), (0, 1), (0, 1, 2), (0, 1, 2, 3), ... }

G_e-         =  { ( ), (0), (0, 1), (0, 1, 2), (0, 1, 2, 3), ... }

So a * g * G_e ⊆ G_e-. Notice that all the patterns in G_e- lose their initial "0" when we pass to G_e. Notice further that no answer pattern in G_e begins with "0". Both conditions have been met, so the marble appearance is an anomaly.

Contrast this with another example, this time from the count-black-versus-count-white problem. Remember that at least one marble will be produced, and the marbles will be either all black or all white. Suppose the first marble appears at e and it is black. In this case, a is the answer "1W" and g is the empty pattern. This input does not count as an anomaly, as the following table shows.

a * g * G_e  =  (1W) * ( ) * { ( ), (1B), (1B, 2B), (1B, 2B, 3B), ... }
             =  { (1W), (1W, 1B), (1W, 1B, 2B), (1W, 1B, 2B, 3B), ... }

G_e-         =  { ( ), (1B), (1W), (1B, 2B), (1W, 2W), (1B, 2B, 3B), (1W, 2W, 3W), ... }

Here a * g * G_e is NOT ⊆ G_e-. In this problem, "1B" and "1W" are equally simple at e. As this comparison illustrates, when one answer is forced instead of the other, it doesn't result in the truncation of all the patterns when passing from G_e- to G_e. Therefore, the first marble is not an anomaly, as we observed in the previous section.


For each world w in W, we say that Complexity(w, e) represents the number of anomalies w presents after e. This is the world w's conditional anomaly complexity.

Similarly, for each potential answer a, we say that Complexity(a, e) represents the least Complexity(w, e) such that w is an element of the intersection of W and a. This is the answer's conditional anomaly complexity.

We call it conditional because we are conditioning on the input sequence e. For unconditional anomaly complexity, we find Complexity(w, e) or Complexity(a, e) when e is empty.
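
Under the world-as-set-of-stages encoding sketched earlier in this section (reusing its compatible and answer helpers), conditional anomaly complexity is straightforward to compute:

    def complexity_world(w, e):
        # Anomalies w presents after e: in marble counting, the marbles
        # appearing at stage len(e) or later.
        return sum(1 for stage in w if stage >= len(e))

    def complexity_answer(a, e, candidates):
        # Least complexity among compatible candidate worlds satisfying a.
        values = [complexity_world(w, e) for w in candidates
                  if compatible(w, e) and answer(w) == a]
        return min(values) if values else None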


We say that answer a is simplest at e if and only if

We will later demonstrate that this occurs precisely when Complexity(a, e) = 0.

With this definition, we can define the three properties of efficient scientific strategies.
At each finite input sequence e, a strategy satisfies:


TODO: Section on symmetrical solvability


Recall that we wanted to assess efficiency over anomaly complexity classes in order to account for repeated answers. Each world in an anomaly complexity class should present the same number of anomalies after finite input sequence e.

We say that ComplexityClass(n, e) represents the set of each world w in We such that Complexity(w, e) = n.

We refer to any ComplexityClass(n, e) as the nth anomaly complexity class at e.


Now we're ready to define worst-case retraction efficiency for strategies over anomaly complexity classes. Remember that anomaly complexity classes group together worlds that share a common anomaly history.

We're going to use s to represent a solution to a given problem (W, A).

We say that the worst-case retraction bound for ComplexityClass(n, e) is the least upper bound of retractions incurred by solution s over worlds in ComplexityClass(n, e). We say that Retractions(s, n, e) refers to the worst-case retraction bound for solution s over worlds in ComplexityClass(n, e).

Now we can define efficiency as follows.

  1. Solution s is efficient at e in worst-case retractions if and only if
    for each solution s′ that agrees with s along e-, and for each n, Retractions(s, n, e) ≤ Retractions(s′, n, e).
  2. Solution s is strongly beaten at e in worst-case retractions if and only if
    there exists a solution s′ that agrees with s along e-
    such that for each n where ComplexityClass(n, e) contains a non-empty answer pattern, Retractions(s, n, e) > Retractions(s′, n, e).
  3. Solution s is weakly beaten at e in worst-case retractions if and only if
    there exists a solution s′ that agrees with s along e-
    such that for each n, Retractions(s′, n, e) ≤ Retractions(s, n, e),
    and there exists an n such that Retractions(s, n, e) > Retractions(s′, n, e).
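
These definitions can be read as comparisons of per-class bound tables. A hedged sketch, with bounds supplied as dictionaries from n to Retractions(s, n, e), and the non-empty-class proviso ignored for simplicity:

    def compare(bounds_s, bounds_s2):
        # Classify s against a competitor s2 per the definitions above.
        worse = [n for n in bounds_s if bounds_s[n] > bounds_s2[n]]
        never_better = all(bounds_s2[n] <= bounds_s[n] for n in bounds_s)
        return {
            "strongly_beaten": bool(bounds_s) and len(worse) == len(bounds_s),
            "weakly_beaten": never_better and bool(worse),
        }

    # After a deviation in the counting problem: Ockham retracts n times in
    # class n, while the deviant strategy retracts at least n + 1 times.
    ockham = {n: n for n in range(4)}
    deviant = {n: n + 1 for n in range(4)}
    print(compare(deviant, ockham))
    # -> {'strongly_beaten': True, 'weakly_beaten': True}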


TODO: Section on nested problems


6. General Demonstration of Ockham's Razor

Proposition 4
If strategy s violates Ockham's Razor at e,
then s is strongly beaten in retractions at e.

Proof of Proposition 4
Let s be a strategy that violates Ockham's Razor.
Then s(e) equals some answer a that is not a simplest answer at e.
Let strategy s′ agree with s along e, and then follow Ockham's Razor after e.
So for each e′ properly extending e, s′ responds with the simplest answer compatible with e′ or "?" if none exists.
By Restriction 3, (W, A) is symmetrically solvable.
So s′ solves (W, A), because in each world, both s′ and the assumed symmetrical solution converge to the same answer.
Let r represent the retractions common to both s and s′ along e-.
Suppose ComplexityClass(n, e) contains a non-empty answer pattern.
Then by Lemma 3, there exists an answer pattern a′ * g with a length of at least n + 1 that is in G_e.
a is not a simplest answer, so by Lemma 5, a′ ≠ a.
By Lemma 4, there exists a world w in the intersection of a′ and W_e where a′ * g remains forcible after e.
Since s solves the problem, s retracts a after e in w, say at e′.
Let j be the length of e′.
a′ * g is still forcible, so by Lemma 9, there exists a world w′ in which s can be forced to repeat each successive answer in a′ * g an arbitrary number of times.
a′ * g remains forcible along e′ and a′ * g is in G_e, so a′ * g is also in G_e′, which means that no anomaly occurs in e′ after e.
Therefore, w′ is in ComplexityClass(n, e).
So, Retractions(s, n, e) is at least r * j * o[n].
Let i be the length of e.
By Lemma 8, s′ retracts after e only at anomalies, so Retractions(s′, n, e) is at most r * i * o[n], where i < j.
r * i * o[n] < r * j * o[n], and ComplexityClass(n, e) is an arbitrary anomaly complexity class containing a non-empty answer pattern.
Therefore, s is strongly beaten by s′ at e in worst-case retractions.