Link to other articles
Explorations of mathematical logic: proof as computer program, mathematical system as graph, contradiction and contrapositive
In this article I jot down some brief ideas (which will be developed upon and further described later) about representing mathematical proof as computer programs on graphs, and I also elucidate why contrapositive proof is a valid technique. The article was originally only intended to be about contrapositive proof, but these writings are stream of consciousness, and as soon as I started describing the geenral idea of mathematical proof for the sake of introduction, I ended up writing the bulk of the article on proofs as programs and mathematical systems as graphs. As I've alluded, I haven't made the concept rigorous, and frankly I haven't done the best job at describing it here, but alas. Still, goes to show the beauty of unrestricted thought; you end up thinking about cool concepts by accident.
This might seem like a random article when compared to the "philosophical" thread followed by the four preceding articles, but I have no plans on anchoring these writings to any specific niche. These articles are just meant to be a random collection of stream of consciousness flurries that set some of my thoughts in stone, so that I can easily refer back to them. As such there's not going to be a single strong prevailing "subject" in this blog. There will definitely be prevailing themes in the blog, but I plan on looping in as many subjects as interest me.
That said, I'm definitely going to publish a lot of articles on mathematics. I don't know too much about mathematics, and haven't gone through more advanced topics yet. I'm currently reading "Understanding Analysis" by Stephen Abbott, which is an absolutely incredible book. But other than that first foray into real analysis, I haven't studied much else at all, except some extremely basic linear algebra. I will pursue this study of mathematics, because I really love the subject so far, though I am still early on that journey. So these thoughts will end up re-inventing the wheel a lot, and I'm completely fine with that; I believe that's how mathematics should be learned, and I believe that for almost everything else, re-inventing the wheel is the optimal way to experience those fields. I might write another article on that sometime. Anyway, here's the article.
Mathematical proof (with tangent about logic as graph and functions)
Any rational deduction ideally operates on a trusted, axiomatic system of logic. This base system of logic consists of specific objects and operations, and is always consistent, meaning that the same logical statement cannot be both true and false at the same time.
Another "good to have" property of these logical systems is that of thoroughness. As I've mentioned, logical systems entail certain types of objects, such as operations, numbers, and other definitions such as "functions" or "vector spaces". These objects should be as linked or connected as possible. I haven't made that idea very rigorous yet (note to self: explore this concept), but if two different objects are within the scope of our logical system, there should be an unambiguous linkage between them in order to avoid (as much as possible) unproveable statements. As an abstract example, imagine we have "object A" and "object B" in our logical operation. Let's say each object belongs to a specific object type (for example, the object "2" is a number). Further, object types have hierarchies of specificity; and this can also be viewed in terms of subsets (for example, 2 is a number, but also has even parity, is an integer, is positive, etc.; all of these are object types, an equivalent term would be attributes). So we have object A belongs to object type A and object B belongs to object type B. It is desireable on the abstract level that their corresponding object types have some link, otherwise we cannot relate those objects in any way, meaning we cannot make any strong rational deductions about them together.
My usage of "link" might be a bit confusing, so I'll elucidate it by representing logical systems as a graph (in the sense of graph theory). Let's say we start out with some base objects of a logical system. For example, integers and two operations: addition and subtraction. This can be further abstracted: instead of needing the concept of "integer" we could make do with just the numbers 1, 0, and -1, and 2 can be represented as 1 + 1, 3 as 1 + 1 + 1, etc. Addition is defined as throwing in 1, subtraction as throwing in -1, and 0 is considered "void" and 1 - 1 is defined as 0 (it cancels out). We should make these axioms clear with the following representations:
- x + 0 = x for any number x
- We will add the concept of "number" for convenience of representing our system, thus 1 and -1 now also belong to the object type of "number"
- 1 - 1 = 0
We can summarize all of this in a graph. We have the object types of "1", "0", "-1" (all under the object type "base number"), "addition", and "subtraction" (under object type "operation"), these can be represented as nodes, with a link between parent (number, operation) and child. This is a specific type of link which can be described as an "is a" relationship, in other words, belonging into some superset defined by the parent object type. And there is a link present that ties all of these together: and that is encapsulated by combining those definitions of addition and subtraction above with those two axiomatic statements. We can even define a separate object type for "statement", with those two at its base. Those definitions of addition and subtraction are akin to a function; they take in a number and either add or subtract 1, in other words they change the state to another valid number in the system (another integer). The definition can be represented as an abstract blueprint, just like functions abstractly take x and apply some operation to this abstract entity. Those operation objects thus represent a different kind of link in the graph, a "state change" link, though on an abstract level (if it isn't clear yet, links are just edges in the graph). Directed edges tend well to this; edges flow from the "substrate object" (number) to the operation. A directed link also travels from those operations to another object, in this case back to the numbers. This is equivalent to input flowing to algorithm flowing to output. We can consider an object type of "state", which in this case would be the integers (all integers can be obtained by repeated addition or subtraction of 1 from another integer). It would be a hassle to think about all the (infinite) states that could be represented in this system, since there are infinitely many numbers that can be obtained, we can deal with this, without any loss of information or specificity, through the aforementioned abstraction of algorithm; every single state is encapsulated by taking the view of state change from some initial input state (like how the derivative reduces dimensionality in physics, an infinity of positions can sometimes be abstracted into one constant velocity, and any state can be queried by providing an initial state and amount of time that has passed). Thus we reduce an infinitely large graph into an extremely small finite one by using an abstract algorithm/function.
The usage of "algorithm" implies a connection to computer science, and indeed, the connection has salt. We can then observe the logical system in motion through this graph we have created in the same way that we would run code. We can provide some input number, let's say 2 (which would be represented as 1 + 1 in this system), and request to perform 2 additions on it; without storing all of the possible numbers of the system, we would simply start at that specific state of 2 (represented as 1 + 1), follow the link to "addition", enter its algorithm (kind of like a subroutine in computer science), which brings in 1, and we end up with 1 + 1 + 1 (imagining we have some other function in this graph that translates this chain into decimal notation as output to the user, we end up with 3).
Applied to mathematical proof: consider the statement "adding 1 to 2 leads to the same number as subtracting 2 from 5". Let's again take the view of a computer program running on the graph. Translating our statement to the language of logic, we are trying to prove that "add(2, 1) = subtract(5, 2)", where the first parameters are the initial number and the second parameter are the number of times we repeat the operation (the paralllels to programming should be clear). In fact that function formulation can be considered a "higher level function" over the base functions of addition or subtraction. In pseudocode:
- function repeated_add(int start, int num_times):
- int current = start
- for int i from 0 to num_times:
- current += 1
- return current
We could take a first stab and prove the statement this way: from the language of logic, the "equal operator" would get compiled to two concurrent threads: the first evaluating "add(2, 1)" and the other evaluating "subtract(5, 2)". Those would get evaluated in the way we described above, starting from 2 in the graph and flowing through to the addition algorithm and getting an output, and doing the same for the subtraction part. Our compiler performs a "join" (as in parallel computing) which waits for both passes through the graph to terminate and collects the results, and checks to see if they are the same entity. If so, our statement is proven, if not, it is disproven. Note that this is a very basic and brute force approach to proof, but it has been codified. Here we got a result of "true" because the equality operator was satisfied, but there are ways to get an answer of "false" as well. For example, if both executions had different values, the equality operator would return false. Another possibility is running into a contradiction; we can keep track of the current development of the proof, in other words the things we consider true thus far in our investigation of the statement. If at any further point in the execution of the graph, we flow into an object that has properties contrary to the properties suggested by the current progress, then we can "throw an error" in the program and return false.
As seen in the previous paragraph, for more simple systems at least, we could probably come up with a "compiler" that can verify any logical statement by passing it through the graph. The specifics would depend on the "statement operators", for example quantifiers like "for all" or "there exists", or checking for equality. I'll leave details for another article.
Mathematical systems are generally "traced out" throughout time; mathematicians prove more and more statements, even if nothing is innovated per se, more statements get proven, and when they are proven they get added to the repository of "true" statements that can be used in further investigations of the system. This can manifest itself as an extension or evolution of the graph of the mathematical system. More links are added when we prove a new statement; a statement generally implies an object having some property, and once the statement is proven, we can create those links between the object and its "property" object. Once this happens, other statements become easier to prove or disprove. Viewing a proof as the execution of a compiled program, we can now make direct use of those links when we are provided with the objects investigated in the past statements, and those proved properties can come in handy in further proofs.
It is important to note that even if we can use one statement to prove another, we never have to do it that way. We could always do the entire proof brute force, without exploiting other proved statements. We could technically just trace the graph in every way we can until we either arrive at the desired object of the statement (in which case it is proven) or we arrive at some contradiction (disproven), and if we end up proving it, then it is possible that those other proved statements became "implicit" to the execution of that program (as in, at some point, we followed the same path as the program that would have proved those other statements). This is a brute force method, and is pretty much "unconscious proof"; we just explore everything in the system starting from the objects in the proof and hope we land at the desired destination, or something that contradicts the developed chain of assumptions.
This brute force method theoretically works, but is extremely inefficient, both for verifying single statement, and especially, at the global level of proving many statements. For harder and more complex statements, the number of paths that would need to be taken can increase exponentially. But if a proof of that statement can make use of a corollary statement that has already been proven, we can drastically reduce the number of paths to explore, because previous proved statements provide a direct conduit to a conclusion. This becomes more important the more statements we can prove and compile. Most mathematical proofs indeed make use of several previously proved corollaries, and each of those corollary made use of another corollary in its own proof; and since mathematicians can immediately make use of the conclusions of the corollaries, and apply them to the objects of their analysis, mathematicians save extreme amounts of mental effort from standing on the shoulders of giants. This is analogous in the graph. On the global scale, imagine we would like to make a bunch of statements, each of which makes use of a more basic corollary that was hard to prove initially (let's say the Pythagorean theorem), if we avoid using previous statements, we would have to independently stumble upon the Pythagorean theorem each time we run our program, and then get to the conclusion; this is extremely complex and inefficient on the large scale.
We can thus consider the integration of proved statements into our graph as a growing field of "program optimizations".
Back to the desired concept of thoroughness in our logical systems, in our "system graph", we desire that the entire graph be "connected"; connected meaning that there exists some path of links tying every pair of object types together. If there is a point of disconnection between two object types, then a statement involving those object types cannot be proven, and this is problematic when we know that the statement is a sensible one. In a logical system, we would like that every statement has the result of "true" or "false"; ambiguity is undesireable. Though, of course, statements about axioms themselves (the initial statements, with no links flowing into them from other statements) would be impossible to prove, by construction.
Creativity and experience in proof, AI
We can associate proofs with the running of a path through a graph, but even if we use previously proved statements to bolster the proof of any downstream statement, the method ends up being a less costly brute force method. Mathematicians don't tend to blindly trace out all the objects through their current repository of knowledge; there tend to be mental heuristics at play, knowledge and experience that give shortcuts as to which paths to explore first. Likewise, creativity bolsters the process of finding a meaningful path, as opposed to brute force graph search algorithms.
Creativity and experience, and their effect on path-finding, are difficult to replicate as rules. If we are to view statement verification as the running of a program, the steps of the program are determined by a compiler; the one implicit to our discussion involves running graph search algorithms starting from the objects in the statement until we end up joining them or contradicting the past stack of exploration. However, if we wanted our program to "behave like a human", to exploit intelligence and practice in order to explore the statement, we would have to modify our compiler a bit, we can do so by adding specific heuristics, and many of them would be useful for optimization; but it would be tedious and a stretch to try and simulate the thought process of a good mathematician with such heuristics.
Perhaps this is where AI could come in. It is definitely possible that we use proved statements as "training data" for an AI model, perhaps using a graph neural network, or another formulation (I haven't thought about this so I'll avoid specifications); and we can see the verifications of basic statements using the "base" compiler which uses brute-force graph search, and train the AI to optimize the time complexity of those verifications. Off the top of my head I can think of a few ways to do this (including reinforcement learning) but I'll leave the details for later.
Proofs by contradiction
For any given statement, we can either prove or disprove it. A "proof" occurs if all the specific objects in the statement can be linked as we pass through the graph. The statement can be discarded as false if we end up with some contradictions in the realized objects. For example, if we end up attributing a specific number as odd, and then later on attributing that specific number as even, then we can discard the statement from which these attributions arose. Note that it is important that the system on which we run the program must itself be as consistent as possible; that there are no inherent contradictions in the system. If the system at the level of axioms is consistent, and iteratively integrate statements proved from the axioms, then the resulting "developed system" should be consistent as well.
If we can disprove a statement by running into a contradiction, then, equivalently, we have proved another statement. Using the fundamental axioms of logic, which make common sense, if a proposition P is true, then the statement "P is false" is false; the suffix of "is false" can be encapsulated into a "logical negation operator", and "negation(P)" is equivalent to the statement "P is false". Symmetrically, if the statement P is false, then obviously negation(P) will be true. Then if a statement S is revealed to be false due to the uncovering of a contradiction, then we can say that negation(S) is true. We have thus proved negation(S).
This can be adapted to a general strategy for proving some proposition. If we first assume the proposition is false, and then run the program on negation(P) and find some contradiction, then we have proved negation(negation(P)) is true, and negation(negation(P)) is just P itself.
There are times when we have a conditional statement of the form "if A, then B". We can then pass a statement of the form "if A, then negation(B)" and try to find some contradiction. Here, the "if" is immune because it acts as an "initial state" or axiom from which we start the program. A is essentially held constant. So we start from A, and try to run the program from negation(B). If it turns out that we can find a link, then it turns out that negation(B) is actually the correct deduction if we assume A. However, if we find that negation(B) is contradictory to something, then we can say that from A, B is the rational deduction, thus we have proved the desired statement of "if A, then B".
Contrapositive proof
A contrapositive proof is a technique where we desire to prove "if A, then B", and use the angle of "if not B, then not A" to prove it symmetrically. This is useful when the second statement is easier to work with than the first.
In order to disprove the statement of "if A, then B"; one would need to show that under the frame of A, B is not true. Thus, it would amount to proving "A and not B". Now, thinking about disproving that statement, we would have to show "(not A) or B" arises. One way to do this is to assume and start from not(B), which does not immediately disprove the statement. But if we can show that not(B) and not(A) can be linked, in other words if we can prove "if not(B) then not(A)", then we have also proven "(not A) or B" because we end up showing not(A), which is a sufficient condition. In other words; "if not(B) then not(A)" shows that "(not A) or B", which is equivalent to "not(A and not B)", which is equivalent to "not(not(if A, then B))"; in other words we can use the contrapositive technique to show "if A, then B".