This is an excerpt from the beginning of the Artificial Intelligence chapter of the Fractal Brain Theory book (author Wai H. Tsang), pages 430 to 457. The rest of the chapter, pages 458 to 512 (it's a long chapter), is available in the print book, which can be ordered from Amazon or Barnes & Noble.
The Fractal Brain Theory Book
Chapter 11
Artificial Intelligence
The Fractal Brain Theory, apart from being a unifying scientific theory of psychology, neuroscience and genomics, is also directly applicable to the biggest questions and hardest puzzles in the quest to create true artificial intelligence. Naturally a comprehensive scientific understanding of brain and mind should be relevant towards the goal of creating artificial ones. The Fractal Brain Theory has the very interesting property of incorporating many of the most recurring and state of the art ideas in mainstream artificial intelligence research, while at the same time giving us a clear roadmap for the next steps forward.
There exists a close relationship between the Fractal Brain Theory and the creation of Artificial Intelligence, together with the instigation of the much anticipated Technological Singularity. We’ll explore this relationship in some detail by showing that there exist some deep connections between our fractal way of looking at brain and mind, and many existing ideas in AI and computer science.
In the same way that the theory is able to integrate and unify a vast amount of data, facts and findings from the brain and mind sciences, so too with the sciences of the computational and informational. The brain theory sits at the nexus of many of the existing approaches to creating artificial intelligence and also some of the best and most effective ideas in computer science.
This ability to integrate a lot of diverse ideas from artificial intelligence into a coherent unified picture solves, from the very outset, a major problem which has been cited as one of the main reasons why there hasn’t been much theoretical progress in the quest to create AI in the past few decades. Patrick Winston of MIT, an early and prominent researcher in the field, recently called this the ‘mechanistic balkanization of AI’: the state of affairs where the field has divided itself into many sub-disciplines which study specific mechanisms or very circumscribed approaches to AI. This is coupled with an inability to take advantage of a wider cross-fertilization of ideas, or to see the necessity of working together in trying to answer the larger problems.
Moreover, the ability of the Fractal Brain Theory to seamlessly bridge the divide between, on the one hand, the engineering fields of artificial intelligence and computer science, and on the other, neuroscience and psychology, solves a wider and more significant ‘balkanization’. This is the inability, and sometimes reluctance, on the part of AI researchers to embrace and take advantage of facts and findings from the brain and mind sciences. The powerfully integrated and overarching perspective of the Fractal Brain Theory provides us with a major advantage over the existing demarcated, compartmentalized and overly narrow approaches to creating artificial intelligence.
The fractal artificial intelligence that derives from the Fractal Brain Theory will at first seem novel and ground-breaking, but at the same time there will be a lot of familiarity inherent in its workings. In a sense practically all of artificial intelligence and computer science, in one way or another, directly or indirectly is convergent upon the workings of brain and mind. The Fractal Brain Theory and the new kind of artificial intelligence associated with it is the fullest expression of this convergence.
This self similar, symmetrical and recursive way of looking at the brain enables a massive unification of brain and mind, and furthermore an even wider unification of neuroscience and psychology with artificial intelligence, computer science and information theory, together with genomics, ontogenesis, the wider theory of evolution and discrete maths, all of which we discussed in earlier chapters.
From now on we’ll be using the expressions ‘new kind of artificial intelligence’ and ‘fractal artificial intelligence’ quite interchangeably. Like Stephen Wolfram’s ‘New Kind of Science’, which seeks to reframe the laws of physics and our understanding of the Universe in terms of simple computational principles, especially by modelling physical phenomena using discrete cellular automata, so too in an analogous way we seek to rationalize existing artificial intelligence techniques and reframe existing approaches using a more succinct and unifying description. When we say ‘fractal artificial intelligence’, we mean the creation of AI that derives from a view of the brain and mind, which sees its functioning and structure as perfectly symmetrical, self similar and recursive.
The formalising of the Fractal Brain Theory in the language of spatialized binary trees and binary combinatorial spaces has a very useful and interesting property. This relates to the fortunate state of affairs that this binary formalism is also the same language that underlies computer science, artificial intelligence and information theory.
At first this might seem like an amazing coincidence or perhaps as some sort of deliberate contrivance. But it is also partly a natural consequence of the fact that the same constraints and issues faced by computer scientists in their design of computing hardware and intelligent systems, are also those that biological brains have had to deal with. The same advantages of using binary codes in computers, i.e. signal fidelity, persistence of memory, processing accuracy, error tolerance and the handling of ‘noise’, are also advantages that may likewise be exploited by nature, employing the same means. That is, by going digital and binary. Hence our idea of a binary combinatorial coding, digitized brain grounded in actual neurophysiology and anatomy, fits naturally with the convergence of these issues of information processing, that both computers and biological brains have to work with and around.
The use of the language of spatialized binary trees and binary combinatorial spaces by the Fractal Brain Theory, apart from its correspondence with a mass of empirical data, facts and findings from neuroscience, neuropsychology and genomics, also allows for the complete synthesis of biological brains and minds with the fields of computer science and artificial intelligence. In doing so, it illuminates, unifies and solves many of the biggest issues in these technological endeavours. It answers a lot of the hard questions in AI and even gives us insight into the nature of the so-called Technological Singularity, which concerns some of the further implications of the advent of ‘true’, ‘strong’ or ‘general’ artificial intelligence.
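To make the formalism slightly more concrete, here is a minimal sketch, in Python, of a spatialized binary tree whose root-to-leaf paths act as binary code words addressing a small combinatorial space. It is an illustrative toy of my own devising, not the book’s formal notation, and the labels are arbitrary.

```python
# Toy illustration of a spatialized binary tree whose root-to-leaf paths
# form binary code words addressing a combinatorial space.
# A sketch for intuition only, not the book's formal notation.

class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label    # what this node represents at its level
        self.left = left      # branch taken on bit 0
        self.right = right    # branch taken on bit 1

def paths(node, prefix=""):
    """Enumerate (binary address, leaf label) pairs of the tree."""
    if node.left is None and node.right is None:
        yield prefix, node.label
        return
    if node.left:
        yield from paths(node.left, prefix + "0")
    if node.right:
        yield from paths(node.right, prefix + "1")

# A tiny two-level hierarchy: coarse category at the top, finer detail below.
tree = Node("animal",
            Node("cat", Node("tabby"), Node("siamese")),
            Node("dog", Node("terrier"), Node("collie")))

for address, leaf in paths(tree):
    print(address, "->", leaf)   # e.g. "00 -> tabby", "11 -> collie"
```

Each leaf is reached by a binary address, and longer trees simply mean longer code words and a correspondingly larger combinatorial space.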
In a sense, this entire book has been about artificial intelligence, or has been very relevant to AI all the way through. If things have been implicitly about AI, then here we’ll make things a little more explicit. In this chapter we’ll be dealing with specific issues and problems which are central to the field. We’ll start by examining some commonly held assumptions which really impede the progress of AI, in order to clear the way for a better and more effective way forward, which the rest of the chapter will elaborate upon.
Some Common Fundamental Fallacies in AI Research
There exists in the field of AI a set of interrelated fallacies which are commonly held by a lot of workers in the field including some of its main thought leaders. These fallacies are sometimes entrenched and taken as foundational assumptions. They relate to the very question of what exactly is involved in the quest to create true AI, and to the question of what is the nature of intelligence or the thing we are trying to make artificial versions of.
All of these fallacies, directly or indirectly, relate to the idea that there should exist a critical generative algorithm from which all the algorithms of the brain and mind are generated, which is really the central idea behind the Fractal Brain Theory. If, as we’ve shown, the processes of ontogenesis and phylogenesis (i.e. evolution) are happening in relation to our thoughts and behaviours and within our brains, then this suggests that the process of brain and mind is really one big generative process. Ontogenesis and evolution are, after all, the most prolific generative processes we know of. The thoughts we think, the behaviours we express, and the changes we make to our thoughts and behaviours are, within the context of the Fractal Brain Theory, all part and parcel of this generative process.
In chapter 8 we also explored the idea of recursive self modification and showed how it is that this idea is central to the process of ontogenesis and evolution. We later extrapolated this concept to the workings of the brain and mind. So the Fractal Brain Theory really consists of, at its heart, a single recursively self modifying function, which is able to generate all the diverse aspects of body, brain and mind. How then does this idea of a critical recursively self modifying process or algorithm, which is what the Fractal Brain Theory is all about, relate to the biggest and most fundamental fallacies in AI? It does so by directly challenging them.
So what are these fallacies? Firstly, there is the fallacy, believed and promulgated by some of the field’s thought leaders, that neuroscience can only have a minor role to play, if any role at all, in the creation of true artificial intelligence.
The second fallacy is that there does not exist and cannot exist a critical single algorithm or process behind intelligence. This was discussed to an extent in chapter 1, where we listed some of the prominent names in both the supporting and opposing camps, i.e. those who believe in the existence of a critical algorithm waiting to be discovered and those who don’t.
The third common fallacy involves the idea that the quest towards the creation of AI and superhuman intelligence involves a search into ‘mind design space’.
All three fallacies are actually interrelated; we’ll go through each of them, but in the reverse order that I’ve just mentioned them, and then at the end discuss their interdependencies. So next we’ll expand on the third fallacy.
Mind Design Space?
This idea of ‘mind design space’ seems to be promoted by an influential AI think tank called the Machine Intelligence Research Institute or MIRI, which was formerly called the Singularity Institute before a name change a few years back. Demis Hassabis of DeepMind also seems to have, at least at some point, accepted the validity of this notion, judging from a presentation he gave at the 2010 Singularity Summit. The basic idea is that there is a set of all possible cognitive algorithms, and therefore research and development in AI involves finding the permutations of cognitive algorithms existing in this ‘mind design space’ which will finally give us a fully functioning general purpose AI.
Eliezer Yudkowsky, the leading theorist for MIRI, puts it in this way, ‘Think of an enormous space of possibilities, a giant multidimensional sphere. This is Mind Design Space, the set of possible cognitive algorithms. Imagine that somewhere near the bottom of that sphere is a little tiny dot representing all the humans who ever lived - it’s a tiny dot because all humans have basically the same brain design.’
This sort of viewpoint is also expressed by the computer scientist Roman V. Yampolskiy, who writes, ‘As our understanding of human brain improves, thanks to numerous projects aimed at simulating or reverse engineering a human brain, we will no doubt realize that human intelligence is just a single point in the vast universe of potential intelligent agents.’
The basic flaw behind this way of looking at things is that it rests on several complete unknowns or missing pieces of the AI puzzle: a missing theory of what exactly intelligence is, a lack of a basic understanding of the brain, and the potential existence of a missing generative process, in particular a recursively self modifying one, which could in theory generate all the possibilities existing within this ‘mind design space’ and explore all configurations of possible intelligences. It is problematic to form doubtless certainties on the foundation of things about which very little is generally known.
If the missing generative process could be discovered then this would also expose our second fallacy, i.e. that there does not or cannot exist a critical algorithm behind intelligence. If we had an algorithm which was able to generate all ‘possible cognitive algorithms’, then surely this would be the critical algorithm and Holy Grail of AI.
Also, even though a vast space of possible intelligent minds that are not human-like is conjectured, not a single one has been demonstrated. If there were so many possible designs for intelligent minds ‘out there’, then one would think that AI researchers would be discovering new ones all the time, but alas not a single true intelligence outside of human intelligence has yet been created.
Even if a true artificial intelligence in ‘mind space’ was one day discovered, which itself is a big if, then without any understanding of what is the nature of human intelligence, there isn’t any sensible way of determining whether this mind design was actually fundamentally different from that of the human mind. Also, it doesn’t make sense to say that ‘all humans have basically the same brain design’ without being able to say anything about what that design is or the nature of the intelligence that arises from it.
A large group of computers from the same production line could be essentially identical, but depending on later configuration details and software installation, they could end up doing radically different tasks in completely different ways. In the same way, while it is true that there exists commonality between the brains of different humans, the brain is at the same time an enormously configurable organ, and, as we’ve earlier discussed, also a recursively self modifying one, allowing it to produce an amazing flexibility in ‘mind designs’. We get to meet them every day in various shapes, forms and levels of competence. And if there is a recursively self modifying algorithm behind human intelligence, then given sufficient resources and augmentation, is there anything that would be inconceivable for it? Is there even a limit to what the unaugmented human genius may conceive? It’s an open question.
The fallacy of ‘mind design space’ also manifests itself in slightly different ways. The influential theorist and AI commentator Ben Goertzel believes that the road towards the creation of true AI involves putting together a load of human-generated partial solutions to the AI problem and getting them to somehow work together. He believes that ‘Intelligence depends on the emergence of certain high-level structures’, and that ‘Achieving the emergence of these structures within a system formed by integrating a number of different AI algorithms and structures is tricky.’ However he believes that this approach is correct because, ‘We have not discovered any one algorithm or approach capable of yielding the emergence of these structures.’ So in his view the way to create true AI is to put together separate human-generated AI algorithms. But apparently he is faced with an ‘integration bottleneck’ and difficulties in getting all of these algorithms to work together. What we are suggesting in contrast is that the very nature of human intelligence is exactly a single brain algorithm that is able to generate and give rise to the ‘emergence of these structures’ of mind. We go further and suggest that this same recursively self modifying algorithm is behind the processes of ontogenesis and phylogenesis, which we talked about at length in chapters 7 and 8.
Therefore the most effective path to true AI is actually the task of trying to discover that ‘one algorithm or approach capable of yielding the emergence of these structures’, rather than trying to tie together human generated ones. The essence of intelligence is in the process of creating or generating, and the generated product itself is not intelligence. So putting together human generated algorithms, none of which are intelligent, will not produce artificial intelligence, because the intelligence was in the generation of the algorithms in the first place. Real artificial intelligence involves exactly the missing ‘one algorithm’ that is able to generate all the other ones. A human who has lost the ability to generate new algorithms, thoughts and behavioural patterns in response to changing circumstances and novel contexts is no longer considered intelligent by other humans.
This way of looking at things relates to an observation that some AI designers have made over the years: once an artificial intelligence has solved some task that used to require humans to do it, that task is no longer considered an intelligent one by people contemplating the task and the AI doing it. This may have a lot to do with the intuition many people have that there is a creative, flexible and adaptable element to real intelligence, which allows a human being to generate novel solutions and responses to unforeseen circumstances. This is something that a static AI without a generative ability would completely lack, and so it would be seen as unintelligent.
Therefore the creation of narrow and inflexible AI is really the instantiation of one of the points in ‘mind design space’, as Yudkowsky would describe it, but the same thinking doesn’t apply to true artificial intelligence or human intelligence which is the meta-mind that is able to explore the mind design space and try out different cognitive algorithms. It is therefore incorrect to think of human intelligence as a mere point in ‘mind design space’ because true intelligence has the ability to reinvent itself, recursively self modify and potentially transcend its limitations.
It is relevant to note that Yudkowsky also believes, ‘The entire [mind design space] floats in a still vaster space, the space of optimization processes. Natural selection creates complex functional machinery without mindfulness; evolution lies inside the space of optimization processes but outside the circle of minds.’
However, as we discussed in chapter 8, the process of evolution is fundamentally the same process that is occurring in the brain, in the form of learning, creativity and cognitive adaptation. Add to this the idea that ontogenesis is, given the right way of interpreting things, exactly the same process as thought and behaviour, and throw in the idea of genome as brain and gene as neuron, and we see clearly that intelligence is exactly that ‘optimization process’ which, within given physical constraints, is able to explore ‘mind design space’. Therefore it obviously can’t be considered as ‘outside the circle of minds’; rather it is intrinsic to the process of mind and the nature of intelligence.
There is No Single Critical Algorithm?
As we saw in earlier chapters, the process of evolution as it occurs within the substrate of DNA and genomics is faced with certain constraints. These are transcended by the implementation of the process of evolution in the context of brains and minds. There is no reason why, in turn, any human constraints will not also be transcended. But the point is that even though it will be operating on different physical substrates, this evolutionary progression still involves fundamentally the same algorithm at work, which can be seen as the ‘critical algorithm’ behind intelligence, the opposite of what our second fallacy supposes. There does exist a critical algorithm, and it is the algorithm that is able to create all the various cognitive algorithms within ‘mind design space’.
As we discussed in chapters 8 and 9, this is exactly what lies behind the Fractal Brain Theory, i.e. a recursively self modifying algorithm which is able to generate explorations into combinatorial space, which can in turn be seen as the space of all possible algorithms and functions. In its most minimal form, we also suggested that this recursively self modifying function was equivalent to the idea of the Strong Minimalist Thesis of Universal Grammar, and also to the idea of the most minimal dynamic formal axiomatic system (FAS), which is able to generate all the static FASs.
Therefore intelligence is not a particular point in a ‘mind design space’ but rather it is the process by which the space of ‘possible cognitive algorithms’ is explored together with mechanisms by which the cognitive algorithms are evaluated, ranked and selected. Of course a human being will be constrained by physical limitations, so the speed, scale and extent to which the human mind is able to explore this mind design space is limited by the physical substrate, but also more commonly through a lack of inclination. But in principle, perhaps given sufficient ‘transhuman’ augmentation or even without it, then the same critical recursively self modifying generative algorithm behind our intelligence, will also be able to potentially generate any configuration of any possible cognitive algorithm.
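As an illustrative sketch only, and not the brain algorithm the theory describes, the following toy loop captures the bare logical skeleton of the claim: intelligence as a process that generates candidate ‘algorithms’ (here just bit strings), evaluates them, selects the better ones and varies them again, thereby exploring a combinatorial space rather than occupying a single fixed point within it.

```python
# Minimal sketch of intelligence as a search over a combinatorial space of
# candidate algorithms: generate variants, evaluate them, keep the best.
# A toy under my own assumptions, not the book's algorithm.
import random

def evaluate(candidate):
    """Stand-in fitness: how many bits are set. A real system would score
    a candidate cognitive algorithm by how well it performs on tasks."""
    return sum(candidate)

def mutate(candidate, rate=0.1):
    """Generate a variant by flipping bits with a small probability."""
    return [b ^ 1 if random.random() < rate else b for b in candidate]

def search(length=32, population=20, generations=50):
    pool = [[random.randint(0, 1) for _ in range(length)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=evaluate, reverse=True)
        survivors = pool[:population // 2]                    # select
        pool = survivors + [mutate(random.choice(survivors))  # generate
                            for _ in range(population - len(survivors))]
    return max(pool, key=evaluate)

best = search()
print(evaluate(best), "of 32 bits set in the best candidate found")
```

The point of the sketch is the shape of the loop, not the trivial fitness function: the ‘mind’ here is the whole generate-evaluate-select process, while any single candidate it produces is merely one point in the space it explores.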
If the idea which we proposed at the end of chapter 8 is true, i.e. that the most minimal seed AI, from which the recursively self modifying generative algorithm is able to auto-bootstrap itself into being, is also the most minimal dynamic formal axiomatic system possible, then the creation of artificial intelligence will necessarily be convergent with the functioning of human intelligence. This is because the very essential core of what intelligence is will be rooted in the intrinsic properties of our most minimal dynamic formal axiomatic system, which is in turn unchangeably rooted in the most fundamental level of mathematics, i.e. discrete maths and formal axiomatic systems. So the answer to the question of what is at the heart of human intelligence and the ultimate solution to the puzzle of creating true, strong and general AI will necessarily be one and the same.
This, therefore, exposes our second fallacy. There does exist a critical recursively self modifying generating algorithm that lies at the heart of human intelligence, and the same algorithm is necessary and essential for the creation of true artificial intelligence.
The Disputed Relationship between AI and Neuroscience
The idea of a critical algorithm and recursively self modifying process, which derives from our investigations into brain, mind and genomics and which most of this book has been about, taken together with the proposal that any route to the creation of AI will necessarily converge upon the same algorithm that the brain uses, directly challenges and exposes our first fallacy: the belief that neuroscience can only possibly play a minor role in the creation of AI.
As an example of this fallacy being expressed, the commentator and theorist Ben Goertzel has stated, ‘we just don’t know enough about the brain’, that ‘it’s going to be a while before we know enough about the brain to use neuroscience to guide the creation of a general intelligence’, and that ‘We don’t know enough about neuroscience to drive AI design – and the neuroscientists I talk to tell me the same thing.’
But the reality is not that we don’t know enough about the brain; the problem is more that we know too much, and what’s lacking is a framework in which to sensibly interpret and bring together the colossal amount of facts and findings relating to the brain and mind into a single unified conception. It’s really not the case that the discovery of some new dopamine receptor subtype, or some hitherto unknown nerve fibre tract in the human brain, will suddenly unlock the mystery of intelligence for us.
Microsoft co-founder Paul Allen, who is the creator of a major neuroscience research institute, says that he asks regularly at meetings ‘What’s revolutionary this month’, and that in reply ‘everyone goes hmm, err’. The revolution in neuroscience will not come from some piecemeal discovery or novel methodology for gathering even more data about the brain, interesting though they might be, but rather from a theoretical breakthrough which allows us to put to use all the data that’s already been gathered, towards the end of understanding how the brain works and creating AI. What the field has been waiting for is exactly that ‘broad framework of ideas’ which the field of brain science has been ‘conspicuously lacking’ since its inception, and which Francis Crick bemoaned in 1979.
It’s not necessarily productive consulting with neuroscientists on the matter, especially in relation to AI, because due to the heavily compartmentalized nature of neuroscience, each specialist will only be seeing a tiny piece of the brain jigsaw puzzle and also generally without the necessary background in computing and information science, which is needed to interpret what may be relevant or not to the creation of AI.
One of the advantages of the philosophy and working assumption of the Fractal Brain Theory, is that it sees behind the myriad and diverse complexity of the brain and assumes symmetry and self-similarity waiting to be uncovered. Therefore, even with our imperfect knowledge of the brain, it is possible to work out what these recurring patterns are, and also to extrapolate or rather interpolate the overarching symmetries into areas of the brain which are less understood. The parts should work as each other, the components should work as the composites, and everything should work as the whole. This is one of the main ideas behind chapter 9, where we discussed the process of the brain.
One of the things which the Fractal Brain Theory shows is that the necessary knowledge of neuroscience, from which to work out these underlying symmetries, has existed for decades. Though of course, the more knowledge and empirical evidence the better, in order to put flesh on our abstract symmetry and self-similarity theoretical framework. It is exactly this ‘broad framework of ideas’ and unifying conception which the Fractal Brain Theory provides.
In relation to the question of the significance of neuroscience with respect to the creation of true AI, Eliezer Yudkowsky has stated, ‘I don’t expect the first strong AIs to be based on algorithms discovered by way of neuroscience any more than the first airplanes looked like birds’, and that ‘you can make airplanes without knowing how a bird flies. You don’t need to be an expert in bird biology.’
This metaphor actually comes up quite a lot in writings and discussions about the relationship between how the brain works and how future AIs may potentially work, i.e. in a way that does the same thing as the brain, i.e. manifest intelligence, but in a completely different manner. However this presupposes that there doesn’t exist some critical algorithm or process that will necessarily be common to all fully functioning AI and biological brains. So this metaphor relates to our third fallacy of various kinds of intelligence existing in a ‘mind design space’, versus the idea that the very essence of intelligence is the process which is able to search that mind design space in the first place. The critical algorithm will generate instantiations of points or regions in mind design space, but those instantiations are not in themselves intelligent even if they seem to perform tasks requiring intelligence.
If we applied this thinking to the birds versus planes metaphor, then these would be as the instantiations of ‘mind design space’. However there does exist a common algorithm behind the birds and the planes, which is the algorithm of evolution. Of course birds are evolved, but as we’ve shown in earlier chapters, so are the ideas which exist in our minds, including the designs for planes, rockets, helicopters and hot air balloons. So the metaphor of human intelligence versus artificial intelligence as birds versus planes really misses the point, in the same way that the idea of ‘mind design space’ does, by concentrating on surface manifestations of a common deeper process. This process is really where the answer to the puzzle of intelligence, and the source of all the specifics of various flying entities, biological and non-biological, is to be found.
It should be noted that the attitude towards neuroscience which sees it as being of only minor relevance, isn’t shared by all practitioners of AI. So for instance Yoshua Bengio, one of the ‘godfathers’ of deep learning neural networks says that, ‘I’m one of the few people who think that machine learning people, and especially deep learning people, should pay more attention to neuroscience. Brains work, and we still don’t know why in many ways. Improving that understanding has a great potential to help AI research.’
Neural networks is the sub-field of AI with the closest association to neuroscience, so if this is the observation of one of its leading lights, with a lot of experience in the field, then it is difficult to imagine that other sub-fields of AI will generally be any more receptive to ideas from neuroscience. A book which came out in 2008, sub-titled a ‘Collection of essays dedicated to the 50th anniversary of artificial intelligence’ and written by many of the leaders in the field, contained an essay which asked, ‘What can AI get from neuroscience?’ So it is perhaps a positive sign that at least some AI practitioners are asking the question. It is further encouraging that the answer in this particular instance was that neuroscience could be of use towards the creation of AI.
There are also some major figures working in AI who do fully champion the input of neuroscience. These include Demis Hassabis, who has himself made significant original contributions to the field of neuroscience, and Ray Kurzweil; perhaps the biggest advocate of the neuroscience approach to AI is Jeff Hawkins, who promotes AI implementations which he believes are closely modelled on the brain. Of course the Fractal Brain Theory puts the neuroscience approach to AI on a much firmer grounding, by providing a unifying language, architecture and common underlying process for understanding how findings from neuroscience may be translated into software algorithms and hardware designs.
So these are our three commonly believed major fallacies which are impeding the progress towards the creation of true AI.
In diagram 11.1 we have depicted these fallacies as an interconnected triadic structure, because they are actually related to one another, even though this will not always be obvious. The fallacy of no critical algorithm is connected to the fallacy of AI creation as a search of mind design space, because the function of the critical algorithm, which is central to what intelligence is, is to generate cognitive algorithms; that is, the critical algorithm generates all the other algorithms, which are points in ‘mind design space’. The points or cognitive algorithms existing in this abstract space may do things that require intelligence to do, but they are not themselves intelligent in the true sense of the expression.
The connection between the fallacy that neuroscience only has a minor role to play in the creation of true AI and the other two fallacies is not so obvious. This is largely because up to this point the brain has been this huge unknown. But what the Fractal Brain Theory, along with its extrapolation to the structures and processes of the genome, is able to show is that at the heart of the brain and mind’s functioning is a recursively self modifying function which is able to generate functions or algorithms, i.e. explore the possibilities of mind design space within the given biological constraints.
The entire Fractal Brain Theory distils down to a single critical recursively self modifying algorithm, and it works by generating patterns and altering the patterns by which it generates patterns. So it works in a manner not merely analogous to, but, as we’ve described in earlier chapters, exactly the same as the biological processes of ontogenesis and phylogenesis (i.e. evolution). The Fractal Brain Theory works as one big generative process or algorithm. It is the critical algorithm. The emergence of the Fractal Brain Theory, in and of itself, is the negation of the fallacy that neuroscience won’t be a major factor in the creation of AI. The properties of the Fractal Brain Theory which are described in the rest of this book will also negate and make irrelevant the other two fallacies.
We’ve been discussing ideas which impede progress towards the goal of creating AI. These road blocks, which exist as fallacies and red herrings, are easily removed by challenging their validity as working assumptions and simply discarding them. However, we turn our attention now to some central problems in AI which cannot just be dismissed, but which rather exist as puzzles relating to the essential intrinsic properties of intelligence. They are fundamental issues relating to the creation of AI, all of which have to be satisfactorily tackled in order to attain the final goal. All of these key issues are inextricably bound up with one another, as we’ll see, to form what I call the Gordian Knot of AI.
One of the central challenges faced by any attempt towards the creation of fully functioning general purpose AI will necessarily be to unravel this Gordian Knot. It is the gate through which all roads to true AI must pass. So we’ll examine in some detail the nature of this Gordian Knot of AI and its inter-linkages. Once its nature is fully understood, it is easy to see why the problem of creating AI has been so seemingly intractable and fiendishly difficult. But once we see more clearly into the exact nature of the problem, or rather the totally interconnected set of problems, then we may start to see how they may be solved and how the Gordian Knot of AI may finally be unravelled. Later on we’ll explain how the Fractal Brain Theory addresses each and every one of these issues and provides a composite solution to this composite problem.
The Next Step and Gordian Knot of Artificial Intelligence
There exists an interesting situation presently in the world of AI. What the leading lights of the field collectively say are the next necessary puzzles that need to be solved in order to create true artificial intelligence, are actually the same puzzles that experts have been saying are the main impediments to progress for the past 30 to 40 years. We’ll explain what these problems are and then explain why they have been so intractable. We will even explain why the ideas which have been discussed in this book are the key to finally resolving these persistent issues.
The main problems to be solved, which are all critical to enabling the creation of true, general, strong, human-level and super-human AI, can be put into four categories, and each category contains a number of tightly related sub-puzzles. Some of the sub-puzzles are sometimes seen as separate, but what we will show is that they are really inseparably interconnected with the other sub-puzzles, and in turn the main puzzle categories are also inextricably intertwined with each other, so that it is impossible to really consider, let alone solve, each main puzzle category in isolation. I call this the Gordian Knot of Artificial Intelligence, and it is the main reason why so little progress has been made in solving these problems over the past few decades.
Historically the Gordian Knot is a semi-mythic or allegorical part of the legend of Alexander the Great. But its modern usage is as a metaphor for some intractable problem, and in this context that problem is AI. The Gordian Knot of AI is not really amenable to being tackled by the traditional reductionist approaches that have proved more successful in other areas of science and technology. This tightly interconnected puzzle doesn’t yield to the highly specialized, ‘balkanized’ and compartmentalized way in which AI and brain research is done in the academy or corporate research labs.
In order to give an overview of the problem categories, diagram 11.2 depicts what they are, together with some of the sub-categories they contain. Each of the circles (labelled A, B, C and D) at the corners of the square arrangement shown in the diagram corresponds to one of our four problem categories, and the bi-directional arrows which connect up all of these circles with each other represent the mutual dependency that each of the problem categories has with all the others.
We’ll now go through the main problem categories in turn, and later on we’ll systematically examine their inter-dependencies.
The first problem category, labelled A in diagram 11.2, relates to the problem of combinatorial explosion, which is often seen as the biggest problem of them all. The seemingly insurmountable nature of this problem was the one factor cited by the infamous Lighthill Report, which single-handedly killed off much of the major funding for AI research in the United Kingdom in the 1970s.
The problem of course hasn’t gone away, and a lot of AI research is involved in figuring out ways to circumvent it. It has to do with generating too many possible answers in the realm of ‘search space’, and then finding out which of the possibilities, or which subset of them, is the answer or a likely candidate for further refinement towards an answer.
So for instance, in the board games of Chess or Go there are myriad possible moves, and much of the engineering behind designing the AIs which play these games lies in handling the combinatorially explosive possibilities, which quickly add up to beyond astronomical numbers. Though in these cases there have clearly been notable successes in effective functioning, and therefore in handling the problem of combinatorial explosion, a generalized solution to the puzzle is yet to be found.
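Some rough, commonly quoted figures convey the scale: Chess offers on the order of 35 legal moves per position over games of roughly 80 plies, and Go around 250 moves per position over roughly 150 plies. The following back-of-the-envelope calculation, using these approximate numbers, shows why exhaustive enumeration is hopeless.

```python
# Rough back-of-the-envelope game-tree sizes, using approximate, commonly
# quoted branching factors and game lengths; the exact values are not the point.
from math import log10

chess_log10 = 80 * log10(35)     # ~35 moves per position, ~80 plies per game
go_log10 = 150 * log10(250)      # ~250 moves per position, ~150 plies per game

print(f"Chess game tree ~ 10^{chess_log10:.0f} leaves")   # roughly 10^124
print(f"Go game tree    ~ 10^{go_log10:.0f} leaves")      # roughly 10^360
# Both dwarf the ~10^80 atoms estimated to exist in the observable universe.
```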
Ben Goertzel, who is an advocate of the open source and cognitive science based approach to AI, thinks that there may be a way that separately constructed modules making up an AI, each of which is prone to the problem of combinatorial explosion, will somehow magically ‘quell’ each other’s combinatorial explosions. But he doesn’t seem to offer any explanation of how this might happen in practice, except to say that, ‘The question’s whether you can hook different algorithms together so they can kind of calm each other down.’ He has also said within the same context, ‘I’ve become convinced that essentially every cognitive algorithm used within intelligence has exponential complexity. Pretty much all you’re doing is making uncontrollable insane combinatorial explosions one way or another.’
However we know that humans do possess a ‘cognitive algorithm’ that isn’t combinatorially explosive, so there exists another way of looking at the problem of combinatorial explosion, which is to consider how we might completely circumvent it by creating AIs that actually work like humans.
In order to illustrate this alternative approach, we describe a related problem in AI, which is how to do what is known as ‘one shot learning’. Humans learn from very few examples, by internally generating models that are matched to sensory input. This relates to the idea of the ‘poverty of the stimulus’: unlike deep learning neural networks, which require tens of thousands or even millions of examples to learn anything, humans seem to do the same in a few steps by generating the right guesses to match to the input. So the problem of ‘one shot learning’ becomes the problem of generating the right shots, which then relates to the puzzle of combinatorial explosion, because if we knew how to do this then we wouldn’t have to deal with combinatorial explosion at all; we wouldn’t be generating millions of possibilities to have to sort through, prune or ‘quell’ in the first place.
So a human Chess or Go player doesn’t play like a computer, combing through billions of permutations of possible moves and board positions, but will rather generate a few dozen candidate moves to ponder, and these will all be pretty good moves to start with. The solution to the problem of combinatorial explosion lies in solving the problem of how to generate a few good ‘shots’ in the first place, versus generating billions of ‘shots’ and then trying to find the potentially useful ones. We’ll later describe how we do this, and it intimately relates to the other problem categories.
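The contrast can be caricatured in a few lines of code. The following toy, with a deliberately trivial task and a hard-coded proposal list standing in for a learned generative model, illustrates the difference between exhaustively enumerating a search space and generating a handful of good ‘shots’ to test.

```python
# Sketch of 'generating a few good shots' instead of enumerating everything.
# Illustrative assumptions only; a real system would learn the proposer.
import itertools

LETTERS = "abcdefghijklmnopqrstuvwxyz"
TARGET = "cat"

def exhaustive():
    """Brute force: enumerate every 3-letter string (26^3 = 17,576 of them)."""
    for guess in itertools.product(LETTERS, repeat=3):
        if "".join(guess) == TARGET:
            return "".join(guess)

def generative():
    """Model-driven: propose a few plausible candidates from prior knowledge."""
    proposals = ["dog", "cat", "rat"]        # a handful of good 'shots'
    for guess in proposals:
        if guess == TARGET:
            return guess

print(exhaustive(), "found after searching up to 17,576 possibilities")
print(generative(), "found after testing only 3 proposals")
```

The brute-force searcher never escapes the size of its search space; the generative one never confronts it, because the explosion is prevented rather than pruned.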
The problem of combinatorial explosion relates to our earlier discussion about the missing recursively self modifying generative process in AI, which we related to the ideas presented in earlier chapters of giving AIs creativity through the evolutionary aspect of the Fractal Brain Theory, and also to the idea that there exists a critical algorithm necessary for the creation of AI, namely the algorithm that is able to generate algorithms. So tightly bound up with the problem of combinatorial explosion and one shot learning are all the central issues discussed earlier in this chapter. The first strand of rope in our figurative Gordian Knot is itself already a tangled interweave of many central AI problems and issues.
Our next major problem category labelled B in diagram 11.2, is really also a Gordian Knot in itself and relates to the problem of hierarchical representation, hierarchical decision making and hierarchical problem solving.
Professor Stuart Russell, who is the co-author of the standard textbook on AI, believes that this is one of the major impediments to the creation of AI. He said in a recent interview, ‘There are two or three problems like [hierarchical decision making] where if all of those were solved, then it’s not clear to me that there would be any major obstacle between there and human-level AI.’
The importance of hierarchical methods in AI became apparent with the arrival of so called ‘blackboard’ hierarchical systems in the 1980s, which included a famous early attempt at speech recognition called Hearsay II. But now in contemporary AI research they are everywhere. The whole deep learning approach is really based on using hierarchical neural networks. We could also add the work of influential names in AI such as Jeff Hawkins whose approach is called Hierarchical Temporal Memory (HTM). Then there’s Ray Kurzweil who thinks that an approach known as Hierarchical Hidden Markov Models (HHMM), is a major part of the ‘secret of human thought’, and a key to ‘How to create a mind’. So it is quite apparent that the idea of hierarchical models and methods has these days become central to most current attempts at trying to solve the problem of AI.
Perhaps it is not so apparent that the other major AI sub-problems of how to represent time and space, together with the problem of how to handle context, are actually deeply bound up with the problems relating to hierarchical representation and problem solving. However, within the context of the Fractal Brain Theory, as we discussed earlier on in this book in chapter 5, the human mind seems to handle space and time in a completely hierarchical nested binary tree manner. Also as we discussed in chapter 4, binary trees and binary combinatorial codes give us a very generic and powerful way of handling context, and in a way that’s totally recursive and indefinitely expandable.
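As a toy illustration of what a nested binary-tree representation of time might look like (my own sketch, not the book’s formalism), the snippet below recursively halves a day so that any moment is described by the hierarchical path of binary splits that contains it, i.e. by its nested context.

```python
# Toy sketch of hierarchical, nested binary-tree representation of time:
# a day is split recursively into halves, and any moment is described by the
# path of left/right choices from the root, i.e. its hierarchical context.

def context_path(hour, start=0.0, end=24.0, depth=4):
    """Return the sequence of binary splits (the nested context) containing hour."""
    path = []
    for _ in range(depth):
        mid = (start + end) / 2
        if hour < mid:
            path.append(("first half", start, mid))
            end = mid
        else:
            path.append(("second half", mid, end))
            start = mid
    return path

for label, lo, hi in context_path(hour=15.5):
    print(f"{label}: {lo:04.1f}h - {hi:04.1f}h")
# Each finer interval only has meaning within the coarser intervals above it.
```

The same recursive halving could just as well be applied to a spatial extent or to an abstract context, which is what makes the representation so generic and indefinitely expandable.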
Of course, the representation of time and sequence learning is an important problem in AI. Leading researchers such as Jeff Hawkins and Peter Norvig have stressed the central importance of temporal learning and representation, but no satisfactory generic solution has yet emerged. Within the context of deep learning neural networks, Professor Andrew Ng has written recently, ‘While most deep learning researchers also think that learning from sequences is important, we just haven’t figured out ways to do so that we’re happy with yet.’
In relation to the problem of context, the late great John McCarthy, who designed the AI programming language LISP, said in a 1989 interview that it was ‘the next mountain that has to be surmounted’, and to date it still hasn’t been surmounted in a satisfactory way. A comprehensive and generic solution to the problem of context is something that the AI world is still waiting for.
We therefore see that the puzzles of hierarchical representation and problem solving are inextricably bound up with the important and central AI problems of how to represent space and time, together with the problem of how to represent context in a completely generic way. They are really different facets of the same problem.
Our third major problem category, labelled C in diagram 11.2, relates to how to make an AI represent things and solve problems in a way that isn’t ‘brittle’. This would allow AIs to use analogy, which would in turn enable ‘transfer learning’, i.e. the ability to take concepts, skills and experience gathered in one domain over to another. This is something that all existing AI is hopeless at. So a grandmaster-beating chess computer can’t beat a small child at chequers, and a super-human Space Invaders-playing AI won’t be able to generalize the ‘skills’ that it has ‘learned’ to even a closely related game, but needs to be trained from scratch.
Perhaps through having first-hand knowledge of these sorts of systems, Demis Hassabis, when asked what was the big problem he was working on in an early 2015 interview, replied, ‘The big thing is what we call transfer learning. You’ve mastered one domain of things, how do you abstract that into something that’s almost like a library of knowledge that you can now usefully apply in a new domain?’
Apparently the US government agency DARPA (Defense Advanced Research Projects Agency) has spent a lot of time, effort and money on trying to tackle this problem, but has ended up with nothing to show for it. Paul Cohen of the University of Arizona and DARPA has said, ‘DARPA has sunk vast amounts of money into programs with names like transfer learning where the goal is to transfer knowledge acquired in one context to another, can’t do it.’
So transfer learning seems to be a very difficult puzzle to solve, but as we’ll see this is because it is impossible to get a grip on the nature of this problem without considering it fully in the context of the other three major problem categories.
Another way of looking at the brittleness problem or transfer learning problem is to consider it in terms of the puzzle of how to get AIs to effectively handle analogy. After all there is no such thing as a brittle analogy and we are able to transfer experience gained from one domain to another through the use of analogizing. Therefore the solution to the puzzle of how to do analogy in AI is tightly bound up with the solution of the brittleness problem and transfer learning.
Douglas Hofstadter, who in the late 1970s wrote the award-winning book on AI called Gödel, Escher, Bach, thinks analogy is absolutely the number one key to creating AI, and it makes sense to include it in this problem category.
We could even include the age-old AI problem of ‘symbol grounding’ in this category. In traditional AI or GOFAI, i.e. ‘good old fashioned artificial intelligence’, which dealt with logic and strictly symbolic representations, there was the problem of mapping these abstract representations to complex spatial temporal coordinates. So there was a disconnect between the symbol for, say, ‘cat’ and all the complex information, visual, audio, tactile and even olfactory, that we as humans have relating to the concept ‘cat’. The symbol was therefore said to be ungrounded, and we had the ‘symbol grounding’ problem.
In some of the most cutting-edge AI research today, where systems start off by being trained on a mountain of complex spatial temporal data in the form of images and videos, including of course those involving cats and dogs, we have the opposite problem. The learning algorithms are able to extract low-level features and use composites of them in order to recognize things like cats, but they are not yet capable of forming the abstract notion of what a cat is.
Even if modern AIs, in the form of Google’s Brain project, are able to learn to extract the form of a cat’s face from millions of YouTube clips of cats, they are still unable to form an abstract conception which is able to transfer to different views of a cat or to other sensory modalities. So we have the symbol grounding problem again, in a different sense: we have the ground but no longer have the symbol, whereas before we had symbols but lacked the ground.
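As a minimal sketch of what is missing, the following illustrative data structure (my own, hypothetical) shows a ‘grounded symbol’: a discrete token explicitly linked to stand-in feature representations across several sensory modalities, so that the abstraction and its sensory ground can each be reached from the other.

```python
# Minimal sketch of a 'grounded symbol': a discrete token linked to
# representations across sensory modalities. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class GroundedSymbol:
    token: str                                   # the abstract symbol, e.g. 'cat'
    grounds: dict = field(default_factory=dict)  # modality -> feature vectors

    def add_ground(self, modality, features):
        self.grounds.setdefault(modality, []).append(features)

cat = GroundedSymbol("cat")
cat.add_ground("vision", [0.9, 0.1, 0.7])   # stand-in visual feature vector
cat.add_ground("audio",  [0.2, 0.8])        # stand-in 'meow' features
cat.add_ground("touch",  [0.6, 0.6])        # stand-in fur texture features

# GOFAI had the token without the grounds; a purely feature-learning network
# has grounds without the token. The challenge is keeping both, connected.
print(cat.token, "is grounded in modalities:", list(cat.grounds))
```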
So this is a big challenge for AI, and to solve it is really to help solve the brittleness problem, because ungrounded symbols will generally be brittle, and it is difficult if not impossible to form analogies purely from symbols, and without analogy there is no transfer learning. So we have a dense nexus of problems here, in the form of transfer learning, the brittleness problem, symbol grounding and giving AIs the ability to analogize. Another mini Gordian Knot.
Our fourth and final major problem category, labelled D in diagram 11.2, relates to the puzzle of how to give an AI a sense of meaning and purpose. This problem is closely bound up with what is considered by the leading deep learning experts to be the major stumbling block of their approach, which is how to do ‘unsupervised learning’.
Yann LeCun of Facebook and New York University says, ‘We know the ultimate answer [for the algorithm of the cortex] is unsupervised learning, but we don’t have the answer yet’, and that ‘We are still missing a basic principle of what unsupervised learning should be built upon.’
Yoshua Bengio, another one of the big names in deep learning AI, said recently that ‘Unsupervised learning is really, really important’, and puts it at the top of the list of AI problems waiting to be solved.
Andrew Ng of Baidu Corporation and former head of Stanford University AI Lab, says of unsupervised learning, ‘We don’t know what the right algorithms are. We have some ideas, but it will be much harder.’
The problem of unsupervised learning in deep learning relates deeply to the idea of giving a machine a sense of meaning and purpose, because it should be obvious that it is our internal sense of meaning and purpose which is our ultimate supervisor, and which relates to the sources of emotion and motivational drive that we discussed in earlier chapters. This then ties in with other important problems in AI, relating to how best to do reinforcement learning and the puzzle of how to implement utility functions.
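To make the connection concrete, here is a bare-bones textbook example of tabular Q-learning, in which a scalar reward signal, standing in for an internal sense of utility or ‘meaning’, acts as the only supervisor; there are no labelled examples at all. It is a standard sketch for illustration, not the mechanism proposed by the Fractal Brain Theory.

```python
# Bare-bones tabular Q-learning: a scalar reward signal (a stand-in for an
# internal sense of utility) supervises learning with no labelled examples.
import random
from collections import defaultdict

actions = ["left", "right"]
q = defaultdict(float)                 # (state, action) -> estimated utility
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Toy world of states 0..4: moving 'right' from state 3 reaches the goal."""
    next_state = max(0, min(4, state + (1 if action == "right" else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

for episode in range(200):
    state = 0
    while state != 4:
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

print({s: max(actions, key=lambda a: q[(s, a)]) for s in range(4)})
# After training, 'right' should be preferred in every state.
```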
Reinforcement learning is central to the London-based company DeepMind, which has been attracting a lot of media attention for its successes with video game playing AIs and for finally beating the best humans at the ancient Chinese board game Go. Reinforcement learning is partly behind both of these remarkable achievements.
Demis Hassabis, the main mind behind DeepMind, has said that, ‘The core of what we’re doing at DeepMind focuses around what is called reinforcement learning, and that is how we think about intelligence at DeepMind.’ He thinks that at some point in the future, ‘Reinforcement learning will be as big as deep learning is now.’ The new artificial intelligence company OpenAI, with billion dollar backing from high-tech industry names such as Elon Musk, Jeff Bezos of Amazon and Sam Altman, president of tech incubator Y Combinator, has identified reinforcement learning, along with unsupervised learning, as the two key problems it will initially be working on.
Regarding the closely related idea of the utility function (utility is, after all, intrinsically reinforcing), this has been much talked about in AI circles for years, but a satisfactory and generic solution has never emerged. At the same time, Stuart Russell believes that, ‘at the moment [the topic of utility function] is very under studied in artificial intelligence and so my feeling is that this will be a very important area.’
We seem to have in our fourth and final problem category yet another mini Gordian Knot which involves the tightly interrelated puzzles of giving an AI a sense of purpose and meaning, unsupervised learning, reinforcement learning and utility function.
We’ll now go on to explain how the separate mini Gordian Knots we’ve been discussing are in turn all tightly bound up with one another. They cannot be sensibly considered, let alone unravelled, without taking into account the other problem categories with which they are intertwined. This will give us our overall Gordian Knot of AI, which has interwoven into it almost all of the most important issues and central problems of AI.
Peter Norvig, in an opening address to an AI conference, said, ‘So we’ve got this learning, deep semantics, hierarchical, temporal and the continual, and we want to make progress on all of those, how do we do that… each of those individually is a tough problem, but figuring out how to do them all at once, that’s what makes AGI as challenging as it is.’
What the idea of the Gordian Knot of AI is saying is that it is not merely the case that the individual problems are ‘tough’ to completely and satisfactorily solve in isolation; rather, it is impossible to do so when they are considered on their own. It is only by taking the rather more intimidating route of tackling all of the problems at once, and in intimate relation to one another, that the central problems of AI can truly be solved, and true AI can finally be created.
We made the observation earlier that these central problem categories have existed as the main impediments to progress in AI for decades. They are really the perennial and enduring intractable puzzles remaining to be solved in AI. So we are now starting to get an idea as to why these problems have been so difficult to tackle, i.e. their interconnectedness. This is especially the case given the modus operandi in which just about all scientific research and technological development is conducted, that is, in a highly specialized, compartmentalized and reductionist manner. To return to Patrick Winston’s term, it has been the ‘mechanistic balkanization’ of AI, and of science in general, which has held back progress in AI and the theoretical mind sciences.
Only once we recognize the complete and total interconnectedness of these major problems in AI will we be able to make progress and solve them. As we’ll see later on, the Fractal Brain Theory provides some very useful guides and ready-made solutions to these problems, once we understand better what they are and exactly how they are related to one another. So we need now to explicitly show how the separate major problem categories are interconnected with one another, to prepare the way for showing how they may together be solved. We’ll explain why that solution is inherent and implicit within the workings and properties of the Fractal Brain Theory.
We have the four main problem categories of AI, as we’ve just outlined. Given these four sub-divisions, there exist six connections between them, which are all the possible pairings of the categories (diagram 11.3). So now we’ll go through each of the pairings in turn, in the same order as the numbers which have been associated with each pairing in the diagram. To start off we’ll discuss our first pairing, i.e. number 1.
Our first pairing, number 1, between the problem of combinatorial explosion and the problem of meaning and purpose, is probably the most obvious relationship between our problem categories and is quite easy to explain. Without an idea of meaning or purpose it would be impossible to even start to think about how to solve the problem of combinatorial explosion. So for example, in the game of Chess, without the idea of checkmate being the ultimate purpose of the game and the winning outcome, there is no way to begin to think about utility functions or which moves to reinforce or not. Without an overall sense of meaning, in an AI or a human for that matter, things become an exercise in randomly drifting around the space of myriad combinatorial possibilities, without any of them having any more significance or ‘meaning’ than any other. So the ideas of meaning, purpose, utility function and reinforcement are inseparably bound up with that of combinatorial explosion and the generation of possibilities in search space.
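A toy illustration makes the dependency plain: give a search process a goal and candidate moves can be scored and ranked; take the goal away and every candidate scores identically, so the search degenerates into random drift. The numbers and ‘moves’ below are arbitrary stand-ins of my own.

```python
# Toy illustration of why purpose must come before search: with a goal,
# candidate moves can be scored and ranked; without one, every candidate is
# equally 'good' and search is just a random drift. Illustrative only.
import random

GOAL = 10                      # a stand-in for 'checkmate' / the winning state

def score(position, goal):
    """Utility of a position: closer to the goal is better (None = no goal)."""
    if goal is None:
        return 0.0             # with no purpose, nothing is better than anything else
    return -abs(goal - position)

def choose_move(position, goal):
    candidates = [position + d for d in (-2, -1, 1, 2)]    # possible moves
    best = max(score(c, goal) for c in candidates)
    return random.choice([c for c in candidates if score(c, goal) == best])

def play(goal, steps=20):
    position = 0
    for _ in range(steps):
        position = choose_move(position, goal)
    return position

print("with a goal   :", play(GOAL))   # homes in on the region around 10
print("without a goal:", play(None))   # wanders; any outcome is as good as any other
```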
Moving on to pairing number 2, where hierarchical spatial and temporal processing and representation meets the brittleness problem, transfer learning and symbol grounding, we find a whole set of relationships between the two problem domains. Neuroscience tells us that the symbolic and semantic aspects of our minds exist higher up in the brain hierarchy than those dealing with more spatially and temporally complex information. This means that the problem of symbol grounding is intimately related to the idea and problems of hierarchical representations.
In relation to using analogy, a hierarchical view of representation and symbol grounding allows us to do multi-resolution processing. This means that higher up a representation hierarchy we have lower resolution representations, and vice versa: the lower down the brain hierarchy we go, the higher the resolution and the more spatially and temporally complex things become. So although it may be impossible to find an analogous mapping when working with highly detailed comparisons, by going up the hierarchy lower resolution correspondences may become more apparent. This lower resolution representation is then what is ‘transferred’ from one domain to another, with the specific details of the new domain added to it as the reconstituted higher resolution details.
In language processing the higher resolution constructs may consist of specific sentences, whereas the lower resolution ones may take the form of rules of syntax into which specific words are plugged. We can extrapolate this idea from language to all other modalities. Pairing number 2 is really about the tight relationship between the idea of hierarchical context and that of non-brittle content. The problems of brittleness, transfer learning and analogy are only solved by considering them in relation to hierarchical spatial temporal structures.
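A minimal sketch of the language example, using a single made-up syntactic template and two invented domains of our own; it only shows how a low-resolution construct carries over while the high-resolution details are plugged back in.

```python
# A low-resolution construct (a syntactic template) transferred between two
# domains, with high-resolution details (specific words) plugged back in.
template = "{agent} {action} the {object}"          # rule of syntax

kitchen = {"agent": "the chef", "action": "chops", "object": "onion"}
workshop = {"agent": "the carpenter", "action": "sands", "object": "plank"}

print(template.format(**kitchen))    # "the chef chops the onion"
print(template.format(**workshop))   # "the carpenter sands the plank"
```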
Pairing number 3 connects the hierarchical spatial temporal representation aspect with the problem of meaning, purpose and utility function. This relates to the idea discussed in chapter 9, that our sense of meaning and purpose derives not from some static mathematical function assigning utility to our sensory input, actions and mental models, but is rather a data structure that continually grows and augments itself in the course of the lifetime of an organism.
We used the analogy of a diffusion-limited aggregation growth structure to show how this utility function structure of salience and affordance would grow from our emotion centres and hardwired unconditioned reinforcers.
In relation to our hierarchical spatial temporal structures, this means that our dynamic utility function would grow into our representation of physical reality, so that places and specific times may be given emotional significance and ‘meaning’.
Examples of this in practice are psychological phenomena such as conditioned place preference or contextual place avoidance. Specific temporal contexts can also be given special salience, i.e. birthdays or the end of the working day. It is therefore impossible to understand the nature of the dynamic utility function without taking into account the spatial and temporal aspects with which it works, and which are inherently hierarchical. Similarly, notions of space and time, in the context of the mind of an intelligent being, generally don’t really exist without a sense of meaning and purpose.
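As a rough illustration only, and under our own simplifying assumptions (a flat dictionary of place and time contexts standing in for the growing, hierarchical structure the theory actually describes), a utility function treated as a growing data structure might be sketched like this:

```python
# A utility function as a growing data structure rather than a fixed formula:
# spatial and temporal contexts acquire salience as experience attaches
# reinforcement to them, in the spirit of conditioned place preference or a
# birthday acquiring special significance.
from collections import defaultdict

class DynamicUtility:
    def __init__(self):
        # Starts almost empty, akin to hardwired unconditioned reinforcers only.
        self.salience = defaultdict(float)

    def reinforce(self, place, time, reward):
        """Grow the structure: attach or strengthen salience for a context."""
        self.salience[(place, time)] += reward

    def value(self, place, time):
        """Contexts never experienced simply have no acquired meaning yet."""
        return self.salience.get((place, time), 0.0)

u = DynamicUtility()
u.reinforce("cafe on the corner", "end of working day", +1.0)   # place preference
u.reinforce("dark alley", "late night", -2.0)                   # place avoidance
print(u.value("cafe on the corner", "end of working day"))      # 1.0
print(u.value("dark alley", "late night"))                      # -2.0
```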
Similar lines of thought also apply to pairing number 4, which is between our problem of utility function and the problem of non-brittle representation, symbol grounding and transfer learning. The representations which form in our minds are inseparable from their meaning, and are tied in with the idea of their utility from the outset. The very idea of a meaningless concept is a contradiction in terms, even if a particular concept has more meaning for one person than another, or is even completely meaningless to somebody else. Given that at some point a concept was formed, then from the outset it was attached to meaning within a certain context. Whether that concept turns out to be useful or not is another matter.
When we examine the neuroanatomy, we find that the cortical regions associated with semantic knowledge and concept formation, i.e. the polar temporal cortex and surrounding areas, are tightly interconnected with the emotion centres, and in particular the amygdaloid nuclei. So the abstract concepts we form in our brains are well positioned physically to be associated with utility or aversion. It is also this affective dimension which influences not just which concepts are formed in the first place, but also the degree of specificity and differentiation that a concept or symbol might have.
An example would be the oft-cited claim that Eskimos have 50 different words for the concept of ‘snow’, whereas a typical resident of a modern day city like London might know snow, slush and sleet, together with perhaps one or two more related labels. This is simply because there’s lots of snow where Eskimos live, and knowing the fine distinctions between different kinds can be a life or death matter. As a result, the emotion centres will ‘supervise’ the formation of a greater number of distinctions with respect to the concept of snow.
To fully understand the intimate connection between the formation of non-brittle, analogously transferable concepts and the ideas of utility and meaning is to break the ‘barrier of meaning’, as the complexity theorist Melanie Mitchell calls it. It is to recognize that the process of concept or symbol formation and the ideas of meaning, purpose, salience and affordance are all inextricably linked. Naturally, in order to create true AI and give it semantic symbolic representations, we need to give these representations meaning, which means relating them to utility functions which determine in turn which concepts, ideas and behaviours get reinforced.
Now moving on to pairing number 5, between the problem domain of combinatorial explosion and that of the brittleness problem, transfer learning and AI analogizing, we identify here one of the main reasons why human reasoning and thinking in general doesn’t involve combinatorial explosion. It is because we are able to use analogies and generalize solutions we have learned in one context to another. In relation to our earlier discussion about ‘one shot learning’, it is exactly this ability to transfer knowledge between domains that allows us to generate the right shots, or at least good guesses roughly in the right ball park, to begin with.
Even in the early days of AI research in the 1970s, when workers in the field were first trying to come to grips with the problem of combinatorial explosion, one idea for solving it was to try to find certain ‘rules of thumb’ or heuristics that would save the AI from generating such a huge, combinatorially explosive search space in the first place.
So even early on, researchers encountered the problem of transfer learning and how to discover these rules of thumb. Once we’ve learned a rule from a specific instance within a specific context, we can apply that rule again with success to exactly the same sort of context, which will be pretty narrow. But the problem is, how do we transfer that rule of thumb to another context? This leads on to the problem of how to get a computer to use analogy, which in turn leads to the brittleness problem. Even by considering these issues, which have existed in research for a long time, we can see the significance of pairing 5 of diagram 11.3.
Humans are able to analogize and transfer knowledge to a specific problem not just from one other domain, but from a distilled wisdom: the extraction of the essential qualities of the patterns of the world and of life, accumulated over an entire lifetime and especially during childhood. We are able to bring to bear on any situation or problem at least an educated guess as to the best approach or solution, one that is better than random selection.
This is often based on even some small resemblance of the novel circumstance to a particular past experience, or to a composite of many past experiences. This might even be a composite of things that have happened in very different domains. A particular educated guess at the answer to a problem in some new circumstance may be informed by a past experience that occurred in a completely different circumstance, yet which has some analogous, even metaphoric, connection to the new circumstance in question. This educated guess will often produce, if not a totally correct result, then one which is not totally off the mark.
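The following is our own minimal sketch of such an ‘educated guess’, not the author’s algorithm: a new situation is matched against stored past experiences, possibly from quite different domains, and the solution of the closest analogue is proposed as a first shot, which is already far better than random selection.

```python
# Educated guessing by analogy: propose the solution of the most similar
# past experience, measured here by a crude overlap of descriptive features.
past_experiences = [
    # (features of the situation, solution that worked)
    ({"round", "rolls", "heavy"}, "push it up a ramp"),
    ({"hot", "metal", "handle"}, "use a cloth to grip it"),
    ({"stuck", "lid", "jar"}, "twist while tapping the rim"),
]

def educated_guess(new_situation):
    """Pick the solution of the past experience sharing the most features."""
    overlap = lambda exp: len(exp[0] & new_situation)
    return max(past_experiences, key=overlap)[1]

# A barrel in a warehouse is not a boulder on a hill, yet the analogy holds.
print(educated_guess({"round", "rolls", "heavy", "barrel"}))   # "push it up a ramp"
```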
There is a very good reason why the use of analogy is such a good strategy in our dealings with the World, even if we haven’t yet been able to get machines to use the same facility. This is because the World and Universe we inhabit are full of symmetry, self-similarity and recurring patterns. If it is often the case that we can transfer skills and concepts from one domain to another, then this is because, all over the reality we inhabit, there exist underlying patterns which keep recurring in a multitude of diverse contexts, even if the different patterns are not always an exact fit and our analogies don’t always work out very well.
If it’s possible to transfer patterns or rules of thumb that we’ve learned to new circumstances, then this is because all the richness and diversity of the World is really made up of myriad variations of the same underlying patterns. If we superficially see differentiation, variety or broken symmetry, then there is underlying symmetry which we consciously or unconsciously exploit when we use analogies, make our best guesses, or try to generate the right shots in one-shot or few-shot learning.
These underlying patterns may also be scale invariant or fractalesque. It has been said that the genius of Leonardo da Vinci derived from his ability to extrapolate from the large to the small and vice versa, from macrocosmos to microcosmos and back again. He believed that ‘the structure of the whole is reflected in the parts’, and linked the patterns of a human being to the patterns of the wider world. So for instance da Vinci wrote that, ‘The body of the Earth, like the bodies of animals, is interwoven with a network of veins which are joined together and are formed for the nutrition and vivification of the earth and of its creatures.’
The reason why his analogical thinking allowed him to make remarkable scientific discoveries and insights into the nature of things is that reality and the World are set up to allow this sort of reasoning to be effective, due to their underlying symmetry and self-similarity.
Put another way, the reality we inhabit doesn’t consist mainly of disordered white noise or incongruence, but is instead loaded with symmetrical order and self-similar pattern. To use analogy and transfer our learning between diverse and different contexts is to exploit these recurring patterns or symmetries. Not to do so, as in the case of most existing AI, is in a sense to search through the irrelevant white noise and meaningless non-pattern, producing the problem of combinatorial explosion. This understanding is the key to solving, or making irrelevant, the problem of combinatorial explosion.
This may seem a bit speculative and philosophical, but the example of an effective video and image compression technology called Fractal Compression helps to illustrate our point. Fractal compression works by identifying self-similarity of form within a photographic image, and in its analysis it uncovers a wealth of hidden pattern which it uses to compress the image. This shows how, even on a superficial level, the world is full of symmetry and self-similarity. Our analogies and even our scientific theories are the uncovering of the deeper symmetries. We exploit these symmetries through our use of analogy and the transferring of concepts from old contexts to new and novel ones.
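For the technically inclined, here is a minimal sketch of the core idea behind fractal (partitioned iterated function system) compression, under our own simplifying assumptions about block sizes and with a made-up test image: for each small ‘range’ block it finds the larger ‘domain’ block elsewhere in the same image that best approximates it after a contrast and brightness adjustment, which is exactly the self-similarity described above. Decoding, not shown here, works by iterating these contractive block mappings from any starting image until they converge.

```python
# Minimal fractal-compression sketch: approximate every r x r "range" block
# by a downsampled 2r x 2r "domain" block from the same image, with a
# least-squares contrast (s) and brightness (o) adjustment.
import numpy as np

def downsample2(block):
    """Average 2x2 pixel groups, halving each dimension."""
    return block.reshape(block.shape[0] // 2, 2, block.shape[1] // 2, 2).mean(axis=(1, 3))

def encode(img, r=8):
    h, w = img.shape
    # Pre-extract candidate domain blocks, downsampled to range-block size.
    domains = []
    for dy in range(0, h - 2 * r + 1, r):
        for dx in range(0, w - 2 * r + 1, r):
            domains.append(((dy, dx), downsample2(img[dy:dy + 2 * r, dx:dx + 2 * r])))
    code = []
    for ry in range(0, h, r):
        for rx in range(0, w, r):
            g = img[ry:ry + r, rx:rx + r].flatten()
            best = None
            for pos, dom in domains:
                d = dom.flatten()
                var = ((d - d.mean()) ** 2).sum()
                s = 0.0 if var == 0 else ((d - d.mean()) * (g - g.mean())).sum() / var
                o = g.mean() - s * d.mean()
                err = ((s * d + o - g) ** 2).mean()
                if best is None or err < best[0]:
                    best = (err, pos, s, o)
            code.append(((ry, rx), best[1], best[2], best[3]))
    return code

img = np.tile(np.linspace(0, 255, 64), (64, 1))   # simple synthetic test image
print(len(encode(img)), "range-block mappings found")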
So this establishes our link between on the one hand the problem of combinatorial explosion and on the other that of non-brittle representations, the use of analogy in AI and transfer learning.
We now move on to the final pairing, number 6, which again involves the problem domain of combinatorial explosion, which we’ve just considered in relation to the brittleness problem, but this time relates it to the problems of hierarchical representation and processing. There is a very simple reason why hierarchical representations are extremely useful when it comes to dealing with combinatorial explosion, and it has to do with the idea of recursive hierarchical decomposition of data sets.
An analogy is useful for understanding this concept. The problem of combinatorial explosion can be likened to the metaphor of a search for a needle in a haystack. Only in this instance, by doubling the size of our metaphoric haystack we don’t merely double the search space but exponentiate it. So doubling the size of the haystack means the search space is squared, or taken to the power of 2. For example, if the initial search space was 10 units big, then doubling the haystack size doesn’t make a new search space of size 2 x 10 = 20, but rather one of 10 squared, i.e. 10 x 10 = 100. Tripling the size of the haystack creates a new search space of size 1000 not 30, and quadrupling it produces a figure of 10000 versus 40. So this is a good way of visualizing the problem of combinatorial explosion.
The idea of hierarchical decomposition works by halving the size of the haystacks, which means we go the other way round: by doing this we don’t merely halve the search space, but produce the inverse exponent or logarithm of the search space. So we might start with a search space of 10000 and, by hierarchically decomposing the haystack into 4 separate ones, end up with 4 separate haystacks each of which contains a search space of only 10 units, which is much easier to handle. The real power of hierarchical decomposition comes into play when we are faced with the really large, beyond astronomical numbers which are usually associated with combinatorially explosive problems.
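A tiny numerical illustration of the haystack metaphor, under the convention used above that a haystack of n ‘units’ has a search space of 10 to the power n: searching it whole grows exponentially with n, while decomposing it into n unit haystacks leaves only n times 10 possibilities in total.

```python
# Exponential whole-haystack search versus the collapse achieved by
# decomposing the haystack into unit-sized pieces.
for n in range(1, 7):
    whole = 10 ** n          # search space of the undivided haystack of n units
    decomposed = n * 10      # total search space after decomposition into n unit haystacks
    print(f"{n} units: whole haystack = {whole:>9,} vs decomposed = {decomposed}")
```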
If we transfer the examples just discussed, analogizing our non-brittle haystack metaphor to AI and mind science, then the haystacks become spatial temporal representations. Doubling a haystack in this new context would mean doubling the size of some representational map, i.e. an area of cerebral cortex or an AI data array. It could also mean that a temporal representation is doubled in time length or in its number of temporal sequence elements. Either way, we would be able to logarithmically shrink down the combinatorially explosive possibilities by using hierarchical decomposition of brain regions, sequence memories or AI spatial/temporal data structures.
We could also add that hierarchical decomposition is able to exploit the symmetry and self-similarity in the world around us. The separated out, recursively divided up metaphoric haystacks will actually be handling the same patterns and common symmetries. In a Universe full of self-similarity, it is possible to treat all of the recursively decomposed mini-haystacks as if they were the big haystack and vice versa. We could interpolate from the big haystacks to the little ones and, vice versa, extrapolate from the little ones to the big ones. So not only is the size of the search space made more manageable but also, in a metaphoric sense, the separated haystacks will contain the needles in roughly the same places. Therefore, discovering the needle in one of the haystacks will often transfer to the automatic or quicker discovery of the other needles in the other haystacks.
The human visual system can be used to illustrate this property. There is a lot of symmetry in the primary visual cortex of our brains with the same kinds of bar or edge detectors configured in a tessellating array of cortical columns spanning this entire area. We discussed this in chapters 3 and 5.
This symmetry of brain design reflects the symmetry of the visual input, in that the same visual primitives, i.e. edges aligned at various orientations, are being repeatedly used all across the visual field. So in a sense the same solution, or set of needles in a haystack, for one region of primary visual cortex transfers to all the other regions. This is a low level example of how the brain exploits the symmetry in the natural world, but the same idea extrapolates to higher level representations and also to abstract thoughts, even up to the level of unifying scientific theories.
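A minimal sketch of this reuse of the same ‘solution’ across the whole visual field, offered only as an illustration rather than as the brain’s algorithm: a single oriented edge detector (here a Sobel-like kernel of our choosing) is applied at every position of an image, the computational analogue of the tessellated columns of primary visual cortex and of weight sharing in convolutional networks.

```python
# One shared edge detector reused across the entire visual field.
import numpy as np

kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)   # vertical-edge detector

def detect_edges(image, k=kernel):
    """Slide the one shared kernel over every 3x3 patch of the image."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = (image[y:y + 3, x:x + 3] * k).sum()
    return out

# Toy image: dark left half, bright right half -- the shared detector
# responds strongly along the vertical boundary, wherever it occurs.
img = np.zeros((8, 8)); img[:, 4:] = 1.0
print(np.abs(detect_edges(img)).round(1))
```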
All of this forms another representational hierarchy which is recursively decomposed, making it easier to search the haystacks and find the right answers existing somewhere in the combinatorial space of possible answers. From this perspective, hierarchical decomposition as a way of tackling combinatorial explosion is therefore a much more general concept. For our immediate purposes, it means we have established why, for the creation of AI, the problem of combinatorial explosion is really inseparable from the problem of hierarchical representation and processing.
We’ve now completed our discussion of why the major problem categories of AI, depicted in diagram 11.2 and outlined earlier, are completely intertwined with one another. The links in diagram 11.3, which represent the pairings between the problem categories of AI, reflect fundamental relationships between them, and reflect the way that each of these problem categories is inextricably bound up with the others. Earlier on, when we went through each of the major problem categories, we also explained how each of them was like a mini Gordian Knot in itself, with all the sub-categories intertwined with each other. What we have now done is to show that all the mini knots in turn entangle with one another to create our overall Gordian Knot, which actually encompasses most if not all of the central, most important and really intractable problems of AI, which have resisted being satisfactorily solved for decades. Now that we see the nature of this Gordian Knot, and how it includes in its dense nexus so many of the key issues, we see why the creation of AI has been such a difficult task.
John McCarthy, one of the founding fathers of AI who is credited with first using the expression ‘artificial intelligence’, said in an interview that once AI was finally created we’d look back and ask ourselves ‘why didn’t we do it sooner?’ The answer to that question, given our Gordian Knot way of looking at things, is simply that there was a great impasse on the road to creating true artificial intelligence, because a reductionist approach was used to tackle a problem which isn’t really reducible into a set of self-contained puzzles. The property of linear separability, which has served the sciences so well, is not really applicable to complex non-linear phenomena such as the brain and mind, and therefore not to AI either. It existed as one big interconnected problem where it was not possible to separate out the combinatorial haystack into smaller ones. So it is really a more holistic and integrative approach which will finally be effective towards the goal of understanding the nature of intelligence and creating AI.
Marvin Minsky, another founder of AI, is associated with the concept of ‘AI-complete’, which echoes the idea in computer science of ‘NP-complete’, where NP stands for nondeterministic polynomial time. This involves a class of combinatorially explosive problems, and the basic idea is that an efficient solution to any one of them would automatically imply a solution to all the others. So ‘AI-complete’ is the idea that the solution to any of the really difficult problems of AI, i.e. full natural language understanding, common sense reasoning or human level sensory motor coordination, will necessarily also contain within it the key to unlocking all the other problems.
In a sense, our explanation of the Gordian Knot of AI gives us the underlying reason why there is a lot of truth in the idea of ‘AI-complete’. Because of the complete entanglement of the major AI problem categories and sub-categories, a complete solution to any of the difficult AI-complete practical problems, like full natural language understanding, will necessarily involve the solution of all the interconnected problem categories and sub-categories we listed. And because our Gordian Knot essentially includes every single major problem category of AI, the unravelling of this Gordian Knot in turn necessarily implies the solution to all the other really difficult practical AI problems; so the idea of ‘AI-complete’ would be more or less correct.
We’ve explained the complete interconnectedness of all the major problems in AI, i.e. combinatorial explosion, the brittleness problem, hierarchical problem solving and the utility function. We’ve also shown that many other central AI problems are closely tied in with these major categories. And we’ve spent the preceding chapters of this book explaining our symmetry, self-similarity, recursive and recursively self-modifying way of looking at mind, brain and genomics, together with related matters. So now we’ll go through some of the ways that our new approach is able to give us new lines of attack, and even ready-made solutions, to the really intractable AI problems which we have been discussing. We’ll show how the Fractal Brain Theory is able to relate to every aspect and relationship contained in our Gordian Knot, and the way in which it is able to unravel it.