Brains might be simple. Many physicists have hoped, optimistically, that perception involves a yet-undiscovered simplicity worthy of Nature's other universal laws.
Suppose that intuition is correct, and brains in fact do use principles and symmetries as deep as gravitation. Gravitation was Feynman's canonical example in The Character of Physical Law. The force of gravity is continuous, its symmetry is isotropic, its range is multi-scale, and its structure is simple (following an inverse-square law which in turn derives from the structure of 3-D space itself).
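The claim that the inverse-square law derives from the structure of 3-D space can be made concrete with the standard flux-conservation argument. The sketch below is a textbook derivation, not something specific to this essay; the symbols \Phi (total flux from a point source), g(r) (field strength at radius r), and d (number of spatial dimensions) are introduced here purely for illustration.

```latex
% Flux conservation: the same total flux \Phi crosses every sphere
% centered on the source, and a sphere in 3-D space has area 4\pi r^2.
\[
  \Phi = g(r)\, 4\pi r^{2}
  \quad\Longrightarrow\quad
  g(r) = \frac{\Phi}{4\pi r^{2}} \;\propto\; \frac{1}{r^{2}} .
\]
% In d spatial dimensions the enclosing surface grows as r^{d-1},
% so the same argument gives a different exponent.
\[
  g(r) \;\propto\; \frac{1}{r^{\,d-1}} \qquad (d = 3 \;\Rightarrow\; 1/r^{2}) .
\]
```

The exponent 2 is thus not a free parameter: under this argument it is fixed by the dimensionality of the space the field spreads through.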
From the proper viewpoint--but only from the proper viewpoint--we believe perception can also be understood as following such a simple law, continuous and multi-scale. Furthermore, as with gravity, that universality and simplicity result ultimately from the topology and symmetries of spacetime itself. Here is how that view unfolds.
Representation is identity. First up is the basic mathematical question of identity: what is a representation, anyway? A perfect representation must be a perfect copy, as in an equation or a computer memory. Unfortunately, perfect copies are only possible for quantized structures. A continuous representation, then, must be a compressed representation, as in images or streaming video. One kind of simple continuous representation is a theoretical parameterization or spline (like the expression 1/r^2), in which a few symbols represent far more data points; likewise, the smooth lines of a cartoon can represent a continuous face. So even though an ideal continuous representation cannot possibly contain all the information embedded in its input data (much less in the continuous original), the ideal can and should look as much like the original as possible, having the same geometry, topology, and structure. A continuous representation should seem like a perfect copy, just as our perception of the outside world seems real. In geometric terms: a representation has the symmetry of isomorphism, one thing shaped just like another.
Perception maps 3-D spacetime. The next question is topological: what is being represented in perception? Perception is the representation of real things in real time. That means representing moving 3-D structures: either objects in the outside world, or the muscles and bones inside a body. So in an idealized sense, perception by a brain in real-time 3-space (as opposed to recognition by computers of still images in data hyperspace) must produce a corresponding smoothly-changing 3-D structure. This unorthodox but simple and principled viewpoint understands perception entirely as the construction and synchronization of an apparently flawless 3-D simulation, a representation decompressed and updated from undersampled sensory data. If in geometric terms representation has the symmetry of isomorphism, then perception must have the topology of 3-D spacetime (as Kant observed).
Hardware is hard. That simple, geometric description of perception seems impossible to implement in hardware. What kind of idealized 3-D medium can form and hold an ever-evolving 3-D micro-copy of the world, as if simulating the outside world inside a snow-globe? How do signals from outside even enter into such a medium, and how do the results escape? What wave equations inside the medium allow it to simulate momentum and trajectory, and what nonlinearities of amplification and smoothing enhance the signal and remove the noise? How can such a system consume the absolute minimum of power?
Physics offers inspiration if not actual answers. An elastic medium like jelly can hold moving 3-D information as waves. Information can enter and exit such a medium as impulses which create amplitude excursions (in the same way that spikes drive synaptic events), and in turn amplitude transients can initiate new spikes. Subtle correlations can be enhanced (and entropy thus reduced) by local non-linear dynamics like Hodgkin-Huxley and STDP, and noise can be removed by thermal blurring. As in quantum mechanics, efficiency comes from computing as much as possible continuously, without quantization, and from forming quanta (i.e. spending energy on new impulses) only when absolutely necessary. In Elastic Nanocomputation, I calculate several efficiency metrics comparing continuous to "neural" representations; typical advantages are millions-fold. In this viewpoint, the over-arching theoretical challenge is to merge the thermodynamics of low-entropy stable states (e.g. correlation-enhancement, amplification, phase transitions, crystallization, attractor dynamics, resonance, and so on) with physical structures like fractals having continuous scale and infinite resolution, which furthermore evolve continuously through time. Such a unification poses a challenge even for physicists, but is not impossible.
The above constraints are not the limitations of a particular hardware model, but constraints of the representational problem-space itself. Any theory of perception faces them. Representing a continuous 3-D world efficiently requires some kind of 3-D medium, virtual or real. Ignoring the problem-space won't make it go away. So in lieu of solving the hardware problems (which Nature has somehow solved, even if we haven't), we can deal with software or algorithmic abstractions of them, say in terms of geometric reference frames, statistical inference, information flow, and so on. These principles, best summarized as "the thermodynamics of sensory compression," place their strongest constraints (such as those on bandwidth and priors) on all possible representations, not just on the sub-problems of body control discussed below. Even though body control is the more difficult computational and biological problem, the math of representation takes logical precedence.
Multi-scale representations must be warped. Only in the quantized world of computer science can a representation be exactly the same as its object. A truly continuous object from the real world can at best be well-approximated. In the simplest case representation and object would be geometrically similar, the representation forming a scaled-down copy and its degraded resolution spread equally. A more sophisticated approach would be multi-scale, with variable resolution, in effect allowing local mapping deformations while keeping the global structure. In that case the representation and the object, being continuously deformable into one another, would be topologically equivalent.
Strong priors are essential. A general-purpose representation (like a "deep learning" system) comes with little prior knowledge of its world. A specialized representation, on the other hand, has strong priors. By Bayes' theorem, the stronger the priors (provided they match the world), the less data a model needs to reach a given accuracy, so a brain pre-wired for 3-D spacetime will vastly outperform one which must first infer the world's dimensionality from scratch.
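The effect of prior strength can be illustrated with the simplest Bayesian case: estimating one unknown quantity from a few noisy samples. This is a minimal sketch, assuming a Gaussian prior and Gaussian noise of known variance; the function name, the three data values, and the "true" value of 3.0 are hypothetical choices made only for illustration.

```python
def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance for an unknown mean, given a Gaussian
    prior and Gaussian observation noise of known variance (conjugate update)."""
    n = len(data)
    precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Hypothetical: three noisy observations of a quantity whose true value is 3.0.
true_value = 3.0
data = [3.4, 2.7, 3.2]

# Weak prior: vague and centered far from the truth (a "from scratch" learner).
weak_mean, weak_var = gaussian_posterior(0.0, 100.0, data, noise_var=1.0)

# Strong prior: narrow and centered on the truth (a "pre-wired" learner).
strong_mean, strong_var = gaussian_posterior(3.0, 0.1, data, noise_var=1.0)

print("weak prior:  ", weak_mean, weak_var)
print("strong prior:", strong_mean, strong_var)
```

With the same three samples, the strong-prior estimate is both closer to the true value and more certain. The same arithmetic also shows the risk: a narrow prior centered on the wrong value would pull the estimate away from the data just as firmly.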
The most efficient priors are splines. The algorithm of coarse-graining aggregates many data points into fewer, so it implements dimensionality reduction. But in order to produce those fewer data points (which still contain noise), coarse-graining discards fine correlations. Splines, on the other hand, compress data into a curve using very few parameters, whose smoothness masks the data's noise and graininess. But this approach requires knowing the spline's functional form as a strong prior.
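The contrast between coarse-graining and a parametric fit can be sketched numerically. The example below is an illustration under stated assumptions, not the essay's own method: a one-parameter curve of the form k/r^2 stands in for a spline whose functional form is known in advance, and the noise level, bin size, and helper names are all hypothetical.

```python
import random

def true_curve(r, k=2.5):
    """The assumed underlying signal, y = k / r^2."""
    return k / (r * r)

def coarse_grain(values, bin_size):
    """Replace each run of bin_size samples by its mean (one number per bin)."""
    means = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means

def fit_k(radii, values):
    """Least-squares estimate of k, assuming the form y = k / r^2 is known."""
    xs = [1.0 / (r * r) for r in radii]
    return sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)

rng = random.Random(0)                           # fixed seed for repeatability
radii = [1.0 + 0.002 * i for i in range(1000)]   # r from 1.0 to about 3.0
noisy = [true_curve(r) + rng.gauss(0.0, 0.05) for r in radii]

bin_size = 100
bin_means = coarse_grain(noisy, bin_size)        # 10 stored numbers
k_hat = fit_k(radii, noisy)                      # 1 stored number

# Worst-case error of each compressed form against the noise-free curve.
coarse_error = max(abs(bin_means[i // bin_size] - true_curve(r))
                   for i, r in enumerate(radii))
fit_error = max(abs(k_hat / (r * r) - true_curve(r)) for r in radii)

print("coarse-grained:", len(bin_means), "numbers, max error", coarse_error)
print("parametric fit: 1 number, max error", fit_error)
```

In this setup the fit stores a tenth as many numbers as the coarse-grained version yet tracks the underlying curve far more closely, because the prior (the functional form) carries most of the information. The fitted curve also keeps no record of its residuals: once k_hat is stored, the noise it averaged over is gone.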
A 4-D spline looks like a cartoon. The physical world is made of objects, whose boundaries have measure zero. If in modeling the world one (somehow) fits spline surface contours to those boundaries, the resulting cartoon-view becomes a very efficient form of compression, carrying the locations of crucial boundaries with only a few continuous real-valued parameters, and smoothing out high-frequency noise between them. A moving spline in 3-D space would look like a soliton or a wave-front. A crucial feature of cartoon compression is that it does not represent its own errors. Even a spline fit to noisy data is still a smooth, sharp curve, and necessarily cloaks both its data's faults and its own.
Wavefronts and attractors are like eigenmodes. Certain shapes and types of spline move best through correlation-enhancing media (like laser amplifiers or photomultipliers), equivalent to the travelling eigenmodes inside a wave-propagating medium. Depending on boundary conditions, those travelling local waves might also create global standing eigenmodes. The term "eigenmode" is used loosely enough to include low-entropy attractor states, not just linear ones. The dominant eigenmodes will be those which best produce or liberate the medium's correlation-amplification mechanisms, and/or best evade suppression.
Compression is like crystallization. Both compression and crystallization are non-equilibrium processes; both consume energy and reduce entropy; both get stuck in energy minima; both undergo phase transitions; both amplify local correlations into global ones. Of the two, crystallization is better understood mathematically, while compression is better understood informationally. Concepts of crystallization already apply to alternate dimensionalities (1-D and 2-D spin-glass models), non-repeating structures (snowflakes and DNA), and time-varying continuous ones (coupled oscillators). I propose extending them in all those ways at once. While compression does not yet have the rigor of thermodynamics, it does apply to nearly arbitrary time-varying signals. If the signal originates from three dimensions, its time-varying 3-D sub-manifold could be seen as a four-dimensional structure, continuous in space and time, having the low entropy of a crystal but without lattices fixed in either space or in time (I think of these new topologies as 4-D "flow crystals"). A flow crystal would have the entropy and energetics of a crystal, but the spatiotemporal form of physical reality, flowing through space and vibrating through time. A flow crystal would use the topology of a moving 3-D spline to represent the physical world, and use statistical mechanics to "crystallize" around its ever-flowing data stream.
A flow crystal can have defects. A representation might crystallize imperfectly, i.e. it might contain mis-aligned domains or crystal defects. Defects degrade the resolution of the representation. Defects in an isolated crystal can be fixed by thermal methods like annealing. A more efficient way is coupling one crystal to a separate crystal of a similar type. In the real world it would be thermodynamically easy (but physically difficult) to re-seed one crystal with lattice information from another, but a flow crystal's structure is contained in its low-entropy eigenmodes, vibrations which can be transmitted. (The vibrations might be transmitted directly by mechanical coupling, or indirectly by light or sound, whose coupling decreases as 1/r^2.) Because coupling two imperfect low-entropy structures lowers their collective entropy yet further, flow crystals benefit from one another's company; they can "heal" each other.
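A loose analogy for this mutual "healing" is phase locking between two coupled oscillators. The sketch below uses a two-oscillator Kuramoto-style model, which is a substitute chosen here for illustration and not a model of a flow crystal; the phase gap serves as a crude stand-in for misalignment, and the coupling strength, shared frequency, and step size are hypothetical values. The 1/r^2 falloff of the coupling is not modeled.

```python
import math

def final_phase_gap(coupling, initial_gap=1.0, dt=0.01, steps=2000):
    """Integrate two identical oscillators with symmetric sine coupling
    (forward Euler) and return the remaining phase gap between them."""
    omega = 1.0                       # shared natural frequency (hypothetical)
    theta_a, theta_b = 0.0, initial_gap
    for _ in range(steps):
        # Each oscillator is pulled toward the other's phase.
        pull = coupling * math.sin(theta_b - theta_a)
        theta_a, theta_b = (theta_a + dt * (omega + pull),
                            theta_b + dt * (omega - pull))
    return abs(theta_b - theta_a)

uncoupled_gap = final_phase_gap(coupling=0.0)   # misalignment persists
coupled_gap = final_phase_gap(coupling=0.5)     # misalignment decays

print("gap without coupling:", uncoupled_gap)
print("gap with coupling:   ", coupled_gap)
```

Without coupling the initial misalignment persists indefinitely; with coupling it decays toward zero, so the pair ends up in a more ordered joint state than either oscillator started in.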
Uncorrected defects will be cloaked. As with ordinary curve-fitting, a flow crystal's response to real-time inconsistencies must be a compromise between accommodating outliers vs. ignoring them. Likewise during learning (but at a much slower timescale), the crystal as a whole might accommodate a defect (the outlier) and thus heal it. But if the defect remains too stubborn, it will be ignored. In that case, smoothing will redirect spacetime flow around the defect, cloaking it to restore an apparent continuity. Paradoxically, the more effective a medium's continuity-enforcing mechanisms, the more thoroughly they render defects invisible, like the retina's blind spot.
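The curve-fitting compromise mentioned above has a familiar counterpart in robust statistics. As a minimal sketch, assuming a Huber-style reweighting scheme (a standard technique chosen here for illustration, not a mechanism proposed in this essay), small residuals are accommodated in full while large ones are progressively down-weighted. The data values and the threshold delta are hypothetical.

```python
def plain_mean(values):
    """Ordinary least-squares location: every point is fully accommodated."""
    return sum(values) / len(values)

def huber_location(values, delta, iterations=50):
    """Iteratively reweighted estimate of a single level. Points within
    delta of the current estimate get full weight; points farther away
    get weight delta / residual, so stubborn outliers are mostly ignored."""
    estimate = sorted(values)[len(values) // 2]   # start near the median
    for _ in range(iterations):
        weights = []
        for v in values:
            residual = abs(v - estimate)
            weights.append(1.0 if residual <= delta else delta / residual)
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate

# Hypothetical samples: five consistent readings near 1.0 and one "defect".
samples = [1.0, 1.1, 0.9, 1.05, 0.95, 9.0]

accommodating = plain_mean(samples)               # dragged toward the outlier
robust = huber_location(samples, delta=0.5)       # stays near the consensus

print("plain mean:    ", accommodating)
print("Huber estimate:", robust)
```

The robust estimate stays close to the consistent readings and returns a single clean value with no trace of the outlier it discounted, which is the sense in which an ignored defect is cloaked rather than reported.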