The Tragedy of the Data Scientist

The equations to consciousness carry great responsibility, by Bill Softky

Human communication approaches a megabyte per second, the vast portion of it vibratory and unconscious.

Hello Fellow Data Scientist,

We’re lucky. Data scientists (in the broadest sense) are the very first human beings whose conceptual language — in particular dimensionality-reduction, compression, and statistical validation — can explain our bodies and minds, and hence “consciousness,” in neutral, actionable terms. Our brains are information processors, and data scientists know information theory. We have the Golden Ticket.

I came to data science after a job at Bell Labs, a Caltech Physics Ph.D., and a postdoc at NIH’s Math Research Branch. I worked at one tech startup after another as a coder and software architect, first calling myself a “specialist in statistical algorithms,” and later Chief Algorithm Officer, while still writing my own queries and code in real languages like Java and Python (not Matlab or Mathematica). Made a couple big splashes, and the last two startups were acquired. Those “exits” bought me time to work on cooler stuff.

In retrospect, I was probably destined for data science. I grew up in Silicon Valley, in Menlo Park. Learning to program BASIC at age nine on an old-school teletype with an acoustic modem, tracing down nanoamp current-leaks in homemade circuit boards, measuring atomic nuclei as a laser-jock at Bell Labs, finding statistical gold in neural spike-trains at Caltech…I was always gathering data and making sense of it myself.

From this physics/brain/data perspective, I see that the sword of technology cuts both ways. Information theory tells us how fragile brains are, and Big Data tells us how to grab human attention ever more effectively. The problem is, attentional economies are zero-sum: every time my app captures attention or influences behavior, some live human at the other end loses focus or autonomy. Yes, we can influence nervous systems algorithmically, but it’s not fair, and bad for them. That’s the dilemma I want us data scientists to solve.

I was the child of two brilliant nerds, and the older brother of one. Dad was a nuclear physicist, Mom too for a while, before becoming a reporter and environmentalist. My kid brother Ed filed a corporate patent in high school (inventing a way to measure hundreds of sensors with three wires). With me he was a brilliant co-conspirator, collaborator, and technical colleague. Our parents gave Ed and me lots of help and lots of freedom, so we rode bikes everywhere, bought old radios to fix at garage sales, and climbed trees.

Silicon Valley wasn’t called Silicon Valley then. Santa Clara and Fremont were mostly orchards. Start-ups and stock options and venture capital weren’t a thing yet. Life was more three-dimensional: we built things with our hands, bought from surplus stores (not screens), rode actual bikes on real asphalt, and were unreachable unless our parents knew whose house to call. Tech was rare: “long-distance” calls were expensive, no answering machines, only a couple TV channels (none on Sunday morning except ‘Agriculture USA’ out of Nebraska), no cable, no internet, no Google, nothing wireless except walkie-talkies. So it was easier to work on major projects without distraction.

What was fun about making TV jammers, “librarian tormentors,” blinking tie clips, and such was actually making them. We had to select the transistors, read resistor color-codes, make our own circuit boards, and measure voltages and currents all day long, with analog voltmeters and a fifty-pound 1949 Tektronix 512 vacuum-tube oscilloscope with a small round green screen (bandwidth limit 1 MHz). We took and analyzed our own data on the fly, and when our circuits finally worked it felt like we had beaten Mother Nature.

Back then we ruled tech, it didn’t rule us. You built a gadget to do something cool and simple, and it did. No software, no dependencies, just wires and voltages. For example, Ed’s and my pièce de résistance was hacking the public-address system at Menlo-Atherton High School. This was a perfect “MacGyver” project made from a cassette-tape player hidden under the eaves, two wires from it covertly spliced into the amplifier’s co-ax cable, and a remote on-switch dropped down a drain pipe for access from the hall below. We had one shot, and it worked: our bootleg announcement cancelling final exams aired second period on the last day of classes my sophomore year.

That technology was analog, simple, and do-it-yourself. But with digital tech today, the knowledge-to-complexity ratio of man vs. machine has inverted, and you buy it instead of building it yourself. Now tech can understand you, it can advise you, it can anticipate you, and it can trick you. For these abilities, data scientists can take credit.

Back then, data science wasn’t called “data science” either. I called myself a “specialist in statistical algorithms” when I first returned to The Valley. There weren’t billboards up and down Bayshore Freeway recruiting Machine Learning and Data Science specialists, like now. But the job still makes you one of the few people in the building who actually knows what’s going on. You run the queries, you vet the data, you find the trends and correlations, you build the predictive models, you live and die by the math, and you get the CEO’s ear in ways marketing guys never can. If data is the lifeblood of a company, then a data scientist is an oracle.

Of course, people don’t always listen to oracles, even if they should. Execs see patterns in random noise, or fail to believe statistics. One CEO told me to lie to his investors (I didn’t). Several ignored proof of crippling product problems; more than once I was the messenger who got shot. Of course customer needs take priority; of course deep architectures take too long to build. Of course the laws of information processing can’t answer everything, but the answers they do give are always right. (Ye canna change the laws of mathematics, Captain!)

This is where our collective faith in mathematics can help us solve our dilemma. The most important principle I internalized in twenty years of data science, from my first successful commercial algorithm (video car detection) through my most impactful (the PreFix debugger which saved Microsoft) and most elaborate (an auto-bidder against Google’s AdWords “auction”), is dimensionality reduction. In information-limited scenarios — which means almost always — raw data should be fit as gracefully as possible to low-dimensional continuous models.

For example, the trick called Singular Value Decomposition (SVD) transforms one high-dimensional vector into another by going through a narrow, low-dimensional chokepoint. The narrower the chokepoint, the better the signal-to-noise ratio and the better the answers. Kind of like Occam’s Razor: the best models are the most compact. Dimensionality-reduction gets much more complicated when the low-dimensional data-manifold is curved or variable, but the basic idea to wring out noise by judicious compression is essential.
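The chokepoint idea can be sketched in a few lines of NumPy. This is a hypothetical, minimal example (not any of the commercial systems mentioned above): synthetic data that truly lives on a 3-D subspace of a 50-dimensional space, buried in noise, then squeezed through a rank-3 SVD chokepoint.

```python
import numpy as np

# Minimal sketch of SVD as a low-dimensional chokepoint.
# All data here is synthetic; the dimensions are arbitrary choices.
rng = np.random.default_rng(0)

# Ground truth: 200 samples that actually live on a 3-D subspace of R^50.
latent = rng.normal(size=(200, 3))      # the three "real" underlying factors
mixing = rng.normal(size=(3, 50))       # embedding into 50 observed dimensions
clean = latent @ mixing
noisy = clean + 0.5 * rng.normal(size=clean.shape)   # add measurement noise

# Full SVD, then keep only the top k singular values -- the chokepoint.
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 3
denoised = U[:, :k] * s[:k] @ Vt[:k, :]   # best rank-3 approximation

# The compressed reconstruction sits closer to the clean signal than
# the raw measurements do: compression wrung out the noise.
err_raw = np.linalg.norm(noisy - clean)
err_k = np.linalg.norm(denoised - clean)
print(err_k < err_raw)  # True
```

The point of the toy: the 47 discarded dimensions carried almost nothing but noise, so the narrow chokepoint improves the estimate rather than degrading it.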

Ideally you know the structure you’re looking for in advance, like “assuming” a day has 24 hours or a week has seven days. Never, ever use data to re-learn things you know in other ways; that’s a waste of data. For example, MRI machines know in advance you exist in three dimensions, as a prior assumption hard-wired in place before scanning data even arrives. Such prior assumptions aren’t cheating or laziness, they’re mathematically essential to get anywhere with limited data. It’s called Bayes’ Theorem; I first heard it evangelized twenty-five years ago at Caltech, and now it’s common currency.
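Here is a minimal sketch of why a prior is essential with limited data, using the simplest conjugate case (a Gaussian prior on a Gaussian measurement). All numbers are made up for illustration: four very noisy measurements of a quantity we already know, “in other ways,” sits near 2.0.

```python
import numpy as np

# Hypothetical sketch of Bayes' Theorem with scarce, noisy data.
rng = np.random.default_rng(1)

true_value = 2.0
noise_sd = 3.0                                       # measurements are very noisy
data = true_value + noise_sd * rng.normal(size=4)    # only 4 samples

# Prior knowledge from "other ways": the value is near 2.0, give or take 0.5.
prior_mean, prior_sd = 2.0, 0.5

# Conjugate Gaussian update: precisions (inverse variances) simply add,
# and the posterior mean is the precision-weighted blend of prior and data.
n = len(data)
post_precision = 1 / prior_sd**2 + n / noise_sd**2
post_mean = (prior_mean / prior_sd**2 + data.sum() / noise_sd**2) / post_precision
post_sd = post_precision ** -0.5

raw_estimate = data.mean()   # what you'd report if you wasted the prior
print(f"raw: {raw_estimate:.2f}  posterior: {post_mean:.2f} +/- {post_sd:.2f}")
```

With only four samples the raw mean swings wildly (its standard error is 1.5), while the posterior stays anchored near the prior; as data accumulates, the data term dominates and the prior fades away, which is exactly the behavior you want.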

If you’re with me so far, you already have the keys to understand the brain and how to live. That’s the cool project I’ve been working on the last couple years, with my partner Criscillia.

Here are the results in three very compressed paragraphs. They are justified at length in an upcoming journal article, and here:

I’m sharing these insights with you because I want you to understand how our minds work, and use that knowledge to begin healing the world.

Brains are 3-D. All brains evolved to map 3-D space, assembling spatial maps from discontinuous pulses sent by distributed sensors. Doesn’t matter if it’s a proprioceptive map from mechanoreceptor pulses, or a visual map from retinal pulses, in both cases the data is high-dimensional and discontinuous, yet must represent continuous 3-D space. That’s a functional definition of real-time tomography (like MRI), which is hard enough with 3-D priors, and probably impossible without them. The conclusion that brains are 3-D simulation engines seems obvious in hindsight, but in one stroke it promotes raw, real-time sensorimotor experience as the highest, purest function of a brain, and demotes anything quantized. Memory, language, categories, decisions, and cognition are all coarse, low-bandwidth hacks ill-suited to the exquisite continuous circuitry they run on. We did not evolve to “think,” we evolved to move, sense, and feel, which is still our primary activity even if we’re oblivious. Our minds are actually continuous, unified, and far more powerful than we’ve been taught.

Natural inputs are good, artificial ones bad. Look around you: at your fingernails, the room, beyond the window. Do the numbers: using an HDMI pixel-pitch of 0.2mm as a benchmark, your visual field exhibits teravoxel resolution with predictive latency at zero and sub-millisecond phase precision, fed by only a million pulses per second, i.e. about a megabyte of input de-compressed millions-fold into a vision far surpassing anything Virtual Reality can even dream about. Whatever magnificent engine is synthesizing that real-time, hyper-resolution, hyper-consistent image from a few photons through two tiny pupil-holes, the self-tuning strategies of that engine evolved for fractal, continuous, multi-sensory, interactive inputs. Think trees and turbulence, not discontinuous images, events, or words. Evolution hyper-optimized brains not just for 3-D space in general, but for specific continuous patterns found in Nature. That is the statistical contract and informational nutrition each brain needs. As one might expect, such a high-performance system doesn’t do well when most of its data arrives in the wrong format. And digital inputs deviate from the natural contract in every possible way: they alternate smoothness with discontinuity, they fracture sensory input, they arrive on flat screens, they teleport through space and time, and they are specially sculpted to catch our eyes and hold our interest. Technology is neither natural nor random; it is so sophisticated it can captivate and even hack our brains.

Go back to sensorimotor basics. Brains work fine with the right input. You’re a vertebrate, so your most important systems lie along your midline (swallowing, breathing, reproduction). Learn to feel and flex your spine with Yoga, Pilates, dance, or Feldenkrais. Go for high-entropy grace and flexibility, not low-entropy reps and cardio. Go “forest bathing” in quiet and/or natural places with trees, vistas, wind, or water. When you can, be physically close to other people, ideally without remote distractions, and soak up the micro-expressions and nano-gestures of human resonance. Use those nonverbal channels at work as much as possible. When you must use digital channels, respect their glitches, biases, and limited sensory bandwidth. Don’t expect low-latency replies, assume misunderstandings are innocent, don’t do emotions via text. Mobile is better than text, landline better than mobile, video better than only audio, but proximity still beats all, millions-fold.

The tragedy of technological damage is best explained not in the language of capitalism, materialism, civilization, human biology, or even carbon-based life. The most efficient language is mathematics, and it describes the inevitable collision between creatures’ information-foraging habits and the natural results of their material productivity. We make things we find attractive, and now that’s all we see. This could happen anywhere in the universe.

The Triumph of the Data Scientist is that we can understand our minds as ultra-high-performance representational engines. The Tragedy of the Data Scientist is twofold. First, doing computer work damages our own sensory needs: we must sit down indoors for hours, stare at screens, communicate with text, and think through code and numbers. Second, our profession damages the outside world when it manipulates people by aggregating revenue from micro-thefts of human attention and autonomy, and the results addict children and tip elections. We data scientists are both blessed and cursed: we will be the first to know what humankind is made of and the first to offer help, but also the first to understand our crimes. Let us apply our brains to solve them.