
How AI Works: From Sorcery to Science

the men who can manage men manage the men

“As long as there is poverty there will be gods.”

history we should have to give the palm to monarchy; democracies, by contrast, have been hectic interludes.

Egypt was “the gift of the Nile,” and Mesopotamia built successive civilizations “between the rivers” and along their effluent canals.

A knowledge of history may teach us that civilization is a co-operative product, that nearly all peoples have contributed to it; it is our common heritage and debt; and the civilized soul will reveal

It is a precarious enterprise, and only a fool would try to compress a hundred centuries into a hundred pages of hazardous conclusions. We proceed.

Ideally parentage should be a privilege of health, not a by-product of sexual agitation.

Men, women, and children left home and family, authority and unity, to work as individuals, individually paid, in factories built to house not men but machines. Every decade the machines multiplied and became more complex; economic maturity (the capacity to support a family) came later; children no longer were

indispensable, in every land and age. To the unhappy, the suffering, the bereaved, the old, it has brought supernatural comforts valued by millions of souls as more precious than any natural aid.

remarkable how many American Negroes have risen to high places in the professions, arts, and letters in the last one hundred years despite a thousand social obstacles.

who can manage only things, and the men who can manage money manage all.”

violent revolutions do not so much redistribute wealth as destroy it. There may be a redivision of the land, but the natural inequality of men soon re-creates an inequality of possessions and privileges, and raises to power a new minority with essentially the same instincts as in the old.

itself in treating every man or woman, however lowly, as a representative of one of these creative and contributory groups.

The rate of concentration varies (other factors being equal) with the economic freedom permitted by morals and the laws.

economic assets; marriage was delayed; premarital continence became more difficult to maintain.

Even the skeptical historian develops a humble respect for religion, since he sees it functioning, and seemingly

to make the order and operation of a society, so the imitative majority follows the innovating minority,

he may ruin his life before he matures sufficiently to understand that sex is a river of fire that must be banked and cooled by a hundred restraints if it is not to consume in chaos both the individual and the group.

Aristocracy withdraws a few men from the exhausting and coarsening strife of economic competition, and trains them from birth, through example, surroundings, and minor office, for the tasks of government; these tasks require a special preparation that no ordinary family or background can provide.

the majority of such abilities, in nearly all societies, is gathered in a minority of men.

and is periodically alleviated by violent or peaceable partial redistribution.

He has put the liberal gospel of liberty to his use by arguing that businessmen left relatively free from transportation tolls and legislative regulation can give the public a greater abundance of food, homes, comfort, and leisure than has ever come from industries managed by politicians,

It is good that new ideas should be heard, for the sake of the few that can be used; but it is also good that new ideas should be compelled to go through the mill of objection, opposition, and contumely; this is the trial heat which innovations must survive

Since men love freedom, and the freedom of individuals in society requires some regulation of conduct, the first condition of freedom is its limitation; make it absolute and it dies in chaos. So the prime task of government is to establish order;

before being allowed to enter the human race.

enlightenment of the mind and the improvement of character, the only real emancipation is individual, and the only real revolutionists are philosophers and saints.

The only real revolution is in the

In this analysis human beings are normally equipped by “nature” (here meaning heredity) with six positive and six negative instincts, whose function it is to preserve the individual, the family, the group, or the species.

If we were to judge forms of government from their prevalence and duration in

The excessive increase of anything causes a reaction in the opposite direction;…

We conclude that the concentration of wealth is natural and inevitable,

man who is below the average in economic ability desires equality; those who are conscious of superior ability desire freedom; and in the end superior ability has its way.

The concentration of wealth is a natural

Pugnacity, brutality, greed, and sexual readiness were advantages in the struggle for existence.

Nevertheless, known history shows little alteration in the conduct of mankind.

business chicanery,

, being uncertain when he might eat again; insecurity is the mother of greed, as cruelty is the memory—if only in the blood—of a time when the test of survival (as now between states) was the ability to kill.

History is color-blind, and can develop a civilization (in any favorable environment) under almost any skin.

result of this concentration of ability, and regularly recurs in history.

Generations of men establish a growing mastery over the earth, but they are destined to become fossils in its soil.

In organic periods men are busy building; in critical periods they are busy destroying.

On one point all are agreed: civilizations begin, flourish, decline, and disappear—or linger on as stagnant pools left by once life-giving streams.

If we put the problem further back, and ask what determines whether a challenge will or will not be met, the answer is that this depends upon the presence or absence of initiative and of creative individuals with clarity of mind and

energy of will (which is almost a definition of genius), capable of effective responses to new situations (which is almost a definition of intelligence).

Since inequality grows in an expanding economy, a society may find itself divided between a cultured minority and a majority of men and women too unfortunate by nature or circumstance to inherit or develop standards of excellence and taste.

Since we have admitted no substantial change in man’s nature during historic times, all technological advances will have to be written off as merely new means of achieving old ends

Middle Ages and the Renaissance, which stressed mythology and art rather than science and power, may have been wiser than we, who repeatedly enlarge our instrumentalities without improving our purposes.

and no matter how many difficulties we surmount, how many ideals we realize, we shall always find an excuse for being magnificently miserable;

hyperparameters, introduced in Chapter 3). And so, automatic machine learning, or AutoML, was born. Most cloud-based commercial machine learning platforms, like Microsoft’s Azure Machine Learning or Amazon’s SageMaker Autopilot, include an AutoML tool that will create the machine learning model for you; you need only supply the dataset.

In one dimension, the inputs might be values that change over time, also known as a time series. In two dimensions, we’re talking about images. Three-dimensional CNNs exist to interpret volumes of data, like a stack of magnetic resonance images or a volume constructed from a LiDAR point cloud. In

In-context learning is different from fine-tuning a model. In fine-tuning, a previously trained model is tailored to a task by updating the weights using new training data. In-context learning adds new information to the LLM as part of the prompt while holding the model’s weights fixed.
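A minimal sketch of the in-context side of that contrast. The helper `build_prompt` and the example facts are hypothetical, and no real LLM API is called; the point is only that the new information travels in the prompt text while nothing about the model changes.

```python
def build_prompt(context_facts, question):
    """Prepend new information to the question; no model weights are involved."""
    lines = ["Use the following facts to answer."]
    lines += [f"- {fact}" for fact in context_facts]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical facts that a model could not have seen during training.
facts = ["The team meeting moved to Thursday.", "The meeting room is B12."]
prompt = build_prompt(facts, "Where is the team meeting?")
print(prompt)
```

Fine-tuning, by contrast, would run a training loop over new data and change the weights themselves, which this sketch deliberately does not attempt.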

The energy at different frequencies can be displayed in two

Switching to ReLUs was, therefore, a double win: improved network performance and speed.

If the data is bad, the model is bad. Throughout the book, we’ll return to this notion of “good” and “bad” data.

Implementing a traditional, fully

CNNs decompose inputs into small parts, then groups of parts and still larger groups of groups of parts, until the entire input is transformed from a single whole into a new representation: one that is more easily understood by what amounts to a traditional neural network sitting at the top of the model.

More muddy water, but I’m going to argue that there’s no strict definition of deep

Models learn from data; the more, the better because more data means an improved representation of what the model will encounter when used.

While it was long known that the right weight and bias values would adapt a network to the desired task, what was missing for decades was an efficient way to find those values.

Vincent van Gogh is my favorite artist. Something about his style speaks to me,

connected neural network is an exercise for machine learning students. It’s not trivial, but it’s something most people can accomplish with effort.

There is no programming because, most of the time, we have no idea what the algorithm should be.

attempted to come up with a logical response based on the best information available to it.

future systems to be combinations of models, including models that validate output before returning it to the user.

The reason why AI researchers are so excited by LLMs is that somewhere along the way, while learning to be expert text generators, LLMs also learn a host of emergent abilities, including question answering, mathematical reasoning, high-quality computer programming, and logical reasoning.

If nothing else, this exercise demonstrates that care is necessary when querying LLMs.

Training a model is fundamentally different from programming. In programming, we implement the algorithm we want by instructing the computer step by step. In training, we use data to teach the model to adjust its parameters to produce correct output.

, we must ensure that our models have seen all the kinds of inputs they will encounter in the wild—they should interpolate, not extrapolate.

It is still a statistical prediction engine.

The chapter’s examples warn us to be careful when assuming AI operates as intended. Did the model learn what we wanted it to learn? Was it influenced by correlations in the data that we didn’t notice or, worse still, that we are too limited to discern? Think back to the huskies versus wolves example.

GPT-4 is a good coder, but not a great coder. It can save time but isn’t yet able to replace a human software engineer.

models learned to detect things in the training data that correlated with the intended targets (dogs, wolves, tanks) but didn’t learn about the targets themselves.

The model knew the questions were nonsense, but unless it was explicitly told to acknowledge that fact, it instead

AI lives and dies by data and is only as good as the data we feed to it. If the dataset is biased, the AI is biased.

In AI, the input is a feature vector, a collection of whatever is appropriate for the task at hand. In this chapter, we used two feature vectors: measurements of a flower and images of a handwritten digit.

What’s most amazing to me is that modern AI is, at its core, entirely arrangements of humble neurons trained with data using backpropagation and gradient descent.

The critical point for us to remember is that convolving an image with different kernels highlights different aspects of the image.
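A small sketch of that idea on a toy image. The image and both kernels are made up for illustration. The function slides the kernel over the image without flipping it first; the formal definition of convolution includes a flip, and my understanding is that most CNN libraries also skip it, so treat that detail as an assumption.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 5x5 image: dark on the left (0), bright on the right (9).
image = [[0, 0, 0, 9, 9] for _ in range(5)]

# Two hypothetical 3x3 kernels.
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal_edge = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

edges_v = convolve2d(image, vertical_edge)    # responds to the left/right jump
edges_h = convolve2d(image, horizontal_edge)  # nothing changes top to bottom
print(edges_v)
print(edges_h)
```

The same image gives a strong response to one kernel and none to the other, which is the sense in which different kernels highlight different aspects.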

It’s often good to dive into a subject to get an overview before coming back to understand things at a deeper level. In other words, we rush in to get a feel for the topic before exploring more methodically.

That’s all a neuron does: it multiplies its inputs by the weights, sums the products, adds the bias value, and passes that total to the activation function to produce the output. Virtually all the fantastic accomplishments of modern AI are due to this primitive construct. String enough of these together in the correct configuration, and you have a model that can learn to identify dog breeds, drive a car, or translate from French to English.
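The four steps in that description map directly onto a few lines of code. This is a sketch with made-up inputs, weights, and bias; the choice of ReLU as the activation function is an assumption for the example, not something this passage specifies.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation=relu):
    """Multiply inputs by weights, sum, add the bias, apply the activation."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return activation(total + bias)


# Hypothetical values: 1.0*0.5 + 2.0*(-0.25) + 3.0*0.25 = 0.75, plus bias 0.5.
output = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.25], bias=0.5)
print(output)  # 1.25
```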

we construct neural networks from basic units that

A good example is an audio signal, which we usually think of as one-dimensional, a voltage changing over time that drives the speaker. However, audio signals contain energy at different frequencies.

That’s why we evaluate the model with the held-out test set. To the model, the test set contains new, unseen data it didn’t use to

Convolution is a mathematical operation with a formal definition involving integral

as the rooster crowing doesn’t cause the sun to rise, but if such a correlation is observed often enough, the human mind begins to see one as causing the other, even when there is no real evidence of this. Why humans act this way isn’t hard to understand. Evolution favored early humans who made such associations because, sometimes, the associations led to behavior beneficial for survival.

For a long time, it was assumed that the initial weights and biases didn’t matter much; just select small numbers at random over some range.

Such abilities emerged from the model when trained; they were not intended. This is why I believe future historians will mark fall 2022 as the dawn of true AI. Hold on to your hats; it gets better.

Under the hood, generative models are neural networks built from the same essential components.

This is not the standard way to train a neural network, but it works for something as modest as a single neuron. We’ll discuss standard network training later in the chapter.

Some have even fooled experts into believing the model had learned something fundamental about language or the like when, instead, it had learned extremely subtle correlations in the training data that no human could (easily) detect.

British statistician George Box, who said that all models are wrong, but some are useful.

Because models learn from data, we must use datasets that are as complete as possible so our models interpolate and do not extrapolate.

The phrase “do whatever we know how to order it to perform” implies programming. Indeed, Lovelace wrote a program for the Analytical Engine. Because of this, many people consider her to be the first computer programmer.

modify its parameters. The model’s performance on the test set is a clue to its generalization abilities.
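One common way to obtain a held-out test set is to shuffle the data once and set a fraction aside before training. The sketch below assumes a 20 percent test fraction and a fixed seed; both numbers and the function name are illustrative choices.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled samples are never shown to the model in training.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))  # stand-in for ten labeled samples
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))  # 8 2
```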

Out of training data? No worries, data augmentation will invent some by slightly modifying the data you already have. Data augmentation takes the existing training data and mutates it to produce new data that might plausibly have been created by the same process that made the actual training data.
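A rough sketch of augmentation for a tiny image stored as a list of rows. The two mutations (mirror and one-pixel shift) and the 50 percent probabilities are assumptions; which mutations are plausible depends on the data. Mirroring, for instance, suits photos of animals but not handwritten digits.

```python
import random


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def shift_right(image, fill=0):
    """Move every pixel one column right, filling the gap on the left."""
    return [[fill] + row[:-1] for row in image]


def augment(image, rng):
    """Return a randomly mutated copy; the class label stays the same."""
    new_image = [row[:] for row in image]
    if rng.random() < 0.5:
        new_image = flip_horizontal(new_image)
    if rng.random() < 0.5:
        new_image = shift_right(new_image)
    return new_image


original = [[1, 2, 3], [4, 5, 6]]
rng = random.Random(0)
extra_samples = [augment(original, rng) for _ in range(4)]
```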

This suggests that we can expect great things as more advanced transformer architectures come along; architectures designed to increase the power of LLMs’ emergent skills.

To recap, support vector machines, decision trees, and random forests use data to generate functions according to a carefully crafted algorithm designed by a human. That is neither symbolic AI nor connectionism to me, but curve fitting or, perhaps more accurately, optimization.

As I’ve already mentioned several times, the training set is key to conditioning the model.

These examples imply that there is an art to properly interacting with large language models.

learning other than that it involves neural networks with many layers.

The answer has to do with preprocessing. Raw data, like the percent alcohol, is seldom used with machine learning models as is. Instead, each feature is adjusted by subtracting the average value of the feature over the training set and dividing that result by a measure of how scattered the data is around the average value (the standard deviation). The original alcohol content was 12.29 percent, a reasonable value for wine, but after scaling, it became –0.7359.
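The scaling step described above can be sketched as follows. The alcohol values here are invented, so the result will not reproduce the –0.7359 quoted in the text, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption.

```python
import statistics


def fit_scaler(training_values):
    """Compute the mean and standard deviation over the training set only."""
    return statistics.mean(training_values), statistics.pstdev(training_values)


def scale(value, mean, std):
    """Subtract the training mean, then divide by the training spread."""
    return (value - mean) / std


# Hypothetical percent-alcohol values for five training wines.
alcohol = [12.29, 13.20, 14.10, 12.85, 13.05]
mean, std = fit_scaler(alcohol)
scaled = [scale(v, mean, std) for v in alcohol]
print(scaled[0])  # negative, because 12.29 is below the training average
```

New inputs would be scaled with the same `mean` and `std` learned from the training set, so the model sees them on the scale it was trained on.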

Generative AI is an umbrella term for models that create novel output, either independently (randomly) or based on a prompt supplied by the user. Generative models do not produce labels but text, images, or even video.

Are the classical models symbolic AI or connectionism? Are they AI at all? Do they learn, or are they merely mathematical tricks?

Training a network is more like physiology: how does one part work with another? The anatomy (architecture) was there, but the physiology (training process) was incompletely understood. That changed over the decades, courtesy of key algorithmic innovations: backpropagation,

Choi put it best in her TED talk: common sense is the dark matter of language. Dark matter and dark energy make up 95 percent of the universe, with ordinary matter (meaning everything we can see) the remaining 5 percent.

Think of the network’s structure, known as its architecture, as anatomy. In anatomy, we’re interested in what constitutes the body: this is the heart, that’s the liver, and so on.

For example, a rooster crows, and the sun comes up. The two events are time-dependent: the rooster first, then the sun. This correlation does not imply causation,

perform a simple task: collect input values, multiply each by a weight value, sum, add a bias value, and pass the result to an activation function to create an output value. In other words, many input numbers become one output number. The collective behavior emerging from thousands to millions of such units leading to billions of weight values lets deep learning systems do what they do.

We’re beginning to appreciate how important it is to fully train, characterize, test, and understand our machine learning models.

All I needed to do was put application-specific code in the empty event handlers to do things when the user clicked a button or selected a menu option. I probably saved myself a good hour or two, and avoided a lot of frustration trying to remember the incantations necessary to set up an application and get its widgets and windows to behave correctly.

Instead, I consider these models to be a fancy form of curve fitting—the output of an algorithm employing an optimization process to produce a function that best characterizes the training data, and, hopefully, the data encountered by the model in the wild.

dimensions: the horizontal dimension is time, and the vertical dimension is frequency, usually with lower frequencies at the bottom and higher frequencies at the top. The intensity of each frequency becomes the intensity of a pixel to transform the audio signal from a one-dimensional, time-varying voltage into a two-dimensional spectrogram, as shown
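A bare-bones sketch of how such a time-by-frequency picture can be computed. It assumes non-overlapping frames and a direct discrete Fourier transform; practical tools typically use overlapping windows and a fast Fourier transform instead. The two-tone test signal is made up.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Energy-like magnitude at each frequency bin from 0 up to half the frame."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes


def spectrogram(signal, frame_size):
    """One list of magnitudes per time frame: spec[time_index][frequency_bin]."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, frame_size)]


# Hypothetical signal: a low tone (2 cycles per frame), then a high tone (6).
frame_size = 16
low = [math.sin(2 * math.pi * 2 * t / frame_size) for t in range(frame_size)]
high = [math.sin(2 * math.pi * 6 * t / frame_size) for t in range(frame_size)]
spec = spectrogram(low + high, frame_size)

# The brightest "pixel" in each time column sits at the tone's frequency bin.
peak_bins = [column.index(max(column)) for column in spec]
print(peak_bins)  # [2, 6]
```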

Therefore, the rightmost node uses a different activation function known as a sigmoid (also called a logistic). The sigmoid produces an output between 0 and 1.
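For reference, a sketch of the sigmoid itself. The two-branch form is an implementation choice to avoid overflow for large negative inputs; it computes the same function, 1 / (1 + e^(-x)).

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so z is between 0 and 1
    return z / (1.0 + z)


for value in (-5.0, 0.0, 5.0):
    print(value, sigmoid(value))
```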

We have choices for the activation function, but in modern networks it’s most often the rectified linear unit (ReLU) mentioned in Chapter 2.

When pushed, GPT-4 suddenly “realizes” that there is a more straightforward answer.

AI is big bucks. AI runs on data, and these companies gobble up all the data we freely give them in exchange for their services.

The model has parameters, which control the model’s output. Conditioning a model, known as training, seeks to set the model’s parameters in such a way that they produce the correct output for a given input.

I trained the neuron by searching for a set of three weights and a bias value producing an output that, when rounded to the nearest whole number, matched the class label for an iris flower—either 0, 1, or 2.
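A sketch of what such a search might look like, assuming a plain random search over a fixed range. The samples below are invented stand-ins with three features each, not real iris measurements, and the trial count and weight range are arbitrary.

```python
import random


def neuron_output(features, weights, bias):
    """Weighted sum plus bias; no activation, so the output can be rounded."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def accuracy(samples, weights, bias):
    """Fraction of samples whose rounded output equals the class label."""
    correct = sum(1 for features, label in samples
                  if round(neuron_output(features, weights, bias)) == label)
    return correct / len(samples)


def random_search(samples, trials=2000, seed=0):
    """Try random weights and a bias; keep whichever set scores best."""
    rng = random.Random(seed)
    best_weights, best_bias = [0.0, 0.0, 0.0], 0.0
    best_acc = accuracy(samples, best_weights, best_bias)
    for _ in range(trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        bias = rng.uniform(-1.0, 1.0)
        acc = accuracy(samples, weights, bias)
        if acc > best_acc:
            best_weights, best_bias, best_acc = weights, bias, acc
    return best_weights, best_bias, best_acc


# Hypothetical (features, label) pairs for classes 0, 1, and 2.
samples = [([1.4, 0.2, 5.0], 0), ([1.5, 0.3, 5.1], 0),
           ([4.5, 1.4, 6.0], 1), ([4.7, 1.5, 6.3], 1),
           ([5.8, 2.2, 6.9], 2), ([6.0, 2.4, 7.1], 2)]
weights, bias, acc = random_search(samples)
print(weights, bias, acc)
```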

In 1936, a 24-year-old Englishman named Alan Turing, still a student at the time, wrote a paper that has since become the cornerstone of computer science. In this paper, Turing introduced a generic conceptual machine,

The primary goal of machine learning is to condition a model using known data so that the model produces meaningful output when given unknown data. That

Fitting data to a function, like a line, seeks to create the best possible fit, or the line that best explains the measured data. In machine learning, we instead want a model that learns the general characteristics of the training data to generalize to new data.
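The first half of that contrast, fitting a line, can be sketched with ordinary least squares. The data points are made up and lie exactly on y = 2x + 1, so the fit recovers that line; real measurements would scatter around it.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line that best fits the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical measurements on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The fit is judged only on the points it was given. A machine learning model is judged on data it has not seen, which is why a held-out test set matters there.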

Large language models offer a new mode of attack to both summarize notes from records and merge multiple notes into a coherent report. Additionally, LLMs can extract information from free-form text and export it as structured data.

In the future, when you call your doctor, you might very well be directed to discuss your case with an AI. And eventually, the AI’s summary of the discussion might be all you need to get a prescription from the doctor.

However, letting LLMs direct other AI models and tools to do science autonomously is a more ambitious research

It is important to distinguish between two concepts: the appearance of consciousness and actual consciousness. When an AI model generates responses that are indistinguishable from human behavior, it may give the appearance of consciousness. However, this doesn’t necessarily imply that the AI possesses actual consciousness.

In conclusion, while an AI language model like me might be able to simulate conscious behavior to a high degree of fidelity, it doesn’t necessarily imply that I possess actual consciousness.

Imagine a world where AI models are aligned with human values and society, where the models understand the best we have to offer and work to promote that at all times; in other words, a world where AI, because it lacks our animal drives and instincts, consistently represents the “better angels of our nature,” to borrow Lincoln’s phrase.

, I fully expect future AI systems to be gloriously Byzantine evolutions of the basic neural network model

controlled it, and put it to work.

though the emergent abilities of LLMs may appear to lean somewhat in that direction for now. Fire was once magical too, but our ancestors understood it, contained it,

I think that there is a lot of fear about robots and artificial intelligence among some people, whereas I’m more afraid of natural stupidity.