
These terms are often used interchangeably, but what are the differences that make them each a unique technology?

Technology is becoming more embedded in our daily lives by the minute, and in order to keep up with the pace of consumer expectations, companies are relying more heavily on learning algorithms to make things easier. You can see their application in social media (through object recognition in photos) or in talking directly to devices (like Alexa or Siri).

These technologies are commonly associated with artificial intelligence, machine learning, deep learning, and neural networks, and while they do all play a role, these terms tend to be used interchangeably in conversation, leading to some confusion around the nuances between them. Hopefully, we can use this blog post to clarify some of the ambiguity here.

How do artificial intelligence, machine learning, neural networks, and deep learning relate?

Perhaps the easiest way to think about artificial intelligence, machine learning, deep learning, and neural networks is to think of them like Russian nesting dolls. Each is essentially a component of the prior term.


That is, machine learning is a subfield of artificial intelligence. Deep learning is a subfield of machine learning, and neural networks make up the backbone of deep learning algorithms. In fact, it is the number of node layers, or depth, of neural networks that distinguishes a single neural network from a deep learning algorithm, which must have more than three.

What is a neural network?

Neural networks—and more specifically, artificial neural networks (ANNs)—mimic the human brain through a set of algorithms. At a basic level, a neural network consists of four main components: inputs, weights, a bias or threshold, and an output. Similar to linear regression, the algebraic formula would look something like this:

  ∑ (Wi * Xi) + bias = (W1 * X1) + (W2 * X2) + (W3 * X3) + bias

From there, let’s apply it to a more tangible example, like whether or not you should order a pizza for dinner. This will be our predicted outcome, or y-hat. Let’s assume that there are three main factors that will influence your decision:

  1. If you will save time by ordering out (Yes: 1; No: 0)
  2. If you will lose weight by ordering a pizza (Yes: 1; No: 0)
  3. If you will save money (Yes: 1; No: 0)

Then, let’s assume the following, giving us the following inputs:

  • X1 = 1, since you’re not making dinner
  • X2 = 0, since we’re getting ALL the toppings
  • X3 = 1, since we’re only getting 2 slices

For simplicity purposes, our inputs will have a binary value of 0 or 1. This technically defines our unit as a perceptron, whereas neural networks primarily leverage sigmoid neurons, which accept any real-valued input and squash it into an output between 0 and 1. This distinction is important since most real-world problems are nonlinear, so we need values that limit how much influence any single input can have on the outcome. However, summarizing in this way will help you understand the underlying math at play here.
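To make that distinction concrete, here is a minimal Python sketch (the function names are ours for illustration, not from any particular library) comparing a perceptron's step activation with a sigmoid activation:

```python
import math

def step(z):
    # Perceptron activation: fires (1) only when the weighted sum,
    # with the threshold folded in as a bias, is at least zero.
    return 1 if z >= 0 else 0

def sigmoid(z):
    # Sigmoid activation: squashes any real-valued input into a smooth
    # value between 0 and 1, limiting how much any single input can
    # swing the output on its own.
    return 1 / (1 + math.exp(-z))

for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.3f}")
```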

Moving on, we now need to assign some weights to determine importance. Larger weights make a single input’s contribution to the output more significant compared to other inputs.

  • W1 = 5, since you value time
  • W2 = 3, since you value staying in shape
  • W3 = 2, since you've got money in the bank

Finally, we’ll also assume a threshold value of 5, which would translate to a bias value of –5.

Since we established all the relevant values for our summation, we can now plug them into this formula.

  Y-hat = (1 * 5) + (0 * 3) + (1 * 2) - 5

Using the following activation function, we can now calculate the output (i.e., our decision to order pizza):

  output = 1 if Y-hat ≥ 0; output = 0 if Y-hat < 0

In summary:

  Y-hat (our predicted outcome) = the decision to order pizza or not

  Y-hat = (1 * 5) + (0 * 3) + (1 * 2) - 5

  Y-hat = 5 + 0 + 2 - 5

  Y-hat = 2, which is greater than zero.

Since Y-hat is 2, the output from the activation function will be 1, meaning that we will order pizza (I mean, who doesn't love pizza?).
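Putting the walkthrough together, here is a minimal Python sketch of this single perceptron; the values mirror the example above, and the function name is purely illustrative:

```python
def perceptron(inputs, weights, bias):
    # Weighted sum plus bias, passed through a step activation.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum >= 0 else 0  # 1 = order pizza, 0 = don't

inputs = [1, 0, 1]    # save time (yes), lose weight (no), save money (yes)
weights = [5, 3, 2]   # how much each factor matters to us
bias = -5             # a threshold of 5 becomes a bias of -5

print(perceptron(inputs, weights, bias))  # -> 1, so we order the pizza
```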

If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network. Now, imagine the above process being repeated multiple times for a single decision as neural networks tend to have multiple “hidden” layers as part of deep learning algorithms. Each hidden layer has its own activation function, potentially passing information from the previous layer into the next one. Once all the outputs from the hidden layers are generated, then they are used as inputs to calculate the final output of the neural network. Again, the above example is just the most basic example of a neural network; most real-world examples are nonlinear and far more complex.
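To sketch what a forward pass through hidden layers could look like, here is a small NumPy example; the layer sizes and randomly initialized weights are arbitrary placeholders rather than values from any trained model:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Arbitrary architecture: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([1.0, 0.0, 1.0])   # the same three pizza inputs as before

h1 = sigmoid(W1 @ x + b1)       # activations of the first hidden layer
h2 = sigmoid(W2 @ h1 + b2)      # activations of the second hidden layer
y_hat = sigmoid(W3 @ h2 + b3)   # final output of the network

print(y_hat)
```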

The main difference between regression and a neural network is the impact of change on a single weight. In regression, you can change a weight without affecting the other inputs in a function. However, this isn’t the case with neural networks. Since the output of one layer is passed into the next layer of the network, a single change can have a cascading effect on the other neurons in the network.

See this IBM Developer article for a deeper explanation of the quantitative concepts involved in neural networks.

How is deep learning different from neural networks?

While this was implied in the explanation of neural networks, it's worth stating explicitly: the “deep” in deep learning refers to the depth of layers in a neural network. A neural network that consists of more than three layers—which would be inclusive of the inputs and the output—can be considered a deep learning algorithm. This is generally represented using the following diagram:

[Diagram: a deep neural network with an input layer, multiple hidden layers, and an output layer]

Most deep neural networks are feed-forward, meaning they flow in one direction only, from input to output. However, you can also train your model through backpropagation; that is, by moving in the opposite direction, from output to input. Backpropagation allows us to calculate and attribute the error associated with each neuron, so we can adjust and fit the algorithm appropriately.
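As an illustration only, here is a minimal Python sketch of backpropagation for a single sigmoid neuron trained by gradient descent; the toy data, squared-error loss, and learning rate are assumptions made for the example, not recommendations:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data: one input vector and the target output we want the neuron to learn
x = np.array([1.0, 0.0, 1.0])
target = 1.0

w = np.zeros(3)   # weights
b = 0.0           # bias
lr = 0.5          # assumed learning rate

for _ in range(100):
    # Forward pass: from input to output
    y_hat = sigmoid(w @ x + b)
    # Backward pass: attribute the squared error to each weight and the bias
    error = y_hat - target
    grad_z = error * y_hat * (1 - y_hat)   # chain rule through the sigmoid
    w -= lr * grad_z * x                   # adjust each weight...
    b -= lr * grad_z                       # ...and the bias to reduce the error

print(round(float(sigmoid(w @ x + b)), 3))  # now close to the target of 1.0
```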

How is deep learning different from machine learning?

As we explain in our Learn Hub article on Deep Learning, deep learning is merely a subset of machine learning. The primary ways in which they differ are how each algorithm learns and how much data each type of algorithm uses. Deep learning automates much of the feature extraction piece of the process, eliminating some of the manual human intervention required. It also enables the use of large data sets, earning itself the title of "scalable machine learning" in this MIT lecture. This capability will be particularly interesting as we begin to explore the use of unstructured data more, especially since an estimated 80-90% of an organization's data is unstructured.

Classical, or "non-deep", machine learning is more dependent on human intervention to learn. Human experts determine the hierarchy of features to understand the differences between data inputs, usually requiring more structured data to learn. For example, let's say I were to show you a series of images of different types of fast food: "pizza," "burger," or "taco." A human expert would determine the characteristics that distinguish each picture as a specific fast food type; the bread of each food, for instance, might be a distinguishing feature across the pictures. Alternatively, you might just use labels, such as "pizza," "burger," or "taco," to streamline the learning process through supervised learning.

"Deep" machine learning can leverage labeled datasets, also known as supervised learning, to inform its algorithm, but it doesn’t necessarily require a labeled dataset. It can ingest unstructured data in its raw form (e.g. text, images), and it can automatically determine the set of features which distinguish "pizza", "burger", and "taco" from one another.

For a deep dive into the differences between these approaches, check out "Supervised vs. Unsupervised Learning: What's the Difference?"

By observing patterns in the data, a deep learning model can cluster inputs appropriately. Taking the same example from earlier, we could group pictures of pizzas, burgers, and tacos into their respective categories based on the similarities or differences identified in the images. With that said, a deep learning model would require more data points to improve its accuracy, whereas a machine learning model relies on less data given the underlying data structure. Deep learning is primarily leveraged for more complex use cases, like virtual assistants or fraud detection.
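As a rough illustration of that clustering idea, here is a hedged scikit-learn sketch using KMeans on stand-in embedding vectors; in practice the vectors would come from a deep model applied to the raw images, and the synthetic data below is invented purely for demonstration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for image embeddings: three tight groups of 8-dimensional vectors,
# as if a deep model had mapped pizzas, burgers, and tacos to similar points.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 8)),
    rng.normal(loc=1.0, scale=0.1, size=(10, 8)),
    rng.normal(loc=2.0, scale=0.1, size=(10, 8)),
])

# Group the images into three clusters based on similarity alone (no labels)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_)
```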


What is artificial intelligence (AI)?

Finally, artificial intelligence (AI) is the broadest term used to classify machines that mimic human intelligence. It is used to predict, automate, and optimize tasks that humans have historically done, such as speech and facial recognition, decision making, and translation.

There are three main categories of AI:

  • Artificial Narrow Intelligence (ANI)
  • Artificial General Intelligence (AGI)
  • Artificial Super Intelligence (ASI)

ANI is considered “weak” AI, whereas the other two types are classified as “strong” AI. Weak AI is defined by its ability to complete a very specific task, like winning a chess game or identifying a specific individual in a series of photos. As we move into stronger forms of AI, like AGI and ASI, the incorporation of more human behaviors becomes more prominent, such as the ability to interpret tone and emotion. Chatbots and virtual assistants, like Siri, are scratching the surface of this, but they are still examples of ANI.

Strong AI is defined by how its abilities compare to those of humans. Artificial General Intelligence (AGI) would perform on par with a human, while Artificial Super Intelligence (ASI)—also known as superintelligence—would surpass a human's intelligence and ability. Neither form of strong AI exists yet, but research in this field is ongoing. Since this area of AI is still rapidly evolving, the best example that I can offer of what this might look like is the character Dolores on the HBO show Westworld.

Manage your data for AI

While all these areas of AI can help streamline areas of your business and improve your customer experience, achieving your AI goals can be challenging: you'll first need the right systems in place to manage the data that your learning algorithms are built on. Data management is arguably harder than building the actual models that you'll use for your business. You'll need a place to store your data, along with mechanisms for cleaning it and controlling for bias, before you can start building anything. Take a look at some of IBM's product offerings to help you and your business get on the right track to prepare and manage your data at scale.
