# 2.2: Differentiable Circuits And PyTorch

10 Sep 2017

In section 2.1, we delved into the nuts and bolts of linear regression using Python – the linear model, stochastic gradient descent, computing multivariate derivatives for the mean squared error …

Being able to compute linear relationships using gradient descent is certainly important, but there’s more to the story. To create truly useful machine learning pipelines, we need to work with
many more variables and far more complex functions. Thus, in this section we go a **level of abstraction** higher.

### Differentiable Circuits

But what does that really mean? The concept of **differentiable circuits** is key to reasoning about complex machine learning pipelines. (This is a non-standard term, inspired by Karpathy’s Hacker’s Guide To Neural Networks). For example, envision a simple real-valued circuit in which numbers “flow” along edges and interact at intersections.
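To make this concrete, here is a minimal sketch (plain Python, no PyTorch) of a two-gate circuit computing $f = (x + y) \cdot z$. Values flow forward along the edges; gradients flow backward through the same edges via the chain rule. The variable names and inputs here are illustrative.

```python
# A tiny real-valued circuit: q = x + y, then f = q * z.
x, y, z = 1.0, 2.0, 3.0

# Forward pass: numbers flow along the edges.
q = x + y          # add gate
f = q * z          # multiply gate

# Backward pass: gradients flow in reverse (chain rule).
df_dq = z          # d(q*z)/dq = z
df_dz = q          # d(q*z)/dz = q
df_dx = df_dq * 1.0  # the add gate routes its gradient to both inputs
df_dy = df_dq * 1.0

print(f, df_dx, df_dy, df_dz)  # 9.0 3.0 3.0 3.0
```

This forward/backward pattern is exactly what PyTorch automates for arbitrarily large circuits.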

### Tensors, Variables, and Functions in PyTorch

PyTorch is a mathematical framework that allows you to **optimize** equations using gradient descent. Whereas in regular Python we work with numbers and numpy arrays, with PyTorch we work with multidimensional `Tensor` and `Variable` objects that store a history of operations.

For example, here’s how you create a “number” in PyTorch:

```
import torch
num = torch.FloatTensor([5.0])
num2 = torch.LongTensor([5])
```

Note that every number (tensor) is actually an array. Furthermore, tensors can have multiple dimensions, representing (for example) stored images or text in a single variable. Tensors can also be of different data types, like `FloatTensor` or `LongTensor`, depending on the kind of data they store.
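For instance, here is a minimal sketch of multi-dimensional tensors and data types; the shapes chosen below are purely illustrative:

```python
import torch

# A 3 x 32 x 32 float tensor could store a 3-channel 32x32 image.
image = torch.FloatTensor(3, 32, 32).zero_()

# A 1-D long tensor could store a sequence of integer word indices for text.
tokens = torch.LongTensor([4, 17, 2, 9])

print(image.size())   # torch.Size([3, 32, 32])
print(tokens.type())  # torch.LongTensor
```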

A `Variable` is a special type of `Tensor` that also stores a history of operations, and can be **differentiated** and modified.

```
import torch
from torch.autograd import Variable
m = Variable(torch.FloatTensor([1.0]), requires_grad=True)
input = Variable(torch.FloatTensor([1.0]), requires_grad=False)
```

If a variable should be modified, like the slope parameter $m$ in linear regression, we set the `requires_grad` flag to `True` during creation. But some variables, like input data, shouldn't be changed by the optimizer at all; for those, we set `requires_grad` to `False`.
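As a quick sketch of the difference: after calling `backward()`, gradients accumulate only on variables created with `requires_grad=True`. The values below are illustrative.

```python
import torch
from torch.autograd import Variable

m = Variable(torch.FloatTensor([2.0]), requires_grad=True)   # trainable parameter
x = Variable(torch.FloatTensor([3.0]), requires_grad=False)  # fixed input data

y = m * x
y.backward()   # computes dy/dm automatically

print(m.grad)  # contains 3.0, since dy/dm = x
print(x.grad)  # None -- no gradient is tracked for the fixed input
```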

Lastly, variables can be modified to create new variables using functions.

```
import torch
from torch.autograd import Variable
import torch.nn.functional as F
x = Variable(torch.FloatTensor([1.0]), requires_grad=True)
y = Variable(torch.FloatTensor([5.0]), requires_grad=True)
z = (x + y)/(x*y)
w = F.tanh(z)
d = F.mse_loss(w, z)
print(d) # Prints out variable containing 0.1342, MSE between z and w
```

These operations are mostly self-explanatory,
working like regular Python arithmetic operations and math functions, but
with the added constraint that variables remember their history. Thus, $d$ is
not only the value ($0.1342$) — it is an expression, involving $x, y, w, z$,
that *evaluates* to $0.1342$.
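Because $d$ carries its full history, we can differentiate it with respect to the leaf variables. Here is a minimal sketch continuing the example above:

```python
import torch
from torch.autograd import Variable
import torch.nn.functional as F

x = Variable(torch.FloatTensor([1.0]), requires_grad=True)
y = Variable(torch.FloatTensor([5.0]), requires_grad=True)
z = (x + y)/(x*y)
w = F.tanh(z)
d = F.mse_loss(w, z)

d.backward()   # walks the stored history backwards
print(x.grad)  # d(d)/dx, computed automatically from the expression
print(y.grad)  # d(d)/dy
```

This is the same derivative computation we did by hand in section 2.1, now fully automatic.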

### Optimization Using PyTorch

The reason that variable history is important is that it allows us to *update
variables* in order to optimize equations.

For example:

```
import torch
from torch.autograd import Variable
import torch.nn.functional as F
from torch.optim import SGD, Adadelta
y = Variable(torch.FloatTensor([1.0]), requires_grad=True)
x = Variable(torch.FloatTensor([5.0]), requires_grad=True)
optimizer = SGD([x, y], lr=0.1)
for i in range(0, 100):
    loss = (x - y).abs()   # Minimizes absolute difference
    loss.backward()        # Computes derivatives automatically
    optimizer.step()       # Updates x and y to decrease the loss
    optimizer.zero_grad()  # Clears gradients for the next iteration
print(x) # Evaluates to 3.0
print(y) # Evaluates to 3.0 -- optimization successful
```

You can create an optimizer to compute the derivatives and perform *stochastic gradient descent*, passing it a list of parameters to optimize and a learning rate. In the above example, we perform SGD to minimize the absolute difference between two variables *x* and *y*. Eventually, after 100 steps, the variables both become equal. The fact that PyTorch variables can change to fulfill an equation *differentiates* them (no pun intended) from regular Python `int` and `float` objects.

You can even use more advanced optimizers, such as Adadelta. If SGD is like a ball rolling down a hill, then Adadelta is like a person climbing down the same hill. It’s slower and more exploratory, but less likely to overshoot or get stuck; thus, it’s used a lot with more complex deep neural nets.

```
optimizer = Adadelta([x, y], lr=0.1)
```

We won’t get too much into the details of variables, functions, and optimizers here. But we hope that you have a general familiarity with how PyTorch can be used to speed up and solve optimization problems. Next up: using these new tools to tackle more exotic regression tasks!