Modules

Terrace’s Module class is a simpler, faster way of defining PyTorch modules. Instead of replicating the structure of your neural network in both the __init__ and forward methods, you can define everything in the forward method. All the modules are created the first time you call the method, and are re-used afterward.

Here’s an example of a basic neural network with one hidden layer:

import torch
import torch.nn as nn
import torch.nn.functional as F
import terrace as ter

class BasicNN(ter.Module):

    def forward(self, x):
        # we always need to call start_forward at the beginning
        # of the method, so the module knows to reset all the
        # internal counters
        self.start_forward()

        # Pytorch >= 1.13 also has a LazyLinear class
        # Note that I'm only defining the output dimension;
        # the input dimension is automatically determined
        hid = F.relu(self.make(ter.LazyLinear, 100)(x))
        out = self.make(ter.LazyLinear, 1)(hid)
        return out

x = torch.randn(16, 128)
model = BasicNN()
out = model(x)

print(out.shape)
torch.Size([16, 1])

The key idea is to use the make method to both instantiate and run submodules. During the first forward call, calling make(ModuleType, *args, **kwargs) will instantiate a submodule ModuleType(*args, **kwargs). When we call forward again, make will simply return the same module it gave us before.

In the above example, the reason we’re using the LazyLinear module instead of nn.Linear is that LazyLinear will infer the input dimension automatically. But how? It turns out we already have all the tools we need to easily implement the LazyLinear class ourselves. Here’s how we could do it with only a couple lines:

class MyLazyLinear(ter.Module):

    def __init__(self, out_feat):
        super().__init__()
        self.out_feat = out_feat

    def forward(self, x):
        self.start_forward()
        in_feat = x.size(-1)
        return self.make(nn.Linear, in_feat, self.out_feat)(x)

linear = MyLazyLinear(16)
print(linear(x).shape)
torch.Size([16, 16])

Terrace provides a couple dimension-inferring wrappers around PyTorch’s modules, including LazyLayerNorm and LazyMultiheadAttention, but nothing remotely comprehensive. However, as you can see, it is very easy to create your own wrappers.

Gotchyas

While terrace’s Module is more convenient in most cases, there are some gotchyas you’ll need to consider:

First, and most importantly, data-dependant control flow will not work out of the box, and (very unfortunately) will sometimes fail silently.

For instance, the following model will not work:

class BadNetwork1(ter.Module):

    def forward(self, x):
        self.start_forward()
        h = self.make(ter.LazyLinear, 10)
        # the number of times this loop
        # get executed is data-dependant
        for n in range(int(torch.amax(x))):
            h = self.make(ter.LazyLinear, 10)(x)

When we run this model the first time, it will work fine. However, as soon as we run it on a tensor whose maximum value is larger than the first tensor we gave the model, it will throw an error.

This is a pretty contrived example. But there is a far more insidious example that can really mess you up:

class BadNetwork2(ter.Module):

    def forward(self, x):
        self.start_forward()
        # here we try to use two different models depending
        # on the mean value of x. What could go wrong?
        if x.mean() > 0:
            return self.make(ter.LazyLinear, 10)(x)
        else:
            return self.make(ter.LazyLinear, 10)(x)

In this case, we want to use different weights in our neural network in a data-dependant manner. However, this will silently fail! Terrace uses a counter to determine what module to run when make is called. In both the if and else clauses of the example, make will return the first LazyLinear module we initialized. This means that we’re actually using the same weights for all inputs! The correct way to have the desired effect is to make sure you initialize both linear layers in the same order every time:

class FixedNetwork2(ter.Module):

    def forward(self, x):
        self.start_forward()

        if_linear = self.make(ter.LazyLinear, 10)
        else_linear = self.make(ter.LazyLinear, 10)

        if x.mean() > 0:
            return if_linear(x)
        else:
            return else_linear(x)

Warning

Be very careful using Terrace Modules with data-dependant control flow.

Another, less insidious gothcya is that, since Module parameters are initialized lazily, you’ll need to make sure to run your module at least once before calling its parameters(). For instance, you need to run all your models on a dummy batch before configuring any optimizers for training.