Israeli Checkpoint - Search News

Saving memory using gradient-checkpointing

The strategy we use here is to mark a subset of the neural net activations as checkpoint nodes. For the simple feed-forward network in our example, the optimal choice is to mark every sqrt(n)-th node ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

Feedback

Trending now