Cheatsheet for Setting up Deep Learning Cloud VMs


VM Setup Notes

My general notes / cheatsheet on how to set up deep learning environments on cloud infrastructure.


Useful Resource: train-deep-learning-models-on-gpus-using-amazon-ec2-spot-instance

Set up - AWS

  1. Create a volume.
  2. Create an instance.
    • Choose p3.2xlarge for a V100 GPU.
    • NB: Request a Spot Instance.
    • Set Subnet to the same availability zone as the volume.
    • In Configure Security Group, choose "Select an existing security group" and add the jupyter one (need to remember how I set this up initially).
  3. Launch, click the Connect button, then copy the command shown and run it.

Connect Volume to VM

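Check which drives are available, then mount the device to a folder.

# list block devices to confirm the volume's device name (assumed to be /dev/xvdf here)
lsblk
sudo mkdir /dltraining
sudo mount /dev/xvdf /dltraining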

Create Snapshots

Use the interface to create snapshots, which let you move a volume between subnets in case availability is limited or cost becomes too high.
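
The same move can probably be scripted with the AWS CLI. A minimal sketch, assuming the CLI is installed and configured: the volume ID, snapshot ID and availability zone below are placeholders, and `run` is a hypothetical helper that only prints each command unless DRY_RUN=0 is set.

```shell
# Placeholder IDs and zone (assumptions); substitute your own.
VOLUME_ID="vol-0123456789abcdef0"
SNAPSHOT_ID="snap-0123456789abcdef0"   # taken from the create-snapshot output
TARGET_AZ="us-east-1b"

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Snapshot the existing volume.
run aws ec2 create-snapshot --volume-id "$VOLUME_ID" --description "dltraining data"
# 2. Block until the snapshot is complete.
run aws ec2 wait snapshot-completed --snapshot-ids "$SNAPSHOT_ID"
# 3. Create a new volume from the snapshot in the target availability zone.
run aws ec2 create-volume --snapshot-id "$SNAPSHOT_ID" --availability-zone "$TARGET_AZ"
```

Run it once as-is to check the printed commands, then again with DRY_RUN=0 to execute them.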


Instructions to follow when I use it again. Currently, GCP preemptible VM availability is unreliable: V100 VMs are kept alive for only a few hours at a time (1-6 hours max), then are usually shut down due to demand.

Set up - GCP

export IMAGE_FAMILY=pytorch-latest-gpu
export INSTANCE_NAME="pytorch-instance"
export ZONE="europe-west4-b"
export INSTANCE_TYPE="n1-standard-8"

gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-t4,count=1" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=500GB \
        --metadata="install-nvidia-driver=True"


Clone repo:

git clone --single-branch --branch <branchname> https://<username>:<access-code>@<host>/<username>/<repo>.git

Start Jupyter

jupyter notebook --ip=

Copy files

scp -i <pem-key> -r ubuntu@<vm-instance>:<filedir/filename> <destdir>


Install apex

git clone <apex-repo-url>
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

Tmux session start

source activate pytorch_p36
python -m pip install -r requirements.txt

Exporting Tokens

export NEPTUNE_API_TOKEN="<access_token>"
export SLACK_URL="<slack_url>"
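
A forgotten export tends to show up only once training is already running. A minimal sketch of a pre-flight check, assuming `printenv` is available; `require_env` is a hypothetical helper name, not part of any tool used here.

```shell
# Hypothetical helper: fail if any named variable is unset, empty, or not exported.
require_env() {
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes get
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or empty: $name" >&2
      return 1
    fi
  done
}

require_env NEPTUNE_API_TOKEN SLACK_URL || echo "export the tokens before starting training"
```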

Autoreload notebook

%load_ext autoreload
%autoreload 2

Profile Python code

import cProfile
# ds is the dataset object already defined in the session
cProfile.run('ds.__getitem__(15)')

Linux Commands

Count number of files in folder

ls -1 | wc -l
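
Note that `ls -1` skips hidden files and counts subdirectories as entries. A sketch of a variant that counts regular files only, hidden ones included, assuming a `find` that supports `-maxdepth` (GNU and BSD both do); `count_files` is a hypothetical helper name.

```shell
# Hypothetical helper: count regular files directly inside a folder (default: current one).
# Caveat: a filename containing a newline would be counted more than once.
count_files() {
  find "${1:-.}" -maxdepth 1 -type f | wc -l
}

count_files .
```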

Sort output of files in folder by size

du -sh -- * | sort -h
du -sh -- * | sort -rh
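
When a folder holds many entries, only the first few lines of the descending sort are usually of interest. A small sketch that wraps the command above; `largest` is a hypothetical helper name, and it assumes the folder is not empty.

```shell
# Hypothetical helper: show the N largest entries in a folder (defaults: current folder, 5 entries).
largest() {
  du -sh -- "${1:-.}"/* | sort -rh | head -n "${2:-5}"
}

# Usage: largest /dltraining 3
```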