Parallel Neural Network Training using Multiple GPUs
Deep learning is quickly becoming one of the most powerful and ubiquitous tools within machine learning, performing well in a vast array of applications [1]. However, for problems that require many hidden layers to reach acceptable accuracy, training these neural networks can quickly become very computationally expensive. In this project, I learned how to significantly speed up neural network training by using CUDA to perform calculations on a GPU and MPI to distribute those calculations in parallel across multiple NVIDIA Tesla K80 devices.
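To illustrate the general pattern of combining CUDA and MPI for data-parallel training (this is a minimal sketch under my own assumptions, not the project's actual code; the kernel, buffer size N, and all variable names are hypothetical), each MPI rank can bind to its own GPU, compute a local gradient with a CUDA kernel, and then average the gradients across ranks with MPI_Allreduce:

// Minimal sketch: data-parallel gradient averaging across GPUs with MPI + CUDA.
// One MPI rank per GPU; each rank computes a local gradient, then all ranks
// average their gradients with MPI_Allreduce. Names here are illustrative only.
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024  // number of weights (illustrative size)

// Placeholder kernel: stands in for backpropagation by filling the gradient
// buffer with a rank-dependent value.
__global__ void local_gradient_kernel(float *grad, int n, int rank) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) grad[i] = (float)(rank + 1);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Bind each rank to its own GPU (assumes one rank per device on the node).
    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);
    cudaSetDevice(rank % num_devices);

    // Compute a local "gradient" on this rank's GPU.
    float *d_grad;
    cudaMalloc(&d_grad, N * sizeof(float));
    local_gradient_kernel<<<(N + 255) / 256, 256>>>(d_grad, N, rank);
    cudaDeviceSynchronize();

    // Copy to the host and average across ranks (a CUDA-aware MPI build could
    // pass device pointers directly and skip this copy).
    float *h_grad = (float *)malloc(N * sizeof(float));
    cudaMemcpy(h_grad, d_grad, N * sizeof(float), cudaMemcpyDeviceToHost);
    MPI_Allreduce(MPI_IN_PLACE, h_grad, N, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < N; i++) h_grad[i] /= size;  // sum -> mean

    if (rank == 0) printf("averaged gradient[0] = %f\n", h_grad[0]);

    free(h_grad);
    cudaFree(d_grad);
    MPI_Finalize();
    return 0;
}

A file like this would typically be compiled with nvcc plus the MPI compiler wrapper's include and library flags, then launched with mpirun using one process per GPU.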