Visualization Laboratory Wiki

Differences

This shows you the differences between two versions of the page.

training:ds:2023:distributed_deep_learning_on_ksl_platforms [2023/05/04 11:46] – removed - external edit (Unknown date) 127.0.0.1
training:ds:2023:distributed_deep_learning_on_ksl_platforms [2023/05/04 11:46] (current) – ↷ Page moved from training:ds:distributed_deep_learning_on_ksl_platforms to training:ds:2023:distributed_deep_learning_on_ksl_platforms James Kress
Line 1: Line 1:
 +====== Distributed Deep Learning on KSL Platforms ======
 +
 +<WRAP group>
 +
 +<WRAP twothirds column>
 +
 +===== Overview =====
 +
 +As Deep Learning (DL) models and datasets grow in size and complexity, the computational cost of training these models can be non-trivial, ranging from a few tens of hours to several days. By exploiting the parallelism inherent in the training process of DL models, we can distribute training across multiple GPUs on a single node or on multiple nodes of Ibex. We will survey the available distributed training frameworks (e.g. PyTorch DDP, PyTorch Lightning, JAX), with demonstrations and hands-on exercises on how to run them on Ibex resources.
 +
 +===== Learning Outcomes =====
 +
 +After attending the training, you will be able to:
 +
 +  * Understand the considerations when refactoring training scripts to scale from 1 to N GPUs
 +  * Understand the data management related to distributed training jobs
 +  * Know how to launch distributed training jobs on Ibex resources
 +  * Understand the scaling characteristics of your distributed training workload
 +
 +A quiz will be conducted after the training. Submitting it is mandatory to ensure continued use of KSL resources.
 +
 +</WRAP>
 +
 +<WRAP quarter column><WRAP center round box 100%> {{:icon:twbs:link:w-25:h-auto:calendar3.svg?nolink&}}<tab1> ** Date**
 +
 +  * February 12th, 2023
 +  * 9:00 am - 12:00 pm
 +
 +<WRAP center round box 100%> {{:icon:twbs:link:w-25:h-auto:map.svg?nolink&}}<tab1> ** Venue**
 +
 +  * Room 5220, Level 5, Building 3
 +
 +</WRAP></WRAP>
 +
 +</WRAP>
 +
 +<WRAP column> <WRAP center round box download 100%> {{:icon:twbs:link:w-25:h-auto:globe.svg?nolink&}}**Organizers**
 +
 +{{:icon:twbs:link:w-25:h-auto:person.svg?nolink&}}Didier Barradas Bautista \\ {{:icon:twbs:link:w-25:h-auto:headset-vr.svg?nolink&}}Visualization Core Laboratory \\ {{:icon:twbs:link:w-25:h-auto:envelope-at.svg?nolink&}}didier.barradasbautista@kaust.edu.sa
 +
 +{{:icon:twbs:link:w-25:h-auto:person.svg?nolink&}}Mohsin A. Shaikh \\ {{:icon:twbs:link:w-25:h-auto:headset-vr.svg?nolink&}}Supercomputing Core Laboratory \\ {{:icon:twbs:link:w-25:h-auto:envelope-at.svg?nolink&}}mohsin.shaikh@kaust.edu.sa
 +
 +</WRAP>
 +
 +<wrap indent></wrap> \\ <wrap indent></wrap>
 +
 +</WRAP>
 +
 +<WRAP quarter column><WRAP center round box download 100%> ** Workshop Materials**
 +
 +  * Slides: [[https://www.hpc.kaust.edu.sa/sites/default/files/files/public/DataScienceTrainings/DistributedDL/2023/Dist_DL_Feb2023.pdf|Slides]]
 +  * GitHub: [[https://github.com/mshaikh786/Dist-DL-training|GitHub]]
 +  * Recording: [[https://youtu.be/6qY9V3QMSXw|Recording]]
 +  * Documentation: [[https://kaust-supercomputing-lab.atlassian.net/l/cp/VrJyDPjK|Docs]]
 +
 +</WRAP>
 +
 +<WRAP center round box todo 100%> **Pre-requisites**
 +
 +  * Have KAUST IT credentials (i.e. the ones you use to access your KAUST email)
 +  * Bring your laptop and have your terminal ready
 +  * Have essential knowledge of the Linux shell
 +  * Have some experience working with the Conda package manager
 +  * Have completed the basic training “Data Science on-boarding on KSL platforms” or possess equivalent knowledge
 +
 +</WRAP>
 +
 +</WRAP>
 +
 +</WRAP>
 +
 +{{tag>workshop}}
 +
  
