Data Science Tools User Guides

Overview

Data science is a field that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. It is a rapidly growing field with many applications, including business intelligence, healthcare, finance, and marketing.

The KAUST Visualization Core Lab (KVL) hosts several hands-on workshops as part of ongoing efforts to build capacity in core data science skills at KAUST and in the Kingdom. We make the material available through several channels, including the ones below.

GitHub

Our GitHub repository contains code and data for various data science projects. For more material, check the KAUST RCCL GitHub.

Data Science for Beginners

This video gives an overview of how to use the HPC clusters to start your data science projects, and this video guides you through setting up your Windows machine for data science.

Data Science Tools and Technologies

We have several tools to help you start your data science project at KAUST. KAUST students, faculty, researchers, and staff can access JupyterHub as well as the remote workstations with their KAUST credentials.

Ibex Starting Pack

Access to Ibex

Unless your username is associated with a Project, a Principal Investigator (PI), or a Class, your access to Ibex resources will be severely constrained. To nominate your supervisor, please visit http://my.ibex.kaust.edu.sa. You can check whether a PI is defined for your account by running the “whoismypi” program on a login node.
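If you want to run this check from a script, a minimal Python sketch could look like the one below. The helper name `check_pi` is hypothetical, and the output format of “whoismypi” is not described on this page, so the sketch only returns the raw text and makes no attempt to parse it.

```python
import shutil
import subprocess
from typing import Optional


def check_pi(command: str = "whoismypi") -> Optional[str]:
    """Run the PI lookup program and return its raw output.

    Returns None when the program is not on PATH, which is expected
    anywhere other than an Ibex login node.
    """
    if shutil.which(command) is None:
        return None
    # The output format is not documented here, so it is returned unparsed.
    result = subprocess.run([command], capture_output=True, text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    output = check_pi()
    if output is None:
        print("whoismypi not found; run this on an Ibex login node.")
    else:
        print(output)
```

Running “whoismypi” directly in the login node's shell gives the same information; the wrapper is only useful when the check is part of a larger setup script.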

To use Ibex resources in any significant way, you must be under the direct supervision of the Research Faculty or a Director. According to a clarification from RCAC, this means professors only (full, associate, assistant, and research).

Welcome to KAUST! If you are a new user, here are some essential resources to get you started:

1. Ibex 101: Begin with the Ibex 101 slides and demos. These will introduce you to all the resources and commands available in the cluster. This foundational knowledge will be referenced frequently.

2. Ibex Training: Explore our detailed documentation on various topics, including Data Science, for more comprehensive information. Be sure to check out these wikis:

3. KAUST Visualization YouTube Channel: Visit our YouTube channel here for many tutorials from our data science workshops and more. These videos cover best practices for installing libraries and coding on Ibex.

4. Python and Conda: To manage your environments, we encourage you to install Conda using the official guide here.

5. Jupyter, VS Code, and RStudio: To deploy your preferred IDE on Ibex, check the video here.

Important storage tips

  • /home/$USER - this is your personal home directory and is limited to 200 GB of data. Do not run HPC jobs in your home directory; use it to keep configurations, scripts, etc.
  • /ibex/user/$USER - this is your HPC storage. Use it to store the input and output of compute jobs. It is much faster than your home directory, so your jobs will run faster. This directory has a limit of 1.5 TB. If you need more storage, contact the Ibex support team.
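
As an illustration of the storage layout above, here is a minimal Python sketch that builds a job directory under /ibex/user/$USER and flags paths that sit under /home/$USER. The helper names `job_dir` and `is_under_home` and the per-job subdirectory layout are assumptions made for this example, not an Ibex convention.

```python
import os
from pathlib import PurePosixPath
from typing import Optional

# Storage roots taken from the tips above.
HOME_ROOT = PurePosixPath("/home")
IBEX_USER_ROOT = PurePosixPath("/ibex/user")


def job_dir(job_name: str, user: Optional[str] = None) -> PurePosixPath:
    """Return a per-job directory under /ibex/user/<user> (hypothetical layout)."""
    user = user or os.environ.get("USER", "")
    if not user:
        raise ValueError("No user name given and $USER is not set")
    return IBEX_USER_ROOT / user / job_name


def is_under_home(path: str, user: str) -> bool:
    """Return True if the path is the user's home directory or lies inside it."""
    home = HOME_ROOT / user
    candidate = PurePosixPath(path)
    return candidate == home or home in candidate.parents


if __name__ == "__main__":
    target = job_dir("my_first_job", user="alice")
    print(target)
    print(is_under_home(str(target), "alice"))
```

A job script could call `is_under_home` on its working directory before starting and stop with a message if the job is about to read or write inside the home directory.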

These resources will equip you with the knowledge and skills to make the most of your time at KAUST. We hope this DokuWiki page is helpful! Please let us know if you have any questions. Happy learning!


Further useful links

KVL Training on Data Science

  • KVL often runs training on Data Science in collaboration with the KSL. Check out our current and past workshops for more: Training Events