Data Science Tools User Guides
Overview
Data science is a field that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. It is a rapidly growing field with many applications, including business intelligence, healthcare, finance, and marketing.
The KAUST Visualization Core Lab (KVL) hosts several hands-on workshops as part of ongoing efforts to build capacity in core data science skills at KAUST and in the Kingdom. We aim to make all the material available through several channels, such as those below.
GitHub
Our GitHub repository contains code and data for various data science projects. For more material, check the KAUST RCCL GitHub.
Data Science for Beginners
This video gives an overview of how to use the HPC clusters to start your data science projects, and this video guides you through setting up your Windows machine for data science.
Data Science Tools and Technologies
We have several tools to help you start your data science project at KAUST. KAUST students, faculty, researchers, and staff can access JupyterHub as well as the remote workstations with their KAUST credentials.
- Binder (limited resources, open to the public)
- Remote workstations (you must request the service here the first time you use it)
Ibex Starting Pack
Access to Ibex
Unless your username is associated with a project, a Principal Investigator (PI), or a class, your access to Ibex resources will be severely limited. To nominate your supervisor, visit http://my.ibex.kaust.edu.sa. You can check whether a PI is defined for your account by running the “whoismypi” program on a login node.
Any significant use of Ibex resources must be under the direct supervision of a research faculty member or a Director.
Welcome to KAUST! If you are a new user, here are some essential resources to get you started:
1. Ibex 101: Begin with the Ibex 101 slides and demos. They introduce the resources and commands available on the cluster, foundational knowledge that the rest of the documentation refers to frequently.
2. Ibex Training: Explore our detailed documentation on various topics, including Data Science, for more comprehensive information. Be sure to check out these wikis:
3. KAUST Visualization YouTube Channel: Visit our YouTube channel here for many tutorials from our data science workshops and more. These videos cover best practices for installing libraries and writing code on Ibex.
4. Python and Conda: To manage your environments, we encourage you to install Conda by following the official guide here.
5. Jupyter, VS Code and RStudio: To deploy your preferred IDE on Ibex, watch the video here.
Important storage tips
- /home/$USER - this is your personal home directory and is limited to 200 GB of data. Do not run HPC jobs in your home directory; use it to keep configuration files, scripts, etc.
- /ibex/user/$USER - this is your HPC storage. Use it to store the input and output of your compute jobs. It is much faster than your home directory, so jobs that read and write here will run faster. This directory is limited to 1.5 TB; if you need more storage, contact the Ibex support team.
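To make the storage tips above concrete, here is a minimal Python sketch of hypothetical helpers (not an Ibex-provided tool) that estimate how much of a quoted limit a directory uses and flag paths inside your home directory. It assumes the limits use decimal units (1 GB = 10**9 bytes) and that summing regular file sizes approximates quota usage. The filesystem's own accounting may differ, so contact the Ibex support team for authoritative figures.

```python
import os
from pathlib import Path
from typing import Optional

# Limits quoted in this guide; decimal units (1 GB = 10**9 bytes) are an assumption.
HOME_LIMIT_BYTES = 200 * 10**9        # /home/$USER: 200 GB
IBEX_USER_LIMIT_BYTES = 1500 * 10**9  # /ibex/user/$USER: 1.5 TB


def directory_size(root: str) -> int:
    """Return the total size in bytes of regular files under root.

    Symbolic links are skipped, and os.walk does not descend into
    symlinked directories by default.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def usage_fraction(root: str, limit_bytes: int) -> float:
    """Fraction of limit_bytes used by files under root (may exceed 1.0)."""
    return directory_size(root) / limit_bytes


def job_output_dir(user: Optional[str] = None) -> Path:
    """HPC storage directory recommended for job input and output."""
    user = user or os.environ["USER"]
    return Path("/ibex/user") / user


def is_in_home(path: str, user: Optional[str] = None) -> bool:
    """True if path lies inside /home/<user>, where jobs should not run."""
    user = user or os.environ["USER"]
    home = Path("/home") / user
    p = Path(os.path.abspath(path))
    return p == home or home in p.parents
```

For example, a job script could call `is_in_home(os.getcwd())` before starting and write its results under `job_output_dir()` instead, while `usage_fraction("/home/" + user, HOME_LIMIT_BYTES)` gives a rough idea of how close you are to the home directory limit. Walking a large directory tree can be slow, so run the size check occasionally rather than in every job.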
These resources will give you the knowledge and skills to make the most of your time at KAUST. We hope this DokuWiki page is helpful! Please let us know if you have any questions. Happy learning!
Further useful links
- ibex@hpc.kaust.edu.sa - email this address to create a service request
- https://kaust-ibex.slack.com - Use #general for simple queries
KVL Training on Data Science
- KVL often runs training on Data Science in collaboration with the KSL. Check out our current and past workshops for more: