H2O is a leading open source platform for data science and machine learning. It provides a wide range of data science tools that are optimized for working with big data. More information (and the software itself) is available on the H2O website. This post walks through setting up an Anaconda virtual environment that can run H2O, so anyone can get started with the software.
Step 1: Downloading Anaconda
First, download the most recent edition of Anaconda from its website: https://www.anaconda.com/download/
Once you’ve followed the installer’s instructions and have Anaconda up and running, move on to Step 2.
Step 2: Setting Up Your Virtual Environment with Anaconda
To set up a virtual environment, open Terminal and enter the following command:
conda create --name py35 python=3.6
where “py35” can be whatever name you want for your environment (the name is only a label; this command installs Python 3.6 regardless). Next, activate your environment using the command:
source activate py35
Once your environment is activated, its name should appear beside the prompt in your terminal. From here, install the packages that let you access your conda environments in Jupyter notebooks:
conda install jupyter
conda install nb_conda
Now that these are installed, you can run any of your conda environments in Jupyter notebooks, but first you will need to deactivate and reactivate your virtual environment with the following commands:
source deactivate
source activate py35
Now when you start Jupyter with the command:
jupyter notebook
you will notice that all of your conda environments are now listed under the “New” menu, and an additional tab called “Conda” has been added to the page.
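To confirm that a notebook is really running inside the environment you created, a quick check from a notebook cell can help. The following is a minimal sketch, not part of the original setup: the helper name describe_environment is made up for illustration, and it assumes conda has set the CONDA_DEFAULT_ENV variable, which may not hold for every way of launching a kernel.

```python
import os
import sys


def describe_environment(environ=os.environ):
    """Return the interpreter path, its prefix, and the conda environment name."""
    return {
        # Path of the Python binary running this notebook kernel
        "executable": sys.executable,
        # Root directory of the environment that owns the interpreter
        "prefix": sys.prefix,
        # conda normally sets this on activation; None if it is not set
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the prefix ends with the name you chose in Step 2 (for example py35), the notebook is using that environment's Python.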
Step 3: Installing H2O in Your Virtual Environment
A more detailed description of how to install H2O through conda is available in the H2O documentation. In short, you just need to run the following command in Terminal while your environment is activated:
conda install -c h2oai h2o
Once the package has been installed, open a Jupyter notebook and check that the following two lines of code run:
import h2o
h2o.init()
If you have a supported version of Java (the JDK) installed, these two lines should run without a problem. However, if you’ve recently updated Java, you will need to revert to an older version (Version 8), as the newest versions were not yet supported by H2O at the time of writing.
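Before calling h2o.init(), it can be useful to see which Java version the notebook would pick up. The sketch below is one way to do that from Python; the function names parse_java_major and installed_java_major are illustrative, not part of H2O, and the parsing assumes the usual `java -version` banner with a quoted version string.

```python
import re
import shutil
import subprocess


def parse_java_major(version_text):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_192)
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The JVM prints its version banner to stderr rather than stdout
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable Java installation was found on the PATH.")
    else:
        print(f"Java major version found: {major}")
```

If this reports anything other than 8, the optional Step 4 is probably needed.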
(Possible) Step 4: Installing the Java Development Kit
For H2O to initialize properly, you need version 8 of the Java Development Kit (JDK). Download and install it, then go back to Terminal, deactivate your environment, and enter the command:
/usr/libexec/java_home -V
This lists all the versions of the JDK you currently have installed. To select Version 8 as the default for the current terminal session, run the following command:
export JAVA_HOME=`/usr/libexec/java_home -v 1.8.0_192`
where the value after -v is the build of Version 8 that you installed, which should have appeared when you ran the java_home command. Finally, verify that the change worked by checking the version currently in use with the command:
java -version
Once this correctly shows Version 8, try rerunning your Jupyter notebook and enjoy exploring all of the tools H2O has to offer!
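One thing to keep in mind is that an exported variable only applies to the terminal session where it was set and to programs launched from it, so Jupyter should be started from that same terminal. The short sketch below shows what a notebook actually inherited; java_settings is a hypothetical helper name, and it only reads the environment without changing anything.

```python
import os
import shutil


def java_settings(environ=os.environ):
    """Report the JAVA_HOME value and the java binary visible to this process."""
    return {
        # None means the variable was not exported in the launching shell
        "JAVA_HOME": environ.get("JAVA_HOME"),
        # Full path of the first `java` found on the PATH, or None
        "java_on_path": shutil.which("java", path=environ.get("PATH")),
    }


if __name__ == "__main__":
    for key, value in java_settings().items():
        print(f"{key}: {value}")
```

If JAVA_HOME comes back as None inside the notebook, the export was most likely run in a different terminal window than the one that started Jupyter.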