    Setup Jupyter Notebook on Hortonworks Data Platform (HDP)

    Jupyter Notebook is a web application that allows creating and sharing documents that contain live code, equations, visualizations and explanatory text.

    A notebook is interactive, so you can execute code directly from a web browser. Jupyter supports multiple kernels for different programming languages.

    My Setup

    HDP 3.1.1
    Python 2.7.x
    Apache Spark 2.3.0
    CentOS 7.7

    Install EPEL

    # wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    # rpm -ivh epel-release-latest-7.noarch.rpm

    Once EPEL is enabled, upgrade setuptools to prepare Python for the next step.

    # yum upgrade python-setuptools

    Install pip, the Python package management system, so that you can install extra Python libraries.

    Install pip

    # wget https://bootstrap.pypa.io/ez_setup.py -O - | python
    # yum install python-pip python-wheel python-devel gcc

    Install a few basic data science Python libraries:

    # pip install --upgrade pip wheel pandas numpy scipy scikit-learn matplotlib virtualenv
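After the libraries are installed, it can help to confirm that each one actually imports before moving on. The following is a minimal sketch, not part of the original setup: `check_modules` and the `PYTHON_BIN` override are hypothetical names introduced here, and the interpreter is assumed to be the system `python`.

```shell
# Interpreter used for the import checks.
# PYTHON_BIN is a hypothetical override; it defaults to the system "python".
PYTHON_BIN="${PYTHON_BIN:-python}"

# Try to import every module named on the command line.
# Prints one status line per module and returns 1 if any import failed.
check_modules() {
    missing=0
    for mod in "$@"; do
        if "$PYTHON_BIN" -c "import $mod" >/dev/null 2>&1; then
            echo "ok: $mod"
        else
            echo "MISSING: $mod"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_modules pandas numpy scipy sklearn matplotlib`. Note that the import name is not always the pip package name: scikit-learn is imported as `sklearn`.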

    Install Jupyter Notebook:

    # pip install jupyter

    Set up the Jupyter Notebook configuration file:

    # jupyter notebook --generate-config
    # mkdir -p /data/conf
    # cp ~/.jupyter/jupyter_notebook_config.py /data/conf/
    # chown -R spark:hadoop /data
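The generated configuration file consists of commented-out defaults, so the server behaves the same until you set something in it. As a sketch of how the copied file could be adjusted for a headless server, the function below appends three settings. The function name `append_notebook_settings` and the example values are assumptions made here; the `c.NotebookApp.*` options belong to the classic Notebook server that `pip install jupyter` provides on Python 2.7.

```shell
# Append basic server settings to a Jupyter Notebook config file.
# Arguments: 1 = config file path, 2 = listen address, 3 = listen port.
append_notebook_settings() {
    conf_file="$1"
    listen_ip="$2"
    listen_port="$3"
    cat >> "$conf_file" <<EOF

# Settings appended for a headless server
c.NotebookApp.ip = '${listen_ip}'
c.NotebookApp.port = ${listen_port}
c.NotebookApp.open_browser = False
EOF
}
```

For example, `append_notebook_settings /data/conf/jupyter_notebook_config.py 0.0.0.0 8888` would make the server listen on all interfaces on port 8888. Values passed as command-line flags such as `--ip` and `--port` take precedence over the ones in the file.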

    Set the following paths:

    export SPARK_HOME="/usr/hdp/current/spark2-client"
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
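The py4j version in the file name depends on the Spark build, so a hardcoded path can stop matching after an upgrade. A possible alternative, sketched below, looks the archive up instead. The helper name `find_py4j_zip` is hypothetical. The sketch also adds `$SPARK_HOME/python` to `PYTHONPATH`, on the assumption that notebooks should be able to run `import pyspark`, since that directory is where the `pyspark` package lives.

```shell
# Print the first py4j source archive found under a Spark home.
# Returns 1 when none exists.
find_py4j_zip() {
    for candidate in "$1"/python/lib/py4j-*-src.zip; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Fall back to the HDP client path used in this article.
SPARK_HOME="${SPARK_HOME:-/usr/hdp/current/spark2-client}"

if PY4J_ZIP=$(find_py4j_zip "$SPARK_HOME"); then
    export SPARK_HOME
    # Keep any PYTHONPATH that is already set, without a trailing colon.
    export PYTHONPATH="$SPARK_HOME/python:$PY4J_ZIP${PYTHONPATH:+:$PYTHONPATH}"
else
    echo "no py4j archive found under $SPARK_HOME/python/lib" >&2
fi
```

Placing these lines in the shell profile of the user that starts Jupyter keeps the variables set across logins.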

    Start Jupyter Notebook, replacing JUPYTER_HOST and JUPYTER_PORT with the address and port the server should listen on:

    jupyter notebook --config=/data/conf/jupyter_notebook_config.py --ip=JUPYTER_HOST --port=JUPYTER_PORT
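To avoid retyping the placeholders, the launch command can be assembled from environment variables. This is a minimal sketch: the function name `jupyter_cmd` and the `JUPYTER_CONF` variable are hypothetical, and the fallbacks (`localhost` and `8888`) are simply Jupyter's own defaults, used here as assumptions.

```shell
# Print the launch command, filling unset values with fallbacks.
# JUPYTER_HOST and JUPYTER_PORT match the placeholders in the command above.
# JUPYTER_CONF is a hypothetical variable for the config file location.
jupyter_cmd() {
    echo "jupyter notebook" \
        "--config=${JUPYTER_CONF:-/data/conf/jupyter_notebook_config.py}" \
        "--ip=${JUPYTER_HOST:-localhost}" \
        "--port=${JUPYTER_PORT:-8888}"
}
```

Running `jupyter_cmd` prints the command for review. With `JUPYTER_HOST` and `JUPYTER_PORT` exported, the printed line carries those values instead of the fallbacks.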
