
    How to Create a Highly Available NFS Service with Gluster and Oracle Linux 8

     Introduction

    In this tutorial, we will create an NFS service hosted by three instances: ol-node01, ol-node02, and ol-node03. These instances will replicate a Gluster volume for data redundancy and use clustering tools for service redundancy.

    A fourth instance named ol-client will mount this NFS service for demonstration and testing.

    This tutorial is targeted at Oracle Linux 8 users.

    Components

    • Corosync provides clustering infrastructure to manage which nodes are involved, their communication, and quorum.
    • Pacemaker manages cluster resources and rules of their behavior.
    • Gluster is a scalable and distributed filesystem.
    • Ganesha is an NFS server that can use many different backing filesystem types, including Gluster.

    Objectives

    In this tutorial, you’ll learn to:

    • Create a Gluster volume
    • Configure Ganesha
    • Create a Cluster
    • Create Cluster services

    Prerequisites

    • Four Oracle Linux 8 instances installed with the following configuration:
      • a non-root user with sudo permissions
      • ssh keypair for the non-root user
      • ability to ssh from one host (ol-node01) to the others (ol-node02, ol-node03) using passwordless ssh login
      • additional block volume for use with Gluster

    Setup Lab Environment

    If not already connected, open a terminal and connect via ssh to each instance mentioned above.

    ssh oracle@<ip_address_of_instance>

    Install Software

    Enable the required Oracle Linux repositories before installing the Corosync, Ganesha, Gluster, and Pacemaker software.

    1. (On all nodes) Install the Gluster yum repository configuration.

      sudo dnf install -y oracle-gluster-release-el8
      
    2. (On all nodes) Enable the repositories.

      sudo dnf config-manager --enable ol8_addons ol8_UEKR6 ol8_appstream
      
    3. (On all nodes) Install the software.

      sudo dnf install -y corosync glusterfs-server nfs-ganesha-gluster pacemaker pcs pcp-zeroconf fence-agents-all
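
    After the install step, it can help to confirm that the main binaries are on the PATH of each node. The following is a minimal sketch, not part of the original procedure. The helper name missing_commands is hypothetical, and the binary names passed to it are assumptions about what the packages install.

```shell
# Print every command from the argument list that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# The binary names below are assumptions; adjust them to match your system.
missing=$(missing_commands gluster pcs corosync pacemakerd ganesha.nfsd)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All expected commands are present."
fi
```

    An empty result only shows that the commands resolve; it does not check package versions or service state.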
      

    Create the Gluster volume

    Prepare each attached block volume to create and activate a replicated Gluster volume.

    1. (On all nodes) Create an XFS filesystem on /dev/sdb with a label of gluster-000.

      sudo mkfs.xfs -f -i size=512 -L gluster-000 /dev/sdb
      
      • -f: Forces overwriting the device when detecting an existing filesystem.
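      • -i size: Sets the filesystem’s inode size in bytes. The default depends on the xfsprogs version: 256 bytes on older releases, and 512 bytes when metadata CRCs are enabled, as they are by default on current releases.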
      • -L: Sets the filesystem label, which cannot exceed 12 characters in length.
    2. (On all nodes) Create a mountpoint, add a fstab(5) entry for a disk with the label gluster-000, and mount the filesystem.

      sudo mkdir -p /data/glusterfs/sharedvol/mybrick
      echo 'LABEL=gluster-000 /data/glusterfs/sharedvol/mybrick xfs defaults  0 0' | sudo tee -a /etc/fstab > /dev/null
      sudo mount /data/glusterfs/sharedvol/mybrick
      
    3. (On all nodes) Enable and start the Gluster service.

      sudo systemctl enable --now glusterd
      
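    4. (On all nodes) Configure the firewall to allow Gluster traffic between the nodes. The first command places the 10.0.0.0/24 subnet in the trusted zone, and the second allows the glusterfs service in that zone.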

      sudo firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24
      sudo firewall-cmd --permanent --zone=trusted --add-service=glusterfs
      sudo firewall-cmd --reload
      
    5. (Optional) Ensure that each node has a resolvable name across all the nodes in the pool.

      Use DNS to resolve each hostname, or use the /etc/hosts file instead. When using the hosts file, edit the file on each node and add entries for all Gluster nodes.

      The free lab environment already has name resolution configured.

    6. (On ol-node01) Create the Gluster environment by adding peers.

      sudo gluster peer probe ol-node02
      sudo gluster peer probe ol-node03
      
    7. (On all nodes) Show that the peers have joined the environment.

      sudo gluster peer status
      

      Example Output:

      Number of Peers: 2
      
      Hostname: ol-node02
      Uuid: 2607976e-7004-47e8-821c-7c6985961cda
      State: Peer in Cluster (Connected)
      
      Hostname: ol-node03
      Uuid: c51cb4aa-fccd-47f7-9fb2-edb5766991d2
      State: Peer in Cluster (Connected)
      
    8. (On ol-node01) Create a Gluster volume named sharedvol, which replicates across the three hosts: ol-node01, ol-node02, and ol-node03.

      sudo gluster volume create sharedvol replica 3 ol-node0{1,2,3}:/data/glusterfs/sharedvol/mybrick/brick
      

      For more details on volume types, see the Creating and Managing Volumes section of the Oracle Linux Gluster Storage documentation.

    9. (On ol-node01) Start the sharedvol Gluster volume.

      sudo gluster volume start sharedvol
      
    10. (On ol-node01) Verify that the replicated Gluster volume is now available from any node.

      sudo gluster volume info
      

      Example Output:

      Volume Name: sharedvol
      Type: Replicate
      Volume ID: 1608bc61-cd4e-4b64-a5f3-f5800b717f76
      Status: Started
      Snapshot Count: 0
      Number of Bricks: 1 x 3 = 3
      Transport-type: tcp
      Bricks:
      Brick1: ol-node01:/data/glusterfs/sharedvol/mybrick/brick
      Brick2: ol-node02:/data/glusterfs/sharedvol/mybrick/brick
      Brick3: ol-node03:/data/glusterfs/sharedvol/mybrick/brick
      Options Reconfigured:
      storage.fips-mode-rchecksum: on
      transport.address-family: inet
      nfs.disable: on
      performance.client-io-threads: off
      
    11. (On ol-node01) Get the status of the Gluster volume.

      sudo gluster volume status
      

      Example Output:

      Status of volume: sharedvol
      Gluster process                             TCP Port  RDMA Port  Online  Pid
      ------------------------------------------------------------------------------
      Brick ol-node01:/data/glusterfs/sharedvol/m
      ybrick/brick                                49152     0          Y       78082
      Brick ol-node02:/data/glusterfs/sharedvol/m
      ybrick/brick                                49152     0          Y       77832
      Brick ol-node03:/data/glusterfs/sharedvol/m
      ybrick/brick                                49152     0          Y       77851
      Self-heal Daemon on localhost               N/A       N/A        Y       78099
      Self-heal Daemon on ol-node02               N/A       N/A        Y       77849
      Self-heal Daemon on ol-node03               N/A       N/A        Y       77868
       
      Task Status of Volume sharedvol
      ------------------------------------------------------------------------------
      There are no active volume tasks
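
    The peer check in step 7 can also be scripted, which is handy before creating the volume. The sketch below counts peers in the "Peer in Cluster (Connected)" state from the command's output. The function names count_connected_peers and peers_ok are hypothetical, and the match string is taken from the example output above.

```shell
# Count peers reported as connected in `gluster peer status` output (read from stdin).
count_connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)'
}

# Succeed only when exactly the expected number of peers is connected.
peers_ok() {
    expected=$1
    actual=$(count_connected_peers || true)   # grep exits non-zero on a count of 0
    [ "$actual" -eq "$expected" ]
}

# Example use on a node (each node sees the other two as peers):
#   sudo gluster peer status | peers_ok 2 && echo "pool is complete"
```

    Matching on the state string is a simple heuristic; it assumes the output format shown in this tutorial.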

    Configure Ganesha

    Ganesha is the NFS server that shares out the Gluster volume. In this example, we allow any NFS client to connect to our NFS share with read/write permissions.

    1. (On all nodes) Populate the file /etc/ganesha/ganesha.conf with the given configuration.

      sudo tee /etc/ganesha/ganesha.conf > /dev/null <<'EOF'
      EXPORT{
          Export_Id = 1 ;       # Unique identifier for each EXPORT (share)
          Path = "/sharedvol";  # Export path of our NFS share
      
          FSAL {
              name = GLUSTER;          # Backing type is Gluster
              hostname = "localhost";  # Hostname of Gluster server
              volume = "sharedvol";    # The name of our Gluster volume
          }
      
          Access_type = RW;          # Export access permissions
          Squash = No_root_squash;   # Control NFS root squashing
          Disable_ACL = FALSE;       # Enable NFSv4 ACLs
          Pseudo = "/sharedvol";     # NFSv4 pseudo path for our NFS share
          Protocols = "3","4" ;      # NFS protocols supported
          Transports = "UDP","TCP" ; # Transport protocols supported
          SecType = "sys";           # NFS Security flavors supported
      }
      EOF
      

    For more options to control permissions, see the EXPORT {CLIENT{}} section of config_samples-export in the Additional Information section.
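
    As an illustration of the CLIENT option mentioned above, the sketch below prints a CLIENT block that grants read/write access to a single subnet. It is an assumption-laden example, not the tutorial's configuration: the helper name render_client_block is hypothetical, the subnet is the lab's 10.0.0.0/24 range, and the block only restricts access if the export-level Access_type is tightened as well (for example, to None).

```shell
# Print a CLIENT block for placement inside an EXPORT block.
# The subnet argument is an example; use the range your clients live in.
render_client_block() {
    subnet=$1
    cat <<EOF
    CLIENT {
        Clients = ${subnet};   # Hosts or networks this block applies to
        Access_Type = RW;      # Access granted to those clients
    }
EOF
}

render_client_block 10.0.0.0/24
```

    Check the result against the Ganesha export samples before using it, since option names and defaults can differ between Ganesha versions.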

    Create a Cluster

    Create and start a Pacemaker/Corosync cluster using the three ol-nodes.

    1. (On all nodes) Set a shared password for the user hacluster.

      echo "hacluster:oracle" | sudo chpasswd
      
    2. (On all nodes) Enable the Corosync and Pacemaker services.

      sudo systemctl enable corosync
      sudo systemctl enable pacemaker
      
    3. (On all nodes) Enable and start the configuration system service.

      sudo systemctl enable --now pcsd
      
    4. (On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by High Availability.

      sudo firewall-cmd --permanent --zone=trusted --add-service=high-availability
      sudo firewall-cmd --reload
      
    5. (On ol-node01) Authenticate with all cluster nodes using the hacluster user and password defined above.

      sudo pcs host auth ol-node01 ol-node02 ol-node03 -u hacluster -p oracle
      
    6. (On ol-node01) Create a cluster named HA-NFS.

      sudo pcs cluster setup HA-NFS ol-node01 ol-node02 ol-node03
      
    7. (On ol-node01) Start the cluster on all nodes.

      sudo pcs cluster start --all
      
    8. (On ol-node01) Enable the cluster to run on all nodes at boot time.

      sudo pcs cluster enable --all
      
    9. (On ol-node01) Disable STONITH.

      STONITH (Shoot The Other Node In The Head) is Pacemaker's fencing mechanism for protecting data integrity in a high-availability (HA) cluster. It automatically powers down, or fences, a node that is not working correctly, for example when one node becomes unreachable by the other node(s) in the cluster.

      STONITH is disabled here to keep the lab simple; disabling it is not recommended in production.

      sudo pcs property set stonith-enabled=false
      
    10. (On any node) Check the cluster status.

      The cluster is now running.

      sudo pcs cluster status
      

      Example Output:

      Cluster Status:
       Cluster Summary:
         * Stack: corosync
         * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
         * Last updated: Wed May  4 16:47:55 2022
         * Last change:  Wed May  4 16:47:47 2022 by hacluster via crmd on ol-node03
         * 3 nodes configured
         * 0 resource instances configured
       Node List:
         * Online: [ ol-node01 ol-node02 ol-node03 ]
      
      PCSD Status:
        ol-node01: Online
        ol-node03: Online
        ol-node02: Online
      
    11. (On any node) Check the cluster’s details, including resources, pacemaker status, and node details.

      sudo pcs status
      

      Example Output:

      Cluster name: HA-NFS
      Cluster Summary:
        * Stack: corosync
        * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
        * Last updated: Wed May  4 16:50:21 2022
        * Last change:  Wed May  4 16:47:47 2022 by hacluster via crmd on ol-node03
        * 3 nodes configured
        * 0 resource instances configured
      
      Node List:
        * Online: [ ol-node01 ol-node02 ol-node03 ]
      
      Full List of Resources:
        * No resources
      
      Daemon Status:
        corosync: active/enabled
        pacemaker: active/enabled
        pcsd: active/enabled
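
    The node list in the status output can be checked from a script as well. The sketch below extracts the names on the "Online:" line and tests whether enough nodes are up. The function names online_nodes and enough_online are hypothetical, and the parsing assumes the output layout shown in this tutorial.

```shell
# Print the nodes listed as online in `pcs status` output (read from stdin).
online_nodes() {
    sed -n 's/.*\* Online: \[ \(.*\) \].*/\1/p'
}

# Succeed when at least the expected number of nodes is online.
enough_online() {
    expected=$1
    set -- $(online_nodes)   # split the node list into positional parameters
    [ "$#" -ge "$expected" ]
}

# Example use on any node:
#   sudo pcs status | enough_online 3 && echo "all nodes online"
```

    Parsing human-readable output is fragile across pcs versions, so treat this as a quick check rather than a monitoring solution.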

    Create Cluster Services

    Create a Pacemaker resource group containing the resources necessary to host NFS services from the hostname nfs (10.0.0.100) defined as a floating secondary IP address on ol-node01.

    1. (On all nodes) Configure the firewall to allow traffic on the ports that are specifically used by NFS.

      sudo firewall-cmd --permanent --zone=trusted --add-service=nfs
      sudo firewall-cmd --reload
      
    2. (On ol-node01) Create a systemd-based cluster resource to ensure nfs-ganesha is running.

      sudo pcs resource create nfs_server systemd:nfs-ganesha op monitor interval=10s
      
    3. (On ol-node01) Create an IP cluster resource used to present the NFS server.

      sudo pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24 op monitor interval=10s
      
    4. (On ol-node01) Join the Ganesha service and IP resource in a group to ensure they remain together on the same host.

      sudo pcs resource group add nfs_group nfs_server nfs_ip
      
    5. (On ol-node01) Verify that the service is now running.

      sudo pcs status
      

      Example Output:

      Cluster name: HA-NFS
      Cluster Summary:
        * Stack: corosync
        * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
        * Last updated: Wed May  4 16:52:56 2022
        * Last change:  Wed May  4 16:52:39 2022 by root via cibadmin on ol-node01
        * 3 nodes configured
        * 2 resource instances configured
      
      Node List:
        * Online: [ ol-node01 ol-node02 ol-node03 ]
      
      Full List of Resources:
        * Resource Group: nfs_group:
          * nfs_server	(systemd:nfs-ganesha):	 Started ol-node01
          * nfs_ip	(ocf::heartbeat:IPaddr2):	 Started ol-node01
      
      Daemon Status:
        corosync: active/enabled
        pacemaker: active/enabled
        pcsd: active/enabled
      

      Note: The DC (Designated Controller) node is where all the decisions get made, and if the current DC fails, Pacemaker elects a new one from the remaining cluster nodes. The choice of DC is of no significance to an administrator beyond the fact that its logs will generally be more interesting.
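
    To find out which node currently hosts a resource without reading the full status by eye, the resource lines can be filtered. This is a minimal sketch: the function name resource_node is hypothetical, and it assumes resource lines of the form "* name (type): Started node" as in the example output above.

```shell
# Print the node on which the named resource is started,
# reading `pcs status` output from stdin.
resource_node() {
    awk -v res="$1" '$1 == "*" && $2 == res {
        for (i = 3; i < NF; i++) if ($i == "Started") print $(i + 1)
    }'
}

# Example use on any node:
#   sudo pcs status | resource_node nfs_ip
```

    The function prints nothing when the resource is stopped or not listed.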

    Update the IPaddr2 resource agent configuration

    When a node in the cluster stops responding, Pacemaker calls the IPaddr2 resource agent to bring the floating IP address up on another node.

    We will customize this script to include details of our deployment (such as the VNIC OCIDs and IP addresses), and it will use those details when it calls the Oracle Cloud Infrastructure (OCI) Command Line Interface (CLI). The CLI will do the heavy lifting by asking OCI to migrate the secondary IP address from one node to the other.

    1. (On all nodes) Install the Oracle Linux Developer repository.

      sudo dnf install -y oraclelinux-developer-release-el8
      

      The repository is already installed and available in the free lab environment.

    2. (On all nodes) Install the OCI CLI.

      sudo dnf install -y python36-oci-cli
      
    3. (On ol-node01) Verify the OCI CLI install.

      The free lab environment uses Instance Principal for the authorization of the OCI CLI. For self-managed deployments, configure the same or set up the OCI CLI configuration file.

      export LC_ALL=C.UTF-8
      oci os ns get --auth instance_principal
      
    4. (On all nodes) Make a backup of the IPaddr2 file.

      sudo cp /usr/lib/ocf/resource.d/heartbeat/IPaddr2 /usr/lib/ocf/resource.d/heartbeat/IPaddr2.bak
      
    5. (On all nodes) Run the script to update the IPaddr2 file.

      The script changes the add_interface() function. When a node fails, Pacemaker runs IPaddr2 to move the resource to another node in the cluster, and IPaddr2 calls this function during that process.

      sudo ./update-ipaddr2.sh
      

      Here is a sample version of the script for reference.
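
    The sample script itself is not reproduced in this text. As a rough illustration only, and not the tutorial's actual script, the kind of OCI CLI call such a change might issue is sketched below. The function name print_assign_ip_cmd and the VNIC OCID are placeholders, and the command is printed rather than executed so that nothing changes in your tenancy.

```shell
# Print (not run) an OCI CLI call that would reassign a private IP to a VNIC.
# Both arguments are placeholders; the VNIC OCID must come from your own tenancy.
print_assign_ip_cmd() {
    vnic_ocid=$1
    floating_ip=$2
    printf 'oci network vnic assign-private-ip --unassign-if-already-assigned --vnic-id %s --ip-address %s --auth instance_principal\n' \
        "$vnic_ocid" "$floating_ip"
}

print_assign_ip_cmd 'ocid1.vnic.oc1..exampleuniqueID' 10.0.0.100
```

    Printing the command first makes it easy to review the VNIC and address before anything is moved.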

    Test NFS availability using a client

    We recommend opening two terminal windows for these steps, one for ol-node01 and one for ol-client, because we test failover from both.

    1. If not already connected, open a terminal and connect via ssh to both ol-node01 and ol-client.

      ssh oracle@<ip_address_of_instance>
      
    2. (On ol-client) Mount the NFS service provided by our cluster and create a file.

      sudo dnf install -y nfs-utils
      sudo mkdir /sharedvol
      sudo mount -t nfs nfs:/sharedvol /sharedvol
      df -h /sharedvol/
      echo "Hello from Oracle CloudWorld" | sudo tee /sharedvol/hello > /dev/null
      
    3. (On ol-node01) Identify the host running the nfs_group resources and put it in standby mode to stop running services.

      sudo pcs status
      

      Example Output:

      Cluster name: HA-NFS
      Cluster Summary:
        * Stack: corosync
        * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
        * Last updated: Thu May  5 00:48:07 2022
        * Last change:  Thu May  5 00:47:50 2022 by root via crm_resource on ol-node01
        * 3 nodes configured
        * 2 resource instances configured
      
      Node List:
        * Online: [ ol-node01 ol-node02 ol-node03 ]
      
      Full List of Resources:
        * Resource Group: nfs_group:
          * nfs_server	(systemd:nfs-ganesha):	    Started ol-node01
          * nfs_ip	      (ocf::heartbeat:IPaddr2):	 Started ol-node01
      
      Daemon Status:
        corosync: active/enabled
        pacemaker: active/enabled
        pcsd: active/enabled
      
      sudo pcs node standby ol-node01
      
    4. (On ol-node01) Verify that the nfs_group resources have moved to another node.

      sudo pcs status
      

      Example Output:

      Cluster name: HA-NFS
      Cluster Summary:
        * Stack: corosync
        * Current DC: ol-node03 (version 2.1.0-8.0.1.el8-7c3f660707) - partition with quorum
        * Last updated: Thu May  5 00:53:19 2022
        * Last change:  Thu May  5 00:53:08 2022 by root via cibadmin on ol-node01
        * 3 nodes configured
        * 2 resource instances configured
      
      Node List:
        * Node ol-node01: standby
        * Online: [ ol-node02 ol-node03 ]
      
      Full List of Resources:
        * Resource Group: nfs_group:
          * nfs_server	(systemd:nfs-ganesha):	    Started ol-node02
          * nfs_ip	      (ocf::heartbeat:IPaddr2):	 Started ol-node02
      
      Daemon Status:
        corosync: active/enabled
        pacemaker: active/enabled
        pcsd: active/enabled
      
    5. (On ol-node02) Verify the floating IP address moved from ol-node01 to ol-node02.

      ip a
      

      Example Output:

      ...
      2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 02:00:17:06:6a:dd brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.151/24 brd 10.0.0.255 scope global dynamic ens3
             valid_lft 83957sec preferred_lft 83957sec
          inet 10.0.0.100/24 brd 10.0.0.255 scope global secondary ens3
             valid_lft forever preferred_lft forever
          inet6 fe80::17ff:fe06:6add/64 scope link 
             valid_lft forever preferred_lft forever
      
    6. (On ol-client) Verify the file is still accessible.

      This action has a short delay as the service moves from one node to another.

      sudo ls -la /sharedvol/
      sudo cat /sharedvol/hello
      
    7. (On ol-node01) Bring the standby node back into the cluster.

      sudo pcs node unstandby ol-node01
      
    8. (On ol-node01) Verify that the node is back in the cluster.

      sudo pcs status
      
    9. (On ol-node01) Move resources back to ol-node01.

      sudo pcs resource move nfs_ip ol-node01
      
    10. (On ol-node01) Verify that the resources moved back to ol-node01.

      sudo pcs status
      
    11. (On ol-node01) Verify the floating IP address moved from ol-node02 to ol-node01.

      ip a
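
    Step 6 notes a short delay while the service moves between nodes. If a client-side command fails outright during that window, a small retry wrapper can smooth it over. The sketch below is one hedged way to do that; the function name retry and the attempt and delay values in the example are hypothetical choices, not values from this tutorial.

```shell
# Run a command up to `attempts` times, sleeping `delay` seconds between tries.
# Returns 0 as soon as the command succeeds, or 1 after the last failed attempt.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example use on ol-client:
#   retry 10 3 sudo cat /sharedvol/hello
```

    Depending on the NFS mount options, a read may simply block until the service returns instead of failing, in which case the wrapper succeeds on its first attempt.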
      

    We now understand how to use Pacemaker/Corosync to create highly available services backed by Gluster.

    Enable Gluster encryption

    Create a self-signed certificate for each node and have its peers trust it.

    For more options, see Setting up Transport Layer Security in the Gluster Storage for Oracle Linux User’s Guide.

    1. (On all nodes) Create a private key and create a certificate for this host signed with this key.

      sudo openssl genrsa -out /etc/ssl/glusterfs.key 2048
      sudo openssl req -new -x509 -days 365 -key /etc/ssl/glusterfs.key \
                                            -out /etc/ssl/glusterfs.pem \
                                            -subj "/CN=${HOSTNAME}/"
      
    2. (On ol-node01) Combine the certificate from each node into one file all nodes can trust.

      cat /etc/ssl/glusterfs.pem > ~/combined.ca.pem
      
      ssh ol-node02 cat /etc/ssl/glusterfs.pem >> ~/combined.ca.pem 
      
      ssh ol-node03 cat /etc/ssl/glusterfs.pem >> ~/combined.ca.pem 
      
    3. (On ol-node01) Copy the combined list of trusted certificates to the local system of each node for Gluster use.

      sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca
      
      scp ~/combined.ca.pem ol-node02:~
      
      scp ~/combined.ca.pem ol-node03:~
      
      ssh -t ol-node02 sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca > /dev/null 2>&1
      
      ssh -t ol-node03 sudo cp ~/combined.ca.pem /etc/ssl/glusterfs.ca > /dev/null 2>&1
      
      • The -t option allocates a pseudo-terminal for the remote command, which sudo may require.
    4. (On all nodes) Enable encryption for Gluster management traffic.

      sudo touch /var/lib/glusterd/secure-access
      
    5. (On ol-node01) Enable encryption on the Gluster volume sharedvol.

      sudo gluster volume set sharedvol client.ssl on
      sudo gluster volume set sharedvol server.ssl on
      
    6. (On all nodes) Restart the Gluster service.

      sudo systemctl restart glusterd
      
    7. (On any node) Verify the Gluster volume has transport encryption enabled.

      sudo gluster volume info
      

      Example Output:

      Volume Name: sharedvol
      Type: Replicate
      Volume ID: 674b73a8-8c09-457e-8996-4417db16651e
      Status: Started
      Snapshot Count: 0
      Number of Bricks: 1 x 3 = 3
      Transport-type: tcp
      Bricks:
      Brick1: ol-node01:/data/glusterfs/sharedvol/mybrick/brick
      Brick2: ol-node02:/data/glusterfs/sharedvol/mybrick/brick
      Brick3: ol-node03:/data/glusterfs/sharedvol/mybrick/brick
      Options Reconfigured:
      performance.client-io-threads: off
      nfs.disable: on
      transport.address-family: inet
      storage.fips-mode-rchecksum: on
      client.ssl: on
      server.ssl: on
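
    Because each node must trust all three certificates, it can be worth confirming that the combined file really holds three entries. The sketch below counts the PEM certificate headers in a file. The function name count_certs is hypothetical, and reading /etc/ssl/glusterfs.ca may need sudo depending on the file's permissions.

```shell
# Count the PEM certificates in a bundle file by counting BEGIN markers.
count_certs() {
    grep -cF -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Example use on any node (expects 3 for this three-node pool):
#   count_certs /etc/ssl/glusterfs.ca
```

    This only counts entries; it does not verify that each certificate belongs to the expected host or is still within its validity period.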
