As your cloud grows, it may be time to add another node. Similarly, you may find you no longer need some of your existing hardware.

This guide demonstrates how to add or remove a node from an OpenStack cloud. Removing a node is only applicable if you have added an additional Storage and Compute node via Flex Metal Central. If you have not added any additional nodes, every node in your cloud is a control plane node, and those nodes should not be removed.


Types of Hardware Nodes

The following are the node types currently provided through Flex Metal Central.

  • Cloud Core – Standard
    • This node type provides control plane and compute services.
  • Storage and Compute – Standard
    • This node type provides compute and storage services but is not part of the control plane.

Here is a screenshot showing three Cloud Core nodes and one Storage and Compute node:

Note that the three Cloud Core nodes are part of the original cloud. The Storage and Compute node was added to the cloud.


 

Adding a Node

Hardware nodes can be added to an existing cloud using Flex Metal Central.

There is currently only one node type that can be added: Storage and Compute – Standard. Additional node types will be added in future updates.

Once logged into Flex Metal Central, navigate to the cloud you are working with. Find the button near the top right that says “Add Hardware”. Clicking this button will allow you to add additional hardware to the cloud.

Add Hardware:

 

Add a new Storage and Compute – Standard node:


 

Removing a Compute Node

There is no automated way to safely remove a compute node. To remove one safely, you will need to ensure all data is removed from the node, including instances, Ceph data, and anything else. In addition, be very careful to confirm you are removing a compute-only node when following this section; the instructions for removing a control plane node differ from these steps. Using OpenStackClient, you can run openstack host list to determine the services each node provides.
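
For example, a quick check of a single node (perfect-lobster.local is the example compute node used later in this guide) is to filter the listing for that host; a compute-only node will show just the compute service, while control plane nodes also list control services such as the scheduler and conductor:

$ openstack host list | grep perfect-lobster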

NOTE! — When you are ready to remove the node, click the three vertical dots next to the node in Flex Metal Central, then click the Remove link. A screen will ask you to confirm the removal before it proceeds. It is strongly advised that you make sure everything has been removed from the node before doing this.


This section will demonstrate the steps needed to remove a hardware node. Note that this is a manual process.

NOTE! — If a compute node needs to be removed and you do not feel comfortable migrating data off of it, it is recommended you reach out to support who can help perform this task for you.

There are a number of things that need to be checked and considered before removing a hardware node. In addition, the type of hardware node dictates what needs to occur. This section describes the general process to remove a Storage and Compute node.


Requirements for Removing a Compute Node

The following is a high-level overview of the requirements needed to remove a compute node. Each section is expanded upon later in the guide.

  • This task can only be performed using the command line.
  • The Compute service on the node needs to be disabled.
  • Instances running on the node need to be migrated to another compute node.
  • All services on the node being removed need to be stopped.
  • The node's Ceph OSDs need to be removed from the Ceph cluster.
  • The Ansible inventories need to be updated:
    • Remove the host's entries from ceph-ansible
    • Remove the host's entries from kolla-ansible
  • OpenStack's service records need to be updated to reflect the node's removal.

Remove Compute Node

With the above in mind, this section will expand upon the steps in the previous section.

Reference: https://docs.openstack.org/kolla-ansible/latest/user/adding-and-removing-hosts.html


Procedure

Live migrate instances:

You will want to first find all instances on the node, confirm the receiving node has enough resources to host the instances, then perform a live migration of those instances.

NOTE! — Since Ceph is used for disk storage, you do not need to account for disk space when moving instances from one compute node to another. You will only need to account for RAM and VCPUs.

 

Step 1 — Disable instance scheduling on node

The node being removed needs to have instance scheduling disabled. This ensures no new instances can be created on the node.

To perform this, run openstack compute service set HOST nova-compute --disable.

For example, to disable instance scheduling for the perfect-lobster.local host run:

$ openstack compute service set perfect-lobster.local nova-compute --disable
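
To verify the change took effect, you can list the compute service for that host and check that the Status column now reads disabled:

$ openstack compute service list --host perfect-lobster.local -c Binary -c Host -c Status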

 

Step 2 — Collect instance information

This guide will assume you have a compute node called perfect-lobster.local that needs to be removed from the OpenStack cloud.

In addition, there are three instances on this node:

  • migrate_me-1
  • migrate_me-2
  • migrate_me-3

You can use openstack server list with additional flags to obtain the details of the instances on perfect-lobster.local.

For example:

$ openstack server list --host perfect-lobster.local -f value -c ID -c Name
a248c6b2-c4f5-4b5a-82ca-0dc71edd9757 migrate_me-2
ad489cd4-4c0c-42c7-aeae-21da6c00693b migrate_me-3
d143fb86-c7a4-4bc8-b6dc-7969097ef34b migrate_me-1

The above makes use of value output formatting and specifies the ID and Name columns as output.

 

Step 3 — Determine compute host

With the instance information acquired, the next step is to determine what compute host these instances can be migrated to.

To obtain all compute hosts, you can use openstack compute service list.

For example:

$ openstack compute service list
+----+----------------+-------------------------+----------+---------+-------+----------------------------+
| ID | Binary         | Host                    | Zone     | Status  | State | Updated At                 |
+----+----------------+-------------------------+----------+---------+-------+----------------------------+
| 12 | nova-scheduler | eager-sarahl.local      | internal | enabled | up    | 2021-01-28T23:26:59.000000 |
| 51 | nova-scheduler | busy-josephb.local      | internal | enabled | up    | 2021-01-28T23:26:55.000000 |
| 66 | nova-scheduler | pensive-michaelcu.local | internal | enabled | up    | 2021-01-28T23:26:55.000000 |
|  3 | nova-conductor | eager-sarahl.local      | internal | enabled | up    | 2021-01-28T23:27:01.000000 |
| 12 | nova-conductor | busy-josephb.local      | internal | enabled | up    | 2021-01-28T23:27:01.000000 |
| 27 | nova-conductor | pensive-michaelcu.local | internal | enabled | up    | 2021-01-28T23:27:01.000000 |
| 30 | nova-compute   | eager-sarahl.local      | nova     | enabled | up    | 2021-01-28T23:26:59.000000 |
| 33 | nova-compute   | busy-josephb.local      | nova     | enabled | up    | 2021-01-28T23:26:59.000000 |
| 36 | nova-compute   | pensive-michaelcu.local | nova     | enabled | up    | 2021-01-28T23:26:58.000000 |
| 37 | nova-compute   | perfect-lobster.local   | nova     | enabled | up    | 2021-01-28T23:26:56.000000 |
+----+----------------+-------------------------+----------+---------+-------+----------------------------+

The above output shows there are four compute hosts in total. One of them (perfect-lobster.local) is being removed from the cluster.

This example will select eager-sarahl.local as the new host to migrate the instances to.
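
If the cloud has many nodes, you can narrow the listing above to just the compute services, for example:

$ openstack compute service list --service nova-compute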

 

Step 4 — Check available host resources

You must first ensure this host has enough resources to accommodate these instances. To do so, you can run openstack host show HOSTNAME. In the output, the (total) row shows the host's overall capacity and the (used_now) row shows what is currently in use; comparing the two tells you how much room is left.

For example:

$ openstack host show eager-sarahl.local
+--------------------+----------------------------------+-----+-----------+---------+
| Host               | Project                          | CPU | Memory MB | Disk GB |
+--------------------+----------------------------------+-----+-----------+---------+
| eager-sarahl.local | (total)                          |  16 |    121026 |   11923 |
| eager-sarahl.local | (used_now)                       |   4 |      6144 |      25 |
| eager-sarahl.local | (used_max)                       |   2 |      2048 |      20 |
| eager-sarahl.local | b9e8639372014c0b85cbfaffa6e1b5a8 |   2 |      2048 |      20 |
+--------------------+----------------------------------+-----+-----------+---------+

 

Step 5 — Find instance resource usage

To ensure the instances will fit, you will also need to know what resources they consume. This means knowing the flavor assigned to each instance.

You can run something like this to get the flavor for each instance on the perfect-lobster.local node:

$ openstack server list --host perfect-lobster.local -f value -c ID -c Name -c Flavor
a248c6b2-c4f5-4b5a-82ca-0dc71edd9757 migrate_me-2 hc1.micro
ad489cd4-4c0c-42c7-aeae-21da6c00693b migrate_me-3 hc1.micro
d143fb86-c7a4-4bc8-b6dc-7969097ef34b migrate_me-1 hc1.micro

This shows the hc1.micro flavor is used by each instance on this host.

To see what resources are consumed by this flavor, use openstack flavor show FLAVOR:

$ openstack flavor show hc1.micro
+----------------------------+-----------+
| Field                      | Value     |
+----------------------------+-----------+
| OS-FLV-DISABLED:disabled   | False     |
| OS-FLV-EXT-DATA:ephemeral  | 0         |
| access_project_ids         | None      |
| disk                       | 10        |
| id                         | hc1.micro |
| name                       | hc1.micro |
| os-flavor-access:is_public | True      |
| properties                 |           |
| ram                        | 1024      |
| rxtx_factor                | 1.0       |
| swap                       | 1024      |
| vcpus                      | 1         |
+----------------------------+-----------+

This output shows the number of VCPUs and the amount of RAM and disk space the flavor allocates.

To get tidier output of the flavor details you can run:

$ openstack flavor show hc1.micro -c disk -c ram -c vcpus
+-------+-------+
| Field | Value |
+-------+-------+
| disk  | 10    |
| ram   | 1024  |
| vcpus | 1     |
+-------+-------+

From this information, it can be determined the receiving host will need 3 VCPUs, 3GB of RAM, and 30GB of disk space, since there are three instances to migrate and each uses the hc1.micro flavor. Remember, however, that disk space does not need to be accounted for, since Ceph provides the shared data storage used by every node.
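
If you want to double-check the per-flavor instance count on the node before doing this arithmetic, a quick tally of the Flavor column from the earlier server listing works; for this example it reports three hc1.micro instances:

$ openstack server list --host perfect-lobster.local -f value -c Flavor | sort | uniq -c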

 

Step 6 — Live migrate the instances

You can now safely migrate these instances to the eager-sarahl.local node.

The base command to perform the live migration is:

$ openstack --os-compute-api-version 2.79 server migrate \
    --live-migration --host HOSTNAME INSTANCE_UUID

The command to live migrate these three instances for this demonstration is:

openstack server list --host perfect-lobster.local -f value -c ID | while read id; do
  openstack --os-compute-api-version 2.79 server migrate --live-migration --host eager-sarahl.local ${id}
done

You can confirm the status of the live migration by using openstack server list:

$ openstack server list
+--------------------------------------+-----------------------------+-----------+-----------------------------------------+----------------------------+-----------+
| ID                                   | Name                        | Status    | Networks                                | Image                      | Flavor    |
+--------------------------------------+-----------------------------+-----------+-----------------------------------------+----------------------------+-----------+
| a248c6b2-c4f5-4b5a-82ca-0dc71edd9757 | migrate_me-2                | ACTIVE    | Internal=192.168.0.65                   | Ubuntu 20.04 (focal-amd64) | hc1.micro |
| ad489cd4-4c0c-42c7-aeae-21da6c00693b | migrate_me-3                | MIGRATING | Internal=192.168.0.147, 173.231.217.250 | Ubuntu 20.04 (focal-amd64) | hc1.micro |
| d143fb86-c7a4-4bc8-b6dc-7969097ef34b | migrate_me-1                | MIGRATING | Internal=192.168.0.185                  | Ubuntu 20.04 (focal-amd64) | hc1.micro |
| 85033a0f-66c6-41d4-b679-c7350da2685f | openstackclient_js_demo     | ACTIVE    | Internal=192.168.0.27                   | Ubuntu 20.04 (focal-amd64) | hc1.small |
| 0052cd0f-70fb-4cf7-8b13-2bec350c0e51 | openstackclient_jumpstation | ACTIVE    | Internal=192.168.0.228, 173.231.217.247 | N/A (booted from volume)   | hc1.small |
+--------------------------------------+-----------------------------+-----------+-----------------------------------------+----------------------------+-----------+

Here it can be seen two of the instances are being migrated.

 

Step 7 — Confirm live migration success

To confirm the live migration completed successfully and the instances are on the new host, you can use something similar to:

openstack server list -f value -c ID -c Name | grep migrate_me | while read id name; do
  echo "$name $(openstack server show -f value -c 'OS-EXT-SRV-ATTR:host' $id)"
done

The following is the actual output from running the above command:

migrate_me-2 eager-sarahl.local
migrate_me-3 eager-sarahl.local
migrate_me-1 eager-sarahl.local

This indicates the live migration was successful.

Confirm no instances remain on the original host:

$ openstack server list --host perfect-lobster.local

If the above returns no output, the live migration was a complete success and you can move on to the next step.


 

Remove Ceph OSDs:

After the instances have been migrated, it is time to remove the node’s OSDs from the Ceph cluster.

Reference: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/

This continues to assume the compute host being removed is perfect-lobster.local.

You will need to determine which OSDs are on this host and remove them from Ceph.

 

Step 1 — Determine OSDs

From any of the hardware nodes, you can use ceph osd tree to find which OSDs are on a particular host.

Example:

# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME                   STATUS  REWEIGHT  PRI-AFF
-1         11.64398  root default
-5          2.91100      host busy-josephb
 1    ssd   2.91100          osd.1                   up   1.00000  1.00000
-3          2.91100      host eager-sarahl
 0    ssd   2.91100          osd.0                   up   1.00000  1.00000
-7          2.91100      host pensive-michaelcu
 2    ssd   2.91100          osd.2                   up   1.00000  1.00000
-9          2.91100      host perfect-lobster
 3    ssd   2.91100          osd.3                   up   1.00000  1.00000

This indicates the host perfect-lobster.local has only one OSD, with an ID of 3. This is the OSD that will need to be removed.
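
As an optional cross-check, you can ask Ceph which host a given OSD lives on by querying the OSD's metadata (the grep simply pulls the hostname field out of the JSON output):

# ceph osd metadata 3 | grep hostname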

 

Step 2 — Mark the OSD out

To mark the OSD out of the cluster, so that Ceph begins migrating its data to the remaining OSDs, use the command ceph osd out OSD_NUMBER.

For example:

# ceph osd out 3
marked out osd.3.

Following that, watch Ceph's status using ceph -w to ensure the cluster returns to a healthy state (output truncated):

# ceph -w
  cluster:
    id:     a08c2963-1d75-42ef-be50-1fc61419623d
    health: HEALTH_WARN
            Degraded data redundancy: 441/13998 objects degraded (3.150%), 17 pgs degraded

  [...]

  io:
    recovery: 68 MiB/s, 14 keys/s, 11 objects/s

  progress:
    Rebalancing after osd.3 marked out (1s)
      [............................]

2021-01-29T15:43:20.780381+0000 mon.eager-sarahl [WRN] Health check update: Degraded data redundancy: 120/13998 objects degraded (0.857%), 5 pgs degraded (PG_DEGRADED)
2021-01-29T15:43:20.817317+0000 mon.eager-sarahl [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 120/13998 objects degraded (0.857%), 5 pgs degraded)
2021-01-29T15:43:20.817342+0000 mon.eager-sarahl [INF] Cluster is now healthy

The above shows the cluster enter a degraded state as the OSD is marked out and then return to a healthy state once rebalancing completes. This indicates no issues occurred while marking the OSD out.
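
If you prefer a one-shot status check over the streaming output of ceph -w, the same health summary is available with:

# ceph -s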

 

Step 3 — Stop the OSD

With the OSD marked out, its systemd service on the perfect-lobster.local host needs to be stopped.

In this case, the unit file for this OSD is called ceph-osd@3.service.

Stop the service:

# systemctl stop ceph-osd@3.service
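
To confirm the service has stopped, check its state; it should report inactive:

# systemctl is-active ceph-osd@3.service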

 

Step 4 — Remove OSD from Ceph configuration

The OSD will now need to be removed from Ceph’s configuration.

To do so, you will need to use ceph osd purge OSD_NUMBER --yes-i-really-mean-it:

# ceph osd purge 3 --yes-i-really-mean-it
purged osd.3

 

Step 5 — Remove the host from the Ceph CRUSH map

Even after the OSD is purged, the now-empty host entry remains in the CRUSH map and needs to be removed.

To remove the perfect-lobster host bucket from the CRUSH map, use:

# ceph osd crush rm perfect-lobster

 

Step 6 — Confirm OSD has been removed

You can use ceph osd tree to confirm the OSD from the node has been removed.

For example:

# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                   STATUS  REWEIGHT  PRI-AFF
-1         8.73299  root default                                       
-5         2.91100      host busy-josephb                              
 1    ssd  2.91100          osd.1                   up   1.00000  1.00000
-3         2.91100      host eager-sarahl                              
 0    ssd  2.91100          osd.0                   up   1.00000  1.00000
-7         2.91100      host pensive-michaelcu                         
 2    ssd  2.91100          osd.2                   up   1.00000  1.00000

Here it can be seen the perfect-lobster.local host is no longer present in this output.

These steps take care of the Ceph changes that are required.


 

Stop all services:

Next, all services running on the host being removed need to be stopped.

This section requires kolla-ansible be prepared on the host being removed. See the kolla-ansible guide to learn how.

Once kolla-ansible is prepared, you can stop all services on that host using a command that takes this form:

$ kolla-ansible -i <inventory> stop --yes-i-really-really-mean-it --limit <limit>

  • <inventory> — This is the kolla-ansible inventory file, located at /etc/fm-deploy/kolla-ansible-inventory
  • <limit> — This is the host on which kolla-ansible will run. Only the host specified by --limit will be affected.

The actual command used in this guide to stop all services on perfect-lobster.local appears this way:

$ kolla-ansible -i /etc/fm-deploy/kolla-ansible-inventory stop \
    --yes-i-really-really-mean-it --limit perfect-lobster.local
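
Assuming a Docker-based Kolla deployment, you can then log into the node being removed and verify that no Kolla service containers remain running:

# docker ps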

 

Remove host entry from kolla-ansible:

Next, the host’s entry needs to be removed from /etc/fm-deploy/kolla-ansible-inventory.

In this example, the host's entry appears under the [compute] heading:

[compute]
busy-josephb ansible_host=10.204.28.35
eager-sarahl ansible_host=10.204.24.138
pensive-michaelcu ansible_host=10.204.31.6
perfect-lobster ansible_host=10.204.35.2

Remove the entry for your host.

After the entry is removed, the file appears this way:

[compute]
busy-josephb ansible_host=10.204.28.35
eager-sarahl ansible_host=10.204.24.138
pensive-michaelcu ansible_host=10.204.31.6
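
A quick way to confirm the entry is gone is to search the inventory for the host; the command should return nothing:

$ grep perfect-lobster /etc/fm-deploy/kolla-ansible-inventory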

 

Clean up services on remaining nodes:

The next step is to remove the host from the compute and network service listings within OpenStack.

The following commands demonstrate removing the host perfect-lobster.local from the network agent list and the compute service list.

Remove host from network agent list:

openstack network agent list --host perfect-lobster.local -f value -c ID | while read id; do
  openstack network agent delete ${id}
done

Remove host from compute service list:

openstack compute service list --os-compute-api-version 2.53 --host perfect-lobster.local -f value -c ID | while read id; do
  openstack compute service delete --os-compute-api-version 2.53 ${id}
done

Confirm perfect-lobster.local has been removed from both listings by using:

$ openstack network agent list --host perfect-lobster.local
$ openstack compute service list --os-compute-api-version 2.53 --host perfect-lobster.local

Both commands should return no output.

 

Submit the request in Flex Metal Central to remove the node:

At this point, you are ready to request the node removal in Flex Metal Central.

Log in to Flex Metal Central and go to the Manage section for the cloud. From here you will see a listing of the hardware nodes. Find the node being removed and click the three vertical dots to the right of it. This will bring up the option to remove the node. Click Remove to initiate the process.

NOTE! — At this time, requesting a node's removal creates a ticket. Our support team will handle the request manually and follow up when it is done.


 

Test the remaining nodes to ensure everything still functions:

At this point, the node removal is complete. It is recommended you test the general functionality of the cloud: for example, confirm instances can still be created and that the remaining hardware nodes still respond to ping.
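
As a light smoke test (the flavor, image, and network names below come from the earlier examples; substitute values that exist in your cloud), you could ping each remaining node and boot, then delete, a small test instance:

$ ping -c 3 eager-sarahl.local
$ openstack server create --flavor hc1.micro --image "Ubuntu 20.04 (focal-amd64)" --network Internal smoke-test
$ openstack server delete smoke-test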