Auto-scaling is the process of dynamically adjusting the number of workers in a service cluster according to the current demand, without any direct input from an administrator.
The Catalyst Cloud Orchestration Service supports auto-scaling groups that can be used to scale out (increase the size of) or scale in (decrease the size of) a group of instances running in a stack.
The diagram below shows the layout and workflow of a typical orchestration stack with auto-scaling.
In this example, the Orchestration Service is managing a highly available web application with a load balancer.
An auto-scaling group is used to automatically launch and destroy web servers. The auto-scaling group also manages load balancer pool memberships, so once new instances come online they automatically start serving requests.
Note
A load balancer is not required to implement auto-scaling. Auto-scaling groups can be created for just a group of instances, allowing any kind of workload to be auto-scaled, not just web services.
The Alarm Service is used to monitor the state of the instances in the auto-scaling group. If CPU or memory usage exceeds the configured thresholds, the relevant alarm will trigger, notifying the auto-scaling group to add more instances to the cluster. Likewise, when demand drops, the relevant alarm will trigger and the auto-scaling group will reduce the number of running instances until load returns to the desired range.
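Conceptually, the pairing of high/low alarms with the auto-scaling group behaves like the following decision rule (an illustrative Python sketch, not part of the actual services; all names here are hypothetical):

```python
def desired_action(load, high, low, size, min_size, max_size):
    """Decide the scaling action the paired high/low alarms would request."""
    if load > high and size < max_size:
        return "scale_out"   # high alarm fires, group adds an instance
    if load < low and size > min_size:
        return "scale_in"    # low alarm fires, group removes an instance
    return None              # load within range, or group at a size limit
```

Note that when the group is already at its minimum or maximum size, an alarm may still be in a triggered state, but no scaling action results; this is discussed further below.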
In this section we show how to create an orchestration stack with auto-scaling enabled from a set of example templates.
We will go through some of the details of the templates themselves, send requests to the load balanced web application, and test scaling instances in the cluster to make sure everything works correctly.
Note
Before getting started, make sure you have your Catalyst Cloud user and system environment set up so that you can access the Orchestration Service using the OpenStack CLI.
The orchestration stack in this example creates everything needed to run the workloads themselves.
The only thing you need to create manually beforehand is an SSH keypair to assign to the instances; to learn how to do this, see the documentation for creating a new keypair or importing an existing keypair. Once you have your keypair, keep a record of its name, as it will be used to configure the stack.
First, clone the repository containing our Orchestration Service examples
from GitHub, and navigate to the hot/autoscaling directory containing
the auto-scaling example templates.
$ git clone https://github.com/catalyst-cloud/catalystcloud-orchestration.git
$ cd catalystcloud-orchestration/hot/autoscaling
Note
This section goes in depth on how auto-scaling is configured in the stack templates.
To skip the explanation and go straight to creating an auto-scaling stack, see Creating the stack.
In this example stack, the following resources are created:
An internal network and subnet for the instances
A security group for controlling access to/from the internal network
A router for the internal network, to allow Internet access
A bastion host with a floating IP, to allow SSH access into the cluster
A load balancer with another floating IP, to expose the web servers to the Internet as a highly available cluster from a single address
An auto-scaling group that adds and removes web server instances, and load balancer pool memberships, as needed
A set of alarms that monitor the state of the instances in the auto-scaling group, and notify the auto-scaling group to scale out or scale in the cluster when load exceeds the configured thresholds
The stack templates consist of the following files:
autoscaling.yaml - The master template for the cluster,
containing all common resource definitions such as the network,
the bastion host, the load balancer and the auto-scaling group configuration.
webserver.yaml - The template used to manage per-member resources
for the web servers in the auto-scaling group, such as the instance
definition and the load balancer pool membership.
env.yaml - The environment configuration, used in this case
to define webserver.yaml as the OS::Autoscaling::Webserver
resource type to be referenced in the master template.
user_data.sh - The shell script run on startup on the web server instances.
Most of the stack uses standard resource definitions, as you would see in other templates, but a number of special resource definitions are used to control the auto-scaling functionality.
The first resource to create is the auto-scaling group,
defined using the OS::Heat::AutoScalingGroup
resource type.
# The auto-scaling group for provisioning web servers.
autoscaling_group:
type: OS::Heat::AutoScalingGroup
properties:
min_size: {get_param: autoscaling_min_size}
max_size: {get_param: autoscaling_max_size}
resource:
type: OS::Autoscaling::Webserver
properties:
keypair: {get_param: keypair}
flavor: {get_param: webserver_flavor}
image: {get_param: webserver_image}
network: {get_resource: network}
security_groups:
- {get_resource: internal_security_group}
group: {get_resource: webserver_group}
loadbalancer_pool: {get_resource: loadbalancer_pool}
The auto-scaling group defines the type of resource to create in a cluster, the minimum and maximum size of the cluster, and other optional parameters that configure how rolling updates of resources are performed. By using a custom resource type as shown above, multiple child resources can be created per auto-scaling group member.
All instances in an auto-scaling group should also be added to a server group,
defined using the OS::Nova::ServerGroup
resource type.
# The server group for the cluster of web servers.
webserver_group:
type: OS::Nova::ServerGroup
properties:
policies:
- {get_param: webserver_group_policy}
Server groups have two purposes here:
Setting a hard or soft anti-affinity policy on the server group ensures that no two auto-scaling group members end up on the same physical machine, protecting against hypervisor failures (for more info, see Using server affinity for HA).
The alarms that monitor load across the auto-scaling group query metrics by server group, as a way to associate the auto-scaling group members with each other.
In the instance definition for the auto-scaling group members, the group
scheduler hint and the metering.server_group metadata attribute are used
to correctly configure the server group on the instances.
# An instance to be managed by an auto-scaling group.
# Define as the resource property of an OS::Heat::AutoScalingGroup resource,
# or inside a custom resource type along with other required per-member resources
# (e.g. load balancer pool memberships).
webserver:
type: OS::Nova::Server
properties:
image: {get_param: image}
flavor: {get_param: flavor}
networks:
- network: {get_param: network}
key_name: {get_param: keypair}
security_groups: {get_param: security_groups}
scheduler_hints:
group: {get_param: group}
metadata:
metering.server_group: {get_param: group}
config_drive: true
user_data_format: RAW
user_data: {get_file: user_data.sh}
Now that we have the auto-scaling group and the underlying instances correctly configured, we need to define exactly how instances should be scaled.
Scaling policies are defined for the auto-scaling group
using the OS::Heat::ScalingPolicy
resource type.
# The policy for scaling out web servers when load is high.
autoscaling_up_policy:
type: OS::Heat::ScalingPolicy
properties:
adjustment_type: change_in_capacity
auto_scaling_group_id: {get_resource: autoscaling_group}
scaling_adjustment: 1
cooldown: {get_param: autoscaling_granularity}
# The policy for scaling in web servers when load is low.
autoscaling_down_policy:
type: OS::Heat::ScalingPolicy
properties:
adjustment_type: change_in_capacity
auto_scaling_group_id: {get_resource: autoscaling_group}
scaling_adjustment: -1
cooldown: {get_param: autoscaling_granularity}
These configure exactly what happens when a scaling action is triggered for the auto-scaling group, such as the number of instances to add or remove at one time, and the required cooldown time between scaling actions. A separate policy is required for each type of scaling action, in this case scaling out (up policy) and scaling in (down policy).
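The effect of a change_in_capacity policy can be sketched as a clamped adjustment (illustrative Python only; Heat performs this internally when the policy is signalled):

```python
def apply_change_in_capacity(current_size, scaling_adjustment, min_size, max_size):
    """Apply a change_in_capacity scaling adjustment to a group.

    Adds the (possibly negative) adjustment to the current size, then
    clamps the result to the group's configured min/max bounds.
    """
    return max(min_size, min(max_size, current_size + scaling_adjustment))
```

With scaling_adjustment set to 1 in the up policy and -1 in the down policy, each triggered scaling action grows or shrinks the cluster by one instance at a time, never beyond the configured bounds.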
The final piece of the puzzle is automating the scaling actions, which is implemented using specially configured alarms.
Resource metric aggregate threshold alarms,
managed using the OS::Aodh::GnocchiAggregationByResourcesAlarm
resource type,
can be used to monitor the state of metrics across all active instances
in the auto-scaling group.
# The alarm that triggers a scale out when CPU usage exceeds the threshold.
autoscaling_cpu_high_alarm:
type: OS::Aodh::GnocchiAggregationByResourcesAlarm
properties:
description:
str_replace:
template: Scale out if average CPU usage exceeds threshold%
params:
threshold: {get_param: autoscaling_cpu_high_threshold}
resource_type: instance
metric: cpu
aggregation_method: "rate:mean"
granularity: {get_param: autoscaling_granularity}
threshold:
yaql:
# 10^9 nanoseconds * number of vCPUs * granularity in seconds * (threshold in percent / 100)
expression: >-
1000000000
* int(regex("^c[^.]\.c([0-9]+).*$").replace($.data.flavor, "\g<1>"))
* $.data.granularity
* (float($.data.threshold) / 100)
data:
flavor: {get_param: webserver_flavor}
granularity: {get_param: autoscaling_granularity}
threshold: {get_param: autoscaling_cpu_high_threshold}
query:
str_replace:
template: '{"and": [{"=": {"server_group": "group_id"}}, {"=": {"ended_at": null}}]}'
params:
group_id: {get_resource: webserver_group}
comparison_operator: gt
evaluation_periods: 1
alarm_actions:
- {get_attr: [autoscaling_up_policy, alarm_url]}
- str_replace:
template: trust+url
params:
url: {get_attr: [autoscaling_up_policy, signal_url]}
repeat_actions: true
# The alarm that triggers a scale in when CPU usage goes below the threshold.
autoscaling_cpu_low_alarm:
type: OS::Aodh::GnocchiAggregationByResourcesAlarm
properties:
description:
str_replace:
template: Scale in if average CPU usage goes below threshold%
params:
threshold: {get_param: autoscaling_cpu_low_threshold}
resource_type: instance
metric: cpu
aggregation_method: "rate:mean"
granularity: {get_param: autoscaling_granularity}
threshold:
yaql:
# 10^9 nanoseconds * number of vCPUs * granularity in seconds * (threshold in percent / 100)
expression: >-
1000000000
* int(regex("^c[^.]\.c([0-9]+).*$").replace($.data.flavor, "\g<1>"))
* $.data.granularity
* (float($.data.threshold) / 100)
data:
flavor: {get_param: webserver_flavor}
granularity: {get_param: autoscaling_granularity}
threshold: {get_param: autoscaling_cpu_low_threshold}
query:
str_replace:
template: '{"and": [{"=": {"server_group": "group_id"}}, {"=": {"ended_at": null}}]}'
params:
group_id: {get_resource: webserver_group}
comparison_operator: lt
evaluation_periods: 1
alarm_actions:
- {get_attr: [autoscaling_down_policy, alarm_url]}
- str_replace:
template: trust+url
params:
url: {get_attr: [autoscaling_down_policy, signal_url]}
repeat_actions: true
# The alarm that triggers a scale out when memory usage exceeds the threshold.
autoscaling_memory_high_alarm:
type: OS::Aodh::GnocchiAggregationByResourcesAlarm
properties:
description:
str_replace:
template: Scale out if average memory usage exceeds threshold%
params:
threshold: {get_param: autoscaling_memory_high_threshold}
resource_type: instance
metric: memory.usage
aggregation_method: mean
granularity: {get_param: autoscaling_granularity}
threshold:
yaql:
# RAM in GiB * 1024 to convert to MiB * (threshold in percent / 100)
expression: >-
int(regex("^c[^.]\.c[0-9]+r([0-9]+).*$").replace($.data.flavor, "\g<1>"))
* 1024
* (float($.data.threshold) / 100)
data:
flavor: {get_param: webserver_flavor}
threshold: {get_param: autoscaling_memory_high_threshold}
query:
str_replace:
template: '{"and": [{"=": {"server_group": "group_id"}}, {"=": {"ended_at": null}}]}'
params:
group_id: {get_resource: webserver_group}
comparison_operator: gt
evaluation_periods: 1
alarm_actions:
- {get_attr: [autoscaling_up_policy, alarm_url]}
- str_replace:
template: trust+url
params:
url: {get_attr: [autoscaling_up_policy, signal_url]}
repeat_actions: true
# The alarm that triggers a scale in when memory usage goes below the threshold.
autoscaling_memory_low_alarm:
type: OS::Aodh::GnocchiAggregationByResourcesAlarm
properties:
description:
str_replace:
template: Scale in if average memory usage goes below threshold%
params:
threshold: {get_param: autoscaling_memory_low_threshold}
resource_type: instance
metric: memory.usage
aggregation_method: mean
granularity: {get_param: autoscaling_granularity}
threshold:
yaql:
# RAM in GiB * 1024 to convert to MiB * (threshold in percent / 100)
expression: >-
int(regex("^c[^.]\.c[0-9]+r([0-9]+).*$").replace($.data.flavor, "\g<1>"))
* 1024
* (float($.data.threshold) / 100)
data:
flavor: {get_param: webserver_flavor}
threshold: {get_param: autoscaling_memory_low_threshold}
query:
str_replace:
template: '{"and": [{"=": {"server_group": "group_id"}}, {"=": {"ended_at": null}}]}'
params:
group_id: {get_resource: webserver_group}
comparison_operator: lt
evaluation_periods: 1
alarm_actions:
- {get_attr: [autoscaling_down_policy, alarm_url]}
- str_replace:
template: trust+url
params:
url: {get_attr: [autoscaling_down_policy, signal_url]}
repeat_actions: true
In the above example, both CPU usage and memory usage are monitored for load. Two alarms are required for each monitored metric: one that triggers when the high threshold is exceeded, and another that triggers when load drops below the low threshold. This results in four alarms being created for the auto-scaling group.
The example templates allow you to configure the load thresholds as percentages, which is convenient because there is no need to hard-code the number of vCPUs or the amount of available RAM into the monitoring configuration. However, since the monitored metrics are not exposed by the Catalyst Cloud Metrics Service as percentages, some templating is performed to convert each percentage into the correct threshold figure, based on the flavour used by the instances and the granularity of the queries.
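The conversion performed by the yaql expressions above can be reproduced in plain Python. The regexes mirror those in the templates, which assume Catalyst Cloud flavour names of the form cX.cYrZ (Y vCPUs, Z GiB of RAM):

```python
import re

def cpu_threshold_ns(flavor, granularity_s, threshold_pct):
    # cpu is a cumulative counter in nanoseconds, so a rate:mean threshold is
    # 10^9 ns * number of vCPUs * granularity in seconds * (percent / 100).
    vcpus = int(re.match(r"^c[^.]\.c([0-9]+)", flavor).group(1))
    return 1_000_000_000 * vcpus * granularity_s * (threshold_pct / 100)

def memory_threshold_mib(flavor, threshold_pct):
    # memory.usage is reported in MiB: RAM in GiB * 1024 * (percent / 100).
    ram_gib = int(re.match(r"^c[^.]\.c[0-9]+r([0-9]+)", flavor).group(1))
    return ram_gib * 1024 * (threshold_pct / 100)
```

For example, a single-vCPU flavour with a 600 second granularity and a 20% threshold yields a CPU threshold of 120000000000.0 nanoseconds, matching the threshold visible in the alarm output shown later in this section.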
Alarm actions are configured on the alarms to notify the appropriate auto-scaling policy, which runs the scaling action. Once load is better distributed across the cluster, the alarms will recover and the cluster will run with the same number of workers until changes in demand cause the alarms to trigger again.
And that’s it! With the above resources added to your stack, you should have a functioning auto-scaling cluster.
Let’s create a new stack and get our example resources up and running.
A number of parameters are available in the example templates
(for more info see autoscaling.yaml), but the only required one
is keypair, which sets the SSH keypair used to login to the instances.
The below command will create a new stack called autoscaling-example
(we will refer to the stack using this name from now on).
Run this command, setting keypair to the name of the keypair you’d
like to use.
openstack stack create autoscaling-example --template autoscaling.yaml \
--environment env.yaml \
--parameter "keypair=<NAME>"
The Orchestration Service will start creating the resources in the background.
$ openstack stack create autoscaling-example --template autoscaling.yaml --environment env.yaml --parameter "keypair=example-keypair"
+---------------------+-----------------------------------------------------------------------+
| Field | Value |
+---------------------+-----------------------------------------------------------------------+
| id | dd254c00-2424-4b9a-a5a0-fff6bf9dc046 |
| stack_name | autoscaling-example |
| description | An example Catalyst Cloud Orchestration Service template |
| | for building a cluster of web servers with auto-scaling. |
| | |
| | This provisions the following cloud resources: |
| | |
| | * Security groups to control access to/from instances. |
| | * An internal network for all instances. |
| | * A bastion host with its own floating IP to allow SSH access |
| | into the cluster. |
| | * A load balancer to allow highly available access to the web servers |
| | from the Internet, with its own floating IP. |
| | * An auto-scaling group that launches, monitors and scales |
| | a cluster of web server instances depending on the configured |
| | load thresholds. |
| creation_time | 2025-10-10T00:55:35Z |
| updated_time | None |
| stack_status | CREATE_IN_PROGRESS |
| stack_status_reason | Stack CREATE started |
+---------------------+-----------------------------------------------------------------------+
To monitor resource creation in real time, you can use the watch command
to continuously report the status of all created resources.
watch openstack stack resource list autoscaling-example
It will take a few minutes for all resources in the stack to reach CREATE_COMPLETE state.
+-----------------------------------+-------------------------------------------------------------------------------------+----------------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
+-----------------------------------+-------------------------------------------------------------------------------------+----------------------------------------------+-----------------+----------------------+
| autoscaling_up_policy | 41e0751171fc4982acfef2c565f29ea7 | OS::Heat::ScalingPolicy | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| subnet | 9c45e1a1-9878-40cb-b985-8dbdcfbf1339 | OS::Neutron::Subnet | CREATE_COMPLETE | 2025-10-10T22:06:18Z |
| bastion_server | 1842093a-eb7a-4380-9fe9-aab83bc95c4c | OS::Nova::Server | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| network | d0331ea5-b6fb-4a52-bdfe-61375793ed1f | OS::Neutron::Net | CREATE_COMPLETE | 2025-10-10T22:06:18Z |
| autoscaling_memory_low_alarm | 5c55e371-f16c-4120-9fe2-9f3be11d71ab | OS::Aodh::GnocchiAggregationByResourcesAlarm | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| internal_security_group_rule_ssh | dee7d066-cb4c-4e66-a621-1fee53ccf3e8 | OS::Neutron::SecurityGroupRule | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| loadbalancer_floating_ip | 8944d3e1-17e0-4d08-bc74-2a7f68491b0d | OS::Neutron::FloatingIP | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| autoscaling_group | 5c1c51ab-02f9-471c-a760-7b9e06426808 | OS::Heat::AutoScalingGroup | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| autoscaling_cpu_low_alarm | f52a14d8-e1c4-4597-a08f-9c07b07ddb06 | OS::Aodh::GnocchiAggregationByResourcesAlarm | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| internal_security_group | 5f8ce8f0-c6ca-4ece-9b8a-c2f6f5525123 | OS::Neutron::SecurityGroup | CREATE_COMPLETE | 2025-10-10T22:06:18Z |
| webserver_group | 7a612ef2-ad1d-49fa-a72d-51253761cdda | OS::Nova::ServerGroup | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| loadbalancer_pool | 65b27c46-aa58-4986-8846-ffd80e2b8b24 | OS::Octavia::Pool | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| autoscaling_cpu_high_alarm | 334411fa-8f4b-482d-ba05-c4399a7a3393 | OS::Aodh::GnocchiAggregationByResourcesAlarm | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| router | 006e2146-16d2-42d6-99d6-301bb1f130c8 | OS::Neutron::Router | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| bastion_floating_ip | 36b944e0-b707-4fc7-91be-23fe89bf4c6b | OS::Neutron::FloatingIP | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| loadbalancer_listener | 2aba82b0-7970-45d3-9263-69ec8b65a6dd | OS::Octavia::Listener | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| router_interface | 006e2146-16d2-42d6-99d6-301bb1f130c8:subnet_id=9c45e1a1-9878-40cb-b985-8dbdcfbf1339 | OS::Neutron::RouterInterface | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| internal_security_group_rule_http | 905afa55-6c92-467c-83cb-68601de482d6 | OS::Neutron::SecurityGroupRule | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| autoscaling_memory_high_alarm | a49a2804-f06d-48c4-a296-6fb925e4503e | OS::Aodh::GnocchiAggregationByResourcesAlarm | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
| loadbalancer | 71a4c4ee-f051-4e98-abe5-cd1e98684202 | OS::Octavia::LoadBalancer | CREATE_COMPLETE | 2025-10-10T22:06:18Z |
| autoscaling_down_policy | b596018eafbc4d6db4b4b0673b851816 | OS::Heat::ScalingPolicy | CREATE_COMPLETE | 2025-10-10T22:06:17Z |
+-----------------------------------+-------------------------------------------------------------------------------------+----------------------------------------------+-----------------+----------------------+
Note
If any resources end up in CREATE_FAILED state, you can find out
the cause using the following command:
openstack stack failures list autoscaling-example
A common reason for resource creation failing is exceeding your project quota while attempting to create the stack. Free up some resources in your project, and try again by deleting and recreating the stack.
openstack stack delete autoscaling-example
If all resources were created successfully, the cluster should now be up and running, so let’s test connectivity to everything.
First, we’ll log in to the bastion host. Fetch the floating IP address of the bastion host, and then use SSH to log in. You should be presented with a console on the bastion host.
$ openstack stack output show autoscaling-example bastion_floating_ip -c output_value -f value
192.0.2.1
$ ssh -i ~/.ssh/keypair_private_key.pem ubuntu@192.0.2.1
The authenticity of host '192.0.2.1 (192.0.2.1)' can't be established.
ED25519 key fingerprint is SHA256:p+3UXWE0tFEIMUPMif6e0+n9gSHLwjveIvmcuugC3Rc.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.0.2.1' (ED25519) to the list of known hosts.
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-59-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Fri Oct 10 22:23:03 UTC 2025
System load: 0.14 Processes: 96
Usage of /: 18.2% of 8.65GB Users logged in: 0
Memory usage: 17% IPv4 address for ens3: 10.0.0.102
Swap usage: 0%
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
The list of available updates is more than a week old.
To check for new updates run: sudo apt update
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
ubuntu@autoscaling-example-bastion-server-radhlpt36blw:~$
If this works, you can use SSH from the bastion host to log in to the web server instances as needed.
Next, let’s check that the web application is working. Run the following command to get the floating IP address of the load balancer.
$ openstack stack output show autoscaling-example loadbalancer_floating_ip -c output_value -f value
192.0.2.2
The web application is served via unencrypted HTTP on port 80, so simply use
curl to make a request.
$ curl http://192.0.2.2
Hello, world! This request was served by au-x-u6thgn62eec6-ev65hjmjcmss-webserver-vdq2wg3l2mcv (10.0.0.151).
If you receive a response similar to the one above, congratulations! Your highly available web application is now up and running.
This cluster runs with a minimum of 2 web servers. Keep running the command until you have received a response from all running instances.
$ curl http://192.0.2.2
Hello, world! This request was served by au-x-u6thgn62eec6-ev65hjmjcmss-webserver-vdq2wg3l2mcv (10.0.0.151).
$ curl http://192.0.2.2
Hello, world! This request was served by au-x-k2va3k4min64-kgypkpgxo2t3-webserver-goiwpb5kbnil (10.0.0.215).
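The "keep requesting until every backend has answered" loop we just performed by hand can be sketched generically (illustrative Python; fetch stands in for a hypothetical function that performs one HTTP request and returns the responding server's identifier):

```python
def collect_responders(fetch, expected_count, max_requests=100):
    """Call fetch() repeatedly until expected_count distinct responders
    have been seen, or the request budget is exhausted."""
    seen = set()
    for _ in range(max_requests):
        seen.add(fetch())
        if len(seen) >= expected_count:
            break
    return seen
```

With round-robin load balancing, a handful of requests is normally enough to hear back from every member of a small pool.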
Let’s take a deeper dive into the auto-scaling aspect of the cluster.
We can check how many instances are running at the moment by getting the currently active server group members.
$ openstack stack resource show autoscaling-example webserver_group -c attributes -f json | jq '.attributes.members'
[
"a6745eaf-2939-4899-9966-3aaa229f617f",
"6cd4fef9-ba78-4342-afa7-ff5d6d12243e"
]
There are 4 auto-scaling alarms created in this example, with the following stack resource names:
autoscaling_cpu_high_alarm
autoscaling_cpu_low_alarm
autoscaling_memory_high_alarm
autoscaling_memory_low_alarm
Check the state of the alarms to see if any are calling for scaling actions to be run.
$ openstack stack resource show autoscaling-example autoscaling_cpu_high_alarm -c attributes -f json | jq '.attributes'
{
"alarm_actions": [
"https://api.nz-por-1.catalystcloud.io:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3A9864e20f92ef47238becfe06b869d2ac%3Astacks/autoscaling-example/cb1e6eae-b59d-4788-8b9b-a21d2d16c150/resources/autoscaling_up_policy?SignatureMethod=HmacSHA256&AWSAccessKeyId=58caf713909b4be1813a63757cc89e1c&SignatureVersion=2&Signature=eBn3wJ21VeqpNxj%2BA00L0Q88joislaLEqPdDF7FZvfs%3D",
"trust+https://api.nz-por-1.catalystcloud.io:8004/v1/9864e20f92ef47238becfe06b869d2ac/stacks/autoscaling-example/cb1e6eae-b59d-4788-8b9b-a21d2d16c150/resources/autoscaling_up_policy/signal"
],
"ok_actions": [],
"name": "autoscaling-example-autoscaling_cpu_high_alarm-p6bosunxssog",
"timestamp": "2025-10-10T22:08:11.081755",
"description": "Scale out if average CPU usage exceeds 20%",
"time_constraints": [],
"enabled": true,
"state_timestamp": "2025-10-10T22:08:11.081755",
"gnocchi_aggregation_by_resources_threshold_rule": {
"evaluation_periods": 1,
"metric": "cpu",
"threshold": 120000000000.0,
"granularity": 600,
"aggregation_method": "rate:mean",
"query": "{\"and\": [{\"or\": [{\"=\": {\"created_by_project_id\": \"9864e20f92ef47238becfe06b869d2ac\"}}, {\"and\": [{\"=\": {\"created_by_project_id\": \"ceecc421f7994cc397380fae5e495179\"}}, {\"=\": {\"project_id\": \"9864e20f92ef47238becfe06b869d2ac\"}}]}]}, {\"and\": [{\"=\": {\"server_group\": \"7a612ef2-ad1d-49fa-a72d-51253761cdda\"}}, {\"=\": {\"ended_at\": null}}]}]}",
"comparison_operator": "gt",
"resource_type": "instance"
},
"alarm_id": "334411fa-8f4b-482d-ba05-c4399a7a3393",
"state": "insufficient data",
"insufficient_data_actions": [],
"repeat_actions": true,
"user_id": "517bcd700274432d96f43616ac1e37ea",
"state_reason": "Not evaluated yet",
"project_id": "9864e20f92ef47238becfe06b869d2ac",
"type": "gnocchi_aggregation_by_resources_threshold",
"evaluate_timestamp": "2025-10-10T23:23:43",
"severity": "low"
}
Note that the alarm is in insufficient data state. This is normal for
newly created clusters; the Metrics Service collects compute metrics every
10 minutes, so it can take up to 20 minutes before there are enough metrics
for the alarms to be evaluated correctly.
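The "up to 20 minutes" figure follows from the collection interval. A rough estimate (back-of-the-envelope arithmetic, not an official formula) is one interval before the first sample lands, plus one full interval per evaluation period:

```python
def worst_case_wait_minutes(collection_interval_min, evaluation_periods):
    # One interval may pass before the first sample arrives, plus one
    # complete interval per evaluation period before the alarm evaluates.
    return collection_interval_min * (1 + evaluation_periods)
```

With the 10 minute compute collection interval and evaluation_periods set to 1 (as in the example templates), this gives 20 minutes.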
Once some time has passed, the alarm should transition into ok state.
$ openstack stack resource show autoscaling-example autoscaling_cpu_high_alarm -c attributes -f json | jq '.attributes'
{
"alarm_actions": [
"https://api.nz-por-1.catalystcloud.io:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3A9864e20f92ef47238becfe06b869d2ac%3Astacks/autoscaling-example/cb1e6eae-b59d-4788-8b9b-a21d2d16c150/resources/autoscaling_up_policy?SignatureMethod=HmacSHA256&AWSAccessKeyId=58caf713909b4be1813a63757cc89e1c&SignatureVersion=2&Signature=eBn3wJ21VeqpNxj%2BA00L0Q88joislaLEqPdDF7FZvfs%3D",
"trust+https://api.nz-por-1.catalystcloud.io:8004/v1/9864e20f92ef47238becfe06b869d2ac/stacks/autoscaling-example/cb1e6eae-b59d-4788-8b9b-a21d2d16c150/resources/autoscaling_up_policy/signal"
],
"ok_actions": [],
"name": "autoscaling-example-autoscaling_cpu_high_alarm-p6bosunxssog",
"timestamp": "2025-10-10T22:08:11.081755",
"description": "Scale out if average CPU usage exceeds 20%",
"time_constraints": [],
"enabled": true,
"state_timestamp": "2025-10-10T22:08:11.081755",
"gnocchi_aggregation_by_resources_threshold_rule": {
"evaluation_periods": 1,
"metric": "cpu",
"threshold": 120000000000.0,
"granularity": 600,
"aggregation_method": "rate:mean",
"query": "{\"and\": [{\"or\": [{\"=\": {\"created_by_project_id\": \"9864e20f92ef47238becfe06b869d2ac\"}}, {\"and\": [{\"=\": {\"created_by_project_id\": \"ceecc421f7994cc397380fae5e495179\"}}, {\"=\": {\"project_id\": \"9864e20f92ef47238becfe06b869d2ac\"}}]}]}, {\"and\": [{\"=\": {\"server_group\": \"7a612ef2-ad1d-49fa-a72d-51253761cdda\"}}, {\"=\": {\"ended_at\": null}}]}]}",
"comparison_operator": "gt",
"resource_type": "instance"
},
"alarm_id": "334411fa-8f4b-482d-ba05-c4399a7a3393",
"state": "ok",
"insufficient_data_actions": [],
"repeat_actions": true,
"user_id": "517bcd700274432d96f43616ac1e37ea",
"state_reason": "Transition to ok due to 1 samples inside threshold, most recent: 2780000000.0",
"project_id": "9864e20f92ef47238becfe06b869d2ac",
"type": "gnocchi_aggregation_by_resources_threshold",
"evaluate_timestamp": "2025-10-10T23:23:43",
"severity": "low"
}
Note
You may find that the “low” threshold alarms are always in alarm state
due to the load on the instances being lower than the configured thresholds.
Normally this would result in a scale in, but because we are already at the configured minimum number of instances in the cluster (2), nothing happens. Similarly, when load exceeds the “high” thresholds and the cluster is already running the maximum number of instances, no scale outs are performed despite the alarms being triggered.
This is normal behaviour, and there are no negative side effects from the alarms constantly being in a triggered state.
We can now test that auto-scaling actually works as intended by inducing a load on one of the web servers.
First, fetch the internal IP address of one of the web server instances using the IDs we fetched earlier.
$ openstack server show a6745eaf-2939-4899-9966-3aaa229f617f -c addresses -f json | jq --raw-output '.addresses | to_entries | [first][0].value[0]'
10.0.0.215
With this, we can log in to the web server via the bastion host.
Note
Open a new terminal tab or window for interacting with the web server, as we will be coming back to the OpenStack CLI afterwards to keep an eye on the status of the cluster.
The easiest way of doing this is to start an SSH agent in your
terminal, add the SSH key to it, and use the ssh -J option
when logging in to configure the bastion host as the proxy jump host.
$ eval $(ssh-agent -s)
Agent pid 1281681
$ ssh-add ~/.ssh/keypair_private_key.pem
Identity added: ~/.ssh/keypair_private_key.pem (example@example.com)
$ ssh -J ubuntu@192.0.2.1 ubuntu@10.0.0.215
Warning: Permanently added '10.0.0.215' (ED25519) to the list of known hosts.
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-59-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Sat Oct 11 01:09:45 UTC 2025
System load: 0.0 Processes: 101
Usage of /: 18.5% of 8.65GB Users logged in: 0
Memory usage: 20% IPv4 address for ens3: 10.0.0.215
Swap usage: 0%
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
The list of available updates is more than a week old.
To check for new updates run: sudo apt update
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
ubuntu@au-x-k2va3k4min64-kgypkpgxo2t3-webserver-goiwpb5kbnil:~$
Next, update the system, and then use APT to install the stress package.
$ sudo apt update
$ sudo apt upgrade
$ sudo apt install stress
Finally, run the stress command in the background
to put the web server under full CPU utilisation.
The load can then be verified with the htop command.
$ stress --cpu 2 --timeout 1800s &
$ htop
Go back to your original terminal with the OpenStack CLI active. If you check the state of the CPU high alarm again, you should see that it has now been triggered.
Note
Due to compute metrics being collected by the Metrics Service every 10 minutes (as noted above), it will take up to 20 minutes for the alarms to register the increase in load.
$ openstack stack resource show autoscaling-example autoscaling_cpu_high_alarm -c attributes -f json | jq '.attributes'
{
"alarm_actions": [
"https://api.nz-por-1.catalystcloud.io:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3A9864e20f92ef47238becfe06b869d2ac%3Astacks/autoscaling-example/cb1e6eae-b59d-4788-8b9b-a21d2d16c150/resources/autoscaling_up_policy?SignatureMethod=HmacSHA256&AWSAccessKeyId=58caf713909b4be1813a63757cc89e1c&SignatureVersion=2&Signature=eBn3wJ21VeqpNxj%2BA00L0Q88joislaLEqPdDF7FZvfs%3D",
"trust+https://api.nz-por-1.catalystcloud.io:8004/v1/9864e20f92ef47238becfe06b869d2ac/stacks/autoscaling-example/cb1e6eae-b59d-4788-8b9b-a21d2d16c150/resources/autoscaling_up_policy/signal"
],
"ok_actions": [],
"name": "autoscaling-example-autoscaling_cpu_high_alarm-p6bosunxssog",
"timestamp": "2025-10-10T22:08:11.081755",
"description": "Scale out if average CPU usage exceeds 20%",
"time_constraints": [],
"enabled": true,
"state_timestamp": "2025-10-10T22:08:11.081755",
"gnocchi_aggregation_by_resources_threshold_rule": {
"evaluation_periods": 1,
"metric": "cpu",
"threshold": 120000000000.0,
"granularity": 600,
"aggregation_method": "rate:mean",
"query": "{\"and\": [{\"or\": [{\"=\": {\"created_by_project_id\": \"9864e20f92ef47238becfe06b869d2ac\"}}, {\"and\": [{\"=\": {\"created_by_project_id\": \"ceecc421f7994cc397380fae5e495179\"}}, {\"=\": {\"project_id\": \"9864e20f92ef47238becfe06b869d2ac\"}}]}]}, {\"and\": [{\"=\": {\"server_group\": \"7a612ef2-ad1d-49fa-a72d-51253761cdda\"}}, {\"=\": {\"ended_at\": null}}]}]}",
"comparison_operator": "gt",
"resource_type": "instance"
},
"alarm_id": "334411fa-8f4b-482d-ba05-c4399a7a3393",
"state": "ok",
"insufficient_data_actions": [],
"repeat_actions": true,
"user_id": "517bcd700274432d96f43616ac1e37ea",
"state_reason": "Transition to ok due to 1 samples inside threshold, most recent: 2780000000.0",
"project_id": "9864e20f92ef47238becfe06b869d2ac",
"type": "gnocchi_aggregation_by_resources_threshold",
"evaluate_timestamp": "2025-10-10T23:23:43",
"severity": "low"
}
The alarm has now notified the auto-scaling group that a scale out should occur.
If we re-run the command we used earlier to check the number of running instances, we now see that a third instance has joined the cluster!
$ openstack stack resource show autoscaling-example webserver_group -c attributes -f json | jq '.attributes.members'
[
"87e11933-f057-4b45-86b6-da0bb4697905",
"d3d56962-988e-40d6-b5c7-ad7df1abbf63",
"c3bd7529-0140-4d7a-9bb3-ea8286e13dc5"
]
With a third instance now running, we should check that it has joined the
load balancer pool. Using the same curl command we used earlier,
the third instance should have started responding to requests.
$ curl http://192.0.2.2
Hello, world! This request was served by au-x-u6thgn62eec6-ev65hjmjcmss-webserver-vdq2wg3l2mcv (10.0.0.151).
$ curl http://192.0.2.2
Hello, world! This request was served by au-x-k2va3k4min64-kgypkpgxo2t3-webserver-goiwpb5kbnil (10.0.0.215).
$ curl http://192.0.2.2
Hello, world! This request was served by au-x-f6tup25k7exn-jackdd47dc6k-webserver-7rzds4ycexj6 (10.0.0.105).
We have successfully implemented an auto-scaling, highly available web application on the Catalyst Cloud Orchestration Service.
This concludes the tutorial. To clean up, simply delete the stack; all of its resources will be deleted along with it.
openstack stack delete autoscaling-example