Set up MATLAB Job Scheduler Cluster for Auto-Resizing
You can customize your MATLAB® Job Scheduler (MJS) cluster to resize automatically. By default, an MJS cluster does not have the resizing functionality enabled. This means that MJS immediately rejects any work you submit to the cluster that requires more than the current number of workers in the cluster. Auto-resizing, also called auto-scaling, allows you to submit such work to the cluster and makes the number of workers in the cluster change automatically with the amount of work submitted. The cluster grows (scales up) when there is more work to do and shrinks (scales down) when there is less work to do. This allows you to use your compute resources more efficiently and can result in cost savings.
To configure your MJS cluster to resize automatically, you need to:
1. Set the maximum number of workers in the mjs_def file.
2. Start an MJS cluster.
3. Set up an auto-resizing process.
Set Maximum Number of Workers
To make an MJS cluster resizable, define the maximum number of workers for your cluster by editing the mjs_def file as follows:
1. Open the file mjs_def.sh (on Linux®) or mjs_def.bat (on Windows®) located at matlabroot/toolbox/parallel/bin, where matlabroot is the directory of your MATLAB installation.
2. Uncomment one or both of the lines #MAX_LINUX_WORKERS= and #MAX_WINDOWS_WORKERS= and set them to the desired values. These variables define the maximum number of Linux and Windows workers to which you can resize the cluster, respectively.
A resizable MJS cluster accepts jobs in the queue that require more than the current number of workers in the cluster, up to the limits specified in MAX_LINUX_WORKERS and MAX_WINDOWS_WORKERS. Jobs that require more workers than these limits are cancelled immediately.
Note
To change the maximum number of Linux and Windows workers after the cluster has started, use the resize script located at matlabroot/toolbox/parallel/bin to run the resize update command. For example:

    % cd matlab/toolbox/parallel/bin
    % ./resize update -jobmanager myJobManager -maxlinuxworkers 4 -maxwindowsworkers 8
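If you drive the cluster from a script, the resize update command above can be assembled programmatically. This is a sketch under stated assumptions: the function names are hypothetical, only the flags shown in the example above are used, and the path to the bin directory is left for you to supply.

```python
import subprocess


def build_resize_update_command(job_manager, max_linux, max_windows,
                                resize_script="./resize"):
    """Build the argument list for 'resize update' (flags taken from the example)."""
    return [
        resize_script, "update",
        "-jobmanager", job_manager,
        "-maxlinuxworkers", str(max_linux),
        "-maxwindowsworkers", str(max_windows),
    ]


def run_resize_update(bin_dir, job_manager, max_linux, max_windows):
    """Run the command from bin_dir (matlabroot/toolbox/parallel/bin on your system)."""
    command = build_resize_update_command(job_manager, max_linux, max_windows)
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(command, cwd=bin_dir, check=True)


command = build_resize_update_command("myJobManager", 4, 8)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a job manager name contains spaces.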
Start MJS Cluster
To create a cluster with the options defined in the mjs_def file, start an MJS cluster after editing and saving this file. For more information about how to install, configure, and start an MJS cluster, see Install and Configure MATLAB Parallel Server for MATLAB Job Scheduler and Network License Manager.
Set up Auto-Resizing Process
To make a resizable MJS cluster change size automatically, you must set up a background process to periodically adjust the size of the cluster. The specific implementation of this background process depends on many factors, but you can follow these general recommended steps:
1. Identify the desired size of the cluster. The desired size of a resizable MJS cluster is reported as the total number of workers for each operating system, so it includes all busy workers and some idle workers that are already in the cluster. The desired size changes based on running jobs and jobs in the queue. Use the resize script located at matlabroot/toolbox/parallel/bin to run the resize status command:

    % cd matlab/toolbox/parallel/bin
    % ./resize status

   The resize status command above returns information about the resizable cluster in JSON format:

    {
      "jobManagers": [
        {
          "name": "myJobManager",
          "host": "myhostname",
          "desiredWorkers": {
            "linux": 1,
            "windows": 0
          },
          "maxWorkers": {
            "linux": 4,
            "windows": 8
          },
          "workers": [
            {
              "name": "worker_1",
              "host": "myhostname",
              "operatingSystem": "linux",
              "state": "busy",
              "secondsIdle": 0
            },
            {
              "name": "worker_2",
              "host": "myhostname",
              "operatingSystem": "linux",
              "state": "idle",
              "secondsIdle": 60
            }
          ]
        }
      ]
    }

   Parse the JSON output to extract the desiredWorkers values that represent the desired number of Linux and Windows workers for the MJS cluster.

2. Compare the desired number of workers with the workers in the cluster to decide whether you need to start or stop workers. Use the workers array in the output of the resize status command to examine the workers in the cluster. To ensure that jobs in the queue eventually run, you must start enough workers to match or exceed the desired number of workers. You can optionally stop idle workers that exceed the desired number of workers.

   Note
   If workers take a long time to start in your environment, you might want to wait for excess workers to be idle for some time before stopping them. This approach can be more efficient than immediately stopping excess idle workers if they are needed again soon after they become idle. To check how long a worker has been idle, examine the secondsIdle value for the worker.

3. Start or stop workers as necessary. To do this, use the startworker and stopworker utility scripts. To avoid interrupting any work when stopping workers, use the -onidle flag with the stopworker command.
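The parsing and comparison steps above can be sketched as follows. This is an illustrative outline, not a reference implementation: the function name plan_resize and the min_idle_seconds parameter are hypothetical, the sample input is the example status output from this page written as strict JSON, and the code assumes that "idle" is the state value reported for workers that are safe to stop.

```python
import json

# Example 'resize status' output from this page, as strict JSON.
SAMPLE_STATUS = """
{"jobManagers": [{
  "name": "myJobManager", "host": "myhostname",
  "desiredWorkers": {"linux": 1, "windows": 0},
  "maxWorkers": {"linux": 4, "windows": 8},
  "workers": [
    {"name": "worker_1", "host": "myhostname", "operatingSystem": "linux",
     "state": "busy", "secondsIdle": 0},
    {"name": "worker_2", "host": "myhostname", "operatingSystem": "linux",
     "state": "idle", "secondsIdle": 60}
  ]}]}
"""


def plan_resize(status_json, job_manager_name, min_idle_seconds=0):
    """Return, per operating system, how many workers to start and which to stop."""
    status = json.loads(status_json)
    job_manager = next(jm for jm in status["jobManagers"]
                       if jm["name"] == job_manager_name)
    plan = {}
    for os_name, desired in job_manager["desiredWorkers"].items():
        workers = [w for w in job_manager["workers"]
                   if w["operatingSystem"] == os_name]
        shortfall = desired - len(workers)
        if shortfall >= 0:
            # Too few (or exactly enough) workers: start the difference.
            plan[os_name] = {"start": shortfall, "stop": []}
            continue
        # Too many workers: only idle ones past the grace period are candidates.
        candidates = [w for w in workers
                      if w["state"] == "idle"
                      and w["secondsIdle"] >= min_idle_seconds]
        # Stop the longest-idle workers first, never more than the excess.
        candidates.sort(key=lambda w: w["secondsIdle"], reverse=True)
        excess = -shortfall
        plan[os_name] = {"start": 0,
                         "stop": [w["name"] for w in candidates[:excess]]}
    return plan


plan = plan_resize(SAMPLE_STATUS, "myJobManager")
print(plan)
```

A background process would call a function like this periodically on fresh resize status output, then invoke startworker for each worker to start and stopworker with the -onidle flag for each name in the stop list. Setting min_idle_seconds to a value such as 300 implements the grace period described in the note above.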
See Also
startworker | stopworker | mjs