This is the second instalment in my two-part series on the Ray library, a Python framework created by Anyscale for distributed and parallel computing. Part 1 covered how to parallelise CPU-intensive Python jobs on your local PC by distributing the workload across all available cores, resulting in marked improvements in runtime. I'll leave a link to Part 1 at the end of this article.
This part deals with a similar theme, except we take distributing Python workloads to the next level by using Ray to parallelise them across multi-server clusters in the cloud.
If you've come to this without having read Part 1, the TL;DR of Ray is that it's an open-source distributed computing framework designed to make it easy to scale Python programs from a laptop to a cluster with minimal code changes. That alone should hopefully be enough to pique your curiosity. In my own test on my desktop PC, I took a straightforward, relatively simple Python program that finds prime numbers and reduced its runtime by a factor of 10 by adding just four lines of code.
Where can you run Ray clusters?
Ray clusters can be set up on the following:
- AWS and GCP, although unofficial integrations exist for other providers, too, such as Azure
- Anyscale, a fully managed platform developed by the creators of Ray.
- Kubernetes, via the officially supported KubeRay project.
Prerequisites
To follow along with my process, you'll need a few things set up beforehand. I'll be using AWS for my demo, as I have an existing account there; however, I expect the setup for other cloud providers and platforms to be very similar. You should have:
- Credentials set up to run cloud CLI commands for your chosen provider.
- A default VPC and at least one public subnet associated with it that has a publicly reachable IP address.
- An SSH key pair file (.pem) that you can download to your local system so that Ray (and you) can connect to the nodes in your cluster.
- Sufficient quotas to satisfy the requested number of nodes and vCPUs in whichever cluster you set up.
If you want to do some local testing of your Ray code before deploying it to a cluster, you'll also need to install the Ray library. We can do that using pip.
$ pip install ray
I'll be running everything from a WSL2 Ubuntu shell on my Windows desktop.
To verify that Ray has been installed correctly, you should be able to use its command-line interface. In a terminal window, type in the following command.
$ ray --help
Usage: ray [OPTIONS] COMMAND [ARGS]...

Options:
  --logging-level TEXT   The logging level threshold, choices=['debug',
                         'info', 'warning', 'error', 'critical'],
                         default='info'
  --logging-format TEXT  The logging format.
                         default="%(asctime)s\t%(levelname)s %(filename)s:%(lineno)s -- %(message)s"
  --version              Show the version and exit.
  --help                 Show this message and exit.

Commands:
  attach            Create or attach to an SSH session to a Ray cluster.
  check-open-ports  Check open ports in the local Ray cluster.
  cluster-dump      Get log data from one or more nodes.
  ...
  ...
  ...
If you don't see this, something has gone wrong, and you should double-check the output of your installation command.
Assuming everything is OK, we're good to go.
One last important point, though. Creating resources, such as compute clusters, on a cloud provider like AWS will incur costs, so it's essential you bear this in mind. The good news is that Ray has a built-in command that will tear down any infrastructure you create, but to be safe, you should double-check that no unused and potentially costly services get left "switched on" by mistake.
Our example Python code
The first step is to modify our existing Ray code from Part 1 to run on a cluster. Here is the original code for your reference. Recall that we are trying to count the number of prime numbers within a specific numeric range.
import math
import time

# -----------------------------------------
# Change No. 1
# -----------------------------------------
import ray

ray.init()


def is_prime(n: int) -> bool:
    if n < 2: return False
    if n == 2: return True
    if n % 2 == 0: return False
    r = int(math.isqrt(n)) + 1
    for i in range(3, r, 2):
        if n % i == 0:
            return False
    return True


# -----------------------------------------
# Change No. 2
# -----------------------------------------
@ray.remote(num_cpus=1)  # pure-Python loop -> 1 CPU per task
def count_primes(a: int, b: int) -> int:
    c = 0
    for n in range(a, b):
        if is_prime(n):
            c += 1
    return c


if __name__ == "__main__":
    A, B = 10_000_000, 20_000_000
    total_cpus = int(ray.cluster_resources().get("CPU", 1))
    # Start "chunky"; we can sweep this later
    chunks = max(4, total_cpus * 2)
    step = (B - A) // chunks
    print(f"nodes={len(ray.nodes())}, CPUs~{total_cpus}, chunks={chunks}")
    t0 = time.time()
    refs = []
    for i in range(chunks):
        s = A + i * step
        e = s + step if i < chunks - 1 else B
        # -----------------------------------------
        # Change No. 3
        # -----------------------------------------
        refs.append(count_primes.remote(s, e))
    # -----------------------------------------
    # Change No. 4
    # -----------------------------------------
    total = sum(ray.get(refs))
    print(f"total={total}, time={time.time() - t0:.2f}s")
What changes are needed to run it on a cluster? The answer is that only one minor change is required.
Change

ray.init()

to

ray.init(address="auto")

That's one of the beauties of Ray. The same code runs almost unmodified on your local PC and anywhere else you care to run it, including large, multi-server cloud clusters.
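One detail in the main block worth a quick sanity check is the chunking arithmetic: because `step` comes from integer division, `step * chunks` can fall short of `B - A`, which is why the final chunk is extended to `B`. Here is a small, stdlib-only sketch of that logic (`make_chunks` is a helper name of my own, not part of the program above):

```python
def make_chunks(a: int, b: int, chunks: int):
    """Split [a, b) into `chunks` contiguous half-open ranges."""
    step = (b - a) // chunks
    out = []
    for i in range(chunks):
        s = a + i * step
        e = s + step if i < chunks - 1 else b  # last chunk absorbs the remainder
        out.append((s, e))
    return out

ranges = make_chunks(10_000_000, 20_000_000, 96)
print(len(ranges))                  # 96
print(ranges[0][0], ranges[-1][1])  # 10000000 20000000
# Contiguous: each chunk ends exactly where the next begins
assert all(ranges[i][1] == ranges[i + 1][0] for i in range(len(ranges) - 1))
```

Nothing here needs Ray, so you can convince yourself the decomposition is gap-free and overlap-free before paying for any cloud time.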
Setting up our cluster
In the cloud, a Ray cluster consists of a head node and one or more worker nodes. In AWS, these nodes are simply EC2 instances. Ray clusters can be fixed-size, or they can autoscale up and down based on the resources requested by applications running on the cluster. The head node is started first, and the worker nodes are configured with the head node's address to form the cluster. If autoscaling is enabled, worker nodes automatically scale up or down based on the application's load and will scale down after a user-specified idle period (5 minutes by default).
Ray uses YAML files to set up clusters. A YAML file is just a plain-text configuration file with a simple, human-readable syntax.
Here is the YAML file I'll be using to set up my cluster. I found that the closest EC2 instance to my desktop PC, in terms of CPU core count and performance, was a c7g.8xlarge. For simplicity, I'm making the head node the same server type as all the workers, but you can mix and match different EC2 types if desired.
cluster_name: ray_test

provider:
    type: aws
    region: eu-west-1
    availability_zone: eu-west-1a

auth:
    # For Amazon Linux AMIs the SSH user is 'ec2-user'.
    # If you switch to an Ubuntu AMI, change this to 'ubuntu'.
    ssh_user: ec2-user
    ssh_private_key: ~/.ssh/ray-autoscaler_eu-west-1.pem

max_workers: 10
idle_timeout_minutes: 10
head_node_type: head_node

available_node_types:
    head_node:
        node_config:
            InstanceType: c7g.8xlarge
            ImageId: ami-06687e45b21b1fca9
            KeyName: ray-autoscaler_eu-west-1
    worker_node:
        min_workers: 5
        max_workers: 5
        node_config:
            InstanceType: c7g.8xlarge
            ImageId: ami-06687e45b21b1fca9
            KeyName: ray-autoscaler_eu-west-1
            InstanceMarketOptions:
                MarketType: spot

# =========================
# Setup commands (run on head + workers)
# =========================
setup_commands:
    - |
      set -euo pipefail
      have_cmd() { command -v "$1" >/dev/null 2>&1; }
      have_pip_py() {
        python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec("pip") else 1)'
      }
      # 1) Ensure Python 3 is present
      if ! have_cmd python3; then
        if have_cmd dnf; then
          sudo dnf install -y python3
        elif have_cmd yum; then
          sudo yum install -y python3
        elif have_cmd apt-get; then
          sudo apt-get update -y
          sudo apt-get install -y python3
        else
          echo "No supported package manager found to install python3." >&2
          exit 1
        fi
      fi
      # 2) Ensure pip exists
      if ! have_pip_py; then
        python3 -m ensurepip --upgrade >/dev/null 2>&1 || true
      fi
      if ! have_pip_py; then
        if have_cmd dnf; then
          sudo dnf install -y python3-pip || true
        elif have_cmd yum; then
          sudo yum install -y python3-pip || true
        elif have_cmd apt-get; then
          sudo apt-get update -y || true
          sudo apt-get install -y python3-pip || true
        fi
      fi
      if ! have_pip_py; then
        curl -fsS -o /tmp/get-pip.py https://bootstrap.pypa.io/get-pip.py
        python3 /tmp/get-pip.py
      fi
      # 3) Upgrade packaging tools and install Ray
      python3 -m pip install -U pip setuptools wheel
      python3 -m pip install -U "ray[default]"
Here is a brief explanation of each significant YAML section.
cluster_name: Assigns a name to the cluster, allowing Ray to track and manage it
separately from others.
provider: Specifies which cloud to use (AWS here), along with the region and
availability zone for launching instances.
auth: Defines how Ray connects to instances over SSH - the user name and the
private key used for authentication.
max_workers: Sets the maximum number of worker nodes Ray can scale up to when
more compute is required.
idle_timeout_minutes: Tells Ray how long to wait before automatically terminating
idle worker nodes.
available_node_types: Describes the different node types (head and workers), their
instance sizes, AMI images, and scaling limits.
head_node_type: Identifies which of the node types acts as the cluster's controller
(the head node).
setup_commands: Lists shell commands that run once on each node when it is first
created, typically to install software or set up the environment.
To start the cluster creation, use this ray command from the terminal.
$ ray up -y ray_test.yaml
Ray will do its thing, creating all the necessary infrastructure, and after a few minutes you should see something like this in your terminal window.
...
...
...
Next steps
  To add another node to this Ray cluster, run
    ray start --address='10.0.9.248:6379'
  To connect to this Ray cluster:
    import ray
    ray.init()
  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS=' ray job submit --working-dir . -- python my_script.py
  See
  for more information on submitting Ray jobs to the Ray cluster.
  To terminate the Ray runtime, run
    ray stop
  To view the status of the cluster, use
    ray status
  To monitor and debug Ray, view the dashboard at
    10.0.9.248:8265
  If connection to the dashboard fails, check your firewall settings and network configuration.
Shared connection to 108.130.38.255 closed.
New status: up-to-date

Useful commands:
  To terminate the cluster:
    ray down /mnt/c/Users/thoma/ray_test.yaml
  To retrieve the IP address of the cluster head:
    ray get-head-ip /mnt/c/Users/thoma/ray_test.yaml
  To port-forward the cluster's Ray Dashboard to the local machine:
    ray dashboard /mnt/c/Users/thoma/ray_test.yaml
  To submit a job to the cluster, port-forward the Ray Dashboard in another terminal and run:
    ray job submit --address --working-dir . -- python my_script.py
  To connect to a terminal on the cluster head for debugging:
    ray attach /mnt/c/Users/thoma/ray_test.yaml
  To monitor autoscaling:
    ray exec /mnt/c/Users/thoma/ray_test.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'

Running a Ray job on a cluster
At this stage, the cluster has been built, and we're ready to submit our Ray job to it. To give the cluster something more substantial to work with, I increased the range for the prime search in my code from 10,000,000–20,000,000 to 10,000,000–60,000,000. On my local desktop, Ray ran this in 18 seconds.
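For reference, the serial version of this counting loop (no Ray at all) is the baseline that the parallel runs are being measured against. The sketch below uses a deliberately tiny range so it finishes almost instantly; the range and timing are purely illustrative, not figures from my benchmark:

```python
import math
import time

def is_prime(n: int) -> bool:
    if n < 2: return False
    if n == 2: return True
    if n % 2 == 0: return False
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True

t0 = time.time()
# Tiny illustrative range; the cluster run uses 10,000,000-60,000,000
count = sum(1 for n in range(10_000, 20_000) if is_prime(n))
print(f"count={count}, time={time.time() - t0:.2f}s")
```

Scaling that single loop up to tens of millions of candidates is exactly the work Ray is fanning out across cores and, now, across nodes.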
I waited a short while for all the cluster nodes to initialise fully, then ran the code on the cluster with this command.
$ ray exec ray_test.yaml 'python3 ~/ray_test.py'
Here is my output.
(base) tom@tpr-desktop:/mnt/c/Users/thoma$ ray exec ray_test2.yaml 'python3 ~/primes_ray.py'
2025-11-01 13:44:22,983 INFO util.py:389 -- setting max workers for head node type to 0
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
Fetched IP: 52.213.155.130
Warning: Permanently added '52.213.155.130' (ED25519) to the list of known hosts.
2025-11-01 13:44:26,469 INFO worker.py:1832 -- Connecting to existing Ray cluster at address: 10.0.5.86:6379...
2025-11-01 13:44:26,477 INFO worker.py:2003 -- Connected to Ray cluster. View the dashboard at
nodes=6, CPUs~192, chunks=384
(autoscaler +2s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +2s) No available node types can fulfill resource requests {'CPU': 1.0}*160. Add suitable node types to this cluster to resolve this issue.
total=2897536, time=5.71s
Shared connection to 52.213.155.130 closed.
As you can see, the time taken to run on the cluster was just over 5 seconds. So, 5 worker nodes ran the same job in less than a third of the time it took on my local PC. Not too shabby.
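The arithmetic behind "not too shabby" deserves a moment. Six nodes of 32 vCPUs each give 192 vCPUs, against a desktop I'm treating as roughly one 32-vCPU node's worth (an assumption on my part, based on having picked the c7g.8xlarge as its closest EC2 match), so perfect scaling would predict about 6x rather than the roughly 3x observed; scheduling overhead, task granularity, and the queued tasks visible in the autoscaler warning account for the gap. A back-of-envelope sketch:

```python
# Runtimes reported in this article
local_s, cluster_s = 18.0, 5.71
speedup = local_s / cluster_s
print(f"speedup = {speedup:.1f}x")  # speedup = 3.2x

# Hardware ratio, assuming the desktop ~ one 32-vCPU node (an assumption)
hardware_ratio = 192 / 32
print(f"parallel efficiency = {speedup / hardware_ratio:.0%}")
```

Sweeping the chunk count so every vCPU stays busy until the end of the run is the obvious next lever if you want to close that efficiency gap.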
When you're finished with your cluster, please run the following Ray command to tear it down.
$ ray down -y ray_test.yaml
As I mentioned before, you should always double-check your account to ensure this command has worked as expected.
Summary
This article, the second in a two-part series, demonstrates how to run CPU-intensive Python code on cloud-based clusters using the Ray library. By spreading the workload across all available vCPUs, Ray ensures our code delivers fast performance and runtimes.
I described and showed how to create a cluster using a YAML file and how to use the Ray command-line interface to submit code for execution on the cluster.
Using AWS as my example platform, I took Ray Python code that had been running on my local PC and ran it, almost unchanged, on a 6-node EC2 cluster. This showed a significant performance improvement (3x) over the non-cluster run time.
Finally, I showed how to use the ray command-line tool to tear down the AWS cluster infrastructure Ray had created.
If you haven't already read my first article in this series, click the link below to check it out.
Please note that, apart from being a sometime user of their services, I have no affiliation with Anyscale, AWS, or any other organisation mentioned in this article.



