Proxyrack - May 6, 2026

How to Scrape RealCommercial.com.au


Scrape RealCommercial.com.au

A beginner-friendly guide — no programming knowledge needed. Just follow the steps.


What We're Building

We'll create a small program that:

  1. Searches for commercial properties on RealCommercial

  2. Grabs the details (price, agent, description, demographics)

  3. Prints them nicely in the terminal

The program uses two strategies: a fast API call for search results, and a real browser as backup when the website blocks automated requests.
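As a sketch, the fallback logic looks like this (the stub functions below are placeholders; the real versions are built in api.py and the browser/ folder in later steps):

```python
# Sketch of the two-strategy fallback. The stubs stand in for the
# real HTTP client and browser built later in this guide.
def fast_api_request(url: str) -> tuple[int, str]:
    return 429, ""  # pretend the site rate-limited us

def real_browser_render(url: str) -> str:
    return "<html>rendered by a real browser</html>"

def fetch_details(url: str) -> str:
    status, body = fast_api_request(url)  # strategy 1: fast API call
    if status == 429:                     # blocked? fall back
        return real_browser_render(url)   # strategy 2: drive a real browser
    return body

print(fetch_details("https://www.realcommercial.com.au/"))
```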


Step 1: Install Python 3.12

Python Installation

Windows:

  1. Go to python.org/downloads

  2. Click the yellow button that says "Download Python 3.12.x"

  3. Run the downloaded .exe file

  4. IMPORTANT: Check the box that says "Add Python to PATH" at the bottom

  5. Click "Install Now" and wait for it to finish

macOS:

  1. Go to python.org/downloads

  2. Click the yellow button for "Download Python 3.12.x"

  3. Open the downloaded .pkg file and follow the installer

Ubuntu / Debian Linux:

sudo apt update
sudo apt install python3.12 python3.12-venv

Fedora Linux:

sudo dnf install python3.12

Verify it worked: Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux) and type:

python3.12 --version

You should see: Python 3.12.x


Step 2: Install a Browser

Choose one browser

We need Brave or Google Chrome installed. Either works.

Brave Browser (recommended):

  1. Go to brave.com/download

  2. Download and install like any other program

Google Chrome:

  1. Go to google.com/chrome

  2. Download and install

Important for Linux users: If you install Brave via the terminal, note where it's installed. The default is usually /usr/bin/brave-browser. If it's somewhere else, you'll update one line of code later.
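One way to find the path on Linux (assuming the standard package names for Brave and Chrome) is:

```shell
# Print where the browser binary lives; copy this path into
# constants.py in Step 4. Prints a notice if neither is found.
command -v brave-browser || command -v google-chrome || echo "browser not found"
```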


Step 3: Install uv (Package Manager)

Install uv

uv is a fast tool that installs Python libraries. We use it instead of pip.

Windows (PowerShell):

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

After installing, close and reopen your terminal.

Verify:

uv --version

Should show something like uv 0.x.x


Step 4: Create the Project Folder & Files

Set up the project

Open your terminal and run these commands one at a time:

# Create the main folder
mkdir realcommercial-scraper
cd realcommercial-scraper

# Create subfolders
mkdir browser
mkdir models

Now create the following files. Copy-paste each one exactly.

File 1: pyproject.toml

[project]
name = "realcommercial-scraper"
version = "0.1.0"
description = "Scraper for realcommercial.com.au"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "httpx",
    "pydantic",
    "psutil",
    "websocket-client",
]

[tool.ruff.lint]
fixable = ["ALL"]
select = ["I", "B", "E"]

[tool.pyright]
typeCheckingMode = "strict"
venvPath = "."
venv = ".venv"

File 2: constants.py

from pathlib import Path

HOME = Path.home()
BASE_DIR = Path(__file__).parent

# Browser location — CHANGE THIS if your browser is elsewhere
DEBUG_BROWSER_PATH = "/usr/bin/brave-browser"

# Profile folder (cookies & settings are saved here)
USER_PROFILE_DIR = BASE_DIR / "browser" / "browser_profile"

# Browser window size
WIN_W = 720
WIN_H = 760

For Windows users: Change the browser path to something like:

DEBUG_BROWSER_PATH = "C:\\Program Files\\BraveSoftware\\Brave-Browser\\Application\\brave.exe"

For macOS users:

DEBUG_BROWSER_PATH = "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"

If using Chrome, replace brave with chrome in the path.
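If the same constants.py has to run on several machines, one option is to pick the path from the platform at import time. The paths below are common install locations, not guaranteed for your setup:

```python
# Choose a default browser path per platform. These are the usual
# install locations; adjust them if your browser lives elsewhere.
import sys

if sys.platform.startswith("win"):
    DEBUG_BROWSER_PATH = (
        "C:\\Program Files\\BraveSoftware\\Brave-Browser"
        "\\Application\\brave.exe"
    )
elif sys.platform == "darwin":
    DEBUG_BROWSER_PATH = "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"
else:
    DEBUG_BROWSER_PATH = "/usr/bin/brave-browser"

print(DEBUG_BROWSER_PATH)
```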

File 3: api.py

import httpx

from browser.browser import render_html
from listing_parse import parse_listing
from models.browser import Browser
from models.listing import PropertyDetailResponse
from models.request import SearchPayload
from models.search import SearchResponse

class RealCommercialClient:
    def __init__(self) -> None:
        self.client: httpx.Client = httpx.Client(
            headers={
                "Accept": "*/*",
                "Accept-Language": "en-US,en;q=0.6",
                "Content-Type": "application/json",
                "Origin": "https://www.realcommercial.com.au",
                "Referer": "https://www.realcommercial.com.au/",
                "User-Agent": (
                    "Mozilla/5.0 (X11; Linux x86_64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/146.0.0.0 Safari/537.36"
                ),
            },
            timeout=30.0,
        )

    def __enter__(self) -> "RealCommercialClient":
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def search(self, request: SearchPayload) -> SearchResponse:
        response = self.client.post(
            "https://api.realcommercial.com.au/listing-ui/searches",
            json=request.model_dump(
                by_alias=True, exclude_none=True, exclude_unset=True
            ),
        )
        response.raise_for_status()
        return SearchResponse.model_validate(response.json())

    def update_headers(self) -> None:
        self.client.headers.update({
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "en-US,en;q=0.9",
            "cache-control": "no-cache",
            "dnt": "1",
            "pragma": "no-cache",
            "priority": "u=0, i",
            "referer": "https://www.realcommercial.com.au/leased/property-678-high-street-thornbury-vic-3071-505055376",
            "sec-ch-ua": '"Chromium";v="146", "Not-A.Brand";v="24", "Brave";v="146"',
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": '"Linux"',
            "sec-fetch-dest": "document",
            "sec-fetch-mode": "navigate",
            "sec-fetch-site": "same-origin",
            "sec-gpc": "1",
            "upgrade-insecure-requests": "1",
            "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36",
        })

    def listing_details(
        self, pdp_url: str, browser: Browser, timeout: int = 5
    ) -> PropertyDetailResponse:
        self.update_headers()
        if pdp_url.startswith("/"):
            pdp_url = pdp_url[1:]
        full_url = f"https://www.realcommercial.com.au/{pdp_url}"
        response = self.client.get(full_url)
        if response.status_code == 429:
            html = render_html(browser, full_url, timeout)
            return parse_listing(html)
        response.raise_for_status()
        return parse_listing(response.text)

    def close(self) -> None:
        self.client.close()

File 4: listing_parse.py

import json

from models.listing import PropertyDetailResponse

def isolate_data_json(html_content: str) -> str:
    first_part = "  REA.pageData = "
    second_part = ";</script>"
    chopped = html_content.split(first_part)[-1]
    chopped = chopped.split(second_part)[0]
    return chopped

def parse_listing(html: str) -> PropertyDetailResponse:
    json_data = isolate_data_json(html)
    return PropertyDetailResponse(**json.loads(json_data))

if __name__ == "__main__":
    with open("index.html", "r") as f:
        html_content = f.read()
    result = parse_listing(html_content)
    print(f"Title: {result.listing.title}")
    print(f"Description: {result.listing.description[:200]}...")
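You can see what isolate_data_json does on a tiny synthetic page (the real HTML embeds a much larger JSON blob in the same REA.pageData variable):

```python
import json

# Same splitting logic as listing_parse.py, shown on a toy page.
def isolate_data_json(html_content: str) -> str:
    first_part = "  REA.pageData = "
    second_part = ";</script>"
    chopped = html_content.split(first_part)[-1]
    return chopped.split(second_part)[0]

html = '<script>  REA.pageData = {"listing": {"id": "505055376"}};</script>'
data = json.loads(isolate_data_json(html))
print(data["listing"]["id"])  # → 505055376
```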

File 5: main.py

from api import RealCommercialClient
from browser.browser import get_browser_instance
from models.listing import DemographicData
from models.request import SearchPayload as SearchRequest
from models.request import SimpleFilters as SearchFilters
from models.search import SearchListing

def print_separator(char: str = "=", length: int = 10) -> None:
    print(char * length)

def print_listing_summary(listing: SearchListing) -> None:
    print(f"{listing.title}")
    print(f"   Address: {listing.address.suburb_address}")
    area = getattr(listing.attributes, "area", "N/A")
    agent_name = listing.agents[0].name if listing.agents else "N/A"
    print(f"   Area: {area}")
    print(f"   Agent: {agent_name}")
    print(f"   URL: {listing.pdp_url}")

def print_demographics(demographic_data: DemographicData) -> None:
    print("  Demographics:")
    for insight in demographic_data.insights:
        print(f"    • {insight.label}: {insight.value}")

def main() -> None:
    search_config = SearchRequest(
        channel="leased",
        filters=SearchFilters(
            within_radius="includesurrounding",
            surrounding_suburbs=True,
        ),
        page=1,
        page_size=100,
    )
    browser = get_browser_instance(9999)

    with RealCommercialClient() as client:
        response = client.search(search_config)
        print(f"Available results: {response.available_results}")
        print(f"Returned listings: {len(response.listings)}")
        print_separator()

        for listing in response.listings:
            print_listing_summary(listing)
            print_separator("-")
            data = client.listing_details(listing.pdp_url, browser)
            description = data.listing.description
            print("Details:")
            print(f"{description[:300]}...")
            if data.demographic_data:
                print_demographics(data.demographic_data)
            break

if __name__ == "__main__":
    main()

Step 5: Add the Supporting Files

These files make the browser and data models work

Create each file below in its correct folder.

browser/__init__.py

(leave this file empty)

models/__init__.py

(leave this file empty)

models/request.py

from pydantic import BaseModel

class SimpleFilters(BaseModel):
    within_radius: str = "includesurrounding"
    surrounding_suburbs: bool = True

class SearchPayload(BaseModel):
    channel: str
    filters: SimpleFilters
    page: int = 1
    page_size: int = 100
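To preview the request body that search() will send, you can dump the payload as a dict (this needs pydantic, which uv installs in Step 6):

```python
from pydantic import BaseModel

class SimpleFilters(BaseModel):
    within_radius: str = "includesurrounding"
    surrounding_suburbs: bool = True

class SearchPayload(BaseModel):
    channel: str
    filters: SimpleFilters
    page: int = 1
    page_size: int = 100

# Build the same payload main.py uses and inspect it.
payload = SearchPayload(channel="leased", filters=SimpleFilters())
print(payload.model_dump())
```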

models/search.py

from typing import Any

from pydantic import BaseModel, Field, HttpUrl

class BrandingLogo(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl = Field(alias="urlTemplate")

class Branding(BaseModel):
    color: str
    logo: BrandingLogo

class Address(BaseModel):
    street_address: str = Field(alias="streetAddress")
    suburb_address: str = Field(alias="suburbAddress")

class Phone(BaseModel):
    display: str
    dial: str

class Photo(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl | None = Field(None, alias="urlTemplate")

class Agent(BaseModel):
    id: str
    name: str
    image_path: HttpUrl | None = Field(None, alias="imagePath")
    image_url_template: HttpUrl | None = Field(None, alias="imageUrlTemplate")
    enquiry_uri: str | None = Field(None, alias="enquiryUri")
    phone: Phone | None = None

class Agency(BaseModel):
    id: str
    name: str
    additional_branding: bool = Field(alias="additionalBranding")
    branding: Branding
    phone: Phone | None = None
    salespeople: list[Agent] = []

class ListingAttributes(BaseModel):
    area: str | None = None

class ConjunctionalAgency(BaseModel):
    pass

class Omniture(BaseModel):
    pass

class ListingDetails(BaseModel):
    pass

class Advertising(BaseModel):
    pass

class SearchListing(BaseModel):
    highlights: list[str] = []
    agencies: list[Agency] = []
    days_active: int = Field(alias="daysActive")
    agents: list[Agent] = []
    attributes: ListingAttributes
    id: str
    has_tour: bool = Field(alias="hasTour")
    pdp_url: str = Field(alias="pdpUrl")
    title: str
    conjunctional_agencies: list[ConjunctionalAgency] = Field(
        default_factory=list, alias="conjunctionalAgencies"
    )
    product: str
    other_agencies: list[str] = Field(default_factory=list, alias="otherAgencies")
    omniture: Omniture = Field(default_factory=Omniture)
    status: str
    address: Address
    details: ListingDetails = Field(default_factory=ListingDetails)
    branding: Branding
    photos: list[Photo] = []

class SearchResponse(BaseModel):
    listings: list[SearchListing]
    surrounding_suburb_listings: list[Any] = Field(alias="surroundingSuburbListings")
    resolved_locations: list[Any] = Field(alias="resolvedLocations")
    available_results: int = Field(alias="availableResults")
    advertising: Advertising = Field(default_factory=Advertising)

models/listing.py

from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field, HttpUrl

class FullAddress(BaseModel):
    street_address: str = Field(alias="streetAddress")
    suburb_address: str = Field(alias="suburbAddress")
    state: str
    postcode: str
    suburb: str
    marketing_region: str | None = Field(None, alias="marketingRegion")
    marketing_suburb: str | None = Field(None, alias="marketingSuburb")

class PriceInfo(BaseModel):
    display: str
    is_price_hidden: bool = Field(alias="isPriceHidden")

class ListingPrice(BaseModel):
    leased: PriceInfo | None = None

class Photo(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl | None = Field(None, alias="urlTemplate")

class FloorPlan(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl | None = Field(None, alias="urlTemplate")

class Attribute(BaseModel):
    id: str
    label: str
    value: str

class MapData(BaseModel):
    zoom_level: int = Field(alias="zoomLevel")
    thumbnail: HttpUrl
    lat: float
    lng: float
    precision: str

class PropertyType(BaseModel):
    id: str
    url: str
    old_url: str = Field(alias="oldUrl")
    agency_url: str = Field(alias="agencyUrl")
    marketing: str
    display_text: str = Field(alias="displayText")
    long_display_text: str = Field(alias="longDisplayText")
    sentence_display_text: str = Field(alias="sentenceDisplayText")
    pdp_title: str = Field(alias="pdpTitle")
    icon_name: str | None = Field(None, alias="iconName")

class TenureType(BaseModel):
    key: str
    omniture: str
    display_text: str = Field(alias="displayText")

class AvailableChannel(BaseModel):
    id: str
    price: str
    omniture: str
    tealium: str
    campaign: str
    krux: str
    url: str
    ad_area: str = Field(alias="adArea")
    short_human_readable: str = Field(alias="shortHumanReadable")
    display_in_select: str = Field(alias="displayInSelect")
    human_readable: str = Field(alias="humanReadable")
    title_variant: str = Field(alias="titleVariant")
    title: str

class DescriptionMetadata(BaseModel):
    phone_numbers: list[str] = Field(default_factory=list, alias="phoneNumbers")

class SimilarListing(BaseModel):
    id: str
    title: str
    pdp_url: str = Field(alias="pdpUrl")
    address: Any
    area: str | None = None
    price: ListingPrice
    main_photo: Photo = Field(alias="mainPhoto")
    branding: Any
    product: str
    property_type_objects: list[PropertyType] = Field(alias="propertyTypeObjects")

class DemographicInsightItem(BaseModel):
    icon: str
    label: str
    value: str
    typename: str = Field(alias="__typename")

class DemographicData(BaseModel):
    summary: str
    insights: list[DemographicInsightItem]

class Listing(BaseModel):
    id: str
    title: str
    description: str
    canonical_path: str = Field(alias="canonicalPath")
    product: str
    status: str | None = None
    days_active: int = Field(alias="daysActive")
    last_updated_at: datetime = Field(alias="lastUpdatedAt")
    address: FullAddress
    price: ListingPrice
    photos: list[Photo] = []
    floor_plans: list[FloorPlan] = Field(alias="floorPlans")
    main_photo: Photo | None = None
    attributes: list[Attribute] = []
    highlights: list[str] = []
    map: MapData
    agencies: list[Any] = []
    branding: Any
    property_type_objects: list[PropertyType] = Field(alias="propertyTypeObjects")
    tenure_type_object: TenureType = Field(alias="tenureTypeObject")
    available_channel_objects: list[AvailableChannel] = Field(alias="availableChannelObjects")
    similar_listings: list[SimilarListing] = Field(alias="similarListings")
    description_metadata: DescriptionMetadata = Field(default_factory=DescriptionMetadata, alias="descriptionMetadata")
    high_quality_listing: bool = Field(alias="highQualityListing")
    multiple_properties: bool = Field(alias="multipleProperties")
    tours: list[Any] = []
    websites: list[Any] = []

class PropertyDetailResponse(BaseModel):
    listing: Listing
    demographic_data: DemographicData | None = Field(None, alias="demographicData")

models/browser.py

from typing import Any

from pydantic import BaseModel
from websocket import WebSocket

class DebuggerInfo(BaseModel):
    last_updates: int = -1

class Browser(BaseModel):
    ws: WebSocket = WebSocket()
    process_id: int = 0
    process: Any = None
    debugger_info: DebuggerInfo = DebuggerInfo()

    class Config:
        arbitrary_types_allowed = True

    def connect(self) -> None:
        if not self.ws.connected:
            self.ws.connect(self.ws.url)

browser/utils.py

import subprocess
import json
import urllib.request

from constants import DEBUG_BROWSER_PATH, USER_PROFILE_DIR, WIN_W, WIN_H

def spawn_debug_browser(debug_port: int, headless: bool = False):
    cmd = [
        DEBUG_BROWSER_PATH,
        f"--remote-debugging-port={debug_port}",
        f"--user-data-dir={USER_PROFILE_DIR}",
        f"--window-size={WIN_W},{WIN_H}",
        "--no-first-run",
        "--no-default-browser-check",
    ]
    if headless:
        cmd.append("--headless")
    process = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return process.pid, process

def attach_debugger(browser, debug_port: int):
    try:
        resp = urllib.request.urlopen(f"http://localhost:{debug_port}/json/version")
        data = json.loads(resp.read())
        ws_url = data.get("webSocketDebuggerUrl")
        if ws_url:
            browser.ws.url = ws_url
            browser.ws.connect(ws_url)
            browser.debugger_info.last_updates = 1
    except Exception:
        browser.debugger_info.last_updates = -1

browser/actions.py

import json
import time
import uuid

def _send(browser, method: str, params: dict | None = None):
    msg = {"id": uuid.uuid4().int % 1000000, "method": method}
    if params:
        msg["params"] = params
    browser.ws.send(json.dumps(msg))
    return json.loads(browser.ws.recv())

def browser_init_domains(browser):
    _send(browser, "Page.enable")
    _send(browser, "DOM.enable")
    _send(browser, "Runtime.enable")
    _send(browser, "Network.enable")

def browser_open_url(browser, url: str):
    _send(browser, "Page.navigate", {"url": url})
    # Poll until the page reports it has finished loading
    while True:
        resp = _send(browser, "Runtime.evaluate", {"expression": "document.readyState"})
        state = resp.get("result", {}).get("result", {}).get("value", "")
        if state == "complete":
            break
        time.sleep(0.25)  # avoid a busy loop while the page loads

def dom_element_html(browser, selector: str, outer_html: bool = True):
    doc = _send(browser, "DOM.getDocument")
    root = doc["result"]["root"]["nodeId"]
    node = _send(browser, "DOM.querySelector", {"nodeId": root, "selector": selector})
    node_id = node["result"]["nodeId"]
    if outer_html:
        html = _send(browser, "DOM.getOuterHTML", {"nodeId": node_id})
    else:
        html = _send(browser, "DOM.getInnerHTML", {"nodeId": node_id})
    return html["result"]["outerHTML"] if outer_html else html["result"]["innerHTML"]
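Both api.py and main.py import get_browser_instance and render_html from browser/browser.py, which the checklist lists but the steps above don't show. Here is a minimal sketch that ties the helpers together; the retry count, sleep lengths, and the "html" selector are assumptions to tune for your machine:

browser/browser.py

```python
import time

from browser.actions import browser_init_domains, browser_open_url, dom_element_html
from browser.utils import attach_debugger, spawn_debug_browser
from models.browser import Browser

def get_browser_instance(debug_port: int) -> Browser:
    browser = Browser()
    browser.process_id, browser.process = spawn_debug_browser(debug_port)
    # The browser needs a moment to open its debugging port; retry attaching.
    for _ in range(20):
        attach_debugger(browser, debug_port)
        if browser.debugger_info.last_updates == 1:
            break
        time.sleep(0.5)
    browser_init_domains(browser)
    return browser

def render_html(browser: Browser, url: str, timeout: int = 5) -> str:
    browser_open_url(browser, url)
    time.sleep(timeout)  # give client-side scripts time to finish rendering
    return dom_element_html(browser, "html")
```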

Step 6: Install Dependencies

Let uv install everything

Make sure you're in the realcommercial-scraper folder, then run:

uv sync

This reads pyproject.toml and installs:

  • httpx — makes web requests

  • pydantic — handles data cleanly

  • psutil — manages the browser process

  • websocket-client — talks to the browser

You should see output like Resolved N packages in Xms (the count includes transitive dependencies, so it will be more than 4), followed by installation messages. If you see errors, double-check that you're in the right folder.


Step 7: Run the Scraper

Execute the program

In your terminal (still in the realcommercial-scraper folder):

uv run python main.py

What happens:

  1. uv creates a virtual environment (one-time setup)

  2. A Brave/Chrome browser window opens (this is normal)

  3. The program searches RealCommercial for leased properties

  4. It fetches details for the first listing

  5. Results print in your terminal

Expected Output

Available results: 2437
Returned listings: 100
==========
Shop & Retail Premises • 148m²
   Address: Thornbury, VIC 3071
   Area: 148m²
   Agent: John Smith
   URL: /leased/property-678-high-street-thornbury-vic-3071-505055376
----------
Details:
Prime retail space located on High Street, Thornbury...

  Demographics:
    • Total Population: 19,200
    • Median Age: 38
    • Average Household Income: $95,400/yr

Troubleshooting

Common Problems & Fixes

"Command not found: uv"

→ Close and reopen your terminal. If it still doesn't work, restart your computer.

"python3.12: command not found"

→ On Windows, try python instead of python3.12. On Mac, make sure you installed from python.org (not the built-in one).

Browser doesn't open / "connection refused"

→ Check constants.py. Make sure DEBUG_BROWSER_PATH points to where your browser is actually installed. For Windows, use double backslashes: C:\\Program Files\\...

"No module named 'models'"

→ Make sure you created the models/__init__.py file (it can be empty). Same for browser/__init__.py.

Browser opens but nothing happens

→ Wait up to 30 seconds. The first run is slow because it creates a browser profile. Subsequent runs are faster.


Want More Results?

📄 Getting all listings (pagination)

To get more than the first 100 results, replace main.py's main() function with:

def main() -> None:
    search_config = SearchRequest(
        channel="leased",
        filters=SearchFilters(
            within_radius="includesurrounding",
            surrounding_suburbs=True,
        ),
        page=1,
        page_size=100,
    )
    browser = get_browser_instance(9999)

    with RealCommercialClient() as client:
        page = 1
        while True:
            search_config.page = page
            response = client.search(search_config)
            listings = response.listings
            if not listings:
                break
            print(f"Page {page}: {len(listings)} listings")
            for listing in listings:
                print(f"  - {listing.title}")
            page += 1

This loops through all pages until there are no more results.
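Using the numbers from the example output earlier (2,437 available results at 100 per page), you can estimate how many requests the loop will make:

```python
import math

available_results = 2437  # from the example output in Step 7
page_size = 100

pages = math.ceil(available_results / page_size)
print(pages)  # → 25
```

Note that the API may cap how deep you can paginate; if requests start failing after a certain page, narrow the search filters and run several smaller searches instead.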


Final File Checklist

Your folder should look exactly like this:

realcommercial-scraper/
├── pyproject.toml
├── constants.py
├── api.py
├── listing_parse.py
├── main.py
├── browser/
│   ├── __init__.py
│   ├── browser.py
│   ├── actions.py
│   └── utils.py
└── models/
    ├── __init__.py
    ├── browser.py
    ├── listing.py
    ├── request.py
    └── search.py

You're done! If every file is in place and you ran uv sync, the scraper should work. Run it anytime with:

cd realcommercial-scraper
uv run python main.py

Questions? Double-check file names and paths — 95% of issues are a typo or a missing file.

Get Started by signing up for a Proxy Product