A beginner-friendly guide — no programming knowledge needed. Just follow the steps.
We'll create a small program that:
Searches for commercial properties on RealCommercial
Grabs the details (price, agent, description, demographics)
Prints them nicely in the terminal
The program uses two strategies: a fast API call for search results, and a real browser as backup when the website blocks automated requests.
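In miniature, that fallback pattern looks like this (a sketch with stand-in helpers; `fetch_via_api` and `fetch_via_browser` are hypothetical names for the pieces this guide builds out later):

```python
from dataclasses import dataclass


@dataclass
class Response:
    status_code: int
    text: str


# Stand-ins for the real helpers built later in this guide (hypothetical here).
def fetch_via_api(url: str) -> Response:
    return Response(429, "")  # pretend the site blocked our plain request


def fetch_via_browser(url: str) -> str:
    return "<html>rendered by a real browser</html>"


def fetch(url: str) -> str:
    response = fetch_via_api(url)      # fast path: plain HTTP request
    if response.status_code == 429:    # 429 = rate-limited / blocked
        return fetch_via_browser(url)  # slow path: drive a real browser
    return response.text


print(fetch("https://example.com"))
```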
Windows:
Go to python.org/downloads
Click the yellow button that says "Download Python 3.12.x"
Run the downloaded .exe file
IMPORTANT: Check the box that says "Add Python to PATH" at the bottom
Click "Install Now" and wait for it to finish
macOS:
Go to python.org/downloads
Click the yellow button for "Download Python 3.12.x"
Open the downloaded .pkg file and follow the installer
Ubuntu / Debian Linux:
sudo apt update
sudo apt install python3.12 python3.12-venv
Fedora Linux:
sudo dnf install python3.12
Verify it worked: Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux) and type:
python3.12 --version

You should see: Python 3.12.x
We need Brave or Google Chrome installed. Either works.
Brave Browser (recommended):
Go to brave.com/download
Download and install like any other program
Google Chrome:
Go to google.com/chrome
Download and install
Important for Linux users: If you install Brave via the terminal, note where it's installed. The default is usually /usr/bin/brave-browser. If it's somewhere else, you'll update one line of code later.
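To check where your browser actually lives on Linux, you can ask the shell (the binary names below are the common package defaults; yours may differ):

```shell
# Prints the first browser binary found on your PATH, or a warning if none is.
command -v brave-browser || command -v brave || command -v google-chrome || echo "no browser found on PATH"
```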
uv is a fast tool that installs Python libraries. We use it instead of pip.
Windows (PowerShell):
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
macOS / Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
After installing, close and reopen your terminal.
Verify:
uv --version

You should see something like: uv 0.x.x
Open your terminal and run these commands one at a time:
# Create the main folder
mkdir realcommercial-scraper
cd realcommercial-scraper
# Create subfolders
mkdir browser
mkdir models
Now create the following files. Copy-paste each one exactly.
pyproject.toml:

```toml
[project]
name = "realcommercial-scraper"
version = "0.1.0"
description = "Scraper for realcommercial.com.au"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "httpx",
    "pydantic",
    "psutil",
    "websocket-client",
]

[tool.ruff.lint]
fixable = ["ALL"]
select = ["I", "B", "E"]

[tool.pyright]
typeCheckingMode = "strict"
venvPath = "."
venv = ".venv"
```
constants.py:

```python
from pathlib import Path

HOME = Path.home()
BASE_DIR = Path(__file__).parent

# Browser location — CHANGE THIS if your browser is elsewhere
DEBUG_BROWSER_PATH = "/usr/bin/brave-browser"

# Profile folder (cookies & settings are saved here)
USER_PROFILE_DIR = BASE_DIR / "browser" / "browser_profile"

# Browser window size
WIN_W = 720
WIN_H = 760
```
For Windows users, change the browser path to something like:

```python
DEBUG_BROWSER_PATH = "C:\\Program Files\\BraveSoftware\\Brave-Browser\\Application\\brave.exe"
```

For macOS users:

```python
DEBUG_BROWSER_PATH = "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"
```

If you use Google Chrome instead, point the path at your Chrome executable.
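If you'd rather not edit constants.py by hand on every machine, you could pick a default automatically. This is a sketch; the paths below are common install locations and may differ on your system:

```python
import platform

# Common default install locations (assumptions; adjust for your machine).
DEFAULT_PATHS = {
    "Linux": "/usr/bin/brave-browser",
    "Darwin": "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
    "Windows": "C:\\Program Files\\BraveSoftware\\Brave-Browser\\Application\\brave.exe",
}

# Fall back to the Linux default if the platform is unrecognized.
DEBUG_BROWSER_PATH = DEFAULT_PATHS.get(platform.system(), "/usr/bin/brave-browser")
print(DEBUG_BROWSER_PATH)
```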
api.py:

```python
import httpx

from browser.browser import render_html
from listing_parse import parse_listing
from models.browser import Browser
from models.listing import PropertyDetailResponse
from models.request import SearchPayload
from models.search import SearchResponse


class RealCommercialClient:
    def __init__(self) -> None:
        self.client: httpx.Client = httpx.Client(
            headers={
                "Accept": "*/*",
                "Accept-Language": "en-US,en;q=0.6",
                "Content-Type": "application/json",
                "Origin": "https://www.realcommercial.com.au",
                "Referer": "https://www.realcommercial.com.au/",
                "User-Agent": (
                    "Mozilla/5.0 (X11; Linux x86_64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/146.0.0.0 Safari/537.36"
                ),
            },
            timeout=30.0,
        )

    def __enter__(self) -> "RealCommercialClient":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.close()

    def search(self, request: SearchPayload) -> SearchResponse:
        response = self.client.post(
            "https://api.realcommercial.com.au/listing-ui/searches",
            json=request.model_dump(
                by_alias=True, exclude_none=True, exclude_unset=True
            ),
        )
        response.raise_for_status()
        return SearchResponse.model_validate(response.json())

    def update_headers(self) -> None:
        # Switch to browser-like headers before fetching a listing page.
        self.client.headers.update({
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "en-US,en;q=0.9",
            "cache-control": "no-cache",
            "dnt": "1",
            "pragma": "no-cache",
            "priority": "u=0, i",
            "referer": "https://www.realcommercial.com.au/leased/property-678-high-street-thornbury-vic-3071-505055376",
            "sec-ch-ua": '"Chromium";v="146", "Not-A.Brand";v="24", "Brave";v="146"',
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": '"Linux"',
            "sec-fetch-dest": "document",
            "sec-fetch-mode": "navigate",
            "sec-fetch-site": "same-origin",
            "sec-gpc": "1",
            "upgrade-insecure-requests": "1",
            "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36",
        })

    def listing_details(
        self, pdp_url: str, browser: Browser, timeout: int = 5
    ) -> PropertyDetailResponse:
        self.update_headers()
        if pdp_url.startswith("/"):
            pdp_url = pdp_url[1:]
        full_url = f"https://www.realcommercial.com.au/{pdp_url}"
        response = self.client.get(full_url)
        if response.status_code == 429:
            # Blocked: fall back to rendering the page in a real browser.
            html = render_html(browser, full_url, timeout)
            return parse_listing(html)
        response.raise_for_status()
        return parse_listing(response.text)

    def close(self) -> None:
        self.client.close()
```
listing_parse.py:

```python
import json

from models.listing import PropertyDetailResponse


def isolate_data_json(html_content: str) -> str:
    # The page embeds its data as a JSON blob: " REA.pageData = {...};</script>"
    first_part = " REA.pageData = "
    second_part = ";</script>"
    chopped = html_content.split(first_part)[-1]
    chopped = chopped.split(second_part)[0]
    return chopped


def parse_listing(html: str) -> PropertyDetailResponse:
    json_data = isolate_data_json(html)
    return PropertyDetailResponse(**json.loads(json_data))


if __name__ == "__main__":
    with open("index.html", "r") as f:
        html_content = f.read()
    result = parse_listing(html_content)
    print(f"Title: {result.listing.title}")
    print(f"Description: {result.listing.description[:200]}...")
```
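To see the splitting trick in isolation, here's a standalone toy run with made-up HTML (the real page embeds a much larger JSON blob in exactly the same shape):

```python
import json

# A made-up page in the same shape the real site uses.
html = '<script> REA.pageData = {"listing": {"title": "Shop & Retail"}};</script>'

# Same two splits as isolate_data_json above.
chopped = html.split(" REA.pageData = ")[-1]
chopped = chopped.split(";</script>")[0]

data = json.loads(chopped)
print(data["listing"]["title"])  # Shop & Retail
```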
main.py:

```python
from api import RealCommercialClient
from browser.browser import get_browser_instance
from models.listing import DemographicData
from models.request import SearchPayload as SearchRequest
from models.request import SimpleFilters as SearchFilters
from models.search import SearchListing


def print_separator(char: str = "=", length: int = 10) -> None:
    print(char * length)


def print_listing_summary(listing: SearchListing) -> None:
    print(f"{listing.title}")
    print(f"  Address: {listing.address.suburb_address}")
    area = getattr(listing.attributes, "area", "N/A")
    agent_name = listing.agents[0].name if listing.agents else "N/A"
    print(f"  Area: {area}")
    print(f"  Agent: {agent_name}")
    print(f"  URL: {listing.pdp_url}")


def print_demographics(demographic_data: DemographicData) -> None:
    print("  Demographics:")
    for insight in demographic_data.insights:
        print(f"    • {insight.label}: {insight.value}")


def main() -> None:
    search_config = SearchRequest(
        channel="leased",
        filters=SearchFilters(
            within_radius="includesurrounding",
            surrounding_suburbs=True,
        ),
        page=1,
        page_size=100,
    )
    browser = get_browser_instance(9999)
    with RealCommercialClient() as client:
        response = client.search(search_config)
        print(f"Available results: {response.available_results}")
        print(f"Returned listings: {len(response.listings)}")
        print_separator()
        for listing in response.listings:
            print_listing_summary(listing)
            print_separator("-")
            data = client.listing_details(listing.pdp_url, browser)
            description = data.listing.description
            print("Details:")
            print(f"{description[:300]}...")
            if data.demographic_data:
                print_demographics(data.demographic_data)
            break  # only fetch details for the first listing


if __name__ == "__main__":
    main()
```
Create each file below in its correct folder. Start with two empty files (they mark the folders as Python packages):

browser/__init__.py
models/__init__.py
models/request.py:

```python
from pydantic import BaseModel


class SimpleFilters(BaseModel):
    within_radius: str = "includesurrounding"
    surrounding_suburbs: bool = True


class SearchPayload(BaseModel):
    channel: str
    filters: SimpleFilters
    page: int = 1
    page_size: int = 100
```
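To see what the search request will send, you can dump the model to a dictionary; this is essentially what api.py does before posting. A standalone toy run:

```python
from pydantic import BaseModel


class SimpleFilters(BaseModel):
    within_radius: str = "includesurrounding"
    surrounding_suburbs: bool = True


class SearchPayload(BaseModel):
    channel: str
    filters: SimpleFilters
    page: int = 1
    page_size: int = 100


# Build a payload and inspect the dictionary that would be sent as JSON.
payload = SearchPayload(channel="leased", filters=SimpleFilters())
print(payload.model_dump())
```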
models/search.py:

```python
from typing import Any

from pydantic import BaseModel, Field, HttpUrl


class BrandingLogo(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl = Field(alias="urlTemplate")


class Branding(BaseModel):
    color: str
    logo: BrandingLogo


class Address(BaseModel):
    street_address: str = Field(alias="streetAddress")
    suburb_address: str = Field(alias="suburbAddress")


class Phone(BaseModel):
    display: str
    dial: str


class Photo(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl | None = Field(None, alias="urlTemplate")


class Agent(BaseModel):
    id: str
    name: str
    image_path: HttpUrl | None = Field(None, alias="imagePath")
    image_url_template: HttpUrl | None = Field(None, alias="imageUrlTemplate")
    enquiry_uri: str | None = Field(None, alias="enquiryUri")
    phone: Phone | None = None


class Agency(BaseModel):
    id: str
    name: str
    additional_branding: bool = Field(alias="additionalBranding")
    branding: Branding
    phone: Phone | None = None
    salespeople: list[Agent] = []


class ListingAttributes(BaseModel):
    area: str | None = None


class ConjunctionalAgency(BaseModel):
    pass


class Omniture(BaseModel):
    pass


class ListingDetails(BaseModel):
    pass


class Advertising(BaseModel):
    pass


class SearchListing(BaseModel):
    highlights: list[str] = []
    agencies: list[Agency] = []
    days_active: int = Field(alias="daysActive")
    agents: list[Agent] = []
    attributes: ListingAttributes
    id: str
    has_tour: bool = Field(alias="hasTour")
    pdp_url: str = Field(alias="pdpUrl")
    title: str
    conjunctional_agencies: list[ConjunctionalAgency] = Field(
        default_factory=list, alias="conjunctionalAgencies"
    )
    product: str
    other_agencies: list[str] = Field(default_factory=list, alias="otherAgencies")
    omniture: Omniture = Field(default_factory=Omniture)
    status: str
    address: Address
    details: ListingDetails = Field(default_factory=ListingDetails)
    branding: Branding
    photos: list[Photo] = []


class SearchResponse(BaseModel):
    listings: list[SearchListing]
    surrounding_suburb_listings: list[Any] = Field(alias="surroundingSuburbListings")
    resolved_locations: list[Any] = Field(alias="resolvedLocations")
    available_results: int = Field(alias="availableResults")
    advertising: Advertising = Field(default_factory=Advertising)
```
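Most of these models exist to translate the API's camelCase keys into Python's snake_case via Field(alias=...). A tiny standalone example of how that translation works:

```python
from pydantic import BaseModel, Field


class Address(BaseModel):
    street_address: str = Field(alias="streetAddress")
    suburb_address: str = Field(alias="suburbAddress")


# The API sends camelCase keys; the model exposes snake_case attributes.
addr = Address.model_validate(
    {"streetAddress": "678 High St", "suburbAddress": "Thornbury, VIC 3071"}
)
print(addr.street_address)  # 678 High St
```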
models/listing.py:

```python
from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field, HttpUrl


class FullAddress(BaseModel):
    street_address: str = Field(alias="streetAddress")
    suburb_address: str = Field(alias="suburbAddress")
    state: str
    postcode: str
    suburb: str
    marketing_region: str | None = Field(None, alias="marketingRegion")
    marketing_suburb: str | None = Field(None, alias="marketingSuburb")


class PriceInfo(BaseModel):
    display: str
    is_price_hidden: bool = Field(alias="isPriceHidden")


class ListingPrice(BaseModel):
    leased: PriceInfo | None = None


class Photo(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl | None = Field(None, alias="urlTemplate")


class FloorPlan(BaseModel):
    alt: str
    url: HttpUrl
    url_template: HttpUrl | None = Field(None, alias="urlTemplate")


class Attribute(BaseModel):
    id: str
    label: str
    value: str


class MapData(BaseModel):
    zoom_level: int = Field(alias="zoomLevel")
    thumbnail: HttpUrl
    lat: float
    lng: float
    precision: str


class PropertyType(BaseModel):
    id: str
    url: str
    old_url: str = Field(alias="oldUrl")
    agency_url: str = Field(alias="agencyUrl")
    marketing: str
    display_text: str = Field(alias="displayText")
    long_display_text: str = Field(alias="longDisplayText")
    sentence_display_text: str = Field(alias="sentenceDisplayText")
    pdp_title: str = Field(alias="pdpTitle")
    icon_name: str | None = Field(None, alias="iconName")


class TenureType(BaseModel):
    key: str
    omniture: str
    display_text: str = Field(alias="displayText")


class AvailableChannel(BaseModel):
    id: str
    price: str
    omniture: str
    tealium: str
    campaign: str
    krux: str
    url: str
    ad_area: str = Field(alias="adArea")
    short_human_readable: str = Field(alias="shortHumanReadable")
    display_in_select: str = Field(alias="displayInSelect")
    human_readable: str = Field(alias="humanReadable")
    title_variant: str = Field(alias="titleVariant")
    title: str


class DescriptionMetadata(BaseModel):
    phone_numbers: list[str] = Field(default_factory=list, alias="phoneNumbers")


class SimilarListing(BaseModel):
    id: str
    title: str
    pdp_url: str = Field(alias="pdpUrl")
    address: Any
    area: str | None = None
    price: ListingPrice
    main_photo: Photo = Field(alias="mainPhoto")
    branding: Any
    product: str
    property_type_objects: list[PropertyType] = Field(alias="propertyTypeObjects")


class DemographicInsightItem(BaseModel):
    icon: str
    label: str
    value: str
    typename: str = Field(alias="__typename")


class DemographicData(BaseModel):
    summary: str
    insights: list[DemographicInsightItem]


class Listing(BaseModel):
    id: str
    title: str
    description: str
    canonical_path: str = Field(alias="canonicalPath")
    product: str
    status: str | None = None
    days_active: int = Field(alias="daysActive")
    last_updated_at: datetime = Field(alias="lastUpdatedAt")
    address: FullAddress
    price: ListingPrice
    photos: list[Photo] = []
    floor_plans: list[FloorPlan] = Field(alias="floorPlans")
    main_photo: Photo | None = None
    attributes: list[Attribute] = []
    highlights: list[str] = []
    map: MapData
    agencies: list[Any] = []
    branding: Any
    property_type_objects: list[PropertyType] = Field(alias="propertyTypeObjects")
    tenure_type_object: TenureType = Field(alias="tenureTypeObject")
    available_channel_objects: list[AvailableChannel] = Field(
        alias="availableChannelObjects"
    )
    similar_listings: list[SimilarListing] = Field(alias="similarListings")
    description_metadata: DescriptionMetadata = Field(
        default_factory=DescriptionMetadata, alias="descriptionMetadata"
    )
    high_quality_listing: bool = Field(alias="highQualityListing")
    multiple_properties: bool = Field(alias="multipleProperties")
    tours: list[Any] = []
    websites: list[Any] = []


class PropertyDetailResponse(BaseModel):
    listing: Listing
    demographic_data: DemographicData | None = Field(None, alias="demographicData")
```
models/browser.py:

```python
from typing import Any

from pydantic import BaseModel
from websocket import WebSocket


class DebuggerInfo(BaseModel):
    last_updates: int = -1


class Browser(BaseModel):
    ws: WebSocket = WebSocket()
    process_id: int = 0
    process: Any = None
    debugger_info: DebuggerInfo = DebuggerInfo()

    class Config:
        arbitrary_types_allowed = True

    def connect(self) -> None:
        if not self.ws.connected:
            self.ws.connect(self.ws.url)
```
browser/utils.py:

```python
import json
import subprocess
import urllib.request

from constants import DEBUG_BROWSER_PATH, USER_PROFILE_DIR, WIN_H, WIN_W


def spawn_debug_browser(debug_port: int, headless: bool = False):
    cmd = [
        DEBUG_BROWSER_PATH,
        f"--remote-debugging-port={debug_port}",
        f"--user-data-dir={USER_PROFILE_DIR}",
        f"--window-size={WIN_W},{WIN_H}",
        "--no-first-run",
        "--no-default-browser-check",
    ]
    if headless:
        cmd.append("--headless")
    process = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return process.pid, process


def attach_debugger(browser, debug_port: int):
    try:
        resp = urllib.request.urlopen(f"http://localhost:{debug_port}/json/version")
        data = json.loads(resp.read())
        ws_url = data.get("webSocketDebuggerUrl")
        if ws_url:
            browser.ws.url = ws_url
            browser.ws.connect(ws_url)
            browser.debugger_info.last_updates = 1
    except Exception:
        browser.debugger_info.last_updates = -1
```
browser/actions.py:

```python
import json
import time
import uuid


def _send(browser, method: str, params: dict | None = None):
    # Send one Chrome DevTools Protocol command and wait for a reply.
    msg = {"id": uuid.uuid4().int % 1000000, "method": method}
    if params:
        msg["params"] = params
    browser.ws.send(json.dumps(msg))
    return json.loads(browser.ws.recv())


def browser_init_domains(browser):
    _send(browser, "Page.enable")
    _send(browser, "DOM.enable")
    _send(browser, "Runtime.enable")
    _send(browser, "Network.enable")


def browser_open_url(browser, url: str):
    _send(browser, "Page.navigate", {"url": url})
    # Poll until the page reports it has finished loading
    while True:
        resp = _send(
            browser, "Runtime.evaluate", {"expression": "document.readyState"}
        )
        state = resp.get("result", {}).get("result", {}).get("value", "")
        if state == "complete":
            break
        time.sleep(0.2)  # don't hammer the debugger in a tight loop


def dom_element_html(browser, selector: str, outer_html: bool = True):
    doc = _send(browser, "DOM.getDocument")
    root = doc["result"]["root"]["nodeId"]
    node = _send(browser, "DOM.querySelector", {"nodeId": root, "selector": selector})
    node_id = node["result"]["nodeId"]
    if outer_html:
        html = _send(browser, "DOM.getOuterHTML", {"nodeId": node_id})
    else:
        html = _send(browser, "DOM.getInnerHTML", {"nodeId": node_id})
    return html["result"]["outerHTML"] if outer_html else html["result"]["innerHTML"]
```
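One file remains: api.py and main.py import get_browser_instance and render_html from browser/browser.py, which appears in the folder tree but isn't shown above. Here is a minimal sketch reconstructed from the helpers in utils.py and actions.py; treat it as a starting point rather than the exact original (the two-second startup wait and the fixed render delay are assumptions):

```python
# browser/browser.py — reconstructed sketch; ties together the helpers above.
import time

from browser.actions import browser_init_domains, browser_open_url, dom_element_html
from browser.utils import attach_debugger, spawn_debug_browser
from models.browser import Browser


def get_browser_instance(debug_port: int) -> Browser:
    """Launch the browser with remote debugging enabled and attach to it."""
    browser = Browser()
    browser.process_id, browser.process = spawn_debug_browser(debug_port)
    time.sleep(2)  # give the browser a moment to start listening (assumption)
    attach_debugger(browser, debug_port)
    browser_init_domains(browser)
    return browser


def render_html(browser: Browser, url: str, timeout: int = 5) -> str:
    """Open a URL in the debugged browser and return the rendered page HTML."""
    browser_open_url(browser, url)
    time.sleep(timeout)  # let client-side scripts finish rendering (assumption)
    return dom_element_html(browser, "html")
```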
Make sure you're in the realcommercial-scraper folder, then run:
uv sync
This reads pyproject.toml and installs:
httpx — makes web requests
pydantic — handles data cleanly
psutil — manages the browser process
websocket-client — talks to the browser
You should see something like Resolved N packages in Xms (the count includes the four libraries above plus their own dependencies), followed by installation messages. If you see errors, double-check that you're in the right folder.
In your terminal (still in the realcommercial-scraper folder):
uv run python main.py
What happens:
uv creates a virtual environment (one-time setup)
A Brave/Chrome browser window opens (this is normal)
The program searches RealCommercial for leased properties
It fetches details for the first listing
Results print in your terminal
```
Available results: 2437
Returned listings: 100
==========
Shop & Retail Premises • 148m²
  Address: Thornbury, VIC 3071
  Area: 148m²
  Agent: John Smith
  URL: /leased/property-678-high-street-thornbury-vic-3071-505055376
----------
Details:
Prime retail space located on High Street, Thornbury...
  Demographics:
    • Total Population: 19,200
    • Median Age: 38
    • Average Household Income: $95,400/yr
```
"Command not found: uv"
→ Close and reopen your terminal. If it still doesn't work, restart your computer.
"python3.12: command not found"
→ On Windows, try python instead of python3.12. On Mac, make sure you installed from python.org (not the built-in one).
Browser doesn't open / "connection refused"
→ Check constants.py. Make sure DEBUG_BROWSER_PATH points to where your browser is actually installed. For Windows, use double backslashes: C:\\Program Files\\...
"No module named 'models'"
→ Make sure you created the models/__init__.py file (it can be empty). Same for browser/__init__.py.
Browser opens but nothing happens
→ Wait up to 30 seconds. The first run is slow because it creates a browser profile. Subsequent runs are faster.
To get more than the first 100 results, replace main.py's main() function with:
```python
def main() -> None:
    search_config = SearchRequest(
        channel="leased",
        filters=SearchFilters(
            within_radius="includesurrounding",
            surrounding_suburbs=True,
        ),
        page=1,
        page_size=100,
    )
    browser = get_browser_instance(9999)
    with RealCommercialClient() as client:
        page = 1
        while True:
            search_config.page = page
            response = client.search(search_config)
            listings = response.listings
            if not listings:
                break
            print(f"Page {page}: {len(listings)} listings")
            for listing in listings:
                print(f"  - {listing.title}")
            page += 1
```
This loops through all pages until there are no more results.
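Since the search reports available_results up front, you can also estimate how many pages the loop will make before running it:

```python
import math

available_results = 2437  # taken from the example output earlier in the guide
page_size = 100

# Ceiling division: a partial final page still counts as a page.
pages = math.ceil(available_results / page_size)
print(pages)  # 25
```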
Your folder should look exactly like this:
```
realcommercial-scraper/
├── pyproject.toml
├── constants.py
├── api.py
├── listing_parse.py
├── main.py
├── browser/
│   ├── __init__.py
│   ├── browser.py
│   ├── actions.py
│   └── utils.py
└── models/
    ├── __init__.py
    ├── browser.py
    ├── listing.py
    ├── request.py
    └── search.py
```
You're done! If every file is in place and you ran uv sync, the scraper should work. Run it anytime with:
cd realcommercial-scraper
uv run python main.py
Questions? Double-check file names and paths — 95% of issues are a typo or a missing file.