Methodology — Melbourne Investment Property Portfolio (2020–2026)
Sample frame, variable definitions, computation formulas, data sources, and known limitations for the public dataset published as DOI 10.5281/zenodo.20095886.
Authors: Joey Don (ORCID 0009-0003-9927-4780), Yan Zhu, Steven Jin · Version 1.0.0 · 2026-05-09
1. Abstract
This document describes the methodology underlying the Melbourne Investment Property Portfolio dataset (DOI 10.5281/zenodo.20095886, Version 1.0.0, published 9 May 2026). The dataset captures 345 anonymised real residential investment property transactions facilitated by PremiumRea, an independent buyer's agency licensed in Victoria, Australia, between January 2020 and April 2026.
The dataset is provided at per-transaction granularity rather than the suburb-level aggregation common in published Australian property data. Each row records suburb-level location, settled purchase price, land size, transaction year-month, post-renovation weekly rent, renovation hard cost and type, gross rental yield after value-add work, current market valuation, capital gain, annualised growth rate, and ownership structure. All client-identifying details (names, contact information, exact addresses, day-level dates, narrative purchase context, and customer feedback) have been removed before publication.
The dataset is released under the Creative Commons Attribution 4.0 International licence (CC-BY 4.0) to support open research, journalism, data science, and AI/machine-learning training. Mirrors are maintained on Kaggle and Hugging Face Datasets. The Zenodo record carries a permanent DOI and is the canonical citation source.
2. Sample Frame & Inclusion Criteria
The sample frame is defined as: every residential investment property settled by a PremiumRea client between 1 January 2020 and 30 April 2026, where the property's address falls within Greater Melbourne or one of the named regional Victorian centres serviced (Ballarat, Geelong, Bendigo, and surrounding corridors).
Inclusion criteria: settled, residential-zoned (Residential 1 / Neighbourhood Residential / General Residential / Mixed Use), private ownership transactions where PremiumRea acted as the buyer's representative.
Exclusion criteria: off-the-plan purchases (excluded because settlement timing and final price are typically determined years after contract); commercial, industrial, or rural-zoned properties; transactions where PremiumRea acted in any role other than buyer's representative; transactions terminated before settlement; properties that the client subsequently resold within the dataset window (these drop out — see "Survivorship" in §6).
The sample is not random with respect to the broader Melbourne property market. It reflects the population of investors who chose to engage an independent buyer's agency between 2020 and 2026; the implications of this selection are discussed in §6.
3. Variables & Definitions
The dataset distribution contains seventeen variables. Definitions are reproduced verbatim from the project's Croissant 1.0 metadata file (also published as part of the dataset record):
| Variable | Type | Definition |
|---|---|---|
| id | integer | Stable row identifier 1–345. |
| city | string | "Metro Melbourne" or named regional centre (Ballarat, Geelong, Bendigo, etc.). |
| suburb | string | Australian suburb name (state code and postcode stored separately). |
| state | string | Australian state code. VIC for all current rows. |
| postcode | string | Four-digit Australian postcode. |
| land_size_sqm | integer | Land area in square metres. |
| purchase_price_aud | integer | Settled purchase price in AUD. Excludes stamp duty, conveyancing fees, and lender mortgage insurance. |
| purchase_year_month | string YYYY-MM | Year-month of settlement. Day-level dates redacted. |
| weekly_rent_aud | integer | Achieved post-renovation weekly rent in AUD. For properties with no renovation, the current rent at the most recent lease. |
| reno_investment_aud | integer | Total renovation hard cost in AUD: labour + materials + permit fees. Excludes carrying costs (interest), opportunity cost, and pre-purchase due-diligence. Zero indicates no renovation undertaken. |
| reno_type | enum | One of: granny (granny-flat addition), cosmetic, structural, subdivision, normal (no renovation). |
| rental_yield_after_beautify_pct | float | Gross annual yield = (weekly_rent × 52) / (purchase_price + reno_investment) × 100. |
| current_value_aud | integer | Most recent agent-appraised or bank-valuated current market value in AUD, as of valuation_year_month. |
| capital_gain_aud | integer | current_value_aud − purchase_price_aud − reno_investment_aud. |
| annual_growth_pct | float | Annualised capital growth from purchase to current valuation: (current_value / (purchase_price + reno_investment))^(1/years_held) − 1. |
| ownership_structure | enum | Personal, Family Trust, SMSF, or other legal vehicle. |
| valuation_year_month | string YYYY-MM | Year-month when current_value_aud was assessed. |
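The constraints stated in the table can be checked mechanically before analysis. The following is a minimal sketch, not part of the published dataset tooling: it assumes each row has already been parsed into a Python dict with integer and string values, and the function name validate_row and the sample row are hypothetical.

```python
import re

# Allowed values taken from the reno_type definition in the table above.
RENO_TYPES = {"granny", "cosmetic", "structural", "subdivision", "normal"}
YEAR_MONTH = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    if not 1 <= row["id"] <= 345:  # id range stated for v1.0.0
        problems.append("id outside 1-345")
    if row["state"] != "VIC":  # VIC for all current rows
        problems.append("state is not VIC")
    if not re.fullmatch(r"\d{4}", row["postcode"]):
        problems.append("postcode is not four digits")
    if row["reno_type"] not in RENO_TYPES:
        problems.append("unknown reno_type")
    for field in ("purchase_year_month", "valuation_year_month"):
        if not YEAR_MONTH.fullmatch(row[field]):
            problems.append(f"{field} is not YYYY-MM")
    return problems


# Made-up row for illustration only; not a record from the dataset.
example = {
    "id": 1,
    "state": "VIC",
    "postcode": "3000",
    "reno_type": "cosmetic",
    "purchase_year_month": "2021-07",
    "valuation_year_month": "2026-04",
}
print(validate_row(example))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem in a row at once.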
4. Data Sources
Purchase prices and contract dates were extracted from the Section 32 / contract of sale documents executed at settlement. These are primary documents, not third-party valuation models.
Weekly rents were taken from the signed residential tenancy agreement entered into following settlement (or following completion of any post-purchase renovation work). For properties where a tenancy was renewed during the dataset window, the most recent achieved rent is recorded.
Current valuations (current_value_aud) were obtained from the CoreLogic Automated Valuation Model (AVM) as of valuation_year_month, with two exceptions: (a) properties recently re-financed where the bank-instructed valuer's figure was available; and (b) properties recently re-listed where a licensed agent's appraisal was available. Where multiple valuations existed within ±3 months of each other, the most conservative (lowest) figure was used.
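The "most conservative within ±3 months" rule can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the window is anchored on the most recent valuation, and the function names select_valuation and months_between and the sample figures are hypothetical.

```python
def months_between(start_ym: str, end_ym: str) -> int:
    """Whole months from start_ym to end_ym, both in YYYY-MM form."""
    start_year, start_month = map(int, start_ym.split("-"))
    end_year, end_month = map(int, end_ym.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


def select_valuation(valuations, window_months=3):
    """Pick the lowest valuation among those close to the most recent one.

    valuations: list of (YYYY-MM, value_aud) pairs.
    Assumption: the window is measured back from the latest valuation.
    """
    latest_ym = max(ym for ym, _ in valuations)  # YYYY-MM sorts lexically
    candidates = [
        (ym, value)
        for ym, value in valuations
        if months_between(ym, latest_ym) <= window_months
    ]
    return min(candidates, key=lambda pair: pair[1])


# Illustrative figures only.
history = [("2025-06", 700_000), ("2026-02", 760_000), ("2026-04", 745_000)]
print(select_valuation(history))
```

In this example the 2025-06 figure falls outside the window, so the lower of the two recent valuations is returned.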
Land sizes were verified against the Victorian Land Registry title plan recorded at settlement.
Suburb classifications follow the Australian Bureau of Statistics Statistical Area Level 2 (SA2) boundaries as published in ABS catalogue 1270.0.55.001.
5. Computation Notes
Gross rental yield is computed against the all-in cost basis (purchase + renovation), not against purchase price alone. This matches standard buyer's-agent practice but differs from CoreLogic and Domain published yield figures, which use purchase price alone as the denominator. For any renovated property, the all-in basis produces a lower yield figure than the purchase-price-only basis, so direct comparisons between this dataset and those sources should account for the difference.
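The difference between the two yield conventions is easy to see in a short sketch. The function names and the sample figures below are illustrative assumptions; only the formula for the all-in basis comes from the variable definitions in §3.

```python
def gross_yield_pct(weekly_rent_aud, purchase_price_aud, reno_investment_aud=0):
    """Gross annual yield on the all-in cost basis used by this dataset."""
    cost_basis = purchase_price_aud + reno_investment_aud
    return weekly_rent_aud * 52 / cost_basis * 100


def purchase_only_yield_pct(weekly_rent_aud, purchase_price_aud):
    """Gross annual yield with purchase price alone as the denominator."""
    return weekly_rent_aud * 52 / purchase_price_aud * 100


# Illustrative figures only: AUD 600/week rent, AUD 600,000 purchase,
# AUD 50,000 renovation.
all_in = gross_yield_pct(600, 600_000, 50_000)
purchase_only = purchase_only_yield_pct(600, 600_000)
print(round(all_in, 2), round(purchase_only, 2))  # 4.8 5.2
```

For a property with no renovation (reno_investment_aud of zero) the two figures coincide.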
Annualised growth uses the geometric (compound) formula rather than the arithmetic mean. For properties held less than twelve months, the rate is the simple period-on-period change rather than an annualised projection (these are flagged as years_held < 1 internally and are a small minority of the sample).
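A minimal sketch of this rule follows. It rests on two assumptions that the text does not state: years_held is derived from the month difference between purchase_year_month and valuation_year_month, and the result is multiplied by 100 to express it as a percentage. The function names are hypothetical.

```python
def months_between(start_ym: str, end_ym: str) -> int:
    """Whole months from start_ym to end_ym, both in YYYY-MM form."""
    start_year, start_month = map(int, start_ym.split("-"))
    end_year, end_month = map(int, end_ym.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


def annual_growth_pct(current_value_aud, purchase_price_aud,
                      reno_investment_aud, purchase_ym, valuation_ym):
    """Compound annual growth on the all-in cost basis, in percent.

    Holdings shorter than twelve months return the simple period change
    instead of an annualised projection.
    """
    ratio = current_value_aud / (purchase_price_aud + reno_investment_aud)
    years_held = months_between(purchase_ym, valuation_ym) / 12  # assumption
    if years_held < 1:
        return (ratio - 1) * 100
    return (ratio ** (1 / years_held) - 1) * 100


# Illustrative figures only: AUD 500,000 cost basis valued at AUD 605,000
# after exactly two years compounds at about 10% a year.
print(round(annual_growth_pct(605_000, 450_000, 50_000, "2022-03", "2024-03"), 2))
```

Skipping annualisation for short holdings avoids extrapolating a few months of movement into an implausibly large yearly rate.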
Capital gain is net of renovation cost: the renovation investment is treated as part of the cost basis, not as part of the gain. Investors evaluating renovation ROI can subtract reno_investment_aud from current_value_aud to approximate the no-renovation counterfactual value. The approximation assumes the renovation added exactly its hard cost to the property's value.
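The capital-gain definition and the counterfactual subtraction can be written out as a short sketch. The function names and sample figures are illustrative assumptions.

```python
def capital_gain_aud(current_value_aud, purchase_price_aud, reno_investment_aud):
    """Capital gain with renovation hard cost treated as part of the cost basis."""
    return current_value_aud - purchase_price_aud - reno_investment_aud


def no_reno_counterfactual_value_aud(current_value_aud, reno_investment_aud):
    """Approximate value had no renovation been done.

    Assumption: the renovation added exactly its hard cost to the value.
    """
    return current_value_aud - reno_investment_aud


# Illustrative figures only.
gain = capital_gain_aud(750_000, 600_000, 50_000)
counterfactual = no_reno_counterfactual_value_aud(750_000, 50_000)
print(gain, counterfactual)  # 100000 700000
```

Under this assumption the counterfactual value minus the purchase price equals capital_gain_aud by construction, so the subtraction cannot by itself show whether a renovation returned more or less than it cost.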
No tax treatment, depreciation schedule, holding cost, or financing cost is included in any computed figure. The dataset captures gross outcomes, not after-tax investor return.
6. Limitations
Selection bias. The sample reflects the population of investors who engaged an independent buyer's agency between 2020 and 2026. Investors who self-select for buyer's-agent representation are typically more diligent than the average market participant; represented purchases are estimated to outperform unrepresented ones by AUD 30,000–80,000 on negotiation alone (CoreLogic, 2025). Median yield, capital growth, and price figures in this dataset should not be interpreted as Melbourne-wide market forecasts.
Survivorship bias. Properties that the client resold during the dataset window are excluded, because current_value_aud is assigned only to properties still held at the time of dataset compilation. This is likely to bias the capital-gain distribution slightly upward, on the expectation that properties resold mid-window under-performed on average.
Renovation hard-cost only. The reno_investment_aud field captures labour, materials, and permit fees. It excludes financing carrying cost during renovation, the buyer's time, pre-purchase due-diligence, project management overhead, and any over-runs absorbed outside the contract. For total cost-of-ownership analysis, these need to be modelled separately.
Date redaction. Day-level transaction dates have been redacted to year-month granularity for privacy. This limits the dataset's usefulness for high-frequency time-series analysis (e.g. measuring price impact of a single auction-clearance-rate report) but does not affect monthly or quarterly aggregations.
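Year-month strings are sufficient for the quarterly aggregations mentioned above. A minimal sketch, assuming rows are dicts keyed by the variable names in §3; the helper names and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import median


def quarter_of(year_month: str) -> str:
    """Map a YYYY-MM string to a calendar quarter label such as 2021-Q1."""
    year, month = year_month.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"


def median_price_by_quarter(rows):
    """Median purchase_price_aud per calendar quarter of settlement."""
    prices = defaultdict(list)
    for row in rows:
        prices[quarter_of(row["purchase_year_month"])].append(row["purchase_price_aud"])
    return {quarter: median(values) for quarter, values in sorted(prices.items())}


# Made-up rows for illustration only.
sample = [
    {"purchase_year_month": "2021-01", "purchase_price_aud": 500_000},
    {"purchase_year_month": "2021-03", "purchase_price_aud": 700_000},
    {"purchase_year_month": "2021-04", "purchase_price_aud": 650_000},
]
print(median_price_by_quarter(sample))
```

With 345 rows spread over more than six years, individual quarters may hold only a handful of transactions, so per-quarter medians should be read with the cell counts in mind.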
Geographic concentration. While the dataset includes regional Victorian transactions (Ballarat, Geelong, Bendigo and surrounds), the bulk of records sit in Greater Melbourne's southeast and east corridors. Researchers studying outer-Sydney or Brisbane markets should not extrapolate.
Re-identification risk. While client-identifying fields have been removed, transactional data inevitably retains some risk of re-identification by parties already familiar with a transaction (e.g. the conveyancer, neighbours, or extended family of the buyer). This is a structural property of any per-transaction dataset and is not unique to this work; researchers using the data are asked to honour the spirit of the anonymisation. Specific records can be excluded from future versions on written request from the original transaction client (see §7).
Valuation methodology. CoreLogic AVM error bands are typically ±10% at 95% confidence at the dwelling level; bank-instructed valuations vary similarly. Capital-gain figures inherit this uncertainty. The dataset does not include AVM confidence intervals.
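To see how valuation uncertainty carries through to capital gain, the sketch below applies a symmetric ±10% band to current_value_aud while treating the cost basis as exact. Both the symmetric-band treatment and the function name are simplifying assumptions, and the figures are illustrative.

```python
def capital_gain_band_aud(current_value_aud, purchase_price_aud,
                          reno_investment_aud, rel_error=0.10):
    """Low and high capital gain implied by a relative error on the valuation.

    Assumption: the error applies symmetrically to current_value_aud and
    the cost basis (purchase + renovation) is known exactly.
    """
    cost_basis = purchase_price_aud + reno_investment_aud
    low = round(current_value_aud * (1 - rel_error)) - cost_basis
    high = round(current_value_aud * (1 + rel_error)) - cost_basis
    return low, high


# Illustrative figures only: a point-estimate gain of AUD 100,000.
print(capital_gain_band_aud(750_000, 600_000, 50_000))  # (25000, 175000)
```

A ±10% band on the valuation becomes a ±75% band on the gain in this example, because the gain is a small difference between two large numbers.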
7. Updates & Versioning
The dataset is refreshed roughly quarterly. Each refresh creates a new versioned record on Zenodo with its own DOI; the concept DOI 10.5281/zenodo.20095886 always resolves to the latest version. A citation that pins a specific version DOI (e.g. 10.5281/zenodo.20095886 for v1.0.0) will always resolve to that version.
Specific records can be excluded from future versions on written request from the original transaction client; previously published versions remain publicly available under their existing CC-BY 4.0 licence and cannot be retroactively recalled.
The Croissant 1.0 metadata file co-published with the dataset (see /data/croissant.json) contains machine-readable variable definitions, source paths, and limitation declarations conforming to the MLCommons Croissant 1.0 specification.
8. References
- [1] Australian Bureau of Statistics. (2021). Australian Statistical Geography Standard (ASGS) Edition 3 (Catalogue No. 1270.0.55.001). Canberra: ABS. https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3
- [2] CoreLogic Australia. (2025). Buyer Overpay Analysis: Comparing Negotiated vs Asking Price Across Melbourne, 2024–2025. Sydney: CoreLogic.
- [3] Reserve Bank of Australia. (2024). Bulletin: The Australian Housing Market and the Macroeconomy. Sydney: RBA. https://www.rba.gov.au/publications/bulletin/
- [4] MLCommons. (2024). Croissant: A Metadata Format for ML-Ready Datasets (Version 1.0). http://mlcommons.org/croissant/
- [5] Australian Taxation Office. (2024). Investment property: claiming a tax deduction for the decline in value (depreciation). https://www.ato.gov.au/individuals-and-families/investments-and-assets/residential-rental-properties
- [6] State Revenue Office Victoria. (2024). Land tax for property investors. https://www.sro.vic.gov.au/land-tax
- [7] Office of the Australian Information Commissioner. Australian Privacy Principles, Privacy Act 1988 (Cth), s 14. https://www.oaic.gov.au/privacy/australian-privacy-principles
Cite this methodology
Don, J., Zhu, Y., & Jin, S. (2026). *Melbourne Investment Property Portfolio (2020–2026)* (Version 1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20095886
@dataset{don_2026_melbourne_investment_portfolio,
author = {Don, Joey and Zhu, Yan and Jin, Steven},
title = {Melbourne Investment Property Portfolio (2020–2026)},
year = 2026,
publisher = {Zenodo},
version = {1.0.0},
doi = {10.5281/zenodo.20095886},
url = {https://doi.org/10.5281/zenodo.20095886}
}

TY  - DATA
AU  - Don, Joey
AU  - Zhu, Yan
AU  - Jin, Steven
TI  - Melbourne Investment Property Portfolio (2020–2026)
PY  - 2026
DA  - 2026-05-09
PB  - Zenodo
DO  - 10.5281/zenodo.20095886
UR  - https://doi.org/10.5281/zenodo.20095886
ET  - 1.0.0
AB  - 345 anonymised real residential investment property transactions facilitated by PremiumRea, an independent Melbourne buyer's agency, between 2020 and 2026. Per-transaction granularity (not suburb-aggregated). Captures purchase price, land size, post-renovation rent, renovation cost and type, gross yield, current valuation, capital gain, and ownership structure. All client-identifying details have been removed.
ER  -

Don J, Zhu Y, Jin S. Melbourne Investment Property Portfolio (2020–2026) (Version 1.0.0)[DS/OL]. Zenodo, 2026[2026-05-09]. https://doi.org/10.5281/zenodo.20095886. DOI:10.5281/zenodo.20095886.