Methodology — Melbourne Investment Property Portfolio (2020–2026)

Sample frame, variable definitions, computation formulas, data sources, and known limitations for the public dataset published as DOI 10.5281/zenodo.20095886.

Authors: Joey Don (ORCID 0009-0003-9927-4780), Yan Zhu, Steven Jin · Version 1.0.0 · 2026-05-09

1. Abstract

This document describes the methodology underlying the Melbourne Investment Property Portfolio dataset (DOI 10.5281/zenodo.20095886, Version 1.0.0, published 9 May 2026). The dataset captures 345 anonymised real residential investment property transactions facilitated by PremiumRea, an independent buyer's agency licensed in Victoria, Australia, between January 2020 and April 2026.

The dataset is provided at per-transaction granularity rather than the suburb-level aggregation common in published Australian property data. Each row records suburb-level location, settled purchase price, land size, transaction year-month, post-renovation weekly rent, renovation hard cost and type, gross rental yield after value-add work, current market valuation, capital gain, annualised growth rate, and ownership structure. All client-identifying details (names, contact information, exact addresses, day-level dates, narrative purchase context, and customer feedback) have been removed before publication.

The dataset is released under the Creative Commons Attribution 4.0 International licence (CC-BY 4.0) to support open research, journalism, data science, and AI/machine-learning training. Mirrors are maintained on Kaggle and Hugging Face Datasets. The Zenodo record carries a permanent DOI and is the canonical citation source.

2. Sample Frame & Inclusion Criteria

The sample frame is defined as: every residential investment property settled by a PremiumRea client between 1 January 2020 and 30 April 2026, where the property's address falls within Greater Melbourne or one of the named regional Victorian centres serviced (Ballarat, Geelong, Bendigo, and surrounding corridors).

Inclusion criteria: settled, residential-zoned (Residential 1 / Neighbourhood Residential / General Residential / Mixed Use), private ownership transactions where PremiumRea acted as the buyer's representative.

Exclusion criteria: off-the-plan purchases (excluded because settlement timing and final price are typically determined years after contract); commercial, industrial, or rural-zoned properties; transactions where PremiumRea acted in any role other than buyer's representative; transactions terminated before settlement; properties that the client subsequently resold within the dataset window (these drop out — see "Survivorship" in §6).

The sample is not random with respect to the broader Melbourne property market. It reflects the population of investors who chose to engage an independent buyer's agency between 2020 and 2026; the implications of this selection are discussed in §6.

3. Variables & Definitions

The dataset distribution contains seventeen variables. Definitions are reproduced verbatim from the project's Croissant 1.0 metadata file (also published as part of the dataset record):

| Variable | Type | Definition |
| --- | --- | --- |
| id | integer | Stable row identifier 1–345. |
| city | string | "Metro Melbourne" or named regional centre (Ballarat, Geelong, Bendigo, etc.). |
| suburb | string | Australian suburb name (state code and postcode stored separately). |
| state | string | Australian state code. VIC for all current rows. |
| postcode | string | Four-digit Australian postcode. |
| land_size_sqm | integer | Land area in square metres. |
| purchase_price_aud | integer | Settled purchase price in AUD. Excludes stamp duty, conveyancing fees, and lender mortgage insurance. |
| purchase_year_month | string (YYYY-MM) | Year-month of settlement. Day-level dates redacted. |
| weekly_rent_aud | integer | Achieved post-renovation weekly rent in AUD. For properties with no renovation, the current rent at the most recent lease. |
| reno_investment_aud | integer | Total renovation hard cost in AUD: labour + materials + permit fees. Excludes carrying costs (interest), opportunity cost, and pre-purchase due-diligence. Zero indicates no renovation undertaken. |
| reno_type | enum | One of: granny (granny-flat addition), cosmetic, structural, subdivision, normal (no renovation). |
| rental_yield_after_beautify_pct | float | Gross annual yield = (weekly_rent × 52) / (purchase_price + reno_investment) × 100. |
| current_value_aud | integer | Most recent agent-appraised or bank-valuated current market value in AUD, as of valuation_year_month. |
| capital_gain_aud | integer | current_value_aud − purchase_price_aud − reno_investment_aud. |
| annual_growth_pct | float | Annualised capital growth from purchase to current valuation: (current_value / (purchase_price + reno_investment))^(1/years_held) − 1. |
| ownership_structure | enum | Personal, Family Trust, SMSF, or other legal vehicle. |
| valuation_year_month | string (YYYY-MM) | Year-month when current_value_aud was assessed. |

4. Data Sources

Purchase prices and contract dates were extracted from the Section 32 / contract of sale documents executed at settlement. These are primary documents, not third-party valuation models.

Weekly rents were taken from the signed residential tenancy agreement entered into following settlement (or following completion of any post-purchase renovation work). For properties where a tenancy was renewed during the dataset window, the most recent achieved rent is recorded.

Current valuations (current_value_aud) were obtained from the CoreLogic Automated Valuation Model (AVM) as of valuation_year_month, with two exceptions: (a) properties recently re-financed where the bank-instructed valuer's figure was available; and (b) properties recently re-listed where a licensed agent's appraisal was available. Where multiple valuations existed within ±3 months of each other, the most conservative (lowest) figure was used.
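The "most conservative figure" rule can be sketched in Python. This is one possible reading, not the authors' actual procedure: it assumes the comparison window is anchored on the most recent valuation and looks back three months. The `Valuation` class, the `source` labels, and `select_valuation` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Valuation:
    year_month: str  # "YYYY-MM"
    value_aud: int
    source: str      # hypothetical labels: "avm", "bank", "agent"


def month_index(year_month: str) -> int:
    """Convert "YYYY-MM" to a running month count so months can be subtracted."""
    year, month = map(int, year_month.split("-"))
    return year * 12 + (month - 1)


def select_valuation(candidates: list[Valuation], window_months: int = 3) -> Valuation:
    """Pick the lowest valuation among those within the window of the latest one.

    Assumption: the window is measured backwards from the most recent valuation.
    """
    if not candidates:
        raise ValueError("no valuations supplied")
    latest = max(month_index(v.year_month) for v in candidates)
    recent = [v for v in candidates
              if latest - month_index(v.year_month) <= window_months]
    return min(recent, key=lambda v: v.value_aud)


# Hypothetical figures, not taken from the dataset.
chosen = select_valuation([
    Valuation("2026-04", 820_000, "avm"),
    Valuation("2026-02", 790_000, "bank"),
    Valuation("2025-06", 700_000, "agent"),  # outside the window, ignored
])
print(chosen)
```

Under these assumptions the bank figure is selected: it falls inside the three-month window and is lower than the AVM estimate, while the older agent appraisal is ignored.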

Land sizes were verified against the Victorian Land Registry title plan recorded at settlement.

Suburb classifications follow the Australian Bureau of Statistics Statistical Area Level 2 (SA2) boundaries as published in ABS catalogue 1270.0.55.001.

5. Computation Notes

Gross rental yield is computed against the all-in cost basis (purchase + renovation), not against purchase price alone. This matches standard buyer's-agent practice but differs from the yield figures published by CoreLogic and Domain, which use purchase price only as the denominator. Direct comparison of yield figures between this dataset and those sources should account for the difference.
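The effect of the denominator choice can be illustrated with a minimal Python sketch. The formula follows the definition of rental_yield_after_beautify_pct in §3; the function name, the `all_in` switch, and the figures are hypothetical and not taken from the dataset.

```python
def gross_yield_pct(weekly_rent_aud: int, purchase_price_aud: int,
                    reno_investment_aud: int = 0, all_in: bool = True) -> float:
    """Gross annual rental yield in percent.

    all_in=True  -> denominator is purchase price + renovation (this dataset).
    all_in=False -> denominator is purchase price only.
    """
    denominator = purchase_price_aud + (reno_investment_aud if all_in else 0)
    if denominator <= 0:
        raise ValueError("cost basis must be positive")
    return weekly_rent_aud * 52 / denominator * 100


# Hypothetical property: AUD 650/week rent, AUD 600,000 purchase, AUD 50,000 renovation.
yield_all_in = gross_yield_pct(650, 600_000, 50_000)
yield_price_only = gross_yield_pct(650, 600_000, 50_000, all_in=False)
print(f"all-in basis:     {yield_all_in:.2f}%")      # about 5.20%
print(f"price-only basis: {yield_price_only:.2f}%")  # about 5.63%
```

For the same rent, the purchase-price-only denominator reports a higher yield whenever renovation cost is non-zero, so figures from the two conventions are not directly comparable.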

Annualised growth uses the geometric (compound) formula rather than the arithmetic mean. For properties held for less than twelve months, the recorded rate is the simple period-on-period change rather than an annualised projection (these are flagged internally as years_held < 1 and are a small minority of the sample).
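A minimal Python sketch of this rule follows. Two points are assumptions rather than facts from the source: years_held is derived here as the month difference between purchase_year_month and valuation_year_month divided by twelve, and the result is multiplied by 100 to express it as a percentage. The function names are hypothetical.

```python
def years_between(start_ym: str, end_ym: str) -> float:
    """Holding period in years from two "YYYY-MM" strings (assumed derivation)."""
    start_year, start_month = map(int, start_ym.split("-"))
    end_year, end_month = map(int, end_ym.split("-"))
    return ((end_year - start_year) * 12 + (end_month - start_month)) / 12


def annual_growth_pct(current_value_aud: int, purchase_price_aud: int,
                      reno_investment_aud: int,
                      purchase_ym: str, valuation_ym: str) -> float:
    """Annualised growth on the all-in cost basis, in percent."""
    cost_basis = purchase_price_aud + reno_investment_aud
    if cost_basis <= 0:
        raise ValueError("cost basis must be positive")
    ratio = current_value_aud / cost_basis
    years_held = years_between(purchase_ym, valuation_ym)
    if years_held < 1:
        # Short holds: simple period-on-period change, not annualised.
        return (ratio - 1) * 100
    # Geometric (compound) annualisation.
    return (ratio ** (1 / years_held) - 1) * 100


# Hypothetical figures, not taken from the dataset.
four_year_hold = annual_growth_pct(732_050, 450_000, 50_000, "2020-03", "2024-03")
six_month_hold = annual_growth_pct(525_000, 500_000, 0, "2025-10", "2026-04")
print(f"{four_year_hold:.2f}% per year, {six_month_hold:.2f}% over the period")
```

In the first example a ratio of 1.4641 over four years compounds to about 10% per year; in the second, a six-month hold reports the 5% change as-is instead of projecting it to a full year.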

Capital gain is post-renovation: the renovation investment is treated as a cost, not as part of the current value. Investors evaluating renovation ROI can subtract reno_investment_aud from current_value_aud to approximate the no-renovation counterfactual value; this is only a rough estimate, since it assumes the renovation added exactly its hard cost in value.
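The arithmetic can be sketched in Python as follows. The capital-gain formula is the one given in §3; the function names and the figures are hypothetical.

```python
def capital_gain_aud(current_value_aud: int, purchase_price_aud: int,
                     reno_investment_aud: int) -> int:
    """Post-renovation capital gain: renovation spend is treated as a cost."""
    return current_value_aud - purchase_price_aud - reno_investment_aud


def no_reno_counterfactual_value_aud(current_value_aud: int,
                                     reno_investment_aud: int) -> int:
    """Approximate value had no renovation been done.

    Assumes the renovation added exactly its hard cost in value.
    """
    return current_value_aud - reno_investment_aud


# Hypothetical property, not taken from the dataset.
gain = capital_gain_aud(800_000, 600_000, 50_000)
counterfactual = no_reno_counterfactual_value_aud(800_000, 50_000)
print(gain, counterfactual)
```

Here the recorded gain is AUD 150,000 and the approximate no-renovation value is AUD 750,000.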

No tax treatment, depreciation schedule, holding cost, or financing cost is included in any computed figure. The dataset captures gross outcomes, not after-tax investor return.

6. Limitations

Selection bias. The sample reflects the population of investors who engaged an independent buyer's agency between 2020 and 2026. Investors who self-select for buyer's-agent representation are typically more diligent than the average market participant; represented purchases are estimated to outperform those of unrepresented buyers by AUD 30,000–80,000 on negotiation alone (CoreLogic, 2025). Median yield, capital growth, and price figures in this dataset should not be interpreted as Melbourne-wide market forecasts.

Survivorship bias. Properties that the client resold during the dataset window are excluded — current_value_aud is assigned only to properties still held at the time of dataset compilation. This biases the capital-gain distribution slightly upward, since properties resold mid-window are likely to have under-performed expectations on average.

Renovation hard-cost only. The reno_investment_aud field captures labour, materials, and permit fees. It excludes financing carrying cost during renovation, the buyer's time, pre-purchase due-diligence, project management overhead, and any over-runs absorbed outside the contract. For total cost-of-ownership analysis, these need to be modelled separately.

Date redaction. Day-level transaction dates have been redacted to year-month granularity for privacy. This limits the dataset's usefulness for high-frequency time-series analysis (e.g. measuring price impact of a single auction-clearance-rate report) but does not affect monthly or quarterly aggregations.
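Quarterly aggregation from the year-month field can be sketched in Python using only the standard library. The column names come from §3; the function names and the example rows are hypothetical and not taken from the dataset.

```python
from collections import defaultdict
from statistics import median


def quarter_of(year_month: str) -> str:
    """Map "YYYY-MM" to a quarter label such as "2021-Q1"."""
    year, month = map(int, year_month.split("-"))
    return f"{year}-Q{(month - 1) // 3 + 1}"


def median_price_by_quarter(rows: list[dict]) -> dict:
    """Median settled purchase price per settlement quarter."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[quarter_of(row["purchase_year_month"])].append(row["purchase_price_aud"])
    return {quarter: median(prices) for quarter, prices in sorted(buckets.items())}


# Hypothetical rows for illustration only.
rows = [
    {"purchase_year_month": "2021-01", "purchase_price_aud": 500_000},
    {"purchase_year_month": "2021-03", "purchase_price_aud": 600_000},
    {"purchase_year_month": "2021-04", "purchase_price_aud": 700_000},
]
quarterly = median_price_by_quarter(rows)
print(quarterly)
```

Because only the month is needed to assign a quarter, the redaction of day-level dates loses nothing at this level of aggregation.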

Geographic concentration. While the dataset includes regional Victorian transactions (Ballarat, Geelong, Bendigo and surrounds), the bulk of records sit in Greater Melbourne's southeast and east corridors. Researchers studying other markets, such as outer Sydney or Brisbane, should not extrapolate from these figures.

Re-identification risk. While client-identifying fields have been removed, transactional data inevitably retains some risk of re-identification by parties already familiar with a transaction (e.g. the conveyancer, neighbours, or extended family of the buyer). This is a structural property of any per-transaction dataset and is not unique to this work; researchers using the data are asked to honour the spirit of the anonymisation. Specific records can be excluded from future versions on written request from the original transaction client (see §7).

Valuation methodology. CoreLogic AVM error bands are typically ±10% at 95% confidence at the dwelling level; bank-instructed valuations vary similarly. Capital-gain figures inherit this uncertainty. The dataset does not include AVM confidence intervals.
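The way valuation error carries through to capital gain can be sketched in Python. This is a simple interval calculation, not a statistical model: it assumes the ±10% band applies symmetrically to current_value_aud and that purchase price and renovation cost are known exactly. The function name and the figures are hypothetical.

```python
def capital_gain_band_aud(current_value_aud: int, purchase_price_aud: int,
                          reno_investment_aud: int,
                          relative_error: float = 0.10) -> tuple[float, float]:
    """Low and high capital gain when the valuation is off by +/- relative_error."""
    cost_basis = purchase_price_aud + reno_investment_aud
    low = current_value_aud * (1 - relative_error) - cost_basis
    high = current_value_aud * (1 + relative_error) - cost_basis
    return low, high


# Hypothetical property: AUD 800,000 valuation on a AUD 650,000 all-in cost basis.
low_gain, high_gain = capital_gain_band_aud(800_000, 600_000, 50_000)
print(f"capital gain between AUD {low_gain:,.0f} and AUD {high_gain:,.0f}")
```

In this example a point estimate of AUD 150,000 sits inside a band of roughly AUD 70,000 to AUD 230,000, so a 10% valuation error translates into a much larger relative error on the gain itself.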

7. Updates & Versioning

The dataset is refreshed on a roughly quarterly cadence. Each refresh creates a new versioned record on Zenodo with its own DOI, and Zenodo's concept DOI always resolves to the latest version. Citers who pin a version-specific DOI (this document cites 10.5281/zenodo.20095886 for v1.0.0) will always retrieve that exact version.

Specific records can be excluded from future versions by written request from the original transaction client; previously published versions remain publicly available under their existing CC-BY-4.0 licence and cannot be retroactively recalled.

The Croissant 1.0 metadata file co-published with the dataset (see /data/croissant.json) contains machine-readable variable definitions, source paths, and limitation declarations conforming to the MLCommons Croissant 1.0 specification.

8. References

[1] Australian Bureau of Statistics. (2021). Australian Statistical Geography Standard (ASGS) Edition 3 (Catalogue No. 1270.0.55.001). Canberra: ABS. https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3
[2] CoreLogic Australia. (2025). Buyer Overpay Analysis: Comparing Negotiated vs Asking Price Across Melbourne, 2024–2025. Sydney: CoreLogic.
[3] Reserve Bank of Australia. (2024). Bulletin: The Australian Housing Market and the Macroeconomy. Sydney: RBA. https://www.rba.gov.au/publications/bulletin/
[4] MLCommons. (2024). Croissant: A Metadata Format for ML-Ready Datasets (Version 1.0). http://mlcommons.org/croissant/
[5] Australian Taxation Office. (2024). Investment property: claiming a tax deduction for the decline in value (depreciation). https://www.ato.gov.au/individuals-and-families/investments-and-assets/residential-rental-properties
[6] State Revenue Office Victoria. (2024). Land tax for property investors. https://www.sro.vic.gov.au/land-tax
[7] Office of the Australian Information Commissioner. Australian Privacy Principles, Privacy Act 1988 (Cth), s 14. https://www.oaic.gov.au/privacy/australian-privacy-principles

Cite this methodology

APA
Don, J., Zhu, Y., & Jin, S. (2026). *Melbourne Investment Property Portfolio (2020–2026)* (Version 1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20095886
BibTeX
@dataset{don_2026_melbourne_investment_portfolio,
  author    = {Don, Joey and Zhu, Yan and Jin, Steven},
  title     = {Melbourne Investment Property Portfolio (2020–2026)},
  year      = 2026,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.20095886},
  url       = {https://doi.org/10.5281/zenodo.20095886}
}
RIS
TY  - DATA
AU  - Don, Joey
AU  - Zhu, Yan
AU  - Jin, Steven
TI  - Melbourne Investment Property Portfolio (2020–2026)
PY  - 2026
DA  - 2026-05-09
PB  - Zenodo
DO  - 10.5281/zenodo.20095886
UR  - https://doi.org/10.5281/zenodo.20095886
ET  - 1.0.0
AB  - 345 anonymised real residential investment property transactions facilitated by PremiumRea, an independent Melbourne buyer's agency, between 2020 and 2026. Per-transaction granularity (not suburb-aggregated). Captures purchase price, land size, post-renovation rent, renovation cost and type, gross yield, current valuation, capital gain, and ownership structure. All client-identifying details have been removed.
ER  - 
GB/T 7714
Don J, Zhu Y, Jin S. Melbourne Investment Property Portfolio (2020–2026) (Version 1.0.0)[DS/OL]. Zenodo, 2026[2026-05-09]. https://doi.org/10.5281/zenodo.20095886. DOI:10.5281/zenodo.20095886.

Important Information

PremiumRea (trading as Optima Real Estate) provides licensed buyers agent services in Victoria, Australia. All case studies, price data, yields and growth figures shown on this site are historical, drawn from our transaction record, and are not forecasts or guarantees of future performance. Property investment carries risk. You should seek independent financial and legal advice before acting on any information shown here.

© 2026 PREMIUM REA PTY LTD. All rights reserved.
