How to compute total dataset size in BigQuery
Posted by David Chandler on October 4, 2023
BigQuery’s INFORMATION_SCHEMA views provide a ton of useful information about BQ usage. A simple query will let you see the total size of each table in a dataset and you can use the ROLLUP() operator to compute a grand total for the whole dataset:
SELECT
table_id,
-- Convert bytes to GB.
sum(ROUND(size_bytes/pow(10,12),2)) as size_tb,
sum(ROUND(row_count/pow(10,9),2)) as billion_rows
FROM
-- Replace bigquery-public-data.wikipedia with your own dataset:
`bigquery-public-data.wikipedia.__TABLES__`
WHERE table_id LIKE 'pageviews%'
GROUP BY ROLLUP(table_id)
ORDER BY table_id;
The query above shows the total size of all the pageviews tables in the wikipedia public dataset. At the time of this writing, these are the results. The first row is the grand total computed by the ROLLUP() function.
Querying all the pageviews tables with a wildcard operator would ingest 20TB and almost half a trillion rows! Be careful where you do that as it might cost you ~$100, but running the INFORMATION_SCHEMA query above is free as it only queries metadata.
Leave a comment