Estimating BigQuery slot usage for a query
Posted by David Chandler on October 3, 2023
One of the most common questions I’m asked when teaching Data Engineering on Google Cloud Platform is how to find out how many slots BigQuery used to run a query. While there is no direct per-query measurement available, the INFORMATION_SCHEMA jobs documentation provides a way to estimate the number of slots used: divide the total slot time (the cumulative compute time the query consumed across all workers) by the elapsed wall-clock time of the query. The query below shows the estimated number of slots for the 10 most recent queries. Alternatively, you can put a BigQuery job ID in the WHERE clause to see the data for a specific query; obtain the job ID from the Job Information tab in the query results or from a log entry in the Cloud Logs Explorer.
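In other words, the estimate is just a ratio of two durations. A minimal sketch in Python, with illustrative numbers (not taken from a real job):

```python
# Estimated slots = total slot-milliseconds / elapsed milliseconds.
total_slot_ms = 1_200_000  # slot time the job consumed across all workers (illustrative)
elapsed_ms = 20_000        # wall-clock duration of the job: 20 seconds (illustrative)

estimated_slots = total_slot_ms / elapsed_ms
print(estimated_slots)  # 60.0 -- the job averaged roughly 60 slots
```

A job that burns 1,200,000 slot-ms in 20 seconds of wall-clock time was, on average, running on about 60 slots at once.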
# Avg slots per query
# See https://cloud.google.com/bigquery/docs/information-schema-jobs#estimate_slot_usage_and_cost_for_queries
SELECT query, job_id, start_time,
-- Average slot utilization per job is calculated by dividing total_slot_ms by the millisecond duration of the job
SAFE_DIVIDE(job.total_slot_ms, TIMESTAMP_DIFF(job.end_time, job.start_time, MILLISECOND)) AS job_avg_slots,
user_email,
ROUND(total_bytes_processed / POW(10,9), 2) AS gb_processed,
total_slot_ms, reservation_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT job
-- WHERE job_id = "your_job_id"
ORDER BY start_time DESC
LIMIT 10
Here are some sample results:
Note that the BigQuery Admin console, as well as the Cloud Monitoring section of the Cloud Console, will show the number of queries in flight and slot utilization across your entire organization. However, when you’re just dying to know how many slots an individual query probably used, the above query will do the trick.