In this post, you will learn the fundamentals of working with timestamp and datetime in BigQuery.
I will go over the key differences between a timestamp and a datetime.
You will see how you can create different timestamps: from a text, from the current date, and even from a number.
I will also use timestamps as filters in the WHERE clause.
And finally, I’m going to show a template for month-to-date (MTD) and year-to-date (YTD) queries.
I include as many working queries as possible, so you can run them in your own BigQuery console.
RELATED: Setting Up Your Own BigQuery Console
Try it! BigQuery is free for small projects, and you don’t even need to link a credit card to get started.
Timestamp vs Datetime in BigQuery – What’s the Difference?
This is an example of how TIMESTAMP and DATETIME are displayed in the BigQuery console.
Do you notice any difference?
Timestamps support time zones. By "support," I mean that BigQuery takes note of the time zone you specify and converts the value to UTC before storing it.
All timestamps are stored in UTC, regardless of the time zone you used to write them.
The three dates above share the same date and time but use different time zones. Notice how each one changes when it is converted to UTC.
With timestamps, BigQuery handles the conversion under the hood. You don’t need to add or subtract hours manually, or even account for daylight saving time.
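If you want to try the conversion yourself, here is a small sketch. It builds the same wall-clock time, July 1, 2020 at 10 AM, in three time zones; the column aliases are just illustrative names. The comments show the UTC value I would expect for each, given that Los Angeles is on daylight saving time (UTC-7) in July.

```sql
SELECT
  -- Already UTC: 2020-07-01 10:00:00 UTC
  TIMESTAMP('2020-07-01 10:00:00', 'UTC') AS In_UTC,
  -- Manila is UTC+8: 2020-07-01 02:00:00 UTC
  TIMESTAMP('2020-07-01 10:00:00', 'Asia/Manila') AS In_Manila,
  -- Los Angeles in July is PDT (UTC-7): 2020-07-01 17:00:00 UTC
  TIMESTAMP('2020-07-01 10:00:00', 'America/Los_Angeles') AS In_Los_Angeles;
```

Same local time, three different instants: that is exactly what a TIMESTAMP keeps track of for you.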
Timestamps are best used for working on data in different time zones.
Conversely, DATETIME does not store a time zone. This makes it convenient when you always work in local time.
RELATED: BigQuery Date and Time Data Types
Timestamp vs Datetime – Which one should you use for your queries and reports?
It’s like saying to your colleague, “I’ll see you at 12 noon for lunch!”
Your colleague then responds, “Sure! What time zone?”
This never happens, right?
… or that time when the owner at a local restaurant asks you, “I need to know what’s the best time to open the restaurant. I don’t want to open too early or too late!”
You probably know that you need to find the best opening hours in the local time, and you don’t need to consider any time zone conversions for that matter.
In most cases, you should use DATETIME, because you can assume that all dates are in a single time zone.
This works extremely well for reports such as financial statements or sales reports.
On the other hand, if you are working on data from across the globe, such as application logs, analytics, or event tracking, then you will need to work with a mix of TIMESTAMP and DATETIME.
As a best practice, store data as a TIMESTAMP or an integer (in Unix time), and analyze data as a DATETIME.
I will cover Unix time in a separate section below.
What are the Supported Time Zones in BigQuery?
BigQuery supports time zone names from the IANA time zone database (tzdata).
This means that you can specify a time zone by its recognized name, such as America/Los_Angeles, or as a UTC offset, such as -7:00.
If your time zone does not follow any of the formats above, you will get an error similar to the one below:
Invalid TIMESTAMP literal at […]
I recommend using a named time zone, such as America/Los_Angeles, especially if you need to account for daylight saving time (DST).
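Here is a sketch of why the named zone matters. Los Angeles is on standard time (UTC-8) in January and on daylight saving time (UTC-7) in July. A named zone picks the right offset for each date, while a fixed offset such as -07:00 applies UTC-7 all year. The dates and aliases are only examples.

```sql
SELECT
  -- Named zone, winter: PST (UTC-8), so 10 AM local is 2020-01-15 18:00:00 UTC
  TIMESTAMP('2020-01-15 10:00:00', 'America/Los_Angeles') AS Named_January,
  -- Named zone, summer: PDT (UTC-7), so 10 AM local is 2020-07-15 17:00:00 UTC
  TIMESTAMP('2020-07-15 10:00:00', 'America/Los_Angeles') AS Named_July,
  -- Fixed offset ignores DST: 2020-01-15 17:00:00 UTC, one hour off for winter in LA
  TIMESTAMP('2020-01-15 10:00:00', '-07:00') AS Fixed_January;
```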
How to Write a Timestamp in BigQuery?
Specific Date, Time, and Time Zone
You can create a timestamp in BigQuery with any of the following syntaxes:
Timestamp Syntax with Parentheses
TIMESTAMP('2020-07-01 10:00:00')
TIMESTAMP('2020-07-01 10:00:00.123456', 'Asia/Manila')
TIMESTAMP('2020-07-01 10:00:00', 'UTC+8:00')
TIMESTAMP('2020-07-01', 'Asia/Manila')
TIMESTAMP('2020-07-01 10:00:00', '+8')
TIMESTAMP('2020-07-01 10:00:00', '+8:00')
Here, you can declare a timestamp using two parameters.
The first parameter is required: it is the date and time you want to create. If the time component is missing, it defaults to midnight (00:00:00).
The second parameter is the time zone. It defaults to UTC, and you can provide a named time zone or an offset from UTC.
Timestamp Syntax without Parentheses
TIMESTAMP '2020-07-01 10:00:00 Asia/Manila'
TIMESTAMP '2020-07-01 10:00:00 UTC+8'
TIMESTAMP "2020-07-01 10:00:00+8"
TIMESTAMP "2020-07-01 10:00:00+8:30"
You can also declare a timestamp by typing the keyword TIMESTAMP, followed by a space and then a single string. Either single or double quotes are accepted.
This is similar to the first syntax, except that the date, time, and time zone are all combined into a single text.
Tip: Always add a space before a named time zone. For a UTC offset, never add a space.
Why are All BigQuery Dates in UTC?
Here’s a query for different time zone formats that all work in BigQuery.
SELECT TIMESTAMP('2020-07-01 10:00:00') AS Timestamp_Example
UNION ALL SELECT TIMESTAMP('2020-07-01 10:00:00.123456', 'Asia/Manila')
UNION ALL SELECT TIMESTAMP('2020-07-01 10:00:00', 'UTC+8:00')
UNION ALL SELECT TIMESTAMP('2020-07-01', 'Asia/Manila')
UNION ALL SELECT TIMESTAMP '2020-07-01 10:00:00 Asia/Manila'
UNION ALL SELECT TIMESTAMP '2020-07-01 10:00:00 UTC+8'
UNION ALL SELECT TIMESTAMP "2020-07-01 10:00:00+8"
UNION ALL SELECT TIMESTAMP "2020-07-01 10:00:00+8:30"
;
When you run it, you should see this result in your BigQuery console. Feel free to try it with your own time zone!
But wait a minute. Why are all the time zones in UTC? I’m sure I entered a different time zone.
That was my initial reaction when I first tried this in BigQuery.
Fear not! That’s not a mistake. It’s an expected outcome.
BigQuery reads your timestamp in the time zone you specify, but it converts the value to UTC and always displays timestamps in UTC.
Now, if you want to see what the time looks like in your local time zone, wrap the timestamp in the DATETIME function, like this:
DATETIME(original_timestamp, local_time_zone)
If I have a timestamp for July 1, 2020 at 10 AM Manila time (or UTC+8), this timestamp is equivalent to 2 AM UTC time on the same day.
As an example, by running this query,
SELECT
  Timestamp_in_UTC,
  DATETIME(Timestamp_in_UTC, 'Asia/Manila') AS Timestamp_in_Local
FROM (
  SELECT TIMESTAMP('2020-07-01 10:00:00', 'Asia/Manila') AS Timestamp_in_UTC
);
I got this result:
Tip: Store the data as TIMESTAMP, but report and extract data as DATETIME.
Common Dates: Yesterday, Today, Last Week
You can also create common dates in BigQuery.
As a data analyst, you will often use dates such as today, yesterday, last week, or the current month.
Getting Today (or the Current Date and Time) in BigQuery
To get the current date and time, you can use any of the following:
CURRENT_TIMESTAMP()
CURRENT_TIMESTAMP
CURRENT_DATETIME()
CURRENT_DATETIME
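Result: the current date and time, as evaluated by BigQuery, not by your computer. CURRENT_TIMESTAMP returns an absolute point in time (displayed in UTC), and CURRENT_DATETIME returns the wall-clock time in UTC unless you pass a time zone, such as CURRENT_DATETIME('Asia/Manila').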
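A quick sketch of the time zone argument. These values depend on when you run the query, so there is no fixed output; Asia/Manila and America/Los_Angeles are only example zones, and CURRENT_DATE accepts the same optional argument.

```sql
SELECT
  CURRENT_TIMESTAMP() AS Now_Timestamp,           -- absolute instant, shown in UTC
  CURRENT_DATETIME() AS Now_UTC,                  -- wall-clock time in UTC (the default)
  CURRENT_DATETIME('Asia/Manila') AS Now_Manila,  -- wall-clock time in Manila (UTC+8)
  CURRENT_DATE('America/Los_Angeles') AS Today_LA; -- today's calendar date in Los Angeles
```

Because Manila has no daylight saving time, Now_Manila should always be exactly 8 hours ahead of Now_UTC.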
Using BigQuery TIMESTAMP_ADD, TIMESTAMP_SUB, DATETIME_ADD, and DATETIME_SUB
In BigQuery, you can get yesterday's date in two steps:
- Get today's date
- Subtract 1 day from it (or, equivalently, "add -1 day" to it)
To add and subtract from a timestamp in BigQuery, use the TIMESTAMP_ADD and TIMESTAMP_SUB functions.
TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 1 DAY)
TIMESTAMP_ADD(CURRENT_TIMESTAMP, INTERVAL -1 DAY)
There are also equivalent functions for DATETIME: DATETIME_ADD and DATETIME_SUB.
DATETIME_SUB(CURRENT_DATETIME, INTERVAL 1 DAY)
DATETIME_ADD(CURRENT_DATETIME, INTERVAL -1 DAY)
The INTERVAL 1 DAY tells BigQuery how far to move the date. Each pair of expressions above returns yesterday's date, and the time component is kept in the result.
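Here is the same idea with a fixed date instead of CURRENT_DATETIME, so the output is predictable. The last column uses DATE_SUB on CURRENT_DATE, which returns yesterday's date without any time component (in UTC, unless you pass a time zone to CURRENT_DATE).

```sql
SELECT
  -- Moves back exactly one day and keeps the time: 2020-06-30 10:00:00
  DATETIME_SUB(DATETIME '2020-07-01 10:00:00', INTERVAL 1 DAY) AS Yesterday_Sub,
  -- Same result with a negative interval: 2020-06-30 10:00:00
  DATETIME_ADD(DATETIME '2020-07-01 10:00:00', INTERVAL -1 DAY) AS Yesterday_Add,
  -- Date only, no time component
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS Yesterday_Date;
```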
Aside from DAY, you can also use other date parts with INTERVAL, although not every date part works with both families of functions.
Date Part | TIMESTAMP_ADD / TIMESTAMP_SUB | DATETIME_ADD / DATETIME_SUB |
---|---|---|
MICROSECOND | Yes | Yes |
MILLISECOND | Yes | Yes |
SECOND | Yes | Yes |
MINUTE | Yes | Yes |
HOUR | Yes | Yes |
DAY | Yes | Yes |
WEEK | No | Yes |
MONTH | No | Yes |
QUARTER | No | Yes |
YEAR | No | Yes |
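A couple of these date parts are worth seeing in action. In this sketch (fixed dates, illustrative aliases), adding one MONTH to January 31 lands on the last day of February instead of spilling into March, WEEK moves by 7 days, and TIMESTAMP_ADD sticks to DAY and smaller units such as HOUR.

```sql
SELECT
  -- February 2020 has 29 days, so the day is clamped: 2020-02-29 10:00:00
  DATETIME_ADD(DATETIME '2020-01-31 10:00:00', INTERVAL 1 MONTH) AS One_Month_Later,
  -- One week is 7 days: 2020-07-08 10:00:00
  DATETIME_ADD(DATETIME '2020-07-01 10:00:00', INTERVAL 1 WEEK) AS One_Week_Later,
  -- TIMESTAMP_ADD with HOUR: 2020-07-02 22:00:00 UTC
  TIMESTAMP_ADD(TIMESTAMP '2020-07-01 10:00:00 UTC', INTERVAL 36 HOUR) AS Later_Timestamp;
```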
Timestamp from Epoch / Unix Time (TIMESTAMP_SECONDS)
I’m going to run a query to analyze a Google Analytics session.
I extracted the visitStartTime column from a sample GA session table in a BigQuery public dataset (bigquery-public-data).
I also added LIMIT 10 to return only 10 rows from this table.
SELECT visitStartTime FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731` LIMIT 10 ;
Here are the results:
I’m pretty sure those don’t look like dates…
However, when I see an INTEGER column with a date- or time-like name (e.g., visitStartTime), it most likely holds a date in epoch, or Unix, time.
Unix time (also called epoch time) is the number of seconds that have passed since January 1, 1970, 00:00:00 UTC.
To convert Unix time into a timestamp that humans can read, use the TIMESTAMP_SECONDS function.
TIMESTAMP_SECONDS(timestamp_in_seconds)
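A literal value makes the conversion easy to check: 1593561600 seconds is July 1, 2020, 00:00:00 UTC. If a column stores milliseconds or microseconds instead of seconds, BigQuery also has TIMESTAMP_MILLIS and TIMESTAMP_MICROS; all three calls below should return the same instant.

```sql
SELECT
  TIMESTAMP_SECONDS(1593561600) AS From_Seconds,            -- 2020-07-01 00:00:00 UTC
  TIMESTAMP_MILLIS(1593561600000) AS From_Milliseconds,     -- same instant
  TIMESTAMP_MICROS(1593561600000000) AS From_Microseconds;  -- same instant
```

Picking the wrong function (for example, TIMESTAMP_SECONDS on a millisecond column) gives dates far in the future, which is a quick sign that the unit is off.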
If you apply this to the Google Analytics dataset from earlier:
SELECT visitStartTime, TIMESTAMP_SECONDS(visitStartTime) AS visitStartTime_Timestamp FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731` LIMIT 10 ;
These will be the results:
Converting Timestamp and Datetime in BigQuery
Timestamp to Seconds Using UNIX_SECONDS
As mentioned above, databases can store a timestamp as a number in Epoch or Unix time.
To do the reverse, that is, to convert a timestamp into the number of seconds in Unix time, use the BigQuery function UNIX_SECONDS.
UNIX_SECONDS(original_timestamp)
This function accepts a TIMESTAMP. You can also pass a string literal in a valid timestamp format, which BigQuery coerces to a TIMESTAMP.
Consider this query:
SELECT UNIX_SECONDS('2020-07-01 00:00:00 UTC') ;
Result: 1593561600
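Unix time does not depend on the time zone you used to write the timestamp. In this sketch, 8 AM in Manila (UTC+8) is the same instant as midnight UTC, so UNIX_SECONDS returns the same number as above. UNIX_MILLIS works the same way for milliseconds.

```sql
SELECT
  -- 8 AM Manila = 00:00 UTC, so this is 1593561600
  UNIX_SECONDS(TIMESTAMP('2020-07-01 08:00:00', 'Asia/Manila')) AS From_Manila,
  -- Milliseconds since the epoch: 1593561600000
  UNIX_MILLIS(TIMESTAMP '2020-07-01 00:00:00 UTC') AS In_Milliseconds;
```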
You can refer to the table below for some examples of timestamps and their equivalent Unix time.
Timestamp (all in UTC) | Unix Time |
---|---|
January 1, 1970 00:00:00 | 0 |
January 1, 1970 00:00:01 | 1 |
January 1, 1970 00:01:00 | 60 (1 minute) |
January 1, 1971 00:00:00 | 31,536,000 (1 year) |
January 1, 2020 00:00:00 | 1,577,836,800 |
July 1, 2020 00:00:00 | 1,593,561,600 |
Timestamp to Datetime
I always advocate using the DATETIME data type when preparing reports and queries.
But what if you have your dates stored in the database as a timestamp? How do you deal with that?
You can convert a timestamp into local time using DATETIME.
DATETIME(original_timestamp)
DATETIME(original_timestamp, time_zone)
The greatest thing about this function is that you can specify the time zone!
If you want to track website visits on a daily basis, it’s critical to know which time zone you’re working on.
If a user visited on July 1, 2020 at 12 AM UTC, you can run a query like the one below.
SELECT
  DATETIME(original_timestamp) AS Timestamp_UTC,
  DATETIME(original_timestamp, 'America/Los_Angeles') AS Timestamp_LA,
  DATETIME(original_timestamp, 'Asia/Manila') AS Timestamp_Manila
FROM (
  SELECT TIMESTAMP('2020-07-01 00:00:00 UTC') AS original_timestamp
);
Result:
By telling BigQuery that I am in a specific time zone, I get results that fit my local time.
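Notice that the same instant can fall on different calendar days. This sketch uses the same timestamp as the query above and adds DATE(timestamp, time_zone), which returns only the local date. The comments show the values I would expect, given that Los Angeles is on UTC-7 in July.

```sql
SELECT
  DATETIME(ts, 'America/Los_Angeles') AS In_LA,  -- 2020-06-30 17:00:00, still June 30
  DATETIME(ts, 'Asia/Manila') AS In_Manila,      -- 2020-07-01 08:00:00
  DATE(ts, 'America/Los_Angeles') AS Date_In_LA  -- 2020-06-30
FROM (
  SELECT TIMESTAMP '2020-07-01 00:00:00 UTC' AS ts
);
```

For a daily report in Los Angeles, this visit belongs to June 30, not July 1.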
Extracting Date, Month, and Year from Timestamp
Given a timestamp, you can extract the year, month, day of the month, or date only by using the following:
EXTRACT(DATE FROM original_timestamp AT TIME ZONE 'UTC')
EXTRACT(YEAR FROM original_timestamp AT TIME ZONE 'UTC')
EXTRACT(MONTH FROM original_timestamp AT TIME ZONE 'UTC')
EXTRACT(DAY FROM original_timestamp AT TIME ZONE 'UTC')
The AT TIME ZONE 'UTC' clause is optional; if you leave it out, BigQuery uses UTC.
I get the following sample results when I run EXTRACT on the timestamp July 1, 2020 00:00:00 UTC.
EXTRACT Date Part | Result |
---|---|
DATE | 2020-07-01 |
YEAR | 2020 |
MONTH | 7 |
DAY | 1 |
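The time zone clause can change every value in that table. Here is a sketch running the same EXTRACT calls on July 1, 2020 00:00:00 UTC, but in Los Angeles time (UTC-7 in July), where it is still the evening of June 30.

```sql
SELECT
  EXTRACT(DAY FROM ts AT TIME ZONE 'America/Los_Angeles') AS Day_LA,      -- 30
  EXTRACT(MONTH FROM ts AT TIME ZONE 'America/Los_Angeles') AS Month_LA,  -- 6
  EXTRACT(DAY FROM ts) AS Day_UTC                                         -- 1
FROM (
  SELECT TIMESTAMP '2020-07-01 00:00:00 UTC' AS ts
);
```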
Finding the Difference Between Two Timestamps Using TIMESTAMP_DIFF
If you have two timestamps, you can get the difference between them by using the TIMESTAMP_DIFF function. (For DATE values, the equivalent function is DATE_DIFF.)
TIMESTAMP_DIFF(timestamp_end, timestamp_start, date_part)
The later timestamp goes first; if it is earlier than the second one, the result is negative. You can use different values for the date part, such as HOUR, MINUTE, or DAY.
Tip: If you want to include fractional components, you can “downgrade” your date part.
For example, if you want to see the difference between two timestamps in hours, including the fractional number of hours, you can do this:
SELECT
  TIMESTAMP_DIFF(
    TIMESTAMP '2020-07-01 14:00:00 UTC',
    TIMESTAMP '2020-07-01 10:30:00 UTC',
    HOUR
  ) AS Whole_Difference,
  TIMESTAMP_DIFF(
    TIMESTAMP '2020-07-01 14:00:00 UTC',
    TIMESTAMP '2020-07-01 10:30:00 UTC',
    MINUTE
  ) / 60 AS Exact_Difference
;
What’s the difference between Whole_Difference and Exact_Difference?
This is what I mean by "downgrading" the date part: compute the difference in MINUTE instead of HOUR, then divide the result by 60.
The two timestamps are exactly 3 hours and 30 minutes apart: Whole_Difference returns 3, while Exact_Difference returns 3.5.
See? TIMESTAMP_DIFF always drops the fractional part, and this trick lets you keep it.
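The same trick works at a finer grain. In this sketch the timestamps are 3 hours, 29 minutes, and 15 seconds apart. Dividing the SECOND difference by 3600 keeps the 15 seconds that the MINUTE version drops, and the last column shows that swapping the arguments gives a negative result.

```sql
SELECT
  -- 12555 seconds / 3600, about 3.4875 hours
  TIMESTAMP_DIFF(TIMESTAMP '2020-07-01 14:00:00 UTC', TIMESTAMP '2020-07-01 10:30:45 UTC', SECOND) / 3600 AS Hours_From_Seconds,
  -- 209 whole minutes / 60, about 3.4833 hours (the 15 seconds are lost)
  TIMESTAMP_DIFF(TIMESTAMP '2020-07-01 14:00:00 UTC', TIMESTAMP '2020-07-01 10:30:45 UTC', MINUTE) / 60 AS Hours_From_Minutes,
  -- Earlier timestamp first: -210
  TIMESTAMP_DIFF(TIMESTAMP '2020-07-01 10:30:00 UTC', TIMESTAMP '2020-07-01 14:00:00 UTC', MINUTE) AS Reversed_Order;
```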
Removing the Date/Time Components Using TIMESTAMP_TRUNC
Sometimes, I need to group all timestamps under the same day, or same month.
As a data analyst, when you group timestamps in the same day, you can get statistics such as daily totals, or daily averages.
But before you can analyze the data, you need to remove the time component of each timestamp.
You can use the TIMESTAMP_TRUNC function to achieve this.
TIMESTAMP_TRUNC(original_timestamp, date_part)
TIMESTAMP_TRUNC(original_timestamp, date_part, time_zone)
Take this as an example:
SELECT
  Original_Timestamp,
  TIMESTAMP_TRUNC(Original_Timestamp, DAY) AS Trunc_Timestamp
FROM (
  SELECT TIMESTAMP '2020-07-01 00:00:00' AS Original_Timestamp
  UNION ALL SELECT TIMESTAMP '2020-07-01 04:30:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 05:00:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 14:30:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 17:15:00'
);
Result:
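All five timestamps in that query fall on July 1, 2020 in UTC, so each one truncates to 2020-07-01 00:00:00 UTC. The optional time_zone argument moves the day boundary to local midnight, which can put a row on a different day. A minimal sketch, using the 05:00 UTC row and Los Angeles as an example zone:

```sql
SELECT
  -- Day boundary in UTC: 2020-07-01 00:00:00 UTC
  TIMESTAMP_TRUNC(TIMESTAMP '2020-07-01 05:00:00 UTC', DAY) AS Day_Start_UTC,
  -- 05:00 UTC is 10 PM on June 30 in LA (PDT), so the local day starts
  -- at midnight June 30 PDT, which is 2020-06-30 07:00:00 UTC
  TIMESTAMP_TRUNC(TIMESTAMP '2020-07-01 05:00:00 UTC', DAY, 'America/Los_Angeles') AS Day_Start_LA;
```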
Using Timestamp in Filters
Can you use a timestamp as a filtering condition?
Definitely!
Just like numbers and characters, you can use timestamps, dates, and times as a filtering condition in BigQuery.
In my experience, there are two general cases when you need to filter on a timestamp.
First, when you want to limit your data to the range between two timestamps.
Second, when you want to see data from a specified timestamp onwards (or up to a specified timestamp).
I use a table named sample_table, defined as a CTE (a WITH clause), so you can also run these queries in your own BigQuery console.
sample_table.Original_Timestamp |
---|
2020-06-30 00:00:00 UTC |
2020-07-01 04:30:00 UTC |
2020-07-01 05:00:00 UTC |
2020-07-01 14:30:00 UTC |
2020-08-01 17:15:00 UTC |
BigQuery WHERE Timestamp BETWEEN
I only want to see data from July 1, 2020, to July 7, 2020.
How can you do this in BigQuery?
You need to find a way to filter data between two dates.
Using the column Original_Timestamp as a filter, I can do this in BigQuery:
WITH sample_table AS (
  SELECT TIMESTAMP '2020-06-30 00:00:00' AS Original_Timestamp
  UNION ALL SELECT TIMESTAMP '2020-07-01 04:30:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 05:00:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 14:30:00'
  UNION ALL SELECT TIMESTAMP '2020-08-01 17:15:00'
)
SELECT Original_Timestamp
FROM sample_table
WHERE DATETIME(Original_Timestamp, 'UTC') BETWEEN '2020-07-01 00:00:00' AND '2020-07-07 23:59:59'
;
To work in local time (in this case, UTC), convert the timestamp into a DATETIME.
Result:
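The query returns the three rows from July 1, 2020. One caveat: the upper bound '2020-07-07 23:59:59' leaves out any value after 23:59:59 but before midnight, such as 23:59:59.5. The sketch below uses a half-open range instead (greater than or equal to the start, strictly less than the following day), which is also the pattern the MTD and YTD templates use, and compares Original_Timestamp directly without wrapping it in DATETIME.

```sql
WITH sample_table AS (
  SELECT TIMESTAMP '2020-06-30 00:00:00' AS Original_Timestamp
  UNION ALL SELECT TIMESTAMP '2020-07-01 04:30:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 05:00:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 14:30:00'
  UNION ALL SELECT TIMESTAMP '2020-08-01 17:15:00'
)
SELECT Original_Timestamp
FROM sample_table
-- Half-open range: covers all of July 7, fractional seconds included, and nothing from July 8
WHERE Original_Timestamp >= TIMESTAMP '2020-07-01 00:00:00 UTC'
  AND Original_Timestamp < TIMESTAMP '2020-07-08 00:00:00 UTC';
-- Returns the 04:30, 05:00, and 14:30 rows from July 1, 2020 (UTC)
```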
BigQuery Timestamp Greater Than
Sometimes, you want to see all the information from a specific date onwards.
In this case, use the Greater Than or Equal To operator (>=).
WITH sample_table AS (
  SELECT TIMESTAMP '2020-06-30 00:00:00' AS Original_Timestamp
  UNION ALL SELECT TIMESTAMP '2020-07-01 04:30:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 05:00:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 14:30:00'
  UNION ALL SELECT TIMESTAMP '2020-08-01 17:15:00'
)
SELECT Original_Timestamp
FROM sample_table
WHERE DATETIME(Original_Timestamp, 'UTC') >= '2020-07-01 00:00:00'
;
This returns only the rows from July 1, 2020, onwards.
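For the other way around, flip the comparison. This sketch reuses the same sample rows and keeps only the data before July 1, 2020, comparing the TIMESTAMP column directly against a TIMESTAMP literal:

```sql
WITH sample_table AS (
  SELECT TIMESTAMP '2020-06-30 00:00:00' AS Original_Timestamp
  UNION ALL SELECT TIMESTAMP '2020-07-01 04:30:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 05:00:00'
  UNION ALL SELECT TIMESTAMP '2020-07-01 14:30:00'
  UNION ALL SELECT TIMESTAMP '2020-08-01 17:15:00'
)
SELECT Original_Timestamp
FROM sample_table
-- Strictly before midnight UTC on July 1, 2020
WHERE Original_Timestamp < TIMESTAMP '2020-07-01 00:00:00 UTC';
-- Returns only 2020-06-30 00:00:00 UTC
```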
Month-to-Date (MTD) and Year-To-Date (YTD)
As a data analyst, have you worked on a report that you prepare on a regular basis?
Social Media Reports per Week?
Monthly Sales Summary?
Annual Financial Statements?
These are recurring reports, and the results are often grouped by time frame: daily, weekly, monthly, or annually.
In this section, I will show you how to filter data for MTD and YTD reports.
If today is July 15, 2020, then –
- The MTD report should include data from July 1, 2020 to July 15, 2020.
- The YTD report should include data from January 1, 2020 to July 15, 2020.
For DATE Columns
WHERE date_column BETWEEN DATE_TRUNC(CURRENT_DATE, MONTH) AND CURRENT_DATE
WHERE date_column BETWEEN DATE_TRUNC(CURRENT_DATE, YEAR) AND CURRENT_DATE
For DATETIME Columns
WHERE datetime_column >= DATETIME_TRUNC(CURRENT_DATETIME, MONTH) AND datetime_column < DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 1 DAY), DAY)
WHERE datetime_column >= DATETIME_TRUNC(CURRENT_DATETIME, YEAR) AND datetime_column < DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 1 DAY), DAY)
For TIMESTAMP Columns
WHERE timestamp_column >= TIMESTAMP(DATETIME_TRUNC(CURRENT_DATETIME('Asia/Manila'), MONTH), 'Asia/Manila')
  AND timestamp_column < TIMESTAMP(DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME('Asia/Manila'), INTERVAL 1 DAY), DAY), 'Asia/Manila')

WHERE timestamp_column >= TIMESTAMP(DATETIME_TRUNC(CURRENT_DATETIME('Asia/Manila'), YEAR), 'Asia/Manila')
  AND timestamp_column < TIMESTAMP(DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME('Asia/Manila'), INTERVAL 1 DAY), DAY), 'Asia/Manila')
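These queries compare the column directly against computed boundaries, so they work well with table partitions and clusters, if any. Note that CURRENT_DATE and CURRENT_DATETIME use UTC unless you pass a time zone, so for local-time reports, write CURRENT_DATE('Asia/Manila') or CURRENT_DATETIME('Asia/Manila') in the DATE and DATETIME templates as well.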
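To check the DATE templates before pointing them at a real table, you can swap CURRENT_DATE for a fixed date. This sketch assumes "today" is July 15, 2020, as in the example above, and uses a small hypothetical sales_table built with a CTE; the params CTE and the row values are illustrative only.

```sql
WITH params AS (
  SELECT DATE '2020-07-15' AS today  -- stand-in for CURRENT_DATE('Asia/Manila')
),
sales_table AS (  -- hypothetical table with a DATE column
  SELECT DATE '2019-12-31' AS date_column
  UNION ALL SELECT DATE '2020-03-10'
  UNION ALL SELECT DATE '2020-07-01'
  UNION ALL SELECT DATE '2020-07-15'
  UNION ALL SELECT DATE '2020-07-16'
)
SELECT
  -- MTD: July 1 to July 15, 2020 -> 2 rows
  COUNTIF(date_column BETWEEN DATE_TRUNC(today, MONTH) AND today) AS MTD_Rows,
  -- YTD: January 1 to July 15, 2020 -> 3 rows
  COUNTIF(date_column BETWEEN DATE_TRUNC(today, YEAR) AND today) AS YTD_Rows
FROM sales_table
CROSS JOIN params;
```

Once the counts look right, replace the params CTE with CURRENT_DATE in your time zone.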
Conclusion
Working with dates is an important part of data analysis and SQL, but dates are also one of the trickiest types of data to work with.
Many patterns in data are time-dependent, and BigQuery gives you a rich set of functions for working with date and time information.
Gone are the days when you need to add and subtract hours to adjust for time zones!
Have you been using any of these functions in your queries?
Do you have a favorite function? Leave a comment below!