How to group by hour in BigQuery

Hi, I'm trying to group data by hour in BigQuery.
How can I group the data by hour to count the orders per hour?

Use below
select date,
extract(hour from parse_time('%I:%M:%S %p', time)) as hour,
count(distinct order_id) as orders
from your_table
group by date, hour
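To make the logic of the query concrete, here is a rough Python sketch of the same idea: parse a 12-hour time string, take the hour, and count distinct order IDs per date and hour. The sample rows and the function name orders_per_hour are made up for illustration; they are not from the question.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (date, 12-hour time string, order_id)
rows = [
    ("2022-01-01", "01:15:00 PM", 1),
    ("2022-01-01", "01:45:10 PM", 2),
    ("2022-01-01", "11:05:00 AM", 3),
    ("2022-01-02", "12:30:00 AM", 4),
    ("2022-01-02", "12:30:00 AM", 4),  # duplicate order_id, counted once
]


def orders_per_hour(rows):
    """Count distinct order_ids per (date, hour), mirroring the SQL above."""
    # The set removes duplicate (date, hour, order_id) combinations,
    # which plays the role of COUNT(DISTINCT order_id).
    seen = {
        (d, datetime.strptime(t, "%I:%M:%S %p").hour, oid)
        for d, t, oid in rows
    }
    return Counter((d, h) for d, h, _ in seen)


counts = orders_per_hour(rows)
```

The format string "%I:%M:%S %p" is the same one passed to parse_time in the query, so "01:15:00 PM" lands in hour 13 and "12:30:00 AM" in hour 0.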

Related

SQL list number of seen occurrences per month higher than 15

I'm not an SQL expert, so I'm asking for your help to list the MACs that appear on more than 15 days in a month.
I wrote the following query, but it is very complex and most probably not efficient. Any suggestions on how to make it simpler and more efficient?
I'm using Google BigQuery, if that helps.
SELECT
macDays.macAddress AS macAddress,
macDays.days AS days
FROM (
SELECT
list_mac.macAddress AS macAddress,
COUNT( list_mac.macAddress) AS days
FROM (
SELECT
macAddress,
TIMESTAMP_TRUNC(time, DAY) date,
FROM
`my_table`
WHERE
time BETWEEN '2021-06-01 00:00:00'
AND '2021-06-30 23:59:00.000059'
GROUP BY
macAddress,
date
ORDER BY
macAddress) AS list_mac
GROUP BY
macAddress ) AS macDays
WHERE
macDays.days > 15
GROUP BY
macAddress,
days
The problem is that you are stripping off the time component from the date in your SELECT but grouping with the time portion left in, so you will get one row for every appearance rather than one for every day.
You can probably get rid of the inner subquery by using COUNT(DISTINCT field).
Try something like:
SELECT
macAddress AS macAddress,
COUNT(DISTINCT TIMESTAMP_TRUNC(time, DAY)) AS days
FROM
`my_table`
WHERE
time BETWEEN '2021-06-01 00:00:00'
AND '2021-06-30 23:59:00.000059'
GROUP BY
macAddress
HAVING
COUNT(DISTINCT TIMESTAMP_TRUNC(time, DAY)) > 15
ORDER BY
macAddress
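As a quick way to check the COUNT(DISTINCT ...) plus HAVING pattern without a BigQuery project, the sketch below runs the same shape of query on an in-memory SQLite database. It is only an approximation: SQLite has no TIMESTAMP_TRUNC, so date(time) stands in for it, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (macAddress TEXT, time TEXT)")

# Hypothetical data: 'aa' is seen twice a day on 16 days,
# 'bb' is seen 20 times but all on the same day.
rows = [
    ("aa", f"2021-06-{d:02d} {h:02d}:00:00")
    for d in range(1, 17)
    for h in (8, 20)
]
rows += [("bb", f"2021-06-01 {h:02d}:00:00") for h in range(20)]
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

result = conn.execute(
    """
    SELECT macAddress, COUNT(DISTINCT date(time)) AS days
    FROM my_table
    WHERE time >= '2021-06-01' AND time < '2021-07-01'
    GROUP BY macAddress
    HAVING COUNT(DISTINCT date(time)) > 15
    ORDER BY macAddress
    """
).fetchall()
```

Only 'aa' comes back, with 16 days: 'bb' has more rows but they all fall on one day. The half-open range (>= start, < next month) is a design choice of this sketch, not part of the answer above; it avoids having to spell out the last instant of the month.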
You can do this with a subquery. The inner query keeps one row per MAC per day; the outer query then counts those days per month and keeps only the MACs that appeared on more than 15 days.
I have not added any filter, so add one as needed. If you need the number of MACs seen on a single day, group by dt instead. And if you want the total number of appearances in the whole month rather than the number of days, just remove DISTINCT.
SELECT COUNT(*) cnt,
macAddress,
mnth
FROM
(SELECT DISTINCT -- keeps one row per MAC per day
macAddress,
TIMESTAMP_TRUNC(time, DAY) dt,
TIMESTAMP_TRUNC(time, MONTH) mnth
FROM `my_table`) q
GROUP BY macAddress, mnth
HAVING COUNT(*) > 15
I would go with HAVING, which filters on the aggregated values after GROUP BY:
select substr(time, 1, 7) yrmonth, macAddress, count(distinct substr(time, 1, 10)) macDays
from my_table
where substr(time, 1, 7) = '2021-06'
group by macAddress, substr(time, 1, 7)
having count(distinct substr(time, 1, 10)) > 15
order by yrmonth desc
I have not tried this on Google BigQuery; the example was written for SQLite (SQL Fiddle) and assumes time is stored as text such as '2021-06-15 10:00:00'.

Find the average lowest item in a collection grouped by date in SQL

My SQL isn't the best - I can get this working in C# but it seems more efficient to get it in my data layer - I've got a table Prices:
ID
Price
DateTime
Each row is exactly 1 hour from the next, so I have a snapshot of a price every hour.
I'm trying to work out which hour in a day over the entire dataset has the lowest price (on average).
So ideally I'm after a list of each hour in the day ranked by how cheap on average that hour is over the entire dataset - so a maximum of 24 rows (one for each hour in the day).
Any help would be greatly appreciated!
Thanks :D
Which database are you on?
Different DBs have different ways to extract date from a timestamp column.
Postgres has date(timestamp); in Oracle, you can use trunc(timestamp). Most DBs also have to_char/to_date, so you can try those.
Once you have extracted the date, you can try something like this -
select ID,
Price,
DateTime,
trunc(DateTime) as day,
rank() over (partition by trunc(DateTime) order by Price asc) as least_for_day
from Prices
Now you can use the "least_for_day" ranked column and select by day.
Again, depending on the DB, you can either directly qualify on the ranked column in the same SQL or use the above as a sub-query and filter for the rank.
You can use a query like below
select
hour,
avg(daily_rank) avg_rank
from
(
select *, hour = format(cast(datetime as datetime), 'HH'), daily_rank = dense_rank() over (partition by cast(datetime as date) order by price asc)
from Prices
) t
group by hour
Thank you very much to @Many Manjunath and @DhruvJoshi. Final solution below:
WITH prices AS
(
SELECT
[Price],
[DateTime],
CAST([DateTime] AS TIME) 'Time',
CAST([DateTime] as date) 'Date',
rank() over (partition by cast([DateTime] as date) order by [Price] asc) as least_for_day
FROM [dbo].[Prices]
)
SELECT [Time], count(*) 'Qty Cheapest' FROM prices
WHERE least_for_day = 1
GROUP BY [Time]
ORDER BY 2 DESC
That returns 24 rows.
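For comparison, the literal reading of the question (rank each hour of the day by its average price over the whole dataset) can be sketched with a plain GROUP BY. This is an alternative to the rank-based queries above, not the accepted solution, and it is written against an in-memory SQLite database with invented data rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Prices (ID INTEGER PRIMARY KEY, Price REAL, DateTime TEXT)"
)

# Hypothetical hourly snapshots for two days; hour 03 is made the cheapest.
rows = []
for day in ("2023-01-01", "2023-01-02"):
    for hour in range(24):
        price = 5.0 if hour == 3 else 10.0 + hour
        rows.append((price, f"{day} {hour:02d}:00:00"))
conn.executemany("INSERT INTO Prices (Price, DateTime) VALUES (?, ?)", rows)

# One row per hour of the day, cheapest average first.
ranking = conn.execute(
    """
    SELECT strftime('%H', DateTime) AS hour, AVG(Price) AS avg_price
    FROM Prices
    GROUP BY hour
    ORDER BY avg_price ASC
    """
).fetchall()
```

This gives at most 24 rows, one per hour. Counting how often an hour is the cheapest of its day (the final solution above) and averaging the price per hour answer slightly different questions, so the two rankings need not agree on real data.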

Adding all columns from multiple tables

I have a simple question.
I need to count all records from multiple tables with day and hour and add all of them together in a single final table.
So the query for each table is something like this:
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_1
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_2
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_3
and so on.
I would like to combine all the results showing number of total records for each day and hour from these tables.
Expected results will be like this
date, hour, number of records of table 1, number of records of table 2, number of records of table 3 ........
What would be the optimal SQL query for this?
Probably the simplest way is to union them together and aggregate:
select timestamp_trunc(timestamp, hour) as hh,
countif(which = 1) as num_1,
countif(which = 2) as num_2
from ((select timestamp, 1 as which
from table_1
) union all
(select timestamp, 2 as which
from table_2
) union all
. . .
) t
group by hh
order by hh;
You are using timestamp_trunc(). It returns a timestamp truncated to the hour -- there is no need to also include the date.
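The union-then-aggregate idea can be tried out locally with the sketch below, which uses an in-memory SQLite database. It is an approximation of the BigQuery query: SQLite has no countif or timestamp_trunc, so SUM(which = 1) and strftime stand in for them, and the column name ts and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_1 (ts TEXT);
    CREATE TABLE table_2 (ts TEXT);
    INSERT INTO table_1 VALUES
        ('2021-06-01 10:05:00'), ('2021-06-01 10:40:00'), ('2021-06-01 11:00:00');
    INSERT INTO table_2 VALUES ('2021-06-01 10:59:59');
    """
)

# Tag each row with its source table, stack the tables, then count per hour.
counts = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:00:00', ts) AS hh,
           SUM(which = 1) AS num_1,
           SUM(which = 2) AS num_2
    FROM (
        SELECT ts, 1 AS which FROM table_1
        UNION ALL
        SELECT ts, 2 AS which FROM table_2
    ) AS t
    GROUP BY hh
    ORDER BY hh
    """
).fetchall()
```

Each output row is one hour with one count column per source table, which is the layout asked for in the question. An hour with no rows in a given table shows 0 for that table.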
Below is for BigQuery Standard SQL
#standardSQL
SELECT
TIMESTAMP_TRUNC(TIMESTAMP, DAY) day,
EXTRACT(HOUR FROM TIMESTAMP) hour,
COUNT(*) cnt,
_TABLE_SUFFIX AS table
FROM `project.dataset.table_*`
GROUP BY day, hour, table

Calculate the sum of a column on weekly basis in hive

I have a table, say testTable, in Hive (with data for 3 years) with the following columns:
retailers, order_total, order_total_qty, order_date
I have to create a new table with these columns:
'source_name' as source, sum(retailers), sum(order_total), sum(order_total_qty)
for each week from the starting order_date.
I am stuck with this. How can I group the data so that it is summed up on a weekly basis?
Use the WEEKOFYEAR() function to aggregate on a weekly basis.
select
'source_name' source,
sum(retailers) sum_retailers,
sum(order_total) sum_order_total,
sum(order_total_qty) sum_order_total_qty,
WEEKOFYEAR(order_date) week,
year(order_date) year
from testTable
where order_date >= '2015-01-01' --start_date
group by WEEKOFYEAR(order_date), year(order_date)
order by year, week; --order if necessary
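One thing to watch with grouping by week number plus calendar year is the turn of the year. Assuming WEEKOFYEAR follows ISO-8601 week numbering (an assumption worth checking against your Hive version), the first days of January can belong to the last week of the previous year. The Python sketch below illustrates the effect with invented rows, grouping by the ISO year rather than the calendar year; weekly_totals is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (order_date, order_total)
orders = [
    (date(2015, 12, 28), 10.0),  # Monday of ISO week 53 of 2015
    (date(2016, 1, 1), 5.0),     # Friday, still ISO week 53 of 2015
    (date(2016, 1, 4), 7.0),     # Monday of ISO week 1 of 2016
]


def weekly_totals(orders):
    """Sum order_total per (ISO year, ISO week)."""
    totals = defaultdict(float)
    for order_date, order_total in orders:
        iso_year, iso_week, _ = order_date.isocalendar()
        totals[(iso_year, iso_week)] += order_total
    return dict(totals)


totals = weekly_totals(orders)
```

Here 2015-12-28 and 2016-01-01 fall into the same week. Grouping by week number and calendar year instead would split them into (2015, 53) and (2016, 53), so it can be worth checking how your data behaves around New Year.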

Group by month in SQLite

I have an SQLite database which contains transactions, each of them having a price and a transDate.
I want to retrieve the sum of the transactions grouped by month. The retrieved records should be like the following:
Price month
230 2
500 3
400 4
When you group by month, it is a good idea to include the year as well, so that the same month in different years is not merged.
For MySQL:
select SUM(price) as Price,
DATE_FORMAT(transDate, '%m-%Y') as `month-year`
from transactions group by DATE_FORMAT(transDate, '%m-%Y');
For SQLite:
select SUM(price) as Price,
strftime('%m-%Y', transDate) as "month-year"
from transactions group by strftime('%m-%Y', transDate);
You can group on the start of the month:
select date(DateColumn, 'start of month')
, sum(TransactionValueColumn)
from YourTable
group by
date(DateColumn, 'start of month')
Try the following:
SELECT SUM(price), strftime('%m', transDate) as month
FROM your_table
GROUP BY strftime('%m', transDate);
See the date and time functions page in the SQLite documentation for future reference.
SELECT
SUM(Price) as Price, strftime('%m', myDateCol) as Month
FROM
myTable
GROUP BY
strftime('%m', myDateCol)
This is another form:
SELECT SUM(price) AS price,
STRFTIME('%Y-%m-01', created_at) as created_at
FROM records
GROUP BY STRFTIME('%Y-%m-01', created_at);
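To see the difference between grouping on the month alone and on year plus month, the following sketch runs both variants against an in-memory SQLite database from Python. The table name transactions and the three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (price REAL, transDate TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(230, "2020-02-10"), (500, "2020-03-05"), (400, "2021-02-20")],
)

# Month only: February 2020 and February 2021 end up in the same group.
by_month = conn.execute(
    "SELECT strftime('%m', transDate) AS month, SUM(price) "
    "FROM transactions GROUP BY month ORDER BY month"
).fetchall()

# Year and month: each calendar month gets its own group.
by_year_month = conn.execute(
    "SELECT strftime('%Y-%m', transDate) AS year_month, SUM(price) "
    "FROM transactions GROUP BY year_month ORDER BY year_month"
).fetchall()
```

With data spanning a single year the two queries give the same totals; once the data crosses a year boundary, only the year-month version keeps the months apart.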
Another way is to take the year and the month as a substring of the column and group by that. Assuming the format is like 2021-05-27 12:58:00, you can extract the first 7 characters:
SELECT
substr(transDate, 1, 7) as YearMonth,
SUM(price) AS price
FROM
records
GROUP BY
substr(transDate, 1, 7);
In SQLite, if you are storing your date in unixepoch format, in seconds:
select count(myDate) as totalCount,
strftime('%Y-%m', myDate, 'unixepoch', 'localtime') as yearMonth
from myTable group by strftime('%Y-%m', myDate, 'unixepoch', 'localtime');
If you are storing the date in unixepoch format, in milliseconds, divide by 1000:
select count(myDate/1000) as totalCount,
strftime('%Y-%m', myDate/1000, 'unixepoch', 'localtime') as yearMonth
from myTable group by strftime('%Y-%m', myDate/1000, 'unixepoch', 'localtime');
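A small runnable check of the millisecond variant, using Python's sqlite3 module with invented timestamps. The 'localtime' modifier is left out here so the result does not depend on the machine's time zone; add it back if you want months in local time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myDate INTEGER)")

# Hypothetical epoch timestamps in milliseconds (UTC):
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [
        (1622116800000,),  # 2021-05-27 12:00:00
        (1622203200000,),  # 2021-05-28 12:00:00
        (1623715200000,),  # 2021-06-15 00:00:00
    ],
)

# Divide by 1000 to get seconds, then let 'unixepoch' interpret the value.
per_month = conn.execute(
    """
    SELECT strftime('%Y-%m', myDate / 1000, 'unixepoch') AS yearMonth,
           COUNT(*) AS totalCount
    FROM myTable
    GROUP BY yearMonth
    ORDER BY yearMonth
    """
).fetchall()
```

Forgetting the division is easy to spot: the values are then read as seconds, which lands far outside the expected date range.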