I'm writing an SQL query for Apache Druid and I would like to group results by date. I'm used to DB2 and I would typically do something like:
SELECT DATE(TIMESTAMP), COUNT(*) FROM my_table GROUP BY (DATE(TIMESTAMP))
I'm using the /druid/v2/sql API endpoint and passing the query in a POST request. I get an SQL parsing error when I try this. I know that I can group by day with something like
SELECT EXTRACT(day FROM timestamp), COUNT(*) FROM my_table GROUP BY 1
but I would like the full date if possible.
Thanks in advance.
Does this work?
SELECT DATE_TRUNC('day', TIMESTAMP), COUNT(*)
FROM my_table
GROUP BY DATE_TRUNC('day', TIMESTAMP);
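To see why the full date matters here, below is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Druid. This is an assumption: SQLite has no DATE_TRUNC, so date(ts) plays the role of truncating to the day, and the column name ts and the sample rows are hypothetical. Grouping by the day-of-month alone merges the 5th of different months, while grouping by the truncated date keeps them apart.

```python
import sqlite3

# In-memory SQLite stand-in for my_table (column name "ts" is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ts TEXT)")
conn.executemany(
    "INSERT INTO my_table (ts) VALUES (?)",
    [("2021-01-05 09:00:00",), ("2021-01-05 17:30:00",), ("2021-02-05 08:15:00",)],
)

# Like EXTRACT(day FROM ...): only the day-of-month, so Jan 5 and Feb 5 collapse.
by_day_of_month = conn.execute(
    "SELECT CAST(strftime('%d', ts) AS INTEGER) AS d, COUNT(*) "
    "FROM my_table GROUP BY d ORDER BY d"
).fetchall()

# Like DATE_TRUNC('day', ...): the full date, so each calendar day is its own group.
by_full_date = conn.execute(
    "SELECT date(ts) AS day, COUNT(*) FROM my_table GROUP BY day ORDER BY day"
).fetchall()

print(by_day_of_month)  # [(5, 3)]
print(by_full_date)     # [('2021-01-05', 2), ('2021-02-05', 1)]
```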
Druid stores time-series data, with __time as the primary timestamp column.
The following Druid SQL might work for grouping data by year, month, and day:
SELECT TIME_EXTRACT(__time, 'DAY') AS dt, TIME_EXTRACT(__time, 'MONTH') AS mn, TIME_EXTRACT(__time, 'YEAR') AS yr, COUNT(1) AS cnt
FROM <datasource_name/table_name>
WHERE __time BETWEEN '<start_datetime>' and '<end_datetime>'
GROUP BY TIME_EXTRACT(__time, 'DAY'), TIME_EXTRACT(__time, 'MONTH'), TIME_EXTRACT(__time, 'YEAR')
HAVING cnt >= <Threshold_Value>
ORDER BY yr, mn, dt, cnt DESC
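Here is a minimal sketch of the same year/month/day grouping with a HAVING threshold, again using SQLite through Python's sqlite3 as a stand-in for Druid: strftime replaces TIME_EXTRACT, and the table name events, the column event_time, the sample rows, and THRESHOLD = 2 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_time TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO events (event_time) VALUES (?)",
    [
        ("2020-03-01 10:00:00",), ("2020-03-01 11:00:00",), ("2020-03-01 12:00:00",),
        ("2020-03-02 09:00:00",),
        ("2021-03-01 08:00:00",), ("2021-03-01 08:30:00",),
    ],
)

THRESHOLD = 2  # hypothetical stand-in for <Threshold_Value>

# strftime('%d' / '%m' / '%Y') plays the role of TIME_EXTRACT(__time, 'DAY' / 'MONTH' / 'YEAR').
result = conn.execute(
    """
    SELECT CAST(strftime('%d', event_time) AS INTEGER) AS dt,
           CAST(strftime('%m', event_time) AS INTEGER) AS mn,
           CAST(strftime('%Y', event_time) AS INTEGER) AS yr,
           COUNT(1) AS cnt
    FROM events
    WHERE event_time BETWEEN ? AND ?
    GROUP BY yr, mn, dt
    HAVING COUNT(1) >= ?
    ORDER BY yr, mn, dt, cnt DESC
    """,
    ("2020-01-01 00:00:00", "2021-12-31 23:59:59", THRESHOLD),
).fetchall()

print(result)  # [(1, 3, 2020, 3), (1, 3, 2021, 2)]
```

Only days with at least THRESHOLD rows survive; the single row on 2020-03-02 is filtered out by HAVING.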
In CockroachDB, I want to run a query like this for every day of a specific month:
select count(*), sum(amount)
from request
where code = 'code_string'
and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
The problem is that I want the days to follow my local date, not UTC. What should I do?
My goal is to get "month, day, count, sum" as the result columns for one month.
UPDATE:
I have found a suitable query for this purpose:
select count(amount), sum(amount), extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
from request
where code = 'code_string' and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
group by dayTime, monthTime
Thanks to @histocrat for a simpler answer :) Replace
extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
with this:
date_part('month', created_at) as monthTime, date_part('day', created_at) as dayTime
To group results by both month and day, you can use the date_part function.
select date_part('month', created_at) as month, date_part('day', created_at) as day, count(*), sum(amount)
from request
where code = 'code_string'
group by date_part('month', created_at), date_part('day', created_at);
Depending on what type created_at is, you may need to cast or convert it first (for example, group by date_part('month', created_at::timestamptz)).
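For the local-date part of the question, the key step is converting each UTC timestamp to local time before taking the month and day. Below is a minimal Python sketch of that bucketing. It assumes a fixed UTC+03:30 offset, inferred from the 20:30Z boundaries in the question (a zone with daylight saving would need zoneinfo instead), and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: local time is a fixed UTC+03:30 offset (inferred from the 20:30Z bounds).
LOCAL = timezone(timedelta(hours=3, minutes=30))

# Hypothetical (created_at in UTC, amount) rows from the request table.
rows = [
    (datetime(2022, 7, 31, 21, 0, tzinfo=timezone.utc), 10),  # 1 Aug, 00:30 local
    (datetime(2022, 8, 1, 12, 0, tzinfo=timezone.utc), 5),    # 1 Aug, 15:30 local
    (datetime(2022, 8, 1, 20, 45, tzinfo=timezone.utc), 7),   # 2 Aug, 00:15 local
]

# Bucket by (month, day) of the *local* timestamp, accumulating [count, sum].
buckets = defaultdict(lambda: [0, 0])
for created_at, amount in rows:
    local = created_at.astimezone(LOCAL)
    key = (local.month, local.day)
    buckets[key][0] += 1
    buckets[key][1] += amount

# "month, day, count, sum" rows, ordered by month then day.
result = [(m, d, c, s) for (m, d), (c, s) in sorted(buckets.items())]
print(result)  # [(8, 1, 2, 15), (8, 2, 1, 7)]
```

Grouping the same rows by their UTC month and day would instead put the first row on 31 July and the third on 1 August.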
I need the data for the last four months:
select count(distinct session_id)
from master_gui partition for (to_date('11-25-2020','MM-DD-YYYY'))
where session_id in (select distinct session_id
from reporting_data partition for (to_date('11-25-2020','MM-DD-YYYY'))
where flow_name in ('BEGIN_STATUS'));
Any suggestions on how to change the query above to include dates for the last 4 months?
I checked the partition key columns with the query below:
SELECT OWNER, NAME, OBJECT_TYPE, COLUMN_NAME, COLUMN_POSITION FROM ALL_PART_KEY_COLUMNS
OWNER           NAME            OBJECT_TYPE  COLUMN_NAME         COLUMN_POSITION
REPORTING_USER  REPORTING_DATA  TABLE        CREATE_TIME         1
REPORTING_USER  MASTER_GUI      TABLE        SESSION_START_TIME  1
I'm using the query below to get the last 4 months of records (August, September, October, and November):
select count(distinct session_id)
from master_gui where SESSION_START_TIME >= add_months(trunc(sysdate), -4)
and session_id in (select distinct session_id from reporting_data where CREATE_TIME>= add_months(trunc(sysdate), -4)
and flow_name in ('BEGIN_STATUS'));
Thanks, experts.
I used the query below after making changes. Is it correct?
Since we need the count from the master_gui table, I filtered on its partition key column, SESSION_START_TIME, and on the reporting_data table's partition key column, CREATE_TIME.
select count(distinct session_id)
from master_gui where SESSION_START_TIME < trunc(sysdate,'mm')
and SESSION_START_TIME >= add_months( trunc(sysdate, 'mm'),-4)
and session_id in (select distinct session_id from REPORTING_DATA where create_time < trunc(sysdate,'mm')
and create_time >= add_months( trunc(sysdate, 'mm'),-4)
and flow_name in ('BEGIN_STATUS'));
Thanks, experts.
Is the query below correct, and will it perform better? I removed the DISTINCT clause from the inner subquery.
select count(distinct session_id)
from master_gui where SESSION_START_TIME < trunc(sysdate,'mm')
and SESSION_START_TIME >= add_months( trunc(sysdate, 'mm'),-4)
and session_id in (select session_id from REPORTING_DATA where create_time < trunc(sysdate,'mm')
and create_time >= add_months( trunc(sysdate, 'mm'),-4)
and flow_name in ('BEGIN_STATUS'));
Thanks, experts.
I need to use only the partition clause, to get faster performance:
select count(distinct session_id)
from master_gui partition for (to_date('11-01-2020','MM-DD-YYYY'))
where session_id in (select distinct session_id from reporting_data partition for (to_date('11-30-2020','MM-DD-YYYY'))
where flow_name in ('BEGIN_STATUS'));
Is the query above correct for 1 Nov 2020 to 30 Nov 2020?
This part of your query means you are selecting records only from the partition which holds values for 25-NOV-2020.
from reporting_data partition for (to_date('11-25-2020','MM-DD-YYYY'))
Therefore, if your table is partitioned by day, you will get records only for the 25th. If it is partitioned by month, you will get records only for November. With this syntax you could only get all of the last four months if the partitions were (say) annual, and even then you would also get the rest of that year.
The solution is simply to omit the partition clause and use a WHERE clause instead.
select count(distinct session_id)
from master_gui
where session_id in (select distinct session_id
from reporting_data
where <<partition_key_column>> >= sysdate - interval '4' month
and flow_name in ('BEGIN_STATUS'))
and <<partition_key_column>> >= sysdate - interval '4' month;
This query will still use partition pruning.
Is it correct?
It looks like what I suggested. However, you have refined "last four months" to mean the last four complete months, i.e. excluding the current month. My search criteria include the current month. So maybe what you actually need is something like
select session_id
from reporting_data
where create_time < trunc(sysdate,'mm')
and create_time >= add_months( trunc(sysdate, 'mm'),-4)
If run during December 2020, this covers 01-AUG-2020 through 30-NOV-2020.
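The window depends on when the query runs, so it can help to check the boundary arithmetic by hand. Below is a small Python sketch of what trunc(sysdate, 'mm') and add_months(trunc(sysdate, 'mm'), -4) evaluate to for a given "today". trunc_month, add_months_first, and last_full_months are hypothetical helpers that only mirror those Oracle functions for first-of-month dates; they are not Oracle APIs.

```python
from datetime import date


def trunc_month(d: date) -> date:
    """Mirror of Oracle trunc(d, 'mm'): the first day of d's month."""
    return d.replace(day=1)


def add_months_first(d: date, n: int) -> date:
    """Mirror of add_months(d, n) for a first-of-month d (day 1 always exists)."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def last_full_months(today: date, n: int = 4):
    """Half-open window [start, end): the n complete months before today's month."""
    end = trunc_month(today)             # create_time < trunc(sysdate, 'mm')
    start = add_months_first(end, -n)    # create_time >= add_months(trunc(sysdate, 'mm'), -4)
    return start, end


print(last_full_months(date(2020, 12, 15)))  # (datetime.date(2020, 8, 1), datetime.date(2020, 12, 1))
print(last_full_months(date(2020, 11, 25)))  # (datetime.date(2020, 7, 1), datetime.date(2020, 11, 1))
```

Run on 25 November 2020, the same predicates would cover July through October, not August through November.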
Incidentally, you don't need the DISTINCT in the subquery. The IN clause will handle duplicates so DISTINCT just adds unnecessary work, which could matter if you're dealing with large amounts of data.
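A quick way to convince yourself that DISTINCT inside the IN subquery does not change the result is a toy check in SQLite through Python's sqlite3. The tables and rows are hypothetical, and this only shows that the results match; whether dropping DISTINCT is faster depends on the optimizer and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master_gui (session_id INTEGER);
    CREATE TABLE reporting_data (session_id INTEGER, flow_name TEXT);
    INSERT INTO master_gui VALUES (1), (2), (3);
    -- session 1 appears three times in reporting_data
    INSERT INTO reporting_data VALUES
        (1, 'BEGIN_STATUS'), (1, 'BEGIN_STATUS'), (1, 'BEGIN_STATUS'),
        (2, 'BEGIN_STATUS'), (3, 'OTHER');
    """
)

with_distinct = conn.execute(
    "SELECT COUNT(DISTINCT session_id) FROM master_gui WHERE session_id IN "
    "(SELECT DISTINCT session_id FROM reporting_data WHERE flow_name IN ('BEGIN_STATUS'))"
).fetchone()[0]

# IN only tests membership, so duplicate session_ids in the subquery are harmless.
without_distinct = conn.execute(
    "SELECT COUNT(DISTINCT session_id) FROM master_gui WHERE session_id IN "
    "(SELECT session_id FROM reporting_data WHERE flow_name IN ('BEGIN_STATUS'))"
).fetchone()[0]

print(with_distinct, without_distinct)  # 2 2
```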
I presume there's a DATE column. If so, include it in the WHERE clause, e.g.
... and date_column >= add_months(trunc(sysdate, 'mm'), -4)
How do I get the maximum number of logins per hour from the am_session table, which has the columns userid and create_time (datatype: timestamp(6))?
I have attached sample data.
Thanks in advance
I think you can just aggregate:
select to_char(create_time, 'YYYY-MM-DD HH24') as yyyymmddhh, COUNT(*)
from am_session
group by to_char(create_time, 'YYYY-MM-DD HH24')
order by count(*) desc
fetch first 1 row only;
If you want the hour independently of the date, then:
select to_char(create_time, 'HH24') as hh24, COUNT(*)
from am_session
group by to_char(create_time, 'HH24')
order by count(*) desc
fetch first 1 row only;
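Both variants can be tried out with a minimal sketch that uses SQLite through Python's sqlite3 as a stand-in for Oracle. This assumes strftime('%Y-%m-%d %H') and strftime('%H') play the roles of to_char(..., 'YYYY-MM-DD HH24') and to_char(..., 'HH24'), and LIMIT 1 replaces FETCH FIRST 1 ROW ONLY. The sample logins are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE am_session (userid TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO am_session VALUES (?, ?)",
    [
        ("u1", "2021-03-01 09:05:00"), ("u2", "2021-03-01 09:40:00"),
        ("u3", "2021-03-01 10:10:00"),
        ("u1", "2021-03-02 09:15:00"),
    ],
)

# Busiest specific hour (date + hour), like to_char(create_time, 'YYYY-MM-DD HH24').
peak_date_hour = conn.execute(
    "SELECT strftime('%Y-%m-%d %H', create_time) AS yyyymmddhh, COUNT(*) AS cnt "
    "FROM am_session GROUP BY yyyymmddhh ORDER BY cnt DESC LIMIT 1"
).fetchone()

# Busiest hour of the day across all dates, like to_char(create_time, 'HH24').
peak_hour = conn.execute(
    "SELECT strftime('%H', create_time) AS hh24, COUNT(*) AS cnt "
    "FROM am_session GROUP BY hh24 ORDER BY cnt DESC LIMIT 1"
).fetchone()

print(peak_date_hour)  # ('2021-03-01 09', 2)
print(peak_hour)       # ('09', 3)
```

If several hours tie for the maximum, LIMIT 1 (like FETCH FIRST 1 ROW ONLY) returns just one of them.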
You can do it as below:
select
to_char(create_time, 'YYYY-MM-DD') as create_date,
extract(hour from create_time) as Hour,
Count(*)
from am_session
group by
to_char(create_time, 'YYYY-MM-DD'),
extract(hour from create_time)
order by 1, 2 desc;
Note: this will not give you a count for hours in which there were no logins.
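If you do need the empty hours, one common approach is to generate a row for every hour and outer-join the counts onto it, so that missing hours come out as 0. Below is a minimal Python sketch of that idea for a single day; the login timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical login timestamps for one day.
logins = [
    datetime(2021, 3, 1, 9, 5),
    datetime(2021, 3, 1, 9, 40),
    datetime(2021, 3, 1, 11, 10),
]

counts = Counter(ts.hour for ts in logins)

# Emit all 24 hours (the "generated" side of the outer join), defaulting to 0.
per_hour = [(hour, counts.get(hour, 0)) for hour in range(24)]

print(per_hour[9], per_hour[10], per_hour[11])  # (9, 2) (10, 0) (11, 1)
```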
It's been a while since I've touched SQL.
I'm working on a fairly large database.
In a table with about 30 million rows, I'm trying to find when the highest number of entries was made within a given period (e.g. a year), down to the level of a single hour.
What I do now is something like this:
For the year 2018:
Find the month with the highest number of entries in 2018 (i.e. 12 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') like '2018-01-%'
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') like '2018-02-%'
After finding the month with the highest count, I have to find the day (i.e. up to 31 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') = '2018-01-01'
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') = '2018-01-02'
After finding the day with the highest count, I have to find the hour (i.e. 24 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD HH24:MI:SS') >= '2018-01-02 08:00:00'
and to_char(create_time, 'YYYY-MM-DD HH24:MI:SS') <= '2018-01-02 08:59:59'
As you can see, this is tedious. So my question is: can I optimize this process, and if so, how?
The database is PostgreSQL, and I'm using pgAdmin.
Thanks in advance.
You can use GROUP BY and the date_part function to simplify things:
SELECT date_part('month', create_time), count(*)
FROM sing
WHERE date_part('year', create_time) = 2018
GROUP BY date_part('month', create_time)
and then, for the day:
SELECT date_part('day', create_time), count(*)
FROM sing
WHERE date_part('year', create_time) = 2018
AND date_part('month', create_time) = <month from previous query>
GROUP BY date_part('day', create_time)
and so on
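The drill-down can also collapse into a single query that groups directly on the hour; in PostgreSQL that would be date_trunc('hour', create_time). Grouping on the hour directly also avoids a subtle issue: the busiest hour of the year need not fall in the busiest month or day. Filtering on a range of create_time, rather than wrapping the column in to_char or date_part, also lets PostgreSQL use an index on create_time if one exists. Below is a minimal sketch using SQLite through Python's sqlite3 as a stand-in, where strftime('%Y-%m-%d %H:00') approximates date_trunc('hour', ...); the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sing (create_time TEXT)")
conn.executemany(
    "INSERT INTO sing VALUES (?)",
    [
        ("2018-01-02 08:10:00",), ("2018-01-02 08:20:00",), ("2018-01-02 08:55:00",),
        ("2018-03-15 14:00:00",), ("2018-03-15 14:30:00",),
        ("2019-01-01 00:00:00",),  # outside 2018, excluded by the range filter
    ],
)

# One pass: bucket every 2018 row by hour and keep the busiest bucket.
peak = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:00', create_time) AS hour_bucket, COUNT(*) AS cnt
    FROM sing
    WHERE create_time >= '2018-01-01' AND create_time < '2019-01-01'
    GROUP BY hour_bucket
    ORDER BY cnt DESC
    LIMIT 1
    """
).fetchone()

print(peak)  # ('2018-01-02 08:00', 3)
```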
For the year 2018, it would be one query:
select count(*) from sing where date_part('year', create_time) = 2018
So I think date_part is a better choice than to_char.
https://www.w3resource.com/PostgreSQL/date_part-function.php
I am trying to pull visits per ISO week from BigQuery.
However, I am failing at the date transformation.
Could you help?
#standardSQL
SELECT count (visitid) as Sessions, date,
EXTRACT (ISOYEAR FROM date) AS isoyear
FROM `xxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201806020' AND '20180630'
GROUP BY date
order by date DESC
Have you tried a query like this?
SELECT EXTRACT(ISOYEAR FROM date) as yyyy,
EXTRACT(ISOWEEK FROM DATE) as ww,
COUNT(*) as Sessions
FROM `xxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201806020' AND '20180630'
GROUP BY yyyy, ww
ORDER BY MIN(date) DESC;
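Pairing ISOWEEK with ISOYEAR, rather than the calendar YEAR, matters near year boundaries: for example, 2018-12-31 belongs to ISO week 1 of 2019. Below is a minimal Python sketch of the same (ISO year, ISO week) grouping. Python's date.isocalendar() follows the ISO 8601 week rules, and the session dates are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical session dates around year boundaries.
session_dates = [date(2018, 12, 30), date(2018, 12, 31), date(2019, 1, 1), date(2021, 1, 1)]


def iso_year_week(d: date):
    """(ISO year, ISO week), analogous to EXTRACT(ISOYEAR ...) and EXTRACT(ISOWEEK ...)."""
    iso_year, iso_week, _weekday = d.isocalendar()
    return iso_year, iso_week


# Count sessions per (ISO year, ISO week), like GROUP BY yyyy, ww.
sessions_per_week = Counter(iso_year_week(d) for d in session_dates)
print(sorted(sessions_per_week.items()))
# [((2018, 52), 1), ((2019, 1), 2), ((2020, 53), 1)]
```

Grouping by calendar year instead would label 2018-12-31 as week 1 of 2018 and 2021-01-01 as week 53 of 2021, splitting or mislabeling those weeks.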