It's been a while since I've touched SQL.
I'm working on a pretty large database.
In a certain table which has some 30 million rows I'm trying to figure out when the highest number of entries was made for a certain period e.g. a year, down to the detail-level of one hour.
What I do now is something like this:
For the year 2018:
Find month with highest entry number for 2018 (i.e. 12 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') like '2018-01-%'
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') like '2018-02-%'
After I find the month with the highest number I must find the day (i.e. up to 31 queries) :
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') = '2018-01-01'
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD') = '2018-01-02'
After I find the day with the highest number I must find the hour (i.e. 24 queries):
select count(*) from sing
where to_char(create_time, 'YYYY-MM-DD HH24:MI:SS') >= '2018-01-02 08:00:00'
and to_char(create_time, 'YYYY-MM-DD HH24:MI:SS') <= '2018-01-02 08:59:59'
As you can see, this is a tedious task. So my question is whether, and how, I can optimize this process.
The database is PostgreSQL, and I'm using pgAdmin.
Thanks in advance.
You can use GROUP BY and the date_part function to simplify things:
SELECT date_part('month', create_time), count(*)
FROM sing
WHERE date_part('year', create_time) = 2018
GROUP BY date_part('month', create_time)
and then for the day
SELECT date_part('day', create_time), count(*)
FROM sing
WHERE date_part('year', create_time) = 2018
AND date_part('month', create_time) = <month from previous query>
GROUP BY date_part('day', create_time)
and so on
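If only the single busiest hour is needed, the drill-down can be collapsed into one query. This is a minimal sketch for PostgreSQL that reuses the table and column names from the question; the aliases hour_start and entries are made up for illustration. It filters on create_time with a plain range, so an index on that column (if one exists) can be used, which is not the case when the column is wrapped in to_char or date_part.

```sql
-- Busiest hour of 2018, found in one pass over the table.
SELECT date_trunc('hour', create_time) AS hour_start,  -- hypothetical alias
       count(*)                        AS entries      -- hypothetical alias
FROM sing
WHERE create_time >= TIMESTAMP '2018-01-01'
  AND create_time <  TIMESTAMP '2019-01-01'
GROUP BY date_trunc('hour', create_time)
ORDER BY entries DESC
LIMIT 1;
```

Swapping 'hour' for 'day' or 'month' gives the busiest day or month, and raising the LIMIT shows the runners-up.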
For the year 2018, that would be 1 query:
select count(*) from sing where date_part('year', create_time) = 2018
So I think date_part is a better fit than to_char here.
https://www.w3resource.com/PostgreSQL/date_part-function.php
Can anyone describe how to retrieve data in SQL using both a WHERE clause and a GROUP BY clause on different fields?
For instance,
I need to find the number of days in a month on which the temperature exceeded 35 degrees Celsius.
SELECT temp, count(*)
FROM weather_data
WHERE day between '01-jun-2022' to '30-jun-2022'
GROUP BY temp > '35';
My requirement is to find aggregate details such as the total count,
so I tried using a GROUP BY clause. In addition to that, I need a few conditions to filter further,
hence I put the conditions in a WHERE clause before the GROUP BY clause.
This is the correct query:
SELECT temp, count(*) FROM weather_data
WHERE temp > '35' AND day between '01-jun-2022' and '30-jun-2022' GROUP BY temp
You want to aggregate your data, so as to get one result row per month. In SQL this is GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day). Your DBMS may have additional functions to extract a month (year + month to be precise) from a date, such as TO_CHAR(day, 'YYYY-MM'), but this is vendor specific.
Now you only want to count days with a temperature above 35 degrees. The first idea for solving this is a WHERE clause that limits the rows you aggregate to the ones in question:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(*)
FROM mytable
WHERE temp > 35
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
The problem with this: If a month has no day above that temperature, you won't select that month, because your WHERE clause removed those rows. That may be okay with you, but if you want to show the months with a zero count, then move the condition into the aggregation function. Thus you select all months but only count days with high temperatures:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(CASE WHEN temp > 35 THEN 1 END)
FROM mytable
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
How does this work? COUNT(<expression>) counts non-null occurrences. CASE WHEN temp > 35 THEN 1 END is short for CASE WHEN temp > 35 THEN 1 ELSE NULL END. And instead of 1 you could use any value that is not null, e.g. 'count me'. Or you could use SUM instead, if you like that better: SUM(CASE WHEN temp > 35 THEN 1 ELSE 0 END).
Lastly, you want to limit the date range. Date literals in SQL look like this: DATE 'YYYY-MM-DD'. And as we sometimes deal with dates and other times with datetimes or timestamps, it has become common not to use BETWEEN but >= and <, so that the range works for all those data types:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(CASE WHEN temp > 35 THEN 1 END)
FROM mytable
WHERE day >= DATE '2022-06-01'
AND day < DATE '2022-07-01'
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
Try this:
SELECT temp, count(*)
FROM weather_data
WHERE date >= '01-jun-2022' AND date<='30-jun-2022' AND temp > '35'
GROUP BY temp;
In CockroachDB, I want to run a query like this for every day of a specific month:
select count(*), sum(amount)
from request
where code = 'code_string'
and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
The problem is that I want it based on my local date. What should I do?
My goal is:
"month, day, count, sum" as result columns for a month.
UPDATE:
I have found a suitable query for this purpose:
select count(amount), sum(amount), extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
from request
where code = 'code_string' and created_at >= '2022-07-31T20:30:00Z' and created_at < '2022-08-31T20:30:00Z'
group by dayTime, monthTime
Thanks to @histocrat for the easier answer :) which replaces
extract(month from created_at) as monthTime, extract(day from created_at) as dayTime
by this:
date_part('month', created_at) as monthTime, date_part('day', created_at) as dayTime
To group results by both month and day, you can use the date_part function.
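select date_part('month', created_at) as month,
       date_part('day', created_at) as day,
       count(*), sum(amount)
from request
where code = 'code_string'
-- aliases are defined in the select list; group by the same expressions
group by date_part('month', created_at), date_part('day', created_at);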
Depending on what type created_at is, you may need to cast or convert it first (for example, group by date_part('month', created_at::timestamptz)).
I'm writing an SQL query for Apache Druid and I would like to group results by date. I'm used to DB2 and I would typically do something like:
SELECT DATE(TIMESTAMP), COUNT(*) FROM my_table GROUP BY (DATE(TIMESTAMP))
I'm using the /druid/v2/sql API endpoint and passing in the query in a POST. I get an SQL parsing error when I try this. I know that I can group by day with something like
SELECT EXTRACT(day FROM timestamp), COUNT(*) FROM my_table GROUP BY 1
but I would like the full date if possible.
Thanks in advance.
Does this work?
SELECT DATE_TRUNC('day', TIMESTAMP), COUNT(*)
FROM my_table
GROUP BY DATE_TRUNC('day', TIMESTAMP);
Druid stores time-series data with __time as the primary timestamp column.
The following Druid SQL query might work for grouping data by year, month, and day:
SELECT TIME_EXTRACT(__time, 'DAY') AS dt, TIME_EXTRACT(__time, 'MONTH') AS mn, TIME_EXTRACT(__time, 'YEAR') AS yr, COUNT(1) AS cnt
FROM <datasource_name/table_name>
WHERE __time BETWEEN '<start_datetime>' and '<end_datetime>'
GROUP BY TIME_EXTRACT(__time, 'DAY'), TIME_EXTRACT(__time, 'MONTH'), TIME_EXTRACT(__time, 'YEAR')
HAVING cnt >= <Threshold_Value>
ORDER BY yr, mn, dt, cnt DESC
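To get the full date in a single column rather than three separate parts, Druid SQL's TIME_FLOOR function can round each timestamp down to the start of its day. This is a sketch under the assumption that the datasource is called my_table (as in the question) and uses the default __time column; day_start and cnt are illustrative aliases.

```sql
-- One row per calendar day (UTC unless a time zone argument is passed to TIME_FLOOR).
SELECT TIME_FLOOR(__time, 'P1D') AS day_start,  -- ISO 8601 period: one day
       COUNT(*)                  AS cnt
FROM my_table
GROUP BY 1
ORDER BY 1
```

The second argument is an ISO 8601 period, so 'PT1H' or 'P1M' would bucket by hour or month in the same way.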
I have a table with the following columns:
NETID is a unique identifier for the user, OCCURRENCES is the number of times they've logged in in a month, and EARLIEST_MONTHLY_DATE is the earliest time they logged in for a month. I'm currently using:
SELECT
to_char(earliest_monthly_date, 'yyyy-mm-dd') MARCH,
COUNT(NETID) unique_login_count
FROM
REPORT_SERVICE_USAGE
WHERE
earliest_monthly_date >= to_date('2014-03-01', 'yyyy-mm-dd')
AND earliest_monthly_date <= to_date('2014-03-31', 'yyyy-mm-dd')
GROUP BY
to_char(earliest_monthly_date, 'yyyy-mm-dd')
ORDER BY
to_char(earliest_monthly_date, 'yyyy-mm-dd') ASC
which gives me the total number of logins per day in a given month. It returns something like this:
Now, I want to set up my query so that it groups the login count by month instead of by day of a given month. I'm not sure how to do this (or if it can be done), as I'm not very familiar with SQL, but any help would be greatly appreciated. If you need any more info, feel free to ask. By the way, it's an Oracle database.
SELECT
to_char(earliest_monthly_date, 'yyyy-mm') MARCH,
COUNT(NETID) unique_login_count
FROM
REPORT_SERVICE_USAGE
WHERE
earliest_monthly_date >= to_date('2014-03-01', 'yyyy-mm-dd')
AND earliest_monthly_date <= to_date('2014-03-31', 'yyyy-mm-dd')
GROUP BY
to_char(earliest_monthly_date, 'yyyy-mm')
ORDER BY
to_char(earliest_monthly_date, 'yyyy-mm') ASC
Just change to_char in the SELECT and GROUP BY to extract:
SELECT
extract(MONTH from earliest_monthly_date) MARCH,
COUNT(NETID) unique_login_count
FROM
REPORT_SERVICE_USAGE
WHERE
earliest_monthly_date >= to_date('2014-03-01', 'yyyy-mm-dd')
AND earliest_monthly_date <= to_date('2014-03-31', 'yyyy-mm-dd')
GROUP BY
extract(MONTH from earliest_monthly_date)
ORDER BY
extract(MONTH from earliest_monthly_date) ASC
Edit
If you would like the groups to be by month and year, the answer from @Lamak is a better option.
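Another option in Oracle, sketched here as an illustration rather than as either answerer's method, is TRUNC(date, 'MM'), which rounds every date down to the first day of its month and therefore keeps year and month together in one sortable value. Table and column names are taken from the question; month_start is a made-up alias, and the one-year range is only an example.

```sql
-- Logins per month across a whole year, one row per month.
SELECT TRUNC(earliest_monthly_date, 'MM') AS month_start,  -- hypothetical alias
       COUNT(netid)                       AS unique_login_count
FROM report_service_usage
WHERE earliest_monthly_date >= DATE '2014-01-01'
  AND earliest_monthly_date <  DATE '2015-01-01'   -- half-open range: keeps all of Dec 31
GROUP BY TRUNC(earliest_monthly_date, 'MM')
ORDER BY month_start
```

The half-open range also avoids a pitfall of the <= to_date('2014-03-31', ...) filter: an Oracle DATE carries a time of day, so that upper bound excludes logins made after midnight on the last day.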
I have the following database table on a Postgres server:
id date Product Sales
1245 01/04/2013 Toys 1000
1245 01/04/2013 Toys 2000
1231 01/02/2013 Bicycle 50000
456461 01/01/2014 Bananas 4546
I would like to create a query that gives the SUM of the Sales column and groups the results by month and year as follows:
Apr 2013 3000 Toys
Feb 2013 50000 Bicycle
Jan 2014 4546 Bananas
Is there a simple way to do that?
I can't believe the accepted answer has so many upvotes -- it's a horrible method.
Here's the correct way to do it, with date_trunc:
SELECT date_trunc('month', txn_date) AS txn_month, sum(amount) as monthly_sum
FROM yourtable
GROUP BY txn_month
It's bad practice but you might be forgiven if you use
GROUP BY 1
in a very simple query.
You can also use
GROUP BY date_trunc('month', txn_date)
if you don't want to select the date.
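To get output shaped like the question's example (month label, total, product), date_trunc can do the grouping while to_char only formats the label. This is a sketch that assumes the column names "date", "Product" and "Sales" from the question's table and the placeholder table name yourtable; month_label and total_sales are illustrative aliases.

```sql
-- Group by the truncated month and the product; format the month only for display.
SELECT to_char(date_trunc('month', "date"), 'Mon YYYY') AS month_label,
       sum("Sales")                                     AS total_sales,
       "Product"
FROM yourtable
GROUP BY date_trunc('month', "date"), "Product"
ORDER BY date_trunc('month', "date");
```

Ordering by the truncated date rather than by the label keeps the rows chronological, since 'Apr 2013' would otherwise sort before 'Feb 2013' alphabetically.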
select to_char(date,'Mon') as mon,
extract(year from date) as yyyy,
sum("Sales") as "Sales"
from yourtable
group by 1,2
At the request of Radu, I will explain that query:
to_char(date,'Mon') as mon, : converts the "date" attribute into the defined format of the short form of month.
extract(year from date) as yyyy : Postgresql's "extract" function is used to extract the YYYY year from the "date" attribute.
sum("Sales") as "Sales" : The SUM() function adds up all the "Sales" values, and supplies a case-sensitive alias, with the case sensitivity maintained by using double-quotes.
group by 1,2 : The GROUP BY function must contain all columns from the SELECT list that are not part of the aggregate (aka, all columns not inside SUM/AVG/MIN/MAX etc functions). This tells the query that the SUM() should be applied for each unique combination of columns, which in this case are the month and year columns. The "1,2" part is a shorthand instead of using the column aliases, though it is probably best to use the full "to_char(...)" and "extract(...)" expressions for readability.
to_char actually lets you pull out the Year and month in one fell swoop!
select to_char(date('2014-05-10'),'Mon-YY') as year_month; --'May-14'
select to_char(date('2014-05-10'),'YYYY-MM') as year_month; --'2014-05'
or in the case of the user's example above:
select to_char(date,'YY-Mon') as year_month,
sum("Sales") as "Sales"
from some_table
group by 1;
There is another way to achieve the result using the date_part() function in postgres.
SELECT date_part('month', txn_date) AS txn_month, date_part('year', txn_date) AS txn_year, sum(amount) as monthly_sum
FROM yourtable
GROUP BY date_part('month', txn_date), date_part('year', txn_date)
Thanks
Why not just use the date_part function? https://www.postgresql.org/docs/8.0/functions-datetime.html
SELECT date_part('year', txn_date) AS txn_year,
date_part('month', txn_date) AS txn_month,
sum(amount) as monthly_sum
FROM payment
GROUP BY txn_year, txn_month
order by txn_year;
Take a look at example 6) of this tutorial -> https://www.postgresqltutorial.com/postgresql-group-by/
You need to call the function on your GROUP BY instead of calling the name of the virtual attribute you created on select.
I was doing what all the answers above recommended and I was getting a column 'year_month' does not exist error.
What worked for me was:
SELECT
to_char(date_trunc('month', created_at), 'MM/YYYY') AS month
FROM
"orders"
GROUP BY
date_trunc('month', created_at)
Postgres has two timestamp types:
timestamp without time zone - (preferable for storing UTC timestamps) You find it in multinational database storage. The client in this case takes care of the time zone offset for each country.
timestamp with time zone - The value is time zone aware: input is converted to UTC for storage and converted back to the session time zone on output.
In some cases, your database does not use time zones but you still need to group records with respect to the local time zone and Daylight Saving Time (e.g. https://www.timeanddate.com/time/zone/romania/bucharest)
To apply a time zone you can use this example and replace the time zone offset with yours.
"your_date_column" at time zone '+03'
To add the +1 hour summer time offset specific to DST, you need to check whether your timestamp falls within the summer DST period. As the start and end of that period vary by 1 or 2 days from year to year, I will use an approximation that does not affect the end-of-month records, so in this case I can ignore each year's exact interval.
If a more precise query has to be built, then you have to add conditions to cover more cases. But roughly, this will work fine for splitting data per month with respect to time zone and summer time when you find timestamp without time zone in your database:
SELECT
"id", "Product", "Sale",
date_trunc('month',
CASE WHEN
Extract(month from t."date") > 03 AND
Extract(day from t."date") > 26 AND
Extract(hour from t."date") > 3 AND
Extract(month from t."date") < 10 AND
Extract(day from t."date") < 29 AND
Extract(hour from t."date") < 4
THEN
t."date" at time zone '+03' -- Romania TimeZone offset + DST
ELSE
t."date" at time zone '+02' -- Romania TimeZone offset
END) as "date"
FROM
public."Table" AS t
WHERE 1=1
AND t."date" >= '01/07/2015 00:00:00'::TIMESTAMP WITHOUT TIME ZONE
AND t."date" < '01/07/2017 00:00:00'::TIMESTAMP WITHOUT TIME ZONE
GROUP BY date_trunc('month',
CASE WHEN
Extract(month from t."date") > 03 AND
Extract(day from t."date") > 26 AND
Extract(hour from t."date") > 3 AND
Extract(month from t."date") < 10 AND
Extract(day from t."date") < 29 AND
Extract(hour from t."date") < 4
THEN
t."date" at time zone '+03' -- Romania TimeZone offset + DST
ELSE
t."date" at time zone '+02' -- Romania TimeZone offset
END)
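As an alternative to approximating the DST window by hand, PostgreSQL can apply the exact rules itself when given a named time zone. The sketch below assumes that the timestamp without time zone values are stored in UTC and that the amount column is called "Sales" as in the question; local_month and total_sales are illustrative aliases. Note also that PostgreSQL reads a bare offset such as '+03' as a POSIX-style specification, where a positive sign means west of Greenwich, so a zone name is less error-prone.

```sql
-- First AT TIME ZONE: declare the stored value to be UTC (yields timestamptz).
-- Second AT TIME ZONE: convert to Bucharest wall-clock time, DST included.
SELECT date_trunc('month',
                  (t."date" AT TIME ZONE 'UTC') AT TIME ZONE 'Europe/Bucharest'
       )              AS local_month,
       sum(t."Sales") AS total_sales
FROM public."Table" AS t
GROUP BY 1
ORDER BY 1;
```

With this form, a record stored as 21:30 UTC on the last day of a summer month is counted in the following month, because it is already 00:30 local time in Bucharest.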
I also need to find results grouped by YEAR and MONTH.
When I grouped them by the raw TIMESTAMP column, the sum function grouped them by full date and time, down to the minute, which wasn't what I wanted.
Using this query may be helpful for you.
select sum(sum),
concat(year, '-', month, '-', '01')::timestamp
from (select sum(t.final_price) as sum,
extract(year from t.created_at) as year,
extract(month from t.created_at) as month
from transactions t
where status = 'SUCCESS'
group by t.created_at) t
group by year, month;
transactions table
query result
As you can see in the picture, I have two rows in the table for '2022-07-01', and in the query result they are grouped together.