Using Where and group by clause

Can anyone describe how to retrieve data in SQL using both WHERE and GROUP BY clauses on different fields?
For instance,
I need to find the number of days in a month on which the temperature exceeded 35 degrees Celsius.
SELECT temp, count(*)
FROM weather_data
WHERE day between '01-jun-2022' to '30-jun-2022'
GROUP BY temp > '35';
My requirement is to get aggregate details such as a total count,
so I tried using a GROUP BY clause. In addition, I need a few conditions to filter the rows further,
so I put those conditions in a WHERE clause before the GROUP BY clause.

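This query puts both filter conditions in the WHERE clause and then groups by temp, so it returns one count per distinct temperature above 35:
SELECT temp, count(*) FROM weather_data
WHERE temp > 35 AND day between '01-jun-2022' and '30-jun-2022' GROUP BY temp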

You want to aggregate your data so as to get one result row per month. In SQL this is GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day). Your DBMS may have additional functions to extract a month (year + month, to be precise) from a date, such as TO_CHAR(day, 'YYYY-MM'), but this is vendor specific.
Now you only want to count days with a temperature above 35 degrees. The first idea to solve this is a WHERE clause that limits the rows you aggregate to the ones in question:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(*)
FROM mytable
WHERE temp > 35
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
The problem with this: If a month has no day above that temperature, you won't select that month, because your WHERE clause removed those rows. That may be okay with you, but if you want to show the months with a zero count, then move the condition into the aggregation function. Thus you select all months but only count days with high temperatures:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(CASE WHEN temp > 35 THEN 1 END)
FROM mytable
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
How does this work? COUNT(<expression>) counts non-null occurrences. CASE WHEN temp > 35 THEN 1 END is short for CASE WHEN temp > 35 THEN 1 ELSE NULL END. And instead of 1 you could use any value that is not null, e.g. 'count me'. Or you could use SUM instead, if you like that better: SUM(CASE WHEN temp > 35 THEN 1 ELSE 0 END).
Finally, you want to limit the date range. Date literals in SQL look like this: DATE 'YYYY-MM-DD'. And as we sometimes deal with dates and other times with datetimes or timestamps, it has become common not to use BETWEEN, but >= and <, so that the range works for all those data types:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(CASE WHEN temp > 35 THEN 1 END)
FROM mytable
WHERE day >= DATE '2022-06-01'
AND day < DATE '2022-07-01'
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
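
To see the difference between the two approaches on a small data set, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under assumptions, not the exact SQL above: SQLite has no EXTRACT, so strftime('%Y-%m', day) stands in for the year/month grouping, the dates are stored as ISO text, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: day is stored as ISO-8601 text, which sorts chronologically.
conn.execute("CREATE TABLE mytable (day TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2022-06-01", 36.0),
        ("2022-06-02", 30.0),
        ("2022-06-03", 38.5),
        ("2022-07-01", 20.0),  # July has no day above 35
        ("2022-07-02", 25.0),
    ],
)

# Variant 1: filter in WHERE. Months without a hot day disappear.
where_rows = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS ym, COUNT(*)
    FROM mytable
    WHERE temp > 35
    GROUP BY ym
    ORDER BY ym
    """
).fetchall()

# Variant 2: condition inside COUNT, half-open date range in WHERE.
# Every month in the range is returned, with 0 where nothing matched.
case_rows = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS ym,
           COUNT(CASE WHEN temp > 35 THEN 1 END)
    FROM mytable
    WHERE day >= '2022-06-01' AND day < '2022-08-01'
    GROUP BY ym
    ORDER BY ym
    """
).fetchall()

print(where_rows)
print(case_rows)
```

With this sample data the first query returns only June, while the second returns June and July, the latter with a count of 0.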

Try this:
SELECT temp, count(*)
FROM weather_data
WHERE day >= '01-jun-2022' AND day <= '30-jun-2022' AND temp > 35
GROUP BY temp;

Related

I want to get the result which should be less time consuming

select count(*)
from table
where EXTRACT(MONTH FROM addondatetime) = EXTRACT(MONTH FROM current_date)
and EXTRACT(year FROM addondatetime) = EXTRACT(year FROM current_date)
This is my query. I want to count the rows whose month equals the current month, but the query takes almost 2 minutes.

Try:
select count(*)
from table
where date_trunc('month', addondatetime) = date_trunc('month', current_date);
Also create a function based index:
create index test_just_month on test (date_trunc('month', addondatetime));
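
A different option, not part of the answer above, is to leave the column untouched and compare it against a half-open range covering the current month, so that an ordinary index on the column may be usable. The sketch below only illustrates that idea with Python's sqlite3 module; the table name events and the helper names are hypothetical, and the timestamps are assumed to be stored as ISO text.

```python
import sqlite3
from datetime import date


def month_bounds(today):
    """Return (first day of today's month, first day of the next month)."""
    start = today.replace(day=1)
    if start.month == 12:
        nxt = start.replace(year=start.year + 1, month=1)
    else:
        nxt = start.replace(month=start.month + 1)
    return start, nxt


def count_current_month(conn, today):
    """Count rows whose addondatetime falls in today's month."""
    start, nxt = month_bounds(today)
    row = conn.execute(
        "SELECT COUNT(*) FROM events "
        "WHERE addondatetime >= ? AND addondatetime < ?",
        (start.isoformat(), nxt.isoformat()),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (addondatetime TEXT)")  # hypothetical table
conn.execute("CREATE INDEX events_addon_idx ON events (addondatetime)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("2023-11-30 23:59:59",),
        ("2023-12-01 00:00:00",),
        ("2023-12-31 23:59:59",),
        ("2024-01-01 00:00:00",),
    ],
)

print(count_current_month(conn, date(2023, 12, 15)))
```

Whether this is faster than the expression index depends on the database and the data, so it is worth checking the query plan for both.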

PostgreSQL: Simplifying a SQL query into a shorter query

I have a table called 'daily_prices' where I have 'sale_date', 'last_sale_price', 'symbol' as columns.
I need to calculate how many times 'last_sale_price' has gone up compared to previous day's 'last_sale_price' in 10 weeks.
Currently I have my query like this for 2 weeks:
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk1' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-18' AS DATE) AND sale_date >= CAST('2020-09-14' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
UNION
select count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp", 'wk2' as "week"
from
(
select last_sale_price, LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where sale_date <= CAST('2020-09-11' AS DATE) AND sale_date >= CAST('2020-09-07' AS DATE)
and symbol='AAPL'
) nest
where last_sale_price > prev_price
I'm using 'UNION' to combine the weekly data. But as the number of weeks increases, the query is going to be huge.
Is there a simpler way to write this query?
Any help is much appreciated. Thanks in advance.

You can extract the week from sale_date and then apply GROUP BY in the outer query:
select EXTRACT(year from sale_date) as year, EXTRACT('week' FROM sale_date) as week, count(*) as "timesUp", sum(last_sale_price-prev_price) as "dollarsUp"
from (
select
sale_date,
last_sale_price,
LAG(last_sale_price, 1) OVER (ORDER BY sale_date) as prev_price
from daily_prices
where symbol='AAPL'
) nest
where last_sale_price > prev_price
group by EXTRACT(year from sale_date), EXTRACT('week' FROM sale_date)
To keep only weekdays, you can add this filter:
EXTRACT(dow FROM sale_date) in (1,2,3,4,5)
PS: In PostgreSQL, dow runs from 0 (Sunday) to 6 (Saturday), so 1 to 5 are Monday to Friday. Other databases may number the days differently.
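
The logic of this query (previous-day comparison over the whole series, then grouping the increases by week) can be sketched in plain Python. This is only an illustration of the idea, not a replacement for the SQL: the function name weekly_ups and the sample prices are made up, and ISO year and week numbers are assumed for the grouping.

```python
from collections import defaultdict
from datetime import date


def weekly_ups(prices):
    """prices: iterable of (sale_date, last_sale_price) for one symbol.

    Returns {(iso_year, iso_week): [times_up, dollars_up]} and, like the
    WHERE clause in the query, omits weeks without any increase.
    """
    stats = defaultdict(lambda: [0, 0.0])
    prev_price = None
    for sale_date, price in sorted(prices):  # same role as ORDER BY sale_date
        iso = sale_date.isocalendar()
        key = (iso[0], iso[1])
        # Same role as LAG(...) plus "where last_sale_price > prev_price".
        if prev_price is not None and price > prev_price:
            stats[key][0] += 1
            stats[key][1] += price - prev_price
        prev_price = price
    return dict(stats)


sample = [
    (date(2020, 9, 14), 100),
    (date(2020, 9, 15), 101),
    (date(2020, 9, 16), 99),
    (date(2020, 9, 17), 102),
    (date(2020, 9, 18), 102),
    (date(2020, 9, 21), 105),
]
result = weekly_ups(sample)
print(result)
```

Note that the Monday price here is compared with the previous Friday's price, because the previous-row lookup is not restarted per week.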

You can filter on the last 8 weeks in the where clause, then group by week and do conditional aggregation:
select extract(year from sale_date) yyyy, extract(week from sale_date) ww,
sum(last_sale_price - lag_last_sale_price) filter(where last_sale_price > lag_last_sale_price) sum_dollars_up,
count(*) filter(where last_sale_price > lag_last_sale_price) cnt_dollars_up
from (
select dp.*,
lag(last_sale_price) over(partition by extract(year from sale_date), extract(week from sale_date) order by sale_date) lag_last_sale_price
from daily_prices dp
where symbol = 'AAPL'
and sale_date >= date_trunc('week', current_date) - '8 week'::interval
) dp
group by 1, 2
Notes:
I am assuming that you don't want to compare the first price of a week to the last price of the previous week; if you do, then just remove the partition by clause from the over() clause of lag()
this dynamically computes the date as of 8 (entire) weeks ago
if there is no price increase during a whole week, the query still gives you a row, with 0 as cnt_dollars_up and null as sum_dollars_up

How to SELECT something with different WHERE statement from the others

What I'm trying to do is:
I make a pivot table in SQL where I create a few columns with user accounts, summing amount per date, etc. Those columns will all have the same WHERE conditions.
However, I want to create another column with amount for last 30 days which will be with condition WHERE date >= CURRENT_TIMESTAMP -30.
How do I create a selection in the same table with its own different condition?
For example:
I have this:
I want to make a pivot table like this
I have already made everything except the last column - it needs to sum the amount with condition WHERE date >= CURRENT_TIMESTAMP -30.
My other columns will have condition WHERE date >= '20200201'
I have already defined the days in the pivot table as days of the current month while this last column needs to include everything from the last 30 days, so not only in the current month.
How do I make the selection where column "Total for last 30 days" has its own conditions, different from the other columns?

Date functions differ by databases, but here is the idea:
select user,
sum(case when extract(day from date) = 1 then amount end) as day_1,
sum(case when extract(day from date) = 2 then amount end) as day_2,
sum(case when extract(day from date) = 3 then amount end) as day_3,
sum(case when extract(year from date) = extract(year from current_date) and
extract(month from date) = extract(month from current_date)
then amount
end) as month_total,
sum(case when date >= current_date - interval '30 days' then amount end) as last_30_days
from t
group by user;
The exact functions depend on the database you are using.

Select data with a rolling date criteria

The below query returns a distinct count of 'members' for a given month and brand (see image below).
select to_char(transaction_date, 'YYYY-MM') as month, brand,
count(distinct UNIQUE_MEM_ID) as distinct_count
from source.table
group by to_char(transaction_date, 'YYYY-MM'), brand;
The data is collected with a 15 day lag after the month closes (meaning September 2016 MONTHLY data won't be 100% until October 15). I am only concerned with monthly data.
The query I would like to build: Until the 15th of this month (October), last month's data (September) should reflect August's data. The current partial month (October) should default to the prior month and thus also to the above logic.
After the 15th of this month, last month's data (September) is now 100% and thus September should reflect September (and October will reflect September until November 15th, and so on).
The current partial month will always = the prior month. The complexity of the query is how to calc prior month.
This query will be run on a rolling basis, so it needs to be dynamic.
To be clear, I am trying to build a query where distinct_count for the prior month (until end of current month + 15 days) should reflect (current month - 2) value (for each respective brand). After 15 days of the close of the month, prior month = (current month - 1).
Partial current month defaults to prior month's data. The 15 day value should be variable/modifiable.

First, simplify the query to:
select to_char(transaction_date, 'YYYY-MM') as month, brand,
count(distinct members) as distinct_count
from source.table
group by to_char(transaction_date, 'YYYY-MM'), brand;
Then, you are going to have a problem. The problem is that one row (say from Aug 20th) needs to go into two groups. A simple group by won't handle this. So, let's use union all. I think the result is something like this:
select date_trunc('month', transaction_date) as month, brand,
count(distinct members) as distinct_count
from source.table
where (date_trunc('month', transaction_date) < date_trunc('month', current_date) - interval '1 month') or
(extract(day from current_date) >= 15 and date_trunc('month', transaction_date) = date_trunc('month', current_date) - interval '1 month')
group by date_trunc('month', transaction_date), brand
union all
select date_trunc('month', current_date) - interval '1 month' as month, brand,
count(distinct members) as distinct_count
from source.table
where (extract(day from current_date) < 15 and date_trunc('month', transaction_date) = date_trunc('month', current_date) - interval '1 month')
group by brand;

Since you already have a working query, I concentrate on the subselect. The condition you can use here is CASE, especially "Searched CASE"
case
when extract(day from current_date) < 15 then
extract(month from current_date - interval '2 months')
else
extract(month from current_date - interval '1 month')
end
This may be used as part of a where clause, for example.

Here is some pseudocode to get the begin date and the end date for your interval.
Begin date:
DATE_TRUNC('month', CURRENT_DATE - 15) - interval '1 month'
The DATE_TRUNC part returns the current month only after the 15th day; before that it returns the previous month. From there you subtract a full month to get your starting point.
End date:
To calculate this, take the begin date, add a month, and subtract a day.
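
The same date arithmetic can be checked outside the database. Below is a minimal Python sketch of it, with the 15-day lag as a parameter since the question asks for it to be modifiable; the function name reporting_window and the parameter lag_days are made up for this illustration.

```python
from datetime import date, timedelta


def reporting_window(today, lag_days=15):
    """Return (begin, end) of the most recent month considered complete."""
    # Step back lag_days, then truncate to the first day of that month.
    shifted = today - timedelta(days=lag_days)
    month_start = shifted.replace(day=1)
    # Begin date: one full month before that point.
    if month_start.month == 1:
        begin = month_start.replace(year=month_start.year - 1, month=12)
    else:
        begin = month_start.replace(month=month_start.month - 1)
    # End date: begin plus a month minus a day, i.e. the day before month_start.
    end = month_start - timedelta(days=1)
    return begin, end


print(reporting_window(date(2016, 10, 10)))  # before the cutoff: August
print(reporting_window(date(2016, 10, 20)))  # after the cutoff: September
```

On the 15th itself this version still returns the older month; whether the switch should happen on the 15th or the 16th is a choice the question leaves open.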

If the source table is partitioned by transaction_date, this syntax (which does not wrap transaction_date in an expression) enables partition elimination.
select to_char(transaction_date, 'YYYY-MM') as month
,count (distinct members) as distinct_count
,brand as brand
FROM source.table
where transaction_date between date_trunc('month', current_date) - case when extract (day from current_date) >= 15 then 1 else 2 end * interval '1' month
and date_trunc('month', current_date) - case when extract (day from current_date) >= 15 then 0 else 1 end * interval '1' month - interval '1' day
group by to_char(transaction_date, 'YYYY-MM')
,brand
;

Group query results by month and year in postgresql

I have the following database table on a Postgres server:
id      date        Product  Sales
1245    01/04/2013  Toys     1000
1245    01/04/2013  Toys     2000
1231    01/02/2013  Bicycle  50000
456461  01/01/2014  Bananas  4546
I would like to create a query that gives the SUM of the Sales column and groups the results by month and year as follows:
Apr 2013 3000 Toys
Feb 2013 50000 Bicycle
Jan 2014 4546 Bananas
Is there a simple way to do that?

I can't believe the accepted answer has so many upvotes -- it's a horrible method.
Here's the correct way to do it, with date_trunc:
SELECT date_trunc('month', txn_date) AS txn_month, sum(amount) as monthly_sum
FROM yourtable
GROUP BY txn_month
It's bad practice but you might be forgiven if you use
GROUP BY 1
in a very simple query.
You can also use
GROUP BY date_trunc('month', txn_date)
if you don't want to select the date.

select to_char(date,'Mon') as mon,
extract(year from date) as yyyy,
sum("Sales") as "Sales"
from yourtable
group by 1,2
At the request of Radu, I will explain that query:
to_char(date,'Mon') as mon, : converts the "date" attribute into the defined format of the short form of month.
extract(year from date) as yyyy : Postgresql's "extract" function is used to extract the YYYY year from the "date" attribute.
sum("Sales") as "Sales" : The SUM() function adds up all the "Sales" values, and supplies a case-sensitive alias, with the case sensitivity maintained by using double-quotes.
group by 1,2 : The GROUP BY clause must contain all columns from the SELECT list that are not part of an aggregate (that is, all columns not inside SUM/AVG/MIN/MAX etc. functions). This tells the query that the SUM() should be applied for each unique combination of columns, which in this case are the month and year columns. The "1,2" part is shorthand for the first and second columns of the SELECT list, though it is probably best to use the full "to_char(...)" and "extract(...)" expressions for readability.

to_char actually lets you pull out the Year and month in one fell swoop!
select to_char(date('2014-05-10'),'Mon-YY') as year_month; --'May-14'
select to_char(date('2014-05-10'),'YYYY-MM') as year_month; --'2014-05'
or in the case of the user's example above:
select to_char(date,'YY-Mon') as year_month,
sum("Sales") as "Sales"
from some_table
group by 1;

There is another way to achieve the result using the date_part() function in postgres.
SELECT date_part('month', txn_date) AS txn_month, date_part('year', txn_date) AS txn_year, sum(amount) as monthly_sum
FROM yourtable
GROUP BY date_part('month', txn_date), date_part('year', txn_date)
Thanks

Why not just use date_part function. https://www.postgresql.org/docs/8.0/functions-datetime.html
SELECT date_part('year', txn_date) AS txn_year,
date_part('month', txn_date) AS txn_month,
sum(amount) as monthly_sum
FROM payment
GROUP BY txn_year, txn_month
order by txn_year;

Take a look at example 6) of this tutorial -> https://www.postgresqltutorial.com/postgresql-group-by/
You need to call the function on your GROUP BY instead of calling the name of the virtual attribute you created on select.
I was doing what all the answers above recommended and I was getting a column 'year_month' does not exist error.
What worked for me was:
SELECT
to_char(date_trunc('month', created_at), 'MM/YYYY') AS month
FROM
"orders"
GROUP BY
date_trunc('month', created_at)

Postgres has a few types of timestamps:
timestamp without time zone - (Preferable for storing UTC timestamps.) You find it in multinational database storage. The client in this case takes care of the time zone offset for each country.
timestamp with time zone - The time zone offset is already included in the timestamp.
In some cases, your database does not use the time zone but you still need to group records with respect to the local time zone and Daylight Saving Time (e.g. https://www.timeanddate.com/time/zone/romania/bucharest)
To add the time zone you can use this example and replace the time zone offset with yours.
"your_date_column" at time zone '+03'
To add the +1 summer time offset specific to DST, you need to check whether your timestamp falls into the summer DST period. As those periods vary by 1 or 2 days from year to year, I will use an approximation that does not affect the end-of-month records, so in this case I can ignore each year's exact interval.
If a more precise query has to be built, you have to add conditions to create more cases. But roughly, this works fine for splitting data per month with respect to time zone and summer time when your database stores timestamp without time zone:
SELECT
"id", "Product", "Sale",
date_trunc('month',
CASE WHEN
Extract(month from t."date") > 03 AND
Extract(day from t."date") > 26 AND
Extract(hour from t."date") > 3 AND
Extract(month from t."date") < 10 AND
Extract(day from t."date") < 29 AND
Extract(hour from t."date") < 4
THEN
t."date" at time zone '+03' -- Romania TimeZone offset + DST
ELSE
t."date" at time zone '+02' -- Romania TimeZone offset
END) as "date"
FROM
public."Table" AS t
WHERE 1=1
AND t."date" >= '01/07/2015 00:00:00'::TIMESTAMP WITHOUT TIME ZONE
AND t."date" < '01/07/2017 00:00:00'::TIMESTAMP WITHOUT TIME ZONE
GROUP BY date_trunc('month',
CASE WHEN
Extract(month from t."date") > 03 AND
Extract(day from t."date") > 26 AND
Extract(hour from t."date") > 3 AND
Extract(month from t."date") < 10 AND
Extract(day from t."date") < 29 AND
Extract(hour from t."date") < 4
THEN
t."date" at time zone '+03' -- Romania TimeZone offset + DST
ELSE
t."date" at time zone '+02' -- Romania TimeZone offset
END)

I also needed to find results grouped by year and month.
When I grouped them by the timestamp itself, the sum function grouped them by date and minute, which wasn't what I wanted.
This query may be helpful for you.
select sum(sum),
concat(year, '-', month, '-', '01')::timestamp
from (select sum(t.final_price) as sum,
extract(year from t.created_at) as year,
extract(month from t.created_at) as month
from transactions t
where status = 'SUCCESS'
group by t.created_at) t
group by year, month;
transactions table
query result
As you can see in the picture, for '2022-07-01' I have two rows in the table, and in the query result they are grouped together.
