How to SELECT something with different WHERE statement from the others - sql

What I'm trying to do is:
I make a pivot table in SQL where I create a few columns with user accounts, summing amount per date, etc. Those columns will all have the same WHERE conditions.
However, I want to create another column with amount for last 30 days which will be with condition WHERE date >= CURRENT_TIMESTAMP -30.
How do I create a selection in the same table with its own different condition?
For example:
I have this:
I want to make a pivot table like this
I have already made everything except the last column - it needs to sum the amount with condition WHERE date >= CURRENT_TIMESTAMP -30.
My other columns will have condition WHERE date >= '20200201'
I have already defined the days in the pivot table as days of the current month while this last column needs to include everything from the last 30 days, so not only in the current month.
How do I make the selection where column "Total for last 30 days" has its own conditions, different from the other columns?

Date functions differ between databases, but here is the idea:
select user,
sum(case when extract(day from date) = 1 then amount end) as day_1,
sum(case when extract(day from date) = 2 then amount end) as day_2,
sum(case when extract(day from date) = 3 then amount end) as day_3,
sum(case when extract(year from date) = extract(year from current_date) and
extract(month from date) = extract(month from current_date)
then amount
end) as month_total,
sum(case when date >= current_date - interval '30 day' then amount end) as last_30_days
from t
group by user;
The exact functions depend on the database you are using.
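To make the idea concrete, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. The table layout, the column names user_name and txn_date, the sample rows, and the fixed "today" value are all assumptions made for illustration; SQLite's date(x, '-30 days') stands in for current_date - interval '30 day'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; user_name and txn_date are illustrative column names.
conn.execute("CREATE TABLE t (user_name TEXT, txn_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("alice", "2020-02-01", 10),
        ("alice", "2020-02-02", 20),
        ("alice", "2020-01-20", 5),   # previous month, but inside the 30-day window
        ("alice", "2019-12-01", 99),  # outside both windows
        ("bob", "2020-02-02", 7),
    ],
)

# Fixed stand-in for current_date so the result is reproducible.
today = "2020-02-10"

rows = conn.execute(
    """
    SELECT user_name,
           -- condition shared by the "current month" columns
           SUM(CASE WHEN txn_date >= '2020-02-01' THEN amount END) AS month_total,
           -- this column carries its own, different condition
           SUM(CASE WHEN txn_date >= date(:today, '-30 days') THEN amount END) AS last_30_days
    FROM t
    GROUP BY user_name
    ORDER BY user_name
    """,
    {"today": today},
).fetchall()

print(rows)
```

Because each condition lives inside its own CASE expression, no single WHERE clause has to fit every column. If a WHERE clause is kept for performance, it has to be wide enough to cover the loosest of the per-column conditions.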

Related

Using Where and group by clause

Can anyone describe how to retrieve data using both WHERE and GROUP BY clauses on different fields in SQL?
For instance,
I need to find the number of days in a month on which the temperature exceeded 35 degrees Celsius.
SELECT temp, count(*)
FROM weather_data
WHERE day between '01-jun-2022' to '30-jun-2022'
GROUP BY temp > '35';
My requirement is to get aggregate details such as the total count,
so I tried using a GROUP BY clause. In addition, I need a few conditions to filter further,
so I put those conditions in a WHERE clause before the GROUP BY clause.
The correct query is:
SELECT temp, count(*) FROM weather_data
WHERE temp > '35' AND day between '01-jun-2022' and '30-jun-2022' GROUP BY temp
You want to aggregate your data, so as to get one result row per month. In SQL this is GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day). Your DBMS may have additional functions to extract a month (year + month to be precise) from a date, such as TO_CHAR(day, 'YYYY-MM'), but this is vendor specific.
Now you only want to count days with a temperature above 35 degrees. The first idea for solving this is a WHERE clause that limits the rows you aggregate to the ones in question:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(*)
FROM mytable
WHERE temp > 35
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
The problem with this: If a month has no day above that temperature, you won't select that month, because your WHERE clause removed those rows. That may be okay with you, but if you want to show the months with a zero count, then move the condition into the aggregation function. Thus you select all months but only count days with high temperatures:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(CASE WHEN temp > 35 THEN 1 END)
FROM mytable
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
How does this work? COUNT(<expression>) counts non-null occurrences. CASE WHEN temp > 35 THEN 1 END is short for CASE WHEN temp > 35 THEN 1 ELSE NULL END. And instead of 1 you could use any value that is not null, e.g. 'count me'. Or you could use SUM instead, if you like that better: SUM(CASE WHEN temp > 35 THEN 1 ELSE 0 END).
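The difference between filtering in WHERE and filtering inside the aggregate can be seen in a small sketch, run here against in-memory SQLite from Python. The sample rows are made up, and strftime('%Y-%m', day) is used as SQLite's stand-in for the EXTRACT-based grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (day TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2022-05-10", 30.0),  # May never exceeds 35 degrees
        ("2022-05-11", 33.5),
        ("2022-06-01", 36.0),
        ("2022-06-02", 34.0),
        ("2022-06-03", 37.5),
    ],
)

# Condition in WHERE: months without a hot day vanish from the result.
where_version = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS month, COUNT(*)
    FROM mytable
    WHERE temp > 35
    GROUP BY strftime('%Y-%m', day)
    ORDER BY 1
    """
).fetchall()

# Condition inside COUNT: every month is kept, cool months count as 0.
case_version = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS month,
           COUNT(CASE WHEN temp > 35 THEN 1 END)
    FROM mytable
    GROUP BY strftime('%Y-%m', day)
    ORDER BY 1
    """
).fetchall()

print(where_version)
print(case_version)
```

With this sample data the WHERE version returns only June, while the CASE version returns May with a count of 0 and June with a count of 2.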
Finally, you want to limit the date range. Date literals in SQL look like this: DATE 'YYYY-MM-DD'. And because we sometimes deal with dates and other times with datetimes or timestamps, it has become common to use >= and < rather than BETWEEN, so that the range works for all those data types:
SELECT
EXTRACT(YEAR FROM day) AS year,
EXTRACT(MONTH FROM day) AS month,
COUNT(CASE WHEN temp > 35 THEN 1 END)
FROM mytable
WHERE day >= DATE '2022-06-01'
AND day < DATE '2022-07-01'
GROUP BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
ORDER BY EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day);
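Why the half-open range is safer once a time-of-day component is involved can be sketched as follows. This runs on in-memory SQLite from Python, where timestamps are stored as ISO-8601 text, so the comparison is textual; the table name readings and its rows are hypothetical. A DBMS with real timestamp types behaves analogously, because a bare date bound is treated as midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a timestamp column instead of a plain date.
conn.execute("CREATE TABLE readings (ts TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    [
        ("2022-06-01 00:00:00",),
        ("2022-06-30 15:00:00",),  # afternoon of the last day of June
        ("2022-07-01 00:00:00",),  # first instant of July, must be excluded
    ],
)

# BETWEEN with a date as upper bound stops at the start of 30 June.
between_count = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE ts BETWEEN '2022-06-01' AND '2022-06-30'"
).fetchone()[0]

# Half-open range: everything from 1 June up to, but not including, 1 July.
half_open_count = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE ts >= '2022-06-01' AND ts < '2022-07-01'"
).fetchone()[0]

print(between_count, half_open_count)
```

The BETWEEN query misses the afternoon reading on 30 June, whereas the half-open range picks it up and still excludes midnight on 1 July.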
Try this:
SELECT temp, count(*)
FROM weather_data
WHERE day >= '01-jun-2022' AND day <= '30-jun-2022' AND temp > 35
GROUP BY temp;

Suggested way to do a 'parallel period' SQL statement

Let's say I want to get the profit between two dates. Then I can do something like this:
SELECT SUM(Profit)
FROM Sales
WHERE date BETWEEN '2014-01-01' AND '2014-02-01' AND <other_filters>
I would then like to compare it to a previous period offset by a fixed amount. It could be written something like this to get it in two rows:
SELECT SUM(Profit)
FROM Sales
WHERE date BETWEEN '2014-01-01' AND '2014-02-01' AND <other_filters>
UNION ALL
SELECT SUM(Profit)
FROM Sales
WHERE date BETWEEN '2014-01-01' - INTERVAL 1 YEAR AND '2014-02-01' - INTERVAL 1 YEAR AND <other_filters>
Is there a way to do this without a union? I am looking for something like this:
SELECT
SUM(Profit),
???
FROM Sales
WHERE date BETWEEN '2014-01-01' AND '2014-02-01' AND <other_filters>
I think the tricky part here is how to 'undo' the WHERE filter for the offset-period calculation.
You can use conditional aggregation and OR the range checks in the WHERE clause (unless the ranges are adjacent, in which case you can of course combine them directly).
SELECT sum(CASE
WHEN date >= '2014-01-01'
AND date < '2014-02-02' THEN
profit
ELSE
0
END),
sum(CASE
WHEN date >= '2014-01-01' - INTERVAL 1 YEAR
AND date < '2014-02-02' - INTERVAL 1 YEAR THEN
profit
ELSE
0
END)
FROM sales
WHERE (date >= '2014-01-01'
AND date < '2014-02-02')
OR (date >= '2014-01-01' - INTERVAL 1 YEAR
AND date < '2014-02-02' - INTERVAL 1 YEAR);
Note: Prefer a right half-open range check over BETWEEN here. That way, if the precision of date changes, records on the last day after midnight are still in the results.
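As a runnable illustration of this approach, here is a minimal sketch against in-memory SQLite from Python. The column name sale_date and the sample rows are assumptions, and SQLite's date(x, '-1 year') replaces the MySQL-style - INTERVAL 1 YEAR used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table; sale_date is an illustrative column name.
conn.execute("CREATE TABLE sales (sale_date TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2014-01-15", 100),
        ("2014-02-01", 50),   # last day of the current period
        ("2014-02-02", 888),  # just past the current period
        ("2013-01-10", 70),
        ("2013-02-01", 30),   # last day of the previous period
        ("2013-06-01", 999),  # in neither period
    ],
)

# Half-open bounds: start is inclusive, end_excl is exclusive.
params = {"start": "2014-01-01", "end_excl": "2014-02-02"}

current_profit, previous_profit = conn.execute(
    """
    SELECT SUM(CASE WHEN sale_date >= :start AND sale_date < :end_excl
                    THEN profit ELSE 0 END),
           SUM(CASE WHEN sale_date >= date(:start, '-1 year')
                     AND sale_date < date(:end_excl, '-1 year')
                    THEN profit ELSE 0 END)
    FROM sales
    WHERE (sale_date >= :start AND sale_date < :end_excl)
       OR (sale_date >= date(:start, '-1 year')
           AND sale_date < date(:end_excl, '-1 year'))
    """,
    params,
).fetchone()

print(current_profit, previous_profit)
```

The WHERE clause only narrows the scan to rows that can matter; the CASE expressions decide which of the two sums each row contributes to, so both periods come back in a single row.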

Customizing the range of a week with date_trunc

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.
You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC('week', <whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
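The arithmetic behind the subtract-then-add trick can be checked outside the database. The sketch below mirrors it in plain Python, assuming that week truncation lands on Monday (as it does for ISO weeks); the helper name friday_week_start is made up for this example.

```python
from datetime import date, timedelta


def friday_week_start(d: date) -> date:
    """Return the Friday that starts the Friday-to-Thursday week containing d."""
    # Shift back 4 days so that a Friday lands on a Monday.
    shifted = d - timedelta(days=4)
    # Truncate to the Monday of that week (weekday(): Monday == 0).
    monday = shifted - timedelta(days=shifted.weekday())
    # Shift forward again: that Monday corresponds to the original Friday.
    return monday + timedelta(days=4)


# 2019-09-13 was a Friday, 2019-09-19 the following Thursday.
for d in (date(2019, 9, 12), date(2019, 9, 13), date(2019, 9, 19), date(2019, 9, 20)):
    print(d, "->", friday_week_start(d))
```

Every date from Friday 2019-09-13 through Thursday 2019-09-19 maps to 2019-09-13, while the Thursday before falls into the week starting 2019-09-06.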
The expression
select current_date - cast(cast(7 - (5 - extract(dow from current_date)) as text) || ' days' as interval);
should always give you the previous Friday's date.
If you might have gaps in the data (perhaps because of more granular breakdowns than just per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;

Last day of existence in table

Is it possible to find the most recent day someone was in the table before they dropped out of it during a subsetted time range?
I have something like:
SELECT
id
, MAX(day) AS day
FROM table
WHERE
day >= '2018-01-01' AND day <= '2019-08-17'
AND day != '2019-08-18'
GROUP BY 1
I'm trying to get everyone who was in the table within the date range '2018-01-01' to '2019-08-17' but was not in the table on '2019-08-18'.
But this still captures people who did have a record on '2019-08-18' in the original table; the query just leaves that day out instead of finding people who truly had no record on that day.
Use a having clause:
SELECT id, MAX(day) AS day
FROM table
WHERE day >= '2018-01-01'
GROUP BY id
HAVING MAX(day) <= '2019-08-17';
Put all the conditions in the HAVING clause:
SELECT
id,
MAX(CASE WHEN day BETWEEN '2018-01-01' AND '2019-08-17' THEN day END) AS day
FROM table
GROUP BY id
HAVING
SUM(CASE WHEN day BETWEEN '2018-01-01' AND '2019-08-17' THEN 1 ELSE 0 END) > 0
AND
SUM(CASE WHEN day = '2019-08-18' THEN 1 ELSE 0 END) = 0
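The two HAVING variants above do not select exactly the same ids, which a small sketch on in-memory SQLite (run from Python) makes visible. The table name presence and the sample rows are hypothetical; presence replaces the reserved word table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per id and day on which that id was present.
conn.execute("CREATE TABLE presence (id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO presence VALUES (?, ?)",
    [
        (1, "2019-08-16"), (1, "2019-08-17"), (1, "2019-08-18"),  # still present on the 18th
        (2, "2019-08-10"), (2, "2019-08-15"),                     # dropped out after the 15th
        (3, "2017-12-31"),                                        # never inside the range
        (4, "2019-08-17"), (4, "2019-08-20"),                     # absent on the 18th, back later
    ],
)

# Variant 1: the latest day overall must not be later than 2019-08-17.
max_variant = conn.execute(
    """
    SELECT id, MAX(day) AS day
    FROM presence
    WHERE day >= '2018-01-01'
    GROUP BY id
    HAVING MAX(day) <= '2019-08-17'
    ORDER BY id
    """
).fetchall()

# Variant 2: present inside the range, and no row on 2019-08-18 specifically.
conditional_variant = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN day BETWEEN '2018-01-01' AND '2019-08-17' THEN day END) AS day
    FROM presence
    GROUP BY id
    HAVING SUM(CASE WHEN day BETWEEN '2018-01-01' AND '2019-08-17' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN day = '2019-08-18' THEN 1 ELSE 0 END) = 0
    ORDER BY id
    """
).fetchall()

print(max_variant)
print(conditional_variant)
```

With this data, id 4 shows the difference: it is missing on 2019-08-18 but reappears on 2019-08-20, so only the second variant returns it. Which behaviour is wanted depends on whether "dropped out" means absent on that one day or absent from then on.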

Query to get sales from this month and previous month

We'd like to get total sales for this month and previous month. The query is:
SELECT sum(CASE
WHEN date_trunc('month', date_start)= date_trunc('month', now()) THEN sales
ELSE NULL
END) AS curr_sales,
sum(CASE
WHEN date_trunc('month', date_start)= date_trunc('month', now()- interval '1' MONTH) THEN sales
ELSE NULL
END) AS pr_sales
FROM sales
But it returned this error:
"Specified types or functions (one per INFO message) not supported on Redshift tables." We're running PostgreSQL 8.0.2. Any ideas? Thanks!
Use date_trunc('month', current_date) instead of date_trunc('month', now()) to get the current month in Redshift.
now() is not a supported function in Redshift, but current_date returns a date in the current session time zone (UTC by default) in the default format YYYY-MM-DD.
UPDATE
Query w/ Sample Data >>
with sales(sales, date_start) as(
select 1 , current_date union
select 2 , current_date union
select 2 , current_date - interval '1' month union
select 3 , current_date - interval '1' month
)
SELECT sum(CASE
WHEN date_trunc('month', date_start)= date_trunc('month', current_date) THEN sales
ELSE NULL
END) AS curr_sales,
sum(CASE
WHEN date_trunc('month', date_start)= date_trunc('month', current_date - interval '1' MONTH) THEN sales
ELSE NULL
END) AS pr_sales
FROM sales;
And results are coming as expected:
curr_sales pr_sales
3 5