Select first non-null row with minimum date (BigQuery) - sql

I want to select the first non-null row with the minimum date. I'd like to use a CASE WHEN: when that condition is met, then 1, ELSE 0.
So something like CASE WHEN the row IS NOT NULL and DATE is the minimum DATE THEN 1 ELSE 0. I need to flag ONLY one row.

Another option (for BigQuery Standard SQL)
#standardSQL
SELECT *, 0 AS marker FROM `project.dataset.table` WHERE item_count IS NULL
UNION ALL
SELECT *, IF(1 = ROW_NUMBER() OVER(PARTITION BY user ORDER BY date), 1, 0)
FROM `project.dataset.table` WHERE NOT item_count IS NULL
ORDER BY user, date

Consider:
select
t.*,
case when date = min(case when itemcount is not null then date end) over(partition by user order by date)
then 1
else 0
end as marker
from mytable t
I am unsure whether BigQuery supports minif() as a window function:
select
t.*,
case when date = minif(date, itemcount is not null) over(partition by user order by date)
then 1
else 0
end as marker
from mytable t
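To try the conditional window MIN locally, here is a minimal sketch that runs the same idea on SQLite through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for BigQuery here, and the table layout (user_id, event_date, item_count) and the sample rows are made up for illustration. Unlike the queries above, it also tests item_count IS NOT NULL on the current row, so a null row that happens to share the minimum date is not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only for illustration.
conn.execute("CREATE TABLE mytable (user_id TEXT, event_date TEXT, item_count INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01", None),
        ("a", "2020-01-02", 3),
        ("a", "2020-01-03", 5),
        ("b", "2020-01-01", 2),
        ("b", "2020-01-02", None),
    ],
)

query = """
SELECT t.*,
       CASE WHEN item_count IS NOT NULL
             AND event_date = MIN(CASE WHEN item_count IS NOT NULL THEN event_date END)
                              OVER (PARTITION BY user_id)
            THEN 1 ELSE 0 END AS marker
FROM mytable t
ORDER BY user_id, event_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two non-null rows of the same user share the minimum date, both get marker = 1; the ROW_NUMBER() variant is the safer choice when exactly one row must be flagged.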

Related

CASE WHEN condition with MAX() function

There are a lot of questions on the CASE WHEN topic, but the closest to mine is How to use CASE WHEN condition with MAX() function query, which has not been resolved.
Here is some of my sample data:
date        debet
2022-07-15  57190.33
2022-07-14  815616516.00
2022-07-15  40866.67
2022-07-14  1221510.00
So, I want all records for the last two dates and three additional columns: the sum for the previous day, the sum for the current day, and the difference between them:
SELECT
[debet],
[date] ,
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END ) AS sum_act,
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END ) AS sum_prev ,
(
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END )
-
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END )
) AS diff
FROM
Table
WHERE
[date] = ( SELECT MAX(date) FROM Table WHERE date < ( SELECT MAX(date) FROM Table) )
OR
[date] = ( SELECT MAX(date) FROM Table WHERE date = ( SELECT MAX(date) FROM Table ) )
GROUP BY
[date],
[debet]
Of course, the query fails with an error saying that I can't use an aggregate function inside CASE WHEN. For now I use this combination: sum(CASE WHEN [date] = dateadd(dd,-3,cast(getdate() as date)) THEN [debet] ELSE 0 END). But then I have to adjust it every time for weekends and holidays. Is there any way to get the max date in a CASE WHEN expression other than using getdate()?
Expected result:
date        sum_act   sum_prev   diff
2022-07-15  97190.33  0.00       97190.33
2022-07-14  0.00      508769.96  -508769.96
You can use dense_rank() to filter the last 2 dates in your table. After that, you can use a conditional CASE expression with sum() to calculate the required values:
select [date],
sum_act = sum(case when rn = 1 then [debet] else 0 end),
sum_prev = sum(case when rn = 2 then [debet] else 0 end),
diff = sum(case when rn = 1 then [debet] else 0 end)
- sum(case when rn = 2 then [debet] else 0 end)
from
(
select *, rn = dense_rank() over (order by [date] desc)
from tbl
) t
where rn <= 2
group by [date]
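As a quick way to check the dense_rank() logic without a SQL Server instance, here is a minimal sketch that runs the same query shape on SQLite via Python's sqlite3. SQLite is an assumption of convenience, the column is renamed to entry_date, and the rows are invented so that the arithmetic is easy to follow; the T-SQL "alias = expression" form is rewritten as "expression AS alias".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: three dates, only the last two should survive the filter.
conn.execute("CREATE TABLE tbl (entry_date TEXT, debet REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        ("2022-07-13", 10.0),
        ("2022-07-14", 20.0),
        ("2022-07-14", 30.0),
        ("2022-07-15", 5.0),
        ("2022-07-15", 7.0),
    ],
)

query = """
SELECT entry_date,
       SUM(CASE WHEN rn = 1 THEN debet ELSE 0 END) AS sum_act,
       SUM(CASE WHEN rn = 2 THEN debet ELSE 0 END) AS sum_prev,
       SUM(CASE WHEN rn = 1 THEN debet ELSE 0 END)
     - SUM(CASE WHEN rn = 2 THEN debet ELSE 0 END) AS diff
FROM (
    -- rn = 1 is the latest date, rn = 2 the one before it
    SELECT *, DENSE_RANK() OVER (ORDER BY entry_date DESC) AS rn
    FROM tbl
) t
WHERE rn <= 2
GROUP BY entry_date
ORDER BY entry_date DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the ranking is over the dates actually present in the table, weekends and holidays need no special handling.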
Two steps:
1. Get the sums for the last three dates.
2. Show the results for the last two dates.
Well, we could also get all daily sums in step 1, but we just need the last three in order to calculate the sums for the last two days, so why aggregate more data than necessary?
Here is the query. You may have to put the date column name in brackets in SQL Server, as date is a keyword in SQL.
select top(2)
date,
sum_debit_current,
sum_debit_previous,
sum_debit_current - sum_debit_previous as diff
from
(
select
date,
sum(debet) as sum_debit_current,
lag(sum(debet)) over (order by date) as sum_debit_previous
from table
where date in (select distinct top(3) date from table order by date desc)
group by date
) t
order by date desc;
(SQL Server uses TOP(n) instead of standard SQL FETCH FIRST 3 ROWS and while SELECT DISTINCT TOP(3) date looks like "get the top 3 rows, then apply distinct on their date", it is really "apply distinct on the dates, then get the top 3" like in standard SQL.)
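The two-step idea can be sketched on SQLite through Python's sqlite3 as follows. This is an adaptation under stated assumptions, not the SQL Server query itself: LIMIT replaces TOP(n), the daily sums are computed in a CTE before LAG() is applied, and the table, the entry_date column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data covering four dates.
conn.execute("CREATE TABLE tbl (entry_date TEXT, debet REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        ("2022-07-12", 1.0),
        ("2022-07-13", 10.0),
        ("2022-07-14", 20.0),
        ("2022-07-14", 30.0),
        ("2022-07-15", 5.0),
        ("2022-07-15", 7.0),
    ],
)

query = """
WITH daily AS (
    -- Step 1: daily sums, restricted to the last three dates
    SELECT entry_date, SUM(debet) AS sum_debit_current
    FROM tbl
    WHERE entry_date IN (
        SELECT DISTINCT entry_date FROM tbl ORDER BY entry_date DESC LIMIT 3
    )
    GROUP BY entry_date
)
-- Step 2: compare each day with the previous one, keep the last two days
SELECT entry_date,
       sum_debit_current,
       LAG(sum_debit_current) OVER (ORDER BY entry_date) AS sum_debit_previous,
       sum_debit_current
         - LAG(sum_debit_current) OVER (ORDER BY entry_date) AS diff
FROM daily
ORDER BY entry_date DESC
LIMIT 2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The third date is only there so that the older of the two reported days still has a previous sum to compare against.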

SQL for begin and end of data rows

I've got the following table:
and I was wondering if there is an SQL query that would give me the begin and end calendar week (CW) where the value is greater than 0.
So in the case of the table above, a result like below:
Thanks in advance!
You can assign a group by counting the number of zeros and then aggregating:
select article_nr, min(year), max(year)
from (select t.*,
sum(case when amount = 0 then 1 else 0 end) over (partition by article_nr order by year) as grp
from t
) t
where amount > 0
group by article_nr, grp;
select Atricle_Nr, min(Year&CW) as 'Begin(Year&CW)',max(Year&CW) as 'End(Year&CW)'
from table where Amount>0 group by Atricle_Nr;
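To see how the running count of zeros splits the rows into groups, here is a minimal sketch on SQLite via Python's sqlite3. Since the original table is not shown, the layout (article_nr, year_cw, amount) and the rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two runs of positive amounts separated by zeros.
conn.execute("CREATE TABLE t (article_nr INTEGER, year_cw INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 202001, 0),
        (1, 202002, 5),
        (1, 202003, 7),
        (1, 202004, 0),
        (1, 202005, 3),
        (1, 202006, 4),
        (1, 202007, 0),
    ],
)

query = """
SELECT article_nr, MIN(year_cw) AS begin_cw, MAX(year_cw) AS end_cw
FROM (
    -- Every zero row bumps the counter, so each run of positive
    -- amounts between two zeros shares one grp value.
    SELECT t.*,
           SUM(CASE WHEN amount = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY article_nr ORDER BY year_cw) AS grp
    FROM t
) s
WHERE amount > 0
GROUP BY article_nr, grp
ORDER BY article_nr, begin_cw
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A plain GROUP BY on the article number alone would return a single range per article, which only matches this output when there is at most one run of positive amounts.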

SQL - Set marker for special data-constellations

I need some SQL advice here...
I've got a table with an object (called "entityid") , an updated timestamp and a status of that object.
I now want to track how often that object was set to "inactive" by the user. But it should count at most once per day. If the previous status was also inactive, it should not count!
So here's a little example I prepared in Excel to show where the marker should appear and where not:
Do you have any advice how I can solve this by using SQL ? (We're currently working with Redshift -> PostgreSQL).
If I understand correctly, you can use window functions. This returns the first "inactive" on each day:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at::date, content_status) = 1
) as needed_marker
from t;
The same query with an explicit ordering in the window, so that the first row within each day is well defined:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at::date, content_status order by lastmodifiedtimestamp) = 1
) as needed_marker
from t;
Note: I'm not sure if updated_at is just the date. If it is, then the logic is more like:
select t.*,
(content_status = 'inactive' and
row_number() over (partition by entityid, updated_at, content_status order by lastmodifiedtimestamp) = 1
) as needed_marker
from t;
EDIT:
If you want the first time that the status changes from active to inactive, then:
select t.*,
(content_status = 'inactive' and
num_actives = 1 and
prev_status = 'active'
) as needed_marker
from (select t.*,
sum(case when content_status = 'active' then 1 else 0 end) over (partition by entityid, updated_at order by lastmodifiedtimestamp) as num_actives,
lag(content_status) over (partition by entityid, updated_at order by lastmodifiedtimestamp) as prev_status
from t
) t;
Actually, the subquery is not needed:
select t.*,
(content_status = 'inactive' and
sum(case when content_status = 'active' then 1 else 0 end) over (partition by entityid, updated_at order by lastmodifiedtimestamp) = 1 and
lag(content_status) over (partition by entityid, updated_at order by lastmodifiedtimestamp) = 'active'
) as needed_marker
from t;
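For a quick local check of the simplest variant (first "inactive" per entity and day), here is a minimal sketch on SQLite via Python's sqlite3. SQLite stands in for Redshift, date(updated_at) replaces updated_at::date, and the rows are hypothetical. It deliberately does not apply the "previous status was also inactive" rule, so the last row below is still flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data for one entity across two days.
conn.execute("CREATE TABLE t (entityid INTEGER, updated_at TEXT, content_status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2021-01-01 08:00:00", "active"),
        (1, "2021-01-01 09:00:00", "inactive"),
        (1, "2021-01-01 10:00:00", "active"),
        (1, "2021-01-01 11:00:00", "inactive"),
        (1, "2021-01-02 09:00:00", "inactive"),
    ],
)

query = """
SELECT t.*,
       (content_status = 'inactive'
        AND ROW_NUMBER() OVER (
                PARTITION BY entityid, date(updated_at), content_status
                ORDER BY updated_at
            ) = 1) AS needed_marker
FROM t
ORDER BY entityid, updated_at
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

SQLite returns the boolean expression as 1 or 0, so the marker column can be summed directly to count inactivations.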

SQL How can I return 2 different dates in same query

I have a query and want to return the latest date for two columns in the same row. If you look at RowID 1 and 2, I would like to merge both rows. They have the same EventType but differ in the IsProcessed column: one contains 1 and the other 0. The LastReceived column should return the latest date where IsProcessed is 0, and the LastProcessed column should return the latest date where IsProcessed is 1. Right now that works, but the result comes back in two rows: row 1 shows the latest date where IsProcessed is 0 and row 2 shows the latest date where IsProcessed is 1. Both LastReceived and LastProcessed come from the CreatedOn column. What I need is something like this in the select clause:
select MAX(select CreatedOn from mytable where IsProcessed=1) as LastProcessed,
MAX(select CreatedOn from mytable where IsProcessed=0) as LastReceived
This is my query below
with cte as (select distinct EventType,SendingOrganizationID,MAX(CreatedOn) as LastReceived,MAX(CreatedOn) as LastProcessed,case
when SendingOrganizationID = '3yst8' then 'Example 1'
else 'Client Not Found'
END AS ClientName,IsProcessed from mytable
where isprocessed in(0,1)
group by SendingOrganizationID,EventType,IsProcessed
having datediff(hour, MAX(CreatedOn), getdate()) >= 9)
Select ROW_NUMBER() over (ORDER BY SendingOrganizationID) as RowID,* from cte order by SendingOrganizationID,EventType,IsProcessed
Any suggestions would be great
You can use case when:
select EventType,SendingOrganizationID,
MAX(case when isProcessed = 0 then CreatedOn end) as LastReceived,
MAX(case when isProcessed = 1 then CreatedOn end) as LastProcessed
from mytable
group by SendingOrganizationID,EventType;
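Here is a minimal sketch of that conditional MAX() on SQLite via Python's sqlite3, to show how the two rows collapse into one. The rows are invented, and SQLite is only a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: event A has both processed and unprocessed rows,
# event B has only a processed one.
conn.execute(
    "CREATE TABLE mytable ("
    "EventType TEXT, SendingOrganizationID TEXT, IsProcessed INTEGER, CreatedOn TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("A", "3yst8", 0, "2021-05-01 10:00:00"),
        ("A", "3yst8", 0, "2021-05-02 10:00:00"),
        ("A", "3yst8", 1, "2021-05-01 12:00:00"),
        ("B", "3yst8", 1, "2021-05-03 09:00:00"),
    ],
)

query = """
SELECT EventType, SendingOrganizationID,
       -- CASE without ELSE yields NULL, which MAX() ignores
       MAX(CASE WHEN IsProcessed = 0 THEN CreatedOn END) AS LastReceived,
       MAX(CASE WHEN IsProcessed = 1 THEN CreatedOn END) AS LastProcessed
FROM mytable
GROUP BY SendingOrganizationID, EventType
ORDER BY EventType
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The key change from the original query is that IsProcessed is no longer in the GROUP BY, so both states land in the same group.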

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
) AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM table
GROUP BY id;
However, some people have tested for performance and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id
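The conditional-aggregation pivot can be checked with a minimal sketch on SQLite via Python's sqlite3. The table name amounts, the column txn_date and the rows are hypothetical, and SQLite is used here only as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: id 1 has rows on both dates, id 2 only on the second.
conn.execute("CREATE TABLE amounts (id INTEGER, txn_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?, ?)",
    [
        (1, "2019-09-01", 10),
        (1, "2019-09-01", 5),
        (1, "2019-10-01", 7),
        (2, "2019-10-01", 3),
        (2, "2019-11-01", 100),
    ],
)

query = """
SELECT id,
       SUM(CASE WHEN txn_date = '2019-09-01' THEN amount ELSE 0 END) AS AmountFromSept01,
       SUM(CASE WHEN txn_date = '2019-10-01' THEN amount ELSE 0 END) AS AmountFromOct01
FROM amounts
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With ELSE 0 an id without rows on a given date shows 0; with NULL in place of 0 (as in the iif variant) it would show NULL instead.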