SQL help on group by query with inline query


I have these two tables. For each con_id, I want to count how many of its latest readings have remark '1' consecutively, i.e. the unbroken run of '1's at the end of its history.
For example, with the tables below the count is 2 for A1 and 1 for A3, but 0 for A2 and B1, because their latest result is not '1'.
t_conmast (con_id [pk], off_code)
con_id  off_code
A1      1
A2      1
B1      2
A3      1

t_readbak (con_id [fk], counter, remark, timestamp [not shown in the table; auto-inserted by the system])
con_id  counter  remark
A1      1        0
A1      3        1
A1      6        1
B1      1        1
B1      2        0
A2      1        0
A2      2        1
A2      3        0
A3      1        1
What I tried (it failed); I added the off_code condition just to get results for a single office:
select con_id,
count(con_id)
from t_readbak
where remark=1 and timestamp > (select max(timestamp)
from t_readbak
where remark=0
group by con_id)
and con_id in (select con_id from t_conmast where off_code=1)
Expected output
con_id count(con_id)
A1 2
A2 0
A3 1
B1 0

This is the approach I took to solve this. First, calculate a cumulative sum of remark going backwards for each con_id. Then, at the first row (going backwards) where remark = 0, use the value on that row. You can find that row using row_number().
The complication is when a con_id has no remarks with a value of 0. In that case, you just take the total number of remarks.
The following query combines this logic into SQL:
select rb.con_id,
(case when NumZeros = 0 then numRemarks else cumsum end) as count1
from (select rb.*,
SUM(remark) over (partition by con_id order by counter desc) as cumsum,
ROW_NUMBER() over (partition by con_id, remark order by counter desc) as remark_counter,
SUM(case when remark = 0 then 1 else 0 end) over (partition by con_id) as NumZeros,
SUM(remark) over (partition by con_id) as numRemarks
from t_readbak rb
) rb
where (remark_counter = 1 and remark = 0) or
(NumZeros = 0 and remark_counter = 1)
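To check the logic above, here is a minimal sketch that runs the query (with `over (partition by con_id)` on NumZeros) against the sample t_readbak rows in an in-memory SQLite database through Python's standard `sqlite3` module. It assumes SQLite 3.25 or later for window functions and leaves out the timestamp column, relying on counter for the ordering.

```python
import sqlite3

# In-memory copy of the sample t_readbak rows (timestamp omitted; counter gives the order).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_readbak (con_id TEXT, counter INTEGER, remark INTEGER)")
conn.executemany(
    "INSERT INTO t_readbak VALUES (?, ?, ?)",
    [("A1", 1, 0), ("A1", 3, 1), ("A1", 6, 1),
     ("B1", 1, 1), ("B1", 2, 0),
     ("A2", 1, 0), ("A2", 2, 1), ("A2", 3, 0),
     ("A3", 1, 1)],
)

QUERY = """
SELECT con_id,
       CASE WHEN NumZeros = 0 THEN numRemarks ELSE cumsum END AS count1
FROM (SELECT rb.*,
             -- running count of 1s, walking backwards from the latest reading
             SUM(remark) OVER (PARTITION BY con_id ORDER BY counter DESC) AS cumsum,
             -- 1 on the latest row of each (con_id, remark) pair
             ROW_NUMBER() OVER (PARTITION BY con_id, remark ORDER BY counter DESC) AS remark_counter,
             SUM(CASE WHEN remark = 0 THEN 1 ELSE 0 END) OVER (PARTITION BY con_id) AS NumZeros,
             SUM(remark) OVER (PARTITION BY con_id) AS numRemarks
      FROM t_readbak rb) rb
WHERE (remark_counter = 1 AND remark = 0)
   OR (NumZeros = 0 AND remark_counter = 1)
ORDER BY con_id
"""

result = dict(conn.execute(QUERY).fetchall())
print(result)
```

Without the `OVER` clause on NumZeros, the query mixes an aggregate with window functions and is rejected by most databases, which is why that line needs the partition.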

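A left self join might work: count the rows with remark = 1 that have no later row with remark = 0. Something like this:
select t1.con_id, count(*) records
from t_readbak t1
left join t_readbak t2
on t2.con_id = t1.con_id
and t2.remark = 0
and t2.counter > t1.counter
where t1.remark = 1
and t2.con_id is null
group by t1.con_id
con_ids whose latest remark is 0 (A2 and B1 here) do not appear in the result.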

If you mean that you only want to include con_id counts if every remark in the period is 1, you can do something like this:
SELECT
t_conmast.con_id,
COUNT(CASE WHEN remark = 1 THEN 1 END) AS Remark1Count,
COUNT(CASE WHEN remark <> 1 THEN 1 END) AS RemarkNot1Count
FROM t_conmast
INNER JOIN t_readbak ON t_conmast.con_id = t_readbak.con_id
WHERE your-timestamp-condition
GROUP BY t_conmast.con_id
HAVING COUNT(CASE WHEN remark <> 1 THEN 1 END) = 0
The HAVING will filter out any con_id that has a remark <> 1.

Get the maximum timestamp for each con_id where remark is 0.
Then, again for each con_id, count the rows with later timestamps. By construction, remark is 1 in all of these rows:
select master.con_id
, count(*)
from t_readbak master
inner join t_conmast office on ( office.off_code = 1
and office.con_id = master.con_id )
left join (
select con_id con_id
, max(timestamp) ts
from (
select con_id
, remark
, timestamp
from t_readbak
where remark = 0
) noremark
group by con_id
) cutoff
on ( master.con_id = cutoff.con_id )
-- a con_id with no remark = 0 at all has no cutoff, so all of its rows count
where cutoff.ts is null
or master.timestamp > cutoff.ts
group by master.con_id
;
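If you can't trust your timestamp ordering, use counter instead: take max(counter) in the cutoff subquery and compare master.counter with it instead of master.timestamp (this assumes counter increases with every reading, as it does in the sample data).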
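A minimal sketch of the cutoff idea in SQLite (via Python's `sqlite3`), using counter instead of timestamp and assuming counter increases with every reading. As an adaptation not taken from the answer, it starts from t_conmast and LEFT JOINs the readings, so a con_id with no trailing 1s is reported as 0 instead of being dropped; `last_zero` and `streak` are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_conmast (con_id TEXT PRIMARY KEY, off_code INTEGER);
INSERT INTO t_conmast VALUES ('A1', 1), ('A2', 1), ('B1', 2), ('A3', 1);
CREATE TABLE t_readbak (con_id TEXT, counter INTEGER, remark INTEGER);
INSERT INTO t_readbak VALUES
  ('A1', 1, 0), ('A1', 3, 1), ('A1', 6, 1),
  ('B1', 1, 1), ('B1', 2, 0),
  ('A2', 1, 0), ('A2', 2, 1), ('A2', 3, 0),
  ('A3', 1, 1);
""")

QUERY = """
SELECT office.con_id,
       COUNT(master.con_id) AS streak          -- NULLs from the outer join are not counted
FROM t_conmast office
LEFT JOIN (SELECT con_id, MAX(counter) AS last_zero   -- latest reading with remark 0
           FROM t_readbak
           WHERE remark = 0
           GROUP BY con_id) cutoff
       ON cutoff.con_id = office.con_id
LEFT JOIN t_readbak master
       ON master.con_id = office.con_id
      AND (cutoff.last_zero IS NULL OR master.counter > cutoff.last_zero)
WHERE office.off_code = 1
GROUP BY office.con_id
ORDER BY office.con_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```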

Related

SQL: return rows with only the earliest date for each id but only if it satisfies condition

I would like to get a list of the unique ids that have condition1 = 1 before condition2 = 1.
id  date     condition1  condition2
1   2022/02  1           0
1   2022/04  0           1
1   2022/05  0           0
2   2021/09  0           1
2   2022/01  1           0
3   2022/02  1           0
3   2022/05  0           1
In this case it would be 1 and 3.
SELECT id, MIN(date) FROM TABLE GROUP BY id
I know I can do something like this to get the first date for each id, but I just can't figure out what to do for my problem.

We can GROUP BY id and build two conditional MIN dates using CASE WHEN.
In the HAVING clause we say that the minimum date with condition 1 must appear before the minimum date with condition 2.
SELECT id
FROM yourtable
GROUP BY id
HAVING MIN(CASE WHEN condition1 = 1 THEN date END) <
MIN(CASE WHEN condition2 = 1 THEN date END)
ORDER BY id;
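The HAVING comparison can be tried locally with SQLite from Python's `sqlite3` module. This is a sketch using the sample rows, keeping the dates as 'YYYY/MM' strings, which sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, date TEXT, condition1 INTEGER, condition2 INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", [
    (1, "2022/02", 1, 0), (1, "2022/04", 0, 1), (1, "2022/05", 0, 0),
    (2, "2021/09", 0, 1), (2, "2022/01", 1, 0),
    (3, "2022/02", 1, 0), (3, "2022/05", 0, 1),
])

ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM yourtable
    GROUP BY id
    -- first condition1 date must come before the first condition2 date
    HAVING MIN(CASE WHEN condition1 = 1 THEN date END) <
           MIN(CASE WHEN condition2 = 1 THEN date END)
    ORDER BY id
""")]
print(ids)
```

If an id never has condition1 = 1 (or never has condition2 = 1), one of the MIN values is NULL, the comparison is not true, and that id is left out.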

Something like:
SELECT c1.Id
FROM
(SELECT Id, MIN(date) AS MinDate
FROM TheTable
WHERE Condition1 = 1
GROUP BY Id) c1
INNER JOIN
(SELECT Id, MIN(date) AS MinDate
FROM TheTable
WHERE Condition2 = 1
GROUP BY Id) c2
ON c1.Id = c2.Id AND c1.MinDate < c2.MinDate

Select the greatest occurrence from a column, based on date if frequencies are the same

I have the following dataset, with, let's say, ID = {1,[...],5} and Col1 = {a,b,c,Null}:
ID  Col1  Date
1   a     01/10/2022
1   a     02/10/2022
1   a     03/10/2022
2   b     01/10/2022
2   c     02/10/2022
2   c     03/10/2022
3   a     01/10/2022
3   b     02/10/2022
3   Null  03/10/2022
4   c     01/10/2022
5   b     01/10/2022
5   Null  02/10/2022
5   Null  03/10/2022
I would like to group my rows by ID, compute new columns showing the number of occurrences of each value, and compute a new column holding a string that depends on the most frequent Col1 value: most a = Hi, most b = Hello, most c = Welcome, most Null = Unknown. If several values other than Null have the same frequency, the most recent one (by date) wins.
Here is the dataset I need:
ID  nb_a  nb_b  nb_c  nb_Null  greatest
1   3     0     0     0        Hi
2   0     1     2     0        Welcome
3   1     1     0     1        Hello
4   0     0     1     0        Welcome
5   0     1     0     2        Unknown
I have to do this in a compute recipe in Dataiku. The grouping is handled by the group-by section of the recipe, while the rest of the query needs to go in the "custom aggregations" section. I'm having trouble with the tie-breaking part (if the frequencies are equal, take the most recent value).
My SQL code looks like this:
CASE WHEN SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END)
AND SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END)
THEN 'Hi'
WHEN SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END)
AND SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END)
THEN 'Hello'
WHEN SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END)
AND SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END)
THEN 'Welcome'
-- etc., repeat for the other cases
END
But surely there must be a better way to do this, right? And I have no idea how to pick the most recent one when frequencies are the same.
Thank you for your help and sorry if my message isn't clear.

I tried to reproduce this in Azure Synapse using a SQL script. Below is the approach.
The sample table is created as follows:
Create table tab1 (id int, col1 varchar(50), date_column date)
Insert into tab1 values(1,'a','2021-10-01')
Insert into tab1 values(1,'a','2021-10-02')
Insert into tab1 values(1,'a','2021-10-03')
Insert into tab1 values(2,'b','2021-10-01')
Insert into tab1 values(2,'c','2021-10-02')
Insert into tab1 values(2,'c','2021-10-03')
Insert into tab1 values(3,'a','2021-10-01')
Insert into tab1 values(3,'b','2021-10-02')
Insert into tab1 values(3,'Null','2021-10-03')
Insert into tab1 values(4,'c','2021-10-01')
Insert into tab1 values(5,'b','2021-10-01')
Insert into tab1 values(5,'Null','2021-10-02')
Insert into tab1 values(5,'Null','2021-10-03')
Step 1:
A query finds the count of rows for each (id, col1) combination and the maximum date within each combination (the date is left NULL for 'Null').
select
distinct id,col1,
count(*) over (partition by id,col1) as count,
case when col1='Null' then null else max(date_column) over (partition by id,col1) end as max_date
from tab1
Step 2:
A row number is calculated within each id, in decreasing order of count and then max_date. When two or more values have the same frequency, this puts the one with the latest date first.
select *, row_number() over (partition by id order by count desc, max_date desc) as row_num from
(select
distinct id,col1,
count(*) over (partition by id,col1) as count,
case when col1='Null' then null else max(date_column) over (partition by id,col1) end as max_date
from tab1)q1
Step 3:
Rows with row_num = 1 are kept, and the greatest column is assigned with the logic
most a = Hi, most b = Hello, most c = Welcome, most Null = Unknown.
Full query:
select id,
[greatest]=case when col1='a' then 'Hi'
when col1='b' then 'Hello'
when col1='c' then 'Welcome'
else 'Unknown'
end
from
(select *, row_number() over (partition by id order by count desc, max_date desc) as row_num from
(select
distinct id,col1,
count(*) over (partition by id,col1) as count,
case when col1='Null' then null else max(date_column) over (partition by id,col1) end as max_date
from tab1)q1
)q2 where row_num=1
With this approach, when frequencies are the same, the most recent date decides the value of greatest.
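A compact sketch of the same ranking in SQLite via Python's `sqlite3`, assuming SQLite 3.25+ for ROW_NUMBER(). It swaps the DISTINCT-plus-window-count subquery for an equivalent `GROUP BY id, col1` and renames count to cnt; as in the Synapse version, 'Null' is stored as a string and given a NULL date, which SQLite sorts last under DESC, so it loses ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (id INTEGER, col1 TEXT, date_column TEXT)")
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", [
    (1, "a", "2021-10-01"), (1, "a", "2021-10-02"), (1, "a", "2021-10-03"),
    (2, "b", "2021-10-01"), (2, "c", "2021-10-02"), (2, "c", "2021-10-03"),
    (3, "a", "2021-10-01"), (3, "b", "2021-10-02"), (3, "Null", "2021-10-03"),
    (4, "c", "2021-10-01"),
    (5, "b", "2021-10-01"), (5, "Null", "2021-10-02"), (5, "Null", "2021-10-03"),
])

QUERY = """
SELECT id,
       CASE col1 WHEN 'a' THEN 'Hi' WHEN 'b' THEN 'Hello' WHEN 'c' THEN 'Welcome'
                 ELSE 'Unknown' END AS greatest
FROM (SELECT *,
             -- highest count first; on equal counts, the latest date first
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY cnt DESC, max_date DESC) AS row_num
      FROM (SELECT id, col1,
                   COUNT(*) AS cnt,
                   -- 'Null' gets no date, so it sorts last and never wins a tie
                   CASE WHEN col1 = 'Null' THEN NULL ELSE MAX(date_column) END AS max_date
            FROM tab1
            GROUP BY id, col1) q1) q2
WHERE row_num = 1
ORDER BY id
"""

greatest = dict(conn.execute(QUERY).fetchall())
print(greatest)
```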

SQL to calculate cumulative sum that resets based on previous value in a column in Hive

I am trying to create a cumulative value like this:
KEY1 Date_ VAL1 CUMU_VAL2
K1 D1 1 0
K1 D2 1 1
K1 D3 0 2
K1 D4 1 0
K1 D5 1 1
So the idea is to keep adding 1 to CUMU_VAL2 based on the previous row's VAL1, but the sum resets when the previous value in the VAL1 column is zero.
In Excel, for example, the formula for cell D3 would be:
D3 = IF(C2>0, D2+1, 0)
I believe I should be able to do something like this, but how do I reset the sum when the previous value is zero?
SELECT
a1.*,
SUM(a1.VAL1) OVER (PARTITION BY a1.KEY1 ORDER BY a1.Date_ ) AS CUMU_VAL2
FROM source_table a1
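The Excel recurrence in the question can be written as a small reference function, handy for checking whatever SQL you end up with. This is a sketch that assumes each KEY1's rows arrive in Date_ order and that the first row starts at 0, as in the sample; `cumu_val2` is a hypothetical helper name.

```python
def cumu_val2(val1):
    """Hypothetical helper: D[i] = IF(VAL1[i-1] > 0, D[i-1] + 1, 0), starting at 0."""
    out = []
    for i in range(len(val1)):
        if i > 0 and val1[i - 1] > 0:
            out.append(out[-1] + 1)  # previous VAL1 was positive: keep counting
        else:
            out.append(0)  # first row, or previous VAL1 was zero: reset
    return out


print(cumu_val2([1, 1, 0, 1, 1]))  # [0, 1, 2, 0, 1]
```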

My amendment to @GordonLinoff's answer, as the OP didn't quite understand what I meant:
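SELECT
t.KEY1, t.Date_, t.VAL1,
ROW_NUMBER() OVER (PARTITION BY key1, grp
ORDER BY Date_
)
- 1
AS CUMU_VAL2
FROM
(
SELECT
*,
-- zeros before this row: the running count minus this row's own zero,
-- so a 0 row stays in the run it ends
SUM(
CASE WHEN val1 = 0 THEN 1 ELSE 0 END
)
OVER (
PARTITION BY key1
ORDER BY date_
)
- CASE WHEN val1 = 0 THEN 1 ELSE 0 END
AS grp
FROM
source_table
)
t;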
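The sketch below runs this `row_number() - 1` idea in SQLite from Python on the sample rows, assuming SQLite 3.25+ for window functions. Here grp counts only the zeros before the current row (running total minus the current row's own zero), so the 0 row stays in the group it closes; with that definition the output matches the expected CUMU_VAL2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (KEY1 TEXT, Date_ TEXT, VAL1 INTEGER)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)",
                 [("K1", "D1", 1), ("K1", "D2", 1), ("K1", "D3", 0),
                  ("K1", "D4", 1), ("K1", "D5", 1)])

QUERY = """
SELECT KEY1, Date_, VAL1,
       ROW_NUMBER() OVER (PARTITION BY KEY1, grp ORDER BY Date_) - 1 AS CUMU_VAL2
FROM (SELECT *,
             -- zeros strictly before this row: running total minus this row's own zero
             SUM(CASE WHEN VAL1 = 0 THEN 1 ELSE 0 END)
                 OVER (PARTITION BY KEY1 ORDER BY Date_)
             - CASE WHEN VAL1 = 0 THEN 1 ELSE 0 END AS grp
      FROM source_table) t
ORDER BY KEY1, Date_
"""

cumu = [row[3] for row in conn.execute(QUERY)]
print(cumu)  # [0, 1, 2, 0, 1]
```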

You can assign a group, which is the number of 0s before a given row (so a 0 row still belongs to the run it ends). Then use count():
select t.KEY1, t.Date_, t.VAL1,
count(*) over (partition by key1, grp
order by date_
) - 1 as cumu_val2
from (select t.*,
coalesce(sum(case when t.val1 = 0 then 1 else 0 end) over (partition by key1 order by date_ rows between unbounded preceding and 1 preceding), 0) as grp
from source_table t
) t;
row_number() - 1 gives the same result as long as Date_ is unique within each KEY1.

Finding Missing Numbers series when Data Is Grouped in sql server

I need to write a query that will calculate the missing numbers with their count in a sequence when the data is "grouped". The data are in multiple groups & each group is in sequence.
For example, I have number series like 1001-1050, 1245-1270 and 4571-4590. All the numbers (1001, 1002, 1003, ..., 1050, and so on) are stored in Table1, and some of those numbers are also stored in another table, Table2, e.g. 1001, 1002, 1003, 1004, 1005.
I want to get output like this:
Utilized Numbers | Balance Numbers |
----------- -------------------------
1001 - 1005 = 5 | 1006 - 1050 = 45 |
1245 - 1251 = 7 | 1252 - 1270 = 19 |
4571 - 4573 = 3 | 4574 - 4590 = 17 |
Each number is stored as a single field, in both tables.

You haven't really explained your data, but I'm guessing that "Utilized" are the numbers found in both Table1 and Table2, and "Balance" are the numbers found only in Table1.
You can get the result this way; it's a little messy, mostly because of the formatting of the results:
Edit: This is a new version that does not use lag.
select
min (case when C2 = 1 then MINID end), max (case when C2 = 1 then MAXID end), max(case when C2=1 then ROWS end),
min (case when C2 = 0 then MINID end), max (case when C2 = 0 then MAXID end), max(case when C2=0 then ROWS end)
from (
select min(ID) as MINID, max(ID) as MAXID, count(*) as ROWS, C2, row_number() over (partition by C2 order by min(ID)) as GRP3 from (
select *, ID - RN as GRP1, ID - RN2 as GRP2 from (
select
T1.ID, row_number() over (order by T1.ID) as RN,
case when T2.ID is NULL then 0 else 1 end as C2,
row_number() over (partition by case when T2.ID is NULL then 0 else 1 end order by T1.ID) as RN2,
T2.ID as ID2
from #Table1 T1
left outer join #Table2 T2 on T1.ID = T2.ID
) X
) Y
group by GRP1, GRP2, C2
) Z
group by GRP3
order by 1
The idea is to compute a row number ordered by Table1.ID and subtract it from Table1.ID; whenever the difference changes, a new group starts. The same logic is used a second time, partitioned by whether the row exists in Table2, to handle the changes between "Utilized" and "Balance".
From those groups you can get the min and max values and the number of rows. One additional grouping, with min/max and CASE, formats the result into the "Utilized" and "Balance" columns.
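A minimal SQLite sketch of the same gaps-and-islands idea, run from Python with the series from the question. It is a simplified variant rather than the answer's exact query: a single key, ID minus a row number within the same C2 value, is enough to separate the runs, and the n-th utilized run is paired with the n-th balance run, which assumes each series has one utilized block at its start, as in the example. It needs SQLite 3.25+ for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Table2 (ID INTEGER PRIMARY KEY)")
series = [range(1001, 1051), range(1245, 1271), range(4571, 4591)]
used = [range(1001, 1006), range(1245, 1252), range(4571, 4574)]
conn.executemany("INSERT INTO Table1 VALUES (?)", [(n,) for r in series for n in r])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(n,) for r in used for n in r])

QUERY = """
WITH flagged AS (
    -- C2 = 1 for numbers also in Table2 ("Utilized"), 0 otherwise ("Balance")
    SELECT T1.ID, CASE WHEN T2.ID IS NULL THEN 0 ELSE 1 END AS C2
    FROM Table1 T1 LEFT JOIN Table2 T2 ON T1.ID = T2.ID
),
islands AS (
    -- ID minus a row number within the same C2 value is constant across a consecutive run
    SELECT C2, MIN(ID) AS MINID, MAX(ID) AS MAXID, COUNT(*) AS CNT
    FROM (SELECT ID, C2, ID - ROW_NUMBER() OVER (PARTITION BY C2 ORDER BY ID) AS GRP
          FROM flagged) g
    GROUP BY C2, GRP
),
numbered AS (
    -- pair the n-th utilized run with the n-th balance run
    SELECT *, ROW_NUMBER() OVER (PARTITION BY C2 ORDER BY MINID) AS GRP3
    FROM islands
)
SELECT MAX(CASE WHEN C2 = 1 THEN MINID END), MAX(CASE WHEN C2 = 1 THEN MAXID END),
       MAX(CASE WHEN C2 = 1 THEN CNT END),
       MAX(CASE WHEN C2 = 0 THEN MINID END), MAX(CASE WHEN C2 = 0 THEN MAXID END),
       MAX(CASE WHEN C2 = 0 THEN CNT END)
FROM numbered
GROUP BY GRP3
ORDER BY GRP3
"""

rows = conn.execute(QUERY).fetchall()
for u_lo, u_hi, u_n, b_lo, b_hi, b_n in rows:
    print(f"{u_lo} - {u_hi} = {u_n} | {b_lo} - {b_hi} = {b_n} |")
```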

Checking if the row has the max value in a group

I'm trying to find out whether a row has the max value in its group. Here's a really simple example:
Data
VoteCount LocationId UserId
3 1 1
4 1 2
3 2 2
4 2 1
Pseudo-query
select
LocationId,
sum(case
when UserId = 1 /* and has max vote count*/
then 1 else 0
end) as IsUser1Winner,
sum(case
when UserId = 2 /* and has max vote count*/
then 1 else 0
end) as IsUser2Winner
from LocationVote
group by LocationID
It should return:
LocationId IsUser1Winner IsUser2Winner
1 0 1
2 1 0
I also couldn't find a way to generate dynamic column names here. What would be the simplest way to write this query?

You could also do this using a CASE expression:
WITH CTE as
(SELECT
MAX(VoteCount) max_votes
, LocationId
FROM LocationVote
group by LocationId
)
SELECT
A.LocationId
, MAX(Case When UserId=1
THEN 1
ELSE 0
END) IsUser1Winner
, MAX(Case When UserId=2
THEN 1
ELSE 0
END) IsUser2Winner
from LocationVote A
inner join
CTE B
on A.VoteCount = B.max_votes
and A.LocationId = B.LocationId
group by A.LocationId

Try this:
select *
from LocationVote t
cross apply (
select max(ref.VoteCount) max_value
from LocationVote ref
where ref.LocationId = t.LocationId
) votes
where votes.max_value = t.VoteCount
But if your table is huge and has no appropriate indexes, performance may be poor.
Another way is to get the max values per group into a table variable or temp table and then join it to the original table.
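The temp-table alternative might look like this in SQLite from Python's `sqlite3`; `MaxVotes` and `MaxVoteCount` are illustrative names. The per-user columns are still hard-coded, since plain SQL has no dynamic column names, and a location with a tie would mark both users as winners.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LocationVote (VoteCount INTEGER, LocationId INTEGER, UserId INTEGER)")
conn.executemany("INSERT INTO LocationVote VALUES (?, ?, ?)",
                 [(3, 1, 1), (4, 1, 2), (3, 2, 2), (4, 2, 1)])

# Step 1: store the maximum vote count per location in a temporary table.
conn.execute("""
    CREATE TEMP TABLE MaxVotes AS
    SELECT LocationId, MAX(VoteCount) AS MaxVoteCount
    FROM LocationVote
    GROUP BY LocationId
""")

# Step 2: join back to keep only the winning rows, then pivot per user.
rows = conn.execute("""
    SELECT v.LocationId,
           MAX(CASE WHEN v.UserId = 1 THEN 1 ELSE 0 END) AS IsUser1Winner,
           MAX(CASE WHEN v.UserId = 2 THEN 1 ELSE 0 END) AS IsUser2Winner
    FROM LocationVote v
    JOIN MaxVotes m
      ON m.LocationId = v.LocationId AND m.MaxVoteCount = v.VoteCount
    GROUP BY v.LocationId
    ORDER BY v.LocationId
""").fetchall()
print(rows)  # [(1, 0, 1), (2, 1, 0)]
```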
