Select SQL logic - sql

Folks, I'm at a loss here!
First, this is what I am trying to achieve:
Select all the records from the CUSTOMER_ORDER_DETAILS table shown below, and if multiple entries exist for the same CUSTOMER_NO, then:
- select the entry with PAID = 1
- if there are multiple PAID = 1 entries, then select the record with TYPE = Y
Expected Result:
877, CU115, lit, 0, 1, X
878, CU111, Toi, 1, 1, Y
879, CU117, Fla, 1, 1, X
My approach was to find rows with count(CUSTOMER_NO) > 1 using GROUP BY on CUSTOMER_NO, but as soon as I add the remaining columns of the table to the SELECT statement, the count column shows a value of 1.
Any pointers on how to tackle this or implement if-else style logic?

This is a prioritization query. Here is one method to do what you want:
select t.*
from (select t.*,
row_number() over (partition by customer_no
order by paid desc, type desc
) as seqnum
from t
) t
where seqnum = 1;
This assumes that paid takes on the values 0 and 1, and that type has the values X and Y.
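As a quick check of the query above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The question's table was not preserved, so the rows and the reduced column set (id, customer_no, paid, type) are hypothetical, chosen only so that each rule is exercised.

```python
import sqlite3

# Hypothetical rows: the original table is not available, so these
# columns and values are assumptions made for illustration only.
rows = [
    (876, "CU115", 0, "X"),
    (877, "CU115", 1, "X"),  # only paid row for CU115
    (878, "CU111", 1, "Y"),  # two paid rows for CU111; this one has type Y
    (880, "CU111", 1, "X"),
    (879, "CU117", 1, "X"),  # single row for CU117
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, customer_no text, paid integer, type text)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# Same prioritization as the answer: paid = 1 first, then type Y before X.
query = """
select id, customer_no, paid, type
from (select t.*,
             row_number() over (partition by customer_no
                                order by paid desc, type desc) as seqnum
      from t) t
where seqnum = 1
order by id
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

Each customer_no contributes exactly one row, and the ordering inside row_number() decides which one survives.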

You can prioritize these conditions with a CASE expression in the ORDER BY of the row_number() function.
select * from (
select t.*,
row_number() over(partition by customer_no
order by case when paid=1 and type='Y' then 1
when paid=1 then 2
else 3 end) as rnum
from customer_order_details t
) t
where rnum=1;
This assumes there can only be one row with type='Y' per customer_no if there exist multiple rows with paid=1 for that same customer_no.
If there exist multiple rows with paid =1 and all of them have a type <> 'Y' then a row is arbitrarily picked amongst them.

Related

BigQuery Standard SQL - Cumulative Count of (almost) Duplicated Rows

With the following data:
id  field  eventTime
1   A      1
1   A      2
1   B      3
1   A      4
1   B      5
1   B      6
1   B      7
For visualisation purposes, I would like to turn it into the below. Consecutive occurrences of the same field value essentially get aggregated into one.
id  field  eventTime
1   Ax2    1
1   B      3
1   A      4
1   Bx3    5
I will then use STRING_AGG() to turn it into "Ax2 > B > A > Bx3".
I've tried using ROW_NUMBER() to count the repeated instances, with the plan being to utilise the highest row number to modify the string in field, but if I partition on eventTime, there are no consecutive "duplicates", and if I don't partition on it then all rows with the same field value are counted - not just consecutive ones.
I thought about bringing in the previous field with LAG() for a comparison to reset the row count, but that only works for transitions from one field value to the other and is a problem if the same field is repeated consecutively.
I've been struggling with this to the point where I'm considering writing a script that just CASE WHENs up to a reasonable number of consecutive hits, but I've seen it get as high as 17 on a given day and really don't want to be doing that!
My other alternative will just be to enforce a maximum number of field values to help control this, but now I've started this problem I'd quite like to solve it without that, if at all possible.
Thanks!
Consider below
select id,
any_value(field) || if(count(1) = 1, '', 'x' || count(1)) field,
min(eventTime) eventTime
from (
select id, field, eventTime,
countif(ifnull(flag, true)) over(partition by id order by eventTime) grp
from (
select id, field, eventTime,
field != lag(field) over(partition by id order by eventTime) flag
from `project.dataset.table`
)
)
group by id, grp
# order by eventTime
If applied to the sample data in your question, the output matches the desired result shown there.
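The flag-and-running-count idea above can be tried outside BigQuery. The sketch below is an adaptation for SQLite, not the original query: SQLite has no countif() or any_value(), so it uses sum() over a 0/1 flag and max(field) instead, and the table name events is an assumption. The final string is assembled in Python rather than with STRING_AGG().

```python
import sqlite3

# Sample data copied from the question.
events = [
    (1, "A", 1), (1, "A", 2), (1, "B", 3), (1, "A", 4),
    (1, "B", 5), (1, "B", 6), (1, "B", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer, field text, eventTime integer)")
conn.executemany("insert into events values (?, ?, ?)", events)

query = """
select id,
       max(field) || case when count(*) = 1 then '' else 'x' || count(*) end as label,
       min(eventTime) as first_time
from (
  -- running total of the change flags numbers each consecutive run
  select id, field, eventTime,
         sum(flag) over (partition by id order by eventTime) as grp
  from (
    -- flag = 1 when the value differs from the previous row (or has no previous row)
    select id, field, eventTime,
           case when field = lag(field) over (partition by id order by eventTime)
                then 0 else 1 end as flag
    from events
  )
)
group by id, grp
order by id, first_time
"""
runs = conn.execute(query).fetchall()
path = " > ".join(label for _id, label, _first in runs)
print(runs)
print(path)
```

Because every row in a run shares the same grp value, grouping by (id, grp) collapses each run into one row with its count.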
Just use lag() to detect when the value of field changes. You can now do that with qualify:
select t.*
from t
where 1=1
qualify lag(field, 1, '') over (partition by id order by eventtime) <> field;
For your final step, you can use a subquery:
select id, string_agg(field, '->' order by eventtime)
from (select t.*
from t
where 1=1
qualify lag(field, 1, '') over (partition by id order by eventtime) <> field
) t
group by id;

Need to identify multiple records from the existing SELECT query output and remove all duplicates except one with value <> 0 in column K

There is a SELECT statement whose output can contain duplicates. Among those duplicates, only one row can have a value in column K; the others will have 0. I need to remove the duplicates that have 0, keeping only the one with a value.
If you want to filter out 0 values, then you can use:
select t.*
from t
where value <> 0;
If you want to keep exactly one row per k, with preference to non-zero values, you can use:
select t.*
from (select t.*,
row_number() over (partition by k order by value desc) as seqnum
from t
) t
where seqnum = 1;
Note: This assumes that value is never negative. If it can be, then use order by abs(value) desc.
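We can use ROW_NUMBER() to rank the rows within each group of duplicates so that the row with a non-zero ColumnK gets rank 1, and keep only that row. This assumes there are no negative numbers in ColumnK and that ColumnA and ColumnB identify a group of duplicates (adjust to your real key columns).
SELECT ColumnA, ColumnB,ColumnC.....
FROM
(SELECT *, row_number() over(partition by ColumnA, ColumnB ORDER BY ColumnK desc) as rnk
from tablename
) as t
WHERE rnk = 1 -- the non-zero row, if any, has rank = 1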

How to select distinct records with preference depending on a value

I have this table:
Table 1
Notice that some ids have duplicate records, one of them with IsTop = 1.
In that scenario I'm interested in selecting the record that has IsTop = 1; if there is none, I'm interested in keeping the one with IsTop = 0.
The goal is to have distinct ids, taking the IsTop = 1 record if it exists.
How do I do that?
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by isTop desc) as seqnum
from t
) t
where seqnum = 1;

Select the unique records from duplicates record using group by

I have a table with duplicate records, for example multiple records with the same account number, like this:
Now I want to select only the record ids that satisfy the conditions below, in priority order:
- Select the record for which prim_cust is X.
- If prim_cust is null, then select the record whose dept_id is not null.
- If both are null, then select the record with min(id).
Here we have to group by account number and apply the conditions above.
I just want a single record per unique account number that satisfies these conditions.
The conditions must be applied in priority order.
I think you have a prioritization query, where you want one row per acct_nbr subject to your various rules.
For this type of problem, row_number() is quite handy:
select t.*
from (select t.*,
row_number() over (partition by acct_nbr
order by (case when prim_cust = 'X' then 1 else 2 end),
(case when dept_id is not null then 1 else 2 end),
id
) as seqnum
from t
) t
where seqnum = 1;
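To see how the three ORDER BY keys interact with NULLs, here is a small sketch that runs the same query in SQLite from Python. The sample rows are hypothetical (the question's table was not preserved); each account exercises one of the three rules.

```python
import sqlite3

# Hypothetical rows (id, acct_nbr, prim_cust, dept_id), invented for illustration.
rows = [
    (1, "A1", None, None),
    (2, "A1", "X", None),   # rule 1: prim_cust = 'X' wins
    (3, "A2", None, None),
    (4, "A2", None, 10),    # rule 2: dept_id is not null wins
    (6, "A3", None, None),
    (5, "A3", None, None),  # rule 3: smallest id wins
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, acct_nbr text, prim_cust text, dept_id integer)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

query = """
select id, acct_nbr
from (select t.*,
             row_number() over (partition by acct_nbr
                                order by (case when prim_cust = 'X' then 1 else 2 end),
                                         (case when dept_id is not null then 1 else 2 end),
                                         id) as seqnum
      from t) t
where seqnum = 1
order by acct_nbr
"""
chosen = conn.execute(query).fetchall()
print(chosen)
```

A NULL prim_cust makes the comparison prim_cust = 'X' unknown, so the CASE falls through to 2 and the row sorts after any 'X' row, which is what the priority rules require.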

SQL Ranking N records by one criteria and N records by another and repeat

In my table I have 4 columns: Id, Type, InitialRanking & FinalRanking. Based on certain criteria I’ve managed to apply InitialRanking to the records (1-20). I now need to apply FinalRanking by identifying the top 7 of Type 1 followed by the top 3 of Type 2. Then I need to repeat the above until all records have a FinalRanking. My goal would be to achieve the output in the final column of the attached image.
The 7 & 3 will vary over time but for the purposes of this example let’s say they are fixed.
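You can try something like this (TOP is SQL Server syntax; your_table stands in for the real table name):
SELECT ID, TYPE, INITIALRANK, FINALRANK
FROM (SELECT TOP 7 ID, TYPE, INITIALRANK, FINALRANK -- first 7 of type 1
FROM your_table
WHERE TYPE = 1
ORDER BY INITIALRANK) t1
UNION ALL
SELECT ID, TYPE, INITIALRANK, FINALRANK
FROM (SELECT TOP 3 ID, TYPE, INITIALRANK, FINALRANK -- first 3 of type 2
FROM your_table
WHERE TYPE = 2
ORDER BY INITIALRANK) t2
UNION ALL
SELECT ID, TYPE, INITIALRANK, FINALRANK -- any other types, unchanged
FROM your_table
WHERE TYPE NOT IN (1,2);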
A simple (or simplistic) approach to your Final Rank would be the following:
row_number() over (partition by type order by initrank) +
case type
when 1 then (ceil((row_number() over (partition by type order by initrank))/7)-1)*(10-7)
when 2 then (ceil((row_number() over (partition by type order by initrank))/3)-1)*(10-3)+7
end FinalRank
This can be generalized to more than 2 groups. For example, with three groups of size 7, 3 and 2, the pattern size is 7+3+2=12. The general form is PartitionedRowNum+(Ceil(PartitionedRowNum/GroupSize)-1)*(PatternSize-GroupSize)+Offset, where the offset is the sum of the preceding group sizes:
row_number() over (partition by type order by initrank) +
case type
when 1 then (ceil((row_number() over (partition by type order by initrank))/7)-1)*(12-7)
when 2 then (ceil((row_number() over (partition by type order by initrank))/3)-1)*(12-3)+7
when 3 then (ceil((row_number() over (partition by type order by initrank))/2)-1)*(12-2)+7+3
end FinalRank
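The general form can be sanity-checked with plain arithmetic. The sketch below is not part of the SQL answer; final_rank is a hypothetical helper that evaluates the same expression for the 7-and-3 pattern from the question, with 14 rows of type 1 and 6 rows of type 2.

```python
import math

def final_rank(row_num, group_size, pattern_size, offset):
    """Evaluate PartitionedRowNum + (ceil(PartitionedRowNum / GroupSize) - 1)
    * (PatternSize - GroupSize) + Offset for one row."""
    full_blocks_before = math.ceil(row_num / group_size) - 1
    return row_num + full_blocks_before * (pattern_size - group_size) + offset

# Pattern of 7 type-1 rows followed by 3 type-2 rows, so the pattern size is 10.
type1 = [final_rank(n, 7, 10, 0) for n in range(1, 15)]  # 14 rows of type 1
type2 = [final_rank(n, 3, 10, 7) for n in range(1, 7)]   # 6 rows of type 2

print(type1)
print(type2)
```

Type 1 rows land on positions 1 to 7 and 11 to 17, type 2 rows on 8 to 10 and 18 to 20, so together they fill ranks 1 to 20 with no gaps or collisions.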