Selecting rows based on priority of a column - sql

Below is my source table:
Entry ID  Name   Colour
--------  -----  ------
1         John   Red
2         John   null
3         Steve  null
4         Steve  null
I would like to select the entire row based on:
Distinct value of the "name" column
If the "colour" column is not null, I will select that row. Otherwise, I will accept the null value row.
Lastly, I will select based on whichever entry ID is smaller (the smaller the ID, the earlier the entry)
The expected result should be:
Entry ID  Name   Colour
--------  -----  ------
1         John   Red
3         Steve  null
I am new to SQL and wondering whether there's any way to achieve the expected result.
Many thanks and much appreciated.
(pardon my bad English/grammar)

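You can use row_number(). This should work in almost any database:
select t.*
from (select t.*,
             row_number() over (partition by name
                                order by (case when colour is not null then 0 else 1 end), id
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
The case expression sorts non-NULL rows ahead of NULL rows regardless of how the database orders NULLs. A plain "order by colour desc" puts NULL values last in some databases (such as SQL Server and MySQL) but first in others (such as PostgreSQL and Oracle). Within each name, the smallest id then wins.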
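To try the row_number() approach on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later), and it sorts on an explicit CASE expression so the result does not depend on where the database places NULLs. The table name t and the column name id (for "Entry ID") follow the answer above; the alias ranked is made up for this sketch.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, name text, colour text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, "John", "Red"), (2, "John", None), (3, "Steve", None), (4, "Steve", None)],
)

# Rank rows within each name: non-NULL colour first, then smallest id.
query = """
select id, name, colour
from (select t.*,
             row_number() over (
                 partition by name
                 order by (case when colour is null then 1 else 0 end), id
             ) as seqnum
      from t) ranked
where seqnum = 1
order by id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the four sample rows, this keeps entry 1 for John and entry 3 for Steve.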
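An alternative method is:
select t.*
from t
where t.colour is not null or
      t.id = (select (case when count(t2.colour) = 0 then min(t2.id) end)
              from t t2
              where t2.name = t.name
             );
Note that this returns every row with a non-NULL colour, so it matches the row_number() version only when each name has at most one such row.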

Related

Select specific data from data only if certain fields can be grouped by

I have the following data:
ID Date Num ClientID Dest
--------------------------------------------------------
123 04/29/2021 -2222 H1234 -1
123 04/29/2021 1 H1234 3
345 04/29/2021 -2222 H3456 -1
345 04/29/2021 1 H8888 .1
BTW: this does not include all the fields, just what I'm currently using for my query.
For every ID in the above table I'll always have 2 records. There are 2 scenarios that can take place:
As for ID = 123, the ClientID is the same
As for ID = 345, the ClientID is different
I'm trying to return the following data:
ID Date Num ClientID Dest
123 04/29/2021 1 H1234 3
The reason I'm returning only this row is because:
I only want 1 row per ID, where the ClientID is the same for both rows
Only need the record that does not have -2222, where the ClientID is the same for both rows
If the ClientID is different for the same ID (ex: 345), then completely skip these records.
The numbers for DEST can vary, so we can't rely on one being -1 and the other being positive. However, the NUM field will always be -2222 for one row and 1 for the second row (which is the row that I want returned).
I'm not sure how best to do this. I thought about creating a CTE, counting the ClientID, and selecting the data if Count = 2. The problem is the DEST field: I know that I can do Max(NUM), but since DEST can vary I wouldn't know how to select it.
Here is what I tried:
WITH Ranking AS (
    SELECT Rank() OVER (PARTITION BY c.ID, c.date1, c.ClientID ORDER BY num ASC) x, c.*
    FROM cte c
)
SELECT * FROM Ranking WHERE x = 2
I'm not sure if this is a good approach. I guess it does the job, but any thoughts?
One solution is to use window functions: row_number() picks which row of each pair to return, and lead() checks whether both ClientIDs are the same. This works even if the specific values change, and it reads the table only once instead of twice:
select id, date, num, clientid, Dest
from (select *,
             Row_Number() over (partition by id order by num desc) rn,
             case when Lead(clientid) over (partition by id order by num desc) = clientid then 1 else 0 end same
      from t
     ) t
where rn = 1 and same = 1
Ordering by num descending gives rn = 1 to the row that is not -2222, and lead() then reads the ClientID of the -2222 row.
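The row_number() plus lead() idea can be checked against the question's four rows. Below is a minimal sketch with Python's sqlite3 module, assuming a SQLite build with window functions (3.25 or later). It orders each pair by num descending so the row with num = 1 is ranked first; the aliases ranked and next_clientid are made up for this sketch, and the ".1" Dest value is loaded as 0.1.

```python
import sqlite3

# Sample rows from the question; columns are left untyped so values are stored as given.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id, date, num, clientid, dest)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        (123, "04/29/2021", -2222, "H1234", -1),
        (123, "04/29/2021", 1, "H1234", 3),
        (345, "04/29/2021", -2222, "H3456", -1),
        (345, "04/29/2021", 1, "H8888", 0.1),
    ],
)

# rn = 1 is the row whose num is not -2222; next_clientid is the other row's ClientID.
query = """
select id, date, num, clientid, dest
from (select t.*,
             row_number() over (partition by id order by num desc) as rn,
             lead(clientid) over (partition by id order by num desc) as next_clientid
      from t) ranked
where rn = 1 and next_clientid = clientid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the ID 123 row with Num = 1 comes back; both ID 345 rows are skipped because their ClientIDs differ.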
Just select all rows where Num isn't -2222 and the ID is in a subquery grouped by id and having only one distinct client id.
SELECT *
FROM tbl
WHERE ID IN (SELECT ID FROM tbl GROUP BY ID HAVING COUNT(DISTINCT ClientId) = 1)
AND Num != -2222

Updating Data having same id but different Data in Row

I have records with the same ID but different data in each row.
When updating, the final result should be the last record present for that ID in the data.
Example:
ID | Name | PermanentAddress | CurrentLocation
1 | R1 | INDIA | USA
1 | R1 | INDIA | UK
For ID 1, the record that should be loaded into the database is:
1|R1|INDIA|UK
How can this be done in SQL Server for multiple records?
Please understand that SQL Server does not store or fetch data in order of insertion, so to find the latest/last record you need some column to order the records by.
This is typically a timestamp column like last_modified_date. Your current table is a prime candidate for a slowly changing dimension type 2, and you should consider implementing it.
See the explanation on the Kimball Group's site.
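As an illustration of why an ordering column matters, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). The last_modified_date column, its timestamp values, and the extra row for ID 2 are invented for this example; the question's table has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table yourtable (
           id integer, name text, permanent_address text,
           current_location text,
           last_modified_date text  -- hypothetical ordering column
       )"""
)
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?, ?)",
    [
        (1, "R1", "INDIA", "USA", "2021-01-01 10:00:00"),
        (1, "R1", "INDIA", "UK", "2021-02-01 10:00:00"),
        (2, "R2", "INDIA", "INDIA", "2021-01-15 09:00:00"),
    ],
)

# Newest row per id: number rows from latest to earliest and keep the first.
latest = conn.execute(
    """
    select id, name, permanent_address, current_location
    from (select y.*,
                 row_number() over (partition by id
                                    order by last_modified_date desc) as r
          from yourtable y) ranked
    where r = 1
    order by id
    """
).fetchall()
print(latest)
```

Because the timestamps are stored as ISO-formatted text, plain string ordering matches chronological ordering here.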
If the order really does not matter to you and you just need one row for each ID, you can try the query below.
select
ID,
Name,
PermanentAddress,
CurrentLocation
from
(select
*,
row_number() over(partition by id order by (select null)) r
from yourtable)t
where r=1
You can identify the latest row for each ID by:
SELECT ID, NAME, PERMANENTADDRS, CURRENTLOCATION
FROM
(SELECT ID, NAME, PERMANENTADDRS, CURRENTLOCATION,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS RNUM,
        COUNT(*) OVER (PARTITION BY ID) AS CNT
 FROM YOUR_TABLE) A
WHERE RNUM = CNT;
This takes the row with the highest row number for each ID value. Because the ORDER BY here is arbitrary, that row is not guaranteed to be the last one inserted; if a column defines which record is latest, use it in the ORDER BY instead of (SELECT NULL).

displaying distinct column value with corresponding column random 3 values

I have a table employee with columns (state_cd, emp_id, emp_name, ai_cd). How can I display each distinct state_cd with 3 different values from ai_cd?
The answer should be:
state_cd ai_cd
-------- -----
TX       1
         2
         5
CA       9
         10
         11
This type of operation is normally better done in the application. But, you can do it in the query, if you really want to:
select (case when row_number() over (partition by state_cd order by ai_cd) = 1
then state_cd
end) as state_cd,
ai_cd
from employee e
order by e.state_cd, e.ai_cd;
The order by is very important, because SQL result sets are unordered. Your result requires ordering in order to make sense.
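The query above lists every ai_cd per state. If only three values per state are wanted, one option is to number the distinct values and keep the first three. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed); the sample rows, the rn <= 3 cutoff, the choice of the three smallest values, and the aliases st, d, and ranked are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee (state_cd text, emp_id integer, emp_name text, ai_cd integer)"
)
# Made-up sample rows: TX has four ai_cd values, CA repeats ai_cd 9.
conn.executemany(
    "insert into employee values (?, ?, ?, ?)",
    [
        ("TX", 1, "A", 1), ("TX", 2, "B", 2), ("TX", 3, "C", 5), ("TX", 4, "D", 7),
        ("CA", 5, "E", 9), ("CA", 6, "F", 10), ("CA", 7, "G", 11), ("CA", 8, "H", 9),
    ],
)

# Number the distinct ai_cd values per state, keep three, and show the
# state code only on the first row of each group.
query = """
select (case when rn = 1 then st end) as state_cd, ai_cd
from (select state_cd as st, ai_cd,
             row_number() over (partition by state_cd order by ai_cd) as rn
      from (select distinct state_cd, ai_cd from employee) d
     ) ranked
where rn <= 3
order by st, ai_cd
"""
rows = conn.execute(query).fetchall()
for state_cd, ai_cd in rows:
    print(state_cd or "", ai_cd)
```

The outer order by sorts on the real state code (st) rather than the blanked display column, so rows of the same state stay together.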
Just group by state_cd and then count using count(distinct column_name). This returns the states that have exactly 3 distinct ai_cd values:
select state_cd from (select state_cd, count(distinct ai_cd) as cnt from employee group by state_cd) s where cnt = 3

SQL Separating Distinct Values using single column

Does anyone happen to know a way of basically taking the 'Distinct' command but only using it on a single column? For lack of a better example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still use the other columns' information. I would use distinct on the full query, but the rows are all different because of the data in certain columns. And I would need to output only the topmost term between the two duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.ID, t.Name, min(t.Term) as Term
from Table t
group by t.ID, t.Name
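You can use row_number() for this:
Select ID, Name, Term
from (Select ID, Name, Term,
             ROW_NUMBER() OVER (PARTITION BY ID order by Term) as rn
      from Table
     ) as tbl
Where rn = 1
The filter on rn has to sit outside the subquery, because rn does not exist yet in the inner WHERE. The order by determines which row is picked first for each ID; ordering by Term keeps the topmost term.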
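Here is a minimal sketch of the row_number() approach run against the question's rows with Python's sqlite3 module (SQLite 3.25 or later assumed). The table is named terms in this sketch because TABLE is a reserved word; that name is made up. Rows within each ID are ordered by Term so the topmost term is the one kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table terms (id integer, name text, term text)")
conn.executemany(
    "insert into terms values (?, ?, ?)",
    [
        (1, "Suzy", "A"), (1, "Suzy", "B"),
        (2, "John", "A"), (2, "John", "B"),
        (3, "Pete", "A"), (4, "Carl", "A"), (5, "Sally", "B"),
    ],
)

# Number the rows of each id by term and keep only the first one.
query = """
select id, name, term
from (select id, name, term,
             row_number() over (partition by id order by term) as rn
      from terms) tbl
where rn = 1
order by id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each ID appears once in the result, with term A kept for Suzy and John.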

SQL Query to obtain the maximum value for each unique value in another column

ID Sum Name
a 10 Joe
a 8 Mary
b 21 Kate
b 110 Casey
b 67 Pierce
What would you recommend as the best way to
obtain for each ID the name that corresponds to the largest sum (grouping by ID).
What I tried so far:
select ID, SUM(Sum) s, Name
from Table1
group by ID, Name
Order by SUM(Sum) DESC;
This will arrange the records into groups that have the highest sum first. Then I have to somehow flag those records and keep only those. Any tips or pointers? Thanks a lot.
In the end I'd like to obtain:
a 10 Joe
b 110 Casey
You want the row_number() function:
select id, [sum], name
from (select t.*,
             row_number() over (partition by id order by [sum] desc) as seqnum
      from table1 t
     ) t
where seqnum = 1;
Your question is more confusing than it needs to be because you have a column called sum. You should avoid using SQL reserved words for identifiers.
The row_number() function assigns a sequential number to a group of rows, starting with 1. The group is defined by the partition by clause. In this case, all rows with the same id are in the same group. The ordering of the numbers is determined by the order by clause, so the one with the largest value of sum gets the value of 1.
If you might have duplicate maximum values and you want all of them, use the related function rank() or dense_rank().
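The difference between row_number() and rank() only shows up when there are ties. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed) with the question's rows plus one invented row, ("a", 10, "Ann"), that ties with Joe. The helper top_rows and the alias ranked are hypothetical names for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sum" is quoted because it is also the name of an aggregate function.
conn.execute('create table table1 (id text, "sum" integer, name text)')
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        ("a", 10, "Joe"), ("a", 8, "Mary"), ("a", 10, "Ann"),  # Ann is a made-up tie
        ("b", 21, "Kate"), ("b", 110, "Casey"), ("b", 67, "Pierce"),
    ],
)

def top_rows(window_function):
    """Return the rows numbered 1 by the given ranking function within each id."""
    # The function name is interpolated into the SQL, so pass only trusted names.
    query = f"""
    select id, "sum", name
    from (select t.*,
                 {window_function}() over (partition by id order by "sum" desc) as seqnum
          from table1 t) ranked
    where seqnum = 1
    order by id, name
    """
    return conn.execute(query).fetchall()

one_per_id = top_rows("row_number")  # exactly one row per id; tie winner is arbitrary
with_ties = top_rows("rank")         # every row that shares the maximum
print(one_per_id)
print(with_ties)
```

With rank(), both Joe and Ann are returned for id a; with row_number(), only one of them is, and which one is not defined unless the order by breaks the tie.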
select *
from
(
    select *
    ,rn = row_number() over (partition by Id order by [sum] desc)
    from table1
)x
where x.rn=1