Group by in T-SQL for selecting different columns

I have the following table, ContactDetails. It contains both cell phone numbers and email addresses. These rows change as users update their contact details, so a user (here Userid 1) can have multiple rows for each of Email and Cell, as below. There can be multiple users (2, 3, 4, and so on).
The rows are as follows:
SrNo Userid ContactType ContactDetail LoadDate
1 1 Email x1.y@gmail.com 2013-01-01
2 1 Cell 12345678 2013-01-01
3 1 Email x2.y@gmail.com 2012-01-01
4 1 Cell 98765432 2012-01-01
5 1 Email x2.y@gmail.com 2011-01-01
6 1 Cell 987654321 2011-01-01
I am looking for the most recent Email and Cell details for each user. I tried running the query below:
Select
Userid,
Max(ContactDetail),
MAX(LoadDate)
from
ContactDetails
group by
Userid, ContactType;
But I understand that this won't work.
Can anyone suggest how to pull the latest email and cell, either in a single query or with sub-queries?
Cheers!
Junni

You can use ROW_NUMBER() to select the most recent row of interest:
;With Ordered as (
select UserId,ContactType,ContactDetail,LoadDate,
ROW_NUMBER() OVER (
PARTITION BY UserID,ContactType
ORDER BY LoadDate DESC) as rn
from ContactDetails
)
select * from Ordered where rn = 1
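As a rough check of the ROW_NUMBER() approach above, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database (3.25 or later) as a stand-in for SQL Server; the window-function syntax used here is the same in both, but the Python wrapper and the final ORDER BY are additions for the demonstration only.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ContactDetails ("
    "SrNo INTEGER, Userid INTEGER, ContactType TEXT, "
    "ContactDetail TEXT, LoadDate TEXT)"
)
conn.executemany(
    "INSERT INTO ContactDetails VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Email", "x1.y@gmail.com", "2013-01-01"),
        (2, 1, "Cell", "12345678", "2013-01-01"),
        (3, 1, "Email", "x2.y@gmail.com", "2012-01-01"),
        (4, 1, "Cell", "98765432", "2012-01-01"),
        (5, 1, "Email", "x2.y@gmail.com", "2011-01-01"),
        (6, 1, "Cell", "987654321", "2011-01-01"),
    ],
)

# Number the rows of each (Userid, ContactType) group from newest to oldest,
# then keep only the newest row of each group.
latest = conn.execute(
    """
    WITH Ordered AS (
        SELECT Userid, ContactType, ContactDetail, LoadDate,
               ROW_NUMBER() OVER (
                   PARTITION BY Userid, ContactType
                   ORDER BY LoadDate DESC) AS rn
        FROM ContactDetails
    )
    SELECT Userid, ContactType, ContactDetail, LoadDate
    FROM Ordered
    WHERE rn = 1
    ORDER BY Userid, ContactType
    """
).fetchall()

for row in latest:
    print(row)
```

Each user ends up with one row per contact type. If two rows in a group share the same LoadDate, ROW_NUMBER() picks one of them arbitrarily unless a tie-breaking column is added to the ORDER BY.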

Related

How to get a set of records from within each partition based on a condition

From a table like this:
id status date category
1 PENDING 2022-07-01 XYZ
2 DONE 2022-07-04 XYZ
3 PENDING 2022-07-03 DEF
4 DONE 2022-07-08 DEF
I would like to get the most recent records within each category (here 2 and 4). But there are at least two factors that complicate things.
First, there might be more than two records in the same category. (The records come in pairs.)
id status date category
1 PENDING 2022-07-01 XYZ
2 PENDING 2022-07-02 XYZ
3 FAILED 2022-07-04 XYZ
4 FAILED 2022-07-05 XYZ
5 PENDING 2022-07-03 DEF
6 DONE 2022-07-08 DEF
In this case, I'd have to get 3, 4, and 6. Were there six records in the XYZ category, I'd have to get the most recent three.
And, secondly, the date could be the same for the most recent records within a category.
I tried something like this:
WITH temp AS (
SELECT *,
dense_rank() OVER (PARTITION BY category ORDER BY date DESC) rnk
FROM tbl
)
SELECT *
FROM temp
WHERE rnk = 1;
But this fails when there are more than 2 records in a category and I need to get the most recent two.
EDIT:
Eli Johnson has pointed out in a comment that there should be information about which messages are pairs. Of course! I dug around a bit, and after a join or two there is.
id status date category prev_id
1 PENDING 2022-07-01 XYZ {}
2 PENDING 2022-07-02 XYZ {}
3 FAILED 2022-07-04 XYZ {1}
4 FAILED 2022-07-05 XYZ {2}
5 PENDING 2022-07-03 DEF {}
6 DONE 2022-07-08 DEF {5}
The requirements here are more hard-coded than a proper design would be.
I tweaked the query proposed in the question slightly to get the latest records,
assuming that records always come in pairs, as mentioned in the question.
WITH temp AS (
SELECT *,
row_number() OVER (PARTITION BY category ORDER BY date DESC) rnk,
count(1) over (partition by category) cnt
FROM tbl
)
SELECT *
FROM temp
WHERE rnk*2 <= cnt;
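To see how the "most recent half" filter behaves, here is a small sketch run against the six-row example from the question. It assumes an in-memory SQLite database as a stand-in, uses the question's table and column names (tbl, date), and adds id DESC as a tie-breaker in the ORDER BY; that tie-breaker is an assumption, meant to keep the result deterministic when dates repeat within a category.

```python
import sqlite3

# In-memory SQLite used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, status TEXT, date TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "PENDING", "2022-07-01", "XYZ"),
        (2, "PENDING", "2022-07-02", "XYZ"),
        (3, "FAILED", "2022-07-04", "XYZ"),
        (4, "FAILED", "2022-07-05", "XYZ"),
        (5, "PENDING", "2022-07-03", "DEF"),
        (6, "DONE", "2022-07-08", "DEF"),
    ],
)

# rnk counts rows from newest to oldest within a category and cnt is the
# category size, so rnk * 2 <= cnt keeps the newest half of each category.
rows = conn.execute(
    """
    WITH temp AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY category
                   ORDER BY date DESC, id DESC) AS rnk,
               COUNT(*) OVER (PARTITION BY category) AS cnt
        FROM tbl
    )
    SELECT id
    FROM temp
    WHERE rnk * 2 <= cnt
    ORDER BY id
    """
).fetchall()

recent_ids = [r[0] for r in rows]
print(recent_ids)
```

This relies entirely on the assumption that every category holds complete pairs. With the prev_id column from the edit, joining each record to its predecessor would be a more robust way to find the second half of each pair.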

ORDER BY date but also GROUP BY userid

I have a table of records that I want to sort by earliest date first, then by userid.
If the user associated with that date also has other records in the table, I want to group those under the earliest date.
Desired output
Id UserId Date
1 2 1/1/2020
2 2 2/1/2020
3 2 3/1/2020
4 1 1/2/2020
5 1 2/2/2020
6 3 1/4/2020
7 4 1/5/2020
In this example, UserId 2 has the earliest record in the table, so that record should come first, followed by that user's other records in ascending date order.
You seem to want:
select t.*
from table t
order by min(date) over (partition by userid), date;
Some database products don't support window functions in the ORDER BY clause, so you can do this instead:
select t.*, min(date) over (partition by userid) as mndate
from table t
order by mndate, date;
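The second form above can be tried out with a short sketch. It assumes an in-memory SQLite database, a hypothetical table name (records, since table is a reserved word), and that the dates in the desired output are M/D/YYYY, stored here as ISO strings so they sort correctly. The extra userid key in the ORDER BY is also an assumption; it keeps two users with the same earliest date from interleaving.

```python
import sqlite3

# In-memory SQLite used purely for illustration; "records" is a made-up name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, userid INTEGER, date TEXT)")
# Rows are inserted out of order on purpose so the sort has work to do.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (7, 4, "2020-01-05"),
        (5, 1, "2020-02-02"),
        (3, 2, "2020-03-01"),
        (1, 2, "2020-01-01"),
        (6, 3, "2020-01-04"),
        (4, 1, "2020-01-02"),
        (2, 2, "2020-02-01"),
    ],
)

# mndate is each user's earliest date; sorting by it first keeps a user's
# rows together, and the trailing date key orders the rows within the user.
rows = conn.execute(
    """
    SELECT t.*, MIN(date) OVER (PARTITION BY userid) AS mndate
    FROM records t
    ORDER BY mndate, userid, date
    """
).fetchall()

ordered_ids = [r[0] for r in rows]
print(ordered_ids)
```

The ids come back in the order shown in the desired output, with user 2's three rows first.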
If I understand what you want...
You could do this (sample with DB2 syntax):
SELECT tab.UserId, tab.Date, tab.*
FROM DB2SIS.TABLE_NAME tab
ORDER BY tab.Date ASC, tab.UserId ASC
This way UserId and Date will appear repeatedly. Instead of 'tab.*' use each field you want to show, then UserId and Date will not repeat.

How to select rows where values changed for an ID

I have a table that looks like the following
id effective_date number_of_int_customers
123 10/01/19 0
123 02/01/20 3
456 10/01/19 6
456 02/01/20 6
789 10/01/19 5
789 02/01/20 4
999 10/01/19 0
999 02/01/20 1
I want to write a query that looks at each ID to see if the salespeople have newly started working internationally between October 1st and February 1st.
The result I am looking for is the following:
id effective_date number_of_int_customers
123 02/01/20 3
999 02/01/20 1
The result would return only the salespeople who originally had 0 international customers and now have at least 1.
I have seen similar posts here that use nested queries to pull records where the first date and last have different values. But I only want to pull records where the original value was 0. Is there a way to do this in one query in SQL?
In your case, a simple aggregation would do -- assuming that 0 is the earliest value:
select id, max(number_of_int_customers)
from t
where effective_date in ('2019-10-01', '2020-02-01')
group by id
having min(number_of_int_customers) = 0;
Obviously, this is not correct if the values can decrease to zero. But this having clause fixes that problem:
having min(case when number_of_int_customers = 0 then effective_date end) = min(effective_date)
An alternative is to use window functions, such as first_value():
select distinct id, last_noic
from (select t.*,
first_value(number_of_int_customers) over (partition by id order by effective_date) as first_noic,
first_value(number_of_int_customers) over (partition by id order by effective_date desc) as last_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where first_noic = 0;
Hmmm, on second thought, I like lag() better:
select id, number_of_int_customers
from (select t.*,
lag(number_of_int_customers) over (partition by id order by effective_date) as prev_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where prev_noic = 0;
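Here is a minimal sketch of the lag() version run against the sample data, assuming an in-memory SQLite database, a table named t as in the answer, and dates stored as ISO strings. The extra condition number_of_int_customers > 0 is not in the answer above; it is added here as an assumption so that an id going from 0 to 0 would not be reported.

```python
import sqlite3

# In-memory SQLite used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "id INTEGER, effective_date TEXT, number_of_int_customers INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (123, "2019-10-01", 0), (123, "2020-02-01", 3),
        (456, "2019-10-01", 6), (456, "2020-02-01", 6),
        (789, "2019-10-01", 5), (789, "2020-02-01", 4),
        (999, "2019-10-01", 0), (999, "2020-02-01", 1),
    ],
)

# LAG() exposes the previous row's value for the same id, so a row with
# prev_noic = 0 and a positive current value marks a salesperson who has
# just started working internationally.
started = conn.execute(
    """
    SELECT id, effective_date, number_of_int_customers
    FROM (SELECT t.*,
                 LAG(number_of_int_customers) OVER (
                     PARTITION BY id ORDER BY effective_date) AS prev_noic
          FROM t
          WHERE effective_date IN ('2019-10-01', '2020-02-01')
         ) t
    WHERE prev_noic = 0
      AND number_of_int_customers > 0
    ORDER BY id
    """
).fetchall()

for row in started:
    print(row)
```

The first row of each id has a NULL prev_noic, so it never passes the filter; only the February rows for ids 123 and 999 remain.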

RANK records partitioned by a column in series (Vertica SQL)

I'm trying to use the Vertica rank analytic function to create a rank column partitioned by a column, but only include records that are in a series. For example, the query below produces the output shown beneath it:
select when_created, status
from tablea
when_created Status
1/1/2015 ACTIVE
3/1/2015 ACTIVE
4/1/2015 INACTIVE
4/6/2015 INACTIVE
6/7/2015 ACTIVE
10/9/2015 INACTIVE
I could modify my query to include a rank column, which would produce the following output:
select
when_created, status, rank() OVER (PARTITION BY status order by when_created) as rnk
from tablea
when_created Status rnk
1/1/2015 ACTIVE 1
3/1/2015 ACTIVE 2
4/1/2015 INACTIVE 1
4/6/2015 INACTIVE 2
6/7/2015 ACTIVE 3
10/9/2015 INACTIVE 3
However, my goal is to restart the rank whenever a series is broken, so the desired output is:
when_created Status rnk
1/1/2015 ACTIVE 1
3/1/2015 ACTIVE 2
4/1/2015 INACTIVE 1
4/6/2015 INACTIVE 2
6/7/2015 ACTIVE 1
10/9/2015 INACTIVE 1
Is there a way to accomplish this using the RANK function or is there another way to do it in vertica sql?
Thanks,
Ben
This is a gaps-and-islands problem, where the tricky part is to identify the groups to use for a row_number() calculation. One solution uses a difference of row numbers to identify the different groups:
select a.*,
row_number() over (partition by status, seqnum - seqnum_s order by when_created) as rnk
from (select a.*,
row_number() over (order by when_created) as seqnum,
row_number() over (partition by status order by when_created) as seqnum_s
from tablea a
) a;
The logic behind this is tricky when you first see it. I advise you to run the subquery and understand the two row_number() calculations -- and to observe that the difference is constant for the groups you are interested in.
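To make the difference-of-row-numbers idea concrete, here is a small sketch that prints both row numbers, their difference, and the resulting rank for the sample data. It assumes an in-memory SQLite database rather than Vertica, and that the dates in the question are M/D/YYYY, stored here as ISO strings.

```python
import sqlite3

# In-memory SQLite stands in for Vertica (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (when_created TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?)",
    [
        ("2015-01-01", "ACTIVE"),
        ("2015-03-01", "ACTIVE"),
        ("2015-04-01", "INACTIVE"),
        ("2015-04-06", "INACTIVE"),
        ("2015-06-07", "ACTIVE"),
        ("2015-10-09", "INACTIVE"),
    ],
)

# seqnum numbers all rows by date; seqnum_s numbers rows within a status.
# Inside one unbroken run of a status both counters advance together, so
# seqnum - seqnum_s is constant there and identifies the run.
rows = conn.execute(
    """
    SELECT when_created, status, seqnum, seqnum_s,
           seqnum - seqnum_s AS grp,
           ROW_NUMBER() OVER (
               PARTITION BY status, seqnum - seqnum_s
               ORDER BY when_created) AS rnk
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (ORDER BY when_created) AS seqnum,
                 ROW_NUMBER() OVER (
                     PARTITION BY status ORDER BY when_created) AS seqnum_s
          FROM tablea a
         ) a
    ORDER BY when_created
    """
).fetchall()

for row in rows:
    print(row)

ranks = [r[5] for r in rows]
groups = [r[4] for r in rows]
```

The grp column comes out as 0, 0, 2, 2, 2, 3. The value 2 appears for both an INACTIVE run and the later ACTIVE row, which is why status has to be part of the PARTITION BY alongside the difference.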

Renumbering rows in SQL Server

I'm fairly new to SQL Server and I have the following question: is there any way to renumber the rows in a column?
For ex:
id date name
1 2016-01-02 John
2 2016-01-02 Jack
3 2016-01-02 John
4 2016-01-02 John
5 2016-01-03 Jack
6 2016-01-03 Jack
7 2016-01-04 John
8 2016-01-03 Jack
9 2016-01-02 John
10 2016-01-04 Jack
I would like all the "Johns" to start at id 1 and count up (2, 3, 4, etc.), and all the "Jacks" to continue with the following numbers once the "Johns" are done (5, 6, 7, etc.). Thanks!
I hope this helps..
declare @t table (id int ,[date] date,name varchar(20))
insert into @t
( id, date, name )
values (1,'2016-01-02','John')
,(2,'2016-01-02','Jack')
,(3,'2016-01-02','John')
,(4,'2016-01-02','John')
,(5,'2016-01-03','Jack')
,(6,'2016-01-03','Jack')
,(7,'2016-01-04','John')
,(8,'2016-01-03','Jack')
,(9,'2016-01-02','John')
,(10,'2016-01-04','Jack')
select
row_number() over(order by name,[date]) as ID,
date ,
name
from
@t
order by name
The id should just be an internal identifier you use for joins etc - I wouldn't change it. But you could query such a numbering using a window function:
SELECT ROW_NUMBER() OVER (ORDER BY CASE name WHEN 'John' THEN 1 ELSE 2 END) AS rn,
date,
name
FROM mytable
Instead of renumbering the id column, you can use the ROW_NUMBER window function to number the rows the way you need. For example:
SELECT ROW_NUMBER() OVER(PARTITION BY name ORDER BY date) as rowid,date,name
FROM tablename
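To compare against what the question asks for (all "Johns" first, then all "Jacks"), here is a sketch based on the CASE ordering shown above. It assumes an in-memory SQLite database rather than SQL Server, a hypothetical table name (mytable), and adds date and id as tie-breakers so the numbering is deterministic; those tie-breakers are assumptions, not part of the answers.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2016-01-02", "John"), (2, "2016-01-02", "Jack"),
        (3, "2016-01-02", "John"), (4, "2016-01-02", "John"),
        (5, "2016-01-03", "Jack"), (6, "2016-01-03", "Jack"),
        (7, "2016-01-04", "John"), (8, "2016-01-03", "Jack"),
        (9, "2016-01-02", "John"), (10, "2016-01-04", "Jack"),
    ],
)

# The CASE expression sorts every John ahead of every other name; date and
# id then fix the order inside each name, so new_id is stable across runs.
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (
               ORDER BY CASE name WHEN 'John' THEN 1 ELSE 2 END,
                        date, id) AS new_id,
           id AS old_id, date, name
    FROM mytable
    ORDER BY new_id
    """
).fetchall()

for row in rows:
    print(row)

names = [r[3] for r in rows]
old_ids = [r[1] for r in rows]
```

The sample has five Johns, so they receive numbers 1 to 5 and the Jacks 6 to 10. The PARTITION BY name variant instead restarts the count at 1 for each name.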