Selecting minimal dates, or nulls in SQL

This is grossly oversimplified, but:
I have a table, something like the following:
CREATE TABLE Table1
([ID] int, [USER] varchar(5), [DATE] date)
;
INSERT INTO Table1
([ID], [USER], [DATE])
VALUES
(1, 'A', '2018-10-01'),
(2, 'A', '2018-09-01'),
(3, 'A', NULL),
(4, 'B', '2018-05-03'),
(5, 'B', '2017-04-01'),
(6, 'C', NULL)
;
And for each user, I wish to retrieve the whole row of details where the DATE variable is minimal.
SELECT T.USER FROM TABLE1 T
WHERE T.DATE = (SELECT MIN(DATE) FROM TABLE1 T1 WHERE T1.USER = T.USER)
This works great; however, when a user has no row with a populated DATE field, there will be a row with a NULL (like the final row of my table above), which I also wish to select.
So my ideal output in this case is:
(2, 'A', '2018-09-01'),
(5, 'B', '2017-04-01'),
(6, 'C', NULL)
SQL fiddle: http://www.sqlfiddle.com/#!9/df42b5/6
I think something could be done using an EXCEPT clause, but it gets complex very quickly.

You may try row_number():
demo
select *
from (select *,
             row_number() over (partition by [user]
                                order by case when [date] is null then 0 else 1 end desc,
                                         [date]) as rn
      from Table1) x
where rn = 1
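As a sanity check, the same nulls-last ordering trick can be reproduced with Python's built-in sqlite3 module (SQLite 3.25+ supports row_number(); the square-bracket identifiers are replaced with double-quoted ones):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID int, "USER" varchar(5), "DATE" date);
INSERT INTO Table1 VALUES
  (1,'A','2018-10-01'), (2,'A','2018-09-01'), (3,'A',NULL),
  (4,'B','2018-05-03'), (5,'B','2017-04-01'), (6,'C',NULL);
""")
rows = conn.execute("""
SELECT ID, "USER", "DATE" FROM (
  SELECT *, row_number() OVER (
    PARTITION BY "USER"
    -- rows with a real date sort first (the CASE ... DESC), then earliest date
    ORDER BY CASE WHEN "DATE" IS NULL THEN 0 ELSE 1 END DESC, "DATE"
  ) AS rn
  FROM Table1
) WHERE rn = 1
ORDER BY "USER"
""").fetchall()
print(rows)  # [(2, 'A', '2018-09-01'), (5, 'B', '2017-04-01'), (6, 'C', None)]
```

Each user gets exactly one row: the minimal non-NULL date where one exists, and the NULL row otherwise.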

Use UNION and a correlated subquery with the MIN() function:
CREATE TABLE Table1 (ID int, usr varchar(50), DATE1 date)
;
INSERT INTO Table1 VALUES
(1, 'A', '2018-10-01'),
(2, 'A', '2018-09-01'),
(3, 'A', NULL),
(4, 'B', '2018-05-03'),
(5, 'B', '2017-04-01'),
(6, 'C', NULL)
;
select * from Table1 t
where DATE1 = (select min(date1) from Table1 t1 where t1.usr = t.usr)
  and date1 is not null
union
select * from Table1 t
where date1 is null
  and t.usr not in (select usr from Table1 where date1 is not null)
DEMO
ID usr DATE1
2 A 01/09/2018 00:00:00
5 B 01/04/2017 00:00:00
6 C

You can use GROUP BY and JOIN to output the desired results.
select t.Id
, x.[User]
, x.[MinDate] as [Date]
from
(select [User]
, min([Date]) as MinDate
from table1
group by [User]) x
inner join table1 t on t.[User] = x.[User] and (t.[Date] = x.[MinDate] or x.[MinDate] is null)

You can use a Common Table Expression:
;WITH chronology AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY [USER]
ORDER BY ISNULL([DATE], '2900-01-01') ASC
) Idx
FROM TABLE1
)
SELECT ID, [USER], [DATE]
FROM chronology
WHERE Idx=1;
Using a CTE in this solution simplifies the query, improving its readability, maintainability, and extensibility. Furthermore, I expect this approach to perform well.

Related

SQL Where In clause with multiple fields

I have a table as below.
id date value
1 2011-10-01 xx
1 2011-10-02 xx
...
1000000 2011-10-01 xx
Then I have 1000 ids, each associated with a date. I would like to perform something like the query below:
SELECT id, date, value
FROM the table
WHERE (id, date) IN ((id1, <= date1), (id2, <= date2), (id1000, <= date1000))
What's the best way to achieve above query?
You didn't specify your DBMS, so this is standard SQL.
You could do something like this:
with list_of_dates (id, dt) as (
values
(1, date '2016-01-01'),
(2, date '2016-01-02'),
(3, date '2016-01-03')
)
select t.*
from the_table t
join list_of_dates ld on t.id = ld.id and t.the_date <= ld.dt;
This assumes that you do not have duplicates in the list of dates.
Update - now that the DBMS has been disclosed.
For SQL Server you need to change that to:
with list_of_dates (id, dt) as (
  select 1, cast('20160101' as datetime) union all
  select 2, cast('20160102' as datetime) union all
  select 3, cast('20160103' as datetime)
)
select t.*
from the_table t
join list_of_dates ld on t.id = ld.id and t.the_date <= ld.dt;
Since this info is known ahead of time, build a temp table of it and then join to it:
create table #test(id int, myDate date)
insert into #test(id,myDate) values
(1, '10/1/2016'),
(2, '10/2/2016'),
(3, '10/3/2016')
select a.id, a.date, a.value
from the_table as a
inner join #test as b
  on a.id = b.id and a.date <= b.myDate
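The join-to-a-list-of-cutoffs pattern is easy to verify. Here is a small sketch using Python's sqlite3; the table and column names follow the earlier answer (`the_table`, `list_of_dates`), and the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (id int, the_date text, value text);
INSERT INTO the_table VALUES
  (1,'2016-01-01','a'), (1,'2016-01-05','b'),
  (2,'2016-01-01','c'), (2,'2016-01-03','d'),
  (3,'2016-01-04','e');
CREATE TABLE list_of_dates (id int, dt text);
INSERT INTO list_of_dates VALUES
  (1,'2016-01-01'), (2,'2016-01-02'), (3,'2016-01-03');
""")
rows = conn.execute("""
SELECT t.id, t.the_date, t.value
FROM the_table t
JOIN list_of_dates ld
  ON t.id = ld.id AND t.the_date <= ld.dt  -- per-id date cutoff
ORDER BY t.id, t.the_date
""").fetchall()
print(rows)  # [(1, '2016-01-01', 'a'), (2, '2016-01-01', 'c')]
```

ISO-formatted date strings compare correctly as text, which is why the `<=` works here without a date type.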

SQL: get all pairs and triples from single column and count their frequences over another column

A simple table of user_id, item_id (both text data) on input.
The question is: what is the way to extract all pair and triple combinations from the item_id column and count their frequencies over user_id (e.g. 1% of all users have the (1, 2) item_id pair)?
I've tried some barbarism:
select FirstID, SecondID, count(user_id)
from
(
SELECT
t1.item_id as FirstID,
t2.item_id as SecondID
FROM
(
SELECT item_id, ROW_NUMBER()OVER(ORDER BY item_id) as Inc
FROM t1
) t1
LEFT JOIN
(
SELECT item_id, ROW_NUMBER()OVER(ORDER BY item_id)-1 as Inc
FROM t1
) t2 ON t2.Inc = t1.Inc
) t3 join upg_log on t3.FirstID = upg_log.item_id and t3.SecondID = upg_log.item_id
group by FirstID, SecondID
but got nothing
This particular task belongs to the type which is easier to write than to execute:
declare #t table (
UserId int not null,
ItemId int not null
);
insert into #t
values
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 2),
(3, 2),
(3, 3),
(4, 1),
(4, 4),
(5, 4);
-- Pairs
select t1.ItemId as [Item1], t2.ItemId as [Item2], count(*) as [UserCount]
from #t t1
inner join #t t2 on t1.UserId = t2.UserId and t1.ItemId < t2.ItemId
group by t1.ItemId, t2.ItemId
order by UserCount desc, t1.ItemId, t2.ItemId;
As you can see, there is a semi-Cartesian (triangular) join here, which means that performance will drop quickly as the number of records grows. And, of course, proper indexes will be crucial for this kind of query.
In theory, you can easily extend this approach to identify triples, but it might prove infeasible on your actual data. Ideally, such things should be calculated using a per-row approach, with the results cached.
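For a concrete check of the pair query, here is the same self-join run through Python's sqlite3 (a sketch using the sample data from the answer; the `#t` table variable becomes an ordinary table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (UserId int, ItemId int);
INSERT INTO t VALUES
  (1,1),(1,2),(1,3),(2,1),(2,2),(3,2),(3,3),(4,1),(4,4),(5,4);
""")
pairs = conn.execute("""
SELECT t1.ItemId AS Item1, t2.ItemId AS Item2, count(*) AS UserCount
FROM t t1
JOIN t t2
  ON t1.UserId = t2.UserId
 AND t1.ItemId < t2.ItemId  -- the < keeps each unordered pair exactly once
GROUP BY t1.ItemId, t2.ItemId
ORDER BY UserCount DESC, t1.ItemId, t2.ItemId
""").fetchall()
print(pairs)  # [(1, 2, 2), (2, 3, 2), (1, 3, 1), (1, 4, 1)]
```

Pairs (1,2) and (2,3) each occur for two users, matching a count by hand over the sample data.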

How to get latest records based on Batch ID

declare #tab table
(
BatchID INT,
Code VARCHAR(20),
CommType INT,
LastStatus VARCHAR(5),
SourceModiifedLastDate varchar(30)
)
INSERT INTO #tab(BatchID, Code, CommType, LastStatus, SourceModiifedLastDate)
VALUES (1, 'A003-3', 3, 'I', '2013-06-17 21:28:01.827'),
(2, 'A004-1', 1, 'I', '2014-06-17 21:28:01.827'),
(6, 'A003-3', 3, 'U', '2015-06-17 21:28:01.827'),
(9, 'A003-3', 3, 'D', '2015-06-17 21:28:01.827'),
(11, 'A004-1', 3, 'D', '2013-06-17 21:28:01.827'),
(12, 'A004-1', 1, 'I', '2015-06-17 21:28:01.827'),
(16, 'A005-3', 3, 'I', '2011-06-17 21:28:01.827'),
(19, 'A005-3', 3, 'D', '2013-06-17 21:28:01.827'),
(20, 'A006-3', 3, 'U', '2011-06-17 21:28:01.827'),
(21, 'A006-3', 3, 'I', '2013-06-17 21:28:01.827')
Select * from #tab
Here in my sample data I need to get only the LastStatus = 'D' records, based on the latest BatchID.
For example, if you look at Code = 'A003-3', it got inserted, updated, and then deleted, so I need this record.
If you look at Code = 'A004-1', it got inserted, deleted, and then inserted again, so I don't need this record.
Output should be :
BatchID Code CommType LastStatus SourceModiifedLastDate
---------------------------------------------------------------
9 A003-3 3 D 2015-06-17 21:28:01.827
19 A005-3 3 D 2013-06-17 21:28:01.827
I need to get only the latest deleted records, based on the latest BatchID and the latest date.
I have tried using MAX and GROUP BY to filter the records, but I'm unable to get what I'm looking for.
Please help me find a solution.
select tt.*
from (select t.*
, row_number() over (partition by Code order by BatchId desc) as rn
from #tab t
) tt
where tt.rn = 1
and tt.LastStatus = 'D';
Here is an option using a CTE, filtering for only those BatchIds whose last action was a delete:
;
WITH CTE1 AS (
SELECT *, RANK() OVER (partition by Code ORDER BY BatchID DESC ) [CodeRank]
FROM #tab)
SELECT *
FROM CTE1
WHERE CodeRank = 1 and LastStatus = 'D'
SELECT tab1.BatchId, tab1.Code, tab1.CommType, tab1.LastStatus, tab1.SourceModiifedLastDate
FROM #tab tab1
INNER JOIN (
SELECT Code, MAX(SourceModiifedLastDate) MaxSourceModiifedLastDate
FROM #tab
GROUP BY Code) tab2
ON tab2.Code = tab1.Code
AND tab2.MaxSourceModiifedLastDate = tab1.SourceModiifedLastDate
WHERE tab1.LastStatus = 'D'
A typical way of doing this uses row_number():
select t.*
from (select t.*,
row_number() over (partition by Code order by BatchId desc) as seqnum
from #tab t
) t
where seqnum = 1;
The old-fashioned way of doing it (before window functions) might look like this:
select t.*
from #tab t
where t.BatchId = (select max(t2.BatchId)
from #tab t2
where t2.Code = t.Code
);
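Both variants above hinge on max(BatchID) per Code. Here is a quick check of the correlated-subquery form with Python's sqlite3, a sketch on a trimmed copy of the sample data (the long date column is shortened to SourceDate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (BatchID int, Code text, CommType int,
                  LastStatus text, SourceDate text);
INSERT INTO tab VALUES
  (1,'A003-3',3,'I','2013-06-17'),
  (6,'A003-3',3,'U','2015-06-17'),
  (9,'A003-3',3,'D','2015-06-17'),
  (2,'A004-1',1,'I','2014-06-17'),
  (11,'A004-1',3,'D','2013-06-17'),
  (12,'A004-1',1,'I','2015-06-17'),
  (16,'A005-3',3,'I','2011-06-17'),
  (19,'A005-3',3,'D','2013-06-17');
""")
rows = conn.execute("""
SELECT t.BatchID, t.Code, t.LastStatus
FROM tab t
WHERE t.BatchID = (SELECT max(t2.BatchID) FROM tab t2
                   WHERE t2.Code = t.Code)  -- keep only each Code's last batch
  AND t.LastStatus = 'D'                    -- ... and only if that batch was a delete
ORDER BY t.BatchID
""").fetchall()
print(rows)  # [(9, 'A003-3', 'D'), (19, 'A005-3', 'D')]
```

A004-1 drops out because its latest batch (12) is an insert, which is exactly the behavior the question asks for.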

How can I do a distinct sum?

I am trying to create a "score" statistic derived from the value of a certain column, calculated as the sum of a CASE expression. Unfortunately, the query structure needs to be a full outer join (this is simplified from the actual query; the join structure survives from the original code), and thus the sum is incorrect, since each row may occur many times. I could group by the unique key; however, that breaks other aggregate functions in the same query.
What I really want to do is sum(case when ... distinct claim_id), which of course does not exist. Is there an approach that will do what I need, or does this have to be two queries?
This is on Redshift, in case it matters.
create table t1 (id int, proc_date date, claim_id int, proc_code char(1));
create table t2 (id int, diag_date date, claim_id int);
insert into t1 (id, proc_date, claim_id, proc_code)
values (1, '2012-01-01', 0, 'a'),
(2, '2009-02-01', 1, 'b'),
(2, '2019-02-01', 2, 'c'),
(2, '2029-02-01', 3, 'd'),
(3, '2016-04-02', 4, 'e'),
(4, '2005-01-03', 5, 'f'),
(5, '2008-02-03', 6, 'g');
insert into t2 (id, diag_date, claim_id)
values (4, '2004-01-01', 20),
(5, '2010-02-01', 21),
(6, '2007-04-02', 22),
(5, '2011-02-01', 23),
(6, '2008-04-02', 24),
(5, '2012-02-01', 25),
(6, '2009-04-02', 26),
(7, '2002-01-03', 27),
(8, '2001-02-03', 28);
select id, sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end), count(distinct t1.claim_id) as proc_count, min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id order by id;
You can separate your conditional aggregates out into a CTE or subquery and use OVER (PARTITION BY id) to get an id-level aggregate without grouping, something like this:
with cte AS (SELECT *,sum(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) OVER(PARTITION BY id) AS Some_Sum
, min(proc_date) OVER(PARTITION BY id) as min_proc_date
FROM t1
)
select id
, Some_Sum
, count(distinct cte.claim_id) as proc_count
, min_proc_date
from cte
full outer join t2 using (id)
group by id,Some_Sum,min_proc_Date
order by id;
Demo: SQL Fiddle
Note that you'll have to add these aggregates to the GROUP BY in the outer query, and the fields in your PARTITION BY should match the t1 fields you previously used in the GROUP BY (in this case just id). If your full query has other t1 fields in its GROUP BY, be sure to add them to the PARTITION BY as well.
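The key move here, computing the aggregate over t1 before the join, can be seen in isolation. Below is a sketch with Python's sqlite3 on invented t1 data (window functions need SQLite 3.25+; the full outer join is left out since SQLite only gained it in 3.39):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id int, claim_id int, proc_code char(1));
INSERT INTO t1 VALUES
  (1,0,'a'), (2,1,'b'), (2,2,'c'), (2,3,'d'), (3,4,'e');
""")
rows = conn.execute("""
SELECT DISTINCT id,
       sum(CASE proc_code WHEN 'a' THEN 5  WHEN 'b' THEN 10
                          WHEN 'c' THEN 15 WHEN 'd' THEN 20
                          WHEN 'e' THEN 25 END)
         OVER (PARTITION BY id) AS score  -- id-level total, attached to every row
FROM t1
ORDER BY id
""").fetchall()
print(rows)  # [(1, 5), (2, 45), (3, 25)]
```

Because the score is fixed per id before any join happens, joining to t2 afterward cannot inflate it, no matter how many times a t1 row is repeated.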
You can use a subquery (grouping by id and claim_id) and then regroup:
with base as (
select id, avg(case when proc_code='a' then 5
when proc_code='b' then 10
when proc_code='c' then 15
when proc_code='d' then 20
when proc_code='e' then 25
when proc_code='f' then 30
when proc_code='g' then 35 end) as value_proc,
t1.claim_id , min(proc_date) as min_proc_date
from t1 full outer join t2 using (id) group by id, t1.claim_id order by id, t1.claim_id)
select id, sum(value_proc), count(distinct claim_id) as proc_count, min(min_proc_date) as min_proc_date
from base
group by id
order by id;
Note that I suggest AVG for the inner subquery, but if you are sure that the same claim_id always carries the same letter (and therefore the same integer value), you can use MAX or MIN instead. If not, AVG is preferable.

how to use SQL group to filter rows with maximum date value

I have the following table
CREATE TABLE Test
(`Id` int, `value` varchar(20), `adate` varchar(20))
;
INSERT INTO Test
(`Id`, `value`, `adate`)
VALUES
(1, 100, '2014-01-01'),
(1, 200, '2014-01-02'),
(1, 300, '2014-01-03'),
(2, 200, '2014-01-01'),
(2, 400, '2014-01-02'),
(2, 30 , '2014-01-04'),
(3, 800, '2014-01-01'),
(3, 300, '2014-01-02'),
(3, 60 , '2014-01-04')
;
I want to achieve a result that selects, for each Id, the row having the maximum date, i.e.
Id ,value ,adate
1, 300,'2014-01-03'
2, 30 ,'2014-01-04'
3, 60 ,'2014-01-04'
How can I achieve this using GROUP BY? I have tried the following, but it is not working:
Select Id,value,adate
from Test
group by Id,value,adate
having adate = MAX(adate)
Can someone help with the query?
Select the maximum dates for each id.
select id, max(adate) max_date
from test
group by id
Join on that to get the rest of the columns.
select t1.*
from test t1
inner join (select id, max(adate) max_date
from test
group by id) t2
on t1.id = t2.id and t1.adate = t2.max_date;
Please try:
select a.*
from tbl a
where a.adate = (select MAX(adate) from tbl b where b.Id = a.Id)
If you are using a DBMS that has analytical functions you can use ROW_NUMBER:
SELECT Id, Value, ADate
FROM ( SELECT ID,
Value,
ADate,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Adate DESC) AS RowNum
FROM Test
) AS T
WHERE RowNum = 1;
Otherwise you will need to join to the max date aggregated by Id, to filter the results from Test down to those rows where the date matches the maximum date for that Id:
SELECT Test.Id, Test.Value, Test.ADate
FROM Test
INNER JOIN
( SELECT ID, MAX(ADate) AS ADate
FROM Test
GROUP BY ID
) AS MaxT
ON MaxT.ID = Test.ID
AND MaxT.ADate = Test.ADate;
I would try something like this
Select t1.Id, t1.value, t1.adate
from Test as t1
where t1.adate = (select max(t2.adate)
from Test as t2
where t2.id = t1.id)
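All of the answers above compute a greatest-date-per-group. The correlated-subquery form can be checked directly with Python's sqlite3, a sketch reusing the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (Id int, value int, adate text);
INSERT INTO Test VALUES
  (1,100,'2014-01-01'), (1,200,'2014-01-02'), (1,300,'2014-01-03'),
  (2,200,'2014-01-01'), (2,400,'2014-01-02'), (2,30,'2014-01-04'),
  (3,800,'2014-01-01'), (3,300,'2014-01-02'), (3,60,'2014-01-04');
""")
rows = conn.execute("""
SELECT t1.Id, t1.value, t1.adate
FROM Test t1
WHERE t1.adate = (SELECT max(t2.adate) FROM Test t2
                  WHERE t2.Id = t1.Id)  -- each Id's latest date
ORDER BY t1.Id
""").fetchall()
print(rows)  # [(1, 300, '2014-01-03'), (2, 30, '2014-01-04'), (3, 60, '2014-01-04')]
```

This matches the desired output from the question; note it would return ties if two rows for the same Id shared the maximum date, whereas the ROW_NUMBER approach picks exactly one.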