Joining tables on ID and closest date - sql

I have two tables: one showing when someone left and one showing when they came back (people sometimes forget to record that they came back). I am trying to join the tables so that they look like the desired table from the image.

You can try this.
DECLARE @TableA TABLE(ID INT, Leave DATE)
INSERT INTO @TableA VALUES
(62175, '11/29/2019'),
(62175, '11/11/2019'),
(62175, '3/29/2019'),
(62175, '8/22/2019'),
(68454, '11/29/2019'),
(68454, '12/13/2019')
DECLARE @TableB TABLE(ID INT, [Return] DATE)
INSERT INTO @TableB VALUES
(62175, '4/4/2019'),
(62175, '11/16/2019'),
(62175, '11/30/2019'),
(68454, '11/30/2019'),
(68454, '12/14/2019')
SELECT TA.*, CASE WHEN ROW_NUMBER()OVER(PARTITION BY X.ID, X.[Return] ORDER BY TA.Leave DESC) = 1 THEN X.[Return] ELSE NULL END [Return]
FROM @TableA TA
OUTER APPLY (SELECT TOP 1 * FROM @TableB TB
WHERE TA.ID = TB.ID
AND TB.[Return] > TA.Leave
ORDER BY TB.[Return] ) X
ORDER BY TA.ID, TA.Leave
Result:
ID Leave Return
----------- ---------- ----------
62175 2019-03-29 2019-04-04
62175 2019-08-22 NULL
62175 2019-11-11 2019-11-16
62175 2019-11-29 2019-11-30
68454 2019-11-29 2019-11-30
68454 2019-12-13 2019-12-14
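
For readers without a SQL Server instance at hand, the same idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: SQLite (3.25 or later, for window functions) stands in for SQL Server, a correlated subquery replaces OUTER APPLY, and the table and column names (leave_log, return_log, leave_date, return_date) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leave_log  (id INTEGER, leave_date  TEXT);  -- hypothetical names
CREATE TABLE return_log (id INTEGER, return_date TEXT);
INSERT INTO leave_log VALUES
  (62175, '2019-11-29'), (62175, '2019-11-11'), (62175, '2019-03-29'),
  (62175, '2019-08-22'), (68454, '2019-11-29'), (68454, '2019-12-13');
INSERT INTO return_log VALUES
  (62175, '2019-04-04'), (62175, '2019-11-16'), (62175, '2019-11-30'),
  (68454, '2019-11-30'), (68454, '2019-12-14');
""")

query = """
WITH matched AS (
    -- earliest return after each leave (the role OUTER APPLY ... TOP 1 plays above)
    SELECT l.id, l.leave_date,
           (SELECT MIN(r.return_date)
              FROM return_log r
             WHERE r.id = l.id AND r.return_date > l.leave_date) AS next_return
    FROM leave_log l
)
SELECT id, leave_date,
       -- when several leaves share one return, keep it only on the latest leave
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id, next_return
                                    ORDER BY leave_date DESC) = 1
            THEN next_return END AS return_date
FROM matched
ORDER BY id, leave_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The leave on 2019-08-22 comes back with no return date because the closer leave on 2019-11-11 claims the 2019-11-16 return.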

These tables are invalid; they should be a single table with three columns: ID, Leave, and Return.

Very tricky question. I think this does what you want:
with ab as (
select id, leave, null as [return]
from a
union all
select id, null, [return]
from b
)
select distinct id, coalesce(leave, prev_leave) as leave, coalesce([return], next_return) as [return]
from (select ab.*,
-- on a return row, pick up the leave date from the preceding row
(case when leave is null
then lag(leave) over (partition by id order by coalesce(leave, [return]))
end) as prev_leave,
-- on a leave row, pick up the return date from the following row
(case when [return] is null
then lead([return]) over (partition by id order by coalesce(leave, [return]))
end) as next_return
from ab
) ab

Related

Estimated number of rows is way off in execution plan

I have a situation where the estimated number of rows in the execution plan is way off.
The join columns are varchar(50). I have tried different indexes, but they do not reduce the problem. I have even tried an index on the temp table. What else can I do?
PS: this is the first place where the estimate starts to drift. Also, the tables are not big (48000 rows).
The code is:
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM TableA
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
I know it seems that this can be rewritten as:
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
But the results are not identical and I don't want to change the results since I am not sure if the creator of this code knew what they were doing.
Some statistics on the columns:
householdnumber is always equal to householdid. householdid is nvarchar(50) but householdnumber is varchar(40). The table has 48877 rows. Distinct combination of householdnumber, householdid, primaryCustomerID has 48029 rows. And distinct number of primaryCustomerID is 47152.
Regarding the code - it appears that the difference between the larger (original) version and your simpler GROUP BY version is that the original finds the minimum profilecreateddate for anyone in that household, whereas your simpler version finds the profilecreateddate for the specific primarycustomerid.
For example (using simpler data)
CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);
INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate) VALUES
(1, 1, 1, '20201001'),
(1, 1, 1, '20201002'),
(1, 1, 2, '20201003');
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM #TableA;
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN #TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
SELECT * FROM #Profile;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from #TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
SELECT * FROM #Profile2;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-03 00:00:00.000
*/
Notice that PROFILECREATEDDATE differs for row 2 between the two result sets above.
You could therefore try the following code, which should give the same results as the original. See how it performs, and confirm that it matches the original results.
SELECT DISTINCT t1.householdnumber, t1.householdid, primaryCustomerID,
MIN([ProfileCreatedDate]) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE
INTO #Profile3
FROM #TableA t1;
SELECT * FROM #Profile3;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/

How to do pivoting on this layered data

Hi I have sample data
declare @emp table(id int identity(1,1),E_Name varchar(20),E_company varchar(20),Emp_Val VARCHAR(10))
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Rahim','WELLS','A')
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Jag','collebra',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Vasu','nunet',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Kiran','crystal',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Sajan','tiato',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('RAM','WELLS','A')
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Srinu','Cognizant','B')
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Raju','Cognizant','B')
Sample data :
id  E_Name  E_company  Emp_Val
1   Rahim   WELLS      A
2   Jag     collebra   NULL
3   Vasu    nunet      NULL
4   Kiran   crystal    NULL
5   Sajan   tiato      NULL
6   RAM     WELLS      A
7   Srinu   Cognizant  B
8   Raju    Cognizant  B
script :
SELECT [WELLS],[Cognizant],[NULL] from (
select E_Name,E_company,Emp_Val from @emp)T
PIVOT (MAX(E_Name)FOR E_company IN([WELLS],[Cognizant],[NULL]))PVT
output :
WELLS Cognizant NULL
Rahim Srinu collebra
RAM Raju tiato
NULL Srinu crystal
NULL NULL NUNET
You can use conditional aggregation:
select max(case when e_company = 'WELLS' then e_name end) as wells,
max(case when e_company = 'Cognizant' then e_name end) as cognizant,
max(case when e_company not in ('WELLS', 'Cognizant') then e_name end) as nulls
from (select e.*,
row_number() over (partition by (case when e_company in ('WELLS', 'Cognizant') then e_company end) order by id) as seqnum
from @emp e
) e
group by seqnum
order by seqnum;
Here is a db<>fiddle.
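
To see what the conditional aggregation returns on the sample data, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption), the table is named emp rather than being a table variable, and the third column is labelled other; like the query above, it puts employee names (not company names) in that third column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, E_Name TEXT, E_company TEXT, Emp_Val TEXT);
INSERT INTO emp (E_Name, E_company, Emp_Val) VALUES
  ('Rahim', 'WELLS', 'A'), ('Jag', 'collebra', NULL), ('Vasu', 'nunet', NULL),
  ('Kiran', 'crystal', NULL), ('Sajan', 'tiato', NULL), ('RAM', 'WELLS', 'A'),
  ('Srinu', 'Cognizant', 'B'), ('Raju', 'Cognizant', 'B');
""")

query = """
SELECT MAX(CASE WHEN E_company = 'WELLS' THEN E_Name END)     AS wells,
       MAX(CASE WHEN E_company = 'Cognizant' THEN E_Name END) AS cognizant,
       MAX(CASE WHEN E_company NOT IN ('WELLS', 'Cognizant')
                THEN E_Name END)                              AS other
FROM (
    -- number the rows within WELLS, within Cognizant, and within "everything else"
    SELECT e.*,
           ROW_NUMBER() OVER (
               PARTITION BY CASE WHEN E_company IN ('WELLS', 'Cognizant')
                                 THEN E_company END
               ORDER BY id) AS seqnum
    FROM emp e
) e
GROUP BY seqnum
ORDER BY seqnum
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row lines up the n-th employee of every bucket, so the result has as many rows as the largest bucket (four here).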
Your mistake is in the last select statement; it should be like this:
SELECT *
from (
select * from @emp)T
PIVOT (MAX(Emp_Val)FOR E_company IN([WELLS],[Cognizant],[NULL]))PVT
order by 1
This approach uses a self join within the pivot to enumerate the companies with multiple employees and values. It then uses a right join back onto the table to enumerate the companies that do not have those employees. The difference in the output is that all of the null permutations are preserved. Other than that, this should cover what you are looking for.
declare @emp table(id int identity(1,1),E_Name varchar(20),E_company
varchar(20),Emp_Val VARCHAR(10))
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Rahim','WELLS','A')
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Jag','collebra',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Vasu','nunet',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Kiran','crystal',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Sajan','tiato',NULL)
insert into @emp(E_Name,E_company,Emp_Val)VALUES('RAM','WELLS','A')
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Srinu','Cognizant','B')
insert into @emp(E_Name,E_company,Emp_Val)VALUES('Raju','Cognizant','B')
select distinct WELLS, Cognizant,case E_Company when 'Wells' then NULL when
'Cognizant' then null else E_Company end as [NULL] from
(
SELECT [WELLS],[Cognizant],[collebra], [nunet], [crystal], [tiato] from (
select e.E_Name,e2.E_name as E2_Name, e.E_company,e2.Emp_Val as Emp2_Val, e.Emp_Val
from @emp e inner join @emp e2 on e.id=e2.id)T
PIVOT (MAX(E_Name)FOR E_company IN([WELLS],[Cognizant],[collebra], [nunet],
[crystal], [tiato]))PVT) stagingtable
right join (select E_Company, E_Name from @emp) c on stagingtable.Cognizant=c.E_Name
or stagingtable.WELLS=c.E_Name
order by 1 desc, 2 desc, 3 desc;

How to get the result set from the table below

Table1 contains the data below. I need to get the following result set from it.
Table1
Id  Desc             ParentId
1   Cloths           0
2   Mens             1
3   Womens           1
4   T-Shirt_M        2
5   Casual Shirts_M  2
6   T-Shirt_F        3
7   Education        8
If I pass "Casual Shirts_M" as a parameter, I should get the result set below.
Result Set
Id  Desc             ParentId
1   Cloths           0
2   Mens             1
5   Casual Shirts_M  2
As mentioned in the comments, there are plenty of recursive common table expression (CTE) examples for this. Here's another one:
DECLARE @Desc NVARCHAR(50) = 'Casual Shirts_M'
;WITH cteX
AS
( SELECT
B.Id, B.[DESC], B.ParentId
FROM
Table1 b
WHERE
B.[Desc] = @Desc
UNION ALL
SELECT
E.Id, E.[DESC], E.ParentId
FROM
Table1 E
INNER JOIN
cteX r ON e.Id = r.ParentId
)
SELECT * FROM cteX ORDER BY ID ASC
SQL-Fiddle provided by @WhatsThePoint
This is a case of building a hierarchy with a recursive CTE:
CREATE TABLE cloths
(
id INT,
descr VARCHAR(100),
parentid INT
);
insert into cloths values (1,'Cloths',0);
insert into cloths values (2,'Mens',1);
insert into cloths values (3,'Womens',1);
insert into cloths values (4,'T-Shirt_M',2);
insert into cloths values (5,'Casual Shirts_M',2);
insert into cloths values (6,'T-Shirt_F',3);
insert into cloths values (7,'Education',8);
DECLARE @variety VARCHAR(100) = 'Casual Shirts_M';
WITH
cte1 (id, descr, parentid)
AS (SELECT *
FROM cloths
WHERE descr = @variety
UNION ALL
SELECT c.id,
c.descr,
c.parentid
FROM cloths c
INNER JOIN cte1 r
ON c.id = r.parentid)
SELECT *
FROM cte1
ORDER BY parentid ASC;
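
The upward walk from a leaf to its root can also be tried out without SQL Server. The following is a minimal sketch using Python's built-in sqlite3 module; it assumes SQLite as a stand-in (which spells the construct WITH RECURSIVE), and the CTE name ancestors is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cloths (id INT, descr TEXT, parentid INT);
INSERT INTO cloths VALUES
  (1, 'Cloths', 0), (2, 'Mens', 1), (3, 'Womens', 1), (4, 'T-Shirt_M', 2),
  (5, 'Casual Shirts_M', 2), (6, 'T-Shirt_F', 3), (7, 'Education', 8);
""")

query = """
WITH RECURSIVE ancestors (id, descr, parentid) AS (
    -- anchor member: the row matching the parameter
    SELECT id, descr, parentid FROM cloths WHERE descr = ?
    UNION ALL
    -- recursive member: the parent of the row found in the previous step
    SELECT c.id, c.descr, c.parentid
    FROM cloths c
    JOIN ancestors a ON c.id = a.parentid
)
SELECT id, descr, parentid FROM ancestors ORDER BY id
"""
rows = conn.execute(query, ("Casual Shirts_M",)).fetchall()
for row in rows:
    print(row)
```

The recursion stops on its own once a row's parentid (0 for the root here) matches no id in the table.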

Select specific group of records with GROUP BY and MAX(DATE)

I am trying to list the agencies whose most recent visit was marked Yes. For example:
Agency Id Date of Visit Passed
1 8/19/2015 No
1 6/9/2015 Yes
1 2/6/2015 No
2 9/2/2015 No
2 5/11/2015 Yes
2 3/4/2015 Yes
3 9/10/2015 Yes
3 5/11/2015 No
3 3/5/2015 No
4 10/6/2015 Yes
4 5/19/2015 No
4 3/25/2015 Yes
The desired result from the table should contain only the following rows, because these agencies were marked Yes on their latest visit date:
Agency Id Date of Visit Passed
3 9/10/2015 Yes
3 5/11/2015 No
3 3/5/2015 No
4 10/6/2015 Yes
4 5/19/2015 No
4 3/25/2015 Yes
I have tried using
SELECT agencyid, max(dateofvisit), passed
FROM tblAgency
WHERE passed = 'Yes'
GROUP BY agencyid
But this does not seem to work, as it brings back every record in the table that is marked Yes.
Can somebody let me know whether this is possible?
Use an analytic function to get last value for "passed" column. Then it is easy:
select *
from(
select
agencyid,
passed,
first_value(passed) over (partition by agencyid order by dateofvisit desc) last_passed_value,
dateofvisit
from tblAgency
) t
where last_passed_value = 'Yes';
You can also do it only with group bys and simple max, but you need some joins:
select b.*
from(
select
agencyid,
max(dateofvisit) as max_dateofvisit
from tblAgency
group by agencyid
) lastentry
join tblAgency a on a.agencyid = lastentry.agencyid and a.dateofvisit=lastentry.max_dateofvisit
join tblAgency b on a.agencyid = b.agencyid
where a.passed = 'Yes'
With the SQL query below I got the expected result. Please let me know your feedback.
---create table
DECLARE @tblAgency TABLE
(
AgencyId INT,
dateofvisit DateTime,
Passed Nvarchar(10)
)
---Insert Records
INSERT INTO @tblAgency VALUES(1,'2015/8/19','No')
INSERT INTO @tblAgency VALUES(1,'2015/6/9','Yes')
INSERT INTO @tblAgency VALUES(1,'2015/2/6','No')
INSERT INTO @tblAgency VALUES(2,'2015/9/2','No')
INSERT INTO @tblAgency VALUES(2,'2015/5/11','Yes')
INSERT INTO @tblAgency VALUES(2,'2015/3/4','Yes')
INSERT INTO @tblAgency VALUES(3,'2015/9/10','Yes')
INSERT INTO @tblAgency VALUES(3,'2015/5/11','No')
INSERT INTO @tblAgency VALUES(3,'2015/3/5','No')
INSERT INTO @tblAgency VALUES(4,'2015/10/6','Yes')
INSERT INTO @tblAgency VALUES(4,'2015/5/19','No')
DECLARE @PassedAgency Table
(
AgencyId INT
)
--Select AgencyId with Passed='Yes' with Latest dateofvisit
INSERT INTO @PassedAgency
SELECT TM.AgencyId FROM @TblAgency AS TM
OUTER APPLY(SELECT MAX(TD.dateofvisit) AS MaxDate FROM @TblAgency AS TD WHERE TD.AgencyId=TM.AgencyId) A
WHERE TM.DateofVisit=A.MaxDate AND TM.Passed='Yes'
--select agency details
SELECT * from @TblAgency AS tbl1
LEFT JOIN @PassedAgency AS tbl2 ON tbl1.AgencyId=tbl2.AgencyId
WHERE tbl2.AgencyId IS NOT NULL ORDER BY tbl1.AgencyId,tbl1.dateofvisit desc
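
For a quick check of the "latest visit must be Yes" rule against the question's twelve rows, here is a sketch using Python's built-in sqlite3 module. It assumes SQLite (3.25 or later) as a stand-in for the target database and stores the dates as ISO text so that they sort correctly; the FIRST_VALUE approach is the one from the analytic-function answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAgency (agencyid INT, dateofvisit TEXT, passed TEXT);
INSERT INTO tblAgency VALUES
  (1, '2015-08-19', 'No'),  (1, '2015-06-09', 'Yes'), (1, '2015-02-06', 'No'),
  (2, '2015-09-02', 'No'),  (2, '2015-05-11', 'Yes'), (2, '2015-03-04', 'Yes'),
  (3, '2015-09-10', 'Yes'), (3, '2015-05-11', 'No'),  (3, '2015-03-05', 'No'),
  (4, '2015-10-06', 'Yes'), (4, '2015-05-19', 'No'),  (4, '2015-03-25', 'Yes');
""")

query = """
SELECT agencyid, dateofvisit, passed
FROM (
    SELECT agencyid, dateofvisit, passed,
           -- value of "passed" on the most recent visit of each agency
           FIRST_VALUE(passed) OVER (PARTITION BY agencyid
                                     ORDER BY dateofvisit DESC) AS last_passed
    FROM tblAgency
) t
WHERE last_passed = 'Yes'
ORDER BY agencyid, dateofvisit DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only agencies 3 and 4 survive the filter, and all of their visits are returned, including the ones marked No.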

Get max column with group by

I have a table for contents on a page. The page is divided into sections.
I want to get the last version for each page-section.
Id (int)
Version (int)
SectionID
Id Version SectionID Content
1 1 1 AAA
2 2 1 BBB
3 1 2 CCC
4 2 2 DDD
5 3 2 EEE
I want to get:
Id Version SectionID Content
2 2 1 BBB
5 3 2 EEE
You could use an exclusive self join:
select last.*
from YourTable last
left join
YourTable new
on new.SectionID = last.SectionID
and new.Version > last.Version
where new.Id is null
The where clause basically says: keep only the rows for which no newer version exists.
Slightly more readable, but often slower, is a not exists condition:
select *
from YourTable yt
where not exists
(
select *
from YourTable yt2
where yt2.SectionID = yt.SectionID
and yt2.Version > yt.Version
)
Example table definition:
declare @t table(Id int, [Version] int, [SectionID] int, Content varchar(50))
insert into @t values (1,1,1,'AAA');
insert into @t values (2,2,1,'BBB');
insert into @t values (3,1,2,'CCC');
insert into @t values (4,2,2,'DDD');
insert into @t values (5,3,2,'EEE');
Working solution:
select A.Id, A.[Version], A.SectionID, A.Content
from @t as A
join (
select max(C.[Version]) [Version], C.SectionID
from @t C
group by C.SectionID
) as B on A.[Version] = B.[Version] and A.SectionID = B.SectionID
order by A.SectionID
A simpler and more readable solution:
select A.Id, A.[Version], A.SectionID, A.Content
from @t as A
where A.[Version] = (
select max(B.[Version])
from @t B
where A.SectionID = B.SectionID
)
I just saw that there was a very similar question for Oracle with an accepted answer based on performance.
If your table is big and performance is an issue, you can give this a try to see whether SQL Server also performs better with it:
select Id, Version, SectionID, Content
from (
select Id, Version, SectionID, Content,
max(Version) over (partition by SectionID) max_Version
from @t
) A
where Version = max_Version
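
The approaches in this thread should all return the same rows, which is easy to confirm on the sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption); the table name page_content and the aliases cur and newer are made up for the example, and the comparison covers results only, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_content (id INT, version INT, sectionid INT, content TEXT);
INSERT INTO page_content VALUES
  (1, 1, 1, 'AAA'), (2, 2, 1, 'BBB'), (3, 1, 2, 'CCC'),
  (4, 2, 2, 'DDD'), (5, 3, 2, 'EEE');
""")

# Exclusive self join: keep rows that have no newer version in the same section.
anti_join = conn.execute("""
SELECT cur.id, cur.version, cur.sectionid, cur.content
FROM page_content cur
LEFT JOIN page_content newer
       ON newer.sectionid = cur.sectionid AND newer.version > cur.version
WHERE newer.id IS NULL
ORDER BY cur.sectionid
""").fetchall()

# NOT EXISTS: the same condition written as a correlated subquery.
not_exists = conn.execute("""
SELECT id, version, sectionid, content
FROM page_content pc
WHERE NOT EXISTS (SELECT 1 FROM page_content pc2
                  WHERE pc2.sectionid = pc.sectionid
                    AND pc2.version > pc.version)
ORDER BY sectionid
""").fetchall()

# Window function: compare each row's version with the section maximum.
windowed = conn.execute("""
SELECT id, version, sectionid, content
FROM (SELECT id, version, sectionid, content,
             MAX(version) OVER (PARTITION BY sectionid) AS max_version
      FROM page_content) t
WHERE version = max_version
ORDER BY sectionid
""").fetchall()

print(anti_join)
print(anti_join == not_exists == windowed)
```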