Join two tables side by side - sql

I have these two tables that I need to join side by side
Table A
id  date
--  ----------
1   03/01/2021
1   04/01/2021
1   05/01/2021
2   04/01/2021
2   05/01/2021
3   03/01/2021
3   04/01/2021
Table B
id  date
--  ----------
1   03/01/2021
1   04/01/2021
1   05/01/2021
1   06/01/2021
2   04/02/2021
2   05/02/2021
3   03/01/2021
The output would be
id  dateA       dateB
--  ----------  ----------
1   03/01/2021  03/01/2021
1   04/01/2021  04/01/2021
1   05/01/2021  05/01/2021
1               06/01/2021
2   04/01/2021  04/02/2021
2   05/01/2021  05/02/2021
3   03/01/2021  03/01/2021
3   04/01/2021
Basically, find all records that share an id (for example 1), then list them side by side.
I tried joining them using id as the key, but that produced many extra rows that I don't want. I tried grouping as well, but it messes up the order.
I'm using SQLite via pandas.
The query below returns extra rows, and I can't figure out how to filter them out:
SELECT
A.id, A.date, B.date
FROM
A
JOIN
B ON B.id = A.id
Adding a GROUP BY makes the query return only the first record for each id.

Use a CTE where you rank all the rows of both tables by id and order of the dates and then aggregate:
WITH cte AS (
SELECT id, date dateA, null dateB, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn
FROM TableA
UNION ALL
SELECT id, null, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn
FROM TableB
)
SELECT id, MAX(dateA) dateA, MAX(dateB) dateB
FROM cte
GROUP BY id, rn
ORDER BY id, rn;
Note that your dates are in the format dd/mm/yyyy, so they do not sort or compare correctly as strings.
You should change them to yyyy-mm-dd for the code to work properly.
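To try the CTE approach outside your own database, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the tables are named TableA and TableB as in the query above, that the dates have already been converted to yyyy-mm-dd, and that the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

# In-memory database with the sample data, dates rewritten as yyyy-mm-dd
# (assumption: the originals are dd/mm/yyyy).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, date TEXT);
    CREATE TABLE TableB (id INTEGER, date TEXT);
    INSERT INTO TableA VALUES
        (1, '2021-01-03'), (1, '2021-01-04'), (1, '2021-01-05'),
        (2, '2021-01-04'), (2, '2021-01-05'),
        (3, '2021-01-03'), (3, '2021-01-04');
    INSERT INTO TableB VALUES
        (1, '2021-01-03'), (1, '2021-01-04'), (1, '2021-01-05'), (1, '2021-01-06'),
        (2, '2021-02-04'), (2, '2021-02-05'),
        (3, '2021-01-03');
""")

# Number the rows of each table per id, stack them, then collapse each
# (id, rn) pair into one row; MAX() ignores the NULL placeholder.
query = """
WITH cte AS (
    SELECT id, date AS dateA, NULL AS dateB,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
    FROM TableA
    UNION ALL
    SELECT id, NULL, date,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
    FROM TableB
)
SELECT id, MAX(dateA) AS dateA, MAX(dateB) AS dateB
FROM cte
GROUP BY id, rn
ORDER BY id, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With pandas, the same query string can be passed to pd.read_sql_query(query, conn) to get a DataFrame instead of a list of tuples.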

Related

How do I group aggregated data a certain way

I have the following sample transactional item receipt data, consisting of Item, Vendor and Receipt Date:
Item  Vendor  Receipt_Date
----  ------  -----------------------
A     1       2021-01-01 00:00:00.000
A     2       2021-01-31 00:00:00.000
B     1       2021-02-01 00:00:00.000
B     2       2021-02-10 00:00:00.000
B     3       2021-02-20 00:00:00.000
C     7       2021-03-01 00:00:00.000
I want to select the Vendor for each Item, based on the last (max) Receipt Date, so the expected result for the above sample would be:
Item  Last_Vendor_For_Receipt
----  -----------------------
A     2
B     3
C     7
I can group the data per Item and Vendor, but I cannot figure out how to achieve the above expected result with an outer query. I'm using SQL Server 2012. Here's the initial query:
select
ir.Item
,ir.Vendor
,max(ir.Receipt_Date) Last_Receipt_Date
from
ItemReceipt ir
group by
ir.Item
,ir.Vendor
I checked online and in the forum, but it was hard to search for my specific question.
Thanks
Here is one approach using TOP with ROW_NUMBER:
SELECT TOP 1 WITH TIES *
FROM yourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Receipt_Date DESC);
First you select the desired max date per item:
select max(Receipt_Date) as max_rcpt_date
, Item
from your_unknown_table
group by Item
And then you can use this as a subquery to get the vendor:
select Item
, Vendor
from your_unknown_table
where ( Receipt_Date, Item ) in
( select max(Receipt_Date) as max_rcpt_date
, Item
from your_unknown_table
group by Item
)
This will work in Oracle. SQL Server does not accept a multi-column IN like this, so there the subquery has to be joined on both columns (or used with EXISTS) instead.
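For anyone who wants to check the "max date per item, then look up the vendor" idea without a multi-column IN, here is a small sketch that joins the grouped subquery back to the table. It runs against SQLite through Python's sqlite3 module purely for convenience; the table name ItemReceipt comes from the question, and the alias names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemReceipt (Item TEXT, Vendor INTEGER, Receipt_Date TEXT);
    INSERT INTO ItemReceipt VALUES
        ('A', 1, '2021-01-01 00:00:00.000'),
        ('A', 2, '2021-01-31 00:00:00.000'),
        ('B', 1, '2021-02-01 00:00:00.000'),
        ('B', 2, '2021-02-10 00:00:00.000'),
        ('B', 3, '2021-02-20 00:00:00.000'),
        ('C', 7, '2021-03-01 00:00:00.000');
""")

# Step 1 (subquery m): latest receipt date per item.
# Step 2 (join): pick the row(s) whose date equals that maximum.
query = """
SELECT ir.Item, ir.Vendor AS Last_Vendor_For_Receipt
FROM ItemReceipt ir
JOIN (
    SELECT Item, MAX(Receipt_Date) AS max_rcpt_date
    FROM ItemReceipt
    GROUP BY Item
) m ON m.Item = ir.Item AND m.max_rcpt_date = ir.Receipt_Date
ORDER BY ir.Item
"""
last_vendors = conn.execute(query).fetchall()
print(last_vendors)
```

If two vendors share the same latest date for an item, this form returns both rows, just like the IN version would.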

Select earliest date and count rows in table with duplicate IDs

I have a table called table1:
id created_date
1001 2020-06-01
1001 2020-01-01
1001 2020-07-01
1002 2020-02-01
1002 2020-04-01
1003 2020-09-01
I'm trying to write a query that provides me a list of distinct IDs with the earliest created_date they have, along with the count of rows each id has:
id created_date count
1001 2020-01-01 3
1002 2020-02-01 2
1003 2020-09-01 1
I managed to write a window function to grab the earliest date, but I'm having trouble figuring out where to fit the count statement in one:
SELECT
id,
created_date
FROM ( SELECT
id,
created_date,
row_number() OVER(PARTITION BY id ORDER BY created_date) as row_num
FROM table1
) AS a
WHERE row_num = 1
You would use aggregation:
select id, min(created_date), count(*)
from table1
group by id;
I find it amusing that you want to use window functions -- which are considered more advanced -- when lowly aggregation suffices.
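A quick way to confirm the aggregation gives the expected result is to run it on the sample data. The sketch below uses Python's sqlite3 module as a stand-in database; the column aliases and the ORDER BY are additions for readable, deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, created_date TEXT);
    INSERT INTO table1 VALUES
        (1001, '2020-06-01'), (1001, '2020-01-01'), (1001, '2020-07-01'),
        (1002, '2020-02-01'), (1002, '2020-04-01'),
        (1003, '2020-09-01');
""")

# One row per id: earliest date plus the number of rows for that id.
summary = conn.execute("""
    SELECT id, MIN(created_date) AS created_date, COUNT(*) AS cnt
    FROM table1
    GROUP BY id
    ORDER BY id
""").fetchall()
print(summary)
```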

Getting two rows after nth row for each user

My table structure:
consumer_id, signup_date, plan_id, subscription_date
It has multiple subscription_dates for the same consumer_id.
I wish to get results only for those users who have at least two rows of data.
For each such user I need the top two rows ordered by subscription_date.
Then I want another result set for all users who have at least three rows of data.
For each of those users I need the second and third rows,
and so on.
I have a feeling it's something similar to this, but I could not get it to work in my case.
Update:
Sample table data:
1 1/1/2015 1 3/1/2015
2 1/1/2015 1 3/1/2015
2 1/1/2015 1 4/1/2015
3 1/1/2015 1 6/1/2015
2 1/1/2015 1 6/1/2015
3 1/1/2015 1 7/1/2015
Sample Output1:
2 1/1/2015 1 3/1/2015
2 1/1/2015 1 4/1/2015
3 1/1/2015 1 6/1/2015
3 1/1/2015 1 7/1/2015
Sample Output2:
2 1/1/2015 1 4/1/2015
2 1/1/2015 1 6/1/2015
This should answer the first part. For a complete answer, please share more details about the query (some expected output would be great).
Note: I am assuming #test is your main table.
select * from
(
SELECT a.consumer_id, a.signup_date, a.plan_id, a.subscription_date
,RANK() OVER
(PARTITION BY a.consumer_id ORDER BY a.subscription_date ASC) AS Rank1
FROM #test a
where a.consumer_id in
(
select consumer_id as count from #test
group by consumer_id
having count(consumer_id)>=2
)
) as b
where b.Rank1<=2
For the second part.
select * from #test
select * from
(
SELECT a.consumer_id, a.signup_date, a.plan_id, a.subscription_date
,RANK() OVER
(PARTITION BY a.consumer_id ORDER BY a.subscription_date ASC) AS Rank1
FROM #test a
where a.consumer_id in
(
select consumer_id as count from #test
group by consumer_id
having count(consumer_id)>=3
)
) as b
where b.Rank1 between 2 and 3
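The two queries differ only in the row positions they keep, so they can be folded into one parameterised query. The sketch below is a variation, not the answer's exact method: it uses ROW_NUMBER() (so ties on subscription_date cannot return extra rows, which RANK() can) and COUNT(*) OVER in place of the IN subquery. It runs on SQLite via Python's sqlite3; the table name test, the helper rows_from_nth, and the reading of the sample dates as month/day/year rewritten to yyyy-mm-dd are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (consumer_id INTEGER, signup_date TEXT,
                       plan_id INTEGER, subscription_date TEXT);
    INSERT INTO test VALUES
        (1, '2015-01-01', 1, '2015-03-01'),
        (2, '2015-01-01', 1, '2015-03-01'),
        (2, '2015-01-01', 1, '2015-04-01'),
        (3, '2015-01-01', 1, '2015-06-01'),
        (2, '2015-01-01', 1, '2015-06-01'),
        (3, '2015-01-01', 1, '2015-07-01');
""")

QUERY = """
SELECT consumer_id, signup_date, plan_id, subscription_date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY consumer_id
                              ORDER BY subscription_date) AS rn,
           COUNT(*) OVER (PARTITION BY consumer_id) AS cnt
    FROM test t
) ranked
WHERE cnt >= :n + 1              -- user has at least n+1 rows
  AND rn BETWEEN :n AND :n + 1   -- keep rows n and n+1
ORDER BY consumer_id, subscription_date
"""

def rows_from_nth(connection, n):
    """Return rows n and n+1 for every consumer with at least n+1 rows."""
    return connection.execute(QUERY, {"n": n}).fetchall()

output1 = rows_from_nth(conn, 1)  # top two rows per user
output2 = rows_from_nth(conn, 2)  # second and third rows per user
print(output1)
print(output2)
```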

Select Most Recent Entry in SQL

I'm trying to select the most recent non zero entry from my data set in SQL. Most examples of this are satisfied with returning only the date and the group by variables, but I would also like to return the relevant Value. For example:
ID Date Value
----------------------------
001 2014-10-01 32
001 2014-10-05 10
001 2014-10-17 0
002 2014-10-03 17
002 2014-10-20 60
003 2014-09-30 90
003 2014-10-10 7
004 2014-10-06 150
005 2014-10-17 0
005 2014-10-18 9
Using
SELECT ID, MAX(Date) AS MDate FROM Table WHERE Value > 0 GROUP BY ID
Returns:
ID Date
-------------------
001 2014-10-05
002 2014-10-20
003 2014-10-10
004 2014-10-06
005 2014-10-18
But whenever I try to include Value as one of the selected variables, SQLServer results in an error:
"Column 'Value' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause."
My desired result would be:
ID Date Value
----------------------------
001 2014-10-05 10
002 2014-10-20 60
003 2014-10-10 7
004 2014-10-06 150
005 2014-10-18 9
One solution I have thought of would be to look up the results back in the original Table and return the Value that corresponds to the relevant ID & Date (I have already trimmed down and so I know these are unique), but this seems to me like a messy solution. Any help on this would be appreciated.
NOTE: I do not want to group by Value as this is the result I am trying to pull out in the end (i.e. for each ID, I want the most recent Value). Further Example:
ID Date Value
----------------------------
001 2014-10-05 10
001 2014-10-06 10
001 2014-10-10 10
001 2014-10-12 8
001 2014-10-18 0
Here, I only want the last non zero entry. (001, 2014-10-12, 8)
SELECT ID, MAX(Date) AS MDate, Value FROM Table WHERE Value > 0 GROUP BY ID, Value
Would return:
ID Date Value
----------------------------
001 2014-10-10 10
001 2014-10-12 8
This can also be done using a window function, which is very often faster than a join on a grouped query:
select id, date, value
from (
select id,
date,
value,
row_number() over (partition by id order by date desc) as rn
from the_table
where value > 0
) t
where rn = 1
order by id;
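As a sanity check of the window-function approach against the question's sample data, here is a short sketch run on SQLite through Python's sqlite3 module (an assumption; the question is about SQL Server). The zero rows are filtered out before numbering so that the latest non-zero entry gets rn = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (id TEXT, date TEXT, value INTEGER);
    INSERT INTO the_table VALUES
        ('001', '2014-10-01', 32), ('001', '2014-10-05', 10), ('001', '2014-10-17', 0),
        ('002', '2014-10-03', 17), ('002', '2014-10-20', 60),
        ('003', '2014-09-30', 90), ('003', '2014-10-10', 7),
        ('004', '2014-10-06', 150),
        ('005', '2014-10-17', 0),  ('005', '2014-10-18', 9);
""")

# Number the non-zero rows of each id from newest to oldest, keep the newest.
latest = conn.execute("""
    SELECT id, date, value
    FROM (
        SELECT id, date, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
        FROM the_table
        WHERE value > 0
    ) t
    WHERE rn = 1
    ORDER BY id
""").fetchall()
for row in latest:
    print(row)
```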
Assuming you don't have repeated dates for the same ID in the table, this should work:
SELECT A.ID, A.Date, A.Value
FROM
T1 AS A
INNER JOIN (SELECT ID,MAX(Date) AS Date FROM T1 WHERE Value > 0 GROUP BY ID) AS B
ON A.ID = B.ID AND A.Date = B.Date
select a.id, a.date, a.value from Table1 a inner join (
select id, max(date) mydate from table1
where Value>0 group by ID) b on a.ID=b.ID and a.Date=b.mydate
Using a correlated subquery:
SELECT ID, Date AS MDate, VALUE
FROM table t1
where date = (Select max(date)
from table t2
where Value >0
and t1.id = t2.id
)
The answers provided are perfectly adequate, but here is a version using a CTE:
;WITH cteTable
AS
(
SELECT
Table.ID [ID], MAX(Date) [MaxDate]
FROM
Table
WHERE
Table.Value > 0
GROUP BY
Table.ID
)
SELECT
cteTable.ID, cteTable.MaxDate, Table.Value
FROM
Table INNER JOIN cteTable ON (Table.ID = cteTable.ID AND Table.Date = cteTable.MaxDate)

fill in a null cell with cell from previous record

Hi I am using DB2 sql to fill in some missing data in the following table:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL NULL NULL
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL NULL NULL
Person 2 has lived in 3 houses, but for the middle address neither the house nor the dates are known. I can't do anything about which house they were in, but I would like to use the previous row's To date to replace the NULL From date, and the next row's From date to replace the NULL To date, i.e.
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL 2012-09-27 2004-01-01
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL 2009-08-01 9999-01-01
I understand that if there is no previous address before a null address, that will have to stay null, but if a null address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This seems to me like the type of problem where set theory is no longer a good solution; however, I am required to find a DB2 solution because that's what my boss uses!
Any pointers/suggestions welcome.
Thanks.
It might look something like this:
select
person,
house,
coalesce(from_date, prev_to_date) from_date,
case when rn = 1 then coalesce (to_date, '9999-01-01')
else coalesce(to_date, next_from_date) end to_date
from
(select person, house, from_date, to_date,
lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
row_number() over (partition by person order by from_date desc nulls last) rn
from temp
) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
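To illustrate why an ordering column matters, here is a sketch of the LAG/LEAD idea with a hypothetical seq column that records the row order for each person. The seq column is not in the question; it is an assumption, as is running this on SQLite (3.25+) via Python's sqlite3 instead of DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- seq is a hypothetical column giving the known order of the rows.
    CREATE TABLE temp (seq INTEGER, person INTEGER, house INTEGER,
                       from_date TEXT, to_date TEXT);
    INSERT INTO temp VALUES
        (1, 1, 586,  '2000-04-16', '2010-12-03'),
        (2, 2, 123,  '2001-01-01', '2012-09-27'),
        (3, 2, NULL, NULL,         NULL),
        (4, 2, 104,  '2004-01-01', '2012-11-24'),
        (5, 3, 987,  '1999-12-31', '2009-08-01'),
        (6, 3, NULL, NULL,         NULL);
""")

# Missing From date: previous row's To date (stays NULL if there is none).
# Missing To date: next row's From date, or 9999-01-01 when no next row exists.
filled = conn.execute("""
    SELECT person,
           house,
           COALESCE(from_date,
                    LAG(to_date) OVER (PARTITION BY person ORDER BY seq)) AS from_date,
           COALESCE(to_date,
                    LEAD(from_date) OVER (PARTITION BY person ORDER BY seq),
                    '9999-01-01') AS to_date
    FROM temp
    ORDER BY person, seq
""").fetchall()
for row in filled:
    print(row)
```

Ordering by seq keeps person 2's unknown address between the two known ones, which ordering by from_date alone cannot do.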
create table Temp
(
person varchar(2),
house int,
from_date date,
to_date date
);
insert into temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL);
select A.person,
A.house,
isnull(A.from_date,BF.to_date) From_date,
isnull(A.to_date,isnull(CT.From_date,'9999-01-01')) To_date
from
((select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
on A.person = BF.person and
A.rownum = BF.rownum + 1)left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
on A.person = CT.person and
A.rownum = CT.rownum - 1