Join the records with SQL?

I have a table with these 6 rows.
code | type_id | status
-----+--------+--------
123 | 123456 | DONE
123 | 456789 | DONE
321 | 654321 | DONE
321 | 897321 | DONE
456 | 999888 | DONE
456 | 777666 | FAIL
I want to turn it into the result below, using the DONE rows only.
code | type_id1 | type_id2
-----+----------+---------
123 | 123456 | 456789
321 | 654321 | 897321
456 | 999888 | null
How can I join them to show the result?

You can use aggregation:
select
code,
min(type_id) as type_id1,
case when count(*) > 1 then max(type_id) end as type_id2
from mytable
where status = 'DONE'
group by code
Note that this only works as expected if a code has 1 or 2 DONE rows.

If I understand correctly that you want one row per code, you can use aggregation:
select code,
min(type_id) as type_id1,
(case when min(type_id) <> max(type_id) then max(type_id) end) as type_id2
from t
where status = 'DONE'
group by code;
Note that SQL tables represent unordered sets. With your sample data, there is no way to preserve "the original" order of the values, because that is undefined -- unless another column specifies that ordering.
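To try the aggregation above against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The table name t follows the answer; running it in SQLite rather than your actual database is my assumption, so treat it as a quick check and not a drop-in.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER, type_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (123, 123456, "DONE"),
        (123, 456789, "DONE"),
        (321, 654321, "DONE"),
        (321, 897321, "DONE"),
        (456, 999888, "DONE"),
        (456, 777666, "FAIL"),
    ],
)

# Filter to DONE first, then fold each code into a single row.
# type_id2 stays NULL when a code has only one DONE row.
pivot_sql = """
SELECT code,
       MIN(type_id) AS type_id1,
       CASE WHEN MIN(type_id) <> MAX(type_id) THEN MAX(type_id) END AS type_id2
FROM t
WHERE status = 'DONE'
GROUP BY code
ORDER BY code
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

Code 456 ends up with a NULL second column because its FAIL row is filtered out before grouping.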

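You can also use a LEFT JOIN. Start from the smallest DONE type_id of each code and join the larger one to it, so that each code appears only once:
SELECT
A.code,
A.type_id AS type_id1,
B.type_id AS type_id2
FROM mytable A
LEFT JOIN
mytable B ON A.code = B.code AND A.type_id < B.type_id AND A.status = B.status
WHERE A.status = 'DONE'
AND NOT EXISTS (
SELECT 1 FROM mytable C
WHERE C.code = A.code AND C.status = 'DONE' AND C.type_id < A.type_id
)
As with the aggregation, this assumes a code has at most two DONE rows.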

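You can also number the DONE rows of each code in a CTE and pivot on that number:
with cte as (
select code, type_id,
row_number() over (partition by code order by type_id) as rn
from mytable
where status = 'DONE'
)
select code,
max(case when rn = 1 then type_id end) as type_id1,
max(case when rn = 2 then type_id end) as type_id2
from cte
group by code;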

Related

How can I drop a row if repeated by category?

How would I go about dropping duplicate rows while keeping one row per item based on its category?
For example, let's consider a sample table:
Item | location | Status
------------------------
123 | A | done
123 | A | not_done
123 | B | Other
435 | D | Other
What I want to end up with is this table:
Item | location | Status
------------------------
123 | A | done
435 | D | Other
If an item has a row with status done, I am not interested in its other statuses or locations. If it has no done row, I would show the next row instead.
Any clues if it is possible to create something like this in an SQL query?
Rank the rows within each item so that a done row, if there is one, comes first, then keep only the first row:
select * except (rn)
from (
select item, location, status
, row_number() over (
partition by item
order by case status when 'done' then 0 else 1 end
) as rn
from t
)
where rn = 1
(I didn't try it, excuse syntax errors please.)
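If you want to check the row_number() idea above without a BigQuery project, here is a rough sketch in Python with sqlite3 (it needs an SQLite build with window functions, 3.25 or later). SQLite has no `select * except`, so the columns are listed explicitly. The extra tie-breakers location, status in the ORDER BY are my addition to make the pick deterministic; they are not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item INTEGER, location TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (123, "A", "done"),
        (123, "A", "not_done"),
        (123, "B", "Other"),
        (435, "D", "Other"),
    ],
)

# Number the rows of each item so that a 'done' row sorts first,
# then keep only row number 1 per item.
dedupe_sql = """
SELECT item, location, status
FROM (
    SELECT item, location, status,
           ROW_NUMBER() OVER (
               PARTITION BY item
               ORDER BY CASE status WHEN 'done' THEN 0 ELSE 1 END,
                        location, status  -- assumed tie-breakers
           ) AS rn
    FROM t
) AS ranked
WHERE rn = 1
ORDER BY item
"""
kept = conn.execute(dedupe_sql).fetchall()
for row in kept:
    print(row)
```

Item 123 keeps its done row and item 435, which has no done row, keeps the only row it has.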
Yes, you can do it with a NOT EXISTS condition like the one below, after enabling standard SQL first. A row is kept if it is done itself, or if its item has no done row at all.
select * from
yourtable A
where A.status = 'done'
or not exists
(
select 1 from yourtable B
where A.item = B.item
and B.status = 'done'
)

Pulling multiple entries based on ROW_NUMBER

I got the row_num column from a partition. I want each Type to match with at least one Sent and one Resent. For example, Jon's row is removed below because there is no Resent. Kim's Sheet row is also removed because again, there is no Resent. I tried using a CTE to take all columns for a Code if row_num = 2 but Kim's Sheet row obviously shows up because they're all under one Code. If anyone could help, that'd be great!
Edit: I'm using SSMS 2018. There are multiple Statuses other than Sent and Resent.
What my table looks like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 123 | Jon | Sheet | Sent | 1 |
| 221 | Kim | Sheet | Sent | 1 |
| 221 | Kim | Book | Resent | 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
What I want it to look like:
+-------+--------+--------+---------+---------+
| Code | Name | Type | Status | row_num |
+-------+--------+--------+---------+---------+
| 221 | Kim | Book | Resent| 1 |
| 221 | Kim | Book | Sent | 2 |
| 221 | Kim | Book | Sent | 3 |
+-------+--------+--------+---------+---------+
Here is my CTE code:
WITH CTE AS
(
SELECT *
FROM #MyTable
)
SELECT *
FROM #MyTable
WHERE Code IN (SELECT Code FROM CTE WHERE row_num = 2)
If sent and resent are the only values for status, then you can use:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and
t2.type = t.type and
t2.status <> t.status
);
You can also phrase this with window functions:
select t.*
from (select t.*,
min(status) over (partition by name, type) as min_status,
max(status) over (partition by name, type) as max_status
from t
) t
where min_status <> max_status;
Both of these can be tweaked if other status values are possible. However, based on your question and sample data, that does not seem necessary.
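Since the edit in the question says other statuses do exist, here is one possible tweak, sketched in Python with sqlite3: check for a Sent row and a Resent row explicitly instead of comparing statuses with each other. The table name t comes from the answer; the Draft row is made up purely to show the difference, and grouping by code and type is my assumption about what identifies a group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (code INTEGER, name TEXT, type TEXT, status TEXT, row_num INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (123, "Jon", "Sheet", "Sent", 1),
        (221, "Kim", "Sheet", "Sent", 1),
        (221, "Kim", "Sheet", "Draft", 2),  # hypothetical extra status, not in the question
        (221, "Kim", "Book", "Resent", 1),
        (221, "Kim", "Book", "Sent", 2),
        (221, "Kim", "Book", "Sent", 3),
    ],
)

# Keep every row whose (code, type) group has at least one Sent
# and at least one Resent row, whatever other statuses exist.
both_sql = """
SELECT t.code, t.name, t.type, t.status, t.row_num
FROM t
WHERE EXISTS (SELECT 1 FROM t AS s
              WHERE s.code = t.code AND s.type = t.type AND s.status = 'Sent')
  AND EXISTS (SELECT 1 FROM t AS r
              WHERE r.code = t.code AND r.type = t.type AND r.status = 'Resent')
ORDER BY t.code, t.type, t.row_num
"""
matched = conn.execute(both_sql).fetchall()
for row in matched:
    print(row)
```

With the Draft row present, a plain "statuses differ" test would keep Kim's Sheet rows too, while the explicit check drops them because there is no Resent.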
Here is a version with a setup script that looks for the opposite status explicitly:
CREATE TABLE Table1(Code integer,Name VARCHAR(10),Type VARCHAR(10),Status VARCHAR(10),row_num integer);
INSERT INTO Table1 VALUES
(123,'Jon','Sheet','Sent',1),
(221,'Kim','Sheet','Sent',1),
(221,'Kim','Book','Resent',1),
(221,'Kim','Book','Sent',2),
(221,'Kim','Book','Sent',3);
SELECT t1.*
FROM Table1 t1
WHERE EXISTS (
select 1
from Table1 t2
where t2.Name=t1.Name
and t2.Type=t1.Type
and t2.Status = case when t1.Status='Sent'
then 'Resent'
else 'Sent' end)
It would be easier if you provided a script that creates the table and loads this test data, but try something like:
with a1 as (
select
name, type,
dense_rank() over (partition by code, name, type order by status) as rn
from #MyTable
), a2 as (
select distinct name, type from a1 where rn > 1
)
select t.*
from #MyTable as t
inner join a2 on t.name = a2.name and t.type = a2.type;
Here you
rank the distinct statuses within each code, name and type, so a rank above 1 means the group has more than one status,
then keep the groups that reach a rank above 1,
and finally join those back to the original table to get the rows you are interested in.
The syntax may vary on MSSQL, but give it a try. And please use better names than mine ;-)
This solution is quite generic because it doesn't rely on specific status values; none are hardcoded. You can easily control what matters by changing the partitions.

MAX plus other data with JOIN

This should be a simple one I reckon (at least I thought it would be when I started doing it a couple hours ago).
I am trying to select the MAX value from one table and join it with another to get the pertinent data.
I have two tables: ACCOUNTS & ACCOUNT_BALANCES
Here they are:
ACCOUNTS
ACC_ID | NAME | IMG_LOCATION
------------------------------------
0 | Cash | images/cash.png
500 | MyBank | images/mybank.png
and
ACCOUNT_BALANCES
ACC_ID | BALANCE | UPDATE_DATE
-------------------------------
500 | 100 | 2017-11-10
500 | 250 | 2018-01-11
0 | 100 | 2018-01-05
I would like the end result to look like:
ACC_ID | NAME | IMG_LOCATION | BALANCE | UPDATE_DATE
----------------------------------------------------------------
0 | Cash | images/cash.png | 100 | 2018-01-05
500 | MyBank | images/mybank.png | 250 | 2018-01-11
I thought I could select the MAX(UPDATE_DATE) from the ACCOUNT_BALANCES table, and join with the ACCOUNTS table to get the account name (as displayed above), but having to group by means my end result includes all records from the ACCOUNT_BALANCES table.
I can use this query to select only the records from ACCOUNT_BALANCES with the max UPDATE_DATE, but I can't include the balance.
SELECT
a.ACC_ID,
a.IMG_LOCATION,
a.NAME,
x.UDATE
FROM
ACCOUNTS a
RIGHT JOIN
(
SELECT
b.ACC_ID,
MAX(b.UPDATE_DATE) as UDATE
FROM
ACCOUNT_BALANCES b
GROUP BY
b.ACC_ID
) x
ON
a.ACC_ID = x.ACC_ID
If I include ACCOUNT_BALANCES.BALANCE in the above query (like so):
SELECT
a.ACC_ID,
a.IMG_LOCATION,
a.NAME,
x.BALANCE,
x.UDATE
FROM
ACCOUNTS a
RIGHT JOIN
(
SELECT
b.ACC_ID,
b.BALANCE,
MAX(b.UPDATE_DATE) as UDATE
FROM
ACCOUNT_BALANCES b
GROUP BY
b.ACC_ID, b.BALANCE
) x
ON
a.ACC_ID = x.ACC_ID
The results returned look like this:
ACC_ID | NAME | IMG_LOCATION | BALANCE | UPDATE_DATE
----------------------------------------------------------------
0 | Cash | images/cash.png | 100 | 2018-01-05
500 | MyBank | images/mybank.png | 100 | 2017-11-10
500 | MyBank | images/mybank.png | 250 | 2018-01-11
Which is obviously the case, since I'm grouping by BALANCE in the subquery.
I'm super hesitant to post this, as it seems exactly the kind of question that's been answered n times, but I've searched around a lot and couldn't find anything that really helped me.
This one has a really good answer, but isn't exactly what I'm looking for
I'm obviously missing something really simple and even pointers in the right direction will help. Thank you.
Try this:
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY AC.ACC_ID ORDER BY AB.UPDATE_DATE DESC),
AC.ACC_ID,
NAME,
IMG_LOCATION,
BALANCE,
UPDATE_DATE
FROM ACCOUNTS AC
INNER JOIN ACCOUNT_BALANCES AB
ON AC.ACC_ID = AB.ACC_ID
)
SELECT
*
FROM CTE
WHERE RN = 1
I think the simplest method is outer apply:
select a.*, ab.*
from accounts a outer apply
(select top 1 ab.*
from account_balances ab
where ab.acc_id = a.acc_id
order by ab.update_date desc
) ab;
apply implements what is technically known as a "lateral join". This is a very powerful type of join -- something like a generalization of correlated subqueries.
The accepted answer will work. However, on large data sets row_number() can be slow, so it may be worth avoiding. Instead, you can try something like this (assuming acc_id and update_date together are unique; otherwise a tie on the latest date returns several rows for that account):
SELECT
a.Acc_Id,
a.Name,
a.Img_Location,
bDetail.Balance,
bDetail.Update_Date
FROM
#accounts AS a LEFT JOIN
(
SELECT Acc_Id, MAX(Update_Date) AS Update_Date
FROM #account_balances AS b
GROUP BY Acc_Id
) AS maxDate ON a.Acc_Id = maxDate.Acc_Id
LEFT JOIN #account_balances AS bDetail ON maxDate.Acc_Id = bDetail.Acc_Id AND
maxDate.Update_Date = bDetail.Update_Date
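Here is a small runnable check of the "join back on the max date" approach above, sketched in Python with sqlite3. The table and column names follow the question; storing the dates as ISO text (so that MAX compares them correctly) and the alias latest are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acc_id INTEGER, name TEXT, img_location TEXT)")
conn.execute(
    "CREATE TABLE account_balances (acc_id INTEGER, balance INTEGER, update_date TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(0, "Cash", "images/cash.png"), (500, "MyBank", "images/mybank.png")],
)
conn.executemany(
    "INSERT INTO account_balances VALUES (?, ?, ?)",
    [(500, 100, "2017-11-10"), (500, 250, "2018-01-11"), (0, 100, "2018-01-05")],
)

# Step 1: find the latest update_date per account.
# Step 2: join back to account_balances on (acc_id, update_date)
#         to pick up the balance that belongs to that date.
latest_sql = """
SELECT a.acc_id, a.name, a.img_location, b.balance, b.update_date
FROM accounts AS a
LEFT JOIN (
    SELECT acc_id, MAX(update_date) AS update_date
    FROM account_balances
    GROUP BY acc_id
) AS latest
       ON latest.acc_id = a.acc_id
LEFT JOIN account_balances AS b
       ON b.acc_id = latest.acc_id
      AND b.update_date = latest.update_date
ORDER BY a.acc_id
"""
result = conn.execute(latest_sql).fetchall()
for row in result:
    print(row)
```

The key point is that BALANCE never appears in the GROUP BY; it is fetched afterwards by the second join.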

Using RANK in HiveQL, dynamic limits

Trying to perform a dynamic limit in hive sql via the rank function.
PROBLEM:
I want to use the limit from table A against table B to create the output. Example below.
TABLE A:
ID | Limit
------------
123 | 1
456 | 3
789 | 2
TABLE B:
ID | User
-------
123 | ABC
123 | DEF
123 | GHI
456 | JKL
456 | MNO
789 | PQR
789 | RST
OUTPUT:
ID | User
----------
123 | ABC
456 | JKL
456 | MNO
789 | PQR
789 | RST
Unfortunately you cannot do a dynamic limit (as far as I know) in hive sql. So I was trying to use rank. My current query looks like this:
SELECT c.id, c.users, c.rnk
FROM (
SELECT b.id, b.user, a.limit, rank() over (ORDER BY b.id DESC) as rnk
FROM a JOIN b
ON a.id = b.id
) c
WHERE rnk < c.limit;
Currently I get the error:
ParseException line 3:9 cannot recognize input near 'rank' '(' ')' in from source 0
Any ideas why? Or maybe a better approach?
Thanks!
SELECT c.id, c.user, c.rn
FROM (
SELECT b.id, b.user, a.limit, row_number() over (PARTITION BY b.id ORDER BY b.user) as rn
FROM a JOIN b
ON a.id = b.id
) c
WHERE c.rn <= c.limit;
In the query above, row_number() numbers the rows within each id after the join, and the filter in the WHERE clause then acts as the limit. The ORDER BY only decides which rows are kept, so replace b.user with whatever rule you prefer. If your Hive version treats user or limit as reserved words, quote them with backticks.
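The same per-id limit can be tried outside Hive. Below is a minimal sketch in Python with sqlite3 (window functions need SQLite 3.25 or later). I renamed the columns to row_limit and user_name to sidestep keyword clashes; those names are mine, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, row_limit INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER, user_name TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(123, 1), (456, 3), (789, 2)])
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [
        (123, "ABC"), (123, "DEF"), (123, "GHI"),
        (456, "JKL"), (456, "MNO"),
        (789, "PQR"), (789, "RST"),
    ],
)

# Number the users within each id, then keep only as many rows
# as that id's row_limit allows.
limit_sql = """
SELECT id, user_name
FROM (
    SELECT b.id, b.user_name, a.row_limit,
           ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.user_name) AS rn
    FROM a
    JOIN b ON a.id = b.id
) AS c
WHERE rn <= row_limit
ORDER BY id, user_name
"""
limited = conn.execute(limit_sql).fetchall()
for row in limited:
    print(row)
```

Id 456 has a limit of 3 but only two users, so both are returned.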

How do I join to another table and return only the most recent matching row?

I have a table that stores the lines on a contract. Each contract line has its own unique ID, and it also has the ID of its parent contract. Example:
+-------------+---------+
| contract_id | line_id |
+-------------+---------+
| 1111 | 100 |
| 1111 | 101 |
| 1111 | 102 |
+-------------+---------+
I have another table that stores the historical changes to contract lines. For example, every time the number of units on a contract line is changed a new row is added to the table. Example:
+-------------+---------+--------------+-------+
| contract_id | line_id | date_changed | units |
+-------------+---------+--------------+-------+
| 1111 | 100 | 2016-01-01 | 1 |
| 1111 | 100 | 2016-02-01 | 2 |
| 1111 | 100 | 2016-03-01 | 3 |
+-------------+---------+--------------+-------+
As you can see the contract line with ID 100 belonging to the contract with ID 1111 has been edited 3 times over 3 months. The current value is 3 units.
I'm running a query against the contract lines table to select all data. I want to join to the historical data table and select the most recent row for each contract line and show the units in my results. How do I do this?
Expected results (there would be single results for 101 and 102 as well):
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 3 |
+-------------+---------+-------+
I've tried the query below with a left join but it returns 3 rows instead of 1.
Query:
SELECT *, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id, units) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
Actual results:
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 1 |
| 1111 | 100 | 2 |
| 1111 | 100 | 3 |
+-------------+---------+-------+
An extra join to contract_history on maxdate will work:
SELECT contract_lines.*,T2.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id) AS T1
JOIN contract_history T2 ON
T1.contract_id=T2.contract_id and
T1.line_id= T2.line_id and
T1.maxdate=T2.date_changed
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
This is my preferred style because it doesn't require self joining and cleanly expresses your intent. Also, it competes very well with the ROW_NUMBER() method in terms of performance.
select cl.*
, h.units
from contract_lines as cl
join (
select ch.contract_id
, ch.line_id
, ch.units
, ch.date_changed
, Max(ch.date_changed) over(partition by ch.contract_id, ch.line_id) as max_date_changed
from contract_history as ch
) as h
on cl.contract_id = h.contract_id
and cl.line_id = h.line_id
and h.date_changed = h.max_date_changed;
Here is another possible solution, which uses RANK to sort and filter the history. It is similar to what you did, just a different tack.
SELECT contract_lines.*, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units,
RANK() OVER (PARTITION BY contract_id, line_id ORDER BY date_changed DESC) AS [rank]
FROM contract_history) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
AND T1.[rank] = 1
WHERE T1.units IS NOT NULL
You could change this to an INNER JOIN and remove the IS NOT NULL in the WHERE clause if you expect data to be present all the time.
Glad you figured it out!
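Try this simple query, which uses TOP 1 WITH TIES to keep the most recent history row for each contract line:
SELECT TOP 1 WITH TIES T1.*
FROM contract_lines T0
INNER JOIN contract_history T1
ON T0.contract_id = T1.contract_id and
T0.line_id = T1.line_id
ORDER BY ROW_NUMBER() OVER (PARTITION BY T1.contract_id, T1.line_id ORDER BY T1.date_changed DESC)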
As always seems to be the way, after spending an hour looking at it (and shouting at StackOverflow for having a rare period of maintenance), I solved my own problem not long after posting the question.
In an effort to help anyone else who's stuck, I'll show what I found. It might not be an efficient way to achieve this, so if someone has a better suggestion I'm all ears.
I adapted the answer from here: T-SQL Subquery Max(Date) and Joins
SELECT *,
Units = (SELECT TOP 1 units
FROM contract_history
WHERE contract_lines.contract_id = contract_history.contract_id
AND contract_lines.line_id = contract_history.line_id
ORDER BY date_changed DESC
)
FROM ....
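For anyone who wants to play with the correlated-subquery idea outside SQL Server, here is a rough equivalent sketched in Python with sqlite3. SQLite has no TOP, so LIMIT 1 stands in for TOP 1; the aliases cl and h are mine, and the data is just the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER)")
conn.execute(
    "CREATE TABLE contract_history "
    "(contract_id INTEGER, line_id INTEGER, date_changed TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO contract_lines VALUES (?, ?)",
    [(1111, 100), (1111, 101), (1111, 102)],
)
conn.executemany(
    "INSERT INTO contract_history VALUES (?, ?, ?, ?)",
    [
        (1111, 100, "2016-01-01", 1),
        (1111, 100, "2016-02-01", 2),
        (1111, 100, "2016-03-01", 3),
    ],
)

# For each contract line, look up the units of its newest history row.
# Lines without any history get NULL, like a LEFT JOIN would give.
latest_units_sql = """
SELECT cl.contract_id,
       cl.line_id,
       (SELECT h.units
        FROM contract_history AS h
        WHERE h.contract_id = cl.contract_id
          AND h.line_id = cl.line_id
        ORDER BY h.date_changed DESC
        LIMIT 1) AS units
FROM contract_lines AS cl
ORDER BY cl.line_id
"""
latest_units = conn.execute(latest_units_sql).fetchall()
for row in latest_units:
    print(row)
```

Line 100 shows 3 units; lines 101 and 102 come back with NULL here only because the sample history contains no rows for them.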