How to handle duplicate rows with a join - SQL

I have a table bic_table:

KeyInstn | SwiftBICCode
---------+-------------
100369   | BOFAUSV1
100369   | MLCOUS33

and a table keyInstn_table:

KeyInstn | country
---------+--------
100369   | USA
100370   | India

I am trying to join keyInstn_table with bic_table, with the SwiftBICCode values for each KeyInstn combined into one comma-separated column. How do I get a result like this?

KeyInstn | country | SwiftBICCode
---------+---------+------------------
100369   | USA     | BOFAUSV1,MLCOUS33
100370   | India   | BOFH76HG

If your database version is SQL Server 2017+, then you can use the following:
SELECT a.keyInstn, country, STRING_AGG(SwiftBICCode, ', ') AS SwiftBICCode
FROM tablename a
INNER JOIN keyInstn_table b ON a.keyInstn = b.KeyInstn
GROUP BY a.keyInstn, country
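If the order of the concatenated codes matters, STRING_AGG also accepts a WITHIN GROUP clause; a minimal sketch, assuming the first table is the question's bic_table:
SELECT a.KeyInstn, b.country,
       STRING_AGG(a.SwiftBICCode, ',') WITHIN GROUP (ORDER BY a.SwiftBICCode) AS SwiftBICCode
FROM bic_table a
INNER JOIN keyInstn_table b ON a.KeyInstn = b.KeyInstn
GROUP BY a.KeyInstn, b.country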
Alternatively, you can use stuff() for lower versions of SQL Server:
select u.keyInstn, country,
       stuff((select concat(',', SwiftBICCode)
              from tablename y
              where y.keyInstn = u.keyInstn
              for xml path('')), 1, 1, '')
from tablename u
inner join keyInstn_table b on u.keyInstn = b.KeyInstn
group by u.keyInstn, country
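Note that FOR XML PATH('') will entity-encode characters such as & or < if they appear in the codes; if that matters, the usual workaround is the TYPE directive with .value(), as a sketch:
stuff((select concat(',', SwiftBICCode)
       from tablename y
       where y.keyInstn = u.keyInstn
       for xml path(''), type).value('.', 'nvarchar(max)'), 1, 1, '')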

Related

SQL LEFT JOIN WITH SPLIT

I want to do a left join on a table where the format of the two columns is not the same. I use REPLACE to remove the "[ ]", but I'm having trouble splitting one of the rows into two rows so that I can complete the join.
emp_tbl
+--------+-------+
| emp    | state |
+--------+-------+
| Steve  | [1]   |
| Greg   | [2|3] |
| Steve  | [4]   |
+--------+-------+

state_tbl
+------+------+
| id   | name |
+------+------+
| 1    | AL   |
| 2    | NV   |
| 3    | AZ   |
| 4    | NH   |
+------+------+

Desired output:
+--------+------+
| Steve  | AL   |
| Greg   | NV   |
| Greg   | AZ   |
| Steve  | NH   |
+--------+------+
SELECT emp_tbl.emp, state_tbl.name
FROM emp_tbl
LEFT JOIN state_tbl on state_tbl.id = REPLACE(REPLACE(emp_tbl.state, '[', ''), ']', '')
With this query I can remove the "[ ]" and do the join, but the row with two "states" obviously does not work.
Your query will never produce 4 rows because the left table only has 3 rows. You need to flatten the rows that contain multiple state_ids before the join.
Prepare the table and data:
create or replace table emp_tbl (emp varchar, state string);
create or replace table state_tbl (id varchar, name varchar);
insert into emp_tbl values
('Steve', '[1]'), ('Greg', '[2|3]'), ('Steve', '[4]');
insert into state_tbl values
(1, 'AL'), (2, 'NV'), (3, 'AZ'), (4, 'NH');
Then below query should give you the data you want:
with emp_tbl_tmp as (
    select emp, parse_json(replace(state, '|', ',')) as states
    from emp_tbl
),
flattened_tbl as (
    select emp, value as state_id
    from emp_tbl_tmp, table(flatten(input => states))
)
select emp, name
from flattened_tbl emp
left join state_tbl state on (emp.state_id = state.id);
Or if you want to save one step:
with flattened_emp_tbl as (
    select emp, value as state_id
    from emp_tbl,
         table(flatten(
             input => parse_json(replace(state, '|', ','))
         ))
)
select emp, name
from flattened_emp_tbl emp
left join state_tbl state
  on (emp.state_id = state.id);
here is how you can do it (stripping the brackets with REPLACE and splitting the remaining string on '|'):
select f.emp, state_tbl.name
from (
    select tw.emp, s.value as state_id
    from emp_tbl tw,
         lateral flatten(input => split(replace(replace(tw.state, '[', ''), ']', ''), '|')) s
) f
left join state_tbl
  on f.state_id = state_tbl.id

Redshift create all the combinations of any length for the values in one column

How can we create all the combinations of any length for the values in one column and return the distinct count of another column for that combination?
Table:
+------+--------+
| Type | Name   |
+------+--------+
| A    | Tom    |
| A    | Ben    |
| B    | Ben    |
| B    | Justin |
| C    | Ben    |
+------+--------+
Output Table:
+-------------+-------+
| Combination | Count |
+-------------+-------+
| A           | 2     |
| B           | 2     |
| C           | 1     |
| AB          | 3     |
| BC          | 2     |
| AC          | 2     |
| ABC         | 3     |
+-------------+-------+
When the combination is only A, there are Tom and Ben so it's 2.
When the combination is only B, 2 distinct names so it's 2.
When the combination is A and B, 3 distinct names: Tom, Ben, Justin so it's 3.
I'm working in Amazon Redshift. Thank you!
NOTE: This answers the original version of the question which was tagged Postgres.
You can generate all combinations with this code (t is the table from the question; joining the recursive member back to the distinct list td avoids duplicate combinations):
with recursive td as (
      select distinct type
      from t
     ),
     cte as (
      select td.type, td.type as lasttype, 1 as len
      from td
      union all
      select cte.type || td.type, td.type as lasttype, cte.len + 1
      from cte join
           td
           on td.type > cte.lasttype
     )
select type, len
from cte;
You can then use this in a join back against the original table to count the distinct names per combination:
with recursive td as (
      select distinct type
      from t
     ),
     cte as (
      select td.type, td.type as lasttype, 1 as len
      from td
      union all
      select cte.type || td.type, td.type as lasttype, cte.len + 1
      from cte join
           td
           on td.type > cte.lasttype
     )
select type, count(*)
from (select name, cte.type, count(*)
      from cte join
           t
           on cte.type like '%' || t.type || '%'
      group by name, cte.type
      having count(*) = length(cte.type)
     ) x
group by type
order by type;
There is no way to generate all possible combinations (A, B, C, AB, AC, BC, etc) in Amazon Redshift.
(Well, you could select each unique value, smoosh them into one string, send it to a User-Defined Function, extract the result into multiple rows and then join it against a big query, but that really isn't something you'd like to attempt.)
One approach would be to create a table containing all possible combinations; you'd need to write a little program to do that (eg using itertools in Python). Then, you could join the data against that table reasonably easily to get the desired result (eg testing whether 'ABC' contains the type with LIKE '%A%'), as sketched below.
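A minimal sketch of that join, assuming a hypothetical pre-generated combos lookup table and calling the question's table yourtable:
-- combos(combination) holds every pre-generated combination string, e.g. 'A', 'AB', 'ABC'
-- yourtable(type, name) is the source data from the question
SELECT c.combination,
       COUNT(DISTINCT t.name) AS name_count
FROM combos c
JOIN yourtable t
  ON c.combination LIKE '%' || t.type || '%'
GROUP BY c.combination
ORDER BY c.combination;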

TSQL Delete Duplicates in table after comparing results found in duplicate search

I have duplicate data in a single table.
Table Layout
accountNumber | firstName | lastName | address | zip
--------------+-----------+----------+---------+------
SMI2365894511 | Paul      | Smith    | 1245 Rd | 89120
SMI2365894511 | Paul      | Smith    |         |
I have the below query to find and display the duplicates.
select *
from tableA a
join (select accountNumber
from tableA
group by accountNumber
having count(*) > 1 ) b
on a.accountNumber = b.accountNumber
What I would like to do is compare the results of the above query and remove the duplicate that doesn't have any address information. I'm using MS SQL Server 2014
EDIT: I have the query written the way it is so I can see both duplicate rows.
delete a
from XmaCustomerDetails a
join ( select accountNumber
from XmaCustomerDetails
group by accountNumber
having count(*) > 1 ) b
on a.accountNumber = b.accountNumber
WHERE address is null
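If the missing address is stored as an empty string rather than NULL (as the blank cell in the sample suggests it might be), the same delete can be widened to catch both cases; a minimal variant:
delete a
from XmaCustomerDetails a
join ( select accountNumber
       from XmaCustomerDetails
       group by accountNumber
       having count(*) > 1 ) b
  on a.accountNumber = b.accountNumber
where a.address is null or a.address = ''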

Count by unique ID, group by another

I've inherited some scripts that count the number of people in a team by department; the current scripts create a table for each individual department and the previous user would copy/paste the data into Excel. I've been tasked to pull this report into SSRS so I need one table for all the departments by team.
Current Table
+------+-----------+---------+
| Dept | DataMatch | Team    |
+------+-----------+---------+
| 01   | 4687Joe   | Dodgers |
| 01   | 3498Cindy | RedSox  |
| 01   | 1057Bob   | Yankees |
| 01   | 0497Lucy  | Dodgers |
| 02   | 7934Jean  | Yankees |
| 02   | 4584Tom   | Dodgers |
+------+-----------+---------+
Desired Results
+------+---------+--------+---------+
| Dept | Dodgers | RedSox | Yankees |
+------+---------+--------+---------+
| 01   | 2       | 1      | 1       |
| 02   | 1       | 0      | 1       |
+------+---------+--------+---------+
The DataMatch field is the unique identifier I will be counting. I started by wrapping each department in a CTE; however, this results in the Dept as the column, which would not work for my report, so I need to transpose my results, and I haven't been able to figure that out. There are 60 departments, and my query was getting very long.
Current query
SELECT Dept, DataMatch, Team INTO #temp_Team
FROM TeamDatabase
WHERE Status = 14
AND Team <> 'Missing'
;WITH A_cte (Team, Dept01)
AS
(
SELECT Team
, COUNT(DISTINCT datamatch) AS 'Dept01'
FROM #temp_Team
WHERE Dept = '01'
GROUP BY Team
),
B_cte (Team, Dept02) AS
(
SELECT Team
, COUNT(DISTINCT datamatch) AS 'Dept02'
FROM #temp_Team
WHERE Dept = '02'
GROUP BY Team
)
SELECT A_cte.Team
, A_cte.Dept01
, B_cte.Dept02
FROM A_cte
INNER JOIN B_cte
ON A_cte.Team=B_cte.Team
Which results in:
+---------+--------+--------+
| Team    | Dept01 | Dept02 |
+---------+--------+--------+
| RedSox  | 144    | 141    |
| Yankees | 63     | 236    |
| Dodgers | 298    | 196    |
+---------+--------+--------+
I feel that using a pivot on my already very long query would be excessive and impact performance; there are 60 departments with over 30,000 rows.
What, most likely basic, step am I missing?
TL;DR - How do I count people by team and list by department?
I would replace the whole query with a dynamic pivot instead of adding a pivot to your CTEs.
You can add your Status/Team conditions to the SELECT inside the dynamic query at the bottom. They would be WHERE Status = 14 AND Team <> ''Missing'' (note the two single quotes needed to nest the literal within the string); a sketch is shown after the code below.
IF OBJECT_ID('tempdb..#data') IS NOT NULL DROP TABLE #data
CREATE TABLE #data (Dept VARCHAR(50), DataMatch NVARCHAR(50), Team VARCHAR(50))
INSERT INTO #data (Dept, DataMatch, Team)
VALUES ('01', '4687Joe','Dodgers'),
('01', '3498Cindy','RedSox'),
('01', '1057Bob','Yankees'),
('01', '0497Lucy','Dodgers'),
('02', '7934Jean','Yankees'),
('02', '4584Tom','Dodgers')
DECLARE @cols AS NVARCHAR(MAX),
        @sql  AS NVARCHAR(MAX)
SET @cols = STUFF(
    (SELECT N',' + QUOTENAME(y) AS [text()]
     FROM (SELECT DISTINCT Team AS y FROM #data) AS Y
     ORDER BY y
     FOR XML PATH('')),
    1, 1, N'');
SET @sql = 'SELECT Dept, '+@cols+'
    FROM (SELECT Dept, DataMatch, Team
          FROM #data D) SUB
    PIVOT (COUNT([DataMatch]) FOR Team IN ('+@cols+')) AS P'
PRINT @sql
EXEC (@sql)
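For example, a sketch of the @sql string with those conditions nested inside it, assuming the source is your TeamDatabase table rather than the #data sample and reusing the @cols built above (note the doubled single quotes around Missing):
SET @sql = 'SELECT Dept, '+@cols+'
    FROM (SELECT Dept, DataMatch, Team
          FROM TeamDatabase
          WHERE Status = 14 AND Team <> ''Missing'') SUB
    PIVOT (COUNT([DataMatch]) FOR Team IN ('+@cols+')) AS P'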
In case you don't want to use a dynamic pivot, here is just a stand-alone query... again, add your conditions as you need.
SELECT Dept, Dodgers, RedSox, Yankees
FROM (SELECT Dept, DataMatch, Team
FROM #data D) SUB
PIVOT (COUNT([DataMatch]) FOR Team IN ([Dodgers], [RedSox], [Yankees])) AS P
I'm not sure I follow what relevance your existing query has, but getting from your current table to your desired results is a pretty straightforward usage of PIVOT:
SELECT *
FROM Table1
PIVOT (COUNT(DataMatch) FOR Team IN (Dodgers, RedSox, Yankees)) pvt
Demo: SQL Fiddle
And this of course could be done dynamically if the teams list isn't static.

Formatting SQL Output (Pivot)

This is running on SQL Server 2008.
Anyway, I have sales data, and I can write a query to get the output to look like this:
id | Name       | Period  | Sales
1  | Customer X | 2013/01 | 50
1  | Customer X | 2013/02 | 45
etc. Currently, after retrieving this data, I am rearranging it in the code-behind so that the final output looks like this:
id | Name       | 2013/01 | 2013/02
1  | Customer X | 50      | 40
The issues are:
1. The date (YYYY/MM) range is an input from the user.
2. If the user selects more outputs (like, say, address, and a ton of other possible fields relating to that customer), that information is duplicated in every line. When you're doing 10-15 items per line, over a period of 5+ years, for 50000+ users, this causes problems with running out of memory, and is also inefficient.
I've considered pulling only the necessary data (the customer id -- how they're joined together, the period, and the sales figure), and then running a separate query after the fact to get the additional data. This doesn't seem like it would be efficient, though, but it's a possibility.
The other option, which I'm thinking should be the best one, would be to rewrite my query to do what my current code-behind is doing and pivot the data there, so that the customer data is never duplicated and I'm not moving a lot of unnecessary data around.
To give a better example of what I'm working with, let's assume these tables:
Address
id | HouseNum | Street | Unit | City | State
Customer
id | Name
Sales
id | Period | Sales
So I would like to join these tables on the customer id and display all of the address data. Assume the user inputs "2012/01 -- 2012/12"; I can translate that into 2012/01, 2012/02 ... 2012/12 in my code-behind before the query executes, so I have that list available.
What I want it to look like would be:
id | Name | HouseNum | Street | City | State | 2012/01 | 2012/02 | ... | 2012/12
1 | X | 100 | Main St. | ABC | DEF | 30 | | ... | 20
(no sales data for that customer on 2012/02 -- if any of the data is blank I want it to be a blank string "", not a NULL)
I realize I may not be explaining this the best way possible, so just let me know and I'll add more information. Thank you!
edit: oh, one last thing. Would it be possible to add Min, Max, Avg, & Total columns to the end, which summarize all of the pivoted data? It wouldn't be a big deal to do it in the code-behind, but the more SQL Server can do for me the better, imo!
edit: One more, the period is in the tables as "2013/01" etc, but I'd like to rename them to "Jan 2013" etc, if it's not too complicated?
You can implement the PIVOT function to transform the data from rows into columns. You can use the following to get the result:
select id,
name,
HouseNum,
Street,
City,
State,
isnull([2013/01], 0) [2013/01],
isnull([2013/02], 0) [2013/02],
isnull([2012/02], 0) [2012/02],
isnull([2012/12], 0) [2012/12],
MinSales,
MaxSales,
AvgSales,
TotalSales
from
(
select c.id,
c.name,
a.HouseNum,
a.Street,
a.city,
a.state,
s.period,
s.sales,
min(s.sales) over(partition by c.id) MinSales,
max(s.sales) over(partition by c.id) MaxSales,
avg(s.sales) over(partition by c.id) AvgSales,
sum(s.sales) over(partition by c.id) TotalSales
from customer c
inner join address a
on c.id = a.id
inner join sales s
on c.id = s.id
) src
pivot
(
sum(sales)
for period in ([2013/01], [2013/02], [2012/02], [2012/12])
) piv;
See SQL Fiddle with Demo.
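The question asks for a blank string rather than a NULL; the ISNULL(..., 0) above returns 0 instead. Since the pivoted sales columns are numeric, one hedged way to get '' is to cast each column to a string before applying ISNULL, for example:
-- minimal self-contained illustration: a NULL pivoted value becomes '' instead of 0
select isnull(cast(piv.[2013/01] as varchar(20)), '') as [2013/01]
from (select cast(null as int) as [2013/01]) piv;
The same wrapper would be applied to each period column in the outer SELECT above.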
If you have an unknown number of period values that you want to transform into columns, then you will have to use dynamic SQL to get the result:
DECLARE @cols AS NVARCHAR(MAX),
        @colsNull AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(period)
                      from Sales
                      FOR XML PATH(''), TYPE
                      ).value('.', 'NVARCHAR(MAX)')
                     ,1,1,'')
select @colsNull = STUFF((SELECT distinct ', IsNull(' + QUOTENAME(period) + ', 0) as '+ QUOTENAME(period)
                          from Sales
                          FOR XML PATH(''), TYPE
                          ).value('.', 'NVARCHAR(MAX)')
                         ,1,1,'')
set @query = 'SELECT id,
name,
HouseNum,
Street,
City,
State,' + @colsNull + ',
MinSales,
MaxSales,
AvgSales,
TotalSales
from
(
select c.id,
c.name,
a.HouseNum,
a.Street,
a.city,
a.state,
s.period,
s.sales,
min(s.sales) over(partition by c.id) MinSales,
max(s.sales) over(partition by c.id) MaxSales,
avg(s.sales) over(partition by c.id) AvgSales,
sum(s.sales) over(partition by c.id) TotalSales
from customer c
inner join address a
on c.id = a.id
inner join sales s
on c.id = s.id
) x
pivot
(
sum(sales)
for period in (' + @cols + ')
) p '
execute(@query)
See SQL Fiddle with Demo. These give the result:
| ID | NAME | HOUSENUM | STREET | CITY | STATE | 2012/02 | 2012/12 | 2013/01 | 2013/02 | MINSALES | MAXSALES | AVGSALES | TOTALSALES |
---------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | Customer X | 100 | Maint St. | ABC | DEF | 0 | 20 | 50 | 45 | 20 | 50 | 38 | 115 |
| 2 | Customer Y | 108 | Lost Rd | Unknown | Island | 10 | 0 | 0 | 0 | 10 | 10 | 10 | 10 |