SQL Server stored procedure inserting duplicate rows - sql

I have a table with column GetDup and I'd like to the duplicate records based on the value of this column. For example, if value on is 1 in GetDup, then duplicate the record once. If value in the column is 2, then duplicate the record twice and so on and the statement has to be in looping statement.
What will be a good way to write a stored procedures for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
picture of data

Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup+1) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may need to do some math magic. You say above if GetDup is 1 then you want one duplicate. If that means that your output should be GetDup``GetDup, then you'll want to add one in the repeat(),replicate() or rpad() functions. ie replicate(mycolumn,getdup+1). Oracle SQL will be a little different, since it uses rpad().

In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).

So, you want recursive solution :
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);

Related

Finding duplicated column value pairs in sql table

I have a table containing pairs for matches, and the table looks like this:
|pairing_id|player1_id|player2_id|number_of_round|
| 132 | Thomas | Brian | 1 |
I try to write an sql query, which shows me all the redundant pairs, so the pairing is the same if the 2 player names are the same, but for the second time, Brian is player1 and Thomas is the player2.
So this 2 matchups are considered the same pairs, as the player names are the same:
|pairing_id|player1_id|player2_id|number_of_round|
| 132 | Thomas | Brian | 1 |
| 458 | Brian | Thomas | 4 |
I need to find all the redundant pairings in the table, but sadly I dont know how to query for this.
Can be done with EXISTS
select t1.pairing_id, t1.player1_id, t1.player2_id, t1.number_of_round
from myTable t1
where exists (select null
from myTable t2
where t2.player1_id = t1.player2_id and t2.player2_id = t1.player1_id)
order by case when t1.player1_id > t1.player2_id then t1.player2_id else t1.player1_id end
You can do it with exists:
select t.* from tablename t
where exists (
select 1 from tablename
where pairing_id <> t.pairing_id and player1_id = t.player2_id and player2_id = t.player1_id
)

Columns to Rows Two Tables in Cross Apply

Using SQL Server I have two tables, below sample Table #T1 in DB has well over a million rows, Table #T2 has 100 rows. Both tables are in Column format and I need to Pivot to rows and join both.
Can I get it all in one query with Cross Apply and remove the cte?
This is my code, I have correct output but is this the most efficient way to do this considering number of rows?
with cte_sizes
as
(
select SizeRange,Size,ColumnPosition
from #T2
cross apply (
values(Sz1,1),(Sz2,2),(Sz3,3),(Sz4,4)
) X (Size,ColumnPosition)
)
select a.ProductID,a.SizeRange,c.Size,isnull(x.Qty,0) as Qty
from #T1 a
cross apply (
values(a.Sale1,1),(a.Sale2,2),(a.Sale3,3),(a.Sale4,4)
) X (Qty,ColumnPosition)
inner join cte_sizes c
on c.SizeRange = a.SizeRange
and c.ColumnPosition = x.ColumnPosition
I have also code and considered this but is this the CROSS APPLY a better method?
with cte_sizes
as
(
select 1 as SizePos
union all
select SizePos + 1 as SizePos
from cte_sizes
where SizePos < 4
)
select a.ProductID
,a.SizeRange
,(case when b.SizePos = 1 then c.Sz1
when b.SizePos = 2 then c.Sz2
when b.SizePos = 3 then c.Sz3
when b.SizePos = 4 then c.Sz4 end
) as Size
,isnull((case when b.SizePos = 1 then a.Sale1
when b.SizePos = 2 then a.Sale2
when b.SizePos = 3 then a.Sale3
when b.SizePos = 4 then a.Sale4 end
),0) as Qty
from #T1 a
inner join #T2 c on c.SizeRange = a.SizeRange
cross join cte_sizes b
This is wild guessing, but my magic crystall ball told me, that you might be looking for something like this:
For this we do not need your table #TS at all.
WITH Unpivoted2 AS
(
SELECT t2.SizeRange,A.* FROM #t2 t2
CROSS APPLY(VALUES(1,t2.Sz1)
,(2,t2.Sz2)
,(3,t2.Sz3)
,(4,t2.Sz4)) A(SizePos,Size)
)
SELECT t1.ProductID
,Unpivoted2.SizeRange
,Unpivoted2.Size
,Unpivoted1.Qty
FROM #t1 t1
CROSS APPLY(VALUES(1,t1.Sale1)
,(2,t1.Sale2)
,(3,t1.Sale3)
,(4,t1.Sale4)) Unpivoted1(SizePos,Qty)
LEFT JOIN Unpivoted2 ON Unpivoted1.SizePos=Unpivoted2.SizePos AND t1.SizeRange=Unpivoted2.SizeRange
ORDER BY t1.ProductID,Unpivoted2.SizeRange;
The result:
+-----------+-----------+------+------+
| ProductID | SizeRange | Size | Qty |
+-----------+-----------+------+------+
| 123 | S-XL | S | 1 |
+-----------+-----------+------+------+
| 123 | S-XL | M | 12 |
+-----------+-----------+------+------+
| 123 | S-XL | L | 13 |
+-----------+-----------+------+------+
| 123 | S-XL | XL | 14 |
+-----------+-----------+------+------+
| 456 | 8-14 | 8 | 2 |
+-----------+-----------+------+------+
| 456 | 8-14 | 10 | 22 |
+-----------+-----------+------+------+
| 456 | 8-14 | 12 | NULL |
+-----------+-----------+------+------+
| 456 | 8-14 | 14 | 24 |
+-----------+-----------+------+------+
| 789 | S-L | S | 3 |
+-----------+-----------+------+------+
| 789 | S-L | M | NULL |
+-----------+-----------+------+------+
| 789 | S-L | L | 33 |
+-----------+-----------+------+------+
| 789 | S-L | XL | NULL |
+-----------+-----------+------+------+
The idea in short:
The cte will return your #T2 in an unpivoted structure. Each name-numbered column (something you should avoid) is return as a single row with an index indicating the position.
The SELECT will do the same with #T1 and join the cte against this set.
UPDATE: After a lot of comments...
If I get this (and the changes to the initial question) correctly, the approach above works perfectly well, but you want to know, what was best in performance.
The first answer to "What is the fastest approach?" is Race your horses by Eric Lippert.
Good to know 1: A CTE is nothing more then syntactic sugar. It will allow to type a sub-query once and use it like a table, but it has no effect to the way how the engine will work this down.
Good to know 2: It is a huge difference whether you use APPLY or JOIN. The first will call the sub-source once per row, using the current row's values. The second will have to create two sets first and will then join them by some condition. There is no general "what is better"...
For your issue: As there is one very big set and one very small set, all depends on when you reduce the big set usig any kind of filter. The earlier the better.
And most important: It is - in any case - a sign of bad structures - when you find name numbering (something like phone1, phone2, phoneX). The most expensive work will be to transform your 4 name-numbered columns to some dedicated rows. This should be stored in normalized format...
If you still need help, I'd ask you to start a new question.

SQL Server Find the date in joining order

I am using MS-SQL Server there are two tables
membership
+---+-----------------+---------------------+----------------
| | membershipName | createddate | price |
+---+-----------------+---------------------+----------------
| 1 | Swimming | 2010-01-01 | 30 |
| 2 | Swimming | 2010-05-01 | 32 |
| 3 | Swimming | 2011-01-01 | 35 |
| 4 | Swimming | 2012-01-01 | 40 |
+---+-----------------+---------------------+----------------
member
+---+-----------------+---------------------+-----------------
| | memberName | membership | joiningDate |
+---+-----------------+---------------------+-----------------
| 0 | Andy | Swimming | 2008-02-02 |
| 1 | John | Swimming | 2010-02-02 |
| 2 | Andy | Swimming | 2011-02-02 |
| 3 | Alice | Swimming | 2015-02-02 |
+---+-----------------+---------------------+----------------
I want find the member's membership price for the right period of time
e.g
Andy return NULL
John return 30
Alice return 40
the best logic is to see
if the joiningDate is in between two start date
if yes choose the earlier date
if not
if the joining date is before the earlier date then use the earliest date
if the joining date is after the latest date then use the latest date
I am a Java programmer, do this in sql is quite tricky for me, any hint would be nice!
edit 1: sorry I forgot to consider month
edit 2: added desirable result
I hope I understood you correctly. try this out:
SELECT TOP 1 ms.Price
FROM membership ms
LEFT JOIN member m
ON m.joiningdate > ms.createdate
WHERE m.id = 3
ORDER BY price DESC
I hope I got this correctly. You might try it like this:
Declared table variable to mock-up a test scenario:
DECLARE #membership TABLE(id INT, membershipName VARCHAR(100),createddate DATETIME,price DECIMAL(10,4));
INSERT INTO #membership VALUES
(1,'Swimming',{d'2010-01-01'},30)
,(2,'Swimming',{d'2010-05-01'},32)
,(3,'Swimming',{d'2011-01-01'},35)
,(4,'Swimming',{d'2012-01-01'},40);
DECLARE #member TABLE(id INT,memberName VARCHAR(100),membership VARCHAR(100),joiningDate DATETIME);
INSERT INTO #member VALUES
(0,'Andy','Swimming',{d'2008-02-02'})
,(1,'John','Swimming',{d'2010-02-02'})
,(2,'Andy','Swimming',{d'2011-02-02'})
,(3,'Alice','Swimming',{d'2015-02-02'});
As you are on SQL-Server 2012 you are lucky. You can use LEAD:
The CTE "Intervalls" will return the membership table as is and it will add one column with one second before the next rows createddate. LEAD helps you to get hands on a value of a later coming row. First I take away one second, then I set a very high date in case of NULL:
WITH Intervalls AS
(
SELECT *
,ISNULL(DATEADD(SECOND ,-1,LEAD(createddate) OVER(ORDER BY createddate)),{d'2100-01-01'}) AS EndOfIntervall
FROM #membership AS ms
)
--The SELECT reads all members and joins them to the membership where their date is in the range according to "Intervalls". Only the case ealier than the first must be treated specially:
SELECT m.*
,ISNULL(i.price, CASE WHEN YEAR(m.joiningDate)<(SELECT MIN(x.createddate) FROM #membership as x)
THEN (SELECT TOP 1 x.price FROM #membership AS x ORDER BY x.createddate ASC) END)
FROM #member AS m
LEFT JOIN Intervalls AS i ON m.joiningDate BETWEEN i.createddate AND i.EndOfIntervall
UPDATE Better approach (thx to Paparis)
SELECT m.*
,ISNULL(Corresponding.price, (SELECT TOP 1 x.price FROM #membership AS x ORDER BY x.createddate ASC)) AS price
FROM #member AS m
OUTER APPLY
(
SELECT TOP 1 ms.price
FROM #membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
) AS Corresponding
UPDATE 2: Even simpler!
SELECT m.*
,ISNULL
(
(
SELECT TOP 1 ms.price
FROM #membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
),
(
SELECT TOP 1 x.price FROM #membership AS x ORDER BY x.createddate ASC
)
) AS price
FROM #member AS m

Select rows with same value in one column but different value in another column

I've been trying to build this query but am new to SQL so I'd really appreciate some help.
In the below table example, I have a Customer Code, a linked Customer Code (which is used to link a child customer to a parent customer), a salesperson, and other irrelevant columns. The goal is to have one Salesperson for each parent customer and it's children. So in the example, CustCode #100 is the parent of itself, #200, #500, and #800. All of these accounts have the same Salesperson (JASON) which is perfect. But for CustCode #300, it is the parent of itself, #400, and #600. However, there isn't one salesperson assigned - its both JIM and SUZY. I want to build a query that shows all accounts for this example. Basically, accounts where the Salesperson field isn't the same value for all of it's child customers.
I tried a Where clause for Salesperson <> Salesperson but its not showing up right.
+-----------+-----------------+------------+----------------------+
| CustCode | Linked CustCode | Salesperson| additional columns...|
+-----------+-----------------+------------+----------------------+
| 100 | 100 | JASON | ... |
| 200 | 100 | JASON | ... |
| 300 | 300 | JIM | ... |
| 400 | 300 | JIM | ... |
| 500 | 100 | JASON | ... |
| 600 | 300 | SUZY | ... |
| 700 | NULL | JIM | ... |
| 800 | 100 | JASON | ... |
+-----------+-----------------+------------+----------------------+
Thanks so much for your help!
You can do self join on the table.
select distinct r2.* from
table r1
join table r2
on
r1.linkedcustcode = r2.linkedcustcode and r1.salesperson <> r2.salesperson
This solution uses a recursive CTE first to build the hierarchy and find the leading code for each row, even if a linked code points to a row which is pointing to an upper row itself.
The final query shows the count of different Salespersons:
DECLARE #tbl TABLE(CustCode INT,[Linked CustCode] INT,Salesperson VARCHAR(100));
INSERT INTO #tbl VALUES
(100,100,'JASON')
,(200,100,'JASON')
,(300,300,'JIM')
,(400,300,'JIM')
,(500,100,'JASON')
,(600,300,'SUZY')
,(700,NULL,'JIM')
,(800,100,'JASON');
--The query
WITH CleanUp AS
(
SELECT CustCode
,CASE WHEN [Linked CustCode]=CustCode THEN NULL ELSE [Linked CustCode] END AS [Linked CustCode]
,Salesperson
FROM #tbl
)
,recCTE AS
(
SELECT CustCode AS LeadingCode,CustCode,[Linked CustCode],Salesperson
FROM CleanUp
WHERE [Linked CustCode] IS NULL
UNION ALL
SELECT recCTE.LeadingCode,t.CustCode,t.[Linked CustCode],t.Salesperson
FROM recCTE
INNER JOIN CleanUp AS t ON t.[Linked CustCode]=recCTE.CustCode
)
SELECT LeadingCode,COUNT(DISTINCT Salesperson) AS CountSalesperson
FROM recCTE
GROUP BY LeadingCode
The result
LeadingCode CountSalesperson
100 1
300 2
700 1

CTE to represent a logical table for the rows in a table which have the max value in one column

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record. In the example below,
This is the physical representation of the table:
seq id name | CRUD |
----|-----|--------|------|
1 | 10 | john | C |
2 | 10 | joe | U |
3 | 11 | kent | C |
4 | 12 | katie | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
This is the logical representation of the table, considering the "most recent" records:
seq id name | CRUD |
----|-----|--------|------|
2 | 10 | joe | U |
3 | 11 | kent | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P.ID = 12
)
...and I would receive this row:
seq id name | CRUD |
----|-----|--------|------|
5 | 12 | sue | U |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the the subquery already works for us.
Question: How can I leverage a CTE to simplify our predicates when needing to leverage the "most recent" logical view of the table. In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and leverage that in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.
This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
SELECT id, MAX(seq) AS latestseq
FROM people
GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, or moving the CTE into the "from" clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
Following up from #Glenn's answer, here is an updated query which meets my original goal and is on par with #mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach vs the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PERSON
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PERSON P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12