SQL Server 2012 writing duplicate entries into table from CTE - sql

I am writing the output of a few sequential CTEs to a table. After I changed a join in one of the CTEs from an inner join to a left join, the table now contains duplicated entries that do not show up if I just run the query without the insert.
Is there something I need to understand about creating and inserting into a table with regard to joins in a CTE?
EDIT
create table MYTABLE
(
ID int,
Date smalldatetime,
Val1 int,
Val2 int
)
; with cte1 as (
select
a.ID,
a.Date,
a.Val1,
b.Val2
from table1 a
left join table2 b
on a.ID = b.ID
and a.Date = b.Date
)
insert into MYTABLE
(ID, Date, Val1, Val2)
select * from cte1
When populating the table with the inner join there is no problem with duplicates; with the left join (as shown above), rows that contain NULLs appear to be duplicated many times.

Check your right table (table2). My guess is that it has more than one record with the same ID and Date.
If that is the case, the records are not technically duplicates: if you select all columns (*) from table2 in the CTE, you will see the other fields that differ.
If you do not care that the remaining fields differ, add a ROW_NUMBER to your CTE and select only the rows where Rnum = 1 outside of the CTE.
For instance:
create table MYTABLE
(
ID int,
Date smalldatetime,
Val1 int,
Val2 int
)
; with cte1 as (
select
a.ID,
a.Date,
a.Val1,
b.Val2,
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID, a.Date, a.Val1, b.Val2 ORDER BY a.ID)
from table1 a
left join table2 b
on a.ID = b.ID
and a.Date = b.Date
)
insert into MYTABLE
(ID, Date, Val1, Val2)
select ID, Date, Val1, Val2 from cte1
where Rnum = 1
The ROW_NUMBER filter acts as a "distinct", and depending on which combination of fields you do not want duplicated, you will get different results.
For instance, if you do not want the IDs to duplicate, then
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY a.ID)
If you do not care about the IDs duplicating, but you do not want the same ID on the same date, then
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID, a.Date ORDER BY a.ID)
and so on. It just depends on which columns you do not want duplicated.
Hope this helps
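To make the row multiplication concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite stands in for SQL Server only so the example is self-contained, and the sample rows are invented for illustration. table2 holds two rows for the same ID and Date, so the left join returns two rows for one table1 row, and filtering on Rnum = 1 keeps one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INT, Date TEXT, Val1 INT);
    CREATE TABLE table2 (ID INT, Date TEXT, Val2 INT);
    INSERT INTO table1 VALUES (1, '2015-01-01', 10), (2, '2015-01-01', 20);
    -- hypothetical data: two table2 rows share the same (ID, Date)
    INSERT INTO table2 VALUES (1, '2015-01-01', 100), (1, '2015-01-01', 101);
""")

# Plain left join: the table1 row with ID 1 comes back once per matching table2 row.
joined = conn.execute("""
    SELECT a.ID, a.Date, a.Val1, b.Val2
    FROM table1 a
    LEFT JOIN table2 b ON a.ID = b.ID AND a.Date = b.Date
    ORDER BY a.ID, b.Val2
""").fetchall()

# Same join, but keep only the first row per (ID, Date).
deduped = conn.execute("""
    WITH cte1 AS (
        SELECT a.ID, a.Date, a.Val1, b.Val2,
               ROW_NUMBER() OVER (PARTITION BY a.ID, a.Date ORDER BY b.Val2) AS Rnum
        FROM table1 a
        LEFT JOIN table2 b ON a.ID = b.ID AND a.Date = b.Date
    )
    SELECT ID, Date, Val1, Val2 FROM cte1 WHERE Rnum = 1 ORDER BY ID
""").fetchall()

print(len(joined), "rows from the plain join")
print(deduped)
```

Ordering the window by b.Val2 here is a choice made for the sketch; it decides which of the matching table2 rows survives, so pick whatever ordering matches the row you actually want to keep.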

Related

How to replace TOP 1000 rows of select columns indiscriminately

Basically, I have a table that contains 1000 rows with three columns (TABLE A).
I have another table with 200 columns and 1 million+ records (TABLE B).
I am trying to replace three columns in 1000 rows of TABLE B with those of TABLE A. I've read a lot of solutions where you INSERT into TABLE B from TABLE A, but that's useless because I'll get NULLs in the remaining 197 columns that I need data for.
So the task is to overwrite selected columns in one table with the columns of another table. There are no conditions; just the top rows, or whatever order you can think of, is fine. If you can give an answer that takes ORDER BY something into account, that'd be a bonus! Thank you so much!
If I understood your requirements correctly:
WITH TA
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY col1) AS RN
FROM TableA),
TB
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY col1) AS RN
FROM TableB)
UPDATE TB
SET TB.col1 = TA.col1,
TB.col2 = TA.col2,
TB.col3 = TA.col3
FROM TB
JOIN TA
ON TB.RN = TA.RN
Try something like this:
WITH topB AS (
SELECT TOP 1000 row_number() OVER(ORDER BY field_n) rn, b.* FROM table_b b
ORDER BY field_x),
topA AS (
SELECT row_number() OVER(ORDER BY field_m) rn, a.*
FROM table_a a)
UPDATE b
SET
b.Field_1 = a.Field_1,
b.Field_2 = a.Field_2,
b.Field_3 = a.Field_3
FROM
TopB b JOIN TopA a ON b.rn = a.rn
The idea here is to assign row numbers in both tables, join them on these numbers, and update the B side of the join with values from A.
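The row-number pairing can be tried out with a small sketch in Python and the standard-library sqlite3 module. SQLite is used only to keep the example self-contained; the table layout, the `other` column, and the temporary `pairs` table are assumptions for illustration. Because this sketch avoids the UPDATE ... FROM form, it pairs the rows in a temporary table first and then updates table_b from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT, f3 TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT, f3 TEXT, other TEXT);
    INSERT INTO table_a VALUES (1, 'a1', 'a2', 'a3'), (2, 'b1', 'b2', 'b3');
    INSERT INTO table_b VALUES (10, 'x', 'x', 'x', 'keep10'),
                               (20, 'y', 'y', 'y', 'keep20'),
                               (30, 'z', 'z', 'z', 'keep30');

    -- Number the rows of both tables and pair them up by row number.
    CREATE TEMP TABLE pairs AS
    WITH top_a AS (SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, f1, f2, f3 FROM table_a),
         top_b AS (SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id AS b_id FROM table_b)
    SELECT top_b.b_id, top_a.f1, top_a.f2, top_a.f3
    FROM top_b JOIN top_a ON top_a.rn = top_b.rn;

    -- Overwrite only the three columns, and only for the paired rows.
    UPDATE table_b
    SET (f1, f2, f3) = (SELECT pairs.f1, pairs.f2, pairs.f3
                        FROM pairs WHERE pairs.b_id = table_b.id)
    WHERE id IN (SELECT b_id FROM pairs);
""")

rows = conn.execute("SELECT * FROM table_b ORDER BY id").fetchall()
print(rows)
```

The first two rows of table_b take their three columns from table_a, the `other` column is untouched, and the third row is left alone because table_a has no third row to pair it with.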

getting top row of joined table

I have 2 tables, tableA and tableB
tableA - id int
name varchar(50)
tableB - id int
fkid int
name varchar(50)
Both tables are joined between id and fkid.
Below are sample rows from tableA
Below is output from tableB
I want to join both tables and get only top row of joined table. So output will be like below
Id Name fkid
1 P1 1
2 P2 4
3 P3 null
Here is Sql fiddle
How can I achieve this with a single query? I know that I can loop through the rows in my .NET code and retrieve the top rows, but I want it in a single query.
select a.id,a.name,b.fid from tableA a left join
(
select min(id) fid ,fkid from tableB group by fkid
)b
on a.id = b.fkid
select ta.id, ta.name, min(tb.id) from tableA ta
left join tableB tb on tb.fkid=ta.id
group by ta.id, ta.name
You could do this:
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY ID) AS RowNbr,
tableB.*
FROM
tableB
)
SELECT
*
FROM
tableA
LEFT JOIN CTE
ON CTE.fkID=tableA.id
AND CTE.RowNbr=1
Demo here
Or without a CTE, using a derived table instead. Like this:
SELECT
*
FROM
tableA
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY ID) AS RowNbr,
tableB.*
FROM
tableB
) as tbl
ON tbl.fkID=tableA.id
AND tbl.RowNbr=1
Demo here
Update:
I chose ROW_NUMBER because, if tableB has more columns than in the example, no additional aggregates are needed to show them. Personally, I also find it clearer with an ORDER BY on the ID.
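As a quick check of the ROW_NUMBER approach, here is a minimal sketch in Python with the standard-library sqlite3 module (SQLite is a stand-in chosen for a self-contained example). The tableB rows are invented, chosen so that the result matches the output requested in the question; the CTE name first_b is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INT, name TEXT);
    CREATE TABLE tableB (id INT, fkid INT, name TEXT);
    INSERT INTO tableA VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
    -- hypothetical child rows: three for P1, two for P2, none for P3
    INSERT INTO tableB VALUES (1, 1, 'c1'), (2, 1, 'c2'), (3, 1, 'c3'),
                              (4, 2, 'c4'), (5, 2, 'c5');
""")

rows = conn.execute("""
    WITH first_b AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY fkid ORDER BY id) AS RowNbr,
               id, fkid, name
        FROM tableB
    )
    SELECT a.id, a.name, b.id AS first_b_id, b.name AS first_b_name
    FROM tableA a
    LEFT JOIN first_b b ON b.fkid = a.id AND b.RowNbr = 1
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Note that the extra column b.name comes along without any further aggregate, which is the advantage over the MIN(id) with GROUP BY variants.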

How to Insert Select every with TOP() clause

These are my tables:
http://sqlfiddle.com/#!3/a8087/1
What I'm trying to achieve is to insert into a new table by selecting from Tbl2 and CustTable.
E.g:
INSERT INTO tbl3
SELECT TOP(SELECT Counter FROM Tbl2) a.name, a.amount FROM custTable a
INNER JOIN Tbl2 b ON a.custId = b.custid
I want to insert X rows per customer, based on that CustId's [Counter].
It's not working because the subquery returned more than 1 value.
How can I fix the query in TOP()?
You can use window functions to rank the rows by customer, and then filter by the counter:
WITH cte as
(
SELECT a.Name, a.Amount, b.Counter,
ROW_NUMBER() OVER (PARTITION BY a.CustID ORDER BY a.Amount DESC) AS RN
FROM custTable a
INNER JOIN Tbl2 b ON a.custId = b.custid
)
SELECT cte.name, cte.amount
INTO tbl3
FROM cte
WHERE cte.rn <= Counter;
You'll need to choose an ORDER on each customer to determine 'which' of the TOP records get included (I've assumed you want the top amounts here)
I've also used SELECT ... INTO to create table 3 on the fly, but you can INSERT INTO if it is already created.
Updated your SqlFiddle here
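The per-customer limit can be exercised with a short sketch in Python using the standard-library sqlite3 module. SQLite replaces SQL Server here, so tbl3 is created up front and filled with INSERT rather than SELECT ... INTO; the sample customers and amounts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE custTable (custId INT, name TEXT, amount INT);
    CREATE TABLE Tbl2 (custId INT, Counter INT);
    CREATE TABLE tbl3 (name TEXT, amount INT);
    -- hypothetical data: customer 1 should get 2 rows, customer 2 should get 1
    INSERT INTO custTable VALUES (1, 'A', 50), (1, 'A', 30), (1, 'A', 10),
                                 (2, 'B', 70), (2, 'B', 20);
    INSERT INTO Tbl2 VALUES (1, 2), (2, 1);

    WITH cte AS (
        SELECT a.name, a.amount, b.Counter,
               ROW_NUMBER() OVER (PARTITION BY a.custId ORDER BY a.amount DESC) AS RN
        FROM custTable a
        INNER JOIN Tbl2 b ON a.custId = b.custId
    )
    INSERT INTO tbl3 (name, amount)
    SELECT name, amount FROM cte WHERE RN <= Counter;
""")

rows = conn.execute("SELECT name, amount FROM tbl3 ORDER BY name, amount DESC").fetchall()
print(rows)
```

Each customer contributes at most Counter rows, taken from the largest amounts downward because of the ORDER BY inside the window.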

How to query total when I have a join table

Hello,
I have two joined tables, tableA and tableB. tableA has a column called Amount. tableB has a column called refID. I would like to total the Amount column for rows that have the same refID. I was using SUM in my query, but it throws an error:
ORA-30483: window functions are not allowed here
30483. 00000 - "window functions are not allowed here"
*Cause: Window functions are allowed only in the SELECT list of a query.
And, window function cannot be an argument to another window or group
function.
Here is my query for your reference:
select *
from (
select SUM(A.Amount), B.refId, Rank() over (partition by B.refID order by B.id desc) as ranking
from table A
left outer join table B on A.refID = B.refID
)
where ranking=1;
May I know whether there is an alternative solution that lets me SUM the Amount?
Thanks!
select
SUM(A.Amount),
B.refId
from table A
left outer join table B on A.refID = B.refID
GROUP BY
B.refId
SELECT *
FROM (
SELECT A.Amount, B.refId,
Rank() over (partition by A.refID order by B.id desc) as ranking,
SUM(amount) OVER (PARTITION BY a.refId) AS asum
FROM tableA A
LEFT JOIN
tableB B
ON B.refID = A.refID
)
WHERE ranking = 1
Declare @T table(id int)
insert into @T values (1),(2)
Declare @T1 table(Tid int,fkid int,Amount int)
insert into @T1 values (1,1,200),(2,1,250),(3,2,100),(4,2,25)
Select SUM(t1.Amount) as amount,t1.fkid as id from @T t
left outer join @T1 t1 on t1.fkid = t.id group by t1.fkid
SELECT refid, sum(a.amount)
FROM table a LEFT JOIN table b USING (refid)
GROUP BY refid;
I'm a little confused. The query you posted did not have a SUM function anywhere, and performed a self-join of a table named "TABLE" to itself. I'm going to guess that you actually have two tables (I'll call them TABLE_A and TABLE_B), in which case the following should do it:
SELECT a.REFID, SUM(a.AMOUNT)
FROM TABLE_A a
INNER JOIN TABLE_B b
ON (b.REFID = a.REFID)
GROUP BY a.REFID;
If I understood your question you only wanted results when you have a TABLE_B.REFID which matches a TABLE_A.REFID, so an INNER JOIN would be appropriate.
Share and enjoy.
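For a quick sanity check of the GROUP BY form, here is a minimal sketch in Python with the standard-library sqlite3 module (SQLite instead of Oracle, purely so it runs anywhere). The data is hypothetical and assumes TABLE_B has exactly one row per REFID; if TABLE_B had several rows for the same REFID, each amount would be counted once per matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (REFID INT, AMOUNT INT);
    CREATE TABLE TABLE_B (ID INT, REFID INT);
    -- REFID 3 has no partner in TABLE_B, so the inner join drops it
    INSERT INTO TABLE_A VALUES (1, 200), (1, 250), (2, 100), (2, 25), (3, 999);
    INSERT INTO TABLE_B VALUES (10, 1), (20, 2);
""")

totals = conn.execute("""
    SELECT a.REFID, SUM(a.AMOUNT)
    FROM TABLE_A a
    INNER JOIN TABLE_B b ON b.REFID = a.REFID
    GROUP BY a.REFID
    ORDER BY a.REFID
""").fetchall()

print(totals)
```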

SQL Select Distinct with Conditional

Table1 has columns (id, a, b, c, group). There are several rows that have the same group, but id is always unique. I would like to SELECT group,a,b FROM Table1 WHERE the group is distinct. However, I would like the returned data to be from the row with the greatest id for that group.
Thus, if we have the rows
(id=10, a=6, b=40, c=3, group=14)
(id=5, a=21, b=45, c=31, group=230)
(id=4, a=42, b=65, c=2, group=230)
I would like to return these 2 rows:
[group=14, a=6,b=40] and
[group=230, a=21,b=45] (because id=5 > id=4)
Is there a simple SELECT statement to do this?
Try:
select grp, a, b
from table1 where id in
(select max(id) from table1 group by grp)
You can do it using a self join or an inner-select. Here's inner select:
select `group`, a, b from Table1 AS T1
where id=(select max(id) from Table1 AS T2 where T1.`group` = T2.`group`)
And self-join method:
select T1.`group`, T2.a, T2.b from
(select max(id) as id,`group` from Table1 group by `group`) T1
join Table1 as T2 on T1.id=T2.id
Use two selects. Your inner select gets:
SELECT MAX(id) FROM YourTable GROUP BY [GROUP]
Your outer select joins to this result.
Think about it logically: the inner select gets a subset of the data you need.
The outer select inner joins to this subset and can get further data.
SELECT [group], a, b FROM YourTable INNER JOIN
(SELECT MAX(id) AS id FROM YourTable GROUP BY [GROUP]) t
ON t.id = YourTable.id
SELECT mi.*
FROM (
SELECT DISTINCT grouper
FROM mytable
) md
JOIN mytable mi
ON mi.id =
(
SELECT id
FROM mytable mo
WHERE mo.grouper = md.grouper
ORDER BY
id DESC
LIMIT 1
)
If your table is MyISAM or id is not a PRIMARY KEY, then make sure you have a composite index on (grouper, id).
If your table is InnoDB and id is a PRIMARY KEY, then a simple index on grouper will suffice (id, being a PRIMARY KEY, will be implicitly included).
This will use an INDEX FOR GROUP-BY to build the list of distinct groupers, and for each grouper it will use the index access to find the maximal id.
I don't know how to do it in MySQL, but the following code will work for MS SQL:
SELECT Y.* FROM
(
SELECT [group], MAX(id) ID
FROM Table1
GROUP BY [group]
) X
INNER JOIN Table1 Y ON X.ID=Y.ID
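The "max id per group, then join back" pattern can be verified against the three sample rows from the question. This is a minimal sketch in Python with the standard-library sqlite3 module; SQLite is an assumption made for portability, and the column is named grp (as in the first answer) to sidestep the reserved word group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INT, a INT, b INT, c INT, grp INT);
    -- the three sample rows from the question
    INSERT INTO Table1 VALUES (10, 6, 40, 3, 14),
                              (5, 21, 45, 31, 230),
                              (4, 42, 65, 2, 230);
""")

rows = conn.execute("""
    SELECT t.grp, t.a, t.b
    FROM Table1 t
    JOIN (SELECT grp, MAX(id) AS id FROM Table1 GROUP BY grp) m
      ON m.id = t.id
    ORDER BY t.grp
""").fetchall()

print(rows)
```

For group 230 the row with id=5 wins over id=4, which is the result the question asks for.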