I've been stuck on this since last week. I have two tables, where the id column of CustomerTbl correlates with the CustomerID column of PurchaseTbl:
What I'm trying to achieve is to duplicate the table's data from itself, but copy each newly generated id in CustomerTbl to PurchaseTbl's CustomerID,
just like in the screenshots above. Glad for any help :)
You can use the OUTPUT clause to access the new ID, but to access both the old ID and the new ID you will need a MERGE statement; the INSERT statement's OUTPUT clause does not let you reference the old id from the source table.
First you need somewhere to store the old and new IDs, i.e. a mapping table. You can use a table variable or a temp table:
declare @out table
(
old_id int,
new_id int
)
Then comes the MERGE statement with the OUTPUT clause:
merge
CustomerTbl as t
using
(
select id, name
from CustomerTbl
) as s
on 1 = 2 -- force the condition to false, so every source row is "not matched"
when not matched then
insert (name)
values (s.name)
output -- the output clause
s.id, -- old_id
inserted.id -- new_id
into @out (old_id, new_id);
After that you just use @out, joining back on old_id to obtain the new_id for PurchaseTbl:
insert into PurchaseTbl (CustomerID, Item, Price)
select o.new_id, p.Item, p.Price
from @out o
inner join PurchaseTbl p on o.old_id = p.CustomerID
Not sure what your end game is, but one way you could solve this is:
INSERT INTO purchaseTbl (customerid, item, price)
SELECT customerid + 3, item, price
FROM purchaseTbl;
I have two SQL temp tables, #Temp1 and #Temp2.
I want to get the entryNo values from #Temp1 that contain a complete set of rows from #Temp2.
For example: #Temp2 has 8 records. I want to search #Temp1 for entries whose rows contain a full set of records from #Temp2.
CREATE TABLE #Temp1 (entryNo INT, setid INT, measurid INT,measurvalueid int)
CREATE TABLE #Temp2(setid INT, measurid INT,measurvalueid int)
INSERT INTO #Temp1 (entryNo,setid,measurid,measurvalueid )
VALUES (1,400001,1,1),
(1,400001,2,110),
(1,400001,3,1001),
(1,400001,4,1100),
(2,400002,5,100),
(2,400002,6,102),
(2,400002,7,1003),
(2,400002,8,10004),
(3,400001,1,1),
(3,400001,2,110),
(3,400001,3,1001),
(3,400001,4,1200)
INSERT INTO #Temp2 (setid,measurid,measurvalueid )
VALUES (400001,1,1),
(400001,2,110),
(400001,3,1001),
(400001,4,1100),
(400002,5,100),
(400002,6,102),
(400002,7,1003),
(400002,8,10004)
The output I want is:
EntryNo
1
2
#Temp2 contains two sets.
One is:
(400001,1,1),
(400001,2,110),
(400001,3,1001),
(400001,4,1100)
The second is:
(400002,5,100),
(400002,6,102),
(400002,7,1003),
(400002,8,10004)
Try this:
WITH DataSourceInitialData AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY [entryNo], [setid]) AS [GroupCount]
FROM #Temp1
), DataSourceFilteringData AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY [setid]) AS [GroupCount]
FROM #Temp2
)
SELECT A.[entryNo]
FROM DataSourceInitialData A
INNER JOIN DataSourceFilteringData B
ON A.[setid] = B.[setid]
AND A.[measurid] = B.[measurid]
AND A.[measurvalueid] = B.[measurvalueid]
-- we are interested in groups which are passed completely by the filtering groups
AND A.[GroupCount] = B.[GroupCount]
GROUP BY A.[entryNo]
-- after joining, the filtered rows must match the filtering rows
HAVING COUNT(A.[setid]) = MAX(B.[GroupCount]);
The algorithm is simple:
we count how many rows exist per data group
we count how many rows exist per filtering group
we join the initial data to the filtering data
after the join we count how many rows are left in the initial data and check whether that count equals the filtering count for the given group
and the result is:
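EntryNo
1
2
which matches the expected output from the question, given the sample data above.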
Note that I am checking for an exact match. For example, if your sample data had one more row for entryNo = 1, that entry would not be included in the result. To change this behavior, comment out this line:
-- we are interested in groups which are passed completely by the filtering groups
AND A.[GroupCount] = B.[GroupCount]
I have data from SQL Server attached:
select * from log
What I want to do is check whether there are any changes in code for each name. If you look at the data in the log table, the code changes two times (B02, B03).
I want to retrieve the first row every time the code changes. In this sample, the first changes are the ones in the red boxes, so I want the result to be row 5 and row 9.
I've tried to use a partition, like the code below:
select a.name,a.code from(
select name,code,row_number() over(partition by code order by name) as rank from log)a
where a.rank=1
and I get a result like this.
However, I don't want the first row to be retrieved, since it is the first value and I don't need it. I just want to retrieve the changes indicated by the code column. Please help if you know how to do it.
Also note that I can't write the query with a filter like where code <> 'B01', because in this case I don't know what the first value is.
Please assume the first value is the data that was inserted into the table first.
Use LAG to get the previous row's value (assuming id specifies the ordering) and keep the rows where it differs from the current row's value.
create table #log (id int identity(1,1) not null, name nvarchar(100), code nvarchar(100));
insert into #log(name,code) values ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01');
insert into #log(name,code) values ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02');
insert into #log(name,code) values ('SARUMA','B03'), ('SARUMA','B03');
select name
,code
from (
select l.*
,lag(code) over (
partition by name order by id
) as prev_code
from #log l
) l
where prev_code <> code
create table #log (name nvarchar(100), code nvarchar(100));
insert into #log values ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01');
insert into #log values ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02');
insert into #log values ('SARUMA','B03'), ('SARUMA','B03');
-- remove duplicates
with Singles (name, code)
AS (
select distinct name, code from #log
),
-- First you need an ordering: in time? By alphanumeric code? Otherwise you cannot decide which item is the first one you want to remove.
-- So I added an identity ordering, but it is preferable to use a physical column.
OrderedSingles (name, code, id)
AS (
select *, row_number() over(order by name)
from Singles
)
-- Now self-join to get the next one; if the index is sequential you can join on id = id + 1
-- and take the join columns
select distinct ii.name, ii.Code
from OrderedSingles i
inner join OrderedSingles ii
on i.Name = ii.Name and i.Code <> ii.Code
where i.id < ii.Id;
I think your original post was pretty close, though you would want the windowing function to partition on the [name] column, not the code. Please see my modifications below. I've also changed the predicate to > 1, since 1 would be the original record.
SELECT
a.[name]
,a.[code]
FROM (
SELECT
[name]
,[code]
,ROW_NUMBER() OVER(PARTITION BY [name] order by [name], [code]) AS [rank]
FROM log)a
WHERE a.rank>1
NOTE: you may want to avoid using NAME as a field name, since it is a reserved word. Additionally, RANK is a reserved word as well, and you've used it to alias the ROW_NUMBER in the nested query. You may want to use another, non-reserved word for that; personally, I use RANKED for that purpose.
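For example, the same nested query with a non-reserved alias (only the alias is renamed, nothing else changes):
SELECT
    a.[name]
    ,a.[code]
FROM (
    SELECT
        [name]
        ,[code]
        ,ROW_NUMBER() OVER(PARTITION BY [name] ORDER BY [name], [code]) AS RANKED
    FROM log) a
WHERE a.RANKED > 1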
So I'm trying to insert data into the Main_Contract_Data table from three different tables, and it is producing the error shown below. Does anyone know why?
Error:
Msg 120, Level 15, State 1, Line 1
The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
-- SQL Server 2008 code
INSERT INTO Main_Contract_Data
(organisation_name,
contract_start_date,
a_manager,
d_manager)
(SELECT [Client]
FROM [Internal].[dbo].[RequiredFields$])
(SELECT [Start Date]
FROM [Internal].[dbo].[RequiredFields$])
(SELECT person_id
FROM A_Manager
WHERE person_id = '5')
(SELECT person_id
FROM D_Manager
WHERE person_id = '6')
You just need to turn those into subqueries:
INSERT INTO Main_Contract_Data
(organisation_name,
contract_start_date,
a_manager,
d_manager)
SELECT
(SELECT [Client]
FROM [Internal].[dbo].[RequiredFields$]),
(SELECT [Start Date]
FROM [Internal].[dbo].[RequiredFields$]),
(SELECT person_id
FROM A_Manager
WHERE person_id = '5'),
(SELECT person_id
FROM D_Manager
WHERE person_id = '6')
But keep in mind that each subquery can only return one value, while the overall query can return an entire result set. If the overall query returns only one row too, that's fine; but in general the outer SELECT may return one or more rows, and each subquery must return exactly one value for each of those rows.
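If [Internal].[dbo].[RequiredFields$] actually holds many rows, a different sketch (my assumption here is that the manager IDs are simply the fixed values 5 and 6 from your WHERE clauses) would be to read everything in one pass instead of using scalar subqueries:
INSERT INTO Main_Contract_Data
    (organisation_name,
     contract_start_date,
     a_manager,
     d_manager)
SELECT rf.[Client],
       rf.[Start Date],
       5, -- assumed fixed A_Manager person_id
       6  -- assumed fixed D_Manager person_id
FROM [Internal].[dbo].[RequiredFields$] rf;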
I have a table whose records contain a person's information and the filename the information originated from, so the table looks like this:
|Table|
|id, first-name, last-name, ssn, filename|
I also have a stored procedure that provides some analytics for the files in the system, and I'm trying to add information to that stored procedure to shed light on possible duplicates.
Here is the current stored procedure
SELECT [filename],
COUNT([filename]) as totalRecords,
COUNT(closedleads.id) as closedRecords,
ROUND(--calcs percent of records closed in a file)
FROM table
LEFT OUTER JOIN closedleads ON closedleads.leadid = table.id
GROUP BY [filename]
What I want to add is the ability to see the number of possible duplicates, defined as records with matching SSNs, and I am at a loss as to how I could perform a count on a subquery or join and include it in the result set. Can anyone provide some pointers?
What I'm trying to do is add something like this to my procedure above:
SELECT COUNT(
SELECT COUNT(*) FROM Table T1
INNER JOIN Table T2 on T1.SSN = T2.SSN
WHERE T1.id != T2.id
) as PossibleDuplicates
What I'm looking for is to merge this code with my procedure above so I can get all of the same data in one query, and possibly have this number of duplicates for each filename: for every filename I'd get the number of records, the number of records closed, and the number of possible duplicates.
EDIT:
I'm very close to my desired goal, but I'm failing on the last little bit: getting the number of possible duplicates by filename. Here is my query:
select [q1].[filename], [q1].leads, [q1].closed, [q2].dups
FROM (
SELECT [filename], count([filename]) as leads,
count(closedleads.id) as closed
FROM Table
left join closedleads on closedleads.leadid = Table.id
group by [filename]
) as [q1]
INNER JOIN (
select count([ssn]) as dups, [filename] from Table
group by [ssn], [filename]
having count([ssn]) > 1
) as [q2] on [q1].[filename] = [q2].[filename]
This works, but it shows multiple results for each filename, with values of 2-5, instead of summing the total count of possible duplicates.
Working Query
Hey everyone, thanks for all the help. This is what I eventually ended up with, and it works exactly as I wanted:
select [q1].[filename], [q1].leads, [q1].closed, [q2].dups,
round(([q1].closed / [q1].leads), 3) as percentClosed
FROM (
SELECT [filename], count([filename]) as leads,
count(closedleads.id) as closed
FROM Table
left join closedleads on closedleads.leadid = Table.id
and [filename] is not null
group by [filename]
) as [q1]
INNER JOIN (
select [filename], count(*) - count(distinct [ssn]) as dups
from Table
group by [filename]
) as [q2] on [q1].[filename] = [q2].[filename]
You'll probably want to make use of a HAVING clause somewhere, e.g.:
LEFT JOIN (
SELECT SSN, COUNT(SSN) - 1 DupeCount FROM Table T1
GROUP BY SSN
HAVING COUNT(SSN) > 1 ) AS PossibleDuplicates
ON table.ssn = PossibleDuplicates.SSN
If you want to include 0 possible duplicates (rather than NULL), you actually don't need the HAVING clause, just the left join.
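For instance, the same join without the HAVING clause (still a sketch against the placeholder Table name from your post) gives every SSN a DupeCount, with 0 for records that have no duplicates:
LEFT JOIN (
    SELECT SSN, COUNT(SSN) - 1 AS DupeCount
    FROM Table T1
    GROUP BY SSN ) AS PossibleDuplicates
ON table.ssn = PossibleDuplicates.SSN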
Edit: updated with an example that matches your question more closely.
Here's an example if I understand correctly.
create table #table (id int,ssn varchar(10))
insert into #table values(1,'10')
insert into #table values(2,'10')
insert into #table values(3,'11')
insert into #table values(4,'12')
insert into #table values(5,'11')
insert into #table values(6,'13')
select sum(cnt)
from (
select count(distinct ssn) as cnt
from #table
group by ssn
having count(*)>1
) dups
You shouldn't need to self-join the table if you group by ssn and then pull back only the SSNs that occur more than once.
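To break that count out per filename (a sketch on my part, assuming your real table carries the [filename] column used in your procedure alongside [ssn]), you could group by filename in an outer query:
select [filename], count(*) as dups
from (
    select [filename], ssn
    from Table -- your table, assumed to have [filename] and [ssn]
    group by [filename], ssn
    having count(*) > 1
) d
group by [filename]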
I think the existing answers don't quite understand your question. I think I do but it's not completely specified yet. Is it a duplicate if the same SSN appears in two different files or only within the same file? Because you group by filename, that becomes the grain.
The output of your query is like:
StateFarm1, 500, 50, 10%, <your new value goes here>
AllState2, 100, 90, 90%, <your new value goes here>
So if you have the same SSN in those two files, you have 1 duplicate, so on which row do you show it, the AllState row or the StateFarm row? If you say both, invariably someone will SUM that column and get double the real number.
Now what if you have a Geico row with the same SSN, is that 1 duplicate or 2? And again, on which row?
I know this isn't a final answer, but these questions do highlight that the question as it stands is unanswerable... you fix this and I'll change the answer;
please no downvotes in the meantime.
Addendum
I believe the only thing you are missing is a DISTINCT.
select [q1].[filename], [q1].leads, [q1].closed, [q2].dups
FROM (
SELECT [filename], count([filename]) as leads,
count(closedleads.id) as closed
FROM Table
left join closedleads on closedleads.leadid = Table.id
group by [filename]
) as [q1]
INNER JOIN (
select count(DISTINCT [ssn]) as dups, [filename] from Table -- <---- here
group by [ssn], [filename]
having count([ssn]) > 1
) as [q2] on [q1].[filename] = [q2].[filename]
You don't need the outer COUNT - your inner SELECT COUNT(*)... will return just one number: a count of records with a duplicate SSN but a different id.
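In other words (a small sketch using the placeholder Table name from your post), the self-join alone already yields a single figure you could fold into your procedure:
SELECT COUNT(*) AS PossibleDuplicates
FROM Table T1
INNER JOIN Table T2 ON T1.SSN = T2.SSN
WHERE T1.id != T2.id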