Find duplicate groups of rows in SQL Server - sql

I have a table with materials information where one material has from one to many constituents.
The table looks like this:
material_id constituent_id constituent_wt_pct
1 1 10.5
1 2 89.5
2 1 10.5
2 5 15.5
2 7 74
3 1 10.5
3 2 89.5
Generally, different material IDs can have the same constituents (both IDs and weight percent), and the same constituent ID with the same weight percent can also appear in multiple materials.
I need to find the material IDs that have exactly the same number of constituents, the same constituent IDs and the same weight percents (in the example data, that would be material IDs 1 and 3).
What would be great is to have the output like:
ID Duplicate ID's
1 1,3
2 15,25
....
Just to clarify the question: I have several thousand materials, and it won't help me to get just the IDs of duplicate rows. I would like to know whether it is possible to get each group of duplicate material IDs in the same row or field.

Build an XML string in a CTE that contains all constituents of each material, and use that string to figure out which materials are duplicates.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Materials
(
material_id int,
constituent_id int,
constituent_wt_pct decimal(10, 2)
);
insert into Materials values
(1, 1, 10.5),
(1, 2, 89.5),
(2, 1, 10.5),
(2, 5, 15.5),
(2, 7, 74),
(3, 1, 10.5),
(3, 2, 89.5);
Query 1:
with C as
(
  select M1.material_id,
         (
           select M2.constituent_id as I,
                  M2.constituent_wt_pct as P
           from Materials as M2
           where M1.material_id = M2.material_id
           order by M2.constituent_id,
                    M2.material_id
           for xml path('')
         ) as constituents
  from Materials as M1
  group by M1.material_id
)
select row_number() over(order by 1/0) as ID,
       stuff((
               select ','+cast(C2.material_id as varchar(10))
               from C as C2
               where C1.constituents = C2.constituents
               for xml path('')
             ), 1, 1, '') as MaterialIDs
from C as C1
group by C1.constituents
having count(*) > 1
Results:
| ID | MATERIALIDS |
--------------------
| 1 | 1,3 |
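The same grouping idea can be sketched outside the database. This is only an illustrative Python version, not the answer's method: the function name duplicate_groups and the in-memory rows list are assumptions, and weight percents are compared for exact equality, as the XML string comparison effectively does.

```python
from collections import defaultdict

# Sample rows from the question: (material_id, constituent_id, constituent_wt_pct)
rows = [
    (1, 1, 10.5), (1, 2, 89.5),
    (2, 1, 10.5), (2, 5, 15.5), (2, 7, 74.0),
    (3, 1, 10.5), (3, 2, 89.5),
]

def duplicate_groups(rows):
    """Return lists of material IDs that share exactly the same constituents."""
    # Collect the (constituent_id, wt_pct) pairs of every material.
    by_material = defaultdict(list)
    for material_id, constituent_id, wt_pct in rows:
        by_material[material_id].append((constituent_id, wt_pct))

    # The sorted tuple of pairs plays the role of the XML string in the query.
    by_signature = defaultdict(list)
    for material_id in sorted(by_material):
        signature = tuple(sorted(by_material[material_id]))
        by_signature[signature].append(material_id)

    # Keep only signatures shared by more than one material.
    return [ids for ids in by_signature.values() if len(ids) > 1]

groups = duplicate_groups(rows)
for number, ids in enumerate(groups, start=1):
    print(number, ",".join(str(i) for i in ids))
```

Sorting the pairs before comparing makes the signature independent of row order, which is the job the inner order by does in the SQL version.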

You can use the following query to find duplicate values in a single column:
Select EMP_NAME as NameT,count(EMP_NAME) as DuplicateValCount From dbo.Emp_test
group by Emp_name having count(EMP_NAME) > 1

Related

SQL, order by data entered

I wasn't quite sure how to word this question. But if you imagine I have:
var contents = "5, 7, 1, 3, 4";
And I want to do a query:
SELECT id,name FROM db WHERE id in (contents);
I would get the following response (a):
ID NAME
1 ONE
3 THREE
4 FOUR
5 FIVE
7 SEVEN
But in reality, I want it to be ordered by the order of contents, i.e (b):
ID NAME
5 FIVE
7 SEVEN
1 ONE
3 THREE
4 FOUR
Is there any way to have the response ordered as in (b) rather than (a)?
An ANSI SQL method uses a giant case expression:
SELECT id,name
FROM db
WHERE id in (contents)
ORDER BY (CASE id WHEN 5 THEN 1 WHEN 7 THEN 2 WHEN 1 THEN 3 WHEN 3 THEN 4 WHEN 4 THEN 5 END);
With standard ANSI SQL you could join to a list of values that specify the sort order:
select t.*
from the_table t
join (
values
(5, 1),
(7, 2),
(1, 3),
(3, 4),
(4, 5)
) as s (id, sort_order) on t.id = s.id
order by s.sort_order ;
Online example: http://rextester.com/UDW37167
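When the list of IDs lives in application code, the sort-order pairs can be generated rather than typed by hand. The following is a rough sketch using Python's built-in sqlite3 module, so it is an assumption about the setup and not something the answers above use. The helper name fetch_in_order is hypothetical, and the values list is written as a CTE because SQLite does not accept the "as s (id, sort_order)" alias form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO db VALUES (?, ?)",
    [(1, "ONE"), (3, "THREE"), (4, "FOUR"), (5, "FIVE"), (7, "SEVEN"), (9, "NINE")],
)

def fetch_in_order(conn, ids):
    """Return (id, name) rows for the given ids, in the order the ids were given."""
    # One "(?, ?)" placeholder pair per id: (id, sort_order).
    pairs = ", ".join("(?, ?)" for _ in ids)
    params = [value for pos, id_ in enumerate(ids, start=1) for value in (id_, pos)]
    sql = (
        f"WITH s(id, sort_order) AS (VALUES {pairs}) "
        "SELECT t.id, t.name FROM db AS t "
        "JOIN s ON t.id = s.id "
        "ORDER BY s.sort_order"
    )
    return conn.execute(sql, params).fetchall()

contents = [5, 7, 1, 3, 4]
ordered = fetch_in_order(conn, contents)
print(ordered)
```

Only the placeholders are interpolated into the SQL text; the ID values themselves are passed as bound parameters.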

Recursive view that sum value from double tree structure SQL Server

First, sorry for the numerous reposts of my question; I'm new here and still getting used to asking questions properly and clearly.
I'm working on a recursive view that sums up values from a double tree structure.
I have researched around and found many questions about recursive sums, but none of their solutions seemed to work for my issue specifically.
At the moment I am having trouble aggregating the values into the right cells. The logic is that I need the sum of each element per year in its parent, and also the sum across all years for a given element.
Here is a fiddle of my tables and actual script:
SQL Fiddle
And here is a screenshot of the output I'm looking for:
My question is:
How can I get my view to aggregate the value from child to parent in this double tree structure?
If I understand your question correctly, you are trying to get an aggregation at 2 different levels to show in a single result set.
Clarification Scenario:
Below is an over-simplified sample data set for what I believe you are trying to achieve.
create table #agg_table
(
group_one int
, group_two int
, group_val int
)
insert into #agg_table
values (1, 1, 6)
, (1, 1, 7)
, (1, 2, 8)
, (1, 2, 9)
, (2, 3, 10)
, (2, 3, 11)
, (2, 4, 12)
, (2, 4, 13)
Given the sample data above, you want to see the following output:
+-----------+-----------+-----------+
| group_one | group_two | group_val |
+-----------+-----------+-----------+
| 1 | NULL | 30 |
| 1 | 1 | 13 |
| 1 | 2 | 17 |
| 2 | NULL | 46 |
| 2 | 3 | 21 |
| 2 | 4 | 25 |
+-----------+-----------+-----------+
This output can be achieved by making use of the group by grouping sets syntax (example G. in the link) in SQL Server, as shown in the query below:
select a.group_one
, a.group_two
, sum(a.group_val) as group_val
from #agg_table as a
group by grouping sets
(
(
a.group_one
, a.group_two
)
,
(
a.group_one
)
)
order by a.group_one
, a.group_two
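One way to read the two grouping sets is as two ordinary GROUP BY queries glued together with UNION ALL. The sketch below checks that reading with Python's built-in sqlite3 module; SQLite is an assumption made only so the example can run anywhere (it has no grouping sets syntax), and agg_table stands in for #agg_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg_table (group_one INT, group_two INT, group_val INT)")
conn.executemany(
    "INSERT INTO agg_table VALUES (?, ?, ?)",
    [(1, 1, 6), (1, 1, 7), (1, 2, 8), (1, 2, 9),
     (2, 3, 10), (2, 3, 11), (2, 4, 12), (2, 4, 13)],
)

# First branch: the (group_one, group_two) grouping set.
# Second branch: the (group_one) grouping set, with NULL for group_two.
TWO_LEVEL_SUM = """
SELECT group_one, group_two, SUM(group_val) AS group_val
FROM agg_table
GROUP BY group_one, group_two
UNION ALL
SELECT group_one, NULL, SUM(group_val)
FROM agg_table
GROUP BY group_one
ORDER BY group_one, group_two
"""

result = conn.execute(TWO_LEVEL_SUM).fetchall()
for row in result:
    print(row)
```

The subtotal rows come first within each group_one value because SQLite, like SQL Server, sorts NULL before other values in ascending order.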
What that means for your scenario is that I believe your recursive CTE is not the issue. The only thing that needs to change is the final select query that reads from the CTE.
Answer:
with Temp (EntityOneId, EntityOneParentId, EntityTwoId, EntityTwoParentId, Year, Value)
as
(
SELECT E1.Id, E1.ParentId, E2.Id, E2.ParentId, VY.Year, VY.Value
FROM ValueYear AS VY
FULL OUTER JOIN EntityOne AS E1
ON VY.EntityOneId = E1.Id
FULL OUTER JOIN EntityTwo AS E2
ON VY.EntityTwoId = E2.Id
),
T (EntityOneId, EntityOneParentId, EntityTwoId, EntityTwoParentId, Year, Value, Levels)
as
(
Select
T1.EntityOneId,
T1.EntityOneParentId,
T1.EntityTwoId,
T1.EntityTwoParentId,
T1.Year,
T1.Value,
0 as Levels
From
Temp
As T1
Where
T1.EntityOneParentId is null
union all
Select
T1.EntityOneId,
T1.EntityOneParentId,
T1.EntityTwoId,
T1.EntityTwoParentId,
T1.Year,
T1.Value,
T.Levels +1
From
Temp
AS T1
join
T
On T.EntityOneId = T1.EntityOneParentId
)
Select
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId,
T.Year,
sum(T.Value) as Value
from T
group by grouping sets
(
(
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId,
T.Year
)
,
(
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId
)
)
order by T.EntityOneID
, T.EntityOneParentID
, T.EntityTwoID
, T.EntityTwoParentID
, T.Year
FYI - I believe the sample data did not have the records necessary to match the expected output completely, but the last 20 records in the SQL Fiddle match the expected output perfectly.

Partially sort an SQL table according to column values

Suppose I have a table that looks like this:
product, color
1, 1
2, 1
3, 1
4, 2
5, 2
6, 2
7, 3
8, 3
Would it be possible to re-arrange the table so that the products cycle through the colors? For example, in this case the answer would be:
product, color
1, 1
4, 2
7, 3
2, 1
5, 2
8, 3
2, 1
6, 2
3, 1
Sure, you can order by NEWID()
Let's make the test data;
IF OBJECT_ID('tempdb..#TestData') IS NOT NULL DROP TABLE #TestData
GO
CREATE TABLE #TestData (product int, colour int)
INSERT INTO #TestData (product, colour)
VALUES
(1,1)
,(2,1)
,(3,1)
,(4,2)
,(5,2)
,(6,2)
,(7,3)
,(8,3)
Then run the query on this;
SELECT
product
,colour
FROM #TestData
ORDER BY NEWID()
Which gives a random order of the data like this;
product colour
4 2
1 1
5 2
7 3
6 2
3 1
2 1
8 3
Edit: I've just seen that you seem to want to order with some pattern in the colour column, not random. I'm going to leave this answer anyway as a random result.
I would select a random number as third column and sort by that random number. In pseudo code:
SELECT PRODUCT,
COLOR,
RANDOM_NUMBER()
FROM YOUR_TABLE
ORDER BY 3
The generation of a random number depends on your database. In Oracle, it would be dbms_random.random.
You can get rid of the random number by re-selecting from the table as follows:
SELECT PRODUCT,
COLOR
FROM (SELECT PRODUCT,
COLOR,
RANDOM_NUMBER()
FROM YOUR_TABLE
ORDER BY 3)
Sounds like a job for row_number:
SELECT product, color, ROW_NUMBER() OVER (PARTITION BY color ORDER BY product) AS rn
FROM YOUR_TABLE
ORDER BY rn, color
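To see why numbering rows within each color and then sorting by that number interleaves the colors, here is a small Python sketch of the same logic. The function name interleave_by_color is an assumption for illustration; the rows are the sample data from the question.

```python
from collections import defaultdict

rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3), (8, 3)]

def interleave_by_color(rows):
    """Order (product, color) rows so that colors cycle: 1st of each, 2nd of each, ..."""
    seen = defaultdict(int)  # how many products of each color were numbered so far
    ranked = []
    # Mirrors ROW_NUMBER() OVER (PARTITION BY color ORDER BY product).
    for product, color in sorted(rows):
        seen[color] += 1
        ranked.append((seen[color], color, product))
    # Mirrors ORDER BY row number, then color.
    ranked.sort()
    return [(product, color) for _, color, product in ranked]

ordered = interleave_by_color(rows)
for product, color in ordered:
    print(product, color)
```

Colors with fewer products simply drop out of later rounds, so product 3 and product 6 end up last in this data.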

sql select a field into 2 columns

I am trying to run below 2 queries on the same table and hoping to get results in 2 different columns.
Query 1: select ID as M from table where field = 1
returns:
1
2
3
Query 2: select ID as N from table where field = 2
returns:
4
5
6
My goal is to get
Column1 - Column2
-----------------
1 4
2 5
3 6
Any suggestions? I am using SQL Server 2008 R2
Thanks
There has to be a primary key to foreign key relationship to JOIN data between two tables.
That is the idea about relational algebra and normalization. Otherwise, the correlation of the data is meaningless.
http://en.wikipedia.org/wiki/Database_normalization
The CROSS JOIN will give you all possibilities. (1,4), (1,5), (1, 6) ... (3,6). I do not think that is what you want.
You can always use a ROW_NUMBER() OVER () function to generate a surrogate key in both tables. Order the data the way you want inside the OVER () clause. However, this is still not in any Normal form.
In short. Why do this?
Quick test database. Stores products from sporting goods and home goods using non-normal form.
The results of the SELECT do not mean anything.
-- Just play
use tempdb;
go
-- Drop table
if object_id('abnormal_form') > 0
drop table abnormal_form
go
-- Create table
create table abnormal_form
(
Id int,
Category int,
Name varchar(50)
);
-- Load store products
insert into abnormal_form values
(1, 1, 'Bike'),
(2, 1, 'Bat'),
(3, 1, 'Ball'),
(4, 2, 'Pot'),
(5, 2, 'Pan'),
(6, 2, 'Spoon');
-- Sporting Goods
select * from abnormal_form where Category = 1
-- Home Goods
select * from abnormal_form where Category = 2
-- Does not mean anything to me
select Id1, Id2 from
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid1, Id as Id1
from abnormal_form where Category = 1) as s
join
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid2, Id as Id2
from abnormal_form where Category = 2) as h
on s.Rid1 = h.Rid2
We definitely need more information from the user.
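The row-number pairing can be pictured as zipping two sorted lists together. The sketch below is a Python illustration under that reading, with a hypothetical helper name pair_columns. One deliberate difference: zip_longest pads the shorter side with None, which behaves like a full outer join, whereas the inner join in the query above would drop unmatched rows.

```python
from itertools import zip_longest

# (Id, field) rows matching the question: field 1 holds IDs 1-3, field 2 holds IDs 4-6.
rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2)]

def pair_columns(rows, left_field, right_field):
    """Pair the n-th ID of one field value with the n-th ID of another."""
    left = sorted(id_ for id_, field in rows if field == left_field)
    right = sorted(id_ for id_, field in rows if field == right_field)
    # Position in each sorted list acts as the surrogate key.
    return list(zip_longest(left, right))

pairs = pair_columns(rows, 1, 2)
for m, n in pairs:
    print(m, n)
```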

Need to find average value across multi-level nested SQL query in Oracle

This one's a bit of a mess, and there's probably some far superior way of doing this but we just need the information for some reports we're working on.
So, we have a bunch of projects; each project has a bunch of tasks and each task has a document type ID associated with it. A project can belong to one or more workgroups.
We want to analyze projects that have at least one task of doc type x, and then see how many workgroups it has. I can do that with:
select distinct T.PROJECTID,
(select COUNT(*) from TPM_PROJECTWORKGROUPS where PROJECTID=T.PROJECTID) as NumWorkgroups
from TPM_TASK T
where T.DOCUMENTTYPEID=17
Now, we want to see the average number of workgroups across these projects. So I can do:
select AVG(NumWorkgroups) FROM (
select distinct T.PROJECTID,
(select COUNT(*) from TPM_PROJECTWORKGROUPS where PROJECTID=T.PROJECTID) as NumWorkgroups
from TPM_TASK T
where T.DOCUMENTTYPEID=17
)
However, we want to run this same query across all the document types (there's about 200 of them). I can't find a way to do this without copying and pasting the query 200 times. I've tried:
select DOCUMENTTYPEID,
(select AVG(NumWorkgroups) FROM (
select distinct T.PROJECTID,
(select COUNT(*) from TPM_PROJECTWORKGROUPS where PROJECTID=T.PROJECTID) as NumWorkgroups
from TPM_TASK T
where T.DOCUMENTTYPEID=DT.DOCUMENTTYPEID
))
from TPM_DOCUMENTTYPE DT
However, I get the error:
ORA-00904: "TPM_DOCUMENTTYPE"."DOCUMENTTYPEID": invalid identifier
I believe because DT is out of scope more than one level down in a nested query. Is there a better way to do this query?
Update for Justin:
Here's a sample schema:
create table Test_Projects (
id number primary key
)
create table Test_Tasks (
id number primary key,
project number,
doctype number
)
create table Test_Workgroups (
id number primary key,
workgroup number,
project number
)
With some sample data:
insert into Test_Projects VALUES (1) --Create projects 1 and 2
insert into Test_Projects VALUES (2)
insert into Test_Tasks VALUES (1, 1, 5) --Project 1 has two tasks, doc types 5 and 6
insert into Test_Tasks VALUES (2, 1, 6)
insert into Test_Tasks VALUES (3, 2, 6) --Project 2 has one task, doc type 6
insert into Test_Workgroups VALUES (1, 1, 1) --Project 1 belongs to workgroups 1 and 2
insert into Test_Workgroups VALUES (2, 2, 1)
insert into Test_Workgroups VALUES (3, 2, 2) --Project 2 belongs to workgroup 2
We need to know the average number of workgroups that a project with a task of type x belongs to.
For example, doc type 5 has only project 1 which has 2 workgroups, so the average is 2. Doc type 6 has 2 projects (1 and 2) - 1 has 2 workgroups and 2 has one workgroup - so the average is 1.5.
We need to list all doc types and the average number of workgroups in each.
I'd expect this query to return:
DOCTYPE AverageWorkgroups
------- -----------------
5 2
6 1.5
Thanks for the sample data. That makes it much clearer.
I believe this does what you want (I'm including the calculations for the number of projects and the number of workgroups in the output as well just because that made my testing easier)
select t.doctype,
       count(distinct p.id) numProjects,
       count(*) numWorkgroups,
       count(*) / count(distinct p.id) avgNumWorkgroups
from test_projects p,
     test_tasks t,
     test_workgroups w
where p.id = t.project
and p.id = w.project
group by t.doctype;
DOCTYPE NUMPROJECTS NUMWORKGROUPS AVGNUMWORKGROUPS
---------- ----------- ------------- ----------------
6 2 3 1.5
5 1 2 2
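One caveat worth checking against real data: the three-way join counts one row per task and workgroup combination, so a project with two tasks of the same doc type would have its workgroups counted twice. The sketch below is one possible variant, not the answer's query. It first reduces tasks to distinct (doctype, project) pairs and then averages per-project workgroup counts. It runs on Python's built-in sqlite3 module purely so it can be tried anywhere, and the function name avg_workgroups is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test_Tasks (id INTEGER PRIMARY KEY, project INT, doctype INT);
CREATE TABLE Test_Workgroups (id INTEGER PRIMARY KEY, workgroup INT, project INT);
INSERT INTO Test_Tasks VALUES (1, 1, 5), (2, 1, 6), (3, 2, 6);
INSERT INTO Test_Workgroups VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

AVG_WORKGROUPS = """
SELECT doctype, AVG(num_workgroups) AS avg_workgroups
FROM (
    -- One row per (doctype, project), however many matching tasks exist.
    SELECT dp.doctype, dp.project, COUNT(w.id) AS num_workgroups
    FROM (SELECT DISTINCT doctype, project FROM Test_Tasks) AS dp
    LEFT JOIN Test_Workgroups AS w ON w.project = dp.project
    GROUP BY dp.doctype, dp.project
) AS per_project
GROUP BY doctype
ORDER BY doctype
"""

def avg_workgroups(conn):
    """Return (doctype, average number of workgroups per project) rows."""
    return conn.execute(AVG_WORKGROUPS).fetchall()

averages = avg_workgroups(conn)
for doctype, avg in averages:
    print(doctype, avg)
```

The LEFT JOIN also keeps projects that belong to no workgroup, counting them as zero; whether such projects should affect the average is a reporting decision the question does not settle.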