SQL Server 2012 - A Little Guidance - sql

I have searched the net, but I must not be phrasing my keywords correctly because I am not finding possible solutions for my problem. I think it might be recursion, but I'm not quite certain.
I have a table that has the following categories:
ID, Author, Customer, Group
A sample dataset would be like:
ID | Author | Customer | Group
------------------------------------------
1 | Paula Hawkins | John Doe | NULL
2 | Harlan Coben | John Doe | NULL
3 | James Patterson| John Doe | NULL
4 | Paula Hawkins | Jane Doe | NULL
5 | James Patterson| Jane Doe | NULL
6 | James Patterson| Steven Doe| NULL
7 | Harlan Coben | Steven Doe| NULL
8 | Paula Hawkins | Harry Doe | NULL
9 | James Patterson| Harry Doe | NULL
It's possible a customer may have one or more than one author checked out, so what I am trying to do is group them with a unique ID based on the full set of authors they have checked out (regardless of the customer name):
ID | Author | Customer | Group
--------------------------------------------
1 | Paula Hawkins | John Doe | 1
2 | Harlan Coben | John Doe | 1
3 | James Patterson| John Doe | 1
4 | Paula Hawkins | Jane Doe | 2
5 | James Patterson| Jane Doe | 2
6 | James Patterson| Steven Doe | 3
7 | Harlan Coben | Steven Doe | 3
8 | Paula Hawkins | Harry Doe | 2
9 | James Patterson| Harry Doe | 2
It's very possible the same customer could be found hundreds of times for multiple books, so the final group category would represent the unique value for that customer (other customers would have the same value only if everything they have checked out also matches everything the other customer has checked out).
Using the above data, Harry and Jane have the exact same authors checked out so they are in the same group but John and Steven have different combinations so they have their own unique group.
Hopefully this makes sense. Is this what is called recursion? If so then I will look towards a cte solution that uses some sort of ranking for the unique id value. Thanks for any help you give.

Not sure how to get your exact group order, but to just group customers together you can combine their authors with FOR XML and group the customers based on exact matches.
WITH cte AS (
    SELECT
        *,
        -- DENSE_RANK keeps group numbers consecutive; RANK would leave gaps after ties
        DENSE_RANK() OVER (ORDER BY Authors) [Group]
    FROM (
        SELECT
            [Customer],
            -- Build a sorted, comma-separated author list per customer
            STUFF((SELECT ',' + [Author]
                   FROM myTable WHERE Customer = mt.Customer
                   ORDER BY Author
                   FOR XML PATH('')), 1, 1, '') AS Authors
        FROM
            myTable mt
        GROUP BY [Customer] ) t
)
SELECT
    mt.[ID],
    mt.[Author],
    mt.[Customer],
    cte.[Group]
FROM
    cte
    JOIN myTable mt ON mt.Customer = cte.Customer
ORDER BY mt.[ID]
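To make the idea behind the query easier to see outside the database, here is a minimal Python sketch of the same technique: build one "signature" per customer from their sorted authors, then give every distinct signature a group number. The function name `assign_groups` and the in-memory `rows` list are made up for illustration; the data is the sample from the question.

```python
from collections import defaultdict

# Sample rows from the question: (ID, Author, Customer).
rows = [
    (1, "Paula Hawkins", "John Doe"),
    (2, "Harlan Coben", "John Doe"),
    (3, "James Patterson", "John Doe"),
    (4, "Paula Hawkins", "Jane Doe"),
    (5, "James Patterson", "Jane Doe"),
    (6, "James Patterson", "Steven Doe"),
    (7, "Harlan Coben", "Steven Doe"),
    (8, "Paula Hawkins", "Harry Doe"),
    (9, "James Patterson", "Harry Doe"),
]


def assign_groups(rows):
    """Map each customer to a group number keyed on their set of authors."""
    authors_by_customer = defaultdict(set)
    for _id, author, customer in rows:
        authors_by_customer[customer].add(author)

    group_of_signature = {}
    group_of_customer = {}
    # Customers are visited in order of first appearance, so the group
    # numbers follow the order in which new author sets show up.
    for customer, authors in authors_by_customer.items():
        signature = tuple(sorted(authors))
        if signature not in group_of_signature:
            group_of_signature[signature] = len(group_of_signature) + 1
        group_of_customer[customer] = group_of_signature[signature]
    return group_of_customer


groups = assign_groups(rows)
for _id, author, customer in rows:
    print(_id, author, customer, groups[customer])
```

Numbering by first appearance is a choice made in this sketch only; the SQL above numbers the groups by the sort order of the author list instead.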

Try using cursors. Cursors are slow, but they're also easier to understand.
Here's a sample implementation:
DECLARE @GroupExists BIT;
DECLARE @CurrGroup INT;
DECLARE @NextGroup INT;
DECLARE @Customer VARCHAR(250);
SET @NextGroup = 1;
DECLARE customer_cursor CURSOR FAST_FORWARD
    FOR SELECT DISTINCT Customer FROM dbo.TableName;
OPEN customer_cursor;
FETCH NEXT FROM customer_cursor
    INTO @Customer;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @GroupExists = 0;
    -- Test condition to check if this group of authors is already in use
    -- (not implemented here; it should set @GroupExists and @CurrGroup)
    IF @GroupExists = 1
    BEGIN
        UPDATE dbo.TableName
        SET [Group] = @CurrGroup
        WHERE Customer = @Customer;
    END
    ELSE
    BEGIN
        UPDATE dbo.TableName
        SET [Group] = @NextGroup
        WHERE Customer = @Customer;
        SET @NextGroup = @NextGroup + 1;
    END
    FETCH NEXT FROM customer_cursor
        INTO @Customer;
END
-- Release the cursor once every customer has been processed
CLOSE customer_cursor;
DEALLOCATE customer_cursor;

You should be able to generate groups using standard SQL. The following query should do the job; I make no promises of its performance though.
WITH
CTE_CheckOutBookCount AS
(
SELECT [ID]
,[Author]
,[Customer]
,COUNT([Author]) OVER (PARTITION BY [Customer]) AS [CheckOutBooks] -- Count the number of books checked out by each customer. This will be used for our initial compare between customers.
FROM CheckedOutBooks
),
CTE_AuthorAndCountCompare AS
(
    SELECT MIN(CB.[ID]) AS [ID]
          ,CBC.[Customer] AS MatchedCustomers
    FROM CTE_CheckOutBookCount CB
    INNER JOIN CTE_CheckOutBookCount CBC ON CB.[Author] = CBC.[Author] AND CB.[CheckOutBooks] = CBC.[CheckOutBooks] --Join customer information on number of books checked out and author name of books checked out.
    GROUP BY CB.[Customer], CBC.[Customer], CB.[CheckOutBooks]
    HAVING COUNT(*) = CB.[CheckOutBooks] --Keep only pairs of customers where every author matches, not just one (assumes one row per customer and author).
)
,CTE_MatchedCustomers
AS
(
    SELECT
         [ID]
        ,[Author]
        ,[Customer]
        --Get the minimum record id of customers which match exactly on count and authors checked out. This will be used to help generate group ID.
        ,(
            SELECT MIN(ID)
            FROM CTE_AuthorAndCountCompare
            WHERE CheckedOutBooks.[Customer] = CTE_AuthorAndCountCompare.MatchedCustomers
        ) MinID
FROM CheckedOutBooks
)
SELECT
[ID]
,[Author]
,[Customer]
,DENSE_RANK() OVER (ORDER BY MinID) AS [Group] -- Generate new group id
FROM CTE_MatchedCustomers
ORDER BY ID

Related

SQL Query Find Exact and Near Dupes

I have a SQL table with FirstName, LastName, Add1 and other fields. I am working to get this data cleaned up. There are a few instances of likely dupes -
All 3 columns are the exact same for more than 1 record
The First and Last are the same, only 1 has an address, the other is blank
The First and Last are similar (John | Doe vs John C. | Doe) and the address is the same or one is blank
I'm wanting to generate a query I can provide to the users, so they can check these records out, compare their related records and then delete the one they don't need.
I've been looking at similarity functions, soundex, and such, but it all seems so complicated. Is there an easy way to do this?
Thanks!
Edit:
So here is some sample data:
FirstName | LastName | Add1
John | Doe | 1 Main St
John | Doe |
John A. | Doe |
Jane | Doe | 2 Union Ave
Jane B. | Doe | 2 Union Ave
Alex | Smith | 3 Broad St
Chris | Anderson | 4 South Blvd
Chris | Anderson | 4 South Blvd
I really like Critical Error's query for identifying all different types of dupes. That would give me the above sample data, with the Alex Smith result not included, because there are no dupes for that.
What I want to do is take that result set and identify which are dupes for Jane Doe. She should only have 2 dupes. John Doe has 3, and Chris Anderson has 2. Can I get at that sub-result set?
Edit:
I figured it out! I will be marking Critical Error's answer as the solution, since it totally got me where I needed to go. Here is the solution, in case it might help others. Basically, this is what we are doing.
Selecting the records from the table where there are dupes
Adding a WHERE EXISTS sub-query to look in the same table for exact dupes, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for similar dupes, using a Difference factor between duplicative columns, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for dupes on 2 fields where a 3rd may be null for one of the records, where the ID from the main query and sub-query do not match
Each subquery is connected with an OR, so that any kind of duplicate is found
At the end of each sub-query add a nested requirement that either the main query or sub-query be the ID of the record you are looking to identify duplicates for.
SET ANSI_NULLS ON;
SET NOCOUNT ON;
DECLARE @CID AS INT;
SET @CID = 12345;
SELECT
    *
FROM @Customers c
WHERE
    -- Exact duplicates.
    EXISTS (
        SELECT * FROM @Customers x WHERE
            x.FirstName = c.FirstName
            AND x.LastName = c.LastName
            AND x.Add1 = c.Add1
            AND x.Id <> c.Id
            AND (x.ID = @CID OR c.ID = @CID)
    )
    -- Match First/Last name are same/similar and the address is same.
    OR EXISTS (
        SELECT * FROM @Customers x WHERE
            DIFFERENCE( x.FirstName, c.FirstName ) = 4
            AND DIFFERENCE( x.LastName, c.LastName ) = 4
            AND x.Add1 = c.Add1
            AND x.Id <> c.Id
            AND (x.ID = @CID OR c.ID = @CID)
    )
    -- Match First/Last name and one address exists.
    OR EXISTS (
        SELECT * FROM @Customers x WHERE
            x.FirstName = c.FirstName
            AND x.LastName = c.LastName
            AND x.Id <> c.Id
            AND (
                x.Add1 IS NULL AND c.Add1 IS NOT NULL
                OR
                x.Add1 IS NOT NULL AND c.Add1 IS NULL
            )
            AND (x.ID = @CID OR c.ID = @CID)
    );
Assuming you have a unique id between records, you can give this a try:
DECLARE @Customers table ( FirstName varchar(50), LastName varchar(50), Add1 varchar(50), Id int IDENTITY(1,1) );
INSERT INTO @Customers ( FirstName, LastName, Add1 ) VALUES
    ( 'John', 'Doe', '123 Anywhere Ln' ),
    ( 'John', 'Doe', '123 Anywhere Ln' ),
    ( 'John', 'Doe', NULL ),
    ( 'John C.', 'Doe', '123 Anywhere Ln' ),
    ( 'John C.', 'Doe', '15673 SW Liar Dr' );
SELECT
    *
FROM @Customers c
WHERE
    -- Exact duplicates.
    EXISTS (
        SELECT * FROM @Customers x WHERE
            x.FirstName = c.FirstName
            AND x.LastName = c.LastName
            AND x.Add1 = c.Add1
            AND x.Id <> c.Id
    )
    -- Match First/Last name are same/similar and the address is same.
    OR EXISTS (
        SELECT * FROM @Customers x WHERE
            DIFFERENCE( x.FirstName, c.FirstName ) = 4
            AND DIFFERENCE( x.LastName, c.LastName ) = 4
            AND x.Add1 = c.Add1
            AND x.Id <> c.Id
    )
    -- Match First/Last name and one address exists.
    OR EXISTS (
        SELECT * FROM @Customers x WHERE
            x.FirstName = c.FirstName
            AND x.LastName = c.LastName
            AND x.Id <> c.Id
            AND (
                x.Add1 IS NULL AND c.Add1 IS NOT NULL
                OR
                x.Add1 IS NOT NULL AND c.Add1 IS NULL
            )
    );
Returns
+-----------+----------+-----------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+-----------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
+-----------+----------+-----------------+----+
Initial resultset:
+-----------+----------+------------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+------------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
| John C. | Doe | 15673 SW Liar Dr | 5 |
+-----------+----------+------------------+----+
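If you want to play with the individual rules, here is a rough sketch that runs two of them (exact duplicates, and same name with one address missing) against an in-memory SQLite database from Python. It is not the SQL Server query itself: SQLite has no built-in equivalent of DIFFERENCE that can be relied on, so the "similar name" rule is left out, and the table name `Customers` and the helper `ids` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Add1 TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (FirstName, LastName, Add1) VALUES (?, ?, ?)",
    [
        ("John", "Doe", "123 Anywhere Ln"),
        ("John", "Doe", "123 Anywhere Ln"),
        ("John", "Doe", None),
        ("John C.", "Doe", "123 Anywhere Ln"),
        ("John C.", "Doe", "15673 SW Liar Dr"),
    ],
)

# Rule 1: all three columns identical on a different row.
EXACT = """
SELECT c.Id FROM Customers c
WHERE EXISTS (
    SELECT 1 FROM Customers x
    WHERE x.FirstName = c.FirstName
      AND x.LastName = c.LastName
      AND x.Add1 = c.Add1          -- never true when either side is NULL
      AND x.Id <> c.Id)
ORDER BY c.Id
"""

# Rule 3: same first and last name, exactly one of the two addresses missing.
ONE_ADDRESS_MISSING = """
SELECT c.Id FROM Customers c
WHERE EXISTS (
    SELECT 1 FROM Customers x
    WHERE x.FirstName = c.FirstName
      AND x.LastName = c.LastName
      AND x.Id <> c.Id
      AND ((x.Add1 IS NULL AND c.Add1 IS NOT NULL)
        OR (x.Add1 IS NOT NULL AND c.Add1 IS NULL)))
ORDER BY c.Id
"""


def ids(sql):
    """Run a query and return the matching Ids as a list."""
    return [row[0] for row in conn.execute(sql)]


exact_ids = ids(EXACT)
missing_ids = ids(ONE_ADDRESS_MISSING)
print("exact duplicates:", exact_ids)
print("one address missing:", missing_ids)
```

Row 3 (NULL address) is only picked up by the second rule, because comparing with `=` never matches a NULL. Row 4 is found by neither rule here; in the full query it comes from the DIFFERENCE check.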

Oracle SQL: Distribute rows of one table across rows of another table without repetition, depending on a variable value

I have two tables:
Customer :
name
-----
TOMMY
LOUIE
HUGO
OLLIE
DAVID
LEWIS
JACKSON
Employees :
name | stage
---------+---------
OLIVER | 1
NOAH | 1
ALFIE | 1
OSCAR | 2
NOAH | 2
OLIVER | 2
LEO | 2
The Employees table has two stages. A stage can contain the same employees as the other stage or different ones. I want to distribute the customers among the employees under two conditions:
a customer's employee in the second stage must be different from that customer's employee in the first stage
each employee must have the same number of customers in each stage, and each customer must have exactly 1 employee in each stage, without repetition.
I have written a procedure with cursors that inserts the result into a different table, but it gives the wrong result: a customer from stage 1 is assigned to the same employee again in stage 2 (e.g. NOAH gets the same customers).
CREATE PROCEDURE AUDIT_Customer AS
CURSOR Customer_STAGE1 IS SELECT * FROM (
select
s.name Customer_name
,t.name Employees_name
from (select name, row_number() over(order by name) as rn from Employees WHERE stage = 1 ) t
join (select name, row_number() over(order by name) as rn from Customer ) s
on mod(s.rn - 1, (select count(*) from Employees WHERE stage = 1)) = t.rn -1);
CURSOR Customer_STAGE2 IS SELECT * FROM (
select
s.name Customer_name
,t.name Employees_name
from (select name, row_number() over(order by name) as rn from Employees WHERE stage = 2 ) t
join (select name, row_number() over(order by name) as rn from Customer ) s
on mod(s.rn - 1, (select count(*) from Employees WHERE stage = 2)) = t.rn -1);
Begin
For y in Customer_STAGE1 Loop
Insert into Customer_Employee(Customer_name,Employees_name,RECIVE_DATE,stage)
Values (Y.Customer_name ,Y.Employees_name,sysdate,1) ;
End Loop ;
For y in Customer_STAGE2 Loop
Insert into Customer_Employee(Customer_name,Employees_name,RECIVE_DATE,stage)
Values (Y.Customer_name ,Y.Employees_name,sysdate,2) ;
End Loop ;
COMMIT;
End AUDIT_Customer;
The results:
Customer| Employees| stage
--------+--------- +---------
TOMMY | OLIVER | 1
LOUIE | OLIVER | 1
HUGO | NOAH | 1
OLLIE | NOAH | 1
DAVID | ALFIE | 1
LEWIS | ALFIE | 1
JACKSON | ALFIE | 1
TOMMY | OSCAR | 2
LOUIE | OSCAR | 2
HUGO | NOAH | 2
OLLIE | NOAH | 2
DAVID | OLIVER | 2
LEWIS | OLIVER | 2
JACKSON | LEO | 2
How can I solve it?
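Filter the second section of your procedure so that stage 2 only uses employees who were not in stage 1. The count(*) used in the mod() for stage 2 needs the same filter, otherwise some customers will not match any row.
This might help:
(select name, row_number() over(order by name) as rn from Employees WHERE stage = 2 and name not in (select name from Employees WHERE stage = 1))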
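Another way to look at the problem, sketched here in Python rather than PL/SQL: assign stage 1 round-robin, then for stage 2 give each customer the least-loaded employee who is not the one they had in stage 1. This is only a greedy heuristic written for illustration (the function names `round_robin` and `assign_avoiding` are made up), and it does not guarantee a perfectly even split for every possible input.

```python
customers = ["TOMMY", "LOUIE", "HUGO", "OLLIE", "DAVID", "LEWIS", "JACKSON"]
stage1_employees = ["OLIVER", "NOAH", "ALFIE"]
stage2_employees = ["OSCAR", "NOAH", "OLIVER", "LEO"]


def round_robin(customers, employees):
    """Deal customers out to employees in turn."""
    return {c: employees[i % len(employees)] for i, c in enumerate(customers)}


def assign_avoiding(customers, employees, previous):
    """Give each customer the least-loaded employee that differs from
    the employee they had in the previous stage."""
    load = {e: 0 for e in employees}
    result = {}
    for c in customers:
        candidates = [e for e in employees if e != previous.get(c)]
        # min() raises ValueError if no other employee is available.
        chosen = min(candidates, key=lambda e: load[e])
        load[chosen] += 1
        result[c] = chosen
    return result, load


stage1 = round_robin(customers, stage1_employees)
stage2, stage2_load = assign_avoiding(customers, stage2_employees, stage1)

for c in customers:
    print(c, stage1[c], stage2[c])
```

Unlike the NOT IN filter, this keeps NOAH and OLIVER available in stage 2; they are just never given a customer they already had.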

How to get the previous row-text

First of all: I am a SQL beginner and I use SQL Server 2008.
The query, as it is now, is written as:
SELECT
Transaction.description, Person.name
FROM
Transaction, Person, SystemUser
WHERE
Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID
ORDER BY
Transaction.description
where personnumber is a PK nvarchar (it could look like N0890) whose trailing number grows by 1 for every new person.
art_ID (Transaction) is PK smallint, art_ID (SystemUser) is smallint, description is nvarchar.
I want to get the text from the previous row, in the same column, so that I can manipulate the text to be clear and make the result-table look more simple.
Example as it is now:
|Transactions | Persons |
|-------------------|----------|
|Statistic | Ursula |
|Statistic | Peter |
|Statistic | Alan |
|Settlement | Christie |
|Settlement | Tania |
|Deptor department | Jack |
|Economy department | Rickie |
|Economy department | Annie |
|Economy department | Tom |
|Economy department | Seth |
How I want it to be:
|Transactions | Persons |
|-------------------|----------|
|Statistic | Ursula |
| | Peter |
| | Alan |
|Settlement | Christie |
| | Tania |
|Deptor department | Jack |
|Economy department | Rickie |
| | Annie |
| | Tom |
| | Seth |
Something like: select case when description = description of the previous row then ''
I have searched for examples, but every one of them is based on integers (not varchar/nvarchar), and I keep getting errors when I try to do it with varchars, for example with a CTE, min() and max().
Do you have any idea what function I can use, or how to set up the select statement to do what I want?
First use a rank function to identify just one of them:
SELECT Transaction.description, Person.name,
RANK() OVER (PARTITION BY Transaction.description ORDER BY Person.name) As R
FROM Transaction, Person, SystemUser
WHERE Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID
ORDER BY Transaction.description, Person.name
Notice the lines you want to see have 1 against them? Use that:
SELECT
    -- Outside the derived table, columns must be referenced through its alias
    CASE WHEN R=1 THEN Subtable.description ELSE '' END description,
    Subtable.name
FROM
(
    SELECT Transaction.description, Person.name,
    RANK() OVER (PARTITION BY Transaction.description ORDER BY Person.name) As R
    FROM Transaction, Person, SystemUser
    WHERE Person.personnumber = SystemUser.personnumber
    AND Transaction.art_ID = SystemUser.art_ID
) Subtable
ORDER BY Subtable.description, Subtable.name
I think the following SQL should work:
CREATE TABLE #TempTable (rownum INT, description VARCHAR(256), name VARCHAR(256));
-- Number every row so each one can be joined to the row just before it
INSERT INTO #TempTable (rownum, description, name)
SELECT ROW_NUMBER() OVER (ORDER BY Transaction.description, Person.name)
,Transaction.description
,Person.name
FROM Transaction, Person, SystemUser
WHERE Person.personnumber = SystemUser.personnumber
AND Transaction.art_ID = SystemUser.art_ID;
SELECT
CASE
WHEN prev.description = TT.description
THEN ''
ELSE TT.description
END AS description,
TT.name
FROM #TempTable TT
LEFT JOIN #TempTable prev ON prev.rownum = TT.rownum - 1
ORDER BY TT.rownum
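The "compare with the previous row" logic is small enough to sketch in a few lines of Python, which may help if the blanking ends up being done in the reporting layer instead of in SQL. The function name `blank_repeats` is hypothetical and the rows are the example from the question, already sorted by description.

```python
rows = [
    ("Statistic", "Ursula"),
    ("Statistic", "Peter"),
    ("Statistic", "Alan"),
    ("Settlement", "Christie"),
    ("Settlement", "Tania"),
    ("Deptor department", "Jack"),
    ("Economy department", "Rickie"),
    ("Economy department", "Annie"),
    ("Economy department", "Tom"),
    ("Economy department", "Seth"),
]


def blank_repeats(rows):
    """Replace a description with '' when it equals the previous row's."""
    result = []
    previous = None
    for description, name in rows:
        shown = "" if description == previous else description
        result.append((shown, name))
        previous = description
    return result


display_rows = blank_repeats(rows)
for description, name in display_rows:
    print(f"|{description:<19}| {name:<9}|")
```

Note that this only works on rows that are already in their final order, which is the same reason the SQL versions need an ORDER BY.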

Data Matching with SQL and assigning Identity ID's

How do I write a query that will match data and produce an identity for it?
For Example:
RecordID | Name
1 | John
2 | John
3 | Smith
4 | Smith
5 | Smith
6 | Carl
I want a query which will assign an identity after matching exactly on Name.
Expected Output:
RecordID | Name | ID
1 | John | 1X
2 | John | 1X
3 | Smith | 1Y
4 | Smith | 1Y
5 | Smith | 1Y
6 | Carl | 1Z
Note: The ID should be unique for every match. Also, it can be numbers or varchar.
Can somebody help me with this? The main thing is to assign the ID's.
Thanks.
How about this:
with temp as
(
select 1 as id,'John' as name
union
select 2,'John'
union
select 3,'Smith'
union
select 4,'Smith'
union
select 5,'Smith'
union
select 6,'Carl'
)
SELECT *, DENSE_RANK() OVER
(ORDER BY Name) as NewId
FROM TEMP
Order by id
The first part is for testing purposes only.
Please try:
SELECT *,
Rank() over (order by Name ASC)
FROM table
This structure seems to work:
CREATE TABLE #Table
(
Department VARCHAR(100),
Name VARCHAR(100)
);
INSERT INTO #Table VALUES
('Sales','michaeljackson'),
('Sales','michaeljackson'),
('Sales','jim'),
('Sales','jim'),
('Sales','jill'),
('Sales','jill'),
('Sales','jill'),
('Sales','j');
WITH Cte_Rank AS
(
SELECT [Name],
rw = ROW_NUMBER() OVER (ORDER BY [Name])
FROM #Table
GROUP BY [Name]
)
SELECT a.Department,
a.Name,
b.rw
FROM #Table a
INNER JOIN Cte_Rank b
ON a.Name = b.Name;
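The answers above use DENSE_RANK and RANK, which both give every matching name the same ID but number them differently. As a rough illustration of that difference, here is a small Python sketch (the function name `rank_values` is made up, and Python's plain string ordering stands in for the database collation):

```python
names = ["John", "John", "Smith", "Smith", "Smith", "Carl"]


def rank_values(names):
    """Return (rank, dense_rank) per distinct name, ordered by name."""
    ordered = sorted(names)
    rank = {}
    dense = {}
    for position, name in enumerate(ordered, start=1):
        if name not in rank:
            rank[name] = position         # RANK: position of the first tie
            dense[name] = len(dense) + 1  # DENSE_RANK: consecutive numbers
    return rank, dense


rank, dense = rank_values(names)
for name in names:
    print(name, rank[name], dense[name])
```

Either numbering satisfies "unique for every match"; DENSE_RANK just avoids the gaps (1, 2, 4) that RANK leaves after ties.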

Collapse SQL rows

Say I have this table:
id | name
-------------
1 | john
2 | steve
3 | steve
4 | john
5 | steve
I only want the rows that are unique compared to the previous row, these:
id | name
-------------
1 | john
2 | steve
4 | john
5 | steve
I can partly achieve this by using this query:
SELECT *, (
SELECT `name` FROM demotable WHERE id=t.id-1
) AS prevName FROM demotable AS t GROUP BY prevName ORDER BY id ASC
But when I am using a query with multiple UNIONs and other clauses, this gets way too complicated. Is there an easy way to do this (like GROUP BY, but more specific)?
This should work, but I don't know if it's simpler:
select demotable.*
from demotable
left join demotable as prev on prev.id = demotable.id - 1
-- keep the first row too: it has no previous row, so prev.name is NULL
where prev.id is null or demotable.name != prev.name
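For comparison, the same "keep a row only if it differs from the one before it" rule can be sketched in Python. The function name `collapse_runs` is hypothetical. Unlike the `id - 1` join, this version compares each row with whatever row precedes it in the list, so it does not depend on the ids being consecutive.

```python
rows = [(1, "john"), (2, "steve"), (3, "steve"), (4, "john"), (5, "steve")]


def collapse_runs(rows):
    """Keep only rows whose name differs from the previous row's name."""
    kept = []
    previous_name = object()  # sentinel that never equals a real name
    for row_id, name in rows:
        if name != previous_name:
            kept.append((row_id, name))
        previous_name = name
    return kept


collapsed = collapse_runs(rows)
for row_id, name in collapsed:
    print(row_id, name)
```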