SQL consolidating data best Strategy - sql

I am using SQL Server 2005.
I have 8 databases in the same SQL Server.
There is a table containing thousand of Customers in each database (Property).
To make it simple
CustomerID numeric(18,0)
PropertyID int
CustomerSurname varchar(100)
CustomerName varchar(50)
CustomerEmail varchar(100)
Until now each Property populated its Customers individually. Now there is a need to consolidate the Customers for reporting purposes.
I want to find all the common Customers in all databases
(Criteria= CustomerSurname + CustomerEmail + first letter of Customer
Name)
and populate a new table (Consolidation) that contains the PropertyID and the CustomerID of the Property databases for eachcommon Customer.
ConsolidationID numeric(18,0)
PropertyID int
CustomerID numeric(18,0)
Imagine:
Customers on Property 1
1000 1 Smith Adrian smith#jj.com
Customers on Property 2
9876 2 Smith A smith#jj.com
Consolidation Table
1 1 1000
1 2 9876
So in the Consolidation table we have ID=1 for Smith which in Database1 (property) has local ID 1000 and in Database2 (property) has localID 9876
I am puzzled as to how I can find the common Customers using the criteria between 8 databases.The strategy to achieve it.

Consolidating your data is a pretty straight-forward process in this scenario.
Here's an example you can run in SSMS to get you started. Note that I am using TABLE variables instead of separate databases, but the concept remains the same.
Declare the tables ( representing databases ):
DECLARE #database1 TABLE ( CustomerID NUMERIC(18,0), PropertyID INT, CustomerSurname VARCHAR(100), CustomerName VARCHAR(50), CustomerEmail VARCHAR(100) );
DECLARE #database2 TABLE ( CustomerID NUMERIC(18,0), PropertyID INT, CustomerSurname VARCHAR(100), CustomerName VARCHAR(50), CustomerEmail VARCHAR(100) );
Insert the sample data you provided:
INSERT INTO #database1 ( CustomerID, PropertyID, CustomerSurname, CustomerEmail, CustomerName )
VALUES ( 1, 1000, 'Smith', 'Adrian', 'smith#jj.com' );
INSERT INTO #database2 ( CustomerID, PropertyID, CustomerSurname, CustomerEmail, CustomerName )
VALUES ( 2, 9876, 'Smith', 'A', 'smith#jj.com' );
Sprinkle in some SQL Server magic:
SELECT
ROW_NUMBER() OVER ( PARTITION BY CustomerSurname, CustomerEmail, FirstInitial ORDER BY CustomerSurname, CustomerEmail, FirstInitial ) AS ConsolidationID
, Consolidated.CustomerID
, Consolidated.PropertyID
FROM (
SELECT CustomerID, PropertyID, CustomerSurname, CustomerName, CustomerEmail, LEFT( CustomerName, 1 ) AS FirstInitial FROM #database1
UNION
SELECT CustomerID, PropertyID, CustomerSurname, CustomerName, CustomerEmail, LEFT( CustomerName, 1 ) AS FirstInitial FROM #database2
) AS Consolidated
ORDER BY
CustomerID, CustomerSurname, CustomerEmail, FirstInitial;
Returns consolidated resultset:
+-----------------+------------+------------+
| ConsolidationID | CustomerID | PropertyID |
+-----------------+------------+------------+
| 1 | 1 | 1000 |
| 1 | 2 | 9876 |
+-----------------+------------+------------+
Putting it to use:
To use this with your eight databases, you would simply replace the table variables ( #database1, #database2, etc... ) with the fully-qualified name to the database and table to be referenced.
SELECT {column-list} FROM MyDatabase1.dbo.TableName...
UNION
SELECT {column-list} FROM MyDatabase2.dbo.TableName...
Etc...
ROW_NUMBER() is the real "magic" here. By using its PARTION BY and ORDER BY we can get a single "ConsolidationID" for each row matching the partition criteria, in this case CustomerSurname, CustomerEmail and FirstInitial. ORDER BY is required to ensure the data is ordered correctly so the partion works as expected.
A few important things to note:
The column names must be exact between all tables and in the same order. You can
alias column names if needed.
Using UNION will exclude exact duplicates when all compared columns
are the same. I expect this is the behavior you want in this case,
but if not, replace UNION with UNION ALL to return all rows,
including exact matches.
SQL Server's ROW_NUMBER() is a pretty slick feature. You can read
more about it here.
I hope this helps get you on your way.

Related

Insert into 2 different data to 2 table at the same time using first data's id

I have 2 tables: Order and product_order. Every order has some product in it and that's because I store products another table.
Table Order:
Id name
Table PRODUCT_ORDER:
id product_id order_id
Before I start to insert, I don't know what the Order Id is. I want to insert the data into both tables at once and I need the order id to do that.
Both id's are auto incremented. I'm using SQL Server. I can insert first order and then find the id of the order and than execute the second insert, but I want to do these both to execute at once.
The output clause is your friend here.
DECLARE #Orders TABLE (OrderID INT IDENTITY, OrderDateUTC DATETIME, CustomerID INT)
DECLARE #OrderItems TABLE (OrderItemID INT IDENTITY, OrderID INT, ProductID INT, Quantity INT, Priority TINYINT)
We'll use these table variables as demo tables with IDs to insert into. You're liking going to be passing the set of items for an order in together, but for the purpose of a demo we'll ad hoc them as a VALUES list.
DECLARE #Output TABLE (OrderID INT)
INSERT INTO #Orders (OrderDateUTC, CustomerID)
OUTPUT INSERTED.OrderID INTO #Output
VALUES (GETUTCDATE(), 1)
We inserted the Order into the Orders table, and used the OUTPUT clause to cause the inserted (and generated by the engine) into the table variable #Output. We can now use this table however we'd like:
INSERT INTO #OrderItems (OrderID, ProductID, Quantity, Priority)
SELECT OrderID, ProductID, Quantity, Priority
FROM (VALUES (5,1,1),(2,1,2),(3,1,3)) AS x(ProductID, Quantity, Priority)
CROSS APPLY #Output
We cross applied it to our items list, and inseted it as if it was any other row.
DELETE FROM #Output
INSERT INTO #Orders (OrderDateUTC, CustomerID)
OUTPUT INSERTED.OrderID INTO #Output
VALUES (GETUTCDATE(), 1)
INSERT INTO #OrderItems (OrderID, ProductID, Quantity, Priority)
SELECT OrderID, ProductID, Quantity, Priority
FROM (VALUES (1,1,1)) AS x(ProductID, Quantity, Priority)
CROSS APPLY #Output
Just to demo a little farther here's another insert. (You likely wouldn't need the DELETE normally, but we're still using the same variable here)
Now when we select that data we can see the two separate orders, with their IDs and the products that belong to them:
SELECT *
FROM #Orders o
INNER JOIN #OrderItems oi
ON o.OrderID = oi.OrderID
OrderID OrderDateUTC CustomerID OrderItemID OrderID ProductID Quantity Priority
------------------------------------------------------------------------------------------------
1 2022-12-08 23:23:21.923 1 1 1 5 1 1
1 2022-12-08 23:23:21.923 1 2 1 2 1 2
1 2022-12-08 23:23:21.923 1 3 1 3 1 3
2 2022-12-08 23:23:21.927 1 4 2 1 1 1
Dale is correct. You cannot insert into multiple tables at once, but if you use a stored procedure to handle your inserts, you can capture the ID and use it in the next insert.
-- table definitions
create table [order]([id] int identity, [name] nvarchar(100))
go
create table [product_order]([id] int identity, [product_id] nvarchar(100), [order_id] int)
go
-- stored procedure to handle inserts
create procedure InsertProductWithOrder(
#OrderName nvarchar(100),
#ProductID nvarchar(100))
as
begin
declare #orderID int
insert into [order] ([name]) values(#OrderName)
select #orderID = ##identity
insert into [product_order]([product_id], [order_id]) values(#ProductID, #orderID)
end
go
-- insert records using the stored procedure
exec InsertProductWithOrder 'Order ONE', 'AAAAA'
exec InsertProductWithOrder 'Order TWO', 'BBBBB'
-- verify the results
select * from [order]
select * from [product_order]

SELECT From Multiple tables with many to many relations

Hello everyone I have inherirted a poorly designed database and I need to get some information from 3 tables
Franchise
Id(Int, PK)
FrID (varchar(50))
FirstName (varchar(50))
LastName (varchar(50))
Store
Id (Int, PK)
FrID (varchar(50))
StoreNumber (varchar(50))
StoreName
Address
Pricing
Id (int, PK)
StoreNumber (varchar(50))
Price1
Price2
Price3
and the data
ID, FrID ,FirstName,LastName
1, 10 ,John Q , TestCase
2, 10 ,Jack Q , TestCase
3, 11 ,Jack Q , TestCase
ID, FrID, StoreNumber , StoreName , Address
10, 10 , 22222 , TestStore1, 123 Main street
11, 10 , 33333 , TestStore2, 144 Last Street
12, 10 , 44444 , TestStore2, 145 Next Street
13, 11 , 55555 , Other Test, 156 Other st
ID, StoreNumber, Price1, Price2, Price3
1, 22222 , 19.99, 20.99 , 30.99
2, 33333 , 19.99, 20.99 , 30.99
3, 44444 , 19.99, 20.99 , 30.99
4, 55555 , 19.99, 20.99 , 30.99
Here is what I have done
SELECT F.FirstName,F.LastName,F.FrID , S.StoreNumber,S.StoreName,S.Address,
P.Price1,P.Price2,P.Price3
FROM Franchisee F
JOIN Store S on F.FrID = S.FrID
JOIN Pricing P on P.StoreNumber = S.StoreNumber
This part works, but I end up with lots of duplicates, For example Jack Q gets listed for his store plus every store that John Q is on. Is there anyway to fix this with out a database redesign.
Okay, there is a whole laundry list of issues such as character fields such as [FrId] being used as strings, reserved words such as [address] being used as name, etc.
Let's put the bad design issues aside.
First, I need to create a quick test environment. I did not put in Foreign Keys since that constraint is not needed to get the correct answer.
--
-- Setup test tables
--
-- Just playing
use Tempdb;
go
-- drop table
if object_id('franchise')> 0
drop table franchise;
go
-- create table
create table franchise
(
Id int primary key,
FrID varchar(50),
FirstName varchar(50),
LastName varchar(50)
);
-- insert data
insert into franchise values
( 1, 10, 'John Q', 'TestCase'),
( 2, 10, 'Jack Q', 'TestCase'),
( 3, 11, 'Jack Q', 'TestCase');
-- select data
select * from franchise;
go
-- drop table
if object_id('store')> 0
drop table store;
go
-- create table
create table store
(
Id int primary key,
FrID varchar(50),
StoreNumber varchar(50),
StoreName varchar(50),
Address varchar(50)
);
-- insert data
insert into store values
(10, 10, 22222, 'TestStore1', '123 Main street'),
(11, 10, 33333, 'TestStore2', '144 Last Street'),
(12, 10, 44444, 'TestStore2', '145 Next Street'),
(13, 11, 55555, 'Other Test', '156 Other Street');
-- select data
select * from store;
go
-- drop table
if object_id('pricing')> 0
drop table pricing;
go
-- create table
create table pricing
(
Id int primary key,
StoreNumber varchar(50),
Price1 money,
Price2 money,
Price3 money
);
-- insert data
insert into pricing values
(1, 22222, 19.99, 20.99 , 30.99),
(2, 33333, 19.99, 20.99 , 30.99),
(3, 44444, 19.99, 20.99 , 30.99),
(4, 55555, 19.95, 20.95 , 30.95);
-- select data
select * from pricing;
go
The main issue is that the franchise table should have the primary key (PK) on FrId, not Id. I do not understand why there are duplicates.
However, the query below removes them by grouping. I changed the pricing data for Jack Q to show it is a different record.
--
-- Fixed Query - Version 1
--
select
f.FirstName,
f.LastName,
f.FrID,
s.StoreNumber,
s.StoreName,
s.Address,
p.Price1,
p.Price2,
p.Price3
from
-- Remove duplicates from francise
(
select
LastName,
FirstName,
Max(FrID) as FrID
from
franchise
group by
LastName,
FirstName
) as f
join store s on f.FrID = s.FrID
join pricing p on p.StoreNumber = s.StoreNumber;
The correct output is below.
If I am correct, remove the duplicates entries and change the primary key.
Change Requirements
Okay, you are placing two or more owners in the same table.
Below uses a sub query to combine the owners list into one string. Another way is to have a flag called primary owner. Choose that as the display name.
--
-- Fixed Query - Version 2
--
select
f.OwnersList,
f.FrID,
s.StoreNumber,
s.StoreName,
s.Address,
p.Price1,
p.Price2,
p.Price3
from
-- Compose owners list
(
select
FrID,
(
SELECT FirstName + ' ' + LastName + ';'
FROM franchise as inner1
WHERE inner1.FrID = outer1.FrID
FOR XML PATH('')
) as OwnersList
from franchise as outer1
group by FrID
) as f (FrId, OwnersList)
join store s on f.FrID = s.FrID
join pricing p on p.StoreNumber = s.StoreNumber
Here is the output from the second query.

I need to split string in select statement and insert to table

I have a data in one table. I need to copy it to another table. One of the column is text delimited string. So what I'm thinking to select all columns insert get indentity value and with subquery to split based on delimiter and insert it to another table.
Here is the data example
ID Name City Items
1 Michael Miami item|item2|item3|item4|item5
2 Jorge Hallandale item|item2|item3|item4|item5
copy Name, City to one table get identity
and split and copy Items to another table with Identity Column Value
So output should be
Users table
UserID Name City
1 Michael Miami
2 Jorge Hallandale
...
Items table
ItemID UserID Name
1 1 Item
2 1 Item2
3 1 Item3
4 1 Item4
5 2 Item
6 2 Item2
7 2 Item3
8 2 Item4
Not really sure how to do it with T-SQL. Answers with examples would be appreciated
You may create you custom function to split the string in T-Sql. You could then use the Split function as part of a JOIN with your base table to generate the final results for your INSERT statement. Have a look at this post. Hope this help.
You can do this using xml and cross apply.
See the following:
DECLARE #t table (ID int, Name varchar(20), City varchar(20), Items varchar(max));
INSERT #t
SELECT 1,'Michael','Miami' ,'item|item2|item3|item4|item5' UNION
SELECT 2,'Jorge' ,'Hallandale','item|item2|item3|item4|item5'
DECLARE #u table (UserID int identity(1,1), Name varchar(20), City varchar(20));
INSERT #u (Name, City)
SELECT DISTINCT Name, City FROM #t
DECLARE #i table (ItemID int identity(1,1), UserID int, Name varchar(20));
WITH cte_Items (Name, Items) as (
SELECT
Name
,CAST(REPLACE('<r><i>' + Items + '</i></r>','|','</i><i>') as xml) as Items
FROM
#t
)
INSERT #i (UserID, Name)
SELECT
u.UserID
,s.Name as Name
FROM
cte_Items t
CROSS APPLY (SELECT i.value('.','varchar(20)') as Name FROM t.Items.nodes('//r/i') as x(i) ) s
INNER JOIN #u u ON t.Name = u.Name
SELECT * FROM #i
See more here:
http://www.kodyaz.com/articles/t-sql-convert-split-delimeted-string-as-rows-using-xml.aspx
Can you accomplish this with recursion? My T-SQL is rusty but this may help send you in the right direction:
WITH CteList AS (
SELECT 0 AS ItemId
, 0 AS DelimPos
, 0 AS Item_Num
, CAST('' AS VARCHAR(100)) AS Item
, Items AS Remainder
FROM Table1
UNION ALL
SELECT Row_Number() OVER(ORDER BY UserID) AS ItemId
, UserID
, CASE WHEN CHARINDEX('|', Remainder) > 0
THEN CHARINDXEX('|', Remainder)
ELSE LEN(Remainder)
END AS dpos
, Item_num + 1 as Item_Num
, REPLACE(Remainder, '|', '') AS Element
, right(Remainder, dpos+1) AS Remainder
FROM CteList
WHERE dpos > 0
AND ItemNum < 20 /* Force a MAX depth for recursion */
)
SELECT ItemId
, Item
FROM CteList
WHERE item_num > 0
ORDER BY ItemID, Item_Num

versioning of a table

anybody has seen any examples of a table with multiple versions for each record
something like if you would had the table
Person(Id, FirstName, LastName)
and you change a record's LastName than you would have both versions of LastName (first one, and the one after the change)
I've seen this done two ways. The first is in the table itself by adding an EffectiveDate and CancelDate (or somesuch). To get the current for a given record, you'd do something like: SELECT Id, FirstName, LastName FROM Table WHERE CancelDate IS NULL
The other is to have a global history table (which holds all of your historical data). The structure for such a table normally looks something like
Id bigint not null,
TableName nvarchar(50),
ColumnName nvarchar(50),
PKColumnName nvarchar(50),
PKValue bigint, //or whatever datatype
OriginalValue nvarchar(max),
NewValue nvarchar(max),
ChangeDate datetime
Then you set a trigger on your tables (or, alternatively, add a policy that all of your Updates/Inserts will also insert into your HX table) so that the correct data is logged.
The way we're doing it (might not be the best way) is to have an active bit field, and a foreign key back to the parent record. So for general queries you would filter on active employees, but you can get the history of a single employee with their Employee ID.
declare #employees
(
PK_emID int identity(1,1),
EmployeeID int,
FirstName varchar(50),
LastName varchar(50),
Active bit,
FK_EmployeeID int
primary key(PK_emID)
)
insert into #employees
(
EmployeeID,
FirstName,
LastName,
Active,
FK_EployeeID
)
select 1, 'David', 'Engle', 1,null
union all
select 2, 'Amy', 'Edge', 0,null
union all
select 2, 'Amy','Engle',1,2

Moving away from STI - SQL to break single table into new multi-table structure

I am moving old project that used single table inheritance in to a new database, which is more structured. How would I write a SQL script to port this?
Old structure
I've simplified the SQL for legibility.
CREATE TABLE customers (
id int(11),
...
firstName varchar(50),
surname varchar(50),
address1 varchar(50),
address2 varchar(50),
town varchar(50),
county varchar(50),
postcode varchar(50),
country varchar(50),
delAddress1 varchar(50),
delAddress2 varchar(50),
delTown varchar(50),
delCounty varchar(50),
delPostcode varchar(50),
delCountry varchar(50),
tel varchar(50),
mobile varchar(50),
workTel varchar(50),
);
New structure
CREATE TABLE users (
id int(11),
firstName varchar(50),
surname varchar(50),
...
);
CREATE TABLE addresses (
id int(11),
ForeignKey(user),
street1 varchar(50),
street2 varchar(50),
town varchar(50),
county varchar(50),
postcode varchar(50),
country varchar(50),
type ...,
);
CREATE TABLE phone_numbers (
id int(11),
ForeignKey(user),
number varchar(50),
type ...,
);
With appropriate cross-database notations for table references if appropriate:
INSERT INTO Users(id, firstname, surname, ...)
SELECT id, firstname, surname, ...
FROM Customers;
INSERT INTO Addresses(id, street1, street2, ...)
SELECT id, street1, street2, ...
FROM Customers;
INSERT INTO Phone_Numbers(id, number, type, ...)
SELECT id, phone, type, ...
FROM Customers;
If you want both the new and the old address (del* version), then repeat the address operation on the two sets of source columns with appropriate tagging. Similarly, for the three phone numbers, repeat the phone number operation. Or use a UNION in each case.
First make sure to backup your existing data!
The process is differnt if you are going to use the original id field or generate a new one.
Assuming you are going to use the orginal, make sure that you have the ability to insert id fields into the table before you start (the SQL Server equivalent if you are autogenrating the number is Set identity Insert on, not sure what mysql would use). Wirte an insert from the old table to the parent table:
insert newparenttable (idfield, field1, field2)
select idfield, field1, field2 from old parent table
then write similar inserts for all the child tables depending on what fields you need. Where you have multiple phone numbers in differnt fields, for instance, you would use a union all stament as your insert select.
Insert newphone (phonenumber, userid, phonetype)
select home_phone, id, 100 from oldparenttable
union all
select work_phone, id, 101 from oldparenttable
Union all
select cell_phone, id, 102 from oldparenttable
If you are going to have a new id generated, then create the table with a field for the old id. You can drop this at the end (although I'd keep it for about six months). Then you can join from the new parent table to the old parent table on the oldid and grab the new id from the new parent table when you do you inserts to child tables. Something like:
Insert newphone (phonenumber, userid, phonetype)
select home_phone, n.id, 100 from oldparenttable o
join newparenttable n on n.oldid = o.id
union all
select work_phone, n.id, 101 fromoldparenttable o
join newparenttable n on n.oldid = o.id
Union all
select cell_phone, n.id, 102 from oldparenttable o
join newparenttable n on n.oldid = o.id