Most efficient way of comparing two tables in SQL Server - sql

So I have two tables which store sales figures for products. Table one holds the last 6 weeks' sales figures for each product and table two holds the last 12 months'. I need to compare these two tables and produce a third table containing the product's Sage code in column one and the difference between the two values in column two. What would be the most efficient way (in terms of time) to do this, given that there will be a fair number of products to compare and that number will only continue to grow? The product Sage code is the key identifier here. The two tables are created as below.
IF OBJECT_ID('tempdb..#Last6WeeksProductSales') IS NOT NULL DROP TABLE #Last6WeeksProductSales;
CREATE TABLE #Last6WeeksProductSales
(
CompoundSageCode varchar(200),
Value decimal(18,2)
)
INSERT INTO #Last6WeeksProductSales
SELECT [SalesOrderLine].[sProductSageCode] AS [CompoundSageCode],
SUM([SalesOrderLine].[fQtyOrdered] * [SalesOrderLine].[fPricePerUnit]) AS [Value]
FROM [SalesOrderLine]
INNER JOIN [SalesOrder] ON (SalesOrder.iSalesOrderID = SalesOrderLine.iSalesOrderID)
WHERE [SalesOrder].[dOrderDateTime] > DateAdd(week, -6, CURRENT_TIMESTAMP)
GROUP BY sProductSageCode;
SELECT * FROM #Last6WeeksProductSales
ORDER BY CompoundSageCode;
IF OBJECT_ID('tempdb..#Last12MonthsProductSales') IS NOT NULL DROP TABLE #Last12MonthsProductSales;
CREATE TABLE #Last12MonthsProductSales
(
CompoundSageCode varchar(200),
Value decimal(18,2)
)
INSERT INTO #Last12MonthsProductSales SELECT [SalesOrderLine].[sProductSageCode] AS [CompoundSageCode],
SUM([SalesOrderLine].[fQtyOrdered] * [SalesOrderLine].[fPricePerUnit]) AS [Value]
FROM [SalesOrderLine]
INNER JOIN [SalesOrder] ON (SalesOrder.iSalesOrderID = SalesOrderLine.iSalesOrderID)
WHERE [SalesOrder].[dOrderDateTime] > DateAdd(month, -12, CURRENT_TIMESTAMP)
GROUP BY sProductSageCode;
SELECT * FROM #Last12MonthsProductSales
ORDER BY CompoundSageCode;
DROP TABLE #Last6WeeksProductSales;
DROP TABLE #Last12MonthsProductSales;

Use a view. That way you don't have to worry about updating your third table, and it will reflect current information. Base the view on a basic SELECT:
SELECT sixS.CompoundSageCode,
(twelveS.value - sixS.Value ) as diffValue
FROM Last6WeeksProductSales sixS
INNER JOIN Last12MonthsProductSales twelveS ON sixS.CompoundSageCode = twelveS.CompoundSageCode
(I have not tested this code, but it should be a good starting point.)

Computing the difference of two tables is usually done using a FULL OUTER JOIN. SQL Server can implement it using all three of the physical join operators. Apply reasonable indexing and it will run fine.
If you can manage it, create covering indexes on both tables that are sorted by the join key. This will result in a highly efficient merge join plan.
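As a minimal sketch of the comparison described above, the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table names mirror the temp tables without the # prefix and the sample rows are invented for illustration. Because older SQLite builds lack FULL OUTER JOIN, the sketch emulates it with a LEFT JOIN plus a UNION ALL of the unmatched rows; on SQL Server a single FULL OUTER JOIN should express the same thing. The PRIMARY KEY on the join column plays the role of the index sorted by the join key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PRIMARY KEY gives each table an index ordered by the join key.
CREATE TABLE Last6WeeksProductSales   (CompoundSageCode TEXT PRIMARY KEY, Value REAL);
CREATE TABLE Last12MonthsProductSales (CompoundSageCode TEXT PRIMARY KEY, Value REAL);
-- Invented sample rows: A1 is in both tables, B2 and C3 in only one each.
INSERT INTO Last6WeeksProductSales   VALUES ('A1', 100.0), ('B2', 40.0);
INSERT INTO Last12MonthsProductSales VALUES ('A1', 900.0), ('C3', 250.0);
""")

# Emulated FULL OUTER JOIN: every 6-week row (matched or not),
# then every 12-month row that has no 6-week counterpart.
diff_sql = """
SELECT s.CompoundSageCode, COALESCE(t.Value, 0) - s.Value AS DiffValue
FROM Last6WeeksProductSales s
LEFT JOIN Last12MonthsProductSales t ON t.CompoundSageCode = s.CompoundSageCode
UNION ALL
SELECT t.CompoundSageCode, t.Value AS DiffValue
FROM Last12MonthsProductSales t
LEFT JOIN Last6WeeksProductSales s ON s.CompoundSageCode = t.CompoundSageCode
WHERE s.CompoundSageCode IS NULL
ORDER BY 1
"""
diff_rows = conn.execute(diff_sql).fetchall()
for code, diff in diff_rows:
    print(code, diff)
```

A missing value is treated as 0 here, which is an assumption; whether a product absent from one table should count as zero sales or be reported as NULL depends on the report.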

Related

Create New SQL Table w/o duplicates

I'm learning how to create tables in SQL pulling data from existing tables from two different databases. I am trying to create a table combining two tables without duplicates. I've seen some say using UNION but I could not get that to work.
Say TABLE 1 has 2 COLUMNS (IdNumber, Material) and TABLE 2 has 3 COLUMNS (IdNumber, Size, Description)
How can I create a new table (named TABLE3) that combines those two but only shows the columns (PartDescription, Weight, Color), without duplicates?
What I have done so far is as follows:
CREATE TABLE #Materialsearch (IdNumber varchar(30), Material varchar(30))
CREATE TABLE #Sizesearch (idnumber varchar(30), Size varchar(30), Description varchar(50))
INSERT INTO #Materialsearch (IdNumber, Material)
SELECT [IdNumber],[Material]
FROM [datalist].[dbo].[Table1]
WHERE Material LIKE 'Steel' AND IdNumber NOT LIKE 'Steel'
INSERT INTO #Sizesearch (idnumber, Size, Description)
SELECT [idNumber],[itemSize], [ShortDesc]
FROM [515dap].[dbo].[Table2]
WHERE itemSize LIKE '1' AND idnumber NOT LIKE 'Steel'
SELECT DISTINCT #Materialsearch.IdNumber, #Materialsearch.Material,
#Sizesearch.Size, #Sizesearch.Description
FROM #Materialsearch
INNER JOIN #Sizesearch
ON #Materialsearch.IdNumber = #Sizesearch.idnumber
ORDER BY #Materialsearch.IdNumber
DROP TABLE #Materialsearch
DROP TABLE #Sizesearch
This would show all items that are made from steel but do not have steel as their itemid's.
Thanks for your help
I'm not 100% sure what you're after - but you may find this useful.
You could use a FULL OUTER JOIN, which takes all rows from both tables, matches the ones it can, then reports all rows.
I'd suggest (for your understanding) running
SELECT A.*, B.*
FROM #Materialsearch AS A
FULL OUTER JOIN #Sizesearch AS B ON A.[IdNumber] = B.[IdNumber]
Then to get the relevant data, you just need some tweaks on that e.g.,
SELECT
ISNULL(A.[IdNumber], B.[IdNumber]) AS [IdNumber],
A.Material,
B.Size,
B.Description
FROM #Materialsearch AS A
FULL OUTER JOIN #Sizesearch AS B ON A.[IdNumber] = B.[IdNumber]
Edit: Changed typoed INNER JOINs to FULL OUTER JOINs. Oops :( Thank you very much @Thorsten for finding it!

How to create a query on 2 existing tables and build a table(view) with data from both tables and a condition?

Sorry, the title might be a little bit confusing.
I will try to explain in full here and add some tables as examples.
OK, what I have is an MS SQL database that I use to store data coming from equipment mounted in some vehicles.
For the moment there are 2 tables in the database. The first, "Communication", is a big table used to store every piece of information from the equipment when it connects to the TCP server. Records are added one after another (only INSERTs here).
The table looks like this:
The second table "EquipmentStatus" - it has EquipmentID as a primary key, so it is a smaller table (only UPDATES here), and it looks like this:
Well, what I need is a new table with the vehicles that DID communicate between Date_1 and Date_2, i.e. a SQL query that can provide this:
(where Date_1 and Date_2 have been set to 20 and 22 June, respectively).
The next step would be to also get a table (view) of the vehicles that DID NOT communicate between Date_1 and Date_2, i.e. the one you see below:
Thanks for your time and patience!
I took a stab at these without having the data on my end, but this should get you on the right path.
A:
SELECT DISTINCT s.Vehicle_Number, s.Vehicle_Status, s.DateLastCommunication
FROM EquipmentStatus s
INNER JOIN Communications c
ON s.Equipment_ID = c.Equipment_ID
WHERE c.DateTimeCommunication BETWEEN '2015-06-20' AND '2015-06-22'
B:
-- No join to Communications here: an inner join would drop equipment that has never communicated.
SELECT DISTINCT s.Vehicle_Number, s.Vehicle_Status, s.DateLastCommunication
FROM EquipmentStatus s
WHERE s.Equipment_ID NOT IN
(SELECT Equipment_ID FROM Communications WHERE DateTimeCommunication
BETWEEN '2015-06-20' AND '2015-06-22')
Is this what you are looking for?
declare @startDate datetime, @enddate datetime
--vehicles communicated
select A.Vehicle_Number, b.Vehicle_Status, A.DataLastCommunication
from (
Select Vehicle_Number, MAX(DataLastCommunication) as DataLastCommunication
from dbo.Communications
where DataLastCommunication between @startDate and @enddate
group by Vehicle_Number
) as A
inner join dbo.EquipmentStatus b
on a.Vehicle_Number=b.Vehicle_Number
and a.DataLastCommunication = b.DataLastCommunication
--vehicles did not communicate
select a.Vehicle_Number, a.Vehicle_Status, a.DataLastCommunication
from dbo.EquipmentStatus a
where not exists(
select 1
from dbo.Communications
where Vehicle_Number=a.Vehicle_Number
and DataLastCommunication between @startDate and @enddate
)
The last query should use a Vehicles table instead of the communications table because it could be possible that there is a new vehicle which has no entries yet in any of these 2 tables (which will return the vehicle number but the status and date will be null)....
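The EXISTS / NOT EXISTS pattern used above can be tried on a toy dataset. This is only a sketch using Python's sqlite3 module in place of SQL Server; the column names (Equipment_ID, DateTimeCommunication and so on) and the sample rows are assumptions, since the original table layouts are not shown. It uses a half-open date range so that the whole of 22 June is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; the real column names may differ.
CREATE TABLE EquipmentStatus (Equipment_ID INTEGER PRIMARY KEY,
                              Vehicle_Number TEXT, Vehicle_Status TEXT);
CREATE TABLE Communications (Equipment_ID INTEGER, DateTimeCommunication TEXT);
INSERT INTO EquipmentStatus VALUES (1, 'V-100', 'OK'), (2, 'V-200', 'OK'), (3, 'V-300', 'Idle');
-- V-100 reports inside the window, V-200 only after it, V-300 never.
INSERT INTO Communications VALUES
  (1, '2015-06-20 08:00:00'),
  (1, '2015-06-21 09:30:00'),
  (2, '2015-06-25 10:00:00');
""")

# Half-open interval [start, end): covers 20, 21 and 22 June in full.
window = {"start": "2015-06-20 00:00:00", "end": "2015-06-23 00:00:00"}

template = """
SELECT s.Vehicle_Number
FROM EquipmentStatus s
WHERE {negate} EXISTS (
    SELECT 1 FROM Communications c
    WHERE c.Equipment_ID = s.Equipment_ID
      AND c.DateTimeCommunication >= :start
      AND c.DateTimeCommunication < :end)
ORDER BY s.Vehicle_Number
"""
communicated = [r[0] for r in conn.execute(template.format(negate=""), window)]
silent = [r[0] for r in conn.execute(template.format(negate="NOT"), window)]
print("communicated:", communicated)
print("silent:", silent)
```

Driving the outer query from EquipmentStatus alone means equipment with no rows at all in Communications still shows up in the "did not communicate" list.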
In SQL Server, you can use table-valued functions to return the two different sets of data, just like two new tables:
create function f_has_comm(@date1 datetime, @date2 datetime)
returns table
as
return (select Vehicle_Number, Vehicle_Status, DateLastCommunication
from EquipmentStatus where EquipmentID in (select EquipmentID from Communication
where DatetimeCommunication>=@date1 and DatetimeCommunication<=@date2)
)
go
create function f_no_comm(@date1 datetime, @date2 datetime)
returns table
as
return (select Vehicle_Number, Vehicle_Status, DateLastCommunication
from EquipmentStatus where EquipmentID NOT IN (select EquipmentID from Communication
where DatetimeCommunication>=@date1 and DatetimeCommunication<=@date2)
)
go
and now you can select from the two functions like:
select * from f_has_comm('2015-6-1','2015-6-20');
select * from f_no_comm('2015-6-1','2015-6-20');

1 to Many join for SQL View where joined table may have 0 records

I have been tasked with creating a view that needs to bring in up to 10 records from another table. Problem is this table may have 0, 5, 10, or more corresponding records.
Here is the very simplified design to only include what is relevant
SalesOrderTable    OutsideSalesRepTable    SalesRepTable
OrderID            BranchID                SalesRepID
CustID             CustID                  SalesRepName
BranchID           SalesRepID
The first join needs to be between SalesOrderTable and OutsideSalesRepTable on BranchID & CustID
The second join needs to be between OutsideSalesRepTable and SalesRepTable on SalesRepID
The view will need to have columns listed as OutsideSalesRep1, OutsideSalesRep2, ... OutsideSalesRep10 and filled with the SalesRepName. I have no control over the design of this database. I would have much rather seen 10 fields dedicated to SalesRepIDs in the customer table and just used left joins.
If only 3 OutsideSalesReps exist for the branch/customer, then OutsideSalesRep4-10 should be null.
This is the only part of the 165 column / 35+ table view I wasn't able to figure out.
Any help would be sincerely appreciated.
PS I'm semi-fresh to TSQL. Only been using it about 6 months.
EDIT: I linked to an image that shows a sample of the source data to assist (I hope) explain what I'm looking for.
The pivot table needs to show:
SONum    OutsideRep1    OutsideRep2    OutsideRep3    .....    OutsideRep10
5819     59             69             70             null     null
5821     59             70             null           null     null
http://www.bayernsupport.com/SQL.png
Sounds like you need to join your tables with an outer join (ie: left join or right join) (to allow for joins where there are no results) and to use a pivot to create columns from rows.
http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
Got it working with the assistance of a friend. It did require a pivot, but it also required an interesting query as the source. Bear in mind that the field names below don't exactly match, but the structure and end result were dead on.
SELECT *
FROM
(
SELECT so.OrderID,
so.OrderName,
sr.SalesRepName,
'SalesRep_'+CAST(ROW_NUMBER() OVER(PARTITION BY OrderName ORDER BY SalesRepName) AS VARCHAR(30)) rn
FROM #SalesOrderTable so
JOIN #OutsideSalesRepTable osp ON so.BranchID = osp.BranchID and so.CustID=osp.CustID
JOIN #SalesRepTable sr ON osp.SalesRepID = sr.SalesRepID
) src
PIVOT
(
MAX(SalesRepName)
FOR rn in (SalesRep_1, SalesRep_2, SalesRep_3,SalesRep_4, SalesRep_5,
SalesRep_6,SalesRep_7,SalesRep_8,SalesRep_9,SalesRep_10)
) piv
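The same rows-to-columns idea can be sketched with Python's sqlite3 module. SQLite has no PIVOT operator, so this swaps in conditional aggregation (MAX over a CASE per column) on top of the same ROW_NUMBER() numbering; it needs SQLite 3.25 or later for window functions. The rep names and the third order (5900, which has no reps) are invented for illustration, and only three output columns are shown instead of ten. LEFT JOINs are used so that an order with zero reps still produces a row of NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderTable (OrderID INTEGER PRIMARY KEY, CustID INTEGER, BranchID INTEGER);
CREATE TABLE OutsideSalesRepTable (BranchID INTEGER, CustID INTEGER, SalesRepID INTEGER);
CREATE TABLE SalesRepTable (SalesRepID INTEGER PRIMARY KEY, SalesRepName TEXT);
-- Invented data: order 5900 belongs to a customer with no outside reps.
INSERT INTO SalesOrderTable VALUES (5819, 10, 1), (5821, 11, 1), (5900, 12, 1);
INSERT INTO OutsideSalesRepTable VALUES (1, 10, 59), (1, 10, 69), (1, 10, 70),
                                        (1, 11, 59), (1, 11, 70);
INSERT INTO SalesRepTable VALUES (59, 'Ann'), (69, 'Bob'), (70, 'Cy');
""")

pivot_sql = """
WITH ranked AS (
    -- Number each order's reps 1, 2, 3, ... in name order.
    SELECT so.OrderID, sr.SalesRepName,
           ROW_NUMBER() OVER (PARTITION BY so.OrderID
                              ORDER BY sr.SalesRepName) AS rn
    FROM SalesOrderTable so
    LEFT JOIN OutsideSalesRepTable osr
           ON osr.BranchID = so.BranchID AND osr.CustID = so.CustID
    LEFT JOIN SalesRepTable sr ON sr.SalesRepID = osr.SalesRepID
)
SELECT OrderID,
       MAX(CASE WHEN rn = 1 THEN SalesRepName END) AS OutsideSalesRep1,
       MAX(CASE WHEN rn = 2 THEN SalesRepName END) AS OutsideSalesRep2,
       MAX(CASE WHEN rn = 3 THEN SalesRepName END) AS OutsideSalesRep3
FROM ranked
GROUP BY OrderID
ORDER BY OrderID
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Extending this to ten columns is a matter of repeating the CASE line for rn = 4 through 10.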

finding consecutive date pairs in SQL

I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.
I have a very large table with a lot of redundant data that I am trying to reduce down to just the useful entries. It's a history table, and the way it works, if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry will be used when historical data is requested from a date between the effective date of that entry and the next non-duplicate entry.
The data looks something like this:
id user_id effective_date important_value useless_value
1 1 1/3/2007 3 0
2 1 1/4/2007 3 1
3 1 1/6/2007 NULL 1
4 1 2/1/2007 3 0
5 2 1/5/2007 12 1
6 3 1/1/1899 7 0
With this sample set, we would consider two consecutive rows duplicates if the user_id and the important_value are the same. From this sample set, we would only delete row with id=2, preserving the information from 1-3-2007, showing that the important_value changed on 1-6-2007, and then showing the relevant change again on 2-1-2007.
My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate through the user_id values (since that breaks the huge table up into manageable pieces), and creates a temp table of just the rows for that user. Then to get consecutive entries, it takes the temp table, joins it to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, UDF_SameOrNull is a function that returns 1 if the two values passed in are the same or if they are both NULL.
WHILE (@@FETCH_STATUS <> -1)
BEGIN
SELECT * INTO #history FROM History WHERE user_id = @UserId
--return entries to delete
SELECT h2.id
INTO #delete_history_ids
FROM #history h1
JOIN #history h2 ON
h1.effective_date < h2.effective_date
AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value)=1
WHERE NOT EXISTS (SELECT 1 FROM #history hx WHERE hx.effective_date > h1.effective_date and hx.effective_date < h2.effective_date)
DELETE h1
FROM History h1
JOIN #delete_history_ids dh ON
h1.id = dh.id
--drop the temp tables so SELECT ... INTO can recreate them on the next pass
DROP TABLE #history, #delete_history_ids
FETCH NEXT FROM UserCursor INTO @UserId
END
It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.
Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.
Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.
OK, I think I figured this one out, excellent question!
First, I made the assumption that the effective_date column will not be duplicated for a user_id. I think it can be modified to work if that is not the case - so let me know if we need to account for that.
The process basically takes the table of values and self-joins on equal user_id and important_value and prior effective_date. Then, we do 1 more self-join on user_id that effectively checks to see if the 2 joined records above are sequential by verifying that there is no effective_date record that occurs between those 2 records.
It's just a select statement for now - it should select all records that are to be deleted. So if you verify that it is returning the correct data, simply change the select * to delete tcheck.
Let me know if you have questions.
select
*
from
History tcheck
inner join History tprev
on tprev.[user_id] = tcheck.[user_id]
and tprev.important_value = tcheck.important_value
and tprev.effective_date < tcheck.effective_date
left join History checkbtwn
on tcheck.[user_id] = checkbtwn.[user_id]
and checkbtwn.effective_date < tcheck.effective_date
and checkbtwn.effective_date > tprev.effective_date
where
checkbtwn.[user_id] is null
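To check the query above against the sample data from the question, here is a small sketch using Python's sqlite3 module rather than SQL Server 2000. It assumes the dates are month/day/year and stores them as ISO strings so that string comparison orders them correctly. It relies only on plain joins, so no ROW_NUMBER() is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE History (id INTEGER PRIMARY KEY, user_id INTEGER, effective_date TEXT,
                      important_value INTEGER, useless_value INTEGER);
-- Sample rows from the question, dates rewritten as ISO strings.
INSERT INTO History VALUES
  (1, 1, '2007-01-03', 3,    0),
  (2, 1, '2007-01-04', 3,    1),
  (3, 1, '2007-01-06', NULL, 1),
  (4, 1, '2007-02-01', 3,    0),
  (5, 2, '2007-01-05', 12,   1),
  (6, 3, '1899-01-01', 7,    0);
""")

redundant_ids = [row[0] for row in conn.execute("""
SELECT tcheck.id
FROM History tcheck
JOIN History tprev
  ON tprev.user_id = tcheck.user_id
 AND tprev.important_value = tcheck.important_value
 AND tprev.effective_date < tcheck.effective_date
-- Anti-join: keep the pair only if no row for the same user lies between the two dates.
LEFT JOIN History checkbtwn
  ON checkbtwn.user_id = tcheck.user_id
 AND checkbtwn.effective_date < tcheck.effective_date
 AND checkbtwn.effective_date > tprev.effective_date
WHERE checkbtwn.user_id IS NULL
ORDER BY tcheck.id
""")]
print(redundant_ids)
```

Row 4 is not flagged even though its value matches rows 1 and 2, because row 3 sits between them. Note that the equality test never matches two NULL values, so consecutive NULL rows would need separate handling.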
OK guys, I did some thinking last night and I think I found the answer. I hope this helps someone else who has to match consecutive pairs in data and for some reason is also stuck in SQL Server 2000.
I was inspired by the other results that say to use ROW_NUMBER(), and I used a very similar approach, but with an identity column.
--create table with identity column
CREATE TABLE #history (
id int,
user_id int,
effective_date datetime,
important_value int,
useless_value int,
idx int IDENTITY(1,1)
)
--insert rows ordered by effective_date and now indexed in order
INSERT INTO #history
SELECT * FROM History
WHERE user_id = @user_id
ORDER BY effective_date
--get pairs where consecutive values match
SELECT *
FROM #history h1
JOIN #history h2 ON
h1.idx+1 = h2.idx
WHERE h1.important_value = h2.important_value
With this approach, I still have to iterate over the results until it returns nothing, but I can't think of any way around that and this approach is miles ahead of my last one.

MySQL LEFT JOIN SELECT not selecting all the left side records?

I'm getting odd results from a MySQL SELECT query involving a LEFT JOIN, and I can't understand whether my understanding of LEFT JOIN is wrong or whether I'm seeing a genuinely odd behavior.
I have two tables with a many-to-one relationship: For every record in table 1 there are 0 or more records in table 2. I want to select all the records in table 1 with a column that counts the number of related records in table 2. As I understand it, LEFT JOIN should always return all records on the LEFT side of the statement.
Here's a test database that exhibits the problem:
CREATE DATABASE Test;
USE Test;
CREATE TABLE Dates (
dateID INT UNSIGNED NOT NULL AUTO_INCREMENT,
date DATE NOT NULL,
UNIQUE KEY (dateID)
) TYPE=MyISAM;
CREATE TABLE Slots (
slotID INT UNSIGNED NOT NULL AUTO_INCREMENT,
dateID INT UNSIGNED NOT NULL,
UNIQUE KEY (slotID)
) TYPE=MyISAM;
INSERT INTO Dates (date) VALUES ('2008-10-12'),('2008-10-13'),('2008-10-14');
INSERT INTO Slots (dateID) VALUES (3);
The Dates table has three records and the Slots table has one, which points to the third record in Dates.
If I do the following query..
SELECT d.date, count(s.slotID) FROM Dates AS d LEFT JOIN Slots AS s ON s.dateID=d.dateID GROUP BY s.dateID;
..I expect to see a table with 3 rows in - two with a count of 0, and one with a count of 1. But what I actually see is this:
+------------+-----------------+
| date       | count(s.slotID) |
+------------+-----------------+
| 2008-10-12 |               0 |
| 2008-10-14 |               1 |
+------------+-----------------+
The first record with a zero count appears, but the later record with a zero count is ignored.
Am I doing something wrong, or do I just not understand what LEFT JOIN is supposed to do?
You need to GROUP BY d.dateID. In two of your cases, s.DateID is NULL (LEFT JOIN) and these are combined together.
I think you will also find that this is invalid (ANSI) SQL, because d.date is not part of a GROUP BY or the result of an aggregate operation, and should not be able to be SELECTed.
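The effect of the grouping column can be reproduced with a short sketch. It uses Python's sqlite3 module instead of MySQL, with the same three dates and one slot as in the question; the table definitions are simplified accordingly. Like MySQL in its permissive mode, SQLite accepts the non-aggregated d.date column, so only the row counts are compared for the faulty query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dates (dateID INTEGER PRIMARY KEY AUTOINCREMENT, date TEXT NOT NULL);
CREATE TABLE Slots (slotID INTEGER PRIMARY KEY AUTOINCREMENT, dateID INTEGER NOT NULL);
INSERT INTO Dates (date) VALUES ('2008-10-12'), ('2008-10-13'), ('2008-10-14');
INSERT INTO Slots (dateID) VALUES (3);
""")

query = """
SELECT d.date, COUNT(s.slotID)
FROM Dates AS d
LEFT JOIN Slots AS s ON s.dateID = d.dateID
GROUP BY {key}
"""

# Grouping by the right-hand column: both unmatched dates have s.dateID = NULL
# and collapse into a single group.
by_right = conn.execute(query.format(key="s.dateID")).fetchall()

# Grouping by the left-hand column: one group per date, as intended.
by_left = sorted(conn.execute(query.format(key="d.dateID")).fetchall())

print(len(by_right), "rows when grouping by s.dateID")
print(by_left)
```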
I think you mean to group by d.dateId.
Try removing the GROUP BY s.dateID
The rows for 10-12 and 10-13 are grouped together, because s.dateID is NULL for both of them, and the count for that group evaluates to 0.
I don't know if this is valid in MySQL, but you could probably avoid this mistake in the future by using the following syntax instead:
SELECT date, count(slotID) as slotCount
FROM Dates LEFT OUTER JOIN Slots USING (dateID)
GROUP BY (date)
By using the USING clause you don't get two dateID's to keep track of.
Replace GROUP BY s.dateID with GROUP BY d.dateID.