Related
Is there any Oracle SQL operator or function, which compares 2 result sets whether they are the exact same or not. Currently my idea is to use MINUS operator in both directions, but I am looking for a better and performanter solution to achieve. The one result set is fixed (see below), the other depends on the records.
Very important: I am not allowed to change the schema and structure. So CREATE TABLE and CREATE TYPE etc. are not allowed here for me. Also important that oracle11g version is used where the solution must be found.
The shema for SQL Fiddle is:
CREATE TABLE DETAILS (ID INT, MAIN_ID INT, VALUE INT);
INSERT INTO DETAILS VALUES (1,1,1);
INSERT INTO DETAILS VALUES (2,1,2);
INSERT INTO DETAILS VALUES (3,1,3);
INSERT INTO DETAILS VALUES (4,1,4);
INSERT INTO DETAILS VALUES (5,2,1);
INSERT INTO DETAILS VALUES (6,2,2);
INSERT INTO DETAILS VALUES (7,3,1);
INSERT INTO DETAILS VALUES (7,3,2);
Now this is my SQL query for doing the job well (selects MAIN_IDs of those, whose 'VALUE's are exactly the same as the given lists'):
SELECT DISTINCT D.MAIN_ID FROM DETAILS D WHERE NOT EXISTS
(SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID
MINUS
SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2)))
AND NOT EXISTS
(SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2))
MINUS
SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID)
The SQL Fiddle link: http://sqlfiddle.com/#!4/25dde/7/0
If you use a collection (rather than a VARRAY) then you can aggregate the values into a collection and directly compare two collections:
CREATE TYPE int_list AS TABLE OF INT;
Then:
SELECT main_id
FROM details
GROUP BY main_id
HAVING CAST( COLLECT( value ) AS int_list ) = int_list( 1, 2 );
Outputs:
| MAIN_ID |
| ------: |
| 2 |
| 3 |
db<>fiddle here
Update
Based on your expanded fiddle in comments, you can use:
SELECT B.ID
FROM BUSINESS_DATA B
INNER JOIN BUSINESS_NAME N
ON ( B.NAME_ID=N.ID )
WHERE N.NAME='B1'
AND EXISTS (
SELECT business_id
FROM ORDERS O
LEFT OUTER JOIN TABLE(
SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' )
) d
ON ( o.orderdate = d.COLUMN_VALUE )
WHERE O.BUSINESS_ID=B.ID
GROUP BY business_id
HAVING COUNT( CASE WHEN d.COLUMN_VALUE IS NULL THEN 1 END ) = 0
AND COUNT( DISTINCT o.orderdate )
= ( SELECT COUNT(DISTINCT COLUMN_VALUE) FROM TABLE( SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' ) ) )
)
(Note: Do not implicitly create dates from strings; it will cause the query to fail, without there being any changes to the query text, if a user changes their NLS_DATE_FORMAT session parameter. Instead use TO_DATE with an appropriate format model or a DATE literal.)
db<>fiddle here
I have two tables. Table A and table B. Both of them have date fields. I need compare those fields and get a table C with the less or equal date between Table A and table B, taking into account that the table A is the main.
CONTEXT: I have in Table A Expiration of products, and in table B on business days. The user can update table B when it is determined
that a date is not to be considered as a "business day". Then delete
the date from table B and then go to table A to update all product
expirations that were registered with that date and assign them the
previous business day. So in my case I am creating table C, which
contains the Id of table A and the working date less or equal to the
date mentioned. Then I will make the respective update.
IF OBJECT_ID('tempdb..#tmpA') IS NOT NULL DROP TABLE #tmpA
IF OBJECT_ID('tempdb..#tmpB') IS NOT NULL DROP TABLE #tmpB
CREATE TABLE #tmpA(Id INT IDENTITY(100,1),Fecha date)
INSERT INTO #tmpA(Fecha)
VALUES
('20170101'),('20171003'),('20170504'),('2017-09-01')
SELECT * FROM #tmpA
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-03
102 2017-05-04
103 2017-09-01
CREATE TABLE #tmpB(Id INT IDENTITY(1,4),Fecha date)
INSERT INTO #tmpB(Fecha)
VALUES
('20170101'),('20171001'),('20170504')
SELECT * FROM #tmpB
Id Fecha
----------- ----------
1 2017-01-01
5 2017-10-01
9 2017-05-04
I want to get this result (The same number of records in table A):
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-01 --> **this row is less than 2017-10-03**
102 2017-05-04
103 2017-05-04 --> **this row is less than 2017-09-01**
I tried to built some queries without results,
IF OBJECT_ID('tempdb..#tmpC') IS NOT NULL DROP TABLE #tmpC
SELECT A.* INTO #tmpC FROM #tmpA A LEFT JOIN #tmpB B ON A.Fecha = B.Fecha WHERE B.Fecha IS NULL
SELECT * FROM #tmpC
SELECT *
FROM #tmpA A INNER JOIN
(
SELECT *
FROM #tmpC
GROUP BY id, Fecha
) AS Q ON MAX(Q.Fecha) <= A.Fecha
UPDATE:
NOTE. The Id column is simply an identity, but it does not mean that it should be related. The important thing is the dates.
Regards
While I'm not sure if this will scale well (if you have more than 100k rows) this will bring back the results which you want.
Theoretically, the correct way for you to do this, in a fashion which will scale well, would be to have a view where you utilize RANK() and join both of these tables together, though this was the quick and easy way. Please try this and let me know if it meets your requirements.
For your edification, I have left both of the dates in there for you to be able to compare them.
SELECT
A.ID
,A.FECHA OLDDATE
,B.FECHA CORRECTDATE
FROM #TMPA A
LEFT OUTER JOIN #TMPB B ON 1=1
WHERE 1=1
AND B.FECHA = (
SELECT MAX(FECHA)
FROM #TMPB
WHERE FECHA <= A.FECHA)
Is this what you want?
select a.id,
(case when b.fecha < a.fecha then b.fecha else a.fecha end) as fecha
from #tmpA a left join
#tmpB b
on a.id = b.id;
You can get minmum by union all
select id, min(fecha) from (
select * from #tmpA
union all
select * from #tmpB
) a
group by a.id
#JotaPardo WHERE 1=1 is used to basically make sure the query runs if the WHERE conditions don't hold up. 1=1 will equate to true so saying WHERE 1=1 or WHERE TRUE, and TRUE is always TRUE, ensures the query will have at least one WHERE clause condition that will always hold up.
I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
My rules are:
If the WH_OUT_DATETIME is on or within 24 hours of the WH_IN_DATETIME of another ADM_ID with the same WH_P_ID
I would like another column added to the results which identify the grouped value if possible as EP_ID.
e.g.
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
------ ------ -------------- ---------------
1 9 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2014-11-20 00:00:00 2014-11-21 00:00:00
5 5 2014-10-17 00:00:00 2014-10-18 00:00:00
Would return rows with:
ADM_ID WH_PID EP_ID EP_IN_DATETIME EP_OUT_DATETIME WH_IN_DATETIME WH_OUT_DATETIME
------ ------ ----- ------------------- ------------------- ------------------- -------------------
1 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2 2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
5 5 1 2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit.
This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has an WH_IN_DATETIME which follows on the WH_OUT_DATETIME of another for the same WH_PID, than that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the WH_PID's within that EP_ID.
I hope this makes some sense.
Thanks,
MR
Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature dealy, which is updating a variable at the same time you update a column. Breaking down the complexity of this operation, I create a scratch table to hold the piece that is the hardest to calculate: the EP_ID. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME and EP_OUT_DATETIME fields.
The steps are:
Create the scratch table
Seed the scratch table with all of the ADM_ID values -- this lets us do an UPDATE as all of the rows already exist.
Update the scratch table
Do the final, simple select joining the scratch table to the main table
The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
ADM_ID INT NOT NULL PRIMARY KEY,
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
ADM_ID INT NOT NULL PRIMARY KEY,
EP_ID INT NOT NULL
-- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
SELECT ADM_ID, 0
FROM #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
ADM_ID INT NOT NULL,
EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
FROM #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE #EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
SELECT TOP (100) PERCENT
t1.*,
t2.WH_OUT_DATETIME AS [PriorOut],
t2.ADM_ID AS [PriorID],
ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
AS [RowNum]
FROM #Table t1
LEFT JOIN #Table t2
ON t2.WH_PID = t1.WH_PID
AND t2.ADM_ID <> t1.ADM_ID
AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET #EP_ID = sc.EP_ID = CASE
WHEN cte.RowNum = 1 THEN 1
WHEN cte.[PriorOut] IS NULL THEN (#EP_ID + 1)
ELSE #EP_ID
END
FROM #Scratch sc
INNER JOIN cte
ON cte.ADM_ID = sc.ADM_ID
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
tab.WH_PID,
sc.EP_ID,
MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_IN_DATETIME],
MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_OUT_DATETIME],
tab.WH_IN_DATETIME,
tab.WH_OUT_DATETIME
FROM #Table tab
INNER JOIN #Scratch sc
ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "#variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
PRO: this method is generally pretty fast
CON: "The order of the UPDATE is controlled by the order of the clustered index". This behavior might rule out using this method depending on circumstances. But in this particular case, if the WH_PID values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).
I would do this using exists in a correlated subquery:
select t.*,
(case when exists (select 1
from table t2
where t2.WH_P_ID = t.WH_P_ID and
t2.ADM_ID = t.ADM_ID and
t.WH_OUT_DATETIME between t2.WH_IN_DATETIME and dateadd(day, 1, t2.WH_OUT_DATETIME)
)
then 1 else 0
end) as TimeFrameFlag
from table t;
Try this query :
;WITH cte
AS (SELECT t1.ADM_ID AS EP_ID,*
FROM #yourtable t1
WHERE NOT EXISTS (SELECT 1
FROM #yourtable t2
WHERE t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
UNION ALL
SELECT t2.EP_ID,t1.ADM_ID,t1.WH_PID,t1.WH_IN_DATETIME,t1.WH_OUT_DATETIME
FROM #yourtable t1
JOIN cte t2
ON t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
cte_result
AS (SELECT t1.*,Dense_rank() OVER ( partition BY wh_pid ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
FROM #yourtable t1
LEFT OUTER JOIN (SELECT DISTINCT ADM_ID,
EP_ID
FROM cte) t2
ON t1.ADM_ID = t2.ADM_ID)
SELECT ADM_ID,WH_PID,EP_ID,Min(WH_IN_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_IN_DATETIME],Max(WH_OUT_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
WH_IN_DATETIME,
WH_OUT_DATETIME
FROM cte_result
ORDER BY ADM_ID
I assumed these things :
Those rows which follow your rule, are a group.
min(WH_IN_DATETIME) of the group will be shown in EP_IN_DATETIME column for all rows belong to that group. Similarly, max(WH_OUT_DATETIME) of the group will be shown in EP_IN_DATETIME column for all rows belong to that group.
EP_ID will be assigned to groups of each WH_PID separately.
One thing which is not justified by your question that how EP_OUT_DATETIME and WH_IN_DATETIME of 4th row become 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. Assuming that it is a typo and it should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000.
Explaination :
First CTE cte will return the possible groups based on your rule. Second CTE cte_result will assign EP_ID to groups. In the last, you can select min(WH_IN_DATETIME) and Max(WH_OUT_DATETIME) in partitions of wh_pid, ep_id.
sqlfiddle
Here's yet another alternative... which may miss your results still.
I agree with #NoDisplayName that there appears to be an error in your ADM_ID 5 output, the 2 OUT dates should match - at least that seems logical to me. I can't understand why you would want an out date to ever be showing an in date value, but of course there could be a good reason. :)
Also, the wording of your question makes it sound like this is just a part of the problem and that you may take this output to then further. I'm not sure what you are really aiming for, but I've broken the query below up into 2 CTEs and you may find your final information in the 2nd CTE (as it sounds like you want to group the data back together).
Here's the complete structure & query on SQL Fiddle
-- The Cross Join ensures we always have a pair of first and last time pairs
-- The left join matches all overlapping combinations,
-- allowing the where clause to restrict to just the first and last
-- These first/last pairs are then grouped in the first CTE
-- Data is restricted in the second CTE
-- The final select is then quite simple
With GroupedData AS (
SELECT
(Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME, t1.WH_PID
FROM yourtable t1
CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
LEFT OUTER JOIN yourtable t2
ON t1.WH_PID = t2.WH_PID
AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME AND [First] = 0)
OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME AND [First] = 1))
WHERE t2.WH_PID IS NULL
), RestrictedData AS (
SELECT WH_PID, MIN(WH_IN_DATETIME) AS WH_IN_DATETIME, MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
FROM GroupedData
GROUP BY Grp, WH_PID
)
SELECT yourtable.ADM_ID, yourtable.WH_PID, RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME, RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME, yourtable.WH_IN_DATETIME, yourtable.WH_OUT_DATETIME
FROM RestrictedData
INNER JOIN yourtable
ON RestrictedData.WH_PID = yourtable.WH_PID
AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME AND RestrictedData.WH_OUT_DATETIME
ORDER BY yourtable.ADM_ID
A Left Outer Join and DateDiff Function should help you to filter the records. Finally Use Window Function to create GroupID's
create table #test
(ADM_ID int,WH_PID int,WH_IN_DATETIME DATETIME,WH_OUT_DATETIME DATETIME)
INSERT #test
VALUES ( 1,9,'2014-10-12 00:00:00','2014-10-13 15:00:00'),
(2,9,'2014-10-14 14:00:00','2014-10-15 15:00:00'),
(3,9,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(1,10,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(2,10,'2014-10-18 14:00:00','2014-10-19 15:00:00')
SELECT Row_number()OVER(partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
a.WH_PID,
a.WH_IN_DATETIME,
b.WH_OUT_DATETIME
FROM #test a
LEFT JOIN #test b
ON a.WH_PID = b.WH_PID
AND a.ADM_ID <> b.ADM_ID
where Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME)BETWEEN 0 AND 24
OUTPUT :
Group_Id WH_PID WH_IN_DATETIME WH_OUT_DATETIME
-------- ------ ----------------------- -----------------------
1 9 2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
2 9 2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
1 10 2014-10-16 14:00:00.000 2014-10-19 15:00:00.000
Say, I have a table with C columns and N rows. I would like to produce a select statement that represents the "join" of that table with a data range comprising, M days. The resultant result set should have C+1 columns (the last one being the date) and NXM rows.
Trivial example to clarify things:
Given the table A below:
select * from A;
avalue |
--------+
"a" |
And a date range from 10 to 12 of October 2012, I want the following result set:
avalue | date
--------+-------
"a" | 2012-10-10
"a" | 2012-10-11
"a" | 2012-10-12
(this is a stepping stone I need towards ultimately calculating inventory levels on any given day, given starting values and deltas)
The Postgres way for this is simple: CROSS JOIN to the function generate_series():
SELECT t.*, g.day::date
FROM tbl t
CROSS JOIN generate_series(timestamp '2012-10-10'
, timestamp '2012-10-12'
, interval '1 day') AS g(day);
Produces exactly the output requested.
generate_series() is a set-returning function (a.k.a. "table function") producing a derived table. There are a couple of overloaded variants, here's why I chose timestamp input:
Generating time series between two dates in PostgreSQL
For arbitrary dates, replace generate_series() with a VALUES expression. No need to persist a table:
SELECT *
FROM tbl t
CROSS JOIN (
VALUES
(date '2012-08-13') -- explicit type in 1st row
, ('2012-09-05')
, ('2012-10-10')
) g(day);
If the date table has more dates in it than you're interested in, then do
select a.avalue, b.date from a, b where b.date between '2012-10-10' and '2012-10-12'
Other wise if the date table contained only the dates you were interested in, a cartesian join would accomplish this:
select * from a,b;
declare
#Date1 datetime = '20121010',
#Date2 datetime = '20121012';
with Dates
as
(
select #Date1 as [Date]
union all
select dateadd(dd, 1, D.[Date]) as [Date]
from Dates as D
where D.[Date] <= DATEADD(dd, -1, #Date2)
)
select
A.value, D.[Date]
from Dates as D
cross join A
For MySQL
schema/data:
CREATE TABLE someTable
(
someCol varchar(8) not null
);
INSERT INTO someTable VALUES ('a');
CREATE TABLE calendar
(
calDate datetime not null,
isBus bit
);
ALTER TABLE calendar
ADD CONSTRAINT PK_calendar
PRIMARY KEY (calDate);
INSERT INTO calendar VALUES ('2012-10-10', 1);
INSERT INTO calendar VALUES ('2012-10-11', 1);
INSERT INTO calendar VALUES ('2012-10-12', 1);
query:
select s.someCol, c.calDate from someTable s, calendar c;
You really have two options for what you are trying to do.
If your RDBMS supports it (I know SQL Server does, but I don't know any others), you can create a table-valued function which takes in a date range and returns a result set of all the discrete dates within that range. You would do a cartesian join between your table and the function.
You can create a static table of date values and then do a cartesian join between the two tables.
The second option will perform better, especially if you are dealing with large date ranges, however, that solution will not be able to handle arbitrary date ranges. But then, you should know your minimum date, and you can alway add more dates to your table as time goes on.
I am not very clear about your M table. Providing that you have such a table(M) with dates, following cross join will bring the results.
SELECT C.*, M.date FROM C CROSS JOIN M
I have an interesting SQL problem that I need help with.
Here is the sample dataset:
Warehouse DateStamp TimeStamp ItemNumber ID
A 8/1/2009 10001 abc 1
B 8/1/2009 10002 abc 1
A 8/3/2009 12144 qrs 5
C 8/3/2009 12143 qrs 5
D 8/5/2009 6754 xyz 6
B 8/5/2009 6755 xyz 6
This dataset represents inventory transfers between two warehouses. There are two records that represent each transfer, and these two transfer records always have the same ItemNumber, DateStamp, and ID. The TimeStamp values for the two transfer records always have a difference of 1, where the smaller TimeStamp represents the source warehouse record and the larger TimeStamp represents the destination warehouse record.
Using the sample dataset above, here is the query result set that I need:
Warehouse_Source Warehouse_Destination ItemNumber DateStamp
A B abc 8/1/2009
C A qrs 8/3/2009
D B xyz 8/5/2009
I can write code to produce the desired result set, but I was wondering if this record combination was possible through SQL. I am using SQL Server 2005 as my underlying database. I also need to add a WHERE clause to the SQL, so that for example, I could search on Warehouse_Source = A. And no, I can't change the data model ;).
Any advice is greatly appreciated!
Regards,
Mark
SELECT source.Warehouse as Warehouse_Source
, dest.Warehouse as Warehouse_Destination
, source.ItemNumber
, source.DateStamp
FROM table source
JOIN table dest ON source.ID = dest.ID
AND source.ItemNumber = dest.ItemNumber
AND source.DateStamp = dest.DateStamp
AND source.TimeStamp = dest.TimeStamp + 1
Mark,
Here is how you can do this with row_number and PIVOT. With a clustered index or primary key on the columns as I suggest, it will use a straight-line query plan with no Sort operation, thus be particularly efficient.
create table T(
Warehouse char,
DateStamp datetime,
TimeStamp int,
ItemNumber varchar(10),
ID int,
primary key(ItemNumber,DateStamp,ID,TimeStamp)
);
insert into T values ('A','20090801','10001','abc','1');
insert into T values ('B','20090801','10002','abc','1');
insert into T values ('A','20090803','12144','qrs','5');
insert into T values ('C','20090803','12143','qrs','5');
insert into T values ('D','20090805','6754','xyz','6');
insert into T values ('B','20090805','6755','xyz','6');
with Tpaired(Warehouse,DateStamp,TimeStamp,ItemNumber,ID,rk) as (
select
Warehouse,DateStamp,TimeStamp,ItemNumber,ID,
row_number() over (
partition by ItemNumber,DateStamp,ID
order by TimeStamp
)
from T
)
select
max([1]) as Warehouse_Source,
max([2]) as Warehouse_Destination,
ItemNumber,
DateStamp
from Tpaired
pivot (
max(Warehouse) for rk in ([1],[2])
) as P
group by ItemNumber, DateStamp, ID;
go
drop table T;