My query is as follows
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
SUM(Value) Value
FROM
f_Trans_GL
WHERE
Account = 228
GROUP BY
TimePeriod
And it returns
Period Value
---------------
201412 80
201501 20
201502 30
201506 50
201509 100
201509 100
I'd like to know the Value difference between rows whose periods are one month apart, the calculation being [value period] - [value period-1].
The desired output is:
Period Value Calculated
-----------------------------------
201412 80 80 - null = 80
201501 20 20 - 80 = -60
201502 30 30 - 20 = 10
201506 50 50 - null = 50
201509 100 (100 + 100) - null = 200
This illustrates a second challenge: the periods need to be compared correctly when the year changes (the difference between 201501 and 201412 is one month).
And the third challenge is a duplicate Period (201509), in which case the sum of that period needs to be used.
Any pointers on where to begin, or whether this is possible at all, would be great!
Thanks in advance
===============================
After I accepted the answer, I tailored this a little to suit my needs, the end result is:
WITH cte
AS (SELECT
ISNULL(CAST(TransactionID AS nvarchar), '_nullTransactionId_') + ISNULL(Description, '_nullDescription_') + CAST(Account AS nvarchar) + Category + Currency + Entity + Scenario AS UID,
LEFT(TimePeriod, 6) Period,
SUM(Value1) Value1,
CAST(LEFT(TimePeriod, 6) + '01' AS date) ord_date
FROM MyTestTable
GROUP BY LEFT(TimePeriod, 6),
TransactionID,
Description,
Account,
Category,
Currency,
Entity,
Scenario,
TimePeriod)
SELECT
a.UID,
a.Period,
--a.Value1,
ISNULL(a.Value1, 0) - ISNULL(b.Value1, 0) Periodic
FROM cte a
LEFT JOIN cte b
ON a.ord_date = DATEADD(MONTH, 1, b.ord_date)
ORDER BY a.UID
I have to get the new value (Periodic) for each UID. The UID must be built up as done here because the PK on the table won't work for this.
But the issue is that this returns many more rows than I actually have in my table to begin with. If I don't add a GROUP BY and just ORDER BY UID (as done above), I can tell that the first result for each combination of UID and Period is correct, but the subsequent rows for that combination are not.
I'm not sure where to look for a solution; my guess is that the UID is the issue here, and that it somehow iterates over the field... any direction appreciated.
As pointed out by others, the first mistake is in the GROUP BY: you need LEFT(TimePeriod, 6) instead of TimePeriod.
For the remaining calculation, try something like this:
;WITH cte
AS (SELECT LEFT(timeperiod, 6) Period,
Sum(value) Value,
Cast(LEFT(timeperiod, 6) + '01' AS DATE) ord_date
FROM f_trans_gl
WHERE account = 228
GROUP BY LEFT(timeperiod, 6))
SELECT a.period,
a.value,
a.value - Isnull(b.value, 0)
FROM cte a
LEFT JOIN cte b
ON a.ord_date = Dateadd(month, 1, b.ord_date)
If you are using SQL Server 2012 or later, this can also be done easily using the LAG analytic function.
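As an illustration of that suggestion, here is a sketch of the month-over-month calculation using LAG, run against SQLite via Python's sqlite3 so it is self-contained (the sample rows are invented to reproduce the question's totals; SQLite 3.25+ is needed for window functions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE f_Trans_GL (TimePeriod TEXT, Account INT, Value INT);
INSERT INTO f_Trans_GL VALUES
  ('20141215', 228, 80), ('20150115', 228, 20), ('20150214', 228, 30),
  ('20150610', 228, 50), ('20150901', 228, 100), ('20150930', 228, 100);
""")

rows = con.execute("""
WITH cte AS (
  -- aggregate duplicate periods first, and build a real date for month arithmetic
  SELECT substr(TimePeriod, 1, 6) AS Period,
         SUM(Value)               AS Value,
         date(substr(TimePeriod, 1, 4) || '-' || substr(TimePeriod, 5, 2) || '-01') AS ord_date
  FROM f_Trans_GL
  WHERE Account = 228
  GROUP BY substr(TimePeriod, 1, 6)
)
SELECT Period,
       Value,
       -- subtract the previous Value only if the previous period is exactly one month earlier
       Value - CASE WHEN date(LAG(ord_date) OVER (ORDER BY ord_date), '+1 month') = ord_date
                    THEN LAG(Value) OVER (ORDER BY ord_date)
                    ELSE 0 END AS Periodic
FROM cte
ORDER BY Period
""").fetchall()

for r in rows:
    print(r)
```

The GROUP BY handles the duplicate 201509 rows, and the date comparison inside the CASE handles both the year boundary and gaps between months.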
Using a derived table, you can join the data to itself to find rows that are in the preceding period. I have converted your Period to a Date value so you can use SQL Server's dateadd function to check for rows in the previous month:
;WITH cte AS
(
SELECT
LEFT(TimePeriod,6) Period, -- string field with YYYYMMDD
CAST(LEFT(TimePeriod,6) + '01' AS DATE) PeriodDate,
SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
)
SELECT c1.Period,
c1.Value,
c1.Value - ISNULL(c2.Value,0) AS Calculation
FROM cte c1
LEFT JOIN cte c2
ON c1.PeriodDate = DATEADD(m,1,c2.PeriodDate)
Without cte, you can also try something like this
SELECT A.Period,A.Value,A.Value-ISNULL(B.Value,0) Calculated
FROM
(
SELECT LEFT(TimePeriod,6) Period,
DATEADD(M,-1,(CONVERT(date,LEFT(TimePeriod,6)+'01'))) PeriodDatePrev,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS A
LEFT OUTER JOIN
(
SELECT LEFT(TimePeriod,6) Period,
(CONVERT(date,LEFT(TimePeriod,6)+'01')) PeriodDate,SUM(Value) Value
FROM f_Trans_GL
WHERE Account = 228
GROUP BY LEFT(TimePeriod,6)
) AS B
ON (A.PeriodDatePrev = B.PeriodDate)
ORDER BY 1
Related
I'm creating a report in a SQL Server database. I will show its code first and then describe what it does and where the problem is.
SELECT
COUNT(e.flowid) AS [count],
t.name AS [process],
CAST(DATEPART(YEAR, e.dtcr) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(MONTH, e.dtcr)), 2) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(DAY, e.dtcr)), 2) AS VARCHAR) AS [day]
FROM
dbo.[Event] e
JOIN
dbo.Flow f ON e.flowid = f.id
JOIN
dbo.WorkOrder o ON f.workorderno = o.number
AND o.treenodeid IN (26067, 26152, 2469, 1815, 1913) -- only from requested processes
JOIN
dbo.TreeNode t ON o.treenodeid = t.id -- for process name in select statement
JOIN
dbo.Product p ON f.productid = p.id
AND p.materialid NOT IN (26094, 27262, 27515, 27264, 28192, 28195, 26090, 26092, 26093, 27065, 26969, 27471, 28351, 28353, 28356, 28976, 27486, 29345, 29346, 27069, 28653, 28654, 26735, 26745, 28686) -- exclude unwanted family codes
WHERE
e.pass = 1 -- only passed units
AND e.treenodeid IN (9036, 9037, 9038, 9039, 12594, 26330) -- only from requested events
AND e.dtcr BETWEEN '2015-12-01 00:00:00.000' AND '2016-05-31 23:59:59.999' -- only from requested time interval
GROUP BY
DATEPART(YEAR, e.dtcr), DATEPART(MONTH, e.dtcr), DATEPART(DAY, e.dtcr), t.name
ORDER BY
[day]
What the query does is count units that passed specific events in a given time period (with some filters).
Important tables are:
Event - basically log for units passing specific events.
Product - list of units.
Output is something like this:
COUNT PROCESS DAY
71 Process-1 2015-12-01
1067 Process-2 2015-12-01
8 Process-3 2015-12-01
3 Process-4 2015-12-01
15 Process-1 2015-12-02
276 Process-2 2015-12-02
47 Process-3 2015-12-02
54 Process-4 2015-12-02
It works well, but there is an issue. In some specific cases a unit can pass the same event several times, and this query counts every such pass. I need to count every unit only once.
The "duplicated" records are in the Event table. They have different dates and ids. What is the same for all the records I need to count only once is the flowid. Is there any simple way to achieve this?
Thank you for your time and answers!
To count each flowid only once, do count(distinct flowid), i.e.
SELECT
COUNT(distinct e.flowid) AS [count],
t.name AS [process],
CAST(DATEPART(YEAR, e.dtcr) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(MONTH, e.dtcr)), 2) AS VARCHAR) + '-' + CAST(RIGHT('0' + RTRIM(DATEPART(DAY, e.dtcr)), 2) AS VARCHAR) AS [day]
FROM
...
It sounds like you need the first time that something passes the threshold. You can get the first time using row_number(). This can be tricky with the additional conditions on the query. This modification might work for you:
select sum(case when seqnum = 1 then 1 else 0 end) as cnt,
. . .
from (select e.*,
row_number() over (partition by eventid order by e.dtcr) as seqnum
from event e
where e.pass = 1 and -- only passed units
e.treenodeid IN (9036, 9037, 9038, 9039, 12594, 26330) and
e.dtcr >= '2015-12-01' AND e.dtcr < '2016-06-01'
) e join
. . .
You don't specify how the same event is identified for the duplicates. The above uses eventid for this purpose.
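To make the difference concrete, here is a minimal sketch using Python's sqlite3 with an invented three-row event table: a flow that passes the same event twice is counted twice by a plain COUNT, but only once by COUNT(DISTINCT flowid) and by the seqnum = 1 trick:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE event (flowid INT, dtcr TEXT, pass INT);
INSERT INTO event VALUES          -- flow 1 passes the event twice
  (1, '2015-12-01 08:00', 1),
  (1, '2015-12-01 09:30', 1),
  (2, '2015-12-01 10:00', 1);
""")

# plain COUNT counts every pass, including repeats
plain = con.execute("SELECT COUNT(flowid) FROM event WHERE pass = 1").fetchone()[0]

# COUNT(DISTINCT ...) counts each flowid once
distinct = con.execute("SELECT COUNT(DISTINCT flowid) FROM event WHERE pass = 1").fetchone()[0]

# ROW_NUMBER() keeps only the first pass per flowid
first_only = con.execute("""
  SELECT SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
  FROM (SELECT ROW_NUMBER() OVER (PARTITION BY flowid ORDER BY dtcr) AS seqnum
        FROM event WHERE pass = 1)
""").fetchone()[0]

print(plain, distinct, first_only)
```

COUNT(DISTINCT ...) is the simpler fix when you only need the count; the ROW_NUMBER() version is useful when you also need to know which row was the first pass.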
I want to count the number of 2 or more consecutive week periods that have negative values within a range of weeks.
Example:
Week | Value
201301 | 10
201302 | -5 <--| both weeks have negative values and are consecutive
201303 | -6 <--|
Week | Value
201301 | 10
201302 | -5
201303 | 7
201304 | -2 <-- negative but not consecutive to the last negative value in 201302
Week | Value
201301 | 10
201302 | -5
201303 | -7
201304 | -2 <-- 1st group of negative and consecutive values
201305 | 0
201306 | -12
201307 | -8 <-- 2nd group of negative and consecutive values
Is there a better way of doing this other than using a cursor and a reset variable and checking through each row in order?
Here is some of the SQL I have setup to try and test this:
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestOne') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestOne
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestTwo') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestTwo
CREATE TABLE #ConsecutiveNegativeWeekTestOne
(
[Week] INT NOT NULL
,[Value] DECIMAL(18,6) NOT NULL
)
-- I have a condition where I expect to see at least 2 consecutive weeks with negative values
-- TRUE : Week 201328 & 201329 are both negative.
INSERT INTO #ConsecutiveNegativeWeekTestOne
VALUES
(201327, 5)
,(201328,-11)
,(201329,-18)
,(201330, 25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, 59)
,(201336, 0)
,(201337, 0)
SELECT * FROM #ConsecutiveNegativeWeekTestOne
WHERE Value < 0
ORDER BY [Week] ASC
CREATE TABLE #ConsecutiveNegativeWeekTestTwo
(
[Week] INT NOT NULL
,[Value] DECIMAL(18,6) NOT NULL
)
-- FALSE: The negative weeks are not consecutive
INSERT INTO #ConsecutiveNegativeWeekTestTwo
VALUES
(201327, 5)
,(201328,-11)
,(201329,20)
,(201330, -25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, -15)
,(201336, 0)
,(201337, 0)
SELECT * FROM #ConsecutiveNegativeWeekTestTwo
WHERE Value < 0
ORDER BY [Week] ASC
My SQL fiddle is also here:
http://sqlfiddle.com/#!3/ef54f/2
First, would you please share the formula for calculating week number, or provide a real date for each week, or some method to determine if there are 52 or 53 weeks in any particular year? Once you do that, I can make my queries properly skip missing data AND cross year boundaries.
Now to queries: this can be done without a JOIN, which depending on the exact indexes present, may improve performance a huge amount over any solution that does use JOINs. Then again, it may not. This is also harder to understand so may not be worth it if other solutions perform well enough (especially when the right indexes are present).
Simulate a PREORDER BY windowing function (respects gaps, ignores year boundaries):
WITH Calcs AS (
SELECT
Grp =
[Week] -- comment out to ignore gaps and gain year boundaries
-- Row_Number() OVER (ORDER BY [Week]) -- swap with previous line
- Row_Number() OVER
(PARTITION BY (SELECT 1 WHERE Value < 0) ORDER BY [Week]),
*
FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT
[Week] = Min([Week])
-- NumWeeks = Count(*) -- if you want the count
FROM Calcs C
WHERE Value < 0
GROUP BY C.Grp
HAVING Count(*) >= 2
;
See a Live Demo at SQL Fiddle (1st query)
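For readers who want to experiment, here is a simplified variant of the Week minus ROW_NUMBER() grouping trick, sketched with Python's sqlite3 against a subset of the test data (it filters to negative rows before computing the row numbers, which gives the same gap-respecting islands):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE weeks (week INT, value REAL);
INSERT INTO weeks VALUES
  (201327, 5), (201328, -11), (201329, -18), (201330, 25),
  (201331, 30), (201332, -36), (201333, 43);
""")

rows = con.execute("""
WITH neg AS (
  -- consecutive negative weeks share the same (week - row_number) value
  SELECT week,
         week - ROW_NUMBER() OVER (ORDER BY week) AS grp
  FROM weeks
  WHERE value < 0
)
SELECT MIN(week) AS start_week, COUNT(*) AS num_weeks
FROM neg
GROUP BY grp
HAVING COUNT(*) >= 2
""").fetchall()

print(rows)
```

Here 201328 and 201329 form the only island of length 2 or more; the lone negative week 201332 is filtered out by the HAVING clause.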
And another way, simulating LAG and LEAD with a CROSS JOIN and aggregates (respects gaps, ignores year boundaries):
WITH Groups AS (
SELECT
Grp = T.[Week] + X.Num,
*
FROM
dbo.ConsecutiveNegativeWeekTestOne T
CROSS JOIN (VALUES (-1), (0), (1)) X (Num)
)
SELECT
[Week] = Min(C.[Week])
-- Value = Min(C.Value)
FROM
Groups G
OUTER APPLY (SELECT G.* WHERE G.Num = 0) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
Min(G.[Week]) = Min(C.[Week])
AND Max(G.[Week]) > Min(C.[Week])
;
See a Live Demo at SQL Fiddle (2nd query)
And, my original second query, but simplified (ignores gaps, handles year boundaries):
WITH Groups AS (
SELECT
Grp = (Row_Number() OVER (ORDER BY T.[Week]) + X.Num) / 3,
*
FROM
dbo.ConsecutiveNegativeWeekTestOne T
CROSS JOIN (VALUES (0), (2), (4)) X (Num)
)
SELECT
[Week] = Min(C.[Week])
-- Value = Min(C.Value)
FROM
Groups G
OUTER APPLY (SELECT G.* WHERE G.Num = 2) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
Min(G.[Week]) = Min(C.[Week])
AND Max(G.[Week]) > Min(C.[Week])
;
Note: The execution plan for these may be rated as more expensive than other queries, but there will be only 1 table access instead of 2 or 3, and while the CPU may be higher it is still respectably low.
Note: I originally was not paying attention to only producing one row per group of negative values, and so I produced this query as only requiring 2 table accesses (respects gaps, ignores year boundaries):
SELECT
T1.[Week]
FROM
dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
Value < 0
AND EXISTS (
SELECT *
FROM dbo.ConsecutiveNegativeWeekTestOne T2
WHERE
T2.Value < 0
AND T2.[Week] IN (T1.[Week] - 1, T1.[Week] + 1)
)
;
See a Live Demo at SQL Fiddle (3rd query)
However, I have now modified it to perform as required, showing only each starting date (respects gaps, ignores year boundaries):
SELECT
T1.[Week]
FROM
dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
Value < 0
AND EXISTS (
SELECT *
FROM
dbo.ConsecutiveNegativeWeekTestOne T2
WHERE
T2.Value < 0
AND T1.[Week] - 1 <= T2.[Week]
AND T1.[Week] + 1 >= T2.[Week]
AND T1.[Week] <> T2.[Week]
HAVING
Min(T2.[Week]) > T1.[Week]
)
;
See a Live Demo at SQL Fiddle (3rd query)
Last, just for fun, here is a SQL Server 2012 and up version using LEAD and LAG:
WITH Weeks AS (
SELECT
PrevValue = Lag(Value, 1, 0) OVER (ORDER BY [Week]),
SubsValue = Lead(Value, 1, 0) OVER (ORDER BY [Week]),
PrevWeek = Lag(Week, 1, 0) OVER (ORDER BY [Week]),
SubsWeek = Lead(Week, 1, 0) OVER (ORDER BY [Week]),
*
FROM
dbo.ConsecutiveNegativeWeekTestOne
)
SELECT @Week = [Week]
FROM Weeks W
WHERE
(
[Week] - 1 > PrevWeek
OR PrevValue >= 0
)
AND Value < 0
AND SubsValue < 0
AND [Week] + 1 = SubsWeek
;
See a Live Demo at SQL Fiddle (4th query)
I am not sure I am doing this the best way as I haven't used these much, but it works nonetheless.
You should do some performance testing of the various queries presented to you, and pick the best one, considering that code should be, in order:
Correct
Clear
Concise
Fast
Seeing that some of my solutions are anything but clear, other solutions that are fast enough and concise enough will probably win out in the competition of which one to use in your own production code. But... maybe not! And maybe someone will appreciate seeing these techniques, even if they can't be used as-is this time.
So let's do some testing and see what the truth is about all this! Here is some test setup script. It will generate the same data on your own server as it did on mine:
IF Object_ID('dbo.ConsecutiveNegativeWeekTestOne', 'U') IS NOT NULL DROP TABLE dbo.ConsecutiveNegativeWeekTestOne;
GO
CREATE TABLE dbo.ConsecutiveNegativeWeekTestOne (
[Week] int NOT NULL CONSTRAINT PK_ConsecutiveNegativeWeekTestOne PRIMARY KEY CLUSTERED,
[Value] decimal(18,6) NOT NULL
);
SET NOCOUNT ON;
DECLARE
@f float = Rand(5.1415926535897932384626433832795028842),
@Dt datetime = '17530101',
@Week int;
WHILE @Dt <= '20140106' BEGIN
INSERT dbo.ConsecutiveNegativeWeekTestOne
SELECT
Format(@Dt, 'yyyy') + Right('0' + Convert(varchar(11), DateDiff(day, DateAdd(year, DateDiff(year, 0, @Dt), 0), @Dt) / 7 + 1), 2),
Rand() * 151 - 76
;
SET @Dt = DateAdd(day, 7, @Dt);
END;
This generates 13,620 weeks, from 175301 through 201401. I modified all the queries to select the Week values instead of the count, in the format SELECT @Week = Expression ... so that tests are not affected by returning rows to the client.
I tested only the gap-respecting, non-year-boundary-handling versions.
Results
Query Duration CPU Reads
------------------ -------- ----- ------
ErikE-Preorder 27 31 40
ErikE-CROSS 29 31 40
ErikE-Join-IN -------Awful---------
ErikE-Join-Revised 46 47 15069
ErikE-Lead-Lag 104 109 40
jods 12 16 120
Transact Charlie 12 16 120
Conclusions
The reduced reads of the non-JOIN versions are not significant enough to warrant their increased complexity.
The table is so small that the performance almost doesn't matter. 261 years of weeks is insignificant, so a normal business operation won't see any performance problem even with a poor query.
I tested with an index on Week (which is more than reasonable); doing two separate JOINs with a seek was far, far superior to any device that tries to get the relevant related data in one swoop. Charlie and jods were spot on in their comments.
This data is not large enough to expose real differences between the queries in CPU and duration. The values above are representative, though at times the 31 ms were 16 ms and the 16 ms were 0 ms. Since the resolution is ~15 ms, this doesn't tell us much.
My tricky query techniques do perform better. They might be worth it in performance critical situations. But this is not one of those.
Lead and Lag may not always win. The presence of an index on the lookup value is probably what determines this. The ability to still pull prior/next values based on a certain order even when the order by value is not sequential may be one good use case for these functions.
You could use a combination of EXISTS clauses.
Assuming you only want to know the groups (series of consecutive weeks, all negative):
--Find the potential start weeks
;WITH starts as (
SELECT [Week]
FROM #ConsecutiveNegativeWeekTestOne AS s
WHERE s.[Value] < 0
AND NOT EXISTS (
SELECT 1
FROM #ConsecutiveNegativeWeekTestOne AS p
WHERE p.[Week] = s.[Week] - 1
AND p.[Value] < 0
)
)
SELECT COUNT(*)
FROM
Starts AS s
WHERE EXISTS (
SELECT 1
FROM #ConsecutiveNegativeWeekTestOne AS n
WHERE n.[Week] = s.[Week] + 1
AND n.[Value] < 0
)
If you have an index on Week this query should even be moderately efficient.
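A runnable sketch of this EXISTS-based counting, using Python's sqlite3 and a subset of the test data (the table name is shortened to w here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE w (week INT PRIMARY KEY, value REAL);
INSERT INTO w VALUES (201327, 5), (201328, -11), (201329, -18), (201330, 25),
                     (201331, 30), (201332, -36), (201333, 43);
""")

n = con.execute("""
SELECT COUNT(*)
FROM w s
WHERE s.value < 0
  -- no negative week immediately before: s starts a run
  AND NOT EXISTS (SELECT 1 FROM w p
                  WHERE p.week = s.week - 1 AND p.value < 0)
  -- and the next week is also negative: the run has length >= 2
  AND EXISTS     (SELECT 1 FROM w n
                  WHERE n.week = s.week + 1 AND n.value < 0)
""").fetchone()[0]

print(n)
```

Only 201328 qualifies: 201329 is inside a run rather than starting one, and 201332 has no negative successor.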
You can replace LEAD and LAG with a self-join.
The counting idea is basically to count the starts of negative sequences rather than trying to consider each row.
SELECT COUNT(*)
FROM ConsecutiveNegativeWeekTestOne W
LEFT OUTER JOIN ConsecutiveNegativeWeekTestOne Prev
ON W.week = Prev.week + 1
INNER JOIN ConsecutiveNegativeWeekTestOne Next
ON W.week = Next.week - 1
WHERE W.value < 0
AND (Prev.value IS NULL OR Prev.value > 0)
AND Next.value < 0
Note that I simply did "week + 1", which would not work when there is a year change.
For a development aid project I am helping a small town in Nicaragua improve their water-network administration.
There are about 150 households, and every month a person reads the meter and charges the household according to the consumed water (reading from this month minus reading from last month). Today all of this is done on paper, and I would like to digitize the administration to avoid calculation errors.
I have an MS Access Table in mind - e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the consumed water (the meter-difference of one household between two months)
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?
This query returns every date with previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID Data PrevData Meter PrevMeter Diff
----------------------------------------------------------
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
The query above will return every delta, for every households, for every month (or for every interval). If you are just interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
and (depending on what you are after) you could filter only the current month:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
this way if you miss a check for a particular household, it won't show.
Or you might be interested in showing the last month present in the table (which can be different from the current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(here I am taking into consideration the fact that checks might be on different days)
I think the best approach is to use a correlated subquery to get the previous date and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
So the right query looks like:
select t.*, tprev.date, tprev.meter
from (select t.*,
(select top 1 date from t t2 where t2.HousholdID = t.HousholdID and t2.date < t.date order by date desc
) as prevDate
from t
) t left join
t tprev
on t.HousholdID = tprev.HousholdID and tprev.date = t.prevDate
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
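A self-contained sketch of this correlated-subquery idea, using Python's sqlite3 (the table and column names readings/ReadDate are invented, partly to follow the advice elsewhere on this page not to use Date as a field name):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE readings (HouseholdID INT, ReadDate TEXT, Meter INT);
INSERT INTO readings VALUES
  (0, '2013-01-01', 100), (1, '2013-01-01', 130),
  (0, '2013-02-01', 120), (1, '2013-02-01', 140);
""")

rows = con.execute("""
SELECT r.HouseholdID,
       r.ReadDate,
       -- the correlated subquery fetches the latest earlier reading
       -- for the same household, however long ago it was taken
       r.Meter - (SELECT p.Meter
                  FROM readings p
                  WHERE p.HouseholdID = r.HouseholdID
                    AND p.ReadDate < r.ReadDate
                  ORDER BY p.ReadDate DESC
                  LIMIT 1) AS Consumption
FROM readings r
ORDER BY r.ReadDate, r.HouseholdID
""").fetchall()

for row in rows:
    print(row)  # the first month has no previous reading, so Consumption is NULL
```

Because the previous reading is found by date order rather than by "minus one month", this keeps working when a reading is skipped or taken off schedule, which matches the caution above about not assuming a fixed reading frequency.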
Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
The above query selects the records for this month (Month([date]) = Month(Date())) and compares them to records for last month (Month([date]) = Month(Date()) - 1).
Please do not use Date as a field name.
Returns the following result.
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
This works in SQL Server; I'm not sure about Access.
You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
I have the following table and rows defined in SQLFiddle
I need to select rows from the products table where the difference between two rows' start_date and
nvl(return_date, end_date) is 1, i.e. the start_date of the current row should be one day after the nvl(return_date, end_date) of the previous row.
For example:
For PRODUCT_NO TSH098 and PRODUCT_REG_NO FLDG, the END_DATE is August 15, 2012, and
for PRODUCT_NO TSH128 and PRODUCT_REG_NO FLDG the start_date is August 16, 2012, so the difference is only a day.
How can I get the desired output using SQL?
Any help is highly appreciated.
Thanks
You can use the lag analytic function to get access to a row at a given physical offset prior to the current position. With your sorting order it might look like this (not so elegant, though):
select *
from products p
join (select *
from(select p.Product_no
, p.Product_Reg_No
, case
when (lag(start_date, 1, start_date) over(order by product_reg_no)-
nvl(return_date, end_date)) = 1
then lag(start_date, 1, start_date)
over(order by product_reg_no)
end start_date
, End_Date
, Return_Date
from products p
order by 2,1 desc
)
where start_date is not null
) s
on (p.start_date = s.start_date or p.end_date = s.end_date)
order by 2, 1 desc
SQL FIddle DEMO
In Oracle, date + X adds X days to the date. So you can:
select *
from products
where start_date + 1 = nvl(return_date, end_date)
If the dates could contain a time part, use trunc to remove the time part:
select *
from products
where trunc(start_date) + 1 = trunc(nvl(return_date, end_date))
Live example at SQL Fiddle.
I am under the impression you only want the matching dates differing by 1 day if the product_reg_no matches. So I simply joined on it, and I think this is what you want:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1
join products p2 on (p1.product_reg_no = p2.product_reg_no)
where p1.start_date-1 = nvl(p2.return_date,p2.end_date)
SQL Fiddle
If I was wrong about the grouping, just leave out the join condition; with the given example products table this brings the same result:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1, products p2
where p1.start_date-1 = nvl(p2.return_date,p2.end_date)
SQL Fiddle 2
Now, you say the difference is 1 day. I automatically assumed that start_date is 1 day higher than nvl(return_date, end_date). I also assumed that the date is always midnight. But to exclude both assumptions, you can work with trunc and go in both directions:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
nvl(p2.return_date,p2.end_date) return_or_end_date_2
from products p1, products p2
where trunc(p1.start_date)-1 = trunc(nvl(p2.return_date,p2.end_date))
or trunc(p1.start_date)+1 = trunc(nvl(p2.return_date,p2.end_date))
SQL Fiddle 3
And this all works because dates (not timestamps) can be manipulated by adding and subtracting.
EDIT: Following your comment, you want return_date or end_date to be compared, and equal dates are also wanted:
select p1.product_reg_no,
p1.product_no product_no_1,
p2.product_no product_no_2,
p1.start_date start_date_1,
p2.return_date return_date_2,
p2.end_date end_date_2
from products p1, products p2
where trunc(p1.start_date) = trunc(p2.return_date)
or trunc(p1.start_date)-1 = trunc(p2.return_date)
or trunc(p1.start_date)+1 = trunc(p2.return_date)
or trunc(p1.start_date) = trunc(p2.end_date)
or trunc(p1.start_date)-1 = trunc(p2.end_date)
or trunc(p1.start_date)+1 = trunc(p2.end_date)
SQL Fiddle 4
The way to compare the current row with the previous row is to use the LAG() function. Something like this:
select * from
(
select p.*
, lag (end_date) over
(order by start_date )
as prev_end_date
, lag (return_date) over
(order by start_date )
as prev_return_date
from products p
)
where (trunc(start_date) - 1) = trunc(nvl(prev_return_date, prev_end_date))
order by 2,1 desc
However, this will not return the results you desire, because you have not defined a mechanism for defining a sort order. And without a sort order the concept of "previous row" is meaningless.
However, what you can do is this:
select p1.*
, p2.*
from products p1 cross join products p2
where (trunc(p2.start_date) - 1) = trunc(nvl(p1.return_date, p1.end_date))
order by 2, 1 desc
This SQL queries your table twice, filtering on the basis of dates. Each row in the result set contains a record from each copy of the table. If a given start_date matches more than one end_date, or vice versa, you will get records for multiple hits.
You mean like this?
SELECT T2.*
FROM PRODUCTS T1
JOIN PRODUCTS T2 ON (
nvl(T1.end_date, T1.return_date) + 1 = T2.start_date
);
In your SQL Fiddle example, it returns:
PRODUCT_NO PRODUCT_REG_NO START_DATE END_DATE RETURN_DATE
TSH128 FLDG August, 16 2012 00:00:00-0400 September, 15 2012 00:00:00-0400 (null)
TSH125 SCRW August, 08 2012 00:00:00-0400 September, 07 2012 00:00:00-0400 (null)
TSH137 SCRW September, 08 2012 00:00:00-0400 October, 07 2012 00:00:00-0400 (null)
TSH128 is returned for the reasons you already explained.
TSH125 is returned because TSH116 end_date is August, 07 2012.
TSH137 is returned because TSH125 end_date is September, 07 2012.
If you want to compare only rows within the same product_reg_no, it's easy to add that to the JOIN condition. If you want both "directions" of the 1-day difference, it's easy to add that too.
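A minimal sketch of this join using Python's sqlite3, with COALESCE standing in for NVL and SQLite's date() function standing in for Oracle's date + 1 arithmetic (sample rows invented from the question's example; the NVL argument order follows the question, i.e. return_date first):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_no TEXT, start_date TEXT, end_date TEXT, return_date TEXT);
INSERT INTO products VALUES
  ('TSH098', '2012-07-16', '2012-08-15', NULL),
  ('TSH128', '2012-08-16', '2012-09-15', NULL);
""")

rows = con.execute("""
SELECT t2.product_no
FROM products t1
JOIN products t2
  -- t2 starts exactly one day after t1 ends (return_date wins over end_date)
  ON date(COALESCE(t1.return_date, t1.end_date), '+1 day') = t2.start_date
""").fetchall()

print(rows)
```

TSH098 ends on 2012-08-15 and TSH128 starts on 2012-08-16, so TSH128 is the one row returned; adding product_reg_no to the ON clause would restrict matches within a product group, as discussed above.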
I have an order file, with order id and ship date. Orders can only be shipped Monday through Friday. This means there are no records selected for Saturday and Sunday.
I use the same order file to get all order dates, with the date in the same format (yyyymmdd).
I want to select a count of all the records from the order file based on order date, and (I believe) FULL OUTER JOIN (or maybe RIGHT JOIN?) the date file, because I would like to see:
20120330 293
20120331 0
20120401 0
20120402 920
20120403 430
20120404 827
etc...
However, my SQL statement is still not returning a zero-count record for the 31st and the 1st.
with DatesTable as (
select ohordt "Date" from kivalib.orhdrpf
where ohordt between 20120315 and 20120406
group by ohordt order by ohordt
)
SELECT ohscdt, count(OHTXN#) "Count"
FROM KIVALIB.ORHDRPF full outer join DatesTable dts on dts."Date" = ohordt
--/*order status = filled & order type = 1 & date between (some fill date range)*/
WHERE OHSTAT = 'F' AND OHTYP = 1 and ohscdt between 20120401 and 20120406
GROUP BY ohscdt ORDER BY ohscdt
Any ideas what I'm doing wrong?
Thanks!
It's because there is no data for those days, so they do not show up as rows. You can use a recursive CTE to build a contiguous list of dates between two values that the query can join against:
It will look something like:
WITH dates (val) AS (
SELECT CAST('2012-04-01' AS DATE)
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT Val + 1 DAYS
FROM dates
WHERE Val < CAST('2012-04-06' AS DATE)
)
SELECT d.val AS "Date", o.ohscdt, COALESCE(COUNT(o.ohtxn#), 0) AS "Count"
FROM dates AS d
LEFT JOIN KIVALIB.ORHDRPF AS o
ON o.ohordt = TO_CHAR(d.val, 'YYYYMMDD')
AND o.ohstat = 'F'
AND o.ohtyp = 1
GROUP BY d.val, o.ohscdt
ORDER BY d.val
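The same idea can be sketched end to end with Python's sqlite3 (invented sample rows; note that the status filter sits in the ON clause, not in a WHERE clause, so that zero-count days survive the LEFT JOIN):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (ohordt INT, ohstat TEXT);
INSERT INTO orders VALUES
  (20120330, 'F'), (20120330, 'F'), (20120402, 'F'), (20120402, 'X');
""")

rows = con.execute("""
WITH RECURSIVE dates(val) AS (
  SELECT date('2012-03-30')
  UNION ALL
  SELECT date(val, '+1 day') FROM dates WHERE val < '2012-04-02'
)
SELECT d.val,
       COUNT(o.ohordt) AS cnt        -- COUNT over a column is 0 when nothing joined
FROM dates d
LEFT JOIN orders o
  ON o.ohordt = CAST(strftime('%Y%m%d', d.val) AS INT)
 AND o.ohstat = 'F'                  -- filter in ON, so unmatched dates keep their row
GROUP BY d.val
ORDER BY d.val
""").fetchall()

print(rows)
```

The two weekend-style gap days come back with a count of 0, which is exactly what the plain join against the order file could not produce.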