Oracle SQL: Using LAG when the current row is missing

I have a table from which I'm trying to extract information using a LAG function.
Type  Date  Value
A     01    1
A     02    2
B     01    3
I'm trying to get lines by Type with the Value from this month and the month before that, so ideally:
Type  Date  Value M  Value M-1
A     02    2        1
B     02    0        3
SELECT
    Type,
    Date,
    Value AS "Value M",
    LAG(Value, 1, 0) OVER (PARTITION BY Type ORDER BY Date) AS "Value M-1"
FROM Table
Except that, of course, because there is no line for Type B and Month 02, I don't get a line for Type B.
Do you have any suggestions?

A simple LAG probably won't do the trick, because you need to construct a row for the latest month when a given type has none. If your date is stored as an integer, as in the sample data, a pattern like this is something to consider. If it's stored as a date, you'll need some kind of ranking baked into the join, or extract(month from date) - 1 (being careful about January), but this should give the gist.
WITH TYPE_LATEST_MONTH AS
( SELECT DISTINCT
         TYPE,
         (SELECT MAX(DATE) FROM TABLE) AS LATEST_MONTH
    FROM TABLE
)
SELECT TLM.TYPE,
       TLM.LATEST_MONTH AS DATE,
       -- values come from the joined rows, not from the CTE
       COALESCE(TBL.VALUE, 0) AS VALUE_M,
       COALESCE(TBL_PREV.VALUE, 0) AS VALUE_M_MINUS_1
  FROM TYPE_LATEST_MONTH TLM
  LEFT JOIN TABLE TBL
    ON TLM.TYPE = TBL.TYPE
   AND TLM.LATEST_MONTH = TBL.DATE
  LEFT JOIN TABLE TBL_PREV
    ON TLM.TYPE = TBL_PREV.TYPE
   -- the previous month is the latest month minus one
   AND TLM.LATEST_MONTH - 1 = TBL_PREV.DATE
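For the case where the month is stored as a real DATE, here is a minimal sketch of the same pattern in Oracle. The table and column names (monthly_values, type_code, month_start, value) are hypothetical, and it assumes one row per type and month with month_start always holding the first day of the month. ADD_MONTHS takes care of the January case by rolling back into December of the previous year.

```sql
-- Hypothetical table: monthly_values(type_code, month_start DATE, value)
-- Assumption: month_start is always the first day of its month.
WITH latest AS (
    SELECT MAX(month_start) AS latest_month
    FROM monthly_values
),
types AS (
    SELECT DISTINCT type_code
    FROM monthly_values
)
SELECT t.type_code,
       l.latest_month,
       COALESCE(cur.value, 0)  AS value_m,
       COALESCE(prev.value, 0) AS value_m_minus_1
FROM types t
CROSS JOIN latest l
-- row for the latest month, if this type has one
LEFT JOIN monthly_values cur
       ON cur.type_code = t.type_code
      AND cur.month_start = l.latest_month
-- row for the month before; ADD_MONTHS handles the year boundary
LEFT JOIN monthly_values prev
       ON prev.type_code = t.type_code
      AND prev.month_start = ADD_MONTHS(l.latest_month, -1)
ORDER BY t.type_code;
```

Every type gets exactly one output row for the latest month, whether or not it has data for that month.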

Related

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have the customer's email, first and last name, and the date of the transaction.
The second table, which comes from a second system, contains all users, their sensitive data, and when they were registered in our system.
Simplified Structure:
Email | InsertDate (varchar)
My task is to compute the difference in minutes between the rows inserted from sales (first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the row in the second table that is nearest in time, because sometimes the difference is a few minutes (in either direction) and sometimes it is a few days.
So for email x I have this row in the first table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the second table, and the latest is the one I want to compute the difference against:
Email InsertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote the following query, but I have no idea how to match the nearest row in the second table:
SELECT DISTINCT TOP (100)
 a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120)) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and a database to try this against. The suspect areas are the casting of the dates and the date math, since I don't know which RDBMS and version this is. Consider the following "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the dates; the rows with a row number of 1 win.
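WITH CTE AS (
SELECT A.*, B.*,
       -- one ranking per transaction; the smallest difference gets RN = 1
       row_number() over (PARTITION BY A.E_MAIL, A.TRAN_DATETIME
           -- datetime2 is used because TRAN_DATETIME carries 7 fractional digits
           ORDER BY abs(datediff(second, cast(A.TRAN_DATETIME as datetime2), cast(B.INSERTDATE as datetime2))) asc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1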

SQL SELECT repeating rows from table for specific time interval

I have a table and I want to find repeating rows within a specific time interval (the DATE range is an input parameter of the SQL query), listing all rows that share the same PERSON and TYPE values.
ID DATE PERSON TYPE
1 01.01.2017 PERSON1 TYPE1
2 02.02.2017 PERSON1 TYPE1
3 03.03.2017 PERSON2 TYPE1
4 04.04.2017 PERSON2 TYPE2
5 05.05.2017 PERSON2 TYPE1
6 06.06.2017 PERSON1 TYPE2
So, for example, if DATE is between 01.01 and 04.04, it should list the rows with ID 1 and 2.
If DATE is between 01.01 and 06.06, it should list the rows with ID 1, 2, 3 and 5, because 1 and 2 share the same person and type in that interval, and so do 3 and 5.
SELECT ID FROM TABLE
WHERE DATE>='01.01.2017' AND DATE<='06.06.2017'
But I am not even sure how to start defining this repetition condition based on the PERSON and TYPE columns.
Maybe an INNER JOIN can help here, joining the table to itself under two different aliases and matching on those two columns while requiring a different ID: a.PERSON = b.PERSON and a.TYPE = b.TYPE and a.ID != b.ID?
Please try...
SELECT tableName.ID AS ID
FROM tableName
JOIN
(
    SELECT person AS person,
           type AS type,
           COUNT( person ) AS countOfPair
    FROM tableName
    WHERE date BETWEEN startDate AND endDate
    GROUP BY person,
             type
) tempTable ON tableName.person = tempTable.person AND
               tableName.type = tempTable.type
WHERE countOfPair >= 2
  AND tableName.date BETWEEN startDate AND endDate
The inner SELECT gathers each combination of person and type between your start and end dates (please replace startDate and endDate with however you are referencing those) and counts the rows for each.
The outer SELECT statement's JOIN then has the effect of appending the count of each combination to the end of each row containing that combination. The outer SELECT then retrieves the ID from each row that has a repeated combination, and its own date filter keeps rows outside the interval from being listed.
If you have any questions or comments, then please feel free to post a Comment accordingly.
You can try this (I don't know whether your version supports window/analytic functions):
(X is the name of your table)
SELECT Y.ID, Y.DATE, Y.PERSON, Y.TYPE
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY PERSON, TYPE) AS RC
FROM X
WHERE DATE >='01.01.2017' AND DATE <='04.04.2017'
) Y
WHERE RC>1
Or this if it doesn't support them:
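SELECT X.ID, X.DATE, X.PERSON, X.TYPE
FROM X
INNER JOIN (
    SELECT PERSON, TYPE, COUNT(*) AS RC
    FROM X
    WHERE DATE >='01.01.2017' AND DATE <='04.04.2017'
    GROUP BY PERSON, TYPE
) Y ON X.PERSON = Y.PERSON AND X.TYPE = Y.TYPE
WHERE RC>1
  -- repeat the date filter so rows outside the interval are not listed
  AND X.DATE >='01.01.2017' AND X.DATE <='04.04.2017'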
I suggest always using an appropriate conversion for date data types.
Another method would be:
SELECT a.id
FROM tablename a NATURAL JOIN
    (SELECT person,type FROM tablename
     WHERE date>='01.01.2017' AND date<='06.06.2017'
     GROUP BY person, type HAVING COUNT(*)>1) b
WHERE a.date>='01.01.2017' AND a.date<='06.06.2017';
The NATURAL JOIN would automatically use columns person and type.
Add "DISTINCT" clause to avoid redundancy
SELECT DISTINCT ID FROM TABLE
WHERE DATE>='01.01.2017' AND DATE<='06.06.2017'

How to divide two values from the different row

I have used this formula.
Quote change = (current month data / previous month data) * 100
My data is stored in a SQL Server table that looks like this:
id DATE DATA
1 2015/01/01 10
2 2015/02/01 20
3 2015/03/01 30
4 2015/04/01 40
5 2015/05/01 50
6 2015/06/01 60
7 2015/07/01 70
8 2015/08/01 80
9 2015/09/01 90
How can I implement this formula in a SQL function?
For example, if the current month is 2015/02/01:
Quote change = (Current Month Data / Previous Month Data) * 100
Quote change = (20 / 10) * 100
If the current date is 2015/01/01, there is no data before it, so I need to show 0 or #.
SQL Server 2012 introduced a window function called LAG that is very useful in situations like this.
LAG returns the value of a specific column from the previous row (as determined by the ORDER BY part of the OVER clause).
Try this:
;With cte as
(
    SELECT Id, Date, Data, LAG(Data) OVER(ORDER BY Date) As LastMonthData
    FROM YourTable
)
SELECT Id,
       Date,
       Data,
       -- 100.0 forces decimal division in case Data is an integer column
       CASE WHEN ISNULL(LastMonthData, 0) = 0 THEN 0 ELSE Data * 100.0 / LastMonthData END As Quote
FROM cte
I've used a CTE just so I wouldn't have to repeat the LAG expression.
The CASE expression prevents a divide-by-zero error when LastMonthData is 0, and returns 0 when it is null (the first month).
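You can also use a self-join, as shown below. A LEFT JOIN (rather than an INNER JOIN) keeps the first month, which has no previous row:
select a.*,
       -- 100.0 avoids integer division; nullif guards against division by zero
       isnull(cast(a.data * 100.0 / nullif(b.data, 0) as decimal(10,2)), 0) as quote
from TableA as a
left join TableA as b
on b.date = dateadd(mm,-1,a.date)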
Let me know if this helps

retrieve several values previous to several given dates

I have a values table such as:
id | user_id | value | date
---------------------------------
1 | 12 | 38 | 2014-04-05
2 | 15 | 19 | 2014-04-05
3 | 12 | 47 | 2014-04-08
I want to retrieve all values for given dates. However, if I don't have a value for one specific date, I want to get the previous available value. For instance, with the above dataset, if I query values for user 12 for dates 2014-04-07 and 2014-04-08, I need to retrieve 38 and 47.
I succeeded using two queries (one per date) like:
SELECT *
FROM values
WHERE date <= $date
ORDER BY date DESC
LIMIT 1
However, this requires dates.length requests each time. So I'm wondering whether there is a more performant way to retrieve all my values in a single request.
In general, you would use a VALUES clause to specify multiple values in a single query.
If you have only occasional dates missing (and thus no big gaps in dates between rows for any particular user_id) then this would be an elegant solution:
SELECT dt, coalesce(value, lag(value) OVER (ORDER BY dt)) AS value
FROM (VALUES ('2014-04-07'::date), ('2014-04-08')) AS dates(dt)
LEFT JOIN "values" ON "date" = dt AND user_id = 12;
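The lag() window function picks the value from the previous row when the current row does not have one. Note that the window runs over the rows produced from the VALUES list, so a gap is only filled when the preceding requested date has a value of its own; with the sample data, 2014-04-07 comes back as NULL because no earlier date is in the list.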
If, on the other hand, there may be big gaps, you need to do some more work:
SELECT DISTINCT dt, first_value(value) OVER (PARTITION BY dt ORDER BY diff) AS value
FROM (
    SELECT dt, value, dt - "date" AS diff
    FROM (VALUES ('2014-04-07'::date), ('2014-04-08')) AS dates(dt)
    CROSS JOIN "values"
    WHERE user_id = 12
      AND "date" <= dt) sub;
In this case the sub-query makes a CROSS JOIN between the dates in the VALUES clause and the table rows for user_id = 12, keeps only the rows dated on or before each requested date, and computes the difference in days between the two. So every row has a value for the field value. In the main query, the first_value() window function picks, for each requested date (PARTITION BY dt), the value with the smallest difference. Note that simply ordering on diff and picking the first row would not work here because you want values for multiple dates returned.
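As an alternative sketch, the per-date "latest row on or before this date" lookup from the question can be run once per requested date inside a single statement with a LATERAL join. This assumes PostgreSQL 9.3 or later; the table and column names are the ones from the question, and the aliases d and v are only illustrative.

```sql
-- For each requested date, fetch the most recent value on or before it.
SELECT d.dt, v.value
FROM (VALUES (DATE '2014-04-07'), (DATE '2014-04-08')) AS d(dt)
LEFT JOIN LATERAL (
    SELECT value
    FROM "values"
    WHERE user_id = 12
      AND "date" <= d.dt
    ORDER BY "date" DESC
    LIMIT 1
) AS v ON true   -- LEFT JOIN keeps dates that have no earlier value (NULL)
ORDER BY d.dt;
```

With the sample data this returns 38 for 2014-04-07 and 47 for 2014-04-08. An index on (user_id, "date") would likely let each lookup be served without scanning the user's whole history.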

Joining next Sequential Row

I am planning an SQL statement right now and would like someone to look over my thoughts.
This is my table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
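The average of the signed gaps (next minus current) can be computed from the first and last values alone, because the intermediate terms cancel out: subtract the first value from the last and divide by one less than the number of rows. Note that this is not the average of the absolute gaps described in the question, which needs the individual differences.
select sum(case when seqnum = num then stat else - stat end) * 1.0 / (max(num) - 1) as avg_gap
from (select stat, row_number() over (order by period) as seqnum,
             count(*) over () as num
      from tbstats
     ) t
where seqnum = num or seqnum = 1;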
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
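If the goal is the mean of the absolute gaps, as described in the question, a minimal sketch using LEAD() could look like the following. It assumes SQL Server 2012 or later and the tbstats table from the question; the alias names gap, g and mean_abs_gap are only illustrative.

```sql
-- Mean of the absolute differences between each stat and the next one.
SELECT AVG(CAST(ABS(gap) AS DECIMAL(10, 2))) AS mean_abs_gap
FROM (
    SELECT stat - LEAD(stat) OVER (ORDER BY period) AS gap
    FROM tbstats
) AS g
WHERE gap IS NOT NULL;  -- the last row has no next row, so its gap is NULL
```

For the sample data the absolute gaps are 15, 20, 10, 15, 21, 13 and 7, so the result is 101 / 7, roughly 14.43. The cast to DECIMAL avoids the integer average that AVG would otherwise return for an INT column.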
You can also achieve this with a join:
SELECT t1.period,
       t1.stat,
       t1.stat - t2.stat gap
FROM tbstats t1
LEFT JOIN tbstats t2
       ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x