How to get only those rows where the next row (by datestamp) has a different property in some field? - sql

I have a table of data like this:
+--------------+-------------------------+----------+
| o_objguid | o_acttime | o_action |
+--------------+-------------------------+----------+
| 478n8937g990 | 2013-10-02 10:45:33.423 | 1012 |
| 478n8937g990 | 2013-10-02 11:21:57.207 | 1012 |
| 478n8937g990 | 2013-10-02 11:21:57.887 | 1012 |
| 478n8937g990 | 2013-11-15 13:42:11.983 | 1013 |
+--------------+-------------------------+----------+
I want a query to return only those rows where, for a given o_objguid, the next row in time sequence does not have an o_action of 1012.
I'm using the following query:
select d1.* from dbo.dms_audt d1
inner join
(select d2.o_objguid,d2.o_acttime,d2.o_action,
min(datediff(second,d1.o_acttime,d2.o_acttime)) as intervalToNext
from dbo.dms_audt d1
inner join
dbo.dms_audt d2
on
d1.o_objguid=d2.o_objguid
where
d2.o_acttime>d1.o_acttime
group by
d2.o_objguid,d2.o_acttime,d2.o_action) d2
on
d1.o_objguid=d2.o_objguid
where
datediff(second,d1.o_acttime,d2.o_acttime)=intervalToNext
and
d1.o_action=1012
and
d2.o_action<>1012
This query does not return the row with an o_acttime of 2013-10-02 10:45:33.423, because the next row has the same o_action. But because I'm using an argument of second in the datediff() function, the rows with these o_acttime:
2013-10-02 11:21:57.207
2013-10-02 11:21:57.887
are both treated as the same date value for calculation purposes, so both rows are returned, when really the only one that should be returned is the 2013-10-02 11:21:57.887 row.
I tried changing the datediff() argument to millisecond, but this resulted in an overflow error, probably because a date difference of several days or more will have too many milliseconds to fit in the return type.
I guess I can join the query's results against another query that will use max(o_acttime), so that only the highest o_acttime in a group of results that have the same intervalToNext will be returned. But I'm concerned about the size and performance of this query; there are a lot of nested Cartesian products here and it's already pretty slow against a set of 1000000+ rows. Is there a better way to get the result I want?

This should work as well without computing a CTE twice. Depending on the data and indexing strategy it may or may not be faster than Joachim's response.
SELECT
*
FROM dbo.dms_audt d1
CROSS APPLY ( -- get the next action for the same object
SELECT TOP 1
*
FROM dbo.dms_audt X
WHERE d1.o_objguid = X.o_objguid
AND d1.o_acttime < X.o_acttime
ORDER BY X.o_acttime
) d2
WHERE d1.o_action = 1012
AND d2.o_action != 1012
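Since the answer above notes that performance depends on the indexing strategy, here is a sketch of an index that could let the CROSS APPLY seek straight to each row's successor. The index name is hypothetical, and whether it helps at all depends on the indexes the table already has, so treat it as something to test rather than a recommendation.

```sql
-- Hypothetical index name. The key order mirrors the APPLY's predicate:
-- equality on o_objguid, then range and sort on o_acttime.
-- o_action is included so that an inner query selecting only o_action
-- is covered by the index; with SELECT * a key lookup is still needed.
CREATE NONCLUSTERED INDEX IX_dms_audt_objguid_acttime
    ON dbo.dms_audt (o_objguid, o_acttime)
    INCLUDE (o_action);
```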

Sadly, SQL Server 2008 does not have LEAD() (added in SQL Server 2012), which would have made the query trivial, but you can simulate it using ROW_NUMBER():
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY o_acttime) rn
FROM table1
WHERE o_objguid = '478n8937g990'
)
SELECT a.*
FROM cte a
JOIN cte b
ON a.rn = b.rn - 1 AND b.o_action <> 1012;
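For readers on SQL Server 2012 or later, the LEAD() version mentioned above might look like the sketch below. It assumes the dbo.dms_audt table from the question and partitions by o_objguid so that every object is handled in one pass; next_action is an illustrative alias, not a column in the original table.

```sql
-- Assumes SQL Server 2012+ (LEAD is not available in 2008).
SELECT o_objguid, o_acttime, o_action
FROM (
    SELECT o_objguid, o_acttime, o_action,
           -- o_action of the next row in time order for the same object
           LEAD(o_action) OVER (PARTITION BY o_objguid
                                ORDER BY o_acttime) AS next_action
    FROM dbo.dms_audt
) AS x
WHERE o_action = 1012
  AND next_action <> 1012;  -- rows with no successor have NULL here and drop out
```

Because the comparison uses the row order rather than DATEDIFF(), the sub-second timestamps in the sample data do not cause duplicate matches.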

Related

Find difference between two consecutive rows from a result in SQL server 2008

I want to fetch the difference in the "Data" column between two consecutive rows. For example, I need Row 2 - Row 1 (1902.4 - 1899.66), Row 3 - Row 2, and so on. The difference should be stored in a new column.
+----------+---------+-------+-----------------------+
| Name     | Data    | meter | Time                  |
+----------+---------+-------+-----------------------+
| Boiler-1 | 1899.66 | 1     | 5/16/2019 12:00:00 AM |
| Boiler-1 | 1902.4  | 1     | 5/16/2019 12:15:00 AM |
| Boiler-1 | 1908.1  | 1     | 5/16/2019 12:15:00 AM |
| Boiler-1 | 1911.7  | 6     | 5/16/2019 12:15:00 AM |
| Boiler-1 | 1926.4  | 6     | 5/16/2019 12:15:00 AM |
+----------+---------+-------+-----------------------+
The table shown above is not a stored table; it is the result of a SELECT query that joins two different tables, along the lines of "select name, data, unitId, Timestamp from table t1 join table t2.....". Is there any way to calculate the difference in the "data" column between consecutive rows without storing this result in a table?
I use SQL Server 2008, so LEAD/LAG cannot be used.
The equivalent in SQL Server 2008 uses apply -- and it can be expensive:
with t as (
<your query here>
)
select t.*,
(t.data - tprev.data) as diff
from t outer apply
(select top (1) tprev.*
from t tprev
where tprev.name = t.name and
tprev.boiler = t.boiler and
tprev.time < t.time
order by tprev.time desc
) tprev;
This assumes that you want the previous row when the name and boiler are the same. You can adjust the correlation clause if you have different groupings in mind.
I'm not claiming this is the best approach; it is just another option for SQL Server versions before 2012. From SQL Server 2012 onward, the same thing is easy to do with LEAD and LAG. For small and medium data sets, you can consider the script below as well.
Note: this is just an idea for you.
WITH CTE(Name,Data)
AS
(
SELECT 'Boiler-1' ,1899.66 UNION ALL
SELECT 'Boiler-1',1902.4 UNION ALL
SELECT 'Boiler-1',1908.1 UNION ALL
SELECT 'Boiler-1',1911.7 UNION ALL
SELECT 'Boiler-1',1926.4
--Replace above select statement with your query
)
SELECT A.Name,A.Data,A.Data-B.Data AS [Diff]
FROM
(
--ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) numbers the rows in whatever
--order the engine returns them, which is not guaranteed; order by a real
--column (e.g. the timestamp) in your own query.
SELECT *,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) RN FROM CTE
)A
LEFT JOIN
(
SELECT *,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) RN FROM CTE
) B
--Join each row (A) to the previous row (B), so Diff = current - previous
--and the first row, which has no previous row, gets NULL
ON A.RN = B.RN+1
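For comparison, the LAG() approach mentioned above for SQL Server 2012 and later could be sketched as follows. It will not run on the asker's SQL Server 2008. The CTE stands in for the asker's join query, the timestamps are taken from the sample table, and using Data as a tie-breaker for rows with identical times is an assumption made only so that the ordering is deterministic.

```sql
-- Assumes SQL Server 2012+; the CTE is a stand-in for the real join query.
WITH t (Name, Data, meter, [Time]) AS (
    SELECT 'Boiler-1', 1899.66, 1, CAST('2019-05-16T00:00:00' AS datetime) UNION ALL
    SELECT 'Boiler-1', 1902.4,  1, CAST('2019-05-16T00:15:00' AS datetime) UNION ALL
    SELECT 'Boiler-1', 1908.1,  1, CAST('2019-05-16T00:15:00' AS datetime) UNION ALL
    SELECT 'Boiler-1', 1911.7,  6, CAST('2019-05-16T00:15:00' AS datetime) UNION ALL
    SELECT 'Boiler-1', 1926.4,  6, CAST('2019-05-16T00:15:00' AS datetime)
)
SELECT Name, Data, meter, [Time],
       -- current value minus the previous row's value; NULL for the first row
       Data - LAG(Data) OVER (PARTITION BY Name
                              ORDER BY [Time], Data) AS Diff
FROM t
ORDER BY [Time], Data;
```

The first row has no predecessor, so its Diff is NULL; the second row gets 1902.4 - 1899.66 = 2.74.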

New column referencing second table - do I need a join?

I have two tables (first two shown) and need to make a third from the first two - do I need to do a join or can you reference a table without joining?
The third table shown is the desired output. Thanks for any help!
+-----+-----------+
| ACC | CALL DATE |
+-----+-----------+
| 1 1 | 2/1/18    |
+-----+-----------+
+-----+---------------+
| ACC | PURCHASE DATE |
+-----+---------------+
| 1 1 | 1/1/18        |
+-----+---------------+
+-----+-----------+----------------------+
| ACC | CALL DATE | PRIOR MONTH PURCHASE |
+-----+-----------+----------------------+
| 1 1 | 2/1/18    | YES                  |
+-----+-----------+----------------------+
Of course you can have a query that references multiple tables without joining. union all is an example of an operator that does that.
There is also the question of what you mean by "joining" in the question. If you mean explicit joins, there are ways around that -- such as correlated subqueries. However, these are implementing some form of "join" in the database engine.
As for your query, you would want to use exists with a correlated subquery:
select t1.*,
(case when exists (select 1
from table2 t2
where t2.acc = t1.acc and
datediff(month, t2.purchase_date, t1.call_date) = 1
)
then 'Yes' else 'No'
end) as prior_month_purchase
from table1 t1;
This is "better" than a join because it does not multiply or remove rows. The result set has exactly the rows in the first table, with the additional column.
The syntax assumes SQL Server (which was an original tag). Similar logic can be expressed in other databases, although date functions are notoriously database-dependent.
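One detail worth checking before relying on datediff(month, ...) = 1: in SQL Server, DATEDIFF counts the month boundaries crossed between the two dates, not elapsed 30-day periods. The literals below are made-up dates chosen to show the difference; the column aliases are illustrative.

```sql
SELECT DATEDIFF(month, '20180131', '20180201') AS one_day_apart,      -- 1
       DATEDIFF(month, '20180101', '20180131') AS thirty_days_apart;  -- 0
```

So a purchase on the last day of January counts as a "prior month purchase" for a call on the first day of February, which is probably what the question wants, but it is a calendar-month test rather than a "within the last 30 days" test.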
Let's check the options.
If you create a new third table from the data in the first two, then every update, insert, or delete on either of those tables must also be propagated to the third table.
If you instead create a view that does what you need, there is no third table to maintain, and each query against the view gets current data from the first two tables.
create view third_table as
select a.acc,a.call_date,case when dateadd(mm,-1,a.call_date)=b.purchase_date then 'Yes' else 'No' end as prior_month_purchase
from first_table a
left join second_table b
on a.acc=b.acc

How to fill in empty date rows multiple times?

I am trying to fill in dates with empty data, so that my query returned has every date and does not skip any.
My application needs to count bookings for activities by date in a report, and I cannot have skipped dates in what is returned by my SQL.
I am trying to use a date table (I have a table with every date from 1/1/2000 to 12/31/2030) to accomplish this by doing a RIGHT OUTER JOIN on this date table, which works when dealing with one set of activities. But I have multiple sets of activities, each needing their own full range of dates regardless if there were bookings on that date.
I also have a function (DateRange) I found that allows for this:
SELECT IndividualDate FROM DateRange('d', '11/01/2017', '11/10/2018')
Let me give an example of what I am getting and what I want to get:
BAD: Without empty date rows:
date | activity_id | bookings
-----------------------------
1/2 | 1 | 5
1/4 | 1 | 4
1/3 | 2 | 6
1/4 | 2 | 2
GOOD: With empty date rows:
date | activity_id | bookings
-----------------------------
1/2 | 1 | 5
1/3 | 1 | NULL
1/4 | 1 | 4
1/2 | 2 | NULL
1/3 | 2 | 6
1/4 | 2 | 2
I hope this makes sense. I get the whole point of joining to a table that is just a list of dates OR using the DateRange table function. But neither gets me the "GOOD" result above.
Use a cross join to generate the rows and then left join to fill in the values:
select d.IndividualDate as date, a.activity_id, t.bookings
from DateRange('d', '2017-11-01', '2018-11-10') d cross join
(select distinct activity_id from t) a left join
t
on t.date = d.IndividualDate and t.activity_id = a.activity_id;
It is a bit hard to follow what your data is and what comes from the function. But the idea is the same, wherever the data comes from.
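To make the cross join idea concrete, here is a self-contained sketch that reproduces the "GOOD" result from the question. The two CTEs stand in for the date source and the bookings data; their names, the column name dt, and the year 2018 are assumptions made for illustration, since the question only shows month/day values.

```sql
WITH dates (dt) AS (            -- stand-in for the date table or DateRange()
    SELECT CAST('20180102' AS date) UNION ALL
    SELECT CAST('20180103' AS date) UNION ALL
    SELECT CAST('20180104' AS date)
),
t (dt, activity_id, bookings) AS (   -- stand-in for the bookings data
    SELECT CAST('20180102' AS date), 1, 5 UNION ALL
    SELECT CAST('20180104' AS date), 1, 4 UNION ALL
    SELECT CAST('20180103' AS date), 2, 6 UNION ALL
    SELECT CAST('20180104' AS date), 2, 2
)
SELECT d.dt, a.activity_id, t.bookings
FROM dates d
CROSS JOIN (SELECT DISTINCT activity_id FROM t) a   -- every date x every activity
LEFT JOIN t
       ON t.dt = d.dt AND t.activity_id = a.activity_id
ORDER BY a.activity_id, d.dt;
```

This returns six rows, one per date and activity, with NULL bookings for activity 1 on 1/3 and activity 2 on 1/2.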
I figured it out:
SELECT TOP 100 PERCENT masterlist.dt, masterlist.activity_id, count(r_activity_sales_bymonth.bookings) AS totalbookings
FROM (SELECT c.activity_id, dateadd(d, b.incr, '2016-12-31') AS dt
FROM (SELECT TOP 365 incr = row_number() OVER (ORDER BY object_id, column_id), *
FROM (SELECT a.object_id, a.column_id
FROM sys.all_columns a CROSS JOIN
sys.all_columns b) AS a) AS b CROSS JOIN
(SELECT DISTINCT activity_id
FROM r_activity_sales_bymonth) AS c) AS masterlist LEFT OUTER JOIN
r_activity_sales_bymonth ON masterlist.dt = r_activity_sales_bymonth.purchase_date AND masterlist.activity_id = r_activity_sales_bymonth.activity_id
GROUP BY masterlist.dt, masterlist.activity_id
ORDER BY masterlist.dt, masterlist.activity_id

Using a value from a previous row to calculate a value in the next row

I am trying to create a report that pulls the date from a previous row, does some calculation and then displays the answer on the row below that row. The column in question is "Time Spent".
E.g. I have 3 rows.
+=====+===============+============+====+
|name | DateCompleted | Time Spent | idx|
+=====+===============+============+====+
| A   | 1/1/17        | NULL       | 0  |
+-----+---------------+------------+----+
| B   | 11/1/17       | 10 days    | 1  |
+-----+---------------+------------+----+
| C   | 20/1/17       | 9 days     | 2  |
+=====+===============+============+====+
Time Spent of C = DateCompleted of C - DateCompleted of B
Apart from using a crazy loop and going row by row instead of working on the set, I can't see how I would complete this. Has anyone ever used this logic before in SQL? If so, how did you go about it?
Thanks in advance!
Most databases support the ANSI standard LAG() function. Date functions differ depending on the database, but something like this:
select t.*,
(DateCompleted - lag(DateCompleted) over (order by DateCompleted)) as TimeSpent
from t;
In SQL Server, you would use datediff():
select t.*,
datediff(day,
lag(DateCompleted) over (order by DateCompleted),
DateCompleted
) as TimeSpent
from t;
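As a quick check of the SQL Server version against the sample rows in the question, the sketch below inlines the three rows as a CTE. It assumes SQL Server 2012 or later for LAG() and reads the sample dates as day/month/year.

```sql
WITH t (name, DateCompleted, idx) AS (
    SELECT 'A', CAST('20170101' AS date), 0 UNION ALL
    SELECT 'B', CAST('20170111' AS date), 1 UNION ALL
    SELECT 'C', CAST('20170120' AS date), 2
)
SELECT t.*,
       DATEDIFF(day,
                LAG(DateCompleted) OVER (ORDER BY DateCompleted),  -- previous row's date
                DateCompleted) AS TimeSpent
FROM t
ORDER BY DateCompleted;
-- TimeSpent: NULL for A, 10 for B, 9 for C
```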
You can do this using ROW_NUMBER. The syntax is:
ROW_NUMBER ( ) OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )
For reference, see the ROW_NUMBER documentation.
You already have an index column (similar to the row number above), so you can join the table to itself.
Select table1.*
,TimeSpent=DateDiff(day,prev.DateCompleted,table1.DateCompleted)
from table1
left join table1 prev on prev.idx=table1.idx-1

SQL Server sum with a where or having condition

I'm hoping this makes sense as what I'm trying to do is SUM rows based on other columns of existing rows. I have tried a couple different ways and what I hope is now close is what I have here. This is not my full SQL but hopefully this small example will get me on track
SELECT Price,SUM(Item) from table where Price >= Price group by Price
Sample Data
| PRICE | ITEM |
|-------|-------|
| 1.00 | 5 |
| 2.00 | 9 |
| 3.00 | 2 |
Hopeful Result
| PRICE | ITEM |
|-------|-------|
| 1.00 | 5 |
| 2.00 | 14 |
| 3.00 | 16 |
The actual result is more or less the sample data, which I would expect: I am grouping by Price, so it makes sense that the rows come back like this. I just can't think of a way to include Price in my select without grouping by it or aggregating it. I could maybe do this calculation with an inner select, but I'm hoping there is a different way, as my actual query has a lot of joins and could get messy if I go this route.
Thanks for any help.
If you're using SQL Server 2012 or later:
Select price, item, sum(item) OVER(order by price rows unbounded preceding) as runningtotal
from sample
http://sqlfiddle.com/#!6/36e9f/1/0
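A self-contained version of the windowed query above, with the sample data inlined as a CTE named sample (the name the answer uses), can be sketched like this. It assumes SQL Server 2012 or later, where window frames on SUM() are supported.

```sql
WITH sample (price, item) AS (
    SELECT 1.00, 5 UNION ALL
    SELECT 2.00, 9 UNION ALL
    SELECT 3.00, 2
)
SELECT price,
       item,
       -- running total of item over all rows up to and including this one
       SUM(item) OVER (ORDER BY price
                       ROWS UNBOUNDED PRECEDING) AS runningtotal
FROM sample
ORDER BY price;
-- runningtotal: 5, 14, 16
```

If several rows share the same price, ROWS gives each of them its own partial total, whereas the default RANGE frame gives tied rows the same total; aggregating to one row per price first avoids the question entirely.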
You can accomplish this with a sub-query, but a more efficient way might be to use a CROSS/OUTER APPLY. It depends on your specific data. I provide both methods of doing that below... See which one runs faster based on your specific data.
Sub-query method
SELECT DISTINCT op.Price, (SELECT SUM(ip.Item) FROM table ip WHERE ip.Price <= op.Price) as ITEM FROM table op ORDER BY op.Price ASC
Outer-apply method
SELECT DISTINCT op.Price, a.Items
FROM table op
OUTER APPLY (SELECT SUM(ip.Item) as Items FROM TABLE ip WHERE ip.Price <= op.Price) a
ORDER BY op.Price ASC
You are probably trying to do something like the query below, which uses a self join on the same table.
SELECT t1.Price, SUM(t2.Item)
FROM table1 t1,
table1 t2
WHERE t2.Price <= t1.Price
GROUP BY t1.Price
ORDER BY t1.price;