Grouping by ID and time interval in MS SQL

I am trying to write a query that groups like ids within a timespan.
Real world scenario:
I want to see rows created by the same ID within 5 seconds of each other.
SELECT top 10
Id,
CreatedOn
FROM Logs
where ((DATEPART(SECOND, CreatedOn) + 5) - DATEPART(SECOND, CreatedOn)) < 10
GROUP BY
DATEPART(SECOND, CreatedOn),
Id,
CreatedOn
order by CreatedOn desc
This isn't quite right, but I feel like I am on the right track.
Thanks in advance.

You may try a query on the condition that another record exists with the same ID and a CreatedOn value within 5 seconds of the current record:
SELECT
t1.Id,
t1.CreatedOn
FROM logs t1
WHERE EXISTS (SELECT 1 FROM logs t2
WHERE t1.Id = t2.Id AND
t1.CreatedOn <> t2.CreatedOn AND
ABS(DATEDIFF(SECOND, t1.CreatedOn, t2.CreatedOn)) <= 5)
ORDER BY
t1.CreatedOn DESC;
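The EXISTS condition above can be tried out on a small in-memory table. The following is a minimal sketch in Python using the standard sqlite3 module as a stand-in for SQL Server. The sample rows are made up, and SQLite's strftime('%s', ...) takes the place of DATEDIFF(SECOND, ...), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (Id INTEGER, CreatedOn TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        (1, "2020-01-01 10:00:00"),
        (1, "2020-01-01 10:00:03"),  # 3 s after the previous row, same Id
        (1, "2020-01-01 10:05:00"),  # same Id, but minutes away
        (2, "2020-01-01 10:00:01"),  # close in time, different Id
    ],
)

# Same shape as the EXISTS query; strftime('%s', ...) yields seconds since
# the Unix epoch and is an SQLite-specific substitute for DATEDIFF.
query = """
SELECT t1.Id, t1.CreatedOn
FROM logs t1
WHERE EXISTS (
    SELECT 1
    FROM logs t2
    WHERE t2.Id = t1.Id
      AND t2.CreatedOn <> t1.CreatedOn
      AND ABS(strftime('%s', t1.CreatedOn) - strftime('%s', t2.CreatedOn)) <= 5
)
ORDER BY t1.CreatedOn DESC
"""
close_rows = conn.execute(query).fetchall()
print(close_rows)
```

Two rows with the same Id and an identical CreatedOn are excluded by the <> test, so exact duplicates would need a different tie-breaker, such as a unique key column.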

The same query could also be laid out this way:
SELECT t1.Id
,t1.CreatedOn
FROM logs t1
WHERE EXISTS (
SELECT 1
FROM logs t2
WHERE t2.Id = t1.Id
AND t2.CreatedOn <> t1.CreatedOn
AND ABS(DATEDIFF(SECOND, t1.CreatedOn, t2.CreatedOn)) <= 5
)
ORDER BY
t1.CreatedOn DESC;

Related

SQL: Select from another table (t2) without joining but referencing a column from t1

I have a table with columns date and net_sales. For each day, I want to get the sum of the net_sales for the last 30 days.
This is my query:
thirty_days_net_sales AS (
SELECT
t1.date,
t1.net_sales AS net_sales_on_date,
(SELECT SUM(t2.net_sales) FROM total_net_sales_per_day t2 WHERE t2.date >= DATE_SUB(t1.date, INTERVAL 30 DAY) AND t2.date <= t1.date)
FROM
total_net_sales_per_day t1)
When I run this query I get the error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I am using Google BigQuery. Thanks in advance for your help!
Consider the approach below instead:
select *, sum(net_sales) over win last_30_days
from total_net_sales_per_day
window win as (order by unix_date(date) range between 29 preceding and current row )
You would use window functions. If you have data for every day (as the name of the table implies):
SELECT tnspd.*,
sum(net_sales) over (order by date
rows between 30 preceding and current row
) as net_sales_last_30_days
FROM total_net_sales_per_day tnspd;
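The caveat about having data for every day matters because a frame counted in rows and a frame counted in days only agree when no dates are missing. Below is a minimal pure-Python sketch of the two interpretations. The sample data and the helper names sum_last_days and sum_last_rows are made up for illustration.

```python
from datetime import date, timedelta

# Made-up daily totals; note the gap between 2021-01-02 and 2021-01-10.
sales = [
    (date(2021, 1, 1), 100),
    (date(2021, 1, 2), 50),
    (date(2021, 1, 10), 70),
]

def sum_last_days(rows, days):
    """Calendar-based window: the current day plus the `days - 1` days before it."""
    out = []
    for d, _ in rows:
        start = d - timedelta(days=days - 1)
        out.append((d, sum(v for d2, v in rows if start <= d2 <= d)))
    return out

def sum_last_rows(rows, n):
    """Row-based window: the current row plus the `n - 1` rows before it."""
    ordered = sorted(rows)
    return [
        (d, sum(v for _, v in ordered[max(0, i - n + 1): i + 1]))
        for i, (d, _) in enumerate(ordered)
    ]

by_days = sum_last_days(sales, 3)
by_rows = sum_last_rows(sales, 3)
print(by_days)  # the last entry covers only 2021-01-08 .. 2021-01-10
print(by_rows)  # the last entry reaches back across the gap
```

With the gap in the sample data, the row-based sum for the last date includes sales from more than a week earlier, while the day-based sum does not.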
As the error says, you have to add an equality condition. When you use a join, the ON clause needs at least one equality condition.
Because your query has no explicit join, the correlated subquery must contain an equality condition, something like the one I added below:
thirty_days_net_sales AS (
SELECT
t1.date,
t1.net_sales AS net_sales_on_date,
(SELECT SUM(t2.net_sales) FROM total_net_sales_per_day t2 WHERE t2.date >= DATE_SUB(t1.date, INTERVAL 30 DAY) AND t2.date <= t1.date
AND t1.id = t2.id)
FROM
total_net_sales_per_day t1)
This link might help for more information:
https://sql.info/d/solved-bigquery-left-outer-join-cannot-be-used-without-a-condition-that-is-an-equality-of-fields-from-both-sides-of-the-join

Selecting only one duplicate record

I have a 3-column table.
ID (unique)
code
time (timestamp)
I want to show code records in the specified date range, like this:
SELECT * FROM table
WHERE time >= '2018-12-13' AND time <= '2018-12-16 23:59:59.999' ORDER BY ID
It works; I do not have a problem with this query.
But I need only one row for each set of duplicate code records.
How can I do this with the above query?
Use a correlated subquery:
select t1.* from table t1
where t1.time = (select min(t2.time) from table t2
where t2.code = t1.code
and t2.time >= '2018-12-13' AND t2.time <= '2018-12-16 23:59:59.999')
Add DISTINCT to your select query if you want rows that are identical in every column collapsed into one row. Otherwise GROUP BY can help: it combines the multiple entries for a value into one group, and the remaining columns are aggregated so that they stay in sync with the grouped column.
SELECT DISTINCT * from table
WHERE time >= '2018-12-13'
AND time <= '2018-12-16 23:59:59.999'
ORDER BY ID
or
SELECT * from table
WHERE time >= '2018-12-13'
AND time <= '2018-12-16 23:59:59.999'
group by code
ORDER BY ID
An appropriate method would be:
SELECT t.*
FROM table t
WHERE t.id = (SELECT t2.id
FROM table t2
WHERE t2.code = t.code AND
t2.time >= '2018-12-13' AND
t2.time < '2018-12-17'
ORDER BY t2.id
LIMIT 1
)
ORDER BY ID;
In addition to being syntactically correct SQL that (apart from the LIMIT) should work in almost any database, it can also take advantage of an index on (code, time, id). It should be faster than an alternative using GROUP BY.
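A quick way to check the correlated-subquery approach is to run it against a few sample rows. The sketch below uses Python's sqlite3 module, which accepts LIMIT in a subquery. The table name records and the sample rows are assumptions made for the example (table itself is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, code TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "A", "2018-12-13 08:00:00"),
        (2, "A", "2018-12-14 09:00:00"),  # duplicate code inside the range
        (3, "B", "2018-12-15 10:00:00"),
        (4, "B", "2018-12-20 11:00:00"),  # outside the range
        (5, "C", "2018-12-01 12:00:00"),  # outside the range
    ],
)
# The index the answer mentions; it is optional for correctness.
conn.execute("CREATE INDEX idx_code_time_id ON records (code, time, id)")

query = """
SELECT t.*
FROM records t
WHERE t.id = (SELECT t2.id
              FROM records t2
              WHERE t2.code = t.code
                AND t2.time >= '2018-12-13'
                AND t2.time < '2018-12-17'
              ORDER BY t2.id
              LIMIT 1)
ORDER BY t.id
"""
one_per_code = conn.execute(query).fetchall()
print(one_per_code)
```

Each code that has at least one row in the range comes back once, represented by its lowest id in that range; codes with no rows in the range are dropped.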
You can use group by or distinct clause.
SELECT ID, code, time
FROM table
WHERE time >= '2018-12-13'
AND time <= '2018-12-16 23:59:59.999'
GROUP BY code
ORDER BY ID

Date Difference between consecutive rows adding additional columns

Say I added a Cost Difference column to the second table from Rishal (see the below link for this previous post), how would I also calculate and display that?
Using just the 1001 Account Number and adding the following amounts of ID1=$10, ID4=$33 and ID6=$50 to the first table, how would I display in Rishal's second table a result of $23 and $17 in addition to the other 3 columns that are already there?
I've used this code (from GarethD) and would like to insert my Cost Difference column within this...Thanks in advance,
SELECT ID,
AccountNumber,
Date,
NextDate,
DATEDIFF("D", Date, NextDate)
FROM ( SELECT ID,
AccountNumber,
Date,
( SELECT MIN(Date)
FROM YourTable T2
WHERE T2.Accountnumber = T1.AccountNumber
AND T2.Date > T1.Date
) AS NextDate
FROM YourTable T1
) AS T
Date Difference between consecutive rows
I would recommend using JOIN to bring in the entire next record:
SELECT T.*, DATEDIFF("D", t.Date, t.NextDate) as datediff,
TNext.Amount, (Tnext.Amount - T.Amount) as amountdiff
FROM (SELECT T1.*,
(SELECT MIN(Date)
FROM YourTable T2
WHERE T2.Accountnumber = T1.AccountNumber AND
T2.Date > T1.Date
) AS NextDate
FROM YourTable as T1
) AS T LEFT JOIN
YourTable as Tnext
ON t.Accountnumber = tnext.Accountnumber AND t.NextDate = tnext.Date;
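The join-to-the-next-record idea can be checked with a small runnable sketch in Python and sqlite3. The amounts are the ones from the question ($10, $33 and $50 for account 1001); the dates are made up, and julianday() stands in for DATEDIFF, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER, AccountNumber INTEGER, Date TEXT, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1001, "2011-10-10", 10),  # dates are hypothetical
        (4, 1001, "2011-10-20", 33),
        (6, 1001, "2011-11-05", 50),
    ],
)

# The inner query finds each row's next date; the LEFT JOIN then pulls in
# the whole next record so its Amount can be compared with the current one.
query = """
SELECT T.ID,
       T.Date,
       T.NextDate,
       CAST(ROUND(julianday(T.NextDate) - julianday(T.Date)) AS INTEGER) AS DaysDiff,
       TNext.Amount - T.Amount AS AmountDiff
FROM (SELECT T1.*,
             (SELECT MIN(T2.Date)
              FROM YourTable T2
              WHERE T2.AccountNumber = T1.AccountNumber
                AND T2.Date > T1.Date) AS NextDate
      FROM YourTable T1) AS T
LEFT JOIN YourTable AS TNext
       ON TNext.AccountNumber = T.AccountNumber
      AND TNext.Date = T.NextDate
ORDER BY T.Date
"""
diffs = conn.execute(query).fetchall()
for row in diffs:
    print(row)
```

The last row of each account has no next record, so its NextDate and both difference columns come back as NULL (None in Python).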

SQL query with 2 nested queries on same table

This question extends my previous question, Update column with autonumber, this time with only one table:
Date Adds
6/1/18 0
6/5/18 1
6/7/18 0
...+60 records
10/1/18 0
I would like to create a table of Dates, 60 date records (for ex) beyond the Date with a number in the New in Field. Using the previous method, here is what I have:
Select t1.adds, t1.date from T1 where t1.adds > 0 AND
(select count(*)+1 from t1 as t2
where t2.Date <= t1.Date AND t2.date >=
(select date from t1 as t3 where t3.date > t2.date) = 60)
I think everything would work except for the 2nd conditional statement where I need the date to be greater than the corresponding date where Adds > 0. If executed I would expect my table to look like:
Date Adds
10/1/18 1
I think this works, but I am unsure how efficient it is yet. I just made a tblTemp with Adds and Date where Adds > 0:
SELECT q1.adds, t1.Date
FROM T1, tblTemp AS q1
WHERE (select count(*) from T1 as t2 where t2.date <= t1.date AND t2.date > q1.date)=60
I will do a little more testing with more records unless anyone has any better ideas?

PostgreSQL find entries where timestamp differences are within range?

I have a table that has a session_id, user_id, start_time, and value
Technically, a user should get a new session_id every 30 minutes, so there should never be a case where 2 entries have the same user_id but their start times are within 30 minutes of each other.
How do I run a query to look for these error cases?
I did something like this to see some of the time differences for entries for a given user:
select t1.start_time - t2.start_time
from user_sessions as t1 inner join
user_sessions as t2
on t1.user_id = 1 and t2.user_id = 1
I know that I'm looking for cases where:
((t1.start_time-t2.start_time) < 60*30*1000000 and (t1.start_time-t2.start_time) > 0) and t1.user_id = t2.user_id
I'm just not sure how to put the two pieces together into one query.
Does this do what you want?
select t1.start_time - t2.start_time
from user_sessions t1 inner join
user_sessions t2
on t1.user_id = t2.user_id
where (t1.start_time - t2.start_time) < 60*30*1000000 and
(t1.start_time - t2.start_time) > 0;
Using LAG() OVER() allows a simple way to calculate the time difference between rows:
SELECT
user_id, previous_start, start_time, minutes_diff
FROM (
SELECT
user_id
, start_time
, LAG(start_time) OVER(PARTITION BY user_id ORDER BY start_time) previous_start
-- EXTRACT(EPOCH ...) gives the whole gap in seconds; EXTRACT(MINUTES ...) would return only the minutes field of the interval
, EXTRACT(EPOCH FROM
start_time - LAG(start_time) OVER(PARTITION BY user_id ORDER BY start_time)
) / 60 minutes_diff
FROM user_sessions
) d
WHERE minutes_diff < 30
;
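The same lag-style check can be sketched outside the database. This is a minimal Python illustration with made-up session rows; the helper name find_close_sessions is hypothetical. It compares each session with the same user's previous one, using the full elapsed time between them.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Made-up (user_id, start_time) rows.
sessions = [
    (1, datetime(2020, 1, 1, 9, 0)),
    (1, datetime(2020, 1, 1, 9, 10)),   # 10 min after the previous session: too soon
    (1, datetime(2020, 1, 1, 10, 15)),  # 65 min later: fine
    (2, datetime(2020, 1, 1, 9, 5)),
]

def find_close_sessions(rows, limit=timedelta(minutes=30)):
    """Return (user_id, previous_start, start_time) where the gap is under `limit`."""
    hits = []
    ordered = sorted(rows)  # by user_id, then start_time
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None  # plays the role of LAG(start_time)
        for _, start in group:
            if previous is not None and start - previous < limit:
                hits.append((user_id, previous, start))
            previous = start
    return hits

violations = find_close_sessions(sessions)
print(violations)
```

The 65-minute gap is not reported here. Looking only at the minutes field of that gap would give 5, which is why the comparison is made on the total elapsed time.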