SQL: how to retrieve by similar dates

Okay, so I have a table with a user_id column and a submitted_dtm column.
I want to find instances where users submitted multiple records within 1 day of each other, and count how many times that has happened.
I've tried something like
select * from table_t t where
(select count(*) from table_t t2 where
t.user_id = t2.user_id and
t.pk!=t2.pk and
t.submitted_dtm between t2.submitted_dtm-.5 and t2.submitted_dtm+.5)>0;
The problem is that this query returns a result for each record in a date group. Instead, I just want a result per date group. Ideally, I'd just get the count in that group.
That is, if I have 6 records:
user_id submitted_dtm
--------------------------
1 12/04/2017 1:15
1 12/04/2017 5:50
2 11/25/2017 2:00
2 11/25/2017 3:25
2 11/25/2017 6:05
2 10/06/2017 4:00
I want 2 results, a count of 2 and a count of 3.
Is it possible to do this in sql?

Following up on Dessma's answer.
select user_id, trunc(submitted_dtm), count(1)
from table_t
group by user_id, trunc(submitted_dtm)
having count(1) > 1;
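For readers who want to check the grouping logic outside the database, here is a minimal Python sketch of what the query above does, run on the six sample rows from the question. The names ROWS and same_day_groups are made up for this illustration; it only mirrors the TRUNC / GROUP BY / HAVING idea and is not a replacement for the SQL.

```python
from collections import Counter
from datetime import datetime

# The six sample rows from the question: (user_id, submitted_dtm).
ROWS = [
    (1, "12/04/2017 1:15"),
    (1, "12/04/2017 5:50"),
    (2, "11/25/2017 2:00"),
    (2, "11/25/2017 3:25"),
    (2, "11/25/2017 6:05"),
    (2, "10/06/2017 4:00"),
]


def same_day_groups(rows):
    """Count rows per (user_id, calendar day); keep groups with more than one row."""
    counts = Counter()
    for user_id, stamp in rows:
        # Dropping the time part plays the role of TRUNC(submitted_dtm).
        day = datetime.strptime(stamp, "%m/%d/%Y %H:%M").date()
        counts[(user_id, day)] += 1
    # The filter plays the role of HAVING COUNT(1) > 1.
    return {key: n for key, n in counts.items() if n > 1}


groups = same_day_groups(ROWS)
for (user_id, day), n in sorted(groups.items()):
    print(user_id, day, n)
```

This yields one group of 2 for user 1 and one group of 3 for user 2, which is what the question asks for. Note that it groups by calendar day, so two records shortly before and after midnight land in different groups.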

In Oracle 12.1 and higher, you can solve such problems easily with the match_recognize clause. Link to documentation (with examples) below; my only note about the solution below is that I left the date in DATE data type, especially important if the output is used in further computations. If it isn't, you can wrap within TO_CHAR() with whatever format model is appropriate for your users.
https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
with
inputs ( user_id, submitted_dtm ) as (
select 1, to_date('12/04/2017 1:15', 'mm/dd/yyyy hh24:mi') from dual union all
select 1, to_date('12/04/2017 5:50', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 2:00', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 3:25', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 6:05', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('10/06/2017 4:00', 'mm/dd/yyyy hh24:mi') from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins below this line. Use your actual table and column names.
select user_id, submitted_dtm, cnt
from inputs
match_recognize(
partition by user_id
order by submitted_dtm
measures trunc(a.submitted_dtm) as submitted_dtm,
count(*) as cnt
pattern ( a b+ )
define b as trunc(submitted_dtm) = trunc(a.submitted_dtm)
);
USER_ID SUBMITTED_DTM CNT
---------- ------------------- ----------
1 2017-12-04 00:00:00 2
2 2017-11-25 00:00:00 3

I don't have data to test it, but I suspect something like this would do the trick:
SELECT t.user_id, To_char(t.submitted_dtm, 'dd/mm/yyyy'),
COUNT(DISTINCT t.pk) -- count each record once, not once per matching pair
FROM table_t t
INNER JOIN table_t t2
ON t.user_id = t2.user_id
AND t.pk != t2.pk
AND t.submitted_dtm BETWEEN t2.submitted_dtm - .5 AND
t2.submitted_dtm + .5
GROUP BY t.user_id, To_char(t.submitted_dtm, 'dd/mm/yyyy')
HAVING COUNT(*) > 1

This is a general idea of how to get the instances.
select user_id, t1.submitted_dtm t1submitted, t2.submitted_dtm t2submitted
from table_t t1 join table_t t2 using (user_id)
where t2.submitted_dtm > t1.submitted_dtm
and t2.submitted_dtm - t1.submitted_dtm <= 1;
The last line can be adjusted depending on what you mean by "within a day" (for example, a rolling 24 hours versus the same calendar day).
To count the instances, wrap the query above in a derived table and select count(*) from it.
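To make the difference between a rolling 24-hour window and a calendar-day grouping concrete, here is a small Python sketch of the same pairwise comparison, using the sample rows from the question. ROWS, FMT and pairs_within are names invented for this example, and the one-day window is an assumption about what "within a day" means.

```python
from datetime import datetime, timedelta
from itertools import combinations

FMT = "%m/%d/%Y %H:%M"

# Sample rows from the question: (user_id, submitted_dtm).
ROWS = [
    (1, "12/04/2017 1:15"),
    (1, "12/04/2017 5:50"),
    (2, "11/25/2017 2:00"),
    (2, "11/25/2017 3:25"),
    (2, "11/25/2017 6:05"),
    (2, "10/06/2017 4:00"),
]


def pairs_within(rows, window=timedelta(days=1)):
    """Return (user_id, earlier, later) for each pair of rows of one user within the window."""
    # Sorting by (user_id, timestamp) guarantees the first element of each pair is the earlier one.
    parsed = sorted((user_id, datetime.strptime(stamp, FMT)) for user_id, stamp in rows)
    return [
        (u1, t1, t2)
        for (u1, t1), (u2, t2) in combinations(parsed, 2)
        if u1 == u2 and timedelta(0) < t2 - t1 <= window
    ]


pairs = pairs_within(ROWS)
print(len(pairs))  # number of "instances" in the pairwise sense
```

With the sample data this finds one pair for user 1 and three pairs for user 2, so a pairwise count (4) is not the same thing as the per-group counts (2 and 3) the question asks for.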

Related

Why is GROUP BY on a date returning multiple rows for the same date?

I have a query like the following.
select some_date_col, count(*) as cnt
from <the table>
group by some_date_col
I get something like this in the output.
13-12-2021, 6
13-12-2021, 8
13-12-2021, 9
....
How is that possible? Here some_date_col is of type Date.
A DATE is a binary data-type that is composed of 7 bytes (century, year-of-century, month, day, hour, minute and second) and will always have those components.
The user interface you use to access the database can choose to display some or all of those components of the binary representation of the DATE; however, regardless of whether or not they are displayed by the UI, all the components are always stored in the database and used in comparisons in queries.
When you GROUP BY a DATE column you aggregate values that are identical down to the second (regardless of the precision the user interface displays).
So, if you have the data:
CREATE TABLE the_table (some_date_col) AS
SELECT DATE '2021-12-13' FROM DUAL CONNECT BY LEVEL <= 6 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' SECOND FROM DUAL CONNECT BY LEVEL <= 8 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' MINUTE FROM DUAL CONNECT BY LEVEL <= 9;
Then the query:
SELECT TO_CHAR(some_date_col, 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY some_date_col;
Will output:
SOME_DATE_COL       CNT
------------------- ---
2021-12-13 00:01:00   9
2021-12-13 00:00:01   8
2021-12-13 00:00:00   6
The values are grouped according to equal values (down to the maximum precision stored in the date).
If you want to GROUP BY dates with the same date component but any time component then use the TRUNCate function (which returns a value with the same date component but the time component set to midnight):
SELECT TRUNC(some_date_col) AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Which, for the same data outputs:
SOME_DATE_COL CNT
------------- ---
13-DEC-21      23
And:
SELECT TO_CHAR(TRUNC(some_date_col), 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Outputs:
SOME_DATE_COL       CNT
------------------- ---
2021-12-13 00:00:00  23
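The same effect can be reproduced outside the database. The following Python sketch builds the 6 + 8 + 9 values from the example and groups them once by the full value and once by the date part only. The variable names are made up for this illustration; the datetime values stand in for Oracle DATE values.

```python
from collections import Counter
from datetime import datetime, timedelta

midnight = datetime(2021, 12, 13)

# Same shape as the example table: 6 rows at midnight, 8 one second later, 9 one minute later.
values = (
    [midnight] * 6
    + [midnight + timedelta(seconds=1)] * 8
    + [midnight + timedelta(minutes=1)] * 9
)

# Like GROUP BY some_date_col: the time component takes part in the comparison.
by_full_value = Counter(values)

# Like GROUP BY TRUNC(some_date_col): only the date component is compared.
by_day = Counter(v.date() for v in values)

print(sorted(by_full_value.values()))
print(dict(by_day))
```

Grouping by the full value gives three groups, while grouping by the date part gives a single group of 23, matching the outputs above.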
Oracle's DATE type holds both a date and a time component. If the time components do not match, grouping by that value will place the same date (with different times) in different groups:
CREATE TABLE test ( xdate date );
INSERT INTO test VALUES (current_date);
INSERT INTO test VALUES (current_date + INTERVAL '1' MINUTE);
With the default display format:
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
Result:
XDATE     COUNT(*)
--------- --------
13-DEC-21        1
13-DEC-21        1
Now alter the format and rerun:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MON-DD HH24:MI:SS';
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
Result:
XDATE                COUNT(*)
-------------------- --------
2021-DEC-13 23:29:36        1
2021-DEC-13 23:30:36        1
Also try this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted FROM test;
Result:
FORMATTED
2021-DEC-13 23:29:36
2021-DEC-13 23:30:36
and this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted, COUNT(*) FROM test GROUP BY xdate;
Result:
FORMATTED            COUNT(*)
-------------------- --------
2021-DEC-13 23:29:36        1
2021-DEC-13 23:30:36        1

Count distinct values from subquery using TO_CHAR(date,'dd.mm.yyyy hh24') hourly

I am trying to count the distinct values of one column for a day, grouped by hour using to_char on a different (date) column.
For Example:
select to_char(date,'dd.mm.yyyy hh24'),count(*) from select distinct(name) from table_name
where date > to_date(some_date_format)
and date < to_date(some_date_format)
group by to_char(date,'dd.mm.yyyy hh24')
order by 1
Result Example:
01.01.01 01 233
01.01.01 02 233
01.01.01 03 233
01.01.01 04 233
01.01.01 05 233
I get the result hourly but without the distinct column 'name' values from the subquery.
Can somebody explain where my mistake is?
Thank you!
Is this what you're looking for ?
CREATE TABLE table_name (name, my_date) AS
(
SELECT 'foo', TO_DATE('01-JAN-2021 14:30','DD-MON-YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 'bar', TO_DATE('01-JAN-2021 14:30','DD-MON-YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 'bar', TO_DATE('01-JAN-2021 14:30','DD-MON-YYYY HH24:MI') FROM DUAL UNION ALL
SELECT 'foo', TO_DATE('01-JAN-2021 15:35','DD-MON-YYYY HH24:MI') FROM DUAL
);
table created
SELECT TO_CHAR(my_date, 'DD-MON-YYYY HH24') as "date",
name,
count(*)
FROM table_name
GROUP BY TO_CHAR(my_date, 'DD-MON-YYYY HH24'),
name;
date NAM COUNT(*)
----------------------- --- ----------
01-JAN-2021 15 foo 1
01-JAN-2021 14 foo 1
01-JAN-2021 14 bar 2
Your query is returning the number of distinct names (it's using the COUNT command), not the list of names, as you may want it to do.
Because you're grouping the results by timestamp, you'll have at most one row for each timestamp, so you need to show the distinct names in a single row.
I suggest using Oracle's LISTAGG function to do that; see the documentation here: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions089.htm#SQLRF30030
Aggregating values from different rows into a single string was only added to the SQL standard in SQL:2016 (as LISTAGG), so each vendor has its own implementation of this function, and some have none at all.
Also, keep in mind that because this list can grow, it can cause performance issues (some vendors require you to set buffer parameters in order to accommodate the aggregated data in a single column).
Thanks for help.
I found my mistake:
select
to_char(date, 'dd.mm.yyyy hh24'), count(distinct name)
from
table_name
where
date > to_date(some_date_format)
and date < to_date(some_date_format)
group by
to_char(date, 'dd.mm.yyyy hh24')
order by 1
When I use count(distinct name), it counts all distinct name values hourly.
My mistake at the beginning was using a subquery.
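As a cross-check of what count(distinct name) per hour computes, here is a minimal Python sketch using the four sample rows from the answer above. The names ROWS and distinct_names_per_hour are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the answer above: (name, my_date).
ROWS = [
    ("foo", datetime(2021, 1, 1, 14, 30)),
    ("bar", datetime(2021, 1, 1, 14, 30)),
    ("bar", datetime(2021, 1, 1, 14, 30)),
    ("foo", datetime(2021, 1, 1, 15, 35)),
]


def distinct_names_per_hour(rows):
    """Return {hour: number of distinct names seen in that hour}."""
    names = defaultdict(set)
    for name, stamp in rows:
        # Truncating to the hour mirrors TO_CHAR(date, 'dd.mm.yyyy hh24').
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        names[hour].add(name)  # a set keeps each name once, like DISTINCT
    return {hour: len(found) for hour, found in names.items()}


per_hour = distinct_names_per_hour(ROWS)
for hour, n in sorted(per_hour.items()):
    print(hour.strftime("%d.%m.%Y %H"), n)
```

For hour 14 this counts 2 distinct names even though there are 3 rows, which is the difference between count(distinct name) and count(*).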

Oracle, SQL request to retrieve data by week

I have a database containing logs of actions performed by users. I want to count, by week, the number of users whose ID changed from one beginning with K to one beginning with A, between 01/01/2019 and today (20/06/2019). In this example, user 1000 changed ID from K to A, because the last action date with a K ID is older than the first action with an A ID. User 1002 changed as well, for the same reason. The USERID is unique to each user.
My table of logs looks like this:
ID date action USERID
KF12 01/01/2019 Create 1000
KG45 11/06/2019 Create 1002
KI89 06/05/2019 Modify 1003
AO22 20/03/2019 Delete 1000
AI88 20/06/2019 Delete 1002
..
Here is what I tried; it's not complete, and I have no idea how to count the changes by week:
select distinct USERID, max(DATE_USER) over (partition by USERID)
FROM
HISTORY
WHERE
USERID in (Select distinct USERID
from HISTORY
where ID like 'K%'
and DATE_USER >= to_date('1.1.' || 2019, 'DD.MM.YYYY')
and DATE_USER < to_date('20.06.' || 2019 , 'DD.MM.YYYY')
INTERSECT
select distinct USERID
from HISTORY
where ID like 'A%'
and DATE_USER >= to_date('1.1.' || 2019, 'DD.MM.YYYY')
and DATE_USER < to_date('19.06.' || 2019 , 'DD.MM.YYYY'))
and ID like 'A%'
;
In this example the expected result is the users (1000, 1002), who changed on 20/03/2019 and 20/06/2019. The result has to look like this:
WEEKNUMBER COUNTOFCHANGE
25 1
12 1
Instead of using analytic functions, you can use a self join to achieve the same result, as follows:
-- DATA PREPARATION
WITH LOGS(ID, "DATE",ACTION, USERID) AS
(SELECT 'KF12',TO_DATE('01/01/2019','DD/MM/RRRR'),'Create',1000 FROM DUAL UNION ALL
SELECT 'KG45',TO_DATE('11/06/2019','DD/MM/RRRR'),'Create',1002 FROM DUAL UNION ALL
SELECT 'KI89',TO_DATE('06/05/2019','DD/MM/RRRR'),'Modify',1003 FROM DUAL UNION ALL
SELECT 'AO22',TO_DATE('20/03/2019','DD/MM/RRRR'),'Delete',1000 FROM DUAL UNION ALL
SELECT 'AI88',TO_DATE('20/06/2019','DD/MM/RRRR'),'Delete',1002 FROM DUAL)
-- ACTUAL QUERY
SELECT
WK,
COUNT(DISTINCT USERID)
FROM
(
SELECT
TO_CHAR(L2."DATE", 'WW') WK,
L2.USERID
FROM
LOGS L1
JOIN LOGS L2 ON ( L1.USERID = L2.USERID
AND L1."DATE" < L2."DATE"
AND L1.ID LIKE 'K%'
AND L2.ID LIKE 'A%' )
)
GROUP BY
WK
Output:
DB Fiddle demo
Cheers!!
Use LAG to find each user's previous ID, filter on the 'K' -> 'A' transition, and count as needed:
select wk, count(distinct USERID) n
from
(select log.*, to_char(dat,'ww') wk, lag(ID) over(partition by USERID order by dat) prev_id
from log) t
where substr(t.ID,0,1) = 'A' and substr(t.prev_id,0,1) = 'K'
group by wk
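Here is a rough Python sketch of the same LAG idea on the sample rows from the question, useful for checking the expected week numbers. LOGS, oracle_ww and changes_per_week are names invented for this example. The week formula is my reading of Oracle's 'WW' format (week 1 starts on 1 January and each week has seven days), so treat it as an assumption.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (ID, date, USERID).
LOGS = [
    ("KF12", date(2019, 1, 1), 1000),
    ("KG45", date(2019, 6, 11), 1002),
    ("KI89", date(2019, 5, 6), 1003),
    ("AO22", date(2019, 3, 20), 1000),
    ("AI88", date(2019, 6, 20), 1002),
]


def oracle_ww(day):
    """Week of year as in TO_CHAR(d, 'WW'): days 1-7 are week 1, days 8-14 week 2, ..."""
    return (day.timetuple().tm_yday - 1) // 7 + 1


def changes_per_week(logs):
    """Count users per week whose ID goes from a K prefix to an A prefix."""
    users_by_week = defaultdict(set)
    previous_id = {}  # plays the role of LAG(ID) OVER (PARTITION BY USERID ORDER BY date)
    for log_id, day, user_id in sorted(logs, key=lambda row: (row[2], row[1])):
        prev = previous_id.get(user_id)
        if prev is not None and prev.startswith("K") and log_id.startswith("A"):
            users_by_week[oracle_ww(day)].add(user_id)
        previous_id[user_id] = log_id
    return {week: len(users) for week, users in sorted(users_by_week.items())}


result = changes_per_week(LOGS)
print(result)
```

On the sample data this gives one change in week 12 and one in week 25, which matches the expected result in the question.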

SQL counting days with gap / overlapping

I am working on a "counting days" problem almost identical to this one. I have a list of date ranges and need to count how many days are used, excluding duplicates and handling the gaps. Same input and output.
From: Markus Jarderot
Input
ID d1 d2
1 2011-08-01 2011-08-08
1 2011-08-02 2011-08-06
1 2011-08-03 2011-08-10
1 2011-08-12 2011-08-14
2 2011-08-01 2011-08-03
2 2011-08-02 2011-08-06
2 2011-08-05 2011-08-09
Output
ID hold_days
1 11
2 8
SQL to find time elapsed from multiple overlapping intervals
But for the life of me I couldn't understand Markus Jarderot's solution.
SELECT DISTINCT
t1.ID,
t1.d1 AS date,
-DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2 -- Join for any events occurring while this
ON t2.ID = t1.ID -- is starting. If this is a start point,
AND t2.d1 <> t1.d1 -- it won't match anything, which is what
AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
Why is DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) taking min(d1) over the entire list? Is that regardless of ID?
And what does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is that to ensure only overlapping intervals are considered?
Same question for the GROUP BY: is it there so that identical periods are discarded? I tried to trace the solution by hand but only got more confused.
This is mostly a duplicate of my answer here (including explanation) but with the inclusion of grouping on an id column. It should use a single table scan and does not require a recursive sub-query factoring clause (CTE) or self joins.
Oracle 11g R2 Schema Setup:
CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;
Query 1:
SELECT id,
SUM( days ) AS total_days
FROM (
SELECT id,
dt - LAG( dt ) OVER ( PARTITION BY id
ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT id,
dt,
CASE SUM( value ) OVER ( PARTITION BY id
ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id
Results:
| ID | TOTAL_DAYS |
|----|------------|
| 1 | 25 |
| 2 | 13 |
| 3 | 9 |
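To see why these totals are plausible, here is a small Python sketch that merges overlapping ranges per id and sums the covered days, counting both end dates (as the query above does with its + 1). It is not a translation of the UNPIVOT query, just an independent sort-and-merge check on the same sample data; INTERVALS and total_days are names made up for this example.

```python
from collections import defaultdict
from datetime import date

# (id, start_date, end_date) taken from the schema setup above.
INTERVALS = [
    (1, "2017-06-01", "2017-06-03"), (1, "2017-06-02", "2017-06-04"),
    (1, "2017-06-06", "2017-06-06"), (1, "2017-06-07", "2017-06-07"),
    (1, "2017-06-11", "2017-06-20"), (1, "2017-06-14", "2017-06-15"),
    (1, "2017-06-22", "2017-06-25"), (1, "2017-06-24", "2017-06-28"),
    (1, "2017-06-27", "2017-06-30"), (1, "2017-06-27", "2017-06-28"),
    (2, "2011-08-01", "2011-08-08"), (2, "2011-08-02", "2011-08-06"),
    (2, "2011-08-03", "2011-08-10"), (2, "2011-08-12", "2011-08-14"),
    (3, "2011-08-01", "2011-08-03"), (3, "2011-08-02", "2011-08-06"),
    (3, "2011-08-05", "2011-08-09"),
]


def total_days(intervals):
    """Return {id: number of distinct days covered}, end dates inclusive."""
    by_id = defaultdict(list)
    for id_, start, end in intervals:
        by_id[id_].append((date.fromisoformat(start), date.fromisoformat(end)))

    totals = {}
    for id_, spans in by_id.items():
        spans.sort()  # by start date, then end date
        days = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps (or is contained in) the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current run and start a new one.
                days += (cur_end - cur_start).days + 1
                cur_start, cur_end = start, end
        days += (cur_end - cur_start).days + 1
        totals[id_] = days
    return totals


totals = total_days(INTERVALS)
print(totals)
```

This gives 25, 13 and 9 days for ids 1, 2 and 3. If you want end dates excluded, as in the 11 and 8 of the original question, drop the + 1 in both places.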
The brute force method is to create all days (in a recursive query) and then count:
with dates(id, day, d2) as
(
select id, d1 as day, d2 from mytable
union all
select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;
Unfortunately there is a bug in some Oracle versions and recursive queries with dates don't work there. So try this code and see whether it works in your system. (I have Oracle 11.2 and the bug still exists there; so I guess you need Oracle 12c.)
I guess Markus' idea is to find all starting points that are not within other ranges, and all ending points that aren't either. Then just take the first starting point to the first ending point, the next starting point to the next ending point, and so on. As Markus isn't using a window function to number the starting and ending points, he has to find a more complicated way to achieve this. Here is the query with ROW_NUMBER. Maybe this gives you an idea of what to look for in Markus' query.
select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
select id, d1 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d1 > m2.d1 and m1.d1 <= m2.d2
)
) startpoint
join
(
select id, d2 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d2 >= m2.d1 and m1.d2 < m2.d2
)
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
If all your intervals start on different dates, consider them in ascending order of d1 and count how many days there are from each d1 to the start of the next interval.
You can discard an interval if it is contained in another one.
The last interval won't have a follower.
This query should give you how many days each interval contributes:
select a.id, a.d1,nvl(min(b.d1), a.d2) - a.d1
from orders a
left join orders b
on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1
Then group by id and sum days

Filter rows by those created within a close timeframe

I have an application where users create orders that are stored in an Oracle database. I'm trying to find a bug that only happens when a user creates an order within 30 seconds of the last order they created.
Here is the structure of the order table:
order_id | user_id | creation_date
I would like to write a query that can give me a list of orders where the creation_date is within 30 seconds of the last order for the same user. The results will hopefully help me find the bug.
I tried using the Oracle LAG() function, but it doesn't seem to work with the WHERE clause.
Any thoughts?
SELECT O.*
FROM YourTable O
WHERE EXISTS (
SELECT *
FROM YourTable O2
WHERE
O.creation_date > O2.creation_date
AND O.user_id = O2.user_id
AND O.creation_date - (30 / 86400) <= O2.creation_date
);
See this in action in a Sql Fiddle.
You can use the LAG function if you want, you would just have to wrap the query into a derived table and then put your WHERE condition in the outer query.
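The "compare each order with the user's previous order" logic that LAG gives you can be sketched in Python like this. The sample rows, the user ids 7 and 8, and the names ORDERS and orders_soon_after_previous are all made up for the illustration; only the 30-second window comes from the question.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (order_id, user_id, creation_date).
ORDERS = [
    (1, 7, datetime(2013, 3, 20, 13, 56, 58)),
    (2, 7, datetime(2013, 3, 20, 13, 57, 27)),
    (3, 7, datetime(2013, 3, 20, 13, 59, 16)),
    (4, 8, datetime(2013, 3, 20, 13, 57, 0)),
]


def orders_soon_after_previous(orders, window=timedelta(seconds=30)):
    """Return ids of orders created within `window` of the same user's previous order."""
    hits = []
    last_seen = {}  # previous creation_date per user, like LAG(...) OVER (PARTITION BY user_id ...)
    for order_id, user_id, created in sorted(orders, key=lambda o: (o[1], o[2])):
        prev = last_seen.get(user_id)
        # The first order of a user has no predecessor, so it never qualifies.
        if prev is not None and created - prev <= window:
            hits.append(order_id)
        last_seen[user_id] = created
    return hits


suspects = orders_soon_after_previous(ORDERS)
print(suspects)
```

Only order 2 qualifies here (29 seconds after order 1). The filter runs after the previous value has been computed for every row, which is the same reason the SQL version needs the LAG in a derived table and the WHERE in the outer query.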
SELECT distinct
t1.order_id, t1.user_id, t1.creation_date
FROM
YourTable t1
join YourTable t2
on t2.user_id = t1.user_id
and t2.creation_date between t1.creation_date - 30/86400 and t1.creation_date
and t2.rowid <> t1.rowid
order by 3 desc
Example of using LAG(). The difference between two timestamps is an interval, so all of its fields (not only the seconds field) have to be converted to seconds:
SELECT id, time_diff_in_seconds
FROM
(
SELECT id
-- diff is an INTERVAL DAY TO SECOND; convert every field to seconds
, EXTRACT(DAY FROM diff) * 86400 + EXTRACT(HOUR FROM diff) * 3600
+ EXTRACT(MINUTE FROM diff) * 60 + EXTRACT(SECOND FROM diff) time_diff_in_seconds
FROM
(
-- on the real table, add PARTITION BY user_id inside OVER (...)
SELECT id
, creation_date - LAG(creation_date, 1, creation_date) OVER (ORDER BY creation_date) diff
FROM
( -- Table/data --
SELECT 1 id, timestamp '2013-03-20 13:56:58' creation_date FROM dual
UNION ALL
SELECT 2, timestamp '2013-03-20 13:57:27' FROM dual
UNION ALL
SELECT 3, timestamp '2013-03-20 13:59:16' FROM dual
)))
--WHERE time_diff_in_seconds <= 30
/
ID TIME_DIFF_IN_SECONDS
--------------------------
1 0 <<-- if uncomment where
2 29 <<-- if uncomment where
3 109