Find the people who logged in on 3 consecutive dates - sql

LoginHistory table
Date Name Login
----------------------------------------
03/20/2021 Amy 1
03/20/2021 Lily 1
03/20/2021 Nancy 1
03/21/2021 Amy 1
03/21/2021 Lily 1
03/21/2021 Leo 1
03/22/2021 Amy 1
03/22/2021 Lisa 1
03/22/2021 Leo 1
03/23/2021 Lily 1
03/23/2021 Lisa 1
03/23/2021 Leo 1
I want to find the people, together with their login dates, who logged in on 3 consecutive dates. For example, my output should include Amy, because she logged in on 3/20, 3/21 and 3/22. Lily shouldn't be in my output: even though she logged in 3 times, her dates (3/20, 3/21 and 3/23) are not consecutive.
The output should be:
Date Name Login
----------------------------------------
03/20/2021 Amy 1
03/21/2021 Amy 1
03/21/2021 Leo 1
03/22/2021 Amy 1
03/22/2021 Leo 1
03/23/2021 Leo 1
Thanks.

Based on the specific sample data provided, you could use analytic min and max to get the first and last date for each name, then check both the difference in days and the number of logins: a qualifying name must have exactly 3 logins with 2 days between the first and last date.
You haven't specified an RDBMS, so the date functions may need amending as appropriate; most RDBMSs offer equivalent functionality.
select date, name
from (
select *,
DateDiff(day,Min(date) over(partition by name),
Max(date) over(partition by name))diff,
Count(*) over(partition by name) qty
from t
)t
where diff=2 and qty=3
order by date;
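To check the first/last-date logic against the sample rows, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It is an illustration under a few assumptions: the table name login_history and the column d are made up, the dates are rewritten as ISO strings, SQLite's julianday() stands in for DateDiff, and the bundled SQLite is recent enough (3.25 or later) to support window functions. Like the query above, it only finds people who have exactly 3 logins in the whole table.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings (assumption).
ROWS = [
    ("2021-03-20", "Amy"), ("2021-03-20", "Lily"), ("2021-03-20", "Nancy"),
    ("2021-03-21", "Amy"), ("2021-03-21", "Lily"), ("2021-03-21", "Leo"),
    ("2021-03-22", "Amy"), ("2021-03-22", "Lisa"), ("2021-03-22", "Leo"),
    ("2021-03-23", "Lily"), ("2021-03-23", "Lisa"), ("2021-03-23", "Leo"),
]

QUERY = """
SELECT d, name
FROM (
    SELECT d, name,
           -- days between the first and the last login of each name
           julianday(MAX(d) OVER (PARTITION BY name))
             - julianday(MIN(d) OVER (PARTITION BY name)) AS diff,
           -- total number of logins of each name
           COUNT(*) OVER (PARTITION BY name) AS qty
    FROM login_history
)
WHERE diff = 2 AND qty = 3
ORDER BY d, name
"""


def three_day_streaks(rows):
    """Load the rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE login_history (d TEXT, name TEXT)")
    conn.executemany("INSERT INTO login_history VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


streaks = three_day_streaks(ROWS)
for d, name in streaks:
    print(d, name)
```

On the sample data this returns the three rows for Amy and the three rows for Leo, which is the output asked for in the question.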

To produce a table of the consecutive logins, you can first anchor your search on the action that is the last in the sequence. Then, you can join all the preceding dates to that original result:
with vals(v) as (
select 1
union all
select 2
)
select c2.* from (
select c.* from loginhistory c where
(select count(*) from loginhistory c1 cross join vals v
where c1.name = c.name and c.dt = c1.dt + interval '1' day * v.v) = 2
) t1
join loginhistory c2 on t1.name = c2.name and c2.dt <= t1.dt and (c2.dt + interval '2' day) >= t1.dt
order by c2.dt
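The anchor-then-join idea can also be tried on the sample rows with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table login_history and column d are hypothetical names, dates are ISO strings, SQLite's date() modifiers replace the interval arithmetic, and there is at most one row per name and date. One extra row (Amy on 2021-03-23) is made up to show a streak longer than 3 days, which is why the outer query uses DISTINCT.

```python
import sqlite3

# Sample rows from the question as ISO strings, plus one hypothetical row.
ROWS = [
    ("2021-03-20", "Amy"), ("2021-03-20", "Lily"), ("2021-03-20", "Nancy"),
    ("2021-03-21", "Amy"), ("2021-03-21", "Lily"), ("2021-03-21", "Leo"),
    ("2021-03-22", "Amy"), ("2021-03-22", "Lisa"), ("2021-03-22", "Leo"),
    ("2021-03-23", "Lily"), ("2021-03-23", "Lisa"), ("2021-03-23", "Leo"),
    ("2021-03-23", "Amy"),  # hypothetical: gives Amy a 4-day streak
]

QUERY = """
WITH anchors AS (
    -- rows whose name also logged in on both of the two preceding days
    SELECT c.name, c.d
    FROM login_history c
    WHERE (SELECT COUNT(*)
           FROM login_history c1
           WHERE c1.name = c.name
             AND c1.d IN (date(c.d, '-1 day'), date(c.d, '-2 day'))) = 2
)
-- join every anchor back to the rows of its 3-day window;
-- DISTINCT removes rows that belong to overlapping windows
SELECT DISTINCT c2.d, c2.name
FROM anchors a
JOIN login_history c2
  ON c2.name = a.name
 AND c2.d BETWEEN date(a.d, '-2 day') AND a.d
ORDER BY c2.d, c2.name
"""


def consecutive_logins(rows):
    """Return every login row that is part of a run of 3 or more days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE login_history (d TEXT, name TEXT)")
    conn.executemany("INSERT INTO login_history VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


run_rows = consecutive_logins(ROWS)
for d, name in run_rows:
    print(d, name)
```

With the extra row, all four of Amy's dates are returned along with Leo's three, while Lily and Lisa are still excluded.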

select * from LoginHistory where name in (
select name
from LoginHistory
where date between <start> and <end> -- must be exactly three dates in the range
group by name
having count(distinct date) = 3
)

Related

Inner Join - special time conditions

Given an hourly table A with full heart_rate records, e.g.:
User Hour Heart_rate
Joe 1 60
Joe 2 70
Joe 3 72
Joe 4 75
Joe 5 68
Joe 6 71
Joe 7 78
Joe 8 83
Joe 9 85
Joe 10 80
And a table B with the subset of hours in which a purchase happened, e.g.:
User Hour Purchase
Joe 3 'Soda'
Joe 9 'Coke'
Joe 10 'Doughnut'
I want to keep only those records from A that are in B or at most 2 hours before a record in B, without duplication, while preserving both the heart_rate from A and the item purchased from B, so the outcome is:
User Hour Heart_rate Purchase
Joe 1 60 null
Joe 2 70 null
Joe 3 72 'Soda'
Joe 7 78 null
Joe 8 83 null
Joe 9 85 'Coke'
Joe 10 80 'Doughnut'
How can the result be achieved with an inner join, without duplication (in this case of hours 8 and 9)? This is an MWE; assume multiple users, and timestamps instead of hours.
The obvious solution is to combine
Inner Join + deduplication
Left join
Can this be achieved in a more elegant way?
You could use an INNER join of the tables and conditional aggregation for the deduplication:
SELECT a.User, a.Hour, a.Heart_rate,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
WHERE a.User = 'Joe' -- remove this line if you want results for all users
GROUP BY a.User, a.Hour, a.Heart_rate;
Or with MAX() window function:
SELECT DISTINCT a.*,
MAX(CASE WHEN a.Hour = b.Hour THEN b.Purchase END) OVER (PARTITION BY a.User, a.Hour) Purchase
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour;
See the demo (for MySQL, but it is standard SQL).
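As a quick check of the conditional-aggregation query, the sketch below loads the question's sample rows into SQLite through Python's built-in sqlite3 module. The column name user_name is an assumption (chosen to avoid any clash with a reserved word), and the filter on a single user is left out.

```python
import sqlite3

# Hourly heart-rate records (table A) and purchases (table B) from the question.
HEART = [("Joe", h, r) for h, r in enumerate(
    [60, 70, 72, 75, 68, 71, 78, 83, 85, 80], start=1)]
PURCHASES = [("Joe", 3, "Soda"), ("Joe", 9, "Coke"), ("Joe", 10, "Doughnut")]

QUERY = """
SELECT a.user_name, a.hour, a.heart_rate,
       -- keep the purchase only when it happened in this exact hour
       MAX(CASE WHEN a.hour = b.hour THEN b.purchase END) AS purchase
FROM a
INNER JOIN b
  ON b.user_name = a.user_name
 AND a.hour BETWEEN b.hour - 2 AND b.hour
GROUP BY a.user_name, a.hour, a.heart_rate
ORDER BY a.user_name, a.hour
"""


def rows_near_purchases(heart, purchases):
    """Return A rows within 2 hours before a purchase, one row per hour."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (user_name TEXT, hour INTEGER, heart_rate INTEGER)")
    conn.execute("CREATE TABLE b (user_name TEXT, hour INTEGER, purchase TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", heart)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", purchases)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


joined = rows_near_purchases(HEART, PURCHASES)
for row in joined:
    print(row)
```

Hours 8 and 9 each match two purchases in the join, and the GROUP BY collapses them back to one row per hour, with MAX picking the purchase made in that hour when there is one.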
Your solutions should work and sound good.
There is another way, using 3 SELECT statements.
The inner SELECT combines both tables with UNION ALL. Because only tables with the same columns can be combined, fields that exist in only one table have to be defined in the other one as well and set to null. The column hour_eat is added to record when the purchase occurred. By sorting this combined table, we can achieve that each row from table B is directly followed by the row of table A for the same hour.
In the middle SELECT statement, lag(Purchase) fetches the Purchase of the preceding row. If we only look at the rows from the 1st table, the Purchase value from the 2nd table is now in the right place. This comes in handy if timestamps rather than fixed hours are used. The line with last_value calculates the time between the measurement of the heart rate and the next purchase.
The outer SELECT filters the rows of interest: only the rows of the 1st table that fall within the last 2 hours before a purchase.
With
heart_tbl as (SELECT "Joe" as USER, row_number() over() Hour, Heart_rate from unnest([60,70,72,75,68,71,78,83,85,80]) Heart_rate ),
eat_tbl as (Select "Joe" as User ,3 Hour , 'Soda' as Purchase UNION ALL SELECT "Joe", 9, 'Coke' UNION ALL SELECT "Joe", 10, 'Doughnut' )
SELECT user, hour,heart_rate,Purchase_,hours_till_Purchase
from
(
SELECT *,
lag(Purchase) over (order by hour, heart_rate is not null) as Purchase_,
hour-last_value(hour_eat ignore nulls) over (order by hour desc,heart_rate is not null) as hours_till_Purchase
From # combine both tables to one table (ordered by hours)
(
SELECT user, hour,heart_rate, null as Purchase, null as hour_eat from heart_tbl
UNION ALL
Select user, hour, null as heart_rate, Purchase, hour from eat_tbl
)
)
Where heart_rate is not null and hours_till_Purchase >= -2
order by hour

Select based on max date from another table

I'm trying to do a simple Select query by getting the country based on the MAX Last update from the other table.
Order#
1
2
3
4
The other table contains the country and the last update:
Order#  Cntry  Last Update
1              12/21/2019 9:19 PM
1       US     1/10/2020 1:07 AM
2       JP     7/29/2020 12:15 PM
3       CA     4/12/1992 2:04 PM
3       GB     11/6/2001 9:26 AM
3       DK     2/1/2005 3:04 AM
4       CN     8/20/2013 12:04 AM
4              10/1/2015 4:04 PM
My desired result:
Order# Country
1 US
2 JP
3 DK
4
I'm not sure what the right solution is. So far I'm stuck with this:
SELECT Main.[Order#], tempTable.Cntry
FROM Main
LEFT JOIN (
SELECT [Order#], Cntry, Max([Last Update]) as LatestDate FROM Country
GROUP BY [Order#], Cntry
) as tempTable ON Main.[Order#] = tempTable.[Order#];
Thanks in advance!
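If you only need the order number and the country, you may not need two tables. Note that LAST_VALUE needs an explicit frame that reaches the end of the partition; with the default frame it only looks as far as the current row:
SELECT DISTINCT [Order#], Country
FROM
(
SELECT [Order#], LAST_VALUE(Cntry) OVER (PARTITION BY [Order#] ORDER BY [Last Update] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Country FROM Country
) X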
In SQL Server, you can use a correlated subquery:
update main
set country = (select top (1) s.country
from secondtable s
where s.order# = main.order#
order by s.lastupdate desc
);
EDIT:
A select would look quite similar:
select m.*,
(select top (1) s.country
from secondtable s
where s.order# = m.order#
order by s.lastupdate desc
) as country
from main m
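The correlated-subquery idea can be tried on the question's sample rows with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names (main, country, order_no, cntry, last_update) are made up for the example, the timestamps are rewritten as ISO strings so that they sort correctly, the blank countries are treated as NULL, and SQLite's LIMIT 1 takes the place of TOP (1).

```python
import sqlite3

ORDERS = [(1,), (2,), (3,), (4,)]

# Country rows from the question; None marks the blank country cells (assumption).
COUNTRIES = [
    (1, None, "2019-12-21 21:19"),
    (1, "US", "2020-01-10 01:07"),
    (2, "JP", "2020-07-29 12:15"),
    (3, "CA", "1992-04-12 14:04"),
    (3, "GB", "2001-11-06 09:26"),
    (3, "DK", "2005-02-01 03:04"),
    (4, "CN", "2013-08-20 00:04"),
    (4, None, "2015-10-01 16:04"),
]

QUERY = """
SELECT m.order_no,
       -- country of the most recently updated row for this order
       (SELECT c.cntry
        FROM country c
        WHERE c.order_no = m.order_no
        ORDER BY c.last_update DESC
        LIMIT 1) AS country
FROM main m
ORDER BY m.order_no
"""


def latest_country_per_order(orders, countries):
    """Return (order_no, country) using the latest update per order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main (order_no INTEGER)")
    conn.execute("CREATE TABLE country (order_no INTEGER, cntry TEXT, last_update TEXT)")
    conn.executemany("INSERT INTO main VALUES (?)", orders)
    conn.executemany("INSERT INTO country VALUES (?, ?, ?)", countries)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


latest = latest_country_per_order(ORDERS, COUNTRIES)
for order_no, country in latest:
    print(order_no, country)
```

Order 4 comes back with no country because its most recent row has a blank one, matching the desired result in the question.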
I don't have time to try it with sample data, but is that what you are looking for?
select t.[Order#], t.Cntry
from Country t
where t.[Last Update] =
(select max(t2.[Last Update]) from Country t2 where t2.[Order#] = t.[Order#])

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department level score table based on a deeper product url level score table.
Dates are not consecutive
Not all urls get score updates on the same day (they are independent of each other)
dist_url should be a running count distinct (cumulative count distinct)
dist urls and urls score >=30 are both distinct counts
What I have now is:
Date url Store Dept Page Score
10/1 a US A X 10
10/1 b US A X 30
10/1 c US A X 60
10/4 a US A X 20
10/4 d US A X 60
10/6 b US A X 22
10/9 a US A X 40
10/9 e US A X 10
What I want is:
Date Store Dept Page dist urls urls score >=30
10/1 US A X 3 2
10/4 US A X 4 3
10/6 US A X 4 2
10/9 US A X 5 2
I think dist_url can be computed with a window function, but I'm not sure how to write the query.
My current query is below, but it's wrong because the count distinct is not cumulative:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has any idea.
Based on your question, you seem to want two levels of logic:
select date, store, dept,
sum(sum(start)) over (partition by dept, page order by date) as distinct_urls,
sum(sum(start_30)) over (partition by dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
from t
group by store, dept, page, url
) union all
(select store, dept, page, url, min(date) as date, 0, 1
from t
where score >= 30
group by store, dept, page, url
)
) t
group by date, store, dept, page;
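The core of this approach, flagging each url on the date it first appears and then taking a cumulative sum of the flags, can be sketched on the sample rows with Python's built-in sqlite3 module. Several things here are assumptions: the table name scores and column d are made up, the year 2019 is added so the dates sort as ISO strings, SQLite 3.25 or later is needed for the window function, and a LEFT JOIN is used so that dates without any new url (10/6) still get a row. Only the dist urls column is covered, not the score >= 30 column.

```python
import sqlite3

# Sample rows from the question: (date, url, store, dept, page, score).
SCORES = [
    ("2019-10-01", "a", "US", "A", "X", 10),
    ("2019-10-01", "b", "US", "A", "X", 30),
    ("2019-10-01", "c", "US", "A", "X", 60),
    ("2019-10-04", "a", "US", "A", "X", 20),
    ("2019-10-04", "d", "US", "A", "X", 60),
    ("2019-10-06", "b", "US", "A", "X", 22),
    ("2019-10-09", "a", "US", "A", "X", 40),
    ("2019-10-09", "e", "US", "A", "X", 10),
]

QUERY = """
WITH first_seen AS (
    -- first date on which each url appears within its store/dept/page
    SELECT store, dept, page, url, MIN(d) AS d
    FROM scores
    GROUP BY store, dept, page, url
),
per_date AS (
    -- number of urls seen for the first time on each date (0 if none)
    SELECT s.d, s.store, s.dept, s.page, COUNT(f.url) AS new_urls
    FROM (SELECT DISTINCT d, store, dept, page FROM scores) s
    LEFT JOIN first_seen f
      ON f.d = s.d AND f.store = s.store
     AND f.dept = s.dept AND f.page = s.page
    GROUP BY s.d, s.store, s.dept, s.page
)
SELECT d, store, dept, page,
       -- running total of first appearances = running distinct count
       SUM(new_urls) OVER (PARTITION BY store, dept, page ORDER BY d) AS dist_urls
FROM per_date
ORDER BY store, dept, page, d
"""


def running_distinct_urls(rows):
    """Return the cumulative number of distinct urls per date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (d TEXT, url TEXT, store TEXT, dept TEXT, page TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


running = running_distinct_urls(SCORES)
for row in running:
    print(row)
```

On the sample data the running count is 3, 4, 4, 5, which agrees with the dist urls column of the desired output.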
I don't understand how your query is related to your question.
Try as I might, I don't get your output either.
But I think you can avoid the UNION SELECTs. Does this do what you expect?
NULLs don't figure in COUNT DISTINCTs, and here you can combine an aggregate expression with an OLAP one.
And Vertica has named windows to increase readability.
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
-- out
-- out Time: First fetch (4 rows): 45.321 ms. All rows formatted: 45.364 ms

SQL find and group consecutive number in rows without duplicate

So I have a table like this:
Taxi Client Time
Tom A 1
Tom A 2
Tom B 3
Tom A 4
Tom A 5
Tom A 6
Tom B 7
Tom B 8
Bob A 1
Bob A 2
Bob A 3
and the expected result will be like this:
Tom 3
Bob 1
I have used a window function with PARTITION BY to count the consecutive values, but the result became this:
Tom A 2
Tom A 3
Tom B 2
Bob A 2
Please help, I am not good in English, thanks!
This is a variation of a gaps-and-islands problem. You can solve it using window functions:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
from (select t.*,
row_number() over (partition by taxi order by time) as seqnum,
row_number() over (partition by taxi, client order by time) as seqnum_c
from t
) t
group by t.taxi, t.client, (seqnum - seqnum_c)
having count(*) >= 2
)
group by taxi;
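To see how the difference of the two row numbers identifies each island, the query can be run on the question's sample rows with Python's built-in sqlite3 module. This is a sketch: the table name rides and the subquery aliases are assumptions, and SQLite 3.25 or later is needed for ROW_NUMBER().

```python
import sqlite3

# Sample rows from the question: (taxi, client, time).
RIDES = [
    ("Tom", "A", 1), ("Tom", "A", 2), ("Tom", "B", 3), ("Tom", "A", 4),
    ("Tom", "A", 5), ("Tom", "A", 6), ("Tom", "B", 7), ("Tom", "B", 8),
    ("Bob", "A", 1), ("Bob", "A", 2), ("Bob", "A", 3),
]

QUERY = """
SELECT taxi, COUNT(*) AS num_runs
FROM (
    SELECT taxi, client, COUNT(*) AS num_times
    FROM (
        SELECT taxi, client,
               ROW_NUMBER() OVER (PARTITION BY taxi ORDER BY time) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY taxi, client ORDER BY time) AS seqnum_c
        FROM rides
    ) AS numbered
    -- seqnum - seqnum_c is constant within one consecutive run of a client
    GROUP BY taxi, client, seqnum - seqnum_c
    HAVING COUNT(*) >= 2
) AS runs
GROUP BY taxi
ORDER BY taxi
"""


def count_runs(rows):
    """Count, per taxi, the runs of 2 or more consecutive rows with the same client."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (taxi TEXT, client TEXT, time INTEGER)")
    conn.executemany("INSERT INTO rides VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


runs = count_runs(RIDES)
for taxi, num_runs in runs:
    print(taxi, num_runs)
```

For Tom the runs are A (times 1-2), A (times 4-6) and B (times 7-8); the single B at time 3 is dropped by the HAVING clause, which gives the expected 3.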
Use a distinct count:
select taxi, count(distinct client)
from table_name
group by taxi
It seems your expected output is wrong
I don't see where you get the number 3 from. If you're trying to do what your question says and group by client in consecutive order only and then get the number of different groups, I can help you out with the following query. Bob has 1 group and Tom has 4.
Partition by taxi, ORDER BY taxi, time and check if this client matches the previous client for this taxi. If yes, do not count this row. If no, count this row, this is a new group.
SELECT FEE.taxi,
SUM(FEE.clientNotSameAsPreviousInSequence)
FROM
(
SELECT taxi,
CASE
WHEN PreviousClient IS NULL THEN
1
WHEN PreviousClient <> client THEN
1
ELSE
0
END AS clientNotSameAsPreviousInSequence
FROM
(
SELECT *,
LAG(client) OVER (PARTITION BY taxi ORDER BY taxi, time) AS PreviousClient
FROM table
) taxisWithPreviousClient
) FEE
GROUP BY FEE.taxi;

Get values and like values in SQL

I have a SQL table that contains the following
ID AccountNumber Name
1 12345 Tony
2 123456 Mike
3 123458 Mike
4 45689 Tom
5 666999 Tim
6 6669997 Lisa
7 44455 Tim
8 78901 Matt
9 789011 Roger
What I need to do is show all records where the account numbers begin with the same value (of indeterminate length). For example, in this table I'd want to select and display the following:
12345
123456
123458
666999
6669997
78901
789011
As you can see, it shows each row where the AccountNumber matches another one or begins with the same digits. I haven't been able to find the proper query and would love any help.
Thanks!
In the cases you mention, the longer account number always starts with the shorter one. Here is a query that will get the shortest match for each account number:
select AccountNumber
from (select a.*, count(*) over (partition by ShortestAN) as numAN
from (select a.*,
(select top 1 a2.AccountNumber
from accounts a2
where a.AccountNumber like a2.AccountNumber + '%'
order by len(a2.AccountNumber) asc
) as ShortestAN
from accounts a
) a
) a
where numAN > 1
order by ShortestAN, AccountNumber;
The subquery finds the shortest account number that matches. The rest is just returning the ones where there is more than one match.
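The shortest-prefix idea can be checked against the sample account numbers with Python's built-in sqlite3 module. This is a sketch under assumptions: account numbers are stored as text, the snake_case names are made up, SQLite's || and LIMIT 1 replace + and top 1, and a GROUP BY with HAVING stands in for the windowed count.

```python
import sqlite3

# Sample rows from the question: (id, account_number, name).
ACCOUNTS = [
    (1, "12345", "Tony"), (2, "123456", "Mike"), (3, "123458", "Mike"),
    (4, "45689", "Tom"), (5, "666999", "Tim"), (6, "6669997", "Lisa"),
    (7, "44455", "Tim"), (8, "78901", "Matt"), (9, "789011", "Roger"),
]

QUERY = """
WITH shortest AS (
    -- for every account, the shortest account number that is a prefix of it
    SELECT a.account_number,
           (SELECT a2.account_number
            FROM accounts a2
            WHERE a.account_number LIKE a2.account_number || '%'
            ORDER BY LENGTH(a2.account_number)
            LIMIT 1) AS shortest_an
    FROM accounts a
)
SELECT account_number
FROM shortest
-- keep only prefixes shared by more than one account
WHERE shortest_an IN (SELECT shortest_an
                      FROM shortest
                      GROUP BY shortest_an
                      HAVING COUNT(*) > 1)
ORDER BY shortest_an, account_number
"""


def accounts_sharing_prefix(rows):
    """Return account numbers that share a common starting account number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, account_number TEXT, name TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    result = [r[0] for r in conn.execute(QUERY)]
    conn.close()
    return result


shared = accounts_sharing_prefix(ACCOUNTS)
print(shared)
```

Every account number is a prefix of itself, so accounts with no partner end up in a group of one and are filtered out by the HAVING clause.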
select a1.ID, a1.AccountNumber, a1.Name,
a2.ID, a2.AccountNumber, a2.Name
from Accounts a1
join Accounts a2 on LEN(a1.AccountNumber) <= LEN(a2.AccountNumber) and SUBSTRING(a2.AccountNumber, 1, LEN(a1.AccountNumber)) = a1.AccountNumber
where /*are not same rows*/ a1.ID <> a2.ID
Wouldn't an ORDER BY work if the account numbers were ordered as strings?
SELECT AccountNumber, Id, Name
FROM Accounts
ORDER BY CAST(AccountNumber AS NVARCHAR(50))