Using MAX(Time) and MAX(Date) to get only the latest per group - sql

I want to get the latest highscore, per user, per game. My current query isn't working.
I have a SQL DB like the following:
player(string) game(string) score(int) Date(Date) time(Time)
jake soccer 20 2016/02/26 10:00:00
jake chess 50 2016/02/26 10:00:00
jake soccer 40 2016/02/26 13:00:00
jake chess 30 2016/02/26 13:00:00
jake soccer 20 2016/02/26 15:00:00
jake chess 60 2016/02/26 15:00:00
jake soccer 80 2016/02/26 18:00:00
jake chess 10 2016/02/26 18:00:00
mike chess 30 2016/02/26 13:00:00
mike soccer 20 2016/02/26 15:00:00
mike chess 60 2016/02/26 15:00:00
mike soccer 80 2016/02/26 18:00:00
mike chess 10 2016/02/26 18:00:00
What I want to get out of it is:
jake soccer 80 2016/02/26 18:00:00
jake chess 10 2016/02/26 18:00:00
mike soccer 80 2016/02/26 18:00:00
mike chess 10 2016/02/26 18:00:00
I found out the Time column also has the date, so this should work.
This is my current Query:
SELECT t1.*
FROM db t1
INNER JOIN (
SELECT player, MAX(time) TS
FROM db
GROUP BY player
) t2 ON t2.player = t1.player and t2.TS = t1.time
ORDER BY score DESC
EDIT: I'm getting lots of wrong rows. Basically, the rows are picked by time but not by date.
I now need to pick them not only by MAX(Time) but by MAX(Date) as well, or merge Date and Time into a single new value.

To get the latest highscore, per user, per game, try this:
;WITH cte as (
SELECT player, game, MAX(convert(datetime,cast([date] as nvarchar(10)) + ' '+ cast([time] as nvarchar(10)))) TS
FROM db
GROUP BY player, game)
SELECT db.*
FROM cte
LEFT JOIN db ON cte.player = db.player and cte.game = db.game and cte.TS = convert(datetime,cast(db.[date] as nvarchar(10)) + ' '+ cast(db.[time] as nvarchar(10)))
ORDER BY db.score DESC

Try using ROW_NUMBER()
SELECT
t1.*
FROM (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY player, game ORDER BY [date] DESC, [time] DESC) AS rn
FROM db
) AS t1
WHERE rn = 1
;
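
A minimal sketch of the ROW_NUMBER() approach, partitioned by both player and game so that one row per (player, game) survives. It runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, so the column names play_date and play_time, the ISO text dates, and the extra 2016-02-25 row are assumptions made for illustration. It also assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO text so they sort correctly.
ROWS = [
    ("jake", "soccer", 20, "2016-02-26", "10:00:00"),
    ("jake", "chess", 50, "2016-02-26", "10:00:00"),
    ("jake", "soccer", 40, "2016-02-26", "13:00:00"),
    ("jake", "chess", 30, "2016-02-26", "13:00:00"),
    ("jake", "soccer", 20, "2016-02-26", "15:00:00"),
    ("jake", "chess", 60, "2016-02-26", "15:00:00"),
    ("jake", "soccer", 80, "2016-02-26", "18:00:00"),
    ("jake", "chess", 10, "2016-02-26", "18:00:00"),
    ("mike", "chess", 30, "2016-02-26", "13:00:00"),
    ("mike", "soccer", 20, "2016-02-26", "15:00:00"),
    ("mike", "chess", 60, "2016-02-26", "15:00:00"),
    ("mike", "soccer", 80, "2016-02-26", "18:00:00"),
    ("mike", "chess", 10, "2016-02-26", "18:00:00"),
    # Hypothetical extra row: later time of day, but on an earlier date.
    ("jake", "soccer", 99, "2016-02-25", "23:00:00"),
]


def latest_scores(rows):
    """Return the most recent row for each (player, game) pair."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE db (player TEXT, game TEXT, score INT, "
        "play_date TEXT, play_time TEXT)"
    )
    con.executemany("INSERT INTO db VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT player, game, score, play_date, play_time
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY player, game
                       ORDER BY play_date DESC, play_time DESC
                   ) AS rn
            FROM db
        )
        WHERE rn = 1
        ORDER BY player, game
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Ordering by the date first and the time second is what keeps the 23:00:00 row from the previous day from winning over the 18:00:00 row of the latest day.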

Since you want not just the most recent game but also its score, this question can use the same patterns discussed here: Select first row in each GROUP BY group?
Personally I think picking the most recent time and then using that to match rows in the outer query is a little scary, since someone just might have two games at the same instant. Also it won't give you the best performance. Depending on your RDBMS, the linked question might have better approaches.

Related

Join sum to closest timestamp once up to interval cap

I am trying to join a site_interactions table with a store_transactions table. For each username, I want store_transactions.sales_amount to be attached to the closest preceding site_interactions.timestamp, at most once, and only if the transaction happened within 7 days of that timestamp.
site_interaction table:
username timestamp
John 01.01.2020 15:00:00
John 02.01.2020 11:30:00
Sarah 03.01.2020 12:00:00
store_transactions table:
username timestamp sales_amount
John 02.01.2020 16:00:00 45
John 03.01.2020 16:00:00 70
John 09.01.2020 16:00:00 15
Sarah 02.01.2020 09:00:00 35
Tim 02.01.2020 10:00:00 60
Desired output:
username timestamp sales_amount
John 01.01.2020 15:00:00 NULL
John 02.01.2020 11:30:00 115
Sarah 03.01.2020 12:00:00 NULL
Explanation:
John has 3 transactions in the store_transactions table. The first and second purchases were made within the 7-day limit, and the sum of these two transactions (45 + 70 = 115) was attached only once, to the closest match, i.e. John's second interaction (timestamp = 02.01.2020 11:30:00). John's third transaction was not attached to any site interaction because it falls outside the 7-day interval (taking the time of day into account).
Sarah has one transaction realized before her interaction with the site. Thus her sales_amount of 35 was not attached to the site_interaction table.
Last, Tim's transaction was not attached anywhere - because this username does not show in the site_interaction table.
Here a link of the tables: https://rextester.com/RKSUK73038
Thanks in advance!
Below is for BigQuery Standard SQL
#standardSQL
select i.username, i.timestamp,
sum(sales_amount) as sales_amount
from (
select username, timestamp,
ifnull(lead(timestamp) over(partition by username order by timestamp), timestamp_add(timestamp, interval 7 day)) next_timestamp
from `project.dataset.site_interaction`
) i
left join `project.dataset.store_transactions` t
on i.username = t.username
and t.timestamp >= i.timestamp
and t.timestamp < least(next_timestamp, timestamp_add(i.timestamp, interval 7 day))
group by username, timestamp
Applied to the sample data from your question, this returns the desired output: John's second interaction gets 115, and the other two rows get NULL.
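
The same windowing idea can be tried locally. The sketch below is a port to SQLite (via Python's sqlite3), not BigQuery: the column name ts, the ISO text timestamps, and the use of datetime(..., '+7 days') in place of timestamp_add are assumptions of this example. It needs SQLite 3.25 or newer for LEAD().

```python
import sqlite3

# Sample data from the question, with timestamps rewritten as ISO text.
INTERACTIONS = [
    ("John", "2020-01-01 15:00:00"),
    ("John", "2020-01-02 11:30:00"),
    ("Sarah", "2020-01-03 12:00:00"),
]
TRANSACTIONS = [
    ("John", "2020-01-02 16:00:00", 45),
    ("John", "2020-01-03 16:00:00", 70),
    ("John", "2020-01-09 16:00:00", 15),
    ("Sarah", "2020-01-02 09:00:00", 35),
    ("Tim", "2020-01-02 10:00:00", 60),
]


def attach_sales(interactions, transactions):
    """Sum each user's sales into the latest interaction at or before the sale, capped at 7 days."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE site_interaction (username TEXT, ts TEXT)")
    con.execute(
        "CREATE TABLE store_transactions (username TEXT, ts TEXT, sales_amount INT)"
    )
    con.executemany("INSERT INTO site_interaction VALUES (?, ?)", interactions)
    con.executemany("INSERT INTO store_transactions VALUES (?, ?, ?)", transactions)
    query = """
        SELECT i.username, i.ts, SUM(t.sales_amount) AS sales_amount
        FROM (
            SELECT username, ts,
                   LEAD(ts) OVER (PARTITION BY username ORDER BY ts) AS next_ts
            FROM site_interaction
        ) AS i
        LEFT JOIN store_transactions AS t
               ON t.username = i.username
              AND t.ts >= i.ts
              -- window ends at the next interaction or after 7 days, whichever is first
              AND t.ts < MIN(COALESCE(i.next_ts, datetime(i.ts, '+7 days')),
                             datetime(i.ts, '+7 days'))
        GROUP BY i.username, i.ts
        ORDER BY i.username, i.ts
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Each interaction owns the half-open window from its own timestamp up to the next interaction, which is why a transaction can be counted at most once.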

SQL - group on occurrence in x or y

I'm having a hard time making the following to work:
I have a list of transactions consisting of Sender,Recipient, Amount and Date.
Table: Transactions
Sender Recipient Amount Date
--------------------------------------------------
Jack Bob 52 2019-04-21 11:06:32
Bob Jack 12 2019-03-29 12:08:11
Bob Jill 50 2019-04-19 24:50:26
Jill Bob 90 2019-03-20 16:34:35
Jill Jack 81 2019-03-25 12:26:54
Bob Jenny 53 2019-04-20 09:07:02
Jack Jenny 5 2019-03-29 06:15:35
Now I want to list the people who have participated in transactions, how many transactions they have participated in and the dates of the first and last transaction they participated in :
Result
Person NUM_TX First_active last_active
------------------------------------------------------------------
Jack 4 2019-03-25 12:26:54 2019-04-21 11:06:32
Bob 5 xxxx-xx-xx xx:xx:xx xxxx-xx-xx xx:xx:xx
Jill 3 xxxx-xx-xx xx:xx:xx xxxx-xx-xx xx:xx:xx
Jenny 2 xxxx-xx-xx xx:xx:xx xxxx-xx-xx xx:xx:xx
Using a GROUP BY statement doesn't seem right. What is the right way to achieve my goal? I'm running on Postgres, by the way.
You need a UNION ALL to stack the two columns into a single person column in one result set, and then group by person:
select
t.person Person,
count(*) NUM_TX,
min(t.date) First_active,
max(t.date) Last_active
from (
select sender person, date from transactions
union all
select recipient person, date from transactions
) t
group by t.person
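
The UNION ALL approach can be checked against the question's sample data. This is a minimal sketch using an in-memory SQLite database through Python's sqlite3 rather than Postgres; the column name tx_date and the text-typed timestamps are assumptions of the example.

```python
import sqlite3

# Rows from the question, kept verbatim (including the odd 24:50:26 time,
# which is stored as plain text here and still sorts by its date part).
TRANSFERS = [
    ("Jack", "Bob", 52, "2019-04-21 11:06:32"),
    ("Bob", "Jack", 12, "2019-03-29 12:08:11"),
    ("Bob", "Jill", 50, "2019-04-19 24:50:26"),
    ("Jill", "Bob", 90, "2019-03-20 16:34:35"),
    ("Jill", "Jack", 81, "2019-03-25 12:26:54"),
    ("Bob", "Jenny", 53, "2019-04-20 09:07:02"),
    ("Jack", "Jenny", 5, "2019-03-29 06:15:35"),
]


def participation(rows):
    """Count transactions per person, whether they were sender or recipient."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transactions "
        "(sender TEXT, recipient TEXT, amount INT, tx_date TEXT)"
    )
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT person,
               COUNT(*)     AS num_tx,
               MIN(tx_date) AS first_active,
               MAX(tx_date) AS last_active
        FROM (
            -- stack senders and recipients into one column; ALL keeps duplicates
            SELECT sender AS person, tx_date FROM transactions
            UNION ALL
            SELECT recipient AS person, tx_date FROM transactions
        )
        GROUP BY person
        ORDER BY person
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

UNION ALL matters here: a plain UNION would drop duplicate (person, date) pairs and could undercount.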
This is a good place to use a lateral join:
select v.person, count(*) as num_transactions,
min(t.date) as first_date,
max(t.date) as last_date
from transactions t cross join lateral
(values (sender), (recipient)) v(person)
group by v.person;

How to have the rolling distinct count of each day for past three days in Oracle SQL?

I searched for this a lot, but I couldn't find the solution yet. let me explain my question by sample data and my desired output.
sample data:
datetime customer
---------- --------
2018-10-21 09:00 Ryan
2018-10-21 10:00 Sarah
2018-10-21 20:00 Sarah
2018-10-22 09:00 Peter
2018-10-22 10:00 Andy
2018-10-23 09:00 Sarah
2018-10-23 10:00 Peter
2018-10-24 10:00 Andy
2018-10-24 20:00 Andy
my desired output is the distinct number of customers over the past three days, relative to each day:
trunc(datetime) progressive count distinct customer
--------------- -----------------------------------
2018-10-21 2
2018-10-22 4
2018-10-23 4
2018-10-24 3
Explanation: for the 21st, because we have only Ryan and Sarah, the count is 2 (there are no records before the 21st). For the 22nd, Andy and Peter are added to the distinct list, so it's 4. For the 23rd, no new customer is added, so it stays at 4. For the 24th, however, since we should only consider the past 3 days (as per business logic), we take only the 24th, 23rd and 22nd; the distinct customers are Sarah, Andy and Peter, so the count is 3.
I believe this is called a progressive count, moving count, or rolling count, but I couldn't implement it in Oracle 11g SQL. It's obviously easy with PL/SQL programming (stored procedure/function), but I would prefer a single SQL query if possible.
What you seem to want is:
select date,
count(distinct customer) over (order by date rows between 2 preceding and current row)
from (select distinct trunc(datetime) as date, customer
from t
) t
group by date;
However, Oracle does not support window frames with count(distinct).
One rather brute force approach is a correlated subquery:
select dte,
(select count(distinct t2.customer)
from t t2
where t2.datetime >= t.dte - 2
and t2.datetime < t.dte + 1
) as running_3
from (select distinct trunc(datetime) as dte
from t
) t;
This should have reasonable performance for a small number of dates. As the number of dates increases, the performance will degrade linearly.
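
To see the correlated-subquery idea produce the numbers from the question, here is a small sketch that runs on SQLite through Python's sqlite3 instead of Oracle. The column name visited_at and the use of SQLite's date() function in place of trunc() are assumptions of this example. The window is bounded on both sides: it starts two days before each day and ends on that day.

```python
import sqlite3

# Sample data from the question.
VISITS = [
    ("2018-10-21 09:00", "Ryan"),
    ("2018-10-21 10:00", "Sarah"),
    ("2018-10-21 20:00", "Sarah"),
    ("2018-10-22 09:00", "Peter"),
    ("2018-10-22 10:00", "Andy"),
    ("2018-10-23 09:00", "Sarah"),
    ("2018-10-23 10:00", "Peter"),
    ("2018-10-24 10:00", "Andy"),
    ("2018-10-24 20:00", "Andy"),
]


def rolling_distinct(visits, days=3):
    """For each day, count distinct customers seen on that day and the days - 1 before it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (visited_at TEXT, customer TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", visits)
    query = """
        SELECT d.day,
               (SELECT COUNT(DISTINCT t2.customer)
                FROM t AS t2
                WHERE date(t2.visited_at) BETWEEN date(d.day, ?) AND d.day
               ) AS running
        FROM (SELECT DISTINCT date(visited_at) AS day FROM t) AS d
        ORDER BY d.day
    """
    # e.g. days=3 -> the modifier '-2 days' marks the start of the window
    result = con.execute(query, (f"-{days - 1} days",)).fetchall()
    con.close()
    return result
```

Calling rolling_distinct(VISITS) yields one (day, count) pair per distinct day in the data.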

Advanced Sql query solution required

player team start_date end_date points
John Jacob SportsBallers 2015-01-01 2015-03-31 100
John Jacob SportsKings 2015-04-01 2015-12-01 115
Joe Smith PointScorers 2014-01-01 2016-12-31 125
Bill Johnson SportsKings 2015-01-01 2015-06-31 175
Bill Johnson AllStarTeam 2015-07-01 2016-12-31 200
The above table has many more rows. I was asked the below questions in an interview.
1.)For each player, which team did they play for on 2015-01-01?
I could not answer this one.
2.)For each player, how can we get the team for whom they scored the most points?
select team from Players
where points in (select max(points) from players group by player).
Please, solutions for both.
1
select *
from PlayerTeams
where start_date <= '2015-01-01' and end_date >= '2015-01-01'
2
Select player, team, points
from (
Select *, row_number() over (partition by player order by points desc) as rn
From PlayerTeams) as ranked
where rn = 1
For #1:
Select Player
,Team
From table
Where '2015-01-01' between start_date and end_date
For #2:
select t.Player
,t.Team
from table t
inner join (select Player
,Max(points) as points
from table
group by Player) m
on t.Player = m.Player
and t.points = m.points
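
Both interview questions can be tried on the sample rows. This is a sketch on an in-memory SQLite database through Python's sqlite3; the table name player_teams is hypothetical, and the end date 2015-06-30 is an assumption standing in for the 2015-06-31 shown in the sample table, which is not a real calendar date.

```python
import sqlite3

STINTS = [
    ("John Jacob", "SportsBallers", "2015-01-01", "2015-03-31", 100),
    ("John Jacob", "SportsKings", "2015-04-01", "2015-12-01", 115),
    ("Joe Smith", "PointScorers", "2014-01-01", "2016-12-31", 125),
    # Assumed end date: the source table shows 2015-06-31, which does not exist.
    ("Bill Johnson", "SportsKings", "2015-01-01", "2015-06-30", 175),
    ("Bill Johnson", "AllStarTeam", "2015-07-01", "2016-12-31", 200),
]


def _connect(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE player_teams "
        "(player TEXT, team TEXT, start_date TEXT, end_date TEXT, points INT)"
    )
    con.executemany("INSERT INTO player_teams VALUES (?, ?, ?, ?, ?)", rows)
    return con


def team_on(rows, day):
    """Question 1: the team each player belonged to on the given ISO date."""
    con = _connect(rows)
    query = """
        SELECT player, team
        FROM player_teams
        WHERE ? BETWEEN start_date AND end_date
        ORDER BY player
    """
    result = con.execute(query, (day,)).fetchall()
    con.close()
    return result


def best_team(rows):
    """Question 2: the team for which each player scored the most points."""
    con = _connect(rows)
    query = """
        SELECT p.player, p.team, p.points
        FROM player_teams AS p
        JOIN (SELECT player, MAX(points) AS points
              FROM player_teams
              GROUP BY player) AS m
          ON m.player = p.player AND m.points = p.points
        ORDER BY p.player
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Note that the join-on-max form returns every tied row if a player has the same top score for two teams, whereas a row_number() filter would keep just one of them.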

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
On any given day, I need to find the highest number of fixes made on a given bay number, and if that highest count is repeated, each occurrence should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts of each, but I am struggling to get the max and keep repeated occurrences of the highest value. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
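
The steps above can be sketched on the question's data as follows. This is a variation rather than the exact query: it runs on SQLite through Python's sqlite3 instead of SQL Server 2008, it gets the per-day counts with GROUP BY instead of windowed COUNT plus DISTINCT, and the table name fixes, the snake_case column names, and the ISO text timestamps (read as day/month/year from the question) are assumptions. It needs SQLite 3.25 or newer for DENSE_RANK().

```python
import sqlite3

FIXES = [
    (1, "2015-05-04 16:15:00", "tyre change"),
    (1, "2015-05-12 00:15:00", "oil change"),
    (1, "2015-05-12 08:15:00", "engine tuning"),
    (1, "2016-05-04 08:11:00", "car tuning"),
    (2, "2015-05-13 19:30:00", "puncture"),
    (2, "2015-05-14 08:00:00", "light repair"),
    (2, "2015-05-15 10:30:00", "super op"),
    (2, "2015-05-20 12:30:00", "wiper change"),
    (2, "2016-05-12 09:30:00", "denting"),
    (2, "2016-05-12 10:30:00", "wiper repair"),
    (2, "2016-06-12 10:30:00", "exhaust repair"),
    (4, "2016-05-12 05:30:00", "stereo unlock"),
    (4, "2016-05-17 15:05:00", "door handle repair"),
]


def busiest_days(rows):
    """Per bay, return the day(s) with the highest fix count, keeping ties."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fixes (bay_no INT, fix_datetime TEXT, fix_type TEXT)")
    con.executemany("INSERT INTO fixes VALUES (?, ?, ?)", rows)
    query = """
        SELECT bay_no, first_fix, no_of_fixes
        FROM (
            SELECT bay_no, first_fix, no_of_fixes,
                   DENSE_RANK() OVER (
                       PARTITION BY bay_no
                       ORDER BY no_of_fixes DESC
                   ) AS rnk
            FROM (
                -- one row per bay per calendar day
                SELECT bay_no,
                       MIN(fix_datetime) AS first_fix,
                       COUNT(*)          AS no_of_fixes
                FROM fixes
                GROUP BY bay_no, date(fix_datetime)
            )
        )
        WHERE rnk = 1
        ORDER BY bay_no, first_fix
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Because DENSE_RANK() gives tied counts the same rank, bay 4 keeps both of its single-fix days.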
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
count(*) as numFixes,
rank() over (partition by cast(fixdatetime as date) order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.