Get latest date value based on month - sql

I have the following records, broken down by username, date and testscore.
Username date testscore
mike 2016-11-30 23:41:10.143 1
mike 2016-11-27 23:41:11.143 12
mike 2016-11-24 23:41:11.143 16
john 2016-11-28 23:41:11.143 7
john 2016-11-25 23:42:11.143 12
john 2016-11-25 23:42:11.143 7
mike 2016-10-30 23:41:10.143 1
mike 2016-10-27 23:41:11.143 5
mike 2016-10-24 23:41:11.143 16
john 2016-10-28 23:41:11.143 12
john 2016-10-25 23:42:11.143 8
john 2016-10-24 23:42:11.143 2
For each user, I would like to get the latest test score in each month of a given year. In other words, I want the last score per user per month.
So for the above data, the result would be:
username date testscore
mike 2016-11-30 23:41:10.143 1
john 2016-11-28 23:41:11.143 7
mike 2016-10-30 23:41:10.143 1
john 2016-10-28 23:41:11.143 12

Perhaps using the WITH TIES clause in concert with Row_Number()
Select top 1 with ties *
from YourTable
Order by Row_Number() over (partition by UserName,year(date),month(date) order by date desc)
Returns
Username date testscore
john 2016-10-28 23:41:11.143 12
john 2016-11-28 23:41:11.143 7
mike 2016-10-30 23:41:10.143 1
mike 2016-11-30 23:41:10.143 1

You can use ROW_NUMBER():
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY username, CONVERT(VARCHAR(6),[date],112)
ORDER BY [date] DESC)
FROM dbo.YourTable
)
SELECT *
FROM CTE
WHERE RN = 1;
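To try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch that runs the same idea against the question's sample rows using Python's built-in sqlite3 module. The table name scores is hypothetical, and strftime('%Y-%m', ...) stands in for the CONVERT(VARCHAR(6), [date], 112) month key because SQLite has no CONVERT. It assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
# The table name "scores" is made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (username TEXT, date TEXT, testscore INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("mike", "2016-11-30 23:41:10.143", 1),
        ("mike", "2016-11-27 23:41:11.143", 12),
        ("mike", "2016-11-24 23:41:11.143", 16),
        ("john", "2016-11-28 23:41:11.143", 7),
        ("john", "2016-11-25 23:42:11.143", 12),
        ("john", "2016-11-25 23:42:11.143", 7),
        ("mike", "2016-10-30 23:41:10.143", 1),
        ("mike", "2016-10-27 23:41:11.143", 5),
        ("mike", "2016-10-24 23:41:11.143", 16),
        ("john", "2016-10-28 23:41:11.143", 12),
        ("john", "2016-10-25 23:42:11.143", 8),
        ("john", "2016-10-24 23:42:11.143", 2),
    ],
)

# Number the rows of each (user, year-month) group from newest to oldest,
# then keep only the newest row of each group.
LATEST_PER_MONTH = """
WITH ranked AS (
    SELECT username, date, testscore,
           ROW_NUMBER() OVER (
               PARTITION BY username, strftime('%Y-%m', date)
               ORDER BY date DESC
           ) AS rn
    FROM scores
)
SELECT username, date, testscore
FROM ranked
WHERE rn = 1
ORDER BY date DESC
"""

latest_rows = conn.execute(LATEST_PER_MONTH).fetchall()
for row in latest_rows:
    print(row)
```

The final ORDER BY is only there to list the rows newest first, as in the question's expected result.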

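What about this? It returns the latest date per user per month; join the result back to the table on username and date to get the test score.
select username, year([date]) as yr, month([date]) as mo, max([date]) as latest_date
from <mytable> group by username, year([date]), month([date]) order by latest_date desc;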

It seems you need a primary key, but you can still get the job done.
Essentially, you want the row for each username that corresponds to the latest date for that user within each month.
SELECT username, date, testscore
FROM MyTable m1
WHERE m1.date = (SELECT MAX(m2.date)
FROM MyTable m2
WHERE m2.username = m1.username
AND YEAR(m2.date) = YEAR(m1.date)
AND MONTH(m2.date) = MONTH(m1.date))

Related

Join sum to closest timestamp once up to interval cap

I am trying to join a site_interaction table with a store_transactions table. For each username, I want store_transactions.sales_amount to be attached to the closest preceding site_interaction.timestamp, at most once, and only if the transaction falls within 7 days of that timestamp.
site_interaction table:
username timestamp
John 01.01.2020 15:00:00
John 02.01.2020 11:30:00
Sarah 03.01.2020 12:00:00
store_transactions table:
username timestamp sales_amount
John 02.01.2020 16:00:00 45
John 03.01.2020 16:00:00 70
John 09.01.2020 16:00:00 15
Sarah 02.01.2020 09:00:00 35
Tim 02.01.2020 10:00:00 60
Desired output:
username timestamp sales_amount
John 01.01.2020 15:00:00 NULL
John 02.01.2020 11:30:00 115
Sarah 03.01.2020 12:00:00 NULL
Explanation:
John has 3 transactions in the store_transactions table. The first and second purchases were made within the 7-day limit, so the sum of these two transactions (45 + 70 = 115) was attached only once, to the closest match, i.e. John's second interaction (timestamp = 02.01.2020 11:30:00). John's third transaction was not attached to any site interaction because it falls outside the 7-day interval (time of day included).
Sarah has one transaction, made before her interaction with the site. Thus her sales_amount of 35 was not attached to any site interaction.
Last, Tim's transaction was not attached anywhere, because this username does not appear in the site_interaction table.
Here is a link to the tables: https://rextester.com/RKSUK73038
Thanks in advance!
Below is for BigQuery Standard SQL
#standardSQL
select i.username, i.timestamp,
sum(sales_amount) as sales_amount
from (
select username, timestamp,
ifnull(lead(timestamp) over(partition by username order by timestamp), timestamp_add(timestamp, interval 7 day)) next_timestamp
from `project.dataset.site_interaction`
) i
left join `project.dataset.store_transactions` t
on i.username = t.username
and t.timestamp >= i.timestamp
and t.timestamp < least(next_timestamp, timestamp_add(i.timestamp, interval 7 day))
group by i.username, i.timestamp
Applied to the sample data from your question, this returns the desired output shown in the question.
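For readers without BigQuery access, the same windowing logic can be sketched in SQLite through Python's sqlite3 module. This is a translation under assumptions, not the answer's exact query: the timestamp column is renamed ts, the dates are rewritten in ISO format so that text comparison orders them correctly, datetime(ts, '+7 days') replaces timestamp_add, and SQLite's two-argument scalar min() replaces least(). It needs SQLite 3.25 or newer for LEAD().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_interaction (username TEXT, ts TEXT)")
conn.execute(
    "CREATE TABLE store_transactions (username TEXT, ts TEXT, sales_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO site_interaction VALUES (?, ?)",
    [
        ("John", "2020-01-01 15:00:00"),
        ("John", "2020-01-02 11:30:00"),
        ("Sarah", "2020-01-03 12:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO store_transactions VALUES (?, ?, ?)",
    [
        ("John", "2020-01-02 16:00:00", 45),
        ("John", "2020-01-03 16:00:00", 70),
        ("John", "2020-01-09 16:00:00", 15),
        ("Sarah", "2020-01-02 09:00:00", 35),
        ("Tim", "2020-01-02 10:00:00", 60),
    ],
)

# Each interaction owns the window [ts, min(next interaction, ts + 7 days)).
# A transaction is summed into the one interaction whose window contains it.
ATTACH_SALES = """
WITH i AS (
    SELECT username, ts,
           LEAD(ts) OVER (PARTITION BY username ORDER BY ts) AS next_ts
    FROM site_interaction
)
SELECT i.username, i.ts, SUM(t.sales_amount) AS sales_amount
FROM i
LEFT JOIN store_transactions t
       ON t.username = i.username
      AND t.ts >= i.ts
      AND t.ts < min(COALESCE(i.next_ts, datetime(i.ts, '+7 days')),
                     datetime(i.ts, '+7 days'))
GROUP BY i.username, i.ts
ORDER BY i.username, i.ts
"""

attached_rows = conn.execute(ATTACH_SALES).fetchall()
for row in attached_rows:
    print(row)
```

Because the windows of one user never overlap, each transaction can match at most one interaction, which is what enforces the "at most once" requirement.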

SQL - group on occurrence in x or y

I'm having a hard time getting the following to work:
I have a list of transactions consisting of Sender, Recipient, Amount and Date.
Table: Transactions
Sender Recipient Amount Date
--------------------------------------------------
Jack Bob 52 2019-04-21 11:06:32
Bob Jack 12 2019-03-29 12:08:11
Bob Jill 50 2019-04-19 24:50:26
Jill Bob 90 2019-03-20 16:34:35
Jill Jack 81 2019-03-25 12:26:54
Bob Jenny 53 2019-04-20 09:07:02
Jack Jenny 5 2019-03-29 06:15:35
Now I want to list the people who have participated in transactions, how many transactions each of them participated in, and the dates of their first and last transactions:
Result
Person NUM_TX First_active last_active
------------------------------------------------------------------
Jack 4 2019-03-25 12:26:54 2019-04-21 11:06:32
Bob 5 xxxx-xx-xx xx:xx:xx xxxx-xx-xx xx:xx:xx
Jill 3 xxxx-xx-xx xx:xx:xx xxxx-xx-xx xx:xx:xx
Jenny 2 xxxx-xx-xx xx:xx:xx xxxx-xx-xx xx:xx:xx
Using a GROUP BY statement doesn't seem right. What is the right way to achieve my goal? I'm running Postgres, by the way.
You need UNION ALL to stack the two columns into a single person column, and then group by person:
select
t.person Person,
count(*) NUM_TX,
min(t.date) First_active,
max(t.date) Last_active
from (
select sender person, date from transactions
union all
select recipient person, date from transactions
) t
group by t.person
This is a good place to use a lateral join:
select v.person, count(*) as num_transactions,
min(t.date) as first_date,
max(t.date) as last_date
from transactions t cross join lateral
(values (sender), (recipient)) v(person)
group by v.person;
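As a quick check of the unpivot-then-group idea, here is a small sketch that runs the UNION ALL form against the question's rows in SQLite via Python's sqlite3 module. SQLite is only a stand-in for Postgres here and does not support LATERAL, so only the UNION ALL variant is shown. Dates are stored as text, which is enough for MIN and MAX on this fixed-width format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (sender TEXT, recipient TEXT, amount INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("Jack", "Bob", 52, "2019-04-21 11:06:32"),
        ("Bob", "Jack", 12, "2019-03-29 12:08:11"),
        # Time kept verbatim from the question; as text it only affects ordering.
        ("Bob", "Jill", 50, "2019-04-19 24:50:26"),
        ("Jill", "Bob", 90, "2019-03-20 16:34:35"),
        ("Jill", "Jack", 81, "2019-03-25 12:26:54"),
        ("Bob", "Jenny", 53, "2019-04-20 09:07:02"),
        ("Jack", "Jenny", 5, "2019-03-29 06:15:35"),
    ],
)

# Stack senders and recipients into one "person" column (one row per
# participation), then aggregate per person.
PER_PERSON = """
SELECT person,
       COUNT(*)  AS num_tx,
       MIN(date) AS first_active,
       MAX(date) AS last_active
FROM (
    SELECT sender AS person, date FROM transactions
    UNION ALL
    SELECT recipient AS person, date FROM transactions
) AS t
GROUP BY person
ORDER BY person
"""

activity = {
    person: (num_tx, first_active, last_active)
    for person, num_tx, first_active, last_active in conn.execute(PER_PERSON)
}
for person, stats in activity.items():
    print(person, stats)
```

UNION ALL rather than UNION matters here: plain UNION would drop duplicate (person, date) rows and could undercount transactions.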

Change the result of RANK() based on conditions in other columns

I have a table in Redshift like this:
Table Project_team
Employee_ID Employee_Name Start_date Ranking Is_leader Is_Parttime_Staff
Emp001 John 2014-04-01 1 No No
Emp002 Mary 2015-02-01 2 No Yes
Emp003 Terry 2015-02-15 3 Yes No
Emp004 Peter 2016-02-05 4 No No
Emp004 Morris 2016-05-01 5 No No
Initially there is no ranking for staff.
What I do is use the rank() function like this:
RANK() over (partition by Employee_ID,Employee_Name order by Start_date) as page_seq
However, now I want to adjust the ranking based on each employee's status. If the employee is a leader, he or she should be ranked first. If he or she is part-time staff, he or she should be ranked last. The table should look something like this:
Employee_ID Employee_Name Start_date Ranking Is_leader Is_Parttime_Staff
Emp003 Terry 2015-02-15 1 Yes No
Emp001 John 2014-04-01 2 No No
Emp004 Peter 2016-02-05 3 No No
Emp004 Morris 2016-05-01 4 No No
Emp002 Mary 2015-02-01 5 No Yes
I tried to use a CASE expression to adjust it, like this:
Case when Is_leader = true then Ranking = 1 else RANK() over (partition by Employee_ID,Employee_Name order by Start_date) End as page_seq.
However, it does not work.
How can I change the ranking based on conditions in other columns?
Many thanks!
Use dense_rank(). A demo on a simplified table:
select *,dense_rank() over(order by case when leader='yes' then 1 else 0 end desc, case when parmanent='yes' then 1 else 0 end) as employeerank
from cte1
output:
id name leader parmanent employeerank
1 A yes no 1
3 C no no 2
2 B no yes 3
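Note that dense_rank() with only the two CASE keys gives every regular (non-leader, full-time) employee the same rank. If the goal is the exact 1 to 5 ordering from the question, one possible variation is to collapse the status into a single sort key and break ties by Start_date with ROW_NUMBER(). The sketch below is an assumption about the intended tie-breaking, not part of the answer above, and runs on SQLite (3.25 or newer) through Python's sqlite3 module rather than on Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE project_team (
           employee_id TEXT, employee_name TEXT, start_date TEXT,
           is_leader TEXT, is_parttime_staff TEXT)"""
)
conn.executemany(
    "INSERT INTO project_team VALUES (?, ?, ?, ?, ?)",
    [
        # Rows copied from the question, including the repeated Emp004 id.
        ("Emp001", "John", "2014-04-01", "No", "No"),
        ("Emp002", "Mary", "2015-02-01", "No", "Yes"),
        ("Emp003", "Terry", "2015-02-15", "Yes", "No"),
        ("Emp004", "Peter", "2016-02-05", "No", "No"),
        ("Emp004", "Morris", "2016-05-01", "No", "No"),
    ],
)

# Sort key: leaders (0) first, regular staff (1) next, part-time staff (2)
# last; within each group the earlier start date ranks higher.
RANKED_TEAM = """
SELECT employee_name,
       ROW_NUMBER() OVER (
           ORDER BY CASE
                        WHEN is_leader = 'Yes' THEN 0
                        WHEN is_parttime_staff = 'Yes' THEN 2
                        ELSE 1
                    END,
                    start_date
       ) AS ranking
FROM project_team
ORDER BY ranking
"""

team_ranking = conn.execute(RANKED_TEAM).fetchall()
for name, ranking in team_ranking:
    print(ranking, name)
```

The window has no PARTITION BY on purpose: partitioning by Employee_ID and Employee_Name, as in the original attempt, puts almost every row in its own partition and so ranks it 1.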

Display the latest modified record for each employee

The emp table looks like this:
id name date_modified
1 Ram 2017-01-05
2 Kishore 2017-02-04
3 John 2017-04-22
1 Ram K 2017-04-25
1 Ram Kumar 2017-05-01
2 Kishore Babu 2017-05-05
3 John B 2017-06-01
Assuming you're using a reasonable rdbms that supports window functions, row_number should do the trick:
SELECT id, name, date_modified
FROM (SELECT id, name, date_modified,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_modified DESC) rn
FROM emp) t
WHERE rn = 1

Aggregate Functions To Pull More Record Field Data

I would like to know what would be the best way to get the data from a specific row when I use a Group By query. The real query is more complex than the example I'm providing here so I'm looking for something other than a sub-select on the Sales table.
I'm using MSSQL 2008 and I would like something that allows me to get the date field from the Sales record that has the max(amount).
Query
select uid, firstName, lastName, AmountFromTagetedRow, DateFromTargetedRow
from users u inner join
sales s on u.uid = s.custID
group by uid, firstName, lastName
order by uid
USERS
uid firstName lastName
1 Sam Smith
2 Joe Brown
3 Kim Young
SALES
sid Custid date amount ...
1 1 2016-01-02 100
2 3 2016-01-12 485
3 1 2016-01-22 152
4 2 2016-02-01 156
5 1 2016-02-02 12
6 1 2016-03-05 84
7 2 2016-03-10 68
RESULTS
uid firstName LastName amount date
1 Sam Smith 152 2016-01-22
2 Joe Brown 156 2016-02-01
3 Kim Young 485 2016-01-12
Your posted query doesn't match your expected output, but something like this should point you in the right direction.
with SortedResults as
(
select u.uid
, u.firstName
, u.lastName
, s.amount
, s.[date]
-- number each user's sales from highest to lowest amount
, ROW_NUMBER() over (partition by u.uid order by s.amount desc) as RowNum
from users u inner join
sales s on u.uid = s.custID
)
select *
from SortedResults
where RowNum = 1
order by uid