How to display the oldest date for a unique user who has multiple dates in a database? - sql

Let's say that my output looks like this (simplified example):
| UserName | ProfileCreation | PurchasePrice | PurchaseDate |
|:---|:---|:---|:---|
| Alice | Dec 21 2019 6:00AM | 120.00 | Dec 21 2019 8:00AM |
| Alice | Dec 21 2019 6:00AM | 90.00 | Dec 25 2019 9:00AM |
| Alice | Dec 21 2019 6:00AM | 150.00 | Jan 02 2020 10:00AM |
| Bob | Jan 01 2020 9:00PM | 50.00 | Jan 03 2020 11:00PM |
| Bob | Jan 01 2020 9:00PM | 70.00 | Jan 07 2020 11:00PM |
The code for this output would look like this, I guess (not that important):
SELECT
UserName, ProfileCreation, PurchasePrice, PurchaseDate
FROM Some_Random_Database
But my desired output should look like this:
| UserName | ProfileCreation | PurchasePrice | FirstPurchaseDate | NumberOfPurchases | AvgOfPurchasePrice |
|:---|:---|:---|:---|:---|:---|
| Alice | Dec 21 2019 | 120.00 | Dec 21 2019 | 3 | 120.00 |
| Bob | Jan 01 2020 | 50.00 | Jan 03 2020 | 2 | 60.00 |
Hopefully my goal is clear: one row per user, showing the date of his or her oldest purchase plus some metrics calculated over all of that user's purchases. The price of the first purchase can stay, but it is not necessary.
I'm writing in SOQL dialect - Salesforce Marketing Cloud.
I have some ideas about how to make some of these changes in my code, but I'd like to see a solution from an expert who is willing to show me the best way to do it. I'm really just a noob :-)
I appreciate any help, guys!

Note: I know nothing about Salesforce Marketing Cloud, but...
There are a few ways to achieve that:
#1 - standard sql
SELECT UserName, ProfileCreation
, MIN(PurchaseDate) FirstPurchaseDate
, COUNT(PurchasePrice) NoOfPurchases
, AVG(PurchasePrice) AvgPurchasePrice
FROM Foo
GROUP BY UserName, ProfileCreation;
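To try the GROUP BY query without a Marketing Cloud account, the sketch below runs it against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in engine here, and the ISO-8601 date strings are an assumption: with text dates like `Dec 21 2019 8:00AM`, `MIN` compares alphabetically, not chronologically.

```python
import sqlite3

# Sample rows from the question. The ISO-8601 date strings are an assumption:
# with text such as "Dec 21 2019 8:00AM", MIN() would compare alphabetically.
ROWS = [
    ("Alice", "2019-12-21 06:00", 120.00, "2019-12-21 08:00"),
    ("Alice", "2019-12-21 06:00", 90.00, "2019-12-25 09:00"),
    ("Alice", "2019-12-21 06:00", 150.00, "2020-01-02 10:00"),
    ("Bob", "2020-01-01 21:00", 50.00, "2020-01-03 23:00"),
    ("Bob", "2020-01-01 21:00", 70.00, "2020-01-07 23:00"),
]

# The GROUP BY query from answer #1; ORDER BY is added only for stable output.
QUERY = """
SELECT UserName, ProfileCreation
     , MIN(PurchaseDate)    AS FirstPurchaseDate
     , COUNT(PurchasePrice) AS NoOfPurchases
     , AVG(PurchasePrice)   AS AvgPurchasePrice
FROM Foo
GROUP BY UserName, ProfileCreation
ORDER BY UserName;
"""


def first_purchase_summary():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Foo (UserName TEXT, ProfileCreation TEXT, "
            "PurchasePrice REAL, PurchaseDate TEXT)"
        )
        conn.executemany("INSERT INTO Foo VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_purchase_summary():
        print(row)
```

With real DATE/DATETIME columns the sorting concern disappears, since MIN then compares actual date values.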
#2 - window functions
SELECT DISTINCT UserName, ProfileCreation
, MIN(PurchaseDate) OVER(PARTITION BY UserName) FirstPurchaseDate
, COUNT(PurchasePrice) OVER(PARTITION BY UserName) NoOfPurchases
, AVG(PurchasePrice) OVER(PARTITION BY UserName) AvgPurchasePrice
FROM Foo;
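The window-function version can be checked the same way, again with SQLite via Python's `sqlite3` as a stand-in (SQLite 3.25 or later is needed for `OVER`). As an illustrative extra that answer #2 does not include, the sketch also uses `FIRST_VALUE(PurchasePrice) OVER (PARTITION BY UserName ORDER BY PurchaseDate)` to fill the optional first-purchase price column from the question. The sample rows use ISO-8601 dates, an assumption made so that text dates sort chronologically.

```python
import sqlite3

# Sample rows from the question, with ISO-8601 dates (an assumption).
ROWS = [
    ("Alice", "2019-12-21 06:00", 120.00, "2019-12-21 08:00"),
    ("Alice", "2019-12-21 06:00", 90.00, "2019-12-25 09:00"),
    ("Alice", "2019-12-21 06:00", 150.00, "2020-01-02 10:00"),
    ("Bob", "2020-01-01 21:00", 50.00, "2020-01-03 23:00"),
    ("Bob", "2020-01-01 21:00", 70.00, "2020-01-07 23:00"),
]

# Each window aggregate is computed per UserName partition and repeated on
# every row; DISTINCT then collapses those identical rows to one per user.
# FIRST_VALUE ordered by PurchaseDate picks the price of the earliest purchase.
QUERY = """
SELECT DISTINCT UserName, ProfileCreation
     , FIRST_VALUE(PurchasePrice)
           OVER (PARTITION BY UserName ORDER BY PurchaseDate) AS FirstPurchasePrice
     , MIN(PurchaseDate)    OVER (PARTITION BY UserName) AS FirstPurchaseDate
     , COUNT(PurchasePrice) OVER (PARTITION BY UserName) AS NoOfPurchases
     , AVG(PurchasePrice)   OVER (PARTITION BY UserName) AS AvgPurchasePrice
FROM Foo
ORDER BY UserName;
"""


def first_purchase_window():
    """Run the window-function query against an in-memory copy of the data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Foo (UserName TEXT, ProfileCreation TEXT, "
            "PurchasePrice REAL, PurchaseDate TEXT)"
        )
        conn.executemany("INSERT INTO Foo VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_purchase_window():
        print(row)
```

DISTINCT is what reduces the output to one row per user: every row in a partition carries identical window values, so the duplicates collapse.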

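Alternatively, to return each user's entire first-purchase row (including its PurchasePrice), keep only the rows whose PurchaseDate is that user's earliest one:
SELECT
UserName, ProfileCreation, PurchasePrice, PurchaseDate
FROM
Some_Random_Database
WHERE
(UserName, PurchaseDate) IN
(SELECT UserName, min(PurchaseDate) FROM Some_Random_Database GROUP BY UserName);
The row-value form (UserName, PurchaseDate) IN (...) works in engines such as MySQL, PostgreSQL and Oracle, but SQL Server (T-SQL) does not support it; there, join to the grouped subquery instead.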
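Here is a sketch of the row-value IN filter using MIN(PurchaseDate), which keeps each user's oldest purchase row. It runs on SQLite via Python's `sqlite3` as a stand-in engine (SQLite has accepted row values in `IN` since version 3.15). The table name `Some_Random_Database` comes from the question; the ISO-8601 dates are an assumption so that `MIN` sorts chronologically.

```python
import sqlite3

# Sample rows from the question, with ISO-8601 dates (an assumption).
ROWS = [
    ("Alice", "2019-12-21 06:00", 120.00, "2019-12-21 08:00"),
    ("Alice", "2019-12-21 06:00", 90.00, "2019-12-25 09:00"),
    ("Alice", "2019-12-21 06:00", 150.00, "2020-01-02 10:00"),
    ("Bob", "2020-01-01 21:00", 50.00, "2020-01-03 23:00"),
    ("Bob", "2020-01-01 21:00", 70.00, "2020-01-07 23:00"),
]

# Keep a row only if its (UserName, PurchaseDate) pair matches that user's
# earliest purchase. ORDER BY is added only for stable output.
QUERY = """
SELECT UserName, ProfileCreation, PurchasePrice, PurchaseDate
FROM Some_Random_Database
WHERE (UserName, PurchaseDate) IN
      (SELECT UserName, MIN(PurchaseDate)
       FROM Some_Random_Database
       GROUP BY UserName)
ORDER BY UserName;
"""


def first_purchase_rows():
    """Return each user's full first-purchase row from an in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Some_Random_Database (UserName TEXT, "
            "ProfileCreation TEXT, PurchasePrice REAL, PurchaseDate TEXT)"
        )
        conn.executemany(
            "INSERT INTO Some_Random_Database VALUES (?, ?, ?, ?)", ROWS
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_purchase_rows():
        print(row)
```

If a user has two purchases at exactly the same earliest timestamp, both rows are returned.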


How to get most recent balance for every user and its corresponding dates

I have a table called balances. I want to get the most recent balance for each user, for every financial year, along with the date it was updated.
| name | balance | financial_year | date_updated |
|:---|:---|:---|:---|
| Bob | 20 | 2021 | 2021-04-03 |
| Bob | 58 | 2019 | 2019-11-13 |
| Bob | 43 | 2019 | 2022-01-24 |
| Bob | -4 | 2019 | 2019-12-04 |
| James | 92 | 2021 | 2021-09-11 |
| James | 86 | 2021 | 2021-08-18 |
| James | 33 | 2019 | 2019-03-24 |
| James | 46 | 2019 | 2019-02-12 |
| James | 59 | 2019 | 2019-08-12 |
So my desired output would be:
| name | balance | financial_year | date_updated |
|:---|:---|:---|:---|
| Bob | 20 | 2021 | 2021-04-03 |
| Bob | 43 | 2019 | 2022-01-24 |
| James | 92 | 2021 | 2021-09-11 |
| James | 59 | 2019 | 2019-08-12 |
I've attempted the following, but found that using max() sometimes does not work, since I use it across multiple columns:
SELECT name, max(balance), financial_year, max(date_updated)
FROM balances
group by name, financial_year
select NAME
,BALANCE
,FINANCIAL_YEAR
,DATE_UPDATED
from (
select t.*
,row_number() over(partition by name, financial_year order by date_updated desc) as rn
from balances t
) t
where rn = 1
| NAME | BALANCE | FINANCIAL_YEAR | DATE_UPDATED |
|:---|:---|:---|:---|
| Bob | 43 | 2019 | 24-JAN-22 |
| Bob | 20 | 2021 | 03-APR-21 |
| James | 59 | 2019 | 12-AUG-19 |
| James | 92 | 2021 | 11-SEP-21 |
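The ROW_NUMBER approach can be reproduced with SQLite through Python's `sqlite3` module as a stand-in engine (SQLite 3.25 or later for window functions). The rows are the question's sample data in a table named `balances`; the outer `ORDER BY` is an addition so the output order is stable. Apart from date formatting, the result matches the table above.

```python
import sqlite3

# Sample rows from the question: (name, balance, financial_year, date_updated).
BALANCES = [
    ("Bob", 20, 2021, "2021-04-03"),
    ("Bob", 58, 2019, "2019-11-13"),
    ("Bob", 43, 2019, "2022-01-24"),
    ("Bob", -4, 2019, "2019-12-04"),
    ("James", 92, 2021, "2021-09-11"),
    ("James", 86, 2021, "2021-08-18"),
    ("James", 33, 2019, "2019-03-24"),
    ("James", 46, 2019, "2019-02-12"),
    ("James", 59, 2019, "2019-08-12"),
]

# ROW_NUMBER() numbers the rows of each (name, financial_year) group from the
# newest date_updated downwards; rn = 1 keeps only the newest row per group.
QUERY = """
SELECT name, balance, financial_year, date_updated
FROM (
    SELECT b.*,
           ROW_NUMBER() OVER (PARTITION BY name, financial_year
                              ORDER BY date_updated DESC) AS rn
    FROM balances b
) t
WHERE rn = 1
ORDER BY name, financial_year;
"""


def latest_balances():
    """Return the most recent balance per user and financial year."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE balances (name TEXT, balance INTEGER, "
            "financial_year INTEGER, date_updated TEXT)"
        )
        conn.executemany("INSERT INTO balances VALUES (?, ?, ?, ?)", BALANCES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_balances():
        print(row)
```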
The problem is not that you use max() across multiple columns, but that max() returns the maximum value of each column independently. In your example, Bob's highest balance in financial year 2019 was 58. The 'highest' (last) date_updated was 2022-01-24, but at that time the balance was 43.
What you're looking for is the balance at the time it was last updated within a financial year, per user, which you can get with something like:
SELECT b.name, b.financial_year, b.balance, b.date_updated
FROM balances b
INNER JOIN (SELECT name, financial_year, max(date_updated) last_updated
FROM balances GROUP BY name, financial_year) u
ON b.name = u.name AND b.financial_year = u.financial_year AND b.date_updated = u.last_updated;
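To see the difference concretely, this sketch runs the question's GROUP BY/max() attempt and the join above side by side on the sample data, using SQLite via Python's `sqlite3` as a stand-in engine. The `ORDER BY` clauses are additions for stable output.

```python
import sqlite3

# Sample rows from the question: (name, balance, financial_year, date_updated).
BALANCES = [
    ("Bob", 20, 2021, "2021-04-03"),
    ("Bob", 58, 2019, "2019-11-13"),
    ("Bob", 43, 2019, "2022-01-24"),
    ("Bob", -4, 2019, "2019-12-04"),
    ("James", 92, 2021, "2021-09-11"),
    ("James", 86, 2021, "2021-08-18"),
    ("James", 33, 2019, "2019-03-24"),
    ("James", 46, 2019, "2019-02-12"),
    ("James", 59, 2019, "2019-08-12"),
]

# The question's attempt: each MAX() is taken independently, so the balance
# and the date can come from different rows.
NAIVE = """
SELECT name, MAX(balance), financial_year, MAX(date_updated)
FROM balances
GROUP BY name, financial_year
ORDER BY name, financial_year;
"""

# The answer's join: find the last date per group, then fetch that exact row.
JOINED = """
SELECT b.name, b.financial_year, b.balance, b.date_updated
FROM balances b
INNER JOIN (SELECT name, financial_year, MAX(date_updated) AS last_updated
            FROM balances GROUP BY name, financial_year) u
  ON b.name = u.name
 AND b.financial_year = u.financial_year
 AND b.date_updated = u.last_updated
ORDER BY b.name, b.financial_year;
"""


def compare_queries():
    """Return (naive_rows, joined_rows) for the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE balances (name TEXT, balance INTEGER, "
            "financial_year INTEGER, date_updated TEXT)"
        )
        conn.executemany("INSERT INTO balances VALUES (?, ?, ?, ?)", BALANCES)
        return conn.execute(NAIVE).fetchall(), conn.execute(JOINED).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    naive, joined = compare_queries()
    print("naive: ", naive)
    print("joined:", joined)
```

If two rows in a group share the latest date_updated, the join returns both of them, whereas a ROW_NUMBER-based filter keeps exactly one.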

SQL: The second oldest date

Imagine you've got a table similar to this:
|email | purchase_date |
|:--------------|:---------------------|
|stan#gmail.com | Jun 30 2020 12:00AM |
|stan#gmail.com | Aug 05 2020 5:00PM |
|stan#gmail.com | Mar 22 2018 3:00AM |
|eric#yahoo.com | Aug 05 2020 5:00PM |
|eric#yahoo.com | Mar 22 2018 3:00PM |
|kyle#gmail.com | Mar 22 2018 3:00PM |
|kyle#gmail.com | Jun 30 2020 12:00AM |
|kyle#gmail.com | Aug 05 2020 5:00PM |
|kenny#gmail.com| Aug 05 2020 5:00PM |
Totally random. The actual database I work with is more complex, with many more columns.
Both columns are of STRING type, which is not convenient; the purchase date should be a DATE type. Kenny made only one purchase, so there shouldn't be any row for him in the result table.
Also notice that there are a lot of identical dates.
I'd like to select the email and the 2nd oldest purchase date (named as 'second_purchase') for each email address, so that the result looks like this:
|email | second_purchase |
|:--------------|:-------------------- |
|stan#gmail.com | Jun 30 2020 12:00AM |
|eric#yahoo.com | Aug 05 2020 5:00PM |
|kyle#gmail.com | Jun 30 2020 12:00AM |
I can't seem to get the logic or syntax right. I don't want to post all my code here, because I've tried many variations of my idea...
None of them seemed to work. But I'd love to see example code from someone skilled in SQL. My idea is maybe not that great :-)
This version is actually SOQL (Salesforce Object Query Language). That could be important.
Sorry for not styling the table properly; it didn't work, even when I used the recommended styling, and I wasn't able to post. That was actually quite frustrating.
Anyway, thank you for any help!
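You could try the following SQL, which uses DENSE_RANK partitioned by each user's email and ordered by purchase_date cast to a timestamp, so that the string dates sort chronologically rather than alphabetically. (If your engine is SQL Server/T-SQL, cast to datetime instead: there, timestamp is a synonym for rowversion, not a date/time type.)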
Query #1
WITH date_converted_table AS (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
)
SELECT
email,
purchase_date as second_purchase
FROM
date_converted_table
WHERE dr=2;
| email | second_purchase |
|:---|:---|
| eric#yahoo.com | Aug 05 2020 5:00PM |
| kyle#gmail.com | Jun 30 2020 12:00AM |
| stan#gmail.com | Jun 30 2020 12:00AM |
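Here is a runnable sketch of Query #1 using SQLite through Python's `sqlite3` module as a stand-in engine. Two assumptions: the dates are stored as ISO-8601 text (which sorts chronologically, so no CAST is needed), and one hypothetical duplicate purchase for stan is added to show why DENSE_RANK rather than ROW_NUMBER is used when identical dates occur. The `rank_function` parameter lets you switch between the two for comparison.

```python
import sqlite3

# Purchases from the question with ISO-8601 timestamps (an assumption; in the
# question they are strings like "Jun 30 2020 12:00AM"). The last stan row is a
# HYPOTHETICAL duplicate added to show how identical dates are ranked.
PURCHASES = [
    ("stan#gmail.com", "2020-06-30 00:00"),
    ("stan#gmail.com", "2020-08-05 17:00"),
    ("stan#gmail.com", "2018-03-22 03:00"),
    ("eric#yahoo.com", "2020-08-05 17:00"),
    ("eric#yahoo.com", "2018-03-22 15:00"),
    ("kyle#gmail.com", "2018-03-22 15:00"),
    ("kyle#gmail.com", "2020-06-30 00:00"),
    ("kyle#gmail.com", "2020-08-05 17:00"),
    ("kenny#gmail.com", "2020-08-05 17:00"),
    ("stan#gmail.com", "2018-03-22 03:00"),  # hypothetical duplicate
]

# Rank each email's purchases oldest-first and keep rank 2.
QUERY_TEMPLATE = """
WITH ranked AS (
    SELECT email, purchase_date,
           {rank_function}() OVER (PARTITION BY email
                                   ORDER BY purchase_date ASC) AS dr
    FROM mytable
)
SELECT email, purchase_date AS second_purchase
FROM ranked
WHERE dr = 2
ORDER BY email;
"""


def second_purchases(rank_function="DENSE_RANK"):
    """Return (email, second_purchase) rows using the given ranking function."""
    if rank_function not in ("DENSE_RANK", "ROW_NUMBER"):
        raise ValueError("rank_function must be DENSE_RANK or ROW_NUMBER")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (email TEXT, purchase_date TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", PURCHASES)
        query = QUERY_TEMPLATE.format(rank_function=rank_function)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("DENSE_RANK:", second_purchases("DENSE_RANK"))
    print("ROW_NUMBER:", second_purchases("ROW_NUMBER"))
```

With the duplicate row, ROW_NUMBER returns stan's repeated oldest date as the "second" purchase, while DENSE_RANK skips to the next distinct date. Conversely, if an email has two purchases on its second-oldest date, dr = 2 matches both rows; add DISTINCT to the outer SELECT if that matters. Kenny has no rank-2 row, so he is absent in both cases.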
Query #2
SELECT
email,
purchase_date as second_purchase
FROM (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
) tb
WHERE dr=2;
| email | second_purchase |
|:---|:---|
| eric#yahoo.com | Aug 05 2020 5:00PM |
| kyle#gmail.com | Jun 30 2020 12:00AM |
| stan#gmail.com | Jun 30 2020 12:00AM |
Update 1
Regarding the follow-up question in the comments:
Is it possible to upgrade the result so that there are first_purchase dates (where dr=1) and second_purchase dates (where dr=2) in separate columns?
A CASE expression combined with aggregation can do this, as shown below. The HAVING clause ensures that there is both a first and a second purchase date.
SELECT
email,
MAX(CASE
WHEN dr=1 THEN purchase_date
END) as first_purchase,
MAX(CASE
WHEN dr=2 THEN purchase_date
END) as second_purchase
FROM (
SELECT
email,
purchase_date,
DENSE_RANK() OVER (
PARTITION BY email
ORDER BY CAST(purchase_date as timestamp) ASC
) dr
FROM
mytable
) tb
GROUP BY email
HAVING
SUM(
CASE WHEN dr=1 THEN 1 ELSE 0 END
) > 0 AND
SUM(
CASE WHEN dr=2 THEN 1 ELSE 0 END
) > 0;
| email | first_purchase | second_purchase |
|:---|:---|:---|
| eric#yahoo.com | Mar 22 2018 3:00PM | Aug 05 2020 5:00PM |
| kyle#gmail.com | Mar 22 2018 3:00PM | Jun 30 2020 12:00AM |
| stan#gmail.com | Mar 22 2018 3:00AM | Jun 30 2020 12:00AM |
Let me know if this works for you.
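The pivoted version can be checked with a sketch on SQLite via Python's `sqlite3` as a stand-in engine, with ISO-8601 dates assumed in place of the original strings so that ordering is chronological without a CAST.

```python
import sqlite3

# Purchases from the question with ISO-8601 timestamps (an assumption).
PURCHASES = [
    ("stan#gmail.com", "2020-06-30 00:00"),
    ("stan#gmail.com", "2020-08-05 17:00"),
    ("stan#gmail.com", "2018-03-22 03:00"),
    ("eric#yahoo.com", "2020-08-05 17:00"),
    ("eric#yahoo.com", "2018-03-22 15:00"),
    ("kyle#gmail.com", "2018-03-22 15:00"),
    ("kyle#gmail.com", "2020-06-30 00:00"),
    ("kyle#gmail.com", "2020-08-05 17:00"),
    ("kenny#gmail.com", "2020-08-05 17:00"),
]

# CASE picks the date only on rank-1 / rank-2 rows (NULL elsewhere), and MAX
# collapses each email's group to one value per column.
QUERY = """
SELECT email,
       MAX(CASE WHEN dr = 1 THEN purchase_date END) AS first_purchase,
       MAX(CASE WHEN dr = 2 THEN purchase_date END) AS second_purchase
FROM (
    SELECT email, purchase_date,
           DENSE_RANK() OVER (PARTITION BY email
                              ORDER BY purchase_date ASC) AS dr
    FROM mytable
) tb
GROUP BY email
HAVING SUM(CASE WHEN dr = 1 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN dr = 2 THEN 1 ELSE 0 END) > 0
ORDER BY email;
"""


def first_and_second_purchases():
    """Return (email, first_purchase, second_purchase) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (email TEXT, purchase_date TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", PURCHASES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_and_second_purchases():
        print(row)
```

Because DENSE_RANK always assigns 1 to each email's earliest date, the dr=1 condition in the HAVING clause is always satisfied; the dr=2 condition is what removes Kenny.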

SQL find rows in groups where a column has a null and a non-null value

The Data
row ID YEAR PROD STA DATE
01 01 2011 APPLE NEW 2011-11-18 00:00:00.000
02 01 2011 APPLE NEW 2011-11-18 00:00:00.000
03 01 2013 APPLE OLD NULL
04 01 2013 APPLE OLD NULL
05 02 2013 APPLE OLD 2014-04-08 00:00:00.000
06 02 2013 APPLE OLD 2014-04-08 00:00:00.000
07 02 2013 APPLE OLD 2014-11-17 10:50:14.113
08 02 2013 APPLE OLD 2014-11-17 10:46:04.947
09 02 2013 MELON OLD 2014-11-17 11:01:19.657
10 02 2013 MELON OLD 2014-11-17 11:19:35.547
11 02 2013 MELON OLD NULL
12 02 2013 MELON OLD 2014-11-21 10:32:36.017
13 03 2006 APPLE NEW 2007-04-11 00:00:00.000
14 03 2006 APPLE NEW 2007-04-11 00:00:00.000
15 04 2004 APPLE OTH 2004-09-27 00:00:00.000
16 04 2004 APPLE OTH NULL
ROW is not a column in the table; it is just there to show which records I want.
The question
I need to find rows where a group consisting of (ID, YEAR, PROD, STA) has at least one NULL DATE and a non-NULL DATE.
Expected result
From the above dataset this would be rows 9 to 12 and 15 to 16
I'm sitting in front of SSMS and have no idea how to get this. I was thinking about GROUP BY and EXISTS, but I really have no idea.
You can use COUNT ... OVER:
SELECT ID, YEAR, PROD, STA, [DATE]
FROM (
SELECT ID, YEAR, PROD, STA, [DATE],
COUNT(IIF([DATE] IS NULL, 1, NULL)) OVER
(PARTITION BY ID, YEAR, PROD, STA) AS cnt_nulls,
COUNT(IIF([DATE] IS NOT NULL, 1, NULL)) OVER
(PARTITION BY ID, YEAR, PROD, STA) AS cnt_not_nulls
FROM mytable) AS t
WHERE t.cnt_nulls > 0 AND t.cnt_not_nulls > 0
The windowed version of COUNT is applied twice over the (ID, YEAR, PROD, STA) partitions: for every row, it returns a count taken over that row's whole partition. Each count is conditional, because IIF yields NULL for rows that don't match and COUNT skips NULLs:
the first COUNT counts the number of NULL [Date] values within the partition
the second COUNT counts the number of NOT NULL [Date] values within the partition.
The outer query checks for partitions having a count of at least one for both of the two COUNT functions of the inner query.
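The same logic can be checked in SQLite through Python's `sqlite3` module as a stand-in engine. Two deviations from the answer, both assumptions for the sketch: CASE WHEN replaces IIF (SQLite only added iif() in version 3.32, while CASE works everywhere), and a hypothetical row_no column mirrors the question's ROW labels so the output can be compared with the expected rows 9 to 12 and 15 to 16.

```python
import sqlite3

# Rows from the question. row_no is a HYPOTHETICAL extra column used only to
# identify rows in the output; it is not part of any partition.
DATA = [
    (1, 1, 2011, "APPLE", "NEW", "2011-11-18 00:00:00.000"),
    (2, 1, 2011, "APPLE", "NEW", "2011-11-18 00:00:00.000"),
    (3, 1, 2013, "APPLE", "OLD", None),
    (4, 1, 2013, "APPLE", "OLD", None),
    (5, 2, 2013, "APPLE", "OLD", "2014-04-08 00:00:00.000"),
    (6, 2, 2013, "APPLE", "OLD", "2014-04-08 00:00:00.000"),
    (7, 2, 2013, "APPLE", "OLD", "2014-11-17 10:50:14.113"),
    (8, 2, 2013, "APPLE", "OLD", "2014-11-17 10:46:04.947"),
    (9, 2, 2013, "MELON", "OLD", "2014-11-17 11:01:19.657"),
    (10, 2, 2013, "MELON", "OLD", "2014-11-17 11:19:35.547"),
    (11, 2, 2013, "MELON", "OLD", None),
    (12, 2, 2013, "MELON", "OLD", "2014-11-21 10:32:36.017"),
    (13, 3, 2006, "APPLE", "NEW", "2007-04-11 00:00:00.000"),
    (14, 3, 2006, "APPLE", "NEW", "2007-04-11 00:00:00.000"),
    (15, 4, 2004, "APPLE", "OTH", "2004-09-27 00:00:00.000"),
    (16, 4, 2004, "APPLE", "OTH", None),
]

# CASE yields NULL for non-matching rows and COUNT skips NULLs, so each
# windowed COUNT is a conditional count over the row's whole partition.
QUERY = """
SELECT row_no
FROM (
    SELECT row_no,
           COUNT(CASE WHEN [DATE] IS NULL THEN 1 END)
               OVER (PARTITION BY ID, YEAR, PROD, STA) AS cnt_nulls,
           COUNT(CASE WHEN [DATE] IS NOT NULL THEN 1 END)
               OVER (PARTITION BY ID, YEAR, PROD, STA) AS cnt_not_nulls
    FROM mytable
) AS t
WHERE t.cnt_nulls > 0 AND t.cnt_not_nulls > 0
ORDER BY row_no;
"""


def mixed_null_rows():
    """Return the row_no values of groups with both NULL and non-NULL dates."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (row_no INTEGER, ID INTEGER, YEAR INTEGER, "
            "PROD TEXT, STA TEXT, [DATE] TEXT)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)", DATA)
        return [row_no for (row_no,) in conn.execute(QUERY).fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    print(mixed_null_rows())
```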

Select rows where count of grouped data =2

I have to get the users who appear in at least two weeks before and at least two weeks after a given date. So let's say I have this data:
| userName | date | week |
|:---|:---|:---|
| user1 | 27 10 2011 | 44 |
| user1 | 27 10 2011 | 44 |
| user1 | 27 10 2011 | 44 |
| user2 | 21 04 2011 | 17 |
| user2 | 29 04 2011 | 17 |
| user2 | 02 05 2011 | 19 |
| user2 | 03 05 2011 | 19 |
| user2 | 16 05 2011 | 21 |
| user2 | 23 05 2011 | 22 |
| user3 | 06 01 2011 | 24 |
| user3 | 14 05 2011 | 25 |
| user3 | 20 05 2011 | 26 |
| user3 | 27 05 2011 | 27 |
and I need to group the results first by user and week, then count how many weeks each user appears in before a given week (let's say week 20) and after it, and then select only the users who appear in at least 2 weeks before and 2 weeks after. In my case I would get the result:
user2
Unfortunately I cannot create a view because of database restrictions. This query gives me only the first part of the result, the data grouped by user and week, but I have no idea how to count the grouped data:
SELECT username,
min(a.actionDate) as date,
datepart(wk,a.actionDate) as week
FROM Table1 a
GROUP BY username ,
datepart(wk,a.actionDate)
Thanks for any help.
To return users which have date records at least two weeks before and two weeks after a specified date, try:
select username
from Table1
group by username
having datediff(wk, min(actiondate), @date) >= 2 and
datediff(wk, @date, max(actiondate)) >= 2
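The answer above checks the span between a user's earliest and latest dates. If the requirement is read as the question words it (the number of distinct weeks a user appears in before week 20 and after it, at least two each), a conditional COUNT(DISTINCT ...) in HAVING is one way to express it. This is a swapped-in alternative, not the answer's method; it is sketched on SQLite via Python's `sqlite3`, uses the question's `week` column as given, and names the date column `actionDate` after the question's own query.

```python
import sqlite3

# (userName, actionDate, week) rows as given in the question; the dates stay
# as text and only the week column is used.
ACTIONS = [
    ("user1", "27 10 2011", 44),
    ("user1", "27 10 2011", 44),
    ("user1", "27 10 2011", 44),
    ("user2", "21 04 2011", 17),
    ("user2", "29 04 2011", 17),
    ("user2", "02 05 2011", 19),
    ("user2", "03 05 2011", 19),
    ("user2", "16 05 2011", 21),
    ("user2", "23 05 2011", 22),
    ("user3", "06 01 2011", 24),
    ("user3", "14 05 2011", 25),
    ("user3", "20 05 2011", 26),
    ("user3", "27 05 2011", 27),
]

# CASE yields NULL for weeks on the other side of the cut-off, and
# COUNT(DISTINCT ...) ignores NULLs, so each count covers one side only.
QUERY = """
SELECT userName
FROM Table1
GROUP BY userName
HAVING COUNT(DISTINCT CASE WHEN week < :week THEN week END) >= 2
   AND COUNT(DISTINCT CASE WHEN week > :week THEN week END) >= 2
ORDER BY userName;
"""


def users_around_week(week):
    """Return users seen in >= 2 distinct weeks before and after `week`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Table1 (userName TEXT, actionDate TEXT, week INTEGER)"
        )
        conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ACTIONS)
        rows = conn.execute(QUERY, {"week": week}).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(users_around_week(20))
```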

How to calculate Rank SQL query

Hi, I have the following table, which saves agent rankings on a daily basis, based on ticket status.
| No. | Agent Name | Incidents | workorder | Rank | TimeStamp |
|:---|:---|:---|:---|:---|:---|
| 1 | cedric | 200 | 29 | 1 | 21 Jan 2011 |
| 2 | poul | 100 | 10 | 2 | 21 Jan 2011 |
| 3 | dan | 200 | 20 | 1 | 21 Jan 2011 |
| 4 | cedric | 100 | 19 | 2 | 22 Jan 2011 |
| 5 | poul | 200 | 26 | 1 | 22 Jan 2011 |
| 6 | dan | 150 | 20 | 2 | 22 Jan 2011 |
Now I need a query that fetches rankings between two dates: if I select dates from 21 Jan 2011 to 22 Jan 2011, the query should return each agent's average ranking over those dates, not the agent's ranking details date by date. I need a single row per agent name, with his ranking.
Regards,
Iftikhar hashmi
Try
SELECT [Agent Name], AVG(RANK) FROM MY_TABLE WHERE [TimeStamp] BETWEEN DATE1 AND DATE2
GROUP BY [Agent Name]
(Update)
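Thanks to Martin, who reminded me that I need to cast RANK: in SQL Server, AVG over an integer column returns an integer, so an average such as 1.5 would be truncated to 1.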
SELECT [Agent Name], AVG(CAST(RANK AS FLOAT)) FROM MY_TABLE WHERE [TimeStamp] BETWEEN DATE1 AND DATE2
GROUP BY [Agent Name]
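A sketch of the corrected query on SQLite via Python's `sqlite3` as a stand-in engine, assuming the TimeStamp values are stored as ISO dates (`2011-01-21`) so that BETWEEN compares them chronologically. SQLite's AVG always returns a floating-point value, so there the CAST only mirrors the SQL Server fix; the bracketed identifiers work in both engines.

```python
import sqlite3

# Rows from the question; TimeStamp is rewritten as ISO dates (an assumption)
# so that BETWEEN compares chronologically instead of alphabetically.
RANKINGS = [
    (1, "cedric", 200, 29, 1, "2011-01-21"),
    (2, "poul", 100, 10, 2, "2011-01-21"),
    (3, "dan", 200, 20, 1, "2011-01-21"),
    (4, "cedric", 100, 19, 2, "2011-01-22"),
    (5, "poul", 200, 26, 1, "2011-01-22"),
    (6, "dan", 150, 20, 2, "2011-01-22"),
]

# Average rank per agent over an inclusive date range; ORDER BY is added only
# for stable output.
QUERY = """
SELECT [Agent Name], AVG(CAST([Rank] AS REAL)) AS AvgRank
FROM MY_TABLE
WHERE [TimeStamp] BETWEEN :start_date AND :end_date
GROUP BY [Agent Name]
ORDER BY [Agent Name];
"""


def average_rank(start_date, end_date):
    """Return (agent, average rank) pairs for the given inclusive date range."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE MY_TABLE ([No] INTEGER, [Agent Name] TEXT, "
            "Incidents INTEGER, workorder INTEGER, [Rank] INTEGER, "
            "[TimeStamp] TEXT)"
        )
        conn.executemany(
            "INSERT INTO MY_TABLE VALUES (?, ?, ?, ?, ?, ?)", RANKINGS
        )
        params = {"start_date": start_date, "end_date": end_date}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(average_rank("2011-01-21", "2011-01-22"))
```

If TimeStamp also carries a time of day, BETWEEN '2011-01-21' AND '2011-01-22' stops at midnight on the 22nd; a half-open range ([TimeStamp] >= start AND [TimeStamp] < the day after the end date) avoids that.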