ORDER BY date but also GROUP BY userid - sql

I have a table of records that I want to sort by earliest date first, then by userid.
If the user associated with that date also has other records in the table, I want to group those under the earliest date.
Desired output
Id UserId Date
1 2 1/1/2020
2 2 2/1/2020
3 2 3/1/2020
4 1 1/2/2020
5 1 2/2/2020
6 3 1/4/2020
7 4 1/5/2020
In this example, UserId 2 has the earliest record in the table, so that record should come first, followed by that user's other records in ascending date order.

You seem to want:
select t.*
from table t
order by min(date) over (partition by userid), date;
Some database products don't support window functions in ORDER BY, so you can do this instead:
select t.*, min(date) over (partition by userid) as mndate
from table t
order by mndate, date;
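As a quick check of the second form, here is a minimal sketch that runs it against the question's sample rows using Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for the asker's database. The table name records, the ISO date strings, and the month/day/year reading of the sample dates are assumptions. The extra userid sort key is also an addition, so that two users who share the same earliest date do not interleave.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, userid INTEGER, date TEXT)")

# Sample rows from the question, inserted out of order on purpose.
# Dates are read as month/day/year and stored as ISO strings (assumption).
rows = [
    (5, 1, "2020-02-02"), (1, 2, "2020-01-01"), (7, 4, "2020-01-05"),
    (3, 2, "2020-03-01"), (4, 1, "2020-01-02"), (6, 3, "2020-01-04"),
    (2, 2, "2020-02-01"),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

# Sort by each user's earliest date, then keep that user's rows together
# in ascending date order. userid breaks ties between users.
ordered = conn.execute("""
    SELECT t.id, t.userid, t.date,
           MIN(t.date) OVER (PARTITION BY t.userid) AS mndate
    FROM records t
    ORDER BY mndate, t.userid, t.date
""").fetchall()

ordered_ids = [row[0] for row in ordered]
for row in ordered:
    print(row)
```

With these rows the ids come back in the order shown in the desired output, with each user's records kept together.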

If I understand what you want...
You could do this (sample with DB2 syntax):
SELECT tab.UserId, tab.Date, tab.*
FROM DB2SIS.TABLE_NAME tab
ORDER BY tab.Date ASC, tab.UserId ASC
This way UserId and Date will each appear twice in the result. If you list the fields you want explicitly instead of using 'tab.*', they will not be repeated.

Related

redshift cumulative count records via SQL

I've been struggling to find an answer to this question. I think this question is similar to what I'm looking for, but when I tried it, it didn't work.
Because there's no new unique user_id added between 02-20 and 02-27, the cumulative count will be the same. Then for 02-27, there is a unique user_id which hasn't appeared on any previous dates (6)
Here's my input
date user_id
2020-02-20 1
2020-02-20 2
2020-02-20 3
2020-02-20 4
2020-02-20 4
2020-02-20 5
2020-02-21 1
2020-02-22 2
2020-02-23 3
2020-02-24 4
2020-02-25 4
2020-02-27 6
Output table:
date daily_cumulative_count
2020-02-20 5
2020-02-21 5
2020-02-22 5
2020-02-23 5
2020-02-24 5
2020-02-25 5
2020-02-27 6
This is what I tried, and the result is not quite what I want:
select
stat_date,count(DISTINCT user_id),
sum(count(DISTINCT user_id)) over (order by stat_date rows unbounded preceding) as cumulative_signups
from data_engineer_interview
group by stat_date
order by stat_date
It returns this instead:
date,count,cumulative_sum
2022-02-20,5,5
2022-02-21,1,6
2022-02-22,1,7
2022-02-23,1,8
2022-02-24,1,9
2022-02-25,1,10
2022-02-27,1,11
The problem with this task is that it could be done by comparing each row with all previous rows to see if there is a match in user_id. Since you are using Redshift, I'll assume that your data table could be very large, so attacking the problem this way will bog down in some form of a loop join.
You want to think about the problem differently to avoid this looping issue. If you derive a dataset with id and first_date_of_id, you can then just do a cumulative sum sorted by date. Like this:
select user_id, min("date") as first_date,
count(user_id) over (order by first_date rows unbounded preceding) as date_out
from data_engineer_interview
group by user_id
order by date_out;
This is untested and won't produce the full list of dates that you have in your example output but rather only the dates where new ids show up. If this is an issue it is simple to add in the additional dates with no count change.
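To make the first-date idea concrete, and to show one way of adding back the dates with no new ids, here is a minimal sketch using Python's built-in sqlite3 module on the question's input. SQLite is only a stand-in for Redshift here, and the CTE names first_seen and new_per_day are hypothetical. It is an illustration of the approach, not a tuned Redshift query.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_engineer_interview (date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO data_engineer_interview VALUES (?, ?)",
    [
        ("2020-02-20", 1), ("2020-02-20", 2), ("2020-02-20", 3),
        ("2020-02-20", 4), ("2020-02-20", 4), ("2020-02-20", 5),
        ("2020-02-21", 1), ("2020-02-22", 2), ("2020-02-23", 3),
        ("2020-02-24", 4), ("2020-02-25", 4), ("2020-02-27", 6),
    ],
)

result = conn.execute("""
    WITH first_seen AS (
        -- one row per user: the first date the user appears
        SELECT user_id, MIN(date) AS first_date
        FROM data_engineer_interview
        GROUP BY user_id
    ),
    new_per_day AS (
        -- every date in the table, with the number of users first seen that day
        SELECT d.date, COUNT(f.user_id) AS new_users
        FROM (SELECT DISTINCT date FROM data_engineer_interview) d
        LEFT JOIN first_seen f ON f.first_date = d.date
        GROUP BY d.date
    )
    SELECT date,
           SUM(new_users) OVER (
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS daily_cumulative_count
    FROM new_per_day
    ORDER BY date
""").fetchall()

for row in result:
    print(row)
```

The LEFT JOIN keeps dates on which no new user appears, so their running total simply repeats the previous value.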
We can do this via a correlated subquery followed by aggregation:
WITH cte AS (
SELECT
date,
CASE WHEN EXISTS (
SELECT 1
FROM data_engineer_interview d2
WHERE d2.date < d1.date AND
d2.user_id = d1.user_id
) THEN 0 ELSE 1 END AS flag
FROM (SELECT DISTINCT date, user_id FROM data_engineer_interview) d1
)
SELECT date,
SUM(SUM(flag)) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS daily_cumulative_count
FROM cte
GROUP BY date
ORDER BY date;

rank function only returns 1 with date in redshift

I'm running the code below in Redshift. I want to rank a customer's purchases of a product in order of purchase date. Each purchase has a unique ticketid, each customer has a unique customer_uuid, and each product has a unique product_id. The code below returns 1 for every ranking and I'm not sure why. Is there an error in my code, or is there a problem with ranking by a date field in Redshift? Does anyone see how to modify this code to correct the issue?
code:
select customer_uuid,
product_id,
date,
ticketid,
rank()
over(partition by customer_uuid,
product_id,
ticketid order by date asc) as rank
from table
order by customer_uuid, product_id
data:
customer_uuid product_id ticketid date
1 2 1 1/1/18
1 2 2 1/2/18
1 2 3 1/3/18
output:
customer_uuid product_id ticketid date rank
1 2 1 1/1/18 1
1 2 2 1/2/18 1
1 2 3 1/3/18 1
desired output:
customer_uuid product_id ticketid date rank
1 2 1 1/1/18 1
1 2 2 1/2/18 2
1 2 3 1/3/18 3
First, you have ticketid in the partition by, which makes each partition a single row.
Second, you are using rank(). If you want an enumeration, you want row_number():
row_number() over(partition by customer_uuid, product_id order by date asc) as rank
I want to get a ranking of the order when a customer purchased a product based on the date. Each purchase has a unique ticketid, each customer has a unique customer_uuid, and each product has a unique product_id.
Basically you have unique (customer_uuid, product_id, ticketid) tuples. If you use those as a partition, the rank will always be 1, since there is only one record per partition.
You just need to remove ticketid from the partition:
rank() over(
partition by customer_uuid, product_id
order by date
) as rank
Note: rank() will give an equal position to records that share the same (customer_uuid, product_id, date).
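The effect of the partition is easy to reproduce. The following minimal sketch uses Python's built-in sqlite3 module as a stand-in for Redshift and runs both window definitions on the question's three rows. The table name purchases and the alias rnk are hypothetical, and the dates are rewritten as ISO strings.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "customer_uuid INTEGER, product_id INTEGER, ticketid INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [(1, 2, 1, "2018-01-01"), (1, 2, 2, "2018-01-02"), (1, 2, 3, "2018-01-03")],
)

# ticketid in the partition: every partition holds exactly one row.
with_ticket = [row[0] for row in conn.execute("""
    SELECT rank() OVER (
               PARTITION BY customer_uuid, product_id, ticketid
               ORDER BY date
           ) AS rnk
    FROM purchases
    ORDER BY date
""")]

# ticketid removed: rows are ranked within each (customer, product) pair.
without_ticket = [row[0] for row in conn.execute("""
    SELECT rank() OVER (
               PARTITION BY customer_uuid, product_id
               ORDER BY date
           ) AS rnk
    FROM purchases
    ORDER BY date
""")]

print(with_ticket)
print(without_ticket)
```

The first list contains only 1s, while the second counts up through the customer's purchases of the product.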

finding the number of days in between first 2 date point

This question seems quite difficult, so I wonder if I could get some advice here. I am trying to solve it with SQLite 3. My data looks like this:
customer | purchase date
1 | date 1
1 | date 2
1 | date 3
2 | date 4
2 | date 5
2 | date 6
2 | date 7
The number of rows per customer varies.
I just want to find out whether customer 1's 1st and 2nd purchase dates fall within a specific time period, and repeat that for the other customers. Only the 1st and 2nd dates need to be considered.
Any help would be appreciated!
We can try using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer ORDER BY "purchase date") rn
FROM yourTable
)
SELECT
customer,
CAST(MAX(CASE WHEN rn = 2 THEN julianday("purchase date") END) -
MAX(CASE WHEN rn = 1 THEN julianday("purchase date") END) AS INTEGER) AS diff_in_days
FROM cte
GROUP BY
customer;
The idea here is to aggregate by customer and then take the date difference between the second and first purchase. ROW_NUMBER is used to find these first and second purchases, for each customer.
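Here is a minimal sketch of that query run through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for ROW_NUMBER). The purchase dates are made up for illustration, since the question only gives placeholders, and customer 3 is added to show what happens when there is no second purchase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourTable (customer INTEGER, "purchase date" TEXT)')

# Hypothetical sample data: ISO date strings so julianday() can parse them.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [
        (1, "2021-01-01"), (1, "2021-01-11"), (1, "2021-03-01"),
        (2, "2021-02-01"), (2, "2021-02-04"), (2, "2021-05-20"),
        (2, "2021-06-01"),
        (3, "2021-04-15"),  # only one purchase
    ],
)

diffs = conn.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer ORDER BY "purchase date"
               ) AS rn
        FROM yourTable
    )
    SELECT customer,
           CAST(MAX(CASE WHEN rn = 2 THEN julianday("purchase date") END) -
                MAX(CASE WHEN rn = 1 THEN julianday("purchase date") END)
                AS INTEGER) AS diff_in_days
    FROM cte
    GROUP BY customer
    ORDER BY customer
""").fetchall()

for row in diffs:
    print(row)
```

A customer with a single purchase gets NULL (None in Python) for diff_in_days. To test whether both dates fall inside a given period, the same CASE expressions can be compared against the period's bounds.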

Group by in T-SQL for selecting different columns

I have the following table, ContactDetails. It contains both cell phone numbers and emails. Rows are added whenever a user's contact details change, so a userid (here 1) can have multiple rows for each of Email and Cell, as below. There can be multiple users: 2, 3, 4, and so on.
Rows are as below
SrNo Userid ContactType ContactDetail LoadDate
1 1 Email x1.y#gmail.com 2013-01-01
2 1 Cell 12345678 2013-01-01
3 1 Email x2.y#gmail.com 2012-01-01
4 1 Cell 98765432 2012-01-01
5 1 Email x2.y#gmail.com 2011-01-01
6 1 Cell 987654321 2011-01-01
I am looking for recent Email and Cell details of users. I tried running the query as below
Select
Userid,
Max(ContactDetail),
MAX(LoadDate)
from
ContactDetails
group by
Userid, ContactType;
But I understand that this won't work.
Can anyone give some suggestion to pull the latest email and cell in single or sub-queries?
Cheers!
Junni
You can use ROW_NUMBER() to select the most recent row of interest:
;With Ordered as (
select UserId,ContactType,ContactDetail,LoadDate,
ROW_NUMBER() OVER (
PARTITION BY UserID,ContactType
ORDER BY LoadDate DESC) as rn
from ContactDetails
)
select * from Ordered where rn = 1
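To see the result on the question's rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here, which is an assumption made only so the example is self-contained. The window function syntax is the same in both.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ContactDetails ("
    "SrNo INTEGER, UserId INTEGER, ContactType TEXT, "
    "ContactDetail TEXT, LoadDate TEXT)"
)
conn.executemany(
    "INSERT INTO ContactDetails VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Email", "x1.y#gmail.com", "2013-01-01"),
        (2, 1, "Cell", "12345678", "2013-01-01"),
        (3, 1, "Email", "x2.y#gmail.com", "2012-01-01"),
        (4, 1, "Cell", "98765432", "2012-01-01"),
        (5, 1, "Email", "x2.y#gmail.com", "2011-01-01"),
        (6, 1, "Cell", "987654321", "2011-01-01"),
    ],
)

# Number the rows of each (user, contact type) pair from newest to oldest,
# then keep only the newest one.
latest = conn.execute("""
    WITH Ordered AS (
        SELECT UserId, ContactType, ContactDetail, LoadDate,
               ROW_NUMBER() OVER (
                   PARTITION BY UserId, ContactType
                   ORDER BY LoadDate DESC
               ) AS rn
        FROM ContactDetails
    )
    SELECT UserId, ContactType, ContactDetail, LoadDate
    FROM Ordered
    WHERE rn = 1
    ORDER BY UserId, ContactType
""").fetchall()

for row in latest:
    print(row)
```

Each user ends up with one row per contact type, holding the detail with the most recent LoadDate.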

Find out the Old Date from a date column in sql

How do I find the oldest values in a datetime column?
I have a table with a datetime column (UpdateDate), and I need to find the oldest data based on UpdateDate.
Id UpdateDate Desc
-----------------------------------------
1 2010-06-15 00:00:00.000 aaaaa
2 2009-03-22 00:00:00.000 bbbbb
3 2008-01-12 00:00:00.000 ccccc
4 2008-02-12 00:00:00.000 ddddd
5 2009-04-03 00:00:00.000 eeeee
6 2010-06-12 00:00:00.000 fffff
I have found the dates from years before the current one using this query:
Select UpdateDate from Table1 where DATEDIFF(YEAR,UpdateDate,getdate()) > 0
But I need only the 2008 data (since 2008 is the oldest year here).
I don't know in advance what is in the table; I need to find the rows with the oldest date values. How is this possible?
Select UpdateDate from Table1 where DATEDIFF(YEAR,UpdateDate,getdate()) IN
(Select MAX(DATEDIFF(YEAR,UpdateDate,GETDATE())) DiffYear from Table1)
This will return the two records from 2008. If the oldest year in your data were 2006 and there were four such rows, it would return all four 2006 rows.
One way of doing this is
Select UpdateDate from Table1 where YEAR(UpdateDate )=2008
But, you can find out the oldest dates by ordering the data as such
Select * from Table1 order by UpdateDate ASC
You can use top and order by.
select top(1) UpdateDate
from Table1
order by UpdateDate
Update:
If you want all rows for the first year present you can use this instead.
select *
from (
select *,
rank() over(order by year(UpdateDate)) as rn
from Table1
) as T
where T.rn = 1
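A minimal sketch of the rank-by-year idea, run through Python's built-in sqlite3 module on the question's rows. SQLite has no YEAR() function, so strftime('%Y', ...) is swapped in here. That substitution, and the use of SQLite instead of SQL Server, are assumptions made only to keep the example self-contained.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# "Desc" is quoted because DESC is an SQL keyword.
conn.execute('CREATE TABLE Table1 (Id INTEGER, UpdateDate TEXT, "Desc" TEXT)')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "2010-06-15 00:00:00.000", "aaaaa"),
        (2, "2009-03-22 00:00:00.000", "bbbbb"),
        (3, "2008-01-12 00:00:00.000", "ccccc"),
        (4, "2008-02-12 00:00:00.000", "ddddd"),
        (5, "2009-04-03 00:00:00.000", "eeeee"),
        (6, "2010-06-12 00:00:00.000", "fffff"),
    ],
)

# Rank rows by year only; every row in the earliest year ties at rank 1.
oldest_year_rows = conn.execute("""
    SELECT Id, UpdateDate
    FROM (
        SELECT Id, UpdateDate,
               rank() OVER (ORDER BY strftime('%Y', UpdateDate)) AS rn
        FROM Table1
    ) AS T
    WHERE T.rn = 1
    ORDER BY Id
""").fetchall()

oldest_ids = [row[0] for row in oldest_year_rows]
for row in oldest_year_rows:
    print(row)
```

Because rank() gives tied rows the same value, both 2008 rows come back without the year being hard-coded.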
If you want the data that falls within the oldest year (2008 here), try this:
Select UpdateDate From Table1
Where Year(UpdateDate) =
(
Select Top 1 Year(UpdateDate)
from Table1 Order By UpdateDate ASC
) ;