Postgres DB query to get the count and the first and last ids by date in a single query

I have the following db structure.
table
-----
id (UUID)
date (TIMESTAMP)
I want to write a query in Postgres (actually CockroachDB, which is compatible with the PostgreSQL wire protocol and much of its SQL dialect, so a Postgres query should be fine).
The query should return the count of records between two dates, the id of the record with the earliest date and the id of the record with the latest date within that range.
So the query should return the following:
count, id (of the earliest record in the range), id (of the latest record in the range)
Thanks.

You can use row_number() twice, then conditional aggregation:
select
no_records,
min(id) filter(where rn_asc = 1) first_id,
max(id) filter(where rn_desc = 1) last_id
from (
select
id,
count(*) over() no_records,
row_number() over(order by date asc) rn_asc,
row_number() over(order by date desc) rn_desc
from mytable
where date >= ? and date < ?
) t
where 1 in (rn_asc, rn_desc)
group by no_records -- same value on every row; required because no_records is not aggregated
The question marks represent the (inclusive) start and (exclusive) end of the date interval.
Of course, if ids always increase along with the date, simple aggregation is sufficient (randomly generated UUIDs do not have this property):
select count(*), min(id) first_id, max(id) last_id
from mytable
where date >= ? and date < ?
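To try the row_number() approach without a Postgres or CockroachDB cluster, here is a minimal sketch that runs the same query shape on SQLite through Python's built-in sqlite3 module (assumes SQLite 3.25 or later for window functions). The table name mytable, the sample rows and the short text ids standing in for UUIDs are hypothetical; dates are stored as ISO-8601 text so they compare correctly as strings. FILTER is rewritten as max(case ...) so the sketch also works on SQLite builds that predate FILTER support.

```python
import sqlite3

# Hypothetical sample rows (id, date); short text ids stand in for UUIDs.
SAMPLE_ROWS = [
    ("c3", "2021-01-05 10:00:00"),
    ("a1", "2021-01-02 09:00:00"),
    ("b2", "2021-01-09 18:30:00"),
    ("d4", "2021-02-01 00:00:00"),  # outside the range used below
]

# Number the rows in both directions, keep only the first and last row,
# then pick their ids with conditional aggregation (CASE instead of FILTER).
QUERY = """
select
    no_records,
    max(case when rn_asc = 1 then id end) as first_id,
    max(case when rn_desc = 1 then id end) as last_id
from (
    select
        id,
        count(*) over () as no_records,
        row_number() over (order by date asc) as rn_asc,
        row_number() over (order by date desc) as rn_desc
    from mytable
    where date >= ? and date < ?
) t
where 1 in (rn_asc, rn_desc)
group by no_records
"""


def count_first_last(start, end, rows=SAMPLE_ROWS):
    """Return (count, first_id, last_id) for start <= date < end, or None if empty."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table mytable (id text primary key, date text)")
        conn.executemany("insert into mytable values (?, ?)", rows)
        return conn.execute(QUERY, (start, end)).fetchone()
    finally:
        conn.close()


print(count_first_last("2021-01-01", "2021-02-01"))  # (3, 'a1', 'b2')
```

Note that an empty range yields no row at all (fetchone() returns None) because group by produces no groups; dropping the group by and wrapping no_records in max() instead would return a single row of NULLs.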

Unfortunately, Postgres doesn't support first_value() as an aggregation function. One method is to use arrays:
select count(*),
(array_agg(id order by date asc))[1] as first_id,
(array_agg(id order by date desc))[1] as last_id
from t
where date >= ? and date <= ?

Related

How to get a count of new values per day in Postgres

I have the following schema:
Date, UserID
"2021-07-29", 1
"2021-07-29", 2
"2021-07-30", 1
"2021-07-30", 4
"2021-08-01", 2
"2021-08-01", 2
It contains the dates of some event, along with the user who triggered that event.
I need to get a count of all the NEW users who triggered the event on every given day until today, ignoring users who have triggered the event in the past.
So after running the query, the results would look like this:
Date, Count
"2021-07-29", 2
"2021-07-30", 1
"2021-08-01", 0
Because on the 29th, users 1 and 2, whom I had never seen before, triggered it.
On the 30th, user 4, whom I had never seen before, triggered it.
On the 1st, I had already seen user 2, so he is ignored.
You can use a window function to get the first date for each user. Then use conditional aggregation:
select date, count(*) filter (where seqnum = 1) as num_new_users
from (select t.*,
row_number() over (partition by userid order by date) as seqnum
from t
) t
group by date;
Use the window function FIRST_VALUE in a subquery (or CTE) to get the first trigger date for each user, and in the outer query count the rows where it equals the row's own date (the date column is called dt here):
SELECT dt,count(*) FILTER (WHERE first_trigger = dt)
FROM (
SELECT *,FIRST_VALUE(dt) OVER w first_trigger FROM t
WINDOW w AS (PARTITION BY userid ORDER BY dt
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY dt)j
GROUP BY dt;
Use the MIN() window function to get the first date for each user, then group by date and count only the rows whose date equals that first date:
SELECT Date, SUM((Date = min_date)::int) Count
FROM (
SELECT *, MIN(date) OVER (PARTITION BY UserID) min_date
FROM tablename
) t
GROUP BY Date;
Or:
SELECT Date, COUNT(*) FILTER (WHERE Date = min_date) Count
FROM (
SELECT *, MIN(date) OVER (PARTITION BY UserID) min_date
FROM tablename
) t
GROUP BY Date;
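A minimal sketch for comparing the row_number() and MIN() answers locally, using the question's rows in SQLite via Python's sqlite3 module (assumes SQLite 3.25+ for window functions). The table name t is taken from the first answer; FILTER is again written as sum(case ...) for portability.

```python
import sqlite3

# Rows from the question: (date, userid).
QUESTION_ROWS = [
    ("2021-07-29", 1), ("2021-07-29", 2),
    ("2021-07-30", 1), ("2021-07-30", 4),
    ("2021-08-01", 2), ("2021-08-01", 2),
]

# row_number() variant: exactly one row per user gets seqnum = 1.
ROW_NUMBER_QUERY = """
select date, sum(case when seqnum = 1 then 1 else 0 end) as num_new_users
from (select t.*,
             row_number() over (partition by userid order by date) as seqnum
      from t) x
group by date
order by date
"""

# MIN() variant: every row that falls on the user's first date is counted.
MIN_QUERY = """
select date, sum(case when date = min_date then 1 else 0 end) as num_new_users
from (select t.*, min(date) over (partition by userid) as min_date
      from t) x
group by date
order by date
"""


def new_users_per_day(query, rows=QUESTION_ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (date text, userid integer)")
        conn.executemany("insert into t values (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(new_users_per_day(ROW_NUMBER_QUERY))
# [('2021-07-29', 2), ('2021-07-30', 1), ('2021-08-01', 0)]
```

With the question's data both queries agree. They differ when a user triggers the event more than once on their first day: the MIN() and FIRST_VALUE() versions count each such row, while the row_number() version counts the user once. Something like count(distinct case when date = min_date then userid end) makes the MIN() version count users rather than rows.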

How to return max date per month for user

I have the following table:
I would like to return the maximum threshold date per month for every user, so my final result should look like this:
I wanted to use the analytic function ROW_NUMBER and keep the top-ranked row, but how do I do that per month for each user? Is there a simpler way to do it in BigQuery?
You can partition the row_number by the user and the month, and then take the first one for each:
SELECT user_id, threshold_date, net_deposists_usd
FROM (SELECT user_id, threshold_date, net_deposists_usd,
ROW_NUMBER () OVER (PARTITION BY user_id, EXTRACT (MONTH FROM threshold_date)
ORDER BY net_deposists_usd DESC) AS rk
FROM mytable)
WHERE rk = 1
BigQuery now supports qualify, which does everything you want. For the month, just use date_trunc():
select t.*
from t
where true -- some BigQuery versions only accept QUALIFY alongside WHERE, GROUP BY or HAVING
qualify row_number() over (partition by user_id, date_trunc(threshold_date, month)
order by threshold_date desc, net_deposits_usd desc
) = 1;
A simple alternative uses arrays and group by:
select array_agg(t order by threshold_date desc, net_deposits_usd desc limit 1)[ordinal(1)].*
from t
group by user_id, date_trunc(threshold_date, month) ;
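BigQuery is not available locally, so this sketch emulates "QUALIFY row_number() ... = 1" with an outer filter in SQLite via Python's sqlite3 (assumes SQLite 3.25+ and DATE values stored as ISO text). The rows are hypothetical, and strftime('%Y-%m', ...) stands in for date_trunc(threshold_date, month). Partitioning on year and month keeps January 2021 and January 2022 apart, which partitioning on EXTRACT(MONTH ...) alone would not.

```python
import sqlite3

# Hypothetical rows: (user_id, threshold_date, net_deposits_usd).
ROWS = [
    (1, "2021-01-03", 100.0),
    (1, "2021-01-28", 50.0),
    (1, "2021-02-10", 75.0),
    (2, "2021-01-15", 20.0),
    (2, "2021-01-15", 90.0),
    (2, "2022-01-02", 10.0),  # same month number as above, different year
]

# Outer-query filter on rn = 1 plays the role of BigQuery's QUALIFY.
QUERY = """
select user_id, threshold_date, net_deposits_usd
from (
    select t.*,
           row_number() over (
               partition by user_id, strftime('%Y-%m', threshold_date)
               order by threshold_date desc, net_deposits_usd desc
           ) as rn
    from t
) x
where rn = 1
order by user_id, threshold_date
"""


def latest_per_user_month(rows=ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (user_id integer, threshold_date text, net_deposits_usd real)"
        )
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in latest_per_user_month():
    print(row)
```

Ties on the date (user 2 on 2021-01-15) are broken by the larger deposit, matching the ORDER BY used in the qualify and array_agg answers.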

How to use max(Date) and DISTINCT in Oracle DB: getting the data inserted last within a day

I am looking for a way to fetch data by the latest date within the same day, per UserId.
UserId,Value1,Date
1, 2030,2020-09-07 10:58:58
1, 2020,2020-09-07 05:58:28
1, 2050,2020-09-08 19:58:28
2, 3000,2020-09-07 10:58:18
2, 2001,2020-09-06 10:58:55
3, 2400,2020-09-08 10:28:53
4, 2400,2020-09-07 13:28:53
For example:
where Date >= trunc(TO_DATE('20200907','YYYYMMDD')) and Date < trunc(TO_DATE('20200908','YYYYMMDD'))
Ideal result:
UserId,Value
1,2030
2,3000
4,2400
select UserId, value
What should I use? max(Date)? DISTINCT UserId? GROUP BY UserId?
If value is the only column you want, then you could use keep:
select userid, max(value1) keep(dense_rank last order by dt) value1
from mytable
where dt >= date '2020-09-07' and dt < date '2020-09-08'
group by userid
order by userid
Notes:
this uses the standard date syntax rather than to_date() to build literal dates
date is a reserved word in Oracle, hence not a good choice for a column name; I renamed it to dt in the query.
If you want more columns in the resultset, then filtering with window functions is more appropriate:
select t.*
from (
select t.*, row_number() over(partition by userid order by dt desc) rn
from mytable t
where dt >= date '2020-09-07' and dt < date '2020-09-08'
) t
where rn = 1
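Oracle's KEEP (DENSE_RANK LAST ...) has no SQLite counterpart, so this sketch runs only the row_number() variant, on the question's data, in SQLite via Python's sqlite3 (assumes SQLite 3.25+; the date column is named dt as in the answer, and timestamps are ISO text). Within 2020-09-07, user 1's latest row is the 10:58:58 one with value 2030; the 2050 row falls on 2020-09-08.

```python
import sqlite3

# Data from the question: (userid, value1, dt).
ROWS = [
    (1, 2030, "2020-09-07 10:58:58"),
    (1, 2020, "2020-09-07 05:58:28"),
    (1, 2050, "2020-09-08 19:58:28"),
    (2, 3000, "2020-09-07 10:58:18"),
    (2, 2001, "2020-09-06 10:58:55"),
    (3, 2400, "2020-09-08 10:28:53"),
    (4, 2400, "2020-09-07 13:28:53"),
]

# Latest row per user inside [day_start, day_end).
QUERY = """
select userid, value1
from (
    select t.*, row_number() over (partition by userid order by dt desc) as rn
    from mytable t
    where dt >= ? and dt < ?
) x
where rn = 1
order by userid
"""


def last_value_per_user(day_start, day_end, rows=ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table mytable (userid integer, value1 integer, dt text)")
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY, (day_start, day_end)).fetchall()
    finally:
        conn.close()


print(last_value_per_user("2020-09-07", "2020-09-08"))
# [(1, 2030), (2, 3000), (4, 2400)]
```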

Select latest 30 dates for each unique ID

This is a sample data file
The data contains unique IDs with different latitudes and longitudes at multiple timestamps. I would like to select the rows from the latest 30 days of coordinates for each unique ID. How should I write the query? The data is in a Hive table.
Regards,
Akshay
Based on your example above (where ids 2 and 3 have no dates in the current year), you can number the rows of each id from the latest date down with the window function ROW_NUMBER(), then keep the first 30:
-- keep the 30 latest rows for each id (num <= 30)
select * from
(
-- number the rows of each id from newest to oldest
select t.*, row_number() over (partition by ID order by `DATE` desc) num from mytable t
)X
where num<=30
If you need only distinct dates (ignoring the time part) for each id, you can try this query:
select * from
(
-- number the dates of each id from newest to oldest
select X.*, row_number() over (partition by ID order by new_date desc) num
from
(
-- remove duplicates using distinct
select distinct ID, cast(`DATE` as date) new_date from mytable
)X
)Y
where num<=30
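The two Hive queries differ in whether they count rows or calendar days. This sketch runs both shapes in SQLite via Python's sqlite3 (assumes SQLite 3.25+), with a parameter n in place of 30 and hypothetical rows; the timestamp column is named ts instead of DATE, and SQLite's date(ts) plays the role of Hive's cast(... as date).

```python
import sqlite3

# Hypothetical rows: (id, ts, latitude, longitude).
ROWS = [
    (1, "2021-03-01 08:00:00", 10.0, 20.0),
    (1, "2021-03-01 17:00:00", 10.1, 20.1),
    (1, "2021-03-02 09:00:00", 10.2, 20.2),
    (1, "2021-03-05 12:00:00", 10.3, 20.3),
    (2, "2020-11-20 06:00:00", 30.0, 40.0),
]

# Latest n rows per id (first query): two readings on one day count twice.
LATEST_ROWS = """
select id, ts from (
    select t.*, row_number() over (partition by id order by ts desc) as num
    from mytable t
) x
where num <= ?
order by id, ts desc
"""

# Latest n distinct calendar days per id (second query).
LATEST_DAYS = """
select id, day from (
    select d.*, row_number() over (partition by id order by day desc) as num
    from (select distinct id, date(ts) as day from mytable) d
) x
where num <= ?
order by id, day desc
"""


def latest_per_id(query, n, rows=ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (id integer, ts text, latitude real, longitude real)"
        )
        conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
        return conn.execute(query, (n,)).fetchall()
    finally:
        conn.close()


print(latest_per_id(LATEST_ROWS, 2))
# [(1, '2021-03-05 12:00:00'), (1, '2021-03-02 09:00:00'), (2, '2020-11-20 06:00:00')]
```

With n = 4, the row-based query returns all four readings of id 1 (two of them on 2021-03-01), whereas the day-based query returns its three distinct days.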
In Oracle this will be:
SELECT * FROM TEST_DATE1
WHERE DATEUPDT > SYSDATE - 30;
In SQL Server:
select * from MyTable
where
[Date]>=dateadd(d, -30, getdate());
To group by ID and aggregate, something like this:
select ID,
count(*) row_count,
max(Latitude) max_lat,
max(Longitude) max_long
from MyTable
where
[Date]>=dateadd(d, -30, getdate())
group by ID;

How do I get all rows from the second to latest date?

I have gotten all rows for the latest date like this:
SELECT date, quarter, sales_region, revenue
FROM regions
WHERE date = (SELECT MAX(date) FROM regions)
ORDER BY 1
So how would I get the rows for the second latest date?
I tried this, with no luck:
SELECT MAX(date), quarter, sales_region, revenue
FROM regions
WHERE date < (SELECT MAX(date) FROM regions)
ORDER BY 1
Here is one method:
SELECT date, quarter, sales_region, revenue
FROM regions
WHERE date = (SELECT DISTINCT date
FROM regions r2
ORDER BY date DESC
OFFSET 1 FETCH FIRST 1 ROW ONLY
)
ORDER BY 1;
Another method uses dense_rank():
select r.*
from (select r.*, dense_rank() over (order by date desc) as seqnum
from regions r
) r
where seqnum = 2;
Gordon answered your question precisely, but if you want to get the records for the last two dates in one query, you could use IN instead of = and get the two latest distinct dates with LIMIT 2:
SELECT date, quarter, sales_region, revenue
FROM regions
WHERE date IN (SELECT DISTINCT date
FROM regions r2
ORDER BY date DESC
LIMIT 2)
ORDER BY 1;
Starting with PostgreSQL 8.4, you can also use FETCH FIRST 2 ROWS ONLY instead of LIMIT 2.
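A minimal sketch that runs the dense_rank() answer and the IN + LIMIT 2 answer side by side in SQLite via Python's sqlite3 (assumes SQLite 3.25+ for dense_rank(); the rows are hypothetical). Unlike the answer's r.*, it lists the columns explicitly so the helper seqnum column is not returned.

```python
import sqlite3

# Hypothetical rows: (date, quarter, sales_region, revenue).
ROWS = [
    ("2021-03-31", "Q1", "North", 100),
    ("2021-03-31", "Q1", "South", 80),
    ("2021-06-30", "Q2", "North", 120),
    ("2021-06-30", "Q2", "South", 90),
    ("2021-09-30", "Q3", "North", 130),
]

# dense_rank() gives all rows of one date the same rank, so rank 2 is the
# second-latest distinct date however many rows share the latest date.
SECOND_LATEST = """
select date, quarter, sales_region, revenue
from (select r.*, dense_rank() over (order by date desc) as seqnum
      from regions r) x
where seqnum = 2
order by date, sales_region
"""

# Rows for the two latest distinct dates; SQLite allows LIMIT in the subquery.
LAST_TWO = """
select date, quarter, sales_region, revenue
from regions
where date in (select distinct date from regions order by date desc limit 2)
order by date, sales_region
"""


def run(query, rows=ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table regions (date text, quarter text, sales_region text, revenue integer)"
        )
        conn.executemany("insert into regions values (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(SECOND_LATEST))
# [('2021-06-30', 'Q2', 'North', 120), ('2021-06-30', 'Q2', 'South', 90)]
```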