I’m using Oracle and trying to find the maximum transaction count (and associated date) for each station.
This is the code I have, but it returns each transaction count and date for each station rather than just the maximum. If I take the date part out of the outer query it returns just the maximum transaction count for each station, but I need to know the date when it happened. Does anyone know how to get it to work?
Thanks!
SELECT STATION_ID, STATION_NAME, MAX(COUNTTRAN), TESTDATE
FROM
(
SELECT COUNT(TRANSACTION_ID) AS COUNTTRAN, STATION_ID,
STATION_NAME, TO_CHAR(TRANSACTION_DATE, 'HH24') AS TESTDATE
FROM STATION_TRANSACTIONS
WHERE COUNTRY = 'GB'
GROUP BY STATION_ID, STATION_NAME, TO_CHAR(TRANSACTION_DATE, 'HH24')
)
GROUP BY STATION_ID, STATION_NAME, TESTDATE
ORDER BY MAX(COUNTTRAN) DESC
This image shows the results I currently get vs the ones I want:
What your query does is this:
Subquery: Get one record per station_id, station_name and date. Count the transactions for each such combination.
Main query: Get one record per station_id, station_name and date. (We already did that, so it doesn't change anything.)
Order the records by transaction count.
This is not what you want. What you want is one result row per station_id, station_name, so in your main query you should have grouped by these only, excluding the date:
select
station_id,
station_name,
max(counttran) as maxcount,
max(testdate) keep (dense_rank last order by counttran) as maxcountdate
from
(
select
count(transaction_id) as counttran,
station_id,
station_name,
to_char(transaction_date, 'hh24') as testdate
from station_transactions
where country = 'GB'
group by station_id, station_name, to_char(transaction_date, 'hh24')
)
group by station_id, station_name;
An alternative would be not to group again in the main query, because the subquery already gives you the desired records and you only want to remove the others. You can do this by ranking the records in the subquery, i.e. giving them row numbers, with #1 for the best record per station (the one with the highest count). Then discard all the others and you are done:
select station_id, station_name, counttran, testdate
from
(
select
count(transaction_id) as counttran,
row_number() over(partition by station_id order by count(transaction_id) desc) as rn,
station_id,
station_name,
to_char(transaction_date, 'hh24') as testdate
from station_transactions
where country = 'GB'
group by station_id, station_name, to_char(transaction_date, 'hh24')
)
where rn = 1;
Related
I need to write a SQL query to pull the single highest-earning day for a certain brand in each quarter of 2018. I have the following, but it does not pull a single day - it pulls the highest earnings for each day.
select distinct quarter, order_event_date, max(gc) as highest_day_gc
from (
select sum(gross_commission) as gc, order_event_date,
extract(quarter from order_event_date) as quarter
from order_aggregation
where advertiser_id ='123'
and event_year='2018'
group by 3,2
)
group by 1,2
order by 2 DESC
You can use window functions to find the highest-earning day per quarter by using rank().
select rank() over (partition by quarter order by gc desc) as rank, quarter, order_event_date, gc
from (select sum(gross_commission) gc,
order_event_date,
extract(quarter from order_event_date) quarter
from order_aggregation
where advertiser_id = '123'
and event_year = '2018'
group by order_event_date, quarter) a
You could create the query above as a view and filter it using where rank = 1.
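For illustration, here is a minimal sketch of the same idea without creating a view, wrapping the ranked query in a derived table (day_rank is an arbitrary alias; the table and column names are the ones from the query above):
select quarter, order_event_date, gc
from (select rank() over (partition by quarter order by gc desc) as day_rank,
             quarter, order_event_date, gc
      from (select sum(gross_commission) gc,
                   order_event_date,
                   extract(quarter from order_event_date) quarter
            from order_aggregation
            where advertiser_id = '123'
              and event_year = '2018'
            group by order_event_date, quarter) a) b
where day_rank = 1 -- keeps the highest-earning day per quarter (ties are all kept by rank())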
You could add the LIMIT clause at the end of the statement. Also, change the last ORDER BY clause to ORDER BY highest_day_gc. Something like:
SELECT DISTINCT quarter
,order_event_date
,max(gc) as highest_day_gc
FROM (SELECT sum(gross_commission) as gc
,order_event_date
,extract(quarter from order_event_date) as quarter
FROM order_aggregation
WHERE advertiser_id ='123'
AND event_year='2018'
GROUP BY 3,2) as subquery
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1
I have the following db structure.
table
-----
id (uuids)
date (TIMESTAMP)
I want to write a query in postgres (actually cockroachdb, which is postgres-compatible, so a postgres query should be fine).
The query should return a count of the records between 2 dates, the id of the record with the earliest date, and the id of the record with the latest date within that range.
So the query should return the following:
count, id (of the earliest record in the range), id (of the latest record in the range)
thanks.
You can use row_number() twice, then conditional aggregation:
select
no_records,
min(id) filter(where rn_asc = 1) first_id,
max(id) filter(where rn_desc = 1) last_id
from (
select
id,
count(*) over() no_records,
row_number() over(order by date asc) rn_asc,
row_number() over(order by date desc) rn_desc
from mytable
where date >= ? and date < ?
) t
where 1 in (rn_asc, rn_desc)
The question marks represent the (inclusive) start and (exclusive) end of the date interval.
Of course, if ids are always increasing, simple aggregation is sufficient:
select count(*), min(id) first_id, max(id) last_id
from mytable
where date >= ? and date < ?
Unfortunately, Postgres doesn't support first_value() as an aggregation function. One method is to use arrays:
select count(*),
(array_agg(id order by date asc))[1] as first_id,
(array_agg(id order by date desc))[1] as last_id
from t
where date >= ? and date <= ?
I want to find customers where, for example, the system registered duplicate orders by mistake.
It's pretty easy if reg_date is EXACTLY the same, but I have no idea how to implement it in a query so that orders also count as duplicates if, for example, there was up to a 1 second difference between the transactions.
select * from
(select customer_id, reg_date, count(*) as cnt
from orders
group by 1,2
) x where cnt > 1
Here is example dataset:
https://www.db-fiddle.com/f/m6PhgReSQbVWVZhqe8n4mi/0
Currently only customer 104's orders are counted as duplicates because their reg_date is identical; I want to also count orders 1, 2 and 4, 5 as there's just a 1 second difference.
SELECT
customer_id,
reg_date
FROM (
SELECT
*,
reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) <= interval '1 second' as is_duplicate
FROM
orders
) s
WHERE is_duplicate
Use the lag() window function. It lets you look at the previous record. With this value you can compute the difference and keep only the records where the difference is at most one second.
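If you first want to inspect the gaps themselves, a small sketch (same orders table as above; diff is just an illustrative alias) that exposes the difference as its own column:
SELECT
    customer_id,
    reg_date,
    -- gap to the customer's previous order; rows with diff <= interval '1 second'
    -- are the ones the query above flags as duplicates
    reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) AS diff
FROM
    orders
ORDER BY
    customer_id, reg_date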
Try the following script. This will return day/customer-wise duplicates.
SELECT
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy') reg_date,
customer_id,
count(*) as cnt
FROM orders
GROUP BY
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy'),
customer_id
HAVING count(*) >1
I have a table with a created timestamp and an id identifier.
I can get the number of unique ids per week with:
SELECT date_trunc('week', created)::date AS week, count(distinct id)
FROM my_table
GROUP BY week ORDER BY week;
Now I want the accumulated number of unique ids created per week, something like this:
SELECT date_trunc('week', created)::date AS week, count(distinct id),
(SELECT count(distinct id)
FROM my_table
WHERE date_trunc('week', created)::date <= week) as acc
FROM my_table
GROUP BY week ORDER BY week;
But that doesn't work, as week is not accessible in the sub select (ERROR: column "week" does not exist).
How do I solve this?
I'm using PostgreSQL
Use a cumulative aggregation. But I don't think you need the distinct, so:
SELECT date_trunc('week', created)::date AS week, count(*) as cnt,
SUM(COUNT(*)) OVER (ORDER BY MIN(created)) as running_cnt
FROM my_table
GROUP BY week
ORDER BY week;
In any case, as you've phrased the problem, you can change cnt to use count(distinct). Your subquery is not using distinct at all.
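A sketch of that count(distinct) variant, assuming the same my_table (note that the running total sums the per-week distinct counts, so an id that appears in several weeks is counted once in each of those weeks):
SELECT date_trunc('week', created)::date AS week,
       COUNT(DISTINCT id) AS cnt,
       SUM(COUNT(DISTINCT id)) OVER (ORDER BY MIN(created)) AS running_cnt
FROM my_table
GROUP BY week
ORDER BY week;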
CTEs or a temp table should fix your problem. Here is an example using CTEs.
WITH abc AS (
SELECT date_trunc('week', created)::date AS week, count(distinct id) as IDCount
FROM my_table
GROUP BY week ORDER BY week
)
SELECT abc.week, abc.IDcount,
(SELECT count(*)
FROM my_table
WHERE date_trunc('week', created)::date <= abc.week) as acc
FROM abc
ORDER BY abc.week;
Hope this helps
In my table I have bus rides taken in different networks - each record represents one ride.
My goal is to find the max number of rides taken in a day in each network and the day on which that max occurred. This requires first counting the number of rides per day in each network and then taking the max count per network. In the end I will have three columns:
YMD - max_count - network_id
I have tried to use the query below but I am not sure where or how to include the max() function. Any suggestions?
SELECT DISTINCT ON (network_id)
network_id, count(*), to_char(start_time, 'YYYY-MM-DD') as YMD
FROM routes
ORDER BY network_id, count DESC, YMD;
I'd use an aggregate query to count the number of rides per day, and then a windowed RANK() call to find the date with the most rides:
SELECT network_id, cnt, ymd
FROM (SELECT network_id,
ymd,
cnt,
RANK() OVER (PARTITION BY network_id ORDER BY cnt DESC) AS rk
FROM (SELECT network_id,
TO_CHAR(start_time, 'YYYY-MM-DD') AS ymd,
COUNT(*) AS cnt
FROM routes
GROUP BY network_id, TO_CHAR(start_time, 'YYYY-MM-DD')
) t
) s
WHERE rk = 1