Combining COUNT and RANK - PostgreSQL - sql

I need to select the total number of trips made by every 'id_customer' from table user, together with the id, dispatch_seconds, and distance of each customer's first order. id_customer, customer_id, and order_id are strings.
The result should look like this:
+------+-------+------------+--------------------------+------------------+
| id   | count | #1order id | #1order dispatch seconds | #1order distance |
+------+-------+------------+--------------------------+------------------+
| 1ar5 | 3     | 4r56       | 1                        | 500              |
| 2et7 | 2     | dc1f       | 5                        | 100              |
+------+-------+------------+--------------------------+------------------+
Cheers!
The original post was edited because, during the discussion, S-man helped me find the exact solution. Solution by S-man: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e16aa6008990107e55a26d05b10b02b5

db<>fiddle
SELECT
    customer_id,
    count,
    order_id,
    order_timestamp,
    dispatch_seconds,
    distance
FROM (
    SELECT
        *,
        count(*) over (partition by customer_id), -- A
        first_value(order_id) over (partition by customer_id order by order_timestamp) -- B
    FROM orders
) s
WHERE order_id = first_value -- C
https://www.postgresql.org/docs/current/static/tutorial-window.html
A: a window function that returns the total record count per user.
B: a window function that orders all records per user by timestamp and returns the first order_id of that user. Using first_value instead of min has one benefit: your order IDs may not increase with the timestamp (for example, two orders come in simultaneously, or the order IDs are not sequential but some sort of hash).
--> Both are new columns.
C: now keep all rows where "first_value" (aka the first order_id by timestamp) equals the order_id of the current row. This gives the row of the first order for each user.
Result:
customer_id count order_id order_timestamp dispatch_seconds distance
----------- ----- -------- ------------------- ---------------- --------
1ar5 3 4r56 2018-08-16 17:24:00 1 500
2et7 2 dc1f 2018-08-15 01:24:00 5 100
Note that in these test data the order "dc1f" of user "2et7" has a smaller timestamp but comes later in the rows. It is not the first occurrence of the user in the table but nevertheless the one with the earliest order. This should demonstrate the case first_value vs. min as described above.
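To try the count plus first_value approach without a PostgreSQL server, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer, which is where window functions were added). Only the rows for orders 4r56 and dc1f are taken from the result above; the other rows, and the aliases order_count and first_order_id, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        customer_id TEXT, order_id TEXT, order_timestamp TEXT,
        dispatch_seconds INTEGER, distance INTEGER
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        # Rows for 4r56 and dc1f match the result shown above;
        # the remaining rows are hypothetical filler.
        ("1ar5", "4r56", "2018-08-16 17:24:00", 1, 500),
        ("2et7", "zz9k", "2018-08-17 09:00:00", 7, 300),
        ("1ar5", "a1b2", "2018-08-18 10:00:00", 2, 250),
        ("2et7", "dc1f", "2018-08-15 01:24:00", 5, 100),  # earliest, but inserted later
        ("1ar5", "c3d4", "2018-08-19 12:30:00", 4, 900),
    ],
)

first_orders = conn.execute("""
    SELECT customer_id, order_count, order_id, dispatch_seconds, distance
    FROM (
        SELECT *,
               -- A: total number of orders per customer
               COUNT(*) OVER (PARTITION BY customer_id) AS order_count,
               -- B: id of the earliest order per customer
               FIRST_VALUE(order_id) OVER (
                   PARTITION BY customer_id ORDER BY order_timestamp
               ) AS first_order_id
        FROM orders
    ) s
    WHERE order_id = first_order_id  -- C: keep only the first order's row
    ORDER BY customer_id
""").fetchall()

for row in first_orders:
    print(row)
```

Each customer comes back once, with the total count and the columns of the earliest order, even though dc1f was inserted after a later order of the same customer.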

You are on the right track. Just use conditional aggregation:
SELECT o.customer_id, COUNT(*),
       MAX(CASE WHEN seqnum = 1 THEN o.order_id END) as first_order_id
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) as seqnum
      FROM orders o
     ) o
GROUP BY o.customer_id;
Your JOIN is not necessary for this query.
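A small sketch of the conditional aggregation idea, again run in SQLite via Python's sqlite3 (window functions need SQLite 3.25+). The table contents and the extra first_distance column are assumptions for illustration; the point is that any column of the first order can be pulled out with the same MAX(CASE ...) pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        customer_id TEXT, order_id TEXT, order_timestamp TEXT, distance INTEGER
    )
""")
# Hypothetical sample rows, deliberately not inserted in timestamp order.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("c1", "o3", "2021-01-03", 30),
        ("c1", "o1", "2021-01-01", 10),
        ("c2", "o2", "2021-01-02", 20),
    ],
)

summary = conn.execute("""
    SELECT customer_id,
           COUNT(*) AS order_count,
           -- Only the row with seqnum = 1 contributes a non-NULL value,
           -- so MAX simply picks that single value.
           MAX(CASE WHEN seqnum = 1 THEN order_id END) AS first_order_id,
           MAX(CASE WHEN seqnum = 1 THEN distance END) AS first_distance
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY order_timestamp
               ) AS seqnum
        FROM orders o
    ) numbered
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(summary)
```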

You can use window functions:
select distinct customer_id,
       count(*) over (partition by customer_id) as no_of_order,
       first_value(order_id) over (partition by customer_id order by order_timestamp) as first_order_id
from orders o;

I think there are several mistakes in your original query: your rank isn't partitioned, the order by clause seems incorrect, and you filter out all but one "random" order before applying the count. The list goes on.
Something like this seems closer to what you want:
SELECT
customer_id,
order_count,
order_id
FROM (
SELECT
a.customer_id,
a.order_count,
a.order_id,
RANK() OVER (PARTITION BY a.order_id, a.customer_id ORDER BY a.order_count DESC) AS rank_id
FROM (
SELECT
customer_id,
order_id,
COUNT(*) AS order_count
FROM
orders
GROUP BY
customer_id,
order_id) a) b
WHERE
b.rank_id = 1;

Related

How to get the 2nd record for a customer purchase?

I'm working on a customers database and I want to get all data for their second purchase (for all of our customers, whether they have 2 or more purchases).
For example:
Customer_ID Order_ID Order_Date
1 259 09/05/2020
1 644 03/11/2020
1 617 18/04/2022
4 834 22/09/2021
4 995 07/02/2022
I want to display the second order which is:
Customer_ID Order_ID Order_Date
1 644 03/11/2020
4 995 07/02/2022
I'm facing some difficulties in finding the right logic, any idea how I can achieve my end goal? :)
*Note: I'm using snowflake
You can use a ROW_NUMBER and filter using QUALIFY clause:
select * from table qualify row_number() over(partition by customer_id order by order_date) = 2;
You can use a common table expression:
WITH CTE_RS AS (
    SELECT Customer_ID, ORDER_ID, Order_Date,
           ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Order_Date) ORDRNUM
    FROM *TABLE NAME*
)
SELECT Customer_ID, ORDER_ID, Order_Date
FROM CTE_RS
WHERE ORDRNUM = 2;
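The CTE variant can be checked against the sample data from the question. This is only a sketch in SQLite via Python's sqlite3 (SQLite 3.25+ for ROW_NUMBER), not Snowflake; the table name purchases is assumed, the dates are rewritten in ISO format so that they sort correctly as text, and customer 7 is a hypothetical customer with a single order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, order_id INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 259, "2020-05-09"),
        (1, 644, "2020-11-03"),
        (1, 617, "2022-04-18"),
        (4, 834, "2021-09-22"),
        (4, 995, "2022-02-07"),
        (7, 111, "2021-01-01"),  # hypothetical: only one order, so no second purchase
    ],
)

second_orders = conn.execute("""
    WITH numbered AS (
        SELECT customer_id, order_id, order_date,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY order_date
               ) AS order_num
        FROM purchases
    )
    SELECT customer_id, order_id, order_date
    FROM numbered
    WHERE order_num = 2
    ORDER BY customer_id
""").fetchall()

print(second_orders)
```

Customers with fewer than two orders never get a row numbered 2, so they drop out of the result.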

rank function only returns 1 with date in redshift

I'm running the code below in Redshift. I want to get a ranking of the order when a customer purchased a product based on the date. Each purchase has a unique ticketid, each customer has a unique customer_uuid, and each product has a unique product_id. The code below is returning 1 for all rankings and I'm not sure why. Is there an error in my code or is there a problem with ranking by a date field in Redshift? Does anyone see how to modify this code to correct the issue?
code:
select customer_uuid,
       product_id,
       date,
       ticketid,
       rank()
       over(partition by customer_uuid,
                         product_id,
                         ticketid order by date asc) as rank
from table
order by customer_uuid, product_id
data:
customer_uuid product_id ticketid date
1 2 1 1/1/18
1 2 2 1/2/18
1 2 3 1/3/18
output:
customer_uuid product_id ticketid date rank
1 2 1 1/1/18 1
1 2 2 1/2/18 1
1 2 3 1/3/18 1
desired output:
customer_uuid product_id ticketid date rank
1 2 1 1/1/18 1
1 2 2 1/2/18 2
1 2 3 1/3/18 3
First, you have ticketid in the partition by, which makes each row its own partition.
Second, you are using rank(). If you want an enumeration, you probably want row_number():
row_number() over(partition by customer_uuid, product_id order by date asc) as rank
I want to get a ranking of the order when a customer purchased a product based on the date. Each purchase has a unique ticketid, each customer has a unique customer_uuid, and each product has a unique product_id.
Basically you have unique (customer_uuid, product_id, ticketid) tuples. If you use those as a partition, the rank will always be 1, since there is only one record per partition.
You just need to remove ticketid from the partition:
rank() over(
partition by customer_uuid, product_id
order by date
) as rank
Note: rank() will give an equal position to records that share the same (customer_uuid, product_id, date).
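The effect described in both answers can be reproduced with the question's three rows. The sketch below uses SQLite via Python's sqlite3 instead of Redshift (SQLite 3.25+ for window functions); the table name purchases, the column name purchase_date, the ISO dates, and ticket 4 (a made-up purchase on the same date as ticket 3) are assumptions added to show the tie behaviour of rank() next to row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE purchases (
        customer_uuid INTEGER, product_id INTEGER,
        ticketid INTEGER, purchase_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 2, 1, "2018-01-01"),
        (1, 2, 2, "2018-01-02"),
        (1, 2, 3, "2018-01-03"),
        (1, 2, 4, "2018-01-03"),  # hypothetical tie with ticket 3
    ],
)

ranks = conn.execute("""
    SELECT ticketid,
           -- ticketid in the partition: every partition has one row, so always 1
           RANK() OVER (
               PARTITION BY customer_uuid, product_id, ticketid
               ORDER BY purchase_date
           ) AS rank_with_ticket,
           -- ticketid removed: rows are ranked within (customer, product)
           RANK() OVER (
               PARTITION BY customer_uuid, product_id
               ORDER BY purchase_date
           ) AS rank_without_ticket,
           -- row_number never repeats; ticketid breaks the date tie
           ROW_NUMBER() OVER (
               PARTITION BY customer_uuid, product_id
               ORDER BY purchase_date, ticketid
           ) AS row_num
    FROM purchases
    ORDER BY ticketid
""").fetchall()

for row in ranks:
    print(row)
```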

Top 10 of total amount paid aggregated by provider, partitioned by state - PostgreSQL

I have a database of medicare data with three tables: provider metadata (doctor's unique number, name, city, state, credentials, etc); hcpcs metadata (code, description, if it's for drugs or not); provider_services (doctor's unique number, hcpcs code, number of services completed by that doctor, average cost)
I'm trying to get the top 10 payments by state, aggregated by provider. However I'm running into an issue where 1) I can't figure out how to rank by the total payment and 2) I can't figure out how to aggregate the providers. Here's the best query I've gotten so far:
SELECT *
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state ORDER BY ps.average_medicare_payment_amt desc) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
GROUP BY t.last_name, t.npi, t.first_name, t.city, t.state, t.total_amount, t.rank
ORDER BY state ASC;
This results in something like:
| LAST  | FIRST | STATE | TOTAL   | RANK |
|-------|-------|-------|---------|------|
| DOE   | JANE  | AK    | 3000.41 | 10   |
| SMITH | JOHN  | AK    | 6000.98 | 7    |
| COLE  | ANN   | AK    | 1000    | 4    |
| SMITH | JOHN  | AK    | 1560.32 | 1    |
So my issues are 1. the providers aren't aggregating (John Smith with the same unique number showing up multiple times) and 2. I can only get it to compile with that average_payment_amt and not total_amt so the rankings are really screwed up.
Consider following adjustments:
Avoid using SELECT * in aggregate queries with GROUP BY. This query runs in PostgreSQL without error only because every column that SELECT * expands to is also listed in GROUP BY.
Use the calculated expression for total_amount in the window function's ORDER BY clause.
Apply an aggregate function like SUM to total_amount and do not include it as a grouping column. In fact, you do not mention how you want to aggregate by provider.
Ranking within each state conflicts with aggregating by a different column, the provider. Right now it appears you want to use the rank only for filtering records and not for display.
Below achieves the following:
Sums total payment amounts by provider for the top 10 payment amounts per state.
SELECT t.npi, t.last_name, t.first_name, t.city, t.state,
SUM(t.total_amount) AS total_amount
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state
ORDER BY ps.average_medicare_payment_amt * ps.line_srvc_cnt DESC) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
GROUP BY t.npi, t.last_name, t.first_name, t.city, t.state
ORDER BY t.state ASC;
Now, below achieves the following if this is your intention:
Displays records of top 10 payments per state in state and rank order (where providers can repeat if they ranked multiple times within or between states).
SELECT t.*
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state
ORDER BY ps.average_medicare_payment_amt * ps.line_srvc_cnt DESC) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
ORDER BY t.state, t.rank;
I am guessing that you actually want to aggregate in the subquery and rank by the total amount:
SELECT t.*
FROM (SELECT p.npi,
             p.nppes_provider_last_org_name AS last_name,
             p.nppes_provider_first_name AS first_name,
             p.nppes_provider_state AS state,
             SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
             RANK() OVER (PARTITION BY p.nppes_provider_state
                          ORDER BY SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) DESC) as rnk
      FROM provider_services ps JOIN
           provider p
           ON ps.npi = p.npi
      GROUP BY p.npi, p.nppes_provider_last_org_name, p.nppes_provider_first_name, p.nppes_provider_state
     ) t
WHERE rnk <= 10
ORDER BY state ASC, total_amount DESC;
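A reduced sketch of the "aggregate per provider first, then rank within the state" idea, run in SQLite via Python's sqlite3 (SQLite 3.25+). The two tables are cut down to hypothetical columns (npi, state, avg_payment, service_count), the data is invented, and the sketch keeps the top 2 per state instead of the top 10 to stay short. The aggregation is done in a CTE and the ranking in a second step, which is a slightly different arrangement from the single subquery above but follows the same logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provider (npi INTEGER, state TEXT)")
conn.execute(
    "CREATE TABLE provider_services (npi INTEGER, avg_payment INTEGER, service_count INTEGER)"
)
# All rows below are invented for illustration.
conn.executemany(
    "INSERT INTO provider VALUES (?, ?)",
    [(1, "AK"), (2, "AK"), (3, "AK"), (4, "AL")],
)
conn.executemany(
    "INSERT INTO provider_services VALUES (?, ?, ?)",
    [
        (1, 100, 2),  # provider 1: 100*2 + 50*4 = 400 in total
        (1, 50, 4),
        (2, 300, 1),  # provider 2: 300
        (3, 10, 5),   # provider 3: 50
        (4, 20, 3),   # provider 4: 60
    ],
)

top_providers = conn.execute("""
    WITH totals AS (
        -- Step 1: one row per provider with the summed payment amount
        SELECT p.state, p.npi,
               SUM(ps.avg_payment * ps.service_count) AS total_amount
        FROM provider_services ps
        JOIN provider p ON ps.npi = p.npi
        GROUP BY p.state, p.npi
    ),
    ranked AS (
        -- Step 2: rank the providers inside each state by that total
        SELECT totals.*,
               RANK() OVER (PARTITION BY state ORDER BY total_amount DESC) AS rnk
        FROM totals
    )
    SELECT state, npi, total_amount, rnk
    FROM ranked
    WHERE rnk <= 2
    ORDER BY state, rnk
""").fetchall()

for row in top_providers:
    print(row)
```

Because the sum is computed before the rank, each provider appears at most once per state and the ranking reflects the provider's total rather than a single service line.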

Top 2 Months of Sales by Customer - Oracle

I am trying to develop a query to pull out the top 2 months of sales by customer id. Here is a sample table:
Customer_ID Sales Amount Period
144567 40 2
234567 50 5
234567 40 7
144567 80 10
144567 48 2
234567 23 7
desired output would be
Customer_ID Sales Sum Period
144567 80 10
144567 48 2
234567 50 5
234567 40 7
I've tried
select sum(net_sales_usd_spot), valid_period, customer_id
from sales_trans_price_output
where valid_period in (select valid_period, sum(net_sales_usd_spot)
from sales_trans_price_output
where rank<=2)
group by valid_period, customer_id
error is
too many values ORA-00913.
I see why, but not sure how to rework it.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() over (partition by customer_id order by sales_amount desc ) rn
FROM sales_trans_price t
)
WHERE rn <= 2
ORDER BY 1,2 desc
Demo: http://sqlfiddle.com/#!4/882888/3
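For a quick local check of the row_number approach, here is a sketch that loads the question's six sample rows into SQLite via Python's sqlite3 (SQLite 3.25+) rather than Oracle. The table and column names (sales, sales_amount, period) are simplified assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, sales_amount INTEGER, period INTEGER)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (144567, 40, 2),
        (234567, 50, 5),
        (234567, 40, 7),
        (144567, 80, 10),
        (144567, 48, 2),
        (234567, 23, 7),
    ],
)

top_two = conn.execute("""
    SELECT customer_id, sales_amount, period
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY sales_amount DESC
               ) AS rn
        FROM sales s
    ) numbered
    WHERE rn <= 2
    ORDER BY customer_id, sales_amount DESC
""").fetchall()

for row in top_two:
    print(row)
```

This reproduces the desired output from the question, which keeps the two largest individual sales rows per customer rather than summing by period.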
what if you change your where clause to:
where valid_period in
(
select p.valid_period from sales_trans_price_output p
join (select valid_period, sum(net_sales_usd_spot)
from sales_trans_price_output
where rank<=2) s on s.valid_period = p.valid_period
)
It might be ugly and need refactoring, but I think this is the logic you're after.
The error is because of this.
where valid_period in (select valid_period, sum(net_sales_usd_spot)
from sales_trans_price_output
where rank<=2)
The subquery can only contain one field.
You are on the right track using rank, but you might not be using it correctly. Google oracle rank to find the correct syntax.
Back to what you are looking to achieve, a derived table is the approach I would use. That's simply a subquery with an alias. Or, if you use the keyword with, it is called a CTE (Common Table Expression).
Try it
SELECT * FROM (
SELECT T.*,
RANK () OVER (PARTITION BY CUSTOMER_ID
ORDER BY VALID_PERIOD DESC) FN_RANK
FROM SALES_TRANS_PRICE_OUTPUT T
) A
WHERE A.FN_RANK <= 2
ORDER BY CUSTOMER_ID ASC, VALID_PERIOD DESC, FN_RANK DESC

Firebird Query- Return first row each group

In a firebird database with a table "Sales", I need to select the first sale of all customers. See below a sample that show the table and desired result of query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tested following code "Select first row in each GROUP BY group?", but it did not work.
In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.
Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.
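The "no earlier sale exists" join can be compared with the ROW_NUMBER variant on the question's sample rows. This sketch uses SQLite via Python's sqlite3 in place of Firebird (SQLite 3.25+ for the window function); the dates are assumed to be day/month/year and are rewritten in ISO format so that text comparison matches chronological order, and the index name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"),
        (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"),
        (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"),
        (6, 22, "2016-01-10 08:45"),
    ],
)
# Index suggested in the answer; the name is hypothetical.
conn.execute("CREATE INDEX idx_sales_customer_date ON sales (customerid, dthrsale)")

# Anti-join: keep a sale only if no earlier sale of the same customer exists.
first_by_join = conn.execute("""
    SELECT s1.id, s1.customerid, s1.dthrsale
    FROM sales s1
    LEFT JOIN sales s2
           ON s2.customerid = s1.customerid
          AND s2.dthrsale < s1.dthrsale
    WHERE s2.id IS NULL
    ORDER BY s1.id
""").fetchall()

# Window-function variant for comparison.
first_by_row_number = conn.execute("""
    SELECT id, customerid, dthrsale
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY dthrsale) AS rn
        FROM sales s
    ) numbered
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(first_by_join)
```

One difference worth keeping in mind: if a customer has two sales with exactly the same earliest timestamp, the join returns both rows, while ROW_NUMBER keeps only one of them.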
In Firebird 3, get the first row for each customer by the minimum sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you want to get other related columns, this is where your self-answer fails:
select customer_id, min(sales_date)
     , id, total -- what about other columns?
from SALES
group by customer_id
As simple as:
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID