rank function only returns 1 with date in redshift - sql

I'm running the code below in Redshift. I want to rank a customer's purchases of a product in order of purchase date. Each purchase has a unique ticketid, each customer has a unique customer_uuid, and each product has a unique product_id. The code below returns 1 for every ranking and I'm not sure why. Is there an error in my code, or is there a problem with ranking by a date field in Redshift? Does anyone see how to modify this code to correct the issue?
code:
select customer_uuid,
product_id,
date,
ticketid,
rank()
over(partition by customer_uuid,
product_id,
ticketid order by date asc) as rank
from table
order by customer_uuid, product_id
data:
customer_uuid product_id ticketid date
1 2 1 1/1/18
1 2 2 1/2/18
1 2 3 1/3/18
output:
customer_uuid product_id ticketid date rank
1 2 1 1/1/18 1
1 2 2 1/2/18 1
1 2 3 1/3/18 1
desired output:
customer_uuid product_id ticketid date rank
1 2 1 1/1/18 1
1 2 2 1/2/18 2
1 2 3 1/3/18 3

First, you have ticketid in the partition by. Since every ticketid is unique, each partition contains exactly one row, so the rank is always 1.
Second, you are using rank(). If you want an enumeration, you probably want row_number():
row_number() over(partition by customer_uuid, product_id order by date asc) as rank
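To see the effect of the partition columns in isolation, here is a small sketch that runs both window definitions against the question's three sample rows. It uses Python's built-in sqlite3 module with an in-memory database instead of Redshift (an assumption made so the example runs anywhere; SQLite 3.25 or newer is needed for window functions). The table name purchases and the ISO-formatted dates are stand-ins, not part of the original question.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer_uuid INT, product_id INT, ticketid INT, date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [(1, 2, 1, "2018-01-01"), (1, 2, 2, "2018-01-02"), (1, 2, 3, "2018-01-03")],
)

query = """
SELECT ticketid,
       -- ticketid in the partition: one row per partition, so always 1
       rank() OVER (PARTITION BY customer_uuid, product_id, ticketid
                    ORDER BY date) AS rank_with_ticketid,
       -- ticketid removed: rows are numbered within customer and product
       row_number() OVER (PARTITION BY customer_uuid, product_id
                          ORDER BY date) AS purchase_number
FROM purchases
ORDER BY ticketid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (ticketid, rank_with_ticketid, purchase_number)
```

The second column stays at 1 for every row, while the third counts 1, 2, 3 as in the desired output.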

I want to get a ranking of the order when a customer purchased a product based on the date. Each purchase has a unique ticketid, each customer has a unique customer_uuid, and each product has a unique product_id.
Basically, your (customer_uuid, product_id, ticketid) tuples are unique. If you use those as the partition, the rank will always be 1, since there is only one record per partition.
You just need to remove ticketid from the partition:
rank() over(
partition by customer_uuid, product_id
order by date
) as rank
Note: rank() will give an equal position to records that share the same (customer_uuid, product_id, date).
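The difference between the ranking functions only shows up when dates tie. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25+) rather than Redshift, a made-up purchases table, and hypothetical data in which tickets 2 and 3 share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer_uuid INT, product_id INT, ticketid INT, date TEXT)"
)
# Hypothetical data: tickets 2 and 3 were bought on the same day.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 2, 1, "2018-01-01"),
        (1, 2, 2, "2018-01-02"),
        (1, 2, 3, "2018-01-02"),
        (1, 2, 4, "2018-01-03"),
    ],
)

query = """
SELECT ticketid,
       rank()       OVER w AS rnk,        -- ties share a value, then a gap
       dense_rank() OVER w AS dense_rnk,  -- ties share a value, no gap
       row_number() OVER (PARTITION BY customer_uuid, product_id
                          ORDER BY date, ticketid) AS row_num  -- no ties
FROM purchases
WINDOW w AS (PARTITION BY customer_uuid, product_id ORDER BY date)
ORDER BY ticketid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (ticketid, rnk, dense_rnk, row_num)
```

Adding ticketid as a second sort key for row_number() is a design choice here: it makes the numbering of tied rows deterministic instead of arbitrary.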

Related

How to get the 2nd record for a customer purchase?

I'm working on a customer database and I want to get all the data for each customer's second purchase (for every customer who has two or more purchases).
For example:
Customer_ID Order_ID Order_Date
1 259 09/05/2020
1 644 03/11/2020
1 617 18/04/2022
4 834 22/09/2021
4 995 07/02/2022
I want to display the second order which is:
Customer_ID Order_ID Order_Date
1 644 03/11/2020
4 995 07/02/2022
I'm facing some difficulties in finding the right logic, any idea how I can achieve my end goal? :)
*Note: I'm using snowflake
You can use a ROW_NUMBER and filter using QUALIFY clause:
select * from table qualify row_number() over(partition by customer_id order by order_date) = 2;
You can also use a common table expression:
WITH CTE_RS AS (
    SELECT Customer_ID,
           ORDER_ID,
           Order_Date,
           ROW_NUMBER() OVER (PARTITION BY Customer_ID ORDER BY Order_Date) AS ORDRNUM
    FROM *TABLE NAME*
)
SELECT Customer_ID, ORDER_ID, Order_Date
FROM CTE_RS
WHERE ORDRNUM = 2;
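As a rough check of the CTE approach, the sketch below runs it on the question's sample rows. Several things are assumptions: it uses Python's sqlite3 module (SQLite 3.25+) instead of Snowflake, the table is called orders, the dates are rewritten from DD/MM/YYYY to ISO text so that they sort chronologically, and customer 7 is a hypothetical extra row with a single order. QUALIFY is not available in SQLite, so only the CTE variant is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INT, order_id INT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 259, "2020-05-09"),
        (1, 644, "2020-11-03"),
        (1, 617, "2022-04-18"),
        (4, 834, "2021-09-22"),
        (4, 995, "2022-02-07"),
        (7, 111, "2021-01-01"),  # hypothetical customer with one order only
    ],
)

query = """
WITH numbered AS (
    SELECT customer_id,
           order_id,
           order_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date) AS order_num
    FROM orders
)
SELECT customer_id, order_id, order_date
FROM numbered
WHERE order_num = 2          -- keep each customer's second order
ORDER BY customer_id
"""
second_orders = conn.execute(query).fetchall()
for row in second_orders:
    print(row)
```

Customers with fewer than two orders never get a row numbered 2, so they drop out without any extra filter.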

Select the best selling product ID

What if I have table like this and I want to select the best selling product_id.
id transaction_id product_id qty_sold
1 21 2 5
2 22 3 2
3 23 4 2
3 24 2 1
3 25 2 4
I want the best selling product_id with the highest qty_sold
Using SQL Server, you can group by product_id, add up qty_sold, and order by the total descending. If two products come out with the same total quantity, the minimum transaction ID per product can split the tie in favour of the product that sold first:
SELECT TOP 1 product_id, SUM(qty_sold) as sellcount, MIN(transaction_id) as firsttran
FROM t
GROUP BY product_id
ORDER BY SUM(qty_sold) DESC, MIN(transaction_id)
Once you're happy that the sums are right, you can remove , SUM(qty_sold) as sellcount, MIN(transaction_id) as firsttran from the SELECT if you only need the product ID.
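Here is a small sketch of the same grouping logic on the question's sample rows. It is run through Python's sqlite3 module rather than SQL Server (an assumption for portability), so LIMIT 1 stands in for TOP 1; the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INT, transaction_id INT, product_id INT, qty_sold INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 21, 2, 5),
        (2, 22, 3, 2),
        (3, 23, 4, 2),
        (3, 24, 2, 1),
        (3, 25, 2, 4),
    ],
)

query = """
SELECT product_id,
       SUM(qty_sold)       AS sellcount,
       MIN(transaction_id) AS firsttran
FROM t
GROUP BY product_id
ORDER BY SUM(qty_sold) DESC,      -- highest total first
         MIN(transaction_id)      -- earliest transaction breaks ties
LIMIT 1                           -- SQLite equivalent of TOP 1
"""
best = conn.execute(query).fetchone()
print(best)  # (product_id, sellcount, firsttran)
```

Product 2 wins with 5 + 1 + 4 = 10 units; products 3 and 4 tie at 2 units each, which is the situation the MIN(transaction_id) tie-breaker is meant for.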

ORDER BY date but also GROUP BY userid

I have a table of records I want to sort by earliest date first then by userid.
If the user associated to the date also has other records in that table I want to group those under the earliest date.
Desired output
Id UserId Date
1 2 1/1/2020
2 2 2/1/2020
3 2 3/1/2020
4 1 1/2/2020
5 1 2/2/2020
6 3 1/4/2020
7 4 1/5/2020
In this example UserId 2 has the earliest record in that table, so that record should be first followed by his additional records in date asc order
You seem to want:
select t.*
from table t
order by min(date) over (partition by userid), date;
Some database products don't support window functions in the ORDER BY clause, so you can do this instead:
select t.*, min(date) over (partition by userid) as mndate
from table t
order by mndate, date;
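To illustrate why sorting by each user's earliest date groups their rows together, the sketch below uses hypothetical data in which the users' dates interleave. The assumptions: Python's sqlite3 module (SQLite 3.25+), a table named records, ISO date strings, and an extra userid sort key that the answer above does not have, added so that two users sharing the same earliest date are not mixed together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INT, userid INT, date TEXT)")
# Hypothetical rows: user 1's first record falls between user 2's records.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, 2, "2020-01-01"),
        (2, 1, "2020-01-02"),
        (3, 2, "2020-01-03"),
        (4, 3, "2020-01-04"),
        (5, 1, "2020-01-05"),
    ],
)

query = """
SELECT id,
       userid,
       date,
       MIN(date) OVER (PARTITION BY userid) AS first_date
FROM records
ORDER BY first_date,  -- users ordered by their earliest record
         userid,      -- keeps users with equal first_date apart (assumption)
         date         -- each user's records in ascending date order
"""
rows = conn.execute(query).fetchall()
ordered_ids = [row[0] for row in rows]
for row in rows:
    print(row)
```

A plain ORDER BY date would return ids 1 to 5 in order; with first_date as the leading key, both of user 2's rows come before user 1's.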
If I understand what you want...
You could do this (sample with DB2 syntax):
SELECT tab.UserId, tab.Date, tab.*
FROM DB2SIS.TABLE_NAME tab
ORDER BY tab.Date ASC, tab.UserId ASC
Because tab.* already includes UserId and Date, those two columns will appear twice in the output. To avoid the duplication, list each field you want to show instead of using 'tab.*'.

Add temporary column with number in sequence in BigQuery

I have two tables: customers and orders. orders has a customer_id column, so a customer can have many orders. I need to find each order's number in sequence (by date). The result should look something like this:
customer_id order_date number_in_sequence
----------- ---------- ------------------
1 2020-01-01 1
1 2020-01-02 2
1 2020-01-03 3
2 2019-01-01 1
2 2019-01-02 2
I am going to use it in WITH clause. So I don't need to add it to the table.
You need row_number() :
select t.*,
row_number() over (partition by customer_id order by order_date) as number_in_sequence
from table t;
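Since the question mentions using the result in a WITH clause, here is a minimal sketch of that shape on the sample rows. It runs on Python's sqlite3 module (SQLite 3.25+) rather than BigQuery, which is an assumption for the sake of a runnable example; the CTE name numbered_orders is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2020-01-01"),
        (1, "2020-01-02"),
        (1, "2020-01-03"),
        (2, "2019-01-01"),
        (2, "2019-01-02"),
    ],
)

query = """
WITH numbered_orders AS (
    -- The sequence number exists only inside the query;
    -- the orders table itself is not modified.
    SELECT customer_id,
           order_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date) AS number_in_sequence
    FROM orders
)
SELECT customer_id, order_date, number_in_sequence
FROM numbered_orders
ORDER BY customer_id, order_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```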

Combining COUNT and RANK - PostgreSQL

What I need to select is the total number of trips made by every 'id_customer' from table user, together with the order id, dispatch_seconds, and distance of their first order. id_customer, customer_id, and order_id are strings.
It should look like this:
+------+--------+------------+--------------------------+------------------+
| id | count | #1order id | #1order dispatch seconds | #1order distance |
+------+--------+------------+--------------------------+------------------+
| 1ar5 | 3 | 4r56 | 1 | 500 |
| 2et7 | 2 | dc1f | 5 | 100 |
+------+--------+------------+--------------------------+------------------+
Cheers!
The original post was edited because S-man helped me find the exact solution during the discussion. Solution by S-man: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e16aa6008990107e55a26d05b10b02b5
db<>fiddle
SELECT
customer_id,
count, -- default column name of the count(*) window function (A)
order_id,
order_timestamp,
dispatch_seconds,
distance
FROM (
SELECT
*,
count(*) over (partition by customer_id), -- A
first_value(order_id) over (partition by customer_id order by order_timestamp) -- B
FROM orders
)s
WHERE order_id = first_value -- C
https://www.postgresql.org/docs/current/static/tutorial-window.html
A: a window function that gets the total record count per user.
B: a window function that orders all records per user by timestamp and returns the first order_id of that user. Using first_value instead of min has one benefit: your order IDs may not increase with the timestamp (two orders may come in simultaneously, or the IDs may be some sort of hash rather than a sequence).
--> Both are added as new columns.
C: keep only the rows where "first_value" (the first order_id by timestamp) equals the order_id of the current row. This gives the row of each user's first order.
Result:
customer_id count order_id order_timestamp dispatch_seconds distance
----------- ----- -------- ------------------- ---------------- --------
1ar5 3 4r56 2018-08-16 17:24:00 1 500
2et7 2 dc1f 2018-08-15 01:24:00 5 100
Note that in this test data the order "dc1f" of user "2et7" has the smaller timestamp but comes later in the rows. It is not the first occurrence of the user in the table, but it is nevertheless the user's earliest order. This demonstrates the first_value vs. min case described above.
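The first_value approach can be tried out with the sketch below. It is not the linked fiddle: it uses Python's sqlite3 module (SQLite 3.25+) instead of PostgreSQL, explicit column aliases (order_count, first_order_id) instead of the default names, and hypothetical extra orders ("1a2b", "9zz9", "aa11") chosen so that the alphabetically smallest order_id is not the earliest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, order_id TEXT, "
    "order_timestamp TEXT, dispatch_seconds INT, distance INT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        ("1ar5", "4r56", "2018-08-16 17:24:00", 1, 500),
        ("1ar5", "1a2b", "2018-08-17 09:00:00", 3, 250),  # hypothetical
        ("1ar5", "9zz9", "2018-08-18 10:00:00", 2, 300),  # hypothetical
        ("2et7", "aa11", "2018-08-16 08:00:00", 4, 700),  # hypothetical
        ("2et7", "dc1f", "2018-08-15 01:24:00", 5, 100),
    ],
)

query = """
SELECT customer_id, order_count, order_id,
       order_timestamp, dispatch_seconds, distance
FROM (
    SELECT *,
           count(*) OVER (PARTITION BY customer_id) AS order_count,
           first_value(order_id) OVER (PARTITION BY customer_id
                                       ORDER BY order_timestamp) AS first_order_id
    FROM orders
) s
WHERE order_id = first_order_id   -- keep only each customer's earliest order
ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, min(order_id) would pick "1a2b" and "aa11", neither of which is the earliest order, which is the case the answer warns about.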
You are on the right track. Just use conditional aggregation:
SELECT o.customer_id, COUNT(*),
MAX(CASE WHEN seqnum = 1 THEN o.order_id END) as first_order_id
FROM (SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) as seqnum
FROM orders o
) o
GROUP BY o.customer_id;
Your JOIN is not necessary for this query.
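The conditional-aggregation idea can be stretched to return the first order's dispatch_seconds and distance as well, which the question asks for. The following is a sketch under assumptions, not the answerer's exact query: it runs on Python's sqlite3 module (SQLite 3.25+) instead of PostgreSQL, and the orders other than "4r56" and "dc1f" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, order_id TEXT, "
    "order_timestamp TEXT, dispatch_seconds INT, distance INT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        ("1ar5", "4r56", "2018-08-16 17:24:00", 1, 500),
        ("1ar5", "1a2b", "2018-08-17 09:00:00", 3, 250),  # hypothetical
        ("1ar5", "9zz9", "2018-08-18 10:00:00", 2, 300),  # hypothetical
        ("2et7", "aa11", "2018-08-16 08:00:00", 4, 700),  # hypothetical
        ("2et7", "dc1f", "2018-08-15 01:24:00", 5, 100),
    ],
)

query = """
SELECT customer_id,
       COUNT(*) AS order_count,
       -- Only the seqnum = 1 row yields a non-NULL value, so MAX picks it.
       MAX(CASE WHEN seqnum = 1 THEN order_id END)         AS first_order_id,
       MAX(CASE WHEN seqnum = 1 THEN dispatch_seconds END) AS first_dispatch_seconds,
       MAX(CASE WHEN seqnum = 1 THEN distance END)         AS first_distance
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_timestamp) AS seqnum
    FROM orders o
) numbered
GROUP BY customer_id
ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the count and the first-order columns come out of the same GROUP BY, this returns one row per customer without a separate filter on the first order.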
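You can use window functions:
select distinct customer_id,
count(*) over (partition by customer_id) as no_of_order,
-- first_value rather than min: with an order by in the window, min() is a
-- running minimum, and order_id is a string that need not follow the timestamp
first_value(order_id) over (partition by customer_id order by order_timestamp) as first_order_id
from orders o;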
I think there are several mistakes in your original query: your rank isn't partitioned, the order by clause seems incorrect, and you filter out all but one "random" order before applying the count. The list goes on.
Something like this seems closer to what you want:
SELECT
customer_id,
order_count,
order_id
FROM (
SELECT
a.customer_id,
a.order_count,
a.order_id,
RANK() OVER (PARTITION BY a.order_id, a.customer_id ORDER BY a.order_count DESC) AS rank_id
FROM (
SELECT
customer_id,
order_id,
COUNT(*) AS order_count
FROM
orders
GROUP BY
customer_id,
order_id) a) b
WHERE
b.rank_id = 1;