Using Qualify with Rank in Teradata - sql

I am trying to get the latest two month start dates on which a particular product was sold in Teradata. Since a product is sold in multiple months, I should get only the latest two selling months for each product.
Trying to use Qualify with Dense Rank :
SELECT DISTINCT PRODUCT, MONTH_START_DATE,
DENSE_RANK() OVER (PARTITION BY PRODUCT ORDER BY MONTH_START_DATE DESC ) AS RNK
FROM EMP_TABLE
WHERE PRODUCT = 'SOAP'
This gives me the distinct months with their rank for the product, something like this:
+---------+------------------+------+
| Product | Month_start_date | RNK |
+---------+------------------+------+
| SOAP | 2016-12-01 | 1 |
| SOAP | 2016-11-01 | 2 |
| SOAP | 2016-10-01 | 3 |
+---------+------------------+------+
But if I rewrite code to get only top 2 :
SELECT DISTINCT PRODUCT, MONTH_START_DATE,
DENSE_RANK() OVER (PARTITION BY PRODUCT ORDER BY MONTH_START_DATE DESC ) AS RNK
FROM EMP_TABLE
WHERE PRODUCT = 'SOAP'
QUALIFY RNK < 3
I always get only the top-ranked row. What is the reason for this? I know the workaround is to write a subquery, but I want to understand why QUALIFY returns only the top row.
Thanks for the help.
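The subquery workaround mentioned in the question can be sketched as follows. This is a minimal illustration run through Python's sqlite3 module rather than Teradata (SQLite has no QUALIFY and needs version 3.25 or later for window functions), and the rows in emp_table are invented for the demonstration. It only shows the workaround; it does not explain the QUALIFY behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_table (product TEXT, month_start_date TEXT)")
# Hypothetical sample rows: several sales per month, so duplicates exist.
conn.executemany(
    "INSERT INTO emp_table VALUES (?, ?)",
    [
        ("SOAP", "2016-12-01"), ("SOAP", "2016-12-01"),
        ("SOAP", "2016-11-01"),
        ("SOAP", "2016-10-01"), ("SOAP", "2016-10-01"),
    ],
)

# Compute the rank in a derived table, then filter on it in the outer query.
latest_two = conn.execute(
    """
    SELECT DISTINCT product, month_start_date, rnk
    FROM (
        SELECT product, month_start_date,
               DENSE_RANK() OVER (PARTITION BY product
                                  ORDER BY month_start_date DESC) AS rnk
        FROM emp_table
        WHERE product = 'SOAP'
    ) AS ranked
    WHERE rnk < 3
    ORDER BY rnk
    """
).fetchall()
print(latest_two)
```

Because DENSE_RANK gives every row of the same month the same rank, the outer DISTINCT collapses the duplicates to one row per month.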

Related

Select row with smallest number on multiple groups of same ids

I have the following table as an output from a sql statement
user | product | price
…
123 | 12 | 451.29
373 | 12 | 637.28
623 | 12 | 650.84
672 | 16 | 356.87
123 | 16 | 263.90
…
Now I want only the row with the smallest price for each product.
The SQL is fairly simple:
SELECT user, product, price
FROM t
WHERE product IN (
SELECT product_id
FROM p
WHERE typ LIKE 'producttyp1'
)
but adding MIN(price) does not work the way it usually does. I think it's because the same product IDs appear several times in the table. Is there an easy solution, or do I have to rewrite the whole query?
Edit: when I delete user from the query I can get the product and the smallest price:
12 | 451.29
16 | 263.90
But now I would have to join the user, which I am trying to avoid.
You can use row_number():
select p.*
from (select p.*,
row_number() over (partition by product order by price asc) as seqnum
from p
) p
where seqnum = 1;
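A quick way to try the row_number() approach is the sketch below. It assumes the rows shown in the question live in a table named t with columns user, product and price, and it runs on SQLite (3.25 or later) through Python's sqlite3 module, which is a stand-in for whatever database the question actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("user" INTEGER, product INTEGER, price REAL)')
# The visible rows from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (123, 12, 451.29), (373, 12, 637.28), (623, 12, 650.84),
        (672, 16, 356.87), (123, 16, 263.90),
    ],
)

# Number the rows of each product from cheapest to most expensive,
# then keep only the first row of every product.
cheapest = conn.execute(
    """
    SELECT "user", product, price
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY product
                                  ORDER BY price ASC) AS seqnum
        FROM t
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY product
    """
).fetchall()
print(cheapest)
```

The user column comes along for free because the filter keeps whole rows, so no join back to the table is needed.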

Find Customers With 4 Consecutive Years of Giving (Including Gaps)

I have a table similar to below:
+------------+-----------+
| CustomerID | OrderYear |
+------------+-----------+
| 1 | 2012 |
| 1 | 2013 |
| 1 | 2014 |
| 1 | 2017 |
| 1 | 2018 |
| 2 | 2012 |
| 2 | 2013 |
| 2 | 2014 |
| 2 | 2015 |
| 2 | 2017 |
+------------+-----------+
How would I identify which CustomerIDs have 4 consecutive years of giving? (In the above, only customer 2.) As you can see, some records will have gaps in order years.
I started down the road of trying some combination of ROW_NUMBER/LAG/LEAD, with no luck so far.
A very pared-down, modified attempt:
WITH CTE
AS
(
SELECT T.ConstituentLookupID,
T.FISCALYEAR,
COUNT(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID) AS YearCount,
FIRST_VALUE(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID ORDER BY T.FISCALYEAR DESC) - T.FISCALYEAR + 1 AS X,
ROW_NUMBER() OVER (PARTITION BY T.ConstituentLookupID ORDER BY T.FISCALYEAR DESC) AS RN
FROM #Temp AS T)
SELECT CTE.ConstituentLookupID,
CTE.FISCALYEAR,
CTE.YearCount,
CTE.X,
CTE.RN
FROM CTE
WHERE CTE.YearCount >= 4 --Have at least 4 years of giving
AND CTE.X - CTE.RN = 1; --Some way to calculate consecutive years. Does not account for the current year or gaps...
Assuming no duplicates, you can use lag():
select distinct customerid
from (
select t.*,
lag(orderyear, 3) over(partition by customerid order by orderyear) orderyear3
from mytable t
) t
where orderyear = orderyear3 + 3
A more conventional approach is to use some gaps-and-islands technique. This is convenient if you want the start and end of each series. Here, an island is a series of rows with "adjacent" order years, and you want islands that are at least 4 years long. We can identify the islands by comparing the order year against an incrementing sequence, then use aggregation:
select customerid, min(orderyear) firstorderyear, max(orderyear) lastorderyear
from (
select t.*,
row_number() over(partition by customerid order by orderyear) rn
from mytable t
) t
group by customerid, orderyear - rn
having count(*) >= 4
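To see the islands form, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database (3.25 or later, via Python's sqlite3 module, used here only as a convenient stand-in) and runs the same query. The table name mytable follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (customerid INTEGER, orderyear INTEGER)")
# Sample data from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, 2012), (1, 2013), (1, 2014), (1, 2017), (1, 2018),
        (2, 2012), (2, 2013), (2, 2014), (2, 2015), (2, 2017),
    ],
)

# Within one customer, orderyear - rn stays constant across a run of
# adjacent years, so it works as an island identifier.
islands = conn.execute(
    """
    SELECT customerid,
           MIN(orderyear) AS firstorderyear,
           MAX(orderyear) AS lastorderyear
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY customerid
                                  ORDER BY orderyear) AS rn
        FROM mytable t
    ) AS numbered
    GROUP BY customerid, orderyear - rn
    HAVING COUNT(*) >= 4
    """
).fetchall()
print(islands)
```

For customer 2 the difference orderyear - rn is 2011 for 2012 through 2015 and 2012 for 2017, so only the first island reaches four rows. Customer 1's islands have three and two rows.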
Assuming you have no more than one row per customer and year, the simplest method is lag():
select customerid, orderyear
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = orderyear - 3;
The idea is to look three rows back. If that row's year is orderyear - 3, then there are four years in a row. If your data can have duplicates, there are tweaks to the logic (they make the query slightly more complicated).
This could return duplicates, so you might just want:
select distinct customerid
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = orderyear - 3;
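One possible tweak for duplicate years, sketched here as an assumption rather than as the answerer's own method, is to deduplicate (customerid, orderyear) pairs before applying lag(). The example uses SQLite (3.25 or later) through Python's sqlite3 module. Customer 2 comes from the question, while customer 3 is a hypothetical customer with a repeated year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customerid INTEGER, orderyear INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (2, 2012), (2, 2013), (2, 2014), (2, 2015), (2, 2017),
        # Hypothetical customer with 2013 recorded twice.
        (3, 2012), (3, 2013), (3, 2013), (3, 2014), (3, 2015),
    ],
)

LAG_QUERY = """
    SELECT DISTINCT customerid
    FROM (
        SELECT customerid, orderyear,
               LAG(orderyear, 3) OVER (PARTITION BY customerid
                                       ORDER BY orderyear) AS prev3_year
        FROM {source}
    ) AS lagged
    WHERE prev3_year = orderyear - 3
    ORDER BY customerid
"""

# Straight from the table: the duplicate row shifts the 3-row lookback.
naive = conn.execute(LAG_QUERY.format(source="t")).fetchall()

# Deduplicated first: one row per customer and year.
deduped = conn.execute(
    LAG_QUERY.format(source="(SELECT DISTINCT customerid, orderyear FROM t) AS d")
).fetchall()

print(naive, deduped)
```

With the duplicate in place, three rows back from 2015 lands on 2013 instead of 2012, so customer 3 is missed until the duplicates are removed.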
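I have a simple solution using row_number() and group by. The row number has to restart for each customer and follow the order years, and the grouping has to include the customer, otherwise runs from different customers can be mixed together:
SELECT z.customerid,
Count(*) AS years_in_run
FROM (SELECT customerid,
orderyear,
orderyear - Row_number()
OVER (
PARTITION BY customerid
ORDER BY orderyear) AS grp
FROM mytable) z
GROUP BY z.customerid, z.grp
HAVING Count(*) >= 4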

row_number() over(partition by ____ order by____) but only when dates have elapsed a certain amount of time

I have a "package rank" based on a certain number of times a unique customer id has been sent a package.
row_number() over (partition by package.customer_id
order by ship_date.shipped_date) as package_rank
The output returned is something like this:
+------------+-----------+-----+
|customer_id | ship_date | rank|
+------------+-----------+-----+
| sam | 8/20/2019 | 1 |
| sam | 9/20/2019 | 2 |
| sam | 9/23/2019 | 3 |
| tim | 9/20/2019 | 1 |
| tim | 10/18/2019| 2 |
+------------+-----------+-----+
Since it is unlikely that we would have shipped another complete box within 3 days, as in sam's case, I do not want to include that shipment. I want the rank to include only shipments dated at least 28 days after the previous ship date. Please let me know the best way to go about this. Thank you in advance.
Use a window function to exclude such rows from the list:
SELECT customer_id,
ship_date,
row_number()
OVER (PARTITION BY customer_id
ORDER BY ship_date) AS rank
FROM (SELECT customer_id,
ship_date,
lag(ship_date)
OVER (PARTITION BY customer_id
ORDER BY ship_date) AS prev_ship_date
FROM package) AS p1
WHERE (prev_ship_date + 28 <= ship_date) IS NOT FALSE
ORDER BY rank;
Use lag() to get the previous ship date. Then filter based on that. I would phrase this as:
SELECT customer_id, ship_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ship_date) AS package_rank
FROM (SELECT p.*,
LAG(ship_date) OVER (PARTITION BY customer_id ORDER BY ship_date) AS prev_ship_date
FROM package p
) p
WHERE prev_ship_date IS NULL OR
prev_ship_date <= ship_date - INTERVAL '28 day'
ORDER BY package_rank;
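Here is a minimal sketch of the lag-then-filter idea on the sample shipments from the question. It assumes a single package table with ISO-formatted dates and uses SQLite (3.25 or later) via Python's sqlite3 module, so the date arithmetic is done with julianday() instead of an INTERVAL literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (customer_id TEXT, ship_date TEXT)")
# Sample shipments from the question, rewritten as ISO dates.
conn.executemany(
    "INSERT INTO package VALUES (?, ?)",
    [
        ("sam", "2019-08-20"), ("sam", "2019-09-20"), ("sam", "2019-09-23"),
        ("tim", "2019-09-20"), ("tim", "2019-10-18"),
    ],
)

# Inner query: attach the previous ship date per customer.
# Outer query: drop shipments less than 28 days after the previous one,
# then number the remaining rows.
ranked = conn.execute(
    """
    SELECT customer_id, ship_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY ship_date) AS package_rank
    FROM (
        SELECT customer_id, ship_date,
               LAG(ship_date) OVER (PARTITION BY customer_id
                                    ORDER BY ship_date) AS prev_ship_date
        FROM package
    ) AS p
    WHERE prev_ship_date IS NULL
       OR julianday(ship_date) - julianday(prev_ship_date) >= 28
    ORDER BY customer_id, package_rank
    """
).fetchall()
print(ranked)
```

Note that each shipment is compared with the immediately preceding shipment, even when that one is itself filtered out. If the gap should instead be measured from the last shipment that was kept, this simple form is not enough.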

Rank function for date in Oracle SQL

I have the following code for example:
SELECT id, order_day, purchase_id FROM d
customer_id and purchase_id are unique. Each customer_id could have multiple purchase_id. Assume every one has made at least 5 orders.
Now I just want to pull the first 5 purchase IDs for each customer ID (based on the earliest purchase dates). I want the result to look like this:
id | purchase_id | rank
-------------------------
A | WERFEW43 | 1
A | ERTGDSFV | 3
A | FDGRT45 | 2
A | BRTE4TEW | 4
A | DFGDV | 5
B | DSFSF | 1
B | CF345 | 2
B | SDFSDFSDFS | 4
I thought of Ranking order_day, but my knowledge is not good enough to pull this off.
select id, purchase_id, rank() over (partition by id order by order_day) as rnk
from d
You can also try dense_rank() and row_number() with the same over (partition by id order by order_day) clause and choose whichever suits you best.
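The three ranking functions differ only in how they treat ties, which the following sketch illustrates. The rows are hypothetical (one customer with two purchases on the same day), and the query runs on SQLite 3.25 or later through Python's sqlite3 module rather than Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (id TEXT, order_day TEXT, purchase_id TEXT)")
# Hypothetical rows: P1 and P2 were bought on the same day.
conn.executemany(
    "INSERT INTO d VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", "P1"),
        ("A", "2020-01-01", "P2"),
        ("A", "2020-01-05", "P3"),
    ],
)

rows = conn.execute(
    """
    SELECT purchase_id,
           RANK()       OVER w AS rnk,        -- ties share a rank, then a gap
           DENSE_RANK() OVER w AS dense_rnk,  -- ties share a rank, no gap
           ROW_NUMBER() OVER w AS rn          -- always unique, tie order arbitrary
    FROM d
    WINDOW w AS (PARTITION BY id ORDER BY order_day)
    ORDER BY order_day, rn
    """
).fetchall()

ranks = [r[1] for r in rows]
dense_ranks = [r[2] for r in rows]
row_numbers = [r[3] for r in rows]
print(ranks, dense_ranks, row_numbers)
```

For a strict "first 5 rows per customer" cut, row_number() is usually the safer choice, because rank() and dense_rank() can return more than 5 rows when order days tie.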
select *
from
( SELECT
id
,order_day
,purchase_id
,row_number() -- ranking
over (partition by id -- each customer
order by order_day) as rn -- based on oldest dates
FROM d
) dt -- Oracle does not accept AS before a table alias
where rn <= 5

Aggregate highest prices per client of salesmen

I have a table like this:
SELECT * FROM orders;
client_id | order_id | salesman_id | price
-----------+----------+-------------+-------
1 | 167 | 1 | 65
1 | 367 | 1 | 27
2 | 401 | 1 | 29
2 | 490 | 2 | 48
3 | 199 | 1 | 68
3 | 336 | 2 | 22
3 | 443 | 1 | 84
3 | 460 | 2 | 92
I want to find an array of order_ids containing the highest-priced sale for each unique salesman and client pair. In this case I want the resulting table:
salesman_id | order_id
-------------+----------------
1 | {167, 401, 443}
2 | {490, 460}
So far I have an outline for a query:
SELECT salesman_id, max_client_salesman(order_id)
FROM orders
GROUP BY salesman_id;
However I'm having trouble writing the aggregate function max_client_salesman.
The documentation online for aggregate functions and arrays in postgres is very minimal. Any help is appreciated.
Standard SQL
I would combine the window function last_value() or first_value() with DISTINCT to get the orders with the highest price per (salesman_id, client_id) efficiently, and then aggregate these into the array you are looking for with the simple aggregate function array_agg().
SELECT salesman_id
,array_agg(max_order_id) AS most_expensive_orders_per_client
FROM (
SELECT DISTINCT
salesman_id, client_id
,last_value(order_id) OVER (PARTITION BY salesman_id, client_id
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS max_order_id
FROM orders
) x
GROUP BY salesman_id
ORDER BY salesman_id;
Returns:
salesman_id | most_expensive_orders_per_client
-------------+------------------------------------
1 | {167, 401, 443}
2 | {490, 460}
If there are multiple highest prices per (salesman_id, client_id), this query picks only one order_id arbitrarily, for lack of definition.
For this solution it is essential to understand that window functions are applied before DISTINCT. How to combine DISTINCT with a window function:
PostgreSQL: running count of rows for a query 'by minute'
For an explanation on ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING refer to this closely related answer on dba.SE.
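The effect of the frame clause can be seen in a small experiment. The sketch below loads the orders from the question into SQLite (3.25 or later, through Python's sqlite3 module, used here in place of Postgres) and compares last_value() with the default frame against the explicit full-partition frame for salesman 1 and client 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (client_id INTEGER, order_id INTEGER, "
    "salesman_id INTEGER, price INTEGER)"
)
# The rows from the question.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 167, 1, 65), (1, 367, 1, 27), (2, 401, 1, 29), (2, 490, 2, 48),
        (3, 199, 1, 68), (3, 336, 2, 22), (3, 443, 1, 84), (3, 460, 2, 92),
    ],
)

frames = conn.execute(
    """
    SELECT order_id, price,
           -- Default frame: from the partition start to the current row.
           last_value(order_id) OVER (PARTITION BY salesman_id, client_id
                                      ORDER BY price) AS default_frame,
           -- Explicit frame: the whole partition.
           last_value(order_id) OVER (PARTITION BY salesman_id, client_id
                                      ORDER BY price
                                      ROWS BETWEEN UNBOUNDED PRECEDING
                                               AND UNBOUNDED FOLLOWING) AS full_frame
    FROM orders
    WHERE salesman_id = 1 AND client_id = 1
    ORDER BY price
    """
).fetchall()
print(frames)
```

With the default frame, last_value() simply returns the current row's own order_id, so only the explicit frame yields the most expensive order (167) on every row of the partition.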
Simpler with non-standard DISTINCT ON
PostgreSQL implements, as extension to the SQL standard, DISTINCT ON. With it you can very effectively select rows unique according to a defined set of columns.
It won't get simpler or faster than this:
SELECT salesman_id
,array_agg(order_id) AS most_expensive_orders_per_client
FROM (
SELECT DISTINCT ON (1, client_id)
salesman_id, order_id
FROM orders
ORDER BY salesman_id, client_id, price DESC
) x
GROUP BY 1
ORDER BY 1;
I also use positional references (ordinal numbers) for shorter syntax. Details:
Select first row in each GROUP BY group?
I think you want the Postgres function array_agg in combination with row_number(). However, your description of the query does not make sense to me.
The following gets clients and salesmen and the list of orders for the highest priced order by salesman:
select client_id, salesman_id, array_agg(order_id)
from (select o.*,
row_number() over (partition by salesman_id order by price desc) as sseqnum,
row_number() over (partition by client_id order by price desc) as cseqnum
from orders o
) o
where sseqnum = 1
group by salesman_id, client_id
I don't know what you mean by "highest priced sales for each salesman and client". Perhaps you want:
where sseqnum = 1 or cseqnum = 1