SQL Group based on individual row chaining to another individual row - sql

I don't quite know how to ask this question better than this. Effectively, I have a transaction table. For each customer, the table has one or more transaction rows. Each row also records the ID of the previous customer, i.e. the customer worked on just before this one. For example:
Cust_ID Tran_Type Prev_ID
10 A 9
10 B 9
9 T 7
9 A 7
8 B ~
8 A ~
7 T ~
In this example, customer 7 is the first customer of the day for one individual using this program. They then worked on customer 9 and finally customer 10. A second individual started with customer 8 and did no other transactions the entire day. The two groups I'd expect are group A, comprising customers 7, 9 and 10, and group B, comprising customer 8 only.
I'm honestly stumped on this one. Does anyone have any advice? I'm fairly certain I want to start by grouping on the unique customer IDs and previous IDs, which gives me:
Cust_ID Prev_ID
10 9
9 7
8 ~
7 ~
At this point, though, I'm not sure how to proceed using vanilla SQL. Thanks.

This should just be a GROUP BY:
select
cust_id, prev_id
from transactiontable
group by cust_id, prev_id

You are right: you'd start with the distinct rows, then recursively follow the chain from the records that have no previous customer.
with pairs as
(
select distinct cust_id, prev_id from transactions
)
, groups (cust_id, prev_id, grp, pos) as
(
select cust_id, prev_id, row_number() over (order by cust_id), 1
from pairs
where prev_id is null
union all
select p.cust_id, p.prev_id, g.grp, g.pos + 1
from pairs p
join groups g on g.cust_id = p.prev_id
)
select cust_id, prev_id, grp
from groups
order by grp, pos;
REXTESTER demo: http://rextester.com/NZGLU84962
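To check the recursive approach end to end, here is a minimal sketch run against the question's sample data with Python's built-in sqlite3 module. It assumes the "~" in the sample means NULL and that the table is called transactions. SQLite (like PostgreSQL and MySQL) needs the RECURSIVE keyword, and because GROUPS is a keyword in SQLite, the recursive CTE is renamed chains here. The group numbers are assigned in a separate, non-recursive CTE before the recursion starts.

```python
import sqlite3

# Sample data from the question; "~" is assumed to mean NULL (no previous customer).
rows_in = [
    (10, "A", 9), (10, "B", 9),
    (9, "T", 7), (9, "A", 7),
    (8, "B", None), (8, "A", None),
    (7, "T", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (cust_id INTEGER, tran_type TEXT, prev_id INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows_in)

QUERY = """
WITH RECURSIVE
pairs AS (
    -- One row per (customer, previous customer) link.
    SELECT DISTINCT cust_id, prev_id FROM transactions
),
starts AS (
    -- Customers with no predecessor start a new group; number the groups.
    SELECT cust_id, prev_id, ROW_NUMBER() OVER (ORDER BY cust_id) AS grp
    FROM pairs
    WHERE prev_id IS NULL
),
chains (cust_id, prev_id, grp, pos) AS (
    SELECT cust_id, prev_id, grp, 1 FROM starts
    UNION ALL
    -- Attach every customer whose predecessor is already in a chain.
    SELECT p.cust_id, p.prev_id, c.grp, c.pos + 1
    FROM pairs p
    JOIN chains c ON c.cust_id = p.prev_id
)
SELECT cust_id, prev_id, grp
FROM chains
ORDER BY grp, pos
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data this puts customers 7, 9 and 10 in group 1 and customer 8 in group 2, matching the asker's groups A and B.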

Related

Retrieve records from versioned table

This SQL case has been troubling me for a while, and I wanted to ask here what other folks think.
I have a user table that records which user owns which vehicle. The same vehicle may be owned by multiple users over time, so there is an effective_date column that says from which day each ownership is effective. Two drivers never own the same vehicle at the same time, but records are versioned, meaning we can check who owned a vehicle 2 years ago or 5 years ago using the effective date.
The table has the following columns:
id, version, name, vehicle_id, effective_date. Every change to this table is versioned.
There is another table, accidents, which records which vehicle was in an accident and when. It is not versioned.
It has the columns id, description, vehicle_id, acc_date.
I am trying to select all accidents together with the user who caused each one. A plain inner join doesn't work here. What I do now is select all rows from the accident table and run a subquery for each row to find the id and version of the responsible user. Because each row has a different accident date, a separate subquery runs for every accident row, which is very slow. I am looking for a more performant way of organizing the data or constructing the query. I'm fine with running a few queries, but a single query would be better if there is an easy way to do it.
Example
user table:
id version name vehicle_id effective_date
1 1 A 1 01/10/2021
1 2 A 2 02/10/2021
2 1 B 1 03/10/2021
2 2 B 2 04/10/2021
accident:
id description vehicle_id acc_date
1 hit1 1 03/5/2021
2 hit2 1 03/15/2021
Result:
user_id user_version acc_id vehicle_id acc_date
1 1 1 1 03/5/2021
2 1 2 1 03/15/2021
Thanks for your help.
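To get the latest user at the time of the accident, you can use ROW_NUMBER() sorting by descending effective_date. With this ordering, the first user listed for each accident is the responsible one.
For example:
select user_id, user_version, acc_id, vehicle_id, acc_date
from (
select u.id as user_id, u.version as user_version,
a.id as acc_id, a.vehicle_id, a.acc_date,
-- one numbering per accident, latest ownership first
row_number() over(partition by a.id
order by u.effective_date desc) as rn
from "user" u -- USER is a reserved word in many databases, so quote it
join accident a on a.vehicle_id = u.vehicle_id
where u.effective_date <= a.acc_date
) x
where rn = 1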
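A minimal sketch of the ROW_NUMBER() approach against the question's sample data, using Python's built-in sqlite3 module. Two assumptions: the dates are stored as ISO YYYY-MM-DD text (the MM/DD/YYYY strings shown in the question would not compare correctly as text), and the numbering is partitioned by accident id so that each accident keeps exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "user" is quoted because USER is reserved in many SQL dialects.
conn.execute('CREATE TABLE "user" (id INTEGER, version INTEGER, name TEXT, '
             'vehicle_id INTEGER, effective_date TEXT)')
conn.execute("CREATE TABLE accident (id INTEGER, description TEXT, "
             "vehicle_id INTEGER, acc_date TEXT)")

# Sample rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany('INSERT INTO "user" VALUES (?, ?, ?, ?, ?)', [
    (1, 1, "A", 1, "2021-01-10"),
    (1, 2, "A", 2, "2021-02-10"),
    (2, 1, "B", 1, "2021-03-10"),
    (2, 2, "B", 2, "2021-04-10"),
])
conn.executemany("INSERT INTO accident VALUES (?, ?, ?, ?)", [
    (1, "hit1", 1, "2021-03-05"),
    (2, "hit2", 1, "2021-03-15"),
])

QUERY = """
SELECT user_id, user_version, acc_id, vehicle_id, acc_date
FROM (
    SELECT u.id AS user_id, u.version AS user_version,
           a.id AS acc_id, a.vehicle_id, a.acc_date,
           -- Latest ownership that started on or before the accident gets rn = 1.
           ROW_NUMBER() OVER (PARTITION BY a.id
                              ORDER BY u.effective_date DESC) AS rn
    FROM "user" u
    JOIN accident a ON a.vehicle_id = u.vehicle_id
    WHERE u.effective_date <= a.acc_date
) x
WHERE rn = 1
ORDER BY acc_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The output matches the question's Result table: accident 1 is attributed to user 1 version 1, and accident 2 to user 2 version 1.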
select user_id, user_version, acc_id, vehicle_id, acc_date
from (
select row_number() over (partition by b.id
order by a.effective_date desc) sn,
a.id as user_id,
a.version as user_version,
b.id as acc_id,
a.vehicle_id,
b.acc_date
from "user" a
inner join accident b
on a.vehicle_id = b.vehicle_id
and a.effective_date <= b.acc_date
) a
where sn = 1

How do I retrieve the row with the max value from this table?

Hopefully this is the correct place to ask this question. In this SQL cross-join exercise from Codecademy, the following code:
SELECT month, COUNT(*) FROM newspaper
CROSS JOIN months
WHERE (start_month<=month) & (end_month>=month)
GROUP BY month;
returns this table:
month COUNT(*)
1 2
2 9
3 13
4 17
5 27
6 30
7 20
8 22
9 21
10 19
11 15
12 10
How can I then retrieve the row with the maximum COUNT(*) from this table, i.e. {month: 6, COUNT(*): 30}?
I tried the following, which doesn't work (it returns a blank result on the website):
SELECT * FROM
(SELECT month, COUNT(*) FROM newspaper
CROSS JOIN months
WHERE (start_month<=month) & (end_month>=month)
GROUP BY month)
WHERE COUNT(*)=(
SELECT MAX(COUNT(*)) FROM
(SELECT month, COUNT(*) FROM newspaper
CROSS JOIN months
WHERE (start_month<=month) & (end_month>=month)
GROUP BY month)
);
Preferably, I would like this to work without renaming COUNT(*).
P.S.: I have no idea which SQL dialect Codecademy uses.
I realized we don't need a CTE to do this; you can simply do:
SELECT TOP(1) month, COUNT(*) FROM newspaper
CROSS JOIN months
WHERE start_month <= month AND end_month >= month
GROUP BY month
ORDER BY 2 DESC
;
This sorts the monthly counts in descending order and keeps only the first row, i.e. the one with the highest count. TOP(1) is SQL Server syntax; I am unsure which dialect Codecademy uses, but every dialect I know of can grab the top row in some fashion.
Edit:
I see someone posted that Codecademy uses SQLite, which uses LIMIT to restrict the number of rows returned. So you can use:
SELECT month, COUNT(*) FROM newspaper
CROSS JOIN months
WHERE start_month <= month AND end_month >= month
GROUP BY month
ORDER BY 2 DESC
LIMIT 1
;
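The asker's subquery attempt fails because aggregates cannot be nested (MAX(COUNT(*))) or used directly in a WHERE clause. Below is a minimal sketch of a corrected version of that idea, run with Python's built-in sqlite3 module. It gives the count an alias inside a CTE, so it does not keep the bare COUNT(*) name the asker preferred, but unlike LIMIT 1 it returns every month tied for the maximum. The newspaper rows are made-up sample data, not the Codecademy dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newspaper (id INTEGER PRIMARY KEY, start_month INTEGER, end_month INTEGER)")
conn.execute("CREATE TABLE months (month INTEGER)")
# Made-up subscriptions: (start_month, end_month).
conn.executemany(
    "INSERT INTO newspaper (start_month, end_month) VALUES (?, ?)",
    [(1, 3), (2, 6), (4, 6), (6, 9), (5, 7)],
)
conn.executemany("INSERT INTO months (month) VALUES (?)", [(m,) for m in range(1, 13)])

QUERY = """
WITH monthly AS (
    -- Alias the count so it can be referenced later.
    SELECT month, COUNT(*) AS subscribers
    FROM newspaper
    CROSS JOIN months
    WHERE start_month <= month AND end_month >= month
    GROUP BY month
)
SELECT month, subscribers
FROM monthly
-- Keep every month whose count equals the maximum (ties included).
WHERE subscribers = (SELECT MAX(subscribers) FROM monthly)
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(6, 4)]
```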

Retrieve last record in a group based on string - DB2

I have a table of transactional data in a DB2 database from which I want to retrieve the last record per location and product. Unfortunately, the date is stored as a YYYYMMDD string. There is no transaction ID or similar field I can key on, and there is no primary key.
DATE LOCATION PRODUCT QTY
20210105 A P1 4
20210106 A P1 3
20210112 A P1 7
20210104 B P1 3
20210105 B P1 1
20210103 A P2 6
20210105 A P2 5
I want to retrieve results showing the last transaction per location, per product, so the results should be:
DATE LOCATION PRODUCT QTY
20210112 A P1 7
20210105 B P1 1
20210105 A P2 5
I've looked at answers to similar questions, but for some reason I can't make the jump from an answer that addresses a similar question to code that works in my environment.
Edit: I've tried the code below, taken from an answer to this question. It returns multiple rows for a single location/product combination. I've tried the other answers in that question too, but have not had luck getting them to execute.
SELECT *
FROM t
WHERE DATE > '20210401' AND DATE in (SELECT max(DATE)
FROM t GROUP BY LOCATION) order by PRODUCT desc
Thank you!
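You can use ROW_NUMBER(). Because the dates are stored as YYYYMMDD strings, sorting them as text also sorts them chronologically. For example, if your table is called t you can do: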
select *
from (
select t.*,
row_number() over(partition by location, product
order by date desc) as rn
from t
) x
where rn = 1
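Below is a minimal sketch that runs this ROW_NUMBER() query on the question's sample rows with Python's built-in sqlite3 module. SQLite stands in for DB2 here, so this only checks the logic, not DB2-specific syntax. Since the YYYYMMDD strings sort chronologically as text, no date conversion is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, location TEXT, product TEXT, qty INTEGER)")
# Sample rows from the question.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("20210105", "A", "P1", 4), ("20210106", "A", "P1", 3), ("20210112", "A", "P1", 7),
    ("20210104", "B", "P1", 3), ("20210105", "B", "P1", 1),
    ("20210103", "A", "P2", 6), ("20210105", "A", "P2", 5),
])

QUERY = """
SELECT date, location, product, qty
FROM (
    SELECT t.*,
           -- Newest date per (location, product) gets rn = 1.
           ROW_NUMBER() OVER (PARTITION BY location, product
                              ORDER BY date DESC) AS rn
    FROM t
) x
WHERE rn = 1
ORDER BY product, location
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The three rows returned are the ones the asker listed as the desired result.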
You can use lead() to get the last row before a change:
select t.*
from (select t.*,
lead(date) over (partition by location, product order by date) as next_lp_date,
lead(date) over (order by date) as next_date
from t
) t
where next_lp_date is null or next_lp_date <> next_date
It looks like you just needed to match your keys within the subselect.
SELECT *
FROM t T1
WHERE DATE > '20210401'
AND DATE in (SELECT max(DATE) FROM t T2 WHERE T2.Location = T1.Location and T2.Product=T1.Product)

RANK() function with over is creating ranks dynamically for every run

I am creating ranks for partitions of my table. The partitions are defined by the name column and ordered by transaction value. When I generate these partitions and check the row count for each rank, I get different numbers for each rank on every query run.
select count(*) FROM (
--
-- Sort and ranks the element of RFM
--
SELECT
*,
RANK() OVER (PARTITION BY name ORDER BY date_since_last_trans desc) AS rfmrank_r,
FROM (
SELECT
name,
id_customer,
cust_age,
gender,
DATE_DIFF(entity_max_date, customer_max_date, DAY ) AS date_since_last_trans,
txncnt,
txnval,
txnval / txncnt AS avg_txnval
FROM
(
SELECT
name,
id_customer,
MAX(cust_age) AS cust_age,
COALESCE(APPROX_TOP_COUNT(cust_gender,1)[OFFSET(0)].VALUE, MAX(cust_gender)) AS gender,
MAX(date_date) AS customer_max_date,
(SELECT MAX(date_date) FROM xxxxx) AS entity_max_date,
COUNT(purchase_amount) AS txncnt,
SUM(purchase_amount) AS txnval
FROM
xxxxx
WHERE
date_date > (
SELECT
DATE_SUB(MAX(date_date), INTERVAL 24 MONTH) AS max_date
FROM
xxxxx)
AND cust_age >= 15
AND cust_gender IN ('M','F')
GROUP BY
name,
id_customer
)
)
)
group by rfmrank_r
For the first run I get:
Row f0
1 3970
2 3017
3 2116
4 2118
For the second run I get:
Row f0
1 4060
2 3233
3 2260
4 2145
What can I do so that every run gives the same number of rows for each rank?
The RANK window function determines the rank of a value in a group of values.
Each value is ranked within its partition. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows is added to the tied rank to calculate the next rank, so the ranks might not be consecutive numbers.
For example, if two rows are ranked 1, the next rank is 3.
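To see the difference between RANK(), DENSE_RANK() and ROW_NUMBER() on ties, here is a small sketch with Python's built-in sqlite3 module. The rfm table and its three rows are made up for illustration; the column names mirror the question. Two customers tie on date_since_last_trans, so RANK() gives 1, 1, 3, DENSE_RANK() gives 1, 1, 2, and ROW_NUMBER() with an id_customer tie-breaker gives 1, 2, 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one partition ("x") with a tie on date_since_last_trans.
conn.execute("CREATE TABLE rfm (name TEXT, id_customer INTEGER, date_since_last_trans INTEGER)")
conn.executemany(
    "INSERT INTO rfm VALUES (?, ?, ?)",
    [("x", 1, 10), ("x", 2, 10), ("x", 3, 5)],
)

rows = conn.execute("""
SELECT id_customer,
       RANK() OVER (PARTITION BY name ORDER BY date_since_last_trans DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY name ORDER BY date_since_last_trans DESC) AS dense_rnk,
       -- id_customer breaks the tie so the numbering is the same on every run.
       ROW_NUMBER() OVER (PARTITION BY name
                          ORDER BY date_since_last_trans DESC, id_customer) AS row_num
FROM rfm
ORDER BY id_customer
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 1, 1)
# (2, 1, 1, 2)
# (3, 3, 2, 3)
```

Without the id_customer tie-breaker, which of the two tied customers receives row number 1 would be unspecified.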

How to do a complex calculation like this sample

In a stored procedure (I'm using SQL Server 2008), I have business data like this sample:
ID City Price Sold
1 A 10 3
1 B 10 5
1 A 10 1
1 B 10 3
1 C 10 5
1 C 10 2
2 A 10 1
2 B 10 6
2 A 10 3
2 B 10 4
2 C 10 3
2 C 10 4
What I want to do is:
For each ID, sort by City first.
After sorting, for each row of that ID, recalculate Sold from top to bottom so that the total of Sold for each ID does not exceed Price (as in the result below).
And the result should look like this:
ID City Price Sold_Calculated
1 A 10 3
1 A 10 1
1 B 10 5
1 B 10 1 (this row becomes 1, so total Sold = Price)
1 C 10 0 (from this row on, Sold = 0)
1 C 10 0
2 A 10 1
2 A 10 3
2 B 10 6
2 B 10 0 (from this row on, Sold = 0)
2 C 10 0
2 C 10 0
Currently, I use a cursor for this task: for each ID, sort by City, calculate Sold, and save the result to a temp table. After the calculation finishes, I UNION ALL the temp tables. But it takes a long time.
I know people advise: DO NOT use cursors.
So, for this task, can you give me an example (using SELECT ... FROM ... WHERE ... GROUP BY) to finish it? Or are there other ways to solve it quickly?
I understand this task is not easy, but I'm still posting it here in the hope that someone can help me get through it.
I really appreciate your help.
Thanks.
To accomplish your task, you'll need to calculate a running sum and use a CASE expression.
Previously I used a JOIN to compute the running sum and LAG with the CASE expression.
However, by using a recursive CTE to calculate the running total, as described here by Aaron Bertrand, together with the CASE expression by Andriy M, we can construct the following, which should offer the best performance and doesn't need to "peek at the previous row":
WITH cte
AS (SELECT Row_number()
OVER ( partition BY id ORDER BY id, city, sold DESC) RN,
id,
city,
price,
sold
FROM table1),
rcte
AS (
--Anchor
SELECT rn,
id,
city,
price,
sold,
runningTotal = sold
FROM cte
WHERE rn = 1
--Recursion
UNION ALL
SELECT cte.rn,
cte.id,
cte.city,
cte.price,
cte.sold,
rcte.runningtotal + cte.sold
FROM cte
INNER JOIN rcte
ON cte.id = rcte.id
AND cte.rn = rcte.rn + 1)
SELECT id,
city,
price,
sold,
runningtotal,
rn,
CASE
WHEN runningtotal <= price THEN sold
WHEN runningtotal > price
AND runningtotal < price + sold THEN price + sold - runningtotal
ELSE 0
END Sold_Calculated
FROM rcte
ORDER BY id,
rn;
DEMO
As @Gordon Linoff commented, the sort order is not clear from the question. For the purpose of this answer, I have assumed the sort order is city, sold.
select id, city, price, sold, running_sum,
lag_running_sum,
case when running_sum <= price then Sold
when running_sum > price and price > coalesce(lag_running_sum,0) then price - coalesce(lag_running_sum,0)
else 0
end calculated_sold
from
(
select id, city, price, sold,
sum(sold) over (partition by id order by city, sold
rows between unbounded preceding and current row) running_sum,
sum(sold) over (partition by id order by city, sold
rows between unbounded preceding and 1 preceding) lag_running_sum
from n_test
) n_test_running
order by id, city, sold;
Here is the demo for Oracle.
Let me break down the query.
I have used SUM as an analytic function to calculate the running sum.
The first SUM partitions the rows by id and, within each partition, orders the rows by city and sold.
The ROWS BETWEEN clause specifies which rows are added up. Here I have specified the current row and all rows above it. This gives the running sum.
The second one does the same thing, except that the current row is excluded from the sum. This essentially creates a running sum that lags the first one by one row.
Using this result as an inline view, the outer SELECT uses a CASE expression to determine the value of the new column.
As long as the running sum is less than or equal to price, it gives sold.
If it crosses the price, the value is adjusted so that the sum becomes equal to price.
For all rows below that, the value is set to 0.
I hope my explanation is clear.
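A minimal sketch of this running-sum approach on the question's sample data, using Python's built-in sqlite3 module (SQLite 3.25 or later supports these window frames). It orders each ID by city, then sold descending; that order is an assumption, since the question doesn't pin it down, but with it the output gives the same Sold_Calculated values per ID and city as the asker's expected result. Note that running totals with SUM(...) OVER (... ORDER BY ...) need SQL Server 2012 or later, so this exact form is not available on SQL Server 2008.

```python
import sqlite3

# Sample rows from the question: (id, city, price, sold).
SAMPLE = [
    (1, "A", 10, 3), (1, "B", 10, 5), (1, "A", 10, 1), (1, "B", 10, 3),
    (1, "C", 10, 5), (1, "C", 10, 2), (2, "A", 10, 1), (2, "B", 10, 6),
    (2, "A", 10, 3), (2, "B", 10, 4), (2, "C", 10, 3), (2, "C", 10, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE n_test (id INTEGER, city TEXT, price INTEGER, sold INTEGER)")
conn.executemany("INSERT INTO n_test VALUES (?, ?, ?, ?)", SAMPLE)

QUERY = """
SELECT id, city, price, sold,
       CASE
         -- Still within the budget: keep the original value.
         WHEN running_sum <= price THEN sold
         -- The row that crosses the budget gets whatever is left.
         WHEN price > COALESCE(prev_sum, 0) THEN price - COALESCE(prev_sum, 0)
         ELSE 0
       END AS sold_calculated
FROM (
    SELECT id, city, price, sold,
           SUM(sold) OVER (PARTITION BY id ORDER BY city, sold DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
           -- Same sum excluding the current row (NULL for the first row).
           SUM(sold) OVER (PARTITION BY id ORDER BY city, sold DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_sum
    FROM n_test
) x
ORDER BY id, city, sold DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```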
To me, it sounds like you could use window functions in a case like this. Is this applicable?
With my approach, though, your end result might look like this:
ID City Price Sold_Calculated
2 A 10 4
2 B 10 6
2 C 10 0
which could use an aggregation like
SUM(Sold_Calculated) OVER (PARTITION BY ID, City, Price, Sold_Calculated)
depending on how far down you want to go. You could even use a CASE expression if need be.
Are you looking to do this entirely in SQL? A simple approach would be this:
SELECT C.ID,
C.City,
C.Price,
calculate_Sold_Function(C.ID, C.Price) AS Sold_Calculated
FROM CITY_TABLE C
ORDER BY C.ID, C.City
Here calculate_Sold_Function is a T-SQL/MySQL/etc. function taking the ID and Price as parameters. No idea how you plan on calculating price.