DB2 SQL - Difference (minus) of CASE WHEN - sql

I am trying to do the difference between 2 case when in a DB2 environment as follow:
select DISTINCT
TLORDER.BILL_NUMBER,
TLORDER.XCHARGES,
TLORDER.CHARGES,
TLORDER.DISTANCE,
TLORDER.CREATED_TIME,
TLORDER.DESTCITY,
(CASE WHEN ODRSTAT.STATUS = '5ARRCONS' THEN MAX(ORDSTAT.CHANGED)
END -
CASE WHEN ODRSTAT.STATUS = 'PICKD' THEN MIN(ODRSTAT.CHANGED)
END) AS DETENTION
FROM ODRSTAT
LEFT JOIN TLORDER ON ODRSTAT.ORDER_ID = TLORDER.DETAIL_LINE_ID
I tried a few variation of this concept using the various online ressources and found many answers for sums but not for difference using the same columns.
The goal is to substract the oldest date (in col. CHANGED) if status in Col. Status is 'pickd' from the newest date (in the same col CHANGED) if the status in Col. STATUS is '5arrcons'
Consider the following dataset:
Key ORDER_ID STATUS CHANGED
1 10 5ARRCONS 12/10/2017
2 10 OTHER 12/10/2017
3 10 PICKD 12/5/2017
4 10 OTHER 12/3/2017
5 10 PICKD 12/1/2017
In this case the wanted result from the CASE statement would be
MAX = 12/10/2017
Min = 12/1/2017
so (12/10/2017 - 12/1/2017) equals 9
9 is what I would want in what is returned
Any and all help will be appreciated.
thank you for your time

It is hard to tell exactly you want want. But, this may be what you want:
SELECT o.BILL_NUMBER, o.XCHARGES, o.CHARGES, o.DISTANCE, o.CREATED_TIME, o.DESTCITY,
(MAX(CASE WHEN os.STATUS = '5ARRCONS' THEN os.CHANGED END) -
MIN(CASE WHEN os.STATUS = 'PICKD' THEN os.CHANGED END)
) AS DETENTION
FROM ODRSTAT os JOIN
TLORDER o
ON os.ORDER_ID = o.DETAIL_LINE_ID
GROUP BY o.BILL_NUMBER, o.XCHARGES, o.CHARGES, o.DISTANCE, o.CREATED_TIME, o.DESTCITY;

Related

How can I replace the LAST() function in MS Access with proper ordering on a rather large table?

I have an MS Access database with the two tables, Asset and Transaction. The schema looks like this:
Table ASSET
Key Date1 AType FieldB FieldC ...
A 2023.01.01 T1
B 2022.01.01 T1
C 2023.01.01 T2
.
.
TABLE TRANSACTION
Date2 Key TType1 TType2 TType3 FieldOfInterest ...
2022.05.31 A 1 1 1 10
2022.08.31 A 1 1 1 40
2022.08.31 A 1 2 1 41
2022.09.31 A 1 1 1 30
2022.07.31 A 1 1 1 30
2022.06.31 A 1 1 1 20
2022.10.31 A 1 1 1 45
2022.12.31 A 2 1 1 50
2022.11.31 A 1 2 1 47
2022.05.23 B 2 1 1 30
2022.05.01 B 1 1 1 10
2022.05.12 B 1 2 1 20
.
.
.
The ASSET table has a PK (Key).
The TRANSACTION table has a composite key that is (Key, Date2, Type1, Type2, Type3).
Given the above tables let's see an example:
Input1 = 2022.04.01
Input2 = 2022.08.31
Desired result:
Key FieldOfInterest
A 41
because if the Transactions in scope was to be ordered by Date2, TType1, TType2, TType3 all ascending then the record having FieldOfInterest = 41 would be the last one.
Note that Asset B is not in scope due to Asset.Date1 < Input1, neither is Asset C because AType != T1. Ultimately I am curious about the SUM(FieldOfInterest) of all the last transactions belonging to an Asset that is in scope determined by the input variables.
The following query has so far provided the right results but after upgrading to a newer MS Access version, the LAST() operation is no longer reliably returning the row which is the latest addition to the Transaction table.
I have several input values but the most important ones are two dates, lets call them InputDate1 and
InputDate2.
This is how it worked so far:
SELECT Asset.AType, Last(FieldOfInterest) AS CurrentValue ,Asset.Key
FROM Transaction
INNER JOIN Asset ON Transaction.Key = Asset.Key
WHERE Transaction.Date2 <= InputDate2 And Asset.Date1 >= InputDate1
GROUP BY Asset.Key, Asset.AType
HAVING Asset.AType='T1'
It is known that the grouped records are not guaranteed to be in any order. Obviously it is a mistake to rely on the order of the records of the group by operation will always keep the original table order but lets just ignore this for now.
I have been struggling to come up with the right way to do the following:
join the Asset and Transaction tables on Asset.Key = Transaction.Key
filter by Asset.Date1 >= InputDate1 AND Transaction.Date2 <= InputDate2
then I need to select one record for all Transaction.Key where Date2 and TType1 and TType2 and TType3 has the highest value. (this represents the actual last record for given Key)
As far as I know there is no way to order records within a group by clause which is unfortunate.
I have tried Ranking, but the Transactions table is large (800k rows) and the performance was very slow, I need something faster than this. The following are an example of three saved queries that I wrote and chained together but the performance is very disappointing probably due to the ranking step.
-- Saved query step_1
SELECT Asset.*, Transaction.*
FROM Transaction
INNER JOIN Asset ON Transaction.Key = Asset.Key
WHERE Transaction.Date2 <= 44926
AND Asset.Date1 >= 44562
AND Asset.aType = 'T1'
-- Saved query step_2
SELECT tr.FieldOfInterest, (SELECT Count(*) FROM
(SELECT tr2.Transaction.Key, tr2.Date2, tr2.Transaction.tType1, tr2.tType2, tr2.tType3 FROM step_1 AS tr2) AS tr1
WHERE (tr1.Date2 > tr.Date2 OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 > tr.Transaction.tType1) OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 = tr.Transaction.tType1 AND tr1.tType2 > tr.tType2) OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 = tr.Transaction.tType1 AND tr1.tType2 = tr.tType2 AND tr1.tType3 > tr.tType3))
AND tr1.Key = tr.Transaction.Key)+1 AS Rank
FROM step_1 AS tr
-- Saved query step_3
SELECT SUM(FieldOfInterest) FROM step_2
WHERE Rank = 1
I hope I am being clear enough so that I can get some useful recommendations. I've been stuck with this for weeks now and really don't know what to do about it. I am open for any suggestions.
Reading the following specification
then I need to select one record for all Transaction.Key where Date2 and TType1 and TType2 and TType3 has the highest value. (this represents the actual last record for given Key)
Consider a simple aggregation for step 2 to retrieve the max values then in step 3 join all fields to first query.
Step 1 (rewritten to avoid name collision and too many columns)
SELECT a.[Key] AS Asset_Key, a.Date1, a.AType,
t.[Key] AS Transaction_Key, t.Date2,
t.TType1, t.TType2, t.TType3, t.FieldOfInterest
FROM Transaction t
INNER JOIN Asset a ON a.[Key] = a.[Key]
WHERE t.Date2 <= 44926
AND a.Date1 >= 44562
AND a.AType = 'T1'
Step 2
SELECT Transaction_Key,
MAX(Date2) AS Max_Date2,
MAX(TType1) AS TType1,
MAX(TType2) AS TType2,
MAX(TType3) AS TType3
FROM step_1
GROUP Transaction_Key
Step 3
SELECT s1.*
FROM step_1 s1
INNER JOIN step_2 s2
ON s1.Transaction_Key = s2.Transaction_Key
AND s1.Date2 = s2.Max_Date2
AND s1.TType1 = s2.Max_TType1
AND s1.TType2 = s2.Max_TType2
AND s1.TType3 = s2.Max_TType3

Create function that returns table in SQL

I wanted to create view with some logic like using (for loop , if .. else) but since that's not supported in SQL
I thought of creating table function that takes no parameter and returns a table.
I have a table for orders as below
OrderId Destination Category Customer
----------------------------------------
6001 UK 5 Adam
6002 GER 3 Jack
And table for tracking orders as below
ID OrderID TrackingID
-----------------------
1 6001 1
2 6001 2
3 6002 2
And here are the types of tracking
ID Name
--------------
1 Processing
2 Shipped
3 Delivered
As you can see in tracking order, The order number may have more than one record depending on how many tracking events occurred.
We have more than 25 tracking types that I didn't include here. which means one order can exist 25 times in tracking order table.
Now with that being said , My requirements is to create view as below with condition that an order must belong to 5 or 3 category ( we have more than 15 categories).
And whenever I run the function it must return the updated information.
So for example, when new tracking occurs and it's inserted in tracking order , I want to run my function and see the update in the corresponding flag column (e.g isDelivered).
I'm really confused on what is the best way to achieve this. I don't need the exact script i just need to understand the way to achieve it as i'm not very familiar with SQL
It could be done with a crosstab query using conditional aggregation. Something like this
select o.OrderID,
max(case when tt.[Name]='Processing' then 1 else 0 end) isPrepared,
max(case when tt.[Name]='Shipped' then 1 else 0 end) isShipped,
max(case when tt.[Name]='Delivered' then 1 else 0 end) isDelivered
from orders o
join tracking_orders tro on o.OrderID=tro.OrderID
join tracking_types tt on tro.TrackingID=tt.TrackingID
where o.category in(3, 5)
group by o.OrderID;
[EDIT] To break out Category 3 orders, 3 additional columns were added to the cross tab.
select o.OrderID,
max(case when tt.[Name]='Processing' then 1 else 0 end) isPrepared,
max(case when tt.[Name]='Shipped' then 1 else 0 end) isShipped,
max(case when tt.[Name]='Delivered' then 1 else 0 end) isDelivered,
max(case when tt.[Name]='Processing' and o.category=3 then 1 else 0 end) isC3Prepared,
max(case when tt.[Name]='Shipped' and o.category=3 then 1 else 0 end) isC3Shipped,
max(case when tt.[Name]='Delivered' and o.category=3 then 1 else 0 end) isC3Delivered
from orders o
join tracking_orders tro on o.OrderID=tro.OrderID
join tracking_types tt on tro.TrackingID=tt.TrackingID
where o.category in(3, 5)
group by o.OrderID;

SQL COUNT with condition and without - using JOIN

My goal is something like following table:
Key | Count since date X | Count total
1 | 4 | 28
With two simple selects I could gain this values: (the key of the table consists of 3 columns [t$ncmp, t$trav, t$seqn])
1. SELECT COUNT(*) FROM db.table WHERE t$date >= sysdate-2 GROUP BY t$ncmp, t$trav, t$seqn
2. SELECT COUNT(*) FROM db.table GROUP BY t$ncmp, t$trav, t$seqn
How can I join these statements?
What I tried:
SELECT n.t$trav, COUNT(n.t$trav), m.total FROM db.table n
LEFT JOIN (SELECT t$ncmp, t$trav, t$seqn, COUNT(*) as total FROM db.table
GROUP BY t$ncmp, t$trav, t$seqn) m
ON (n.t$ncmp = m.t$ncmp AND n.t$trav = m.t$trav AND n.t$seqn = m.t$seqn)
WHERE n.t$date >= sysdate-2
GROUP BY n.t$ncmp, n.t$trav, n.t$seqn
I tried different variantes, but always got errors like 'group by is missing' or 'unknown qualifier'.
Now this at least executes, but total is always 2.
T$TRAV COUNT(N.T$TRAV) TOTAL
4 2 2
29 3 2
51 1 2
62 2 2
16 1 2
....
If it matter, I will run this as an OPENQUERY from MSSQLSERVER to Oracle-DB.
I'd try
GROUP BY n.t$trav, m.total
You typically GROUP BY the same columns as you SELECT - except those who are arguments to set functions.
My goal is something like following table:
If so, you seem to want conditional aggregation:
select key, count(*) as total,
sum(case when datecol >= date 'xxxx-xx-xx' then 1 else 0 end) as total_since_x
from t
group by key;
I'm not sure how this relates to your sample queries. I simply don't see the relationship between that code and your question.

How to identify subsequent user actions based on prior visits

I want to identify the users who visited section a and then subsequently visited b. Given the following data structure. The table contains 300,000 rows and updates daily with approx. 8,000 rows:
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 1 b 0
2 1 b 0
1 3 b 1
Ideally I want a new column that flags the visit to section b. For example on the third visit User 1 visited section b for the first time. I was attempting to do this using a CASE WHEN statement but after many failed attempts I am not sure it is even possible with CASE WHEN and feel that I should take a different approach, I am just not sure what that approach should be. I do also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!
Correlated sub-queries should be avoided at all cost when working with Redshift. Keep in mind there are no indexes for Redshift so you'd have to rescan and restitch the column data back together for each value in the parent resulting in an O(n^2) operation (in this particular case going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured but in the simplest case, you could use something like
select case
when section != lag(section) over (partition by userid order by visitid)
then 1
else 0
end
from ...
This assumes that your data for userid 2 increments the visitid as below. If not, you could also order by your timestamp column
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 *2* b 0
2 *3* b 0
1 3 b 1
select t.*, case when v.ts is null then 0 else 1 end as conversion
from tbl t
left join (select *
from tbl x
where section = 'b'
and exists (select 1
from tbl y
where y.userid = x.userid
and y.section = 'a'
and y.ts < x.ts)) v
on t.userid = v.userid
and t.visitid = v.visitid
and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
case
when v.first_b_after_a is not null
then 1
else 0
end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
from (select t.*,
sum( case when t.section = 'a' then 1 end)
over( partition by userid
order by ts ) as a_sum
from tbl t) x
where section = 'b'
and a_sum is not null
group by userid) v
on t.userid = v.userid
and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0

Select percentage of rows that have a certain value - SQL

Say I have a table like this in Sequel Pro:
SaleID VendorID
1 A
2 C
3 E
4 C
5 D
And I want to find the percentage of sales in which Vendor C was the vendor. (Obviously in this case it is 40%, but I'm working with much bigger tables). How would I do that? I was thinking something using the Count function, but I'm not sure how I would do it exactly. Thanks!
select sum(case when vendorID = 'C' then 1 else 0 end) * 100 / count(*)
from your_table