Replicate the effect of the first. function in SAS within SQL

What is the simplest way (in terms of easily understandable code, not code length or efficiency) to replicate the effect of the first. function in SAS?
Select sum(amount), order_id
From tablename
Group By order_id
/* Pseudo code below */
Having first.amount = $100
Order by date
What I'm trying to do above is get the total amount for each order, but exclude every order_id whose first transaction date has an amount of 100 or more. If there are multiple amounts on that first date, only one of them needs to meet the criterion for the whole order to be removed.
For example, the following orders should be removed
Order_ID Date Amount
1 1/1 45
1 1/1 100
1 1/2 32
2 1/1 100
The following orders should NOT be removed
Order_ID Date Amount
3 1/1 99.99
3 1/2 100
4 1/1 9
4 1/2 100

I see. You are referring to the first variables in a by-processing loop. Your question is pretty unclear.
This is a bit complicated, because the "first" appears to be needed after the aggregation, but it only applies to the first date. I'm not sure what this code would look like in SAS, but the following should have the same effect in SQL:
select order_id, sum(amount)
from t
where (select top 1 sum(t2.amount) as first_amount
       from t t2  -- alias the inner copy so the correlation below resolves
       where t2.order_id = t.order_id
       group by t2.date
       order by t2.date
      ) < 100
group by order_id;
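As a way to check the idea against the question's sample rows, here is a minimal sketch that runs a portable variant of the query in SQLite through Python's sqlite3 module. It is not the answer's exact query: it uses NOT EXISTS with MIN(date) instead of TOP 1, and it tests each individual amount on the first date rather than the daily sum. The table name orders, the column name txn_date, the helper name and the year in the dates are all assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question as (order_id, txn_date, amount).
# The year is an assumption; the question only gives month/day.
SAMPLE_ORDERS = [
    (1, "2024-01-01", 45), (1, "2024-01-01", 100), (1, "2024-01-02", 32),
    (2, "2024-01-01", 100),
    (3, "2024-01-01", 99.99), (3, "2024-01-02", 100),
    (4, "2024-01-01", 9), (4, "2024-01-02", 100),
]


def totals_excluding_large_first(rows, threshold=100):
    """Total per order, skipping orders with an amount >= threshold on their first date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (order_id INTEGER, txn_date TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT o.order_id, SUM(o.amount) AS total
        FROM orders AS o
        WHERE NOT EXISTS (
            SELECT 1
            FROM orders AS f
            WHERE f.order_id = o.order_id
              AND f.amount >= ?
              -- restrict to rows on the order's earliest date (the "first." group)
              AND f.txn_date = (SELECT MIN(m.txn_date)
                                FROM orders AS m
                                WHERE m.order_id = o.order_id)
        )
        GROUP BY o.order_id
        ORDER BY o.order_id
    """
    result = con.execute(query, (threshold,)).fetchall()
    con.close()
    return result


for order_id, total in totals_excluding_large_first(SAMPLE_ORDERS):
    print(order_id, total)
```

With the sample data only orders 3 and 4 remain, which matches the "should NOT be removed" list in the question.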

Related

How to get correct min, max date for each customer's changing label in wide format in BigQuery?

I have a table that records customer purchases, for example:
customer_id  label  date        purchase_id  price
2            A      2022-01-01  asd          10
3            A      2022-01-01  asdf         5
4            B      2022-02-04  asdfg        200
2            A      2022-01-03  asdjg        4
3            B      2022-02-01  dfs          20
2            G      2022-04-05  fdg          40
2            G      2022-04-10  fdg          40
2            A      2022-06-06  fgd          20
I want to see how many days/money each customer has spent in each label, so far what I'm doing is:
SELECT
  customer_id,
  label,
  COUNT(DISTINCT purchase_id) AS orders_count,
  SUM(price) AS total_spent,
  MIN(date) AS first_date,
  MAX(date) AS last_date,
  DATE_DIFF(MAX(date), MIN(date), DAY) AS days
FROM
  TABLE
WHERE
  date > '2022-01-01'
GROUP BY
  customer_id,
  label
which gives me a long table, like this:
customer_id  label  orders_count  total_spent  first_date  last_date   days
2            A      3             34           2022-01-01  2022-06-06  180
2            G      1             40           2022-04-05  2022-04-10  5
etc
For simplicity I show only a few rows, but customers place orders all the time. The issue with the above is that customer 2, for example, starts with label A, changes to G, and then goes back to A, and this is not visible in the results table: min(date) is correct, but max(date) picks up the max(date) of the second A period. I would also prefer to have the result in wide format. Ideally there would be columns called next_label_{i}, one set for each label change.
Could you advise me on a) a way to handle this kind of label change (where a later label is the same as an earlier one) and b) a way to produce the result in wide format?
Thanks
edit:
example output (correct date, wide format) [columns would go as wide as the max number of unique labels for any customer]
customer_id first_label first_first_date first_last_date first_total_spent first_days next_label next_first_date next_last_date next_days next_label_2 next_first_date_2 next_last_date_2 next_days_2
2 A 2022-01-01 2022-01-03 2 14 G 2022-04-05 2022-04-05 0 A 2022-06-06 2022-06-06 0
etc
Sorry, this is not exactly accurate (it is missing orders_count and total_spent), but it is a pain to format here; hopefully you get the idea. In principle, it is as if you used Python's pivot_table on the previous dataset.
Alternatively, I'd be glad for a solution in the long format that distinguishes between a customer's label and the same customer's repeated label (as with customer 2, who starts with A and, after changing to G, returns to A).

Could you advise me of ... b) a way to produce it into a wide format?
First, I want to say that I hope you have a really good reason to want that output, as it is usually not considered best practice and is rather left for the presentation layer to handle.
With that in mind, consider the approach below:
select * from (
  select customer_id, offset, purchase.*
  from (
    select customer_id,
      array_agg((struct(label, date, purchase_id, price)) order by date) purchases
    from your_table
    group by customer_id
  ), unnest(purchases) purchase with offset
  order by customer_id, offset
)
pivot (
  any_value(label) label,
  any_value(date) date,
  any_value(purchase_id) purchase_id,
  any_value(price) price
  for offset in (0,1,2,3,4,5)
)
If applied to the sample data in your question, the output has one row per customer, with one set of label, date, purchase_id, and price columns per offset.
Note: the above makes the silly assumption that you know the maximum number of steps (in this case I used 6, from 0 to 5). There are plenty of posts here on SO that show how to use the same technique to make it dynamic. I do not want to duplicate them, as that is against SO policies. So, just do your extra homework on this :o)
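The pivot above works per purchase, so it does not by itself separate customer 2's first A period from the later one. For the long-format alternative mentioned in the question, one common approach is to group consecutive rows with the same label (often called "gaps and islands"). The following is a minimal Python sketch of that grouping, not BigQuery code; the function name label_runs and the tuple layout are assumptions for illustration, and rows that share a customer and a date but differ in label would need an extra tie-breaker.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (customer_id, label, date, purchase_id, price)
PURCHASES = [
    (2, "A", date(2022, 1, 1), "asd", 10),
    (3, "A", date(2022, 1, 1), "asdf", 5),
    (4, "B", date(2022, 2, 4), "asdfg", 200),
    (2, "A", date(2022, 1, 3), "asdjg", 4),
    (3, "B", date(2022, 2, 1), "dfs", 20),
    (2, "G", date(2022, 4, 5), "fdg", 40),
    (2, "G", date(2022, 4, 10), "fdg", 40),
    (2, "A", date(2022, 6, 6), "fgd", 20),
]


def label_runs(purchases):
    """Return one summary dict per run of consecutive purchases with the same label."""
    ordered = sorted(purchases, key=lambda p: (p[0], p[2]))  # by customer, then date
    runs = []
    for customer_id, rows in groupby(ordered, key=lambda p: p[0]):
        # groupby only merges adjacent equal keys, so A, G, A yields three runs
        for label, run in groupby(rows, key=lambda p: p[1]):
            run = list(run)
            first_date, last_date = run[0][2], run[-1][2]
            runs.append({
                "customer_id": customer_id,
                "label": label,
                "orders_count": len({p[3] for p in run}),  # distinct purchase_id
                "total_spent": sum(p[4] for p in run),
                "first_date": first_date,
                "last_date": last_date,
                "days": (last_date - first_date).days,
            })
    return runs


for r in label_runs(PURCHASES):
    print(r)
```

Each run could then be numbered per customer and pivoted on that number to get the wide layout, in the same way the query above pivots on offset.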

sum of amount group by condition optimise query

I have a view that combines data from different tables. The major fields are BillNo, ITEM_FEE, and GroupNo. I need to calculate the total discount for a given GroupNo. The discount is based on the fractional part of the amount, grouped by BillNo (a single BillNo can have multiple entries). If a BillNo has multiple transactions, the discount is calculated when the decimal part of the sum of ITEM_FEE is greater than 0; if it has a single transaction and the decimal part of ITEM_FEE is greater than 0, that decimal part is treated as the discount.
I have prepared a script, and it returns the total discount for a particular GroupNo.
declare @GroupNo as nvarchar(100)
set @GroupNo = '3051'
SELECT Sum(disc) Discount
FROM --sum(ITEM_FEE) TotalAmoiunt,
     (SELECT (SELECT CASE
                       WHEN ( Sum(item_fee) ) % 1 > 0 THEN Sum(( item_fee ) % 1)
                     END
              FROM   view_bi_sales VBS
              WHERE  VBS.billno = VB.billno
              GROUP  BY billno) Disc
      FROM   view_bi_sales VB
      WHERE  groupno = @GroupNo) temp
The problem is that it takes almost 2 minutes to get the result.
Please help me to find the result faster if possible.

Thank you for all your help and support. As I was already calculating the sum of the decimal part of ITEM_FEE grouped by BillNo, there was no need to check whether it is greater than 0. The query below gives me the desired output in less than 10 seconds:
select sum(discount) from
(select sum((ITEM_FEE)%1) discount from view_BI_Sales
where groupno=3051
group by BillNo )temp

If I understand correctly, you don't need a JOIN. This might help performance:
SELECT SUM(disc) as Discount
FROM (SELECT (CASE WHEN SUM(item_fee % 1) > 0
                   THEN SUM(item_fee % 1)
              END) as disc
      FROM view_bi_sales VBS
      WHERE groupno = @GroupNo
      GROUP BY billno
     ) vbs;
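To make the rule in these queries concrete, here is a small Python sketch of "sum the fractional part of each fee per bill, then add up the bills". It uses Decimal to avoid binary floating-point noise; the function names and the sample fees are invented for illustration and do not come from the view. Note that this rule is not the same as taking the fractional part of the bill total: two fees of 0.50 give 1.00 under this rule, but 0 under SUM(item_fee) % 1.

```python
from collections import defaultdict
from decimal import Decimal


def discount_per_bill(rows):
    """rows: iterable of (bill_no, item_fee) with item_fee as a non-negative Decimal."""
    fractions = defaultdict(Decimal)  # Decimal() starts each bill at 0
    for bill_no, item_fee in rows:
        fractions[bill_no] += item_fee % 1  # fractional part of this fee
    return dict(fractions)


def total_discount(rows):
    """Sum of the per-bill discounts, mirroring the outer SUM in the SQL."""
    return sum(discount_per_bill(rows).values(), Decimal(0))


# Hypothetical fees for three bills in one group
fees = [
    ("B1", Decimal("10.25")),
    ("B1", Decimal("5.50")),
    ("B2", Decimal("7.00")),
    ("B3", Decimal("3.10")),
]
print(total_discount(fees))
```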

Retain values till there is a change in value in Teradata

There is a transaction history table in teradata where balance gets changed only when there is a transaction
Data as below:
Cust_id Balance Txn_dt
123 1000 27MAY2018
123 350 31MAY2018
For example, customer 123 has a balance of 1000 on May 27; on May 31 the customer makes a transaction, so the balance becomes 350. No records are kept for May 28 to May 30, when the balance was the same as on May 27. I want rows for those days as well, with the balance retained and the date incremented. In other words, the same record should be repeated for each following day until a transaction changes the balance. How can I do this in Teradata?
Expected output:
Cust_id Balance Txn_dt
123 1000 27MAY2018
123 1000 28MAY2018
123 1000 29MAY2018
123 1000 30MAY2018
123 350 31MAY2018
Thanks
Sandy
Hi Dnoeth. It seems to work, but can you let me know how to expand up to a certain day, e.g. until 30JUN2018?

There are several ways to get this result; the simplest in Teradata uses Time Series Expansion for Periods:
WITH cte AS
 (
   SELECT Cust_id, Balance, Txn_dt,
      -- return the next row's date
      Coalesce(Min(Txn_dt)
               Over (PARTITION BY Cust_id
                     ORDER BY Txn_dt
                     ROWS BETWEEN 1 Following AND 1 Following)
              ,Txn_dt+1) AS next_Txn_dt
   FROM tab
 )
SELECT Cust_id, Balance
  ,Last(pd) -- last day of the period
FROM cte
-- make a period of the current and next row's date
-- and return one row per day
EXPAND ON PERIOD(Txn_dt, next_Txn_dt) AS pd
If you run TD16.10+ you can replace the MIN OVER with a simplified LEAD:
Lead(Txn_dt)
Over (PARTITION BY Cust_id
ORDER BY Txn_dt)
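For readers without a Teradata system at hand, the same "carry the balance forward one row per day" logic can be sketched in plain Python. This is only an illustration of what the expansion produces, not Teradata code. The function name expand_daily is made up, and the optional end_date parameter is a hypothetical addition for the follow-up about extending the last balance to a fixed day such as 30JUN2018.

```python
from datetime import date, timedelta

# Sample rows from the question: (cust_id, balance, txn_dt)
BALANCES = [
    (123, 1000, date(2018, 5, 27)),
    (123, 350, date(2018, 5, 31)),
]


def expand_daily(rows, end_date=None):
    """Repeat each balance for every day until the customer's next transaction."""
    by_customer = {}
    for cust_id, balance, txn_dt in sorted(rows, key=lambda r: (r[0], r[2])):
        by_customer.setdefault(cust_id, []).append((balance, txn_dt))

    expanded = []
    one_day = timedelta(days=1)
    for cust_id, txns in by_customer.items():
        for i, (balance, start) in enumerate(txns):
            if i + 1 < len(txns):
                stop = txns[i + 1][1]        # next transaction date (exclusive)
            elif end_date is not None and end_date >= start:
                stop = end_date + one_day    # extend the last balance through end_date
            else:
                stop = start + one_day       # last row: keep just its own day
            day = start
            while day < stop:
                expanded.append((cust_id, balance, day))
                day += one_day
    return expanded


for row in expand_daily(BALANCES):
    print(row)
```

Calling expand_daily(BALANCES, end_date=date(2018, 6, 30)) keeps the balance of 350 on every day from 31 May through 30 June.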

sharing cash with priority to creditors in sql

I have a table in sql 2014 with name "tblPaymentPlan" like this:
Creditors  PlanToPay   Amount
----------------------------------
A          2017-01-20  2000
A          2017-02-20  1500
A          2017-03-20  3000
B          2017-01-25  3000
B          2017-02-25  1000
and also another table with name "tblPaid" like following:
Creditors  Paid
-----------------
A          4500
B          3500
and the result that I expect:
Creditors  PlanToPay   Remain
----------------------------------
A          2017-01-20  0
A          2017-02-20  0
A          2017-03-20  2000
B          2017-01-25  0
B          2017-02-25  500
I have no idea how to do this at all. Could you please help me? Please note that I have a lot of records in my tables.
I need this query for budget planning. (We can use numbers to define priority instead of dates.)

What you want is a running total of what is owing, from that you can subtract what has been paid.
SELECT Creditors, PlanToPay, IIF(ABS(Remain)!=Remain,0,IIF(Remain<Amount,Remain,Amount)) as Remain
FROM (SELECT pp.Creditors, pp.PlanToPay, pp.Amount,
SUM(pp.Amount) OVER(PARTITION BY pp.Creditors ORDER BY pp.PlanToPay ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)-tp.paid AS Remain
FROM tblPaymentPlan pp
JOIN (SELECT creditors, sum(paid) as paid from tblpaid group by creditors) tp
ON pp.creditors = tp.creditors) ss
ORDER By Creditors, PlanToPay
In the window function (SUM OVER), the PARTITION separates the creditors, the ORDER determines how the rows are arranged (by date), and the ROWS clause tells it to use all the rows in the partition before this row, plus this row, in the running total. We then subtract the sum of everything paid to that creditor from this running total.
This of course gives us a lot of negative numbers, so we do it in a subquery. The main query checks whether the absolute value of the remainder equals the remainder itself (true when it is not negative). If it is negative, the installment is fully paid and 0 is returned; otherwise the smaller of the remainder and the installment amount is returned.
UPDATE - added handling for multiple rows with value still owing

You can subtract the amount in the paid table from the running total of planned amounts. If the result is less than or equal to 0, set remain to 0; otherwise remain is that difference.
select pp.creditors,pp.plantopay,
case when sum(pp.amount) over(partition by pp.creditors order by pp.plantopay)-coalesce(pd.paid,0) <= 0 then 0
else sum(pp.amount) over(partition by pp.creditors order by pp.plantopay)-coalesce(pd.paid,0) end as remain
from tblpaymentplan pp
left join tblPaid pd on pp.creditors=pd.creditors
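Both queries rest on the same allocation idea: the cash paid to a creditor is applied to that creditor's installments in PlanToPay order until it runs out. A minimal Python sketch of that allocation, using the sample data from the question, might look like this. The function name and the tuple layout are assumptions for illustration; this version caps each row at its own amount, like the first answer.

```python
# Sample data from the question
PLAN = [
    ("A", "2017-01-20", 2000),
    ("A", "2017-02-20", 1500),
    ("A", "2017-03-20", 3000),
    ("B", "2017-01-25", 3000),
    ("B", "2017-02-25", 1000),
]
PAID = [("A", 4500), ("B", 3500)]


def remaining_per_installment(plan, paid):
    """Apply each creditor's payments to installments in date order; return what is left."""
    cash = {}
    for creditor, amount in paid:
        cash[creditor] = cash.get(creditor, 0) + amount  # sum payments per creditor

    result = []
    # ISO date strings sort chronologically, so a plain sort gives priority order
    for creditor, plan_to_pay, amount in sorted(plan, key=lambda r: (r[0], r[1])):
        available = cash.get(creditor, 0)
        applied = min(available, amount)       # never pay more than the installment
        cash[creditor] = available - applied
        result.append((creditor, plan_to_pay, amount - applied))
    return result


for row in remaining_per_installment(PLAN, PAID):
    print(row)
```

For the sample data this reproduces the expected Remain column: 0, 0, 2000 for creditor A and 0, 500 for creditor B.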

sql DB calculation moving summary

I would like to calculate a moving summary:
Total amount: 100
First receipt: 20
Second receipt: 10
The first row in the calculation column is the difference between the total amount and the first receipt: 100-20=80
The second row in the calculation column is the difference between the first calculated row and the second receipt: 80-10=70
The presentation is supposed to present receipt_amount, balance:
receipt_amount | balance
20 | 80
10 | 70
I'll be glad to use your help
Thanks :-)

You didn't really give us much information about your tables and how they are structured.
I'm assuming that there is an orders table that contains the total_amount and a receipt_table that contains each receipt (as a positive value):
As you also didn't specify your DBMS, this is ANSI SQL:
select sum(amount) over (order by receipt_nr) as running_sum
from (
  -- receipt_nr 0 is an assumed sort key that puts the order total first
  select 0 as receipt_nr, total_amount as amount
  from orders
  where order_no = 1
  union all
  select receipt_nr, -1 * receipt_amount
  from the_receipt_table
  where order_no = 1
) t

First of all, thanks for your response.
I work with Cache DB, which can be used with both SQL and Oracle syntax.
Basically, the data is located in two different tables, but I have them in one join query.
There are several rows with different receipt amounts, and each row (receipt) has the same total amount.
For example:
Receipt_no Receipt_amount Total_amount Balance
1 20 100 80
1 10 100 70
1 30 100 40
2 20 50 30
2 10 50 20
So the calculation should work like this: for the first receipt, the balance is the total_amount minus the receipt amount, and every other receipt (with the same receipt_no) is subtracted from the previous balance.
Thanks!
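The rule described here (start from total_amount for each receipt_no, then keep subtracting each receipt) can be checked with a short Python sketch against the sample rows. This is an illustration only; the function name is made up, and it assumes the rows already arrive grouped by receipt_no and in the intended order, since the sample has no explicit ordering column.

```python
from itertools import groupby

# Sample rows from the follow-up: (receipt_no, receipt_amount, total_amount)
RECEIPTS = [
    (1, 20, 100),
    (1, 10, 100),
    (1, 30, 100),
    (2, 20, 50),
    (2, 10, 50),
]


def running_balances(rows):
    """Append a running balance that restarts from total_amount for each receipt_no."""
    out = []
    for receipt_no, group in groupby(rows, key=lambda r: r[0]):
        balance = None
        for _, receipt_amount, total_amount in group:
            if balance is None:
                balance = total_amount  # first row of the group starts from the total
            balance -= receipt_amount
            out.append((receipt_no, receipt_amount, total_amount, balance))
    return out


for row in running_balances(RECEIPTS):
    print(row)
```

In SQL terms this corresponds to total_amount minus a running SUM(receipt_amount) partitioned by receipt_no, which needs some column that fixes the order of the receipts.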
