Running SQL function in Hue Impala

I have started working with Hue Impala and I am stuck on a complex problem which I am not able to get through. My table looks like this:
Id   Month   Base Rate   Payment   New Payment
------------------------------------------------
a    Jan     1           100       NULL
a    Feb     1           100       NULL
a    Mar     1           100       NULL
a    Apr     2           NULL      NULL
a    May     3           NULL      NULL
a    Jun     4           NULL      NULL
a    Jul     5           NULL      NULL
So my aim is to fill in the values of the New Payment column with this logic:
if Payment IS NULL then New Payment = Previous Payment + (New Base Rate * Previous Payment), where New Base Rate = current Base Rate - previous Base Rate; else New Payment = Payment.
E.g. for Mar, New Payment = 100.
But for Apr, New Payment = 100 + (100 * (1 - 1)) = 100.
For this I have written the following code:
SELECT id, month,
       CASE WHEN payment IS NULL THEN
                 LAG(payment) OVER (PARTITION BY id ORDER BY month)
               + (LAG(payment) OVER (PARTITION BY id ORDER BY month))
                 * (base_rate - LAG(base_rate) OVER (PARTITION BY id ORDER BY month))
            ELSE payment
       END AS new_payment
FROM my_table   -- the FROM clause (table name) was omitted in the original post; my_table is a placeholder
With this I get the following result:
Id   Month   Base Rate   Payment   New Payment
------------------------------------------------
a    Jan     1           100       100
a    Feb     1           100       100
a    Mar     1           100       100
a    Apr     2           NULL      100
a    May     3           NULL      NULL
a    Jun     4           NULL      NULL
a    Jul     5           NULL      NULL
Now the problem is that the New Payment value stops at the May month, because there is a NULL value in the previous month (Apr) in the Payment column. What I want is that once a NULL value appears in the Payment column, the code then starts using the updated value in the New Payment column in the logic mentioned above. So the answer I want is this:
Id   Month   Base Rate   Payment   New Payment
------------------------------------------------
a    Jan     1           100       100
a    Feb     1           100       100
a    Mar     1           100       100
a    Apr     2           NULL      100
a    May     3           NULL      200
a    Jun     4           NULL      400
a    Jul     5           NULL      800
May -- New Payment = 100 + (100*(2-1)) = 200
June -- New Payment = 200 + (200 * (3-2))= 400
It's okay if a new variable needs to be created, or if I have to split this code into multiple parts (e.g. create a table first, then apply the rest of the logic). Entirely new logic which doesn't use the LAG function is also welcome.
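For reference (an editor's sketch, not from the original post): because each NULL month's value depends on the previously computed value, LAG over the raw Payment column can never chain past the first NULL. One workaround that avoids row-by-row recursion is to carry the last real Payment forward and compound it with a cumulative product, written as EXP(SUM(LN(...))). The sketch below assumes a table named payments with a numeric month_num column for ordering (neither name appears in the post), follows the worked examples above (the factor applied in a month is the base-rate change of the month before it), and requires every 1 + step factor to be positive so that LN() is defined.

WITH marked AS (
    SELECT id, month_num, base_rate, payment,
           -- base-rate step as used in the worked examples: previous month minus the month before it
           LAG(base_rate, 1) OVER (PARTITION BY id ORDER BY month_num)
         - LAG(base_rate, 2) OVER (PARTITION BY id ORDER BY month_num) AS prev_step,
           -- month of the most recent row that still had a real payment
           MAX(CASE WHEN payment IS NOT NULL THEN month_num END)
               OVER (PARTITION BY id ORDER BY month_num
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS anchor_month
    FROM payments
),
compounded AS (
    SELECT m.*,
           -- the last real payment, carried forward
           FIRST_VALUE(payment) OVER (PARTITION BY id, anchor_month ORDER BY month_num) AS anchor_payment,
           -- running product of (1 + step), expressed as EXP(SUM(LN(...)))
           SUM(LN(1 + prev_step)) OVER (PARTITION BY id, anchor_month ORDER BY month_num
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_ln
    FROM marked m
)
SELECT id, month_num, base_rate, payment,
       CASE WHEN payment IS NOT NULL THEN payment
            ELSE ROUND(anchor_payment * EXP(cum_ln))   -- gives 100, 200, 400, 800 for Apr-Jul
       END AS new_payment
FROM compounded
ORDER BY id, month_num;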

Related

Transforming a dataset containing bank transactions into SQL Server

I would like to transform a dataset containing some bank transactions.
The ultimate goal is to make a report in Power BI to track daily expenses.
For this, I have the following situation that gives me a headache. :)
This is an example:
Date         Transaction_Details                      Debit   Credit
----------------------------------------------------------------------
21 Jan 2023  Transfer HomeBank                        500     NULL
NULL         Reference: 4944                          NULL    NULL
NULL         Beneficiary: David John                  NULL    NULL
NULL         In Bank Account: RO97INGB1111333380218   500     NULL
20 Jan 2023  POS Payment                              36      NULL
NULL         Card number: xxxx xxxx xxxx 1020         NULL    NULL
NULL         Terminal: OZKARDES A/S                   NULL    NULL
NULL         Date: 19-01-2023                         NULL    NULL
The desired output would be to transpose all rows in Transaction_Details that have NULL values in the Date column into a new column (e.g. Other_Details), and for each transaction to add another column with a "Transaction_Key".
Below, I have attached an example:
Transaction_Key  Date         Transaction_Details  Other_Details                                                                      Debit  Credit
-----------------------------------------------------------------------------------------------------------------------------------------------------
1                21 Jan 2023  Transfer HomeBank    Reference: 4944, Beneficiary: David John, In Bank Account: RO97INGB1111333380218   500    NULL
2                20 Jan 2023  POS Payment          Card number: xxxx xxxx xxxx 1020, Terminal: OZKARDES A/S, Date: 19-01-2023         36     NULL
I used some COALESCE functions but it didn't work.
If we can assume you are able to create an Id/Sequence either in the data source or when importing the data, such that you end up with an incrementing number per row, then by using a windowed aggregation as follows you can convert your data as required:
select Transaction_Key,
       Max(Date) as Date,
       Max(case when Date is not null then Transaction_Details end) as Transaction_Details,
       String_Agg(case when Date is null then Transaction_Details end, ',') as Other_Details,
       Max(case when Date is not null then Debit end) as Debit,
       Max(case when Date is not null then Credit end) as Credit
from (
    select *,
           Sum(case when Date is null then 0 else 1 end) over (order by id) as Transaction_Key
    from t
) t
group by Transaction_Key;
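For example, if the raw rows are loaded into a table that already carries such an incrementing id, the query above runs as-is. The setup below is only illustrative (the table name t and the identity column id are chosen to match the query; note that STRING_AGG requires SQL Server 2017 or later):

create table t (
    id                  int identity(1,1) primary key,
    [Date]              date           null,
    Transaction_Details nvarchar(200)  null,
    Debit               decimal(10, 2) null,
    Credit              decimal(10, 2) null
);

insert into t ([Date], Transaction_Details, Debit, Credit) values
('2023-01-21', 'Transfer HomeBank',                      500,  null),
(null,         'Reference: 4944',                        null, null),
(null,         'Beneficiary: David John',                null, null),
(null,         'In Bank Account: RO97INGB1111333380218', 500,  null),
('2023-01-20', 'POS Payment',                            36,   null),
(null,         'Card number: xxxx xxxx xxxx 1020',       null, null),
(null,         'Terminal: OZKARDES A/S',                 null, null),
(null,         'Date: 19-01-2023',                       null, null);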

Linear Interpolation in SQL

I work with crashes and mileage for the same year, which is Year in the table. Crashes are there for every record, but annual mileage is not. NULLs for mileage could be at the beginning or at the end of the time period for a certain customer. Also, a couple of annual mileage records in a row can be missing as well. I do not know how to overcome this. I tried to do it in a CASE statement, but then I do not know how to code it properly. The issue needs to be resolved in SQL, on SQL Server.
This is what the output looks like, and I need to have mileage for every single year for each customer.
The info I am pulling from is a proprietary database and the records themselves should stay untouched as they are. I just need code in the query which will turn my current output into one where I have mileage for every year. I appreciate any input!
Year  Customer  Crashes  Annual_Mileage
----------------------------------------
2009  123        5       3453453
2010  123        1       NULL
2011  123        0       54545
2012  123       14       376457435
2013  123        3       63453453
2014  123        4       NULL
2015  123       15       6346747
2016  123        0       NULL
2017  123        2       534534
2018  123        7       NULL
2019  123       11       NULL
2020  123       15       565435
2021  123       12       474567546
2022  123        7       NULL
Desired Results
Year  Customer  Crashes  Annual_Mileage
----------------------------------------
2009  123        5       3453453
2010  123        1       175399 (prior value is taken)
2011  123        0       54545
2012  123       14       376457435
2013  123        3       63453453
2014  123        4       34900100 (avg of 2 adjacent values)
2015  123       15       6346747
2016  123        0       3440641 (avg of 2 adjacent values)
2017  123        2       534534
2018  123        7       534534 (prior value is taken)
2019  123       11       549985 (avg of 2 adjacent values)
2020  123       15       565435
2021  123       12       474567546
2022  123        7       474567546 (prior value is taken)
SELECT Year,
Customer,
Crashes,
CASE
WHEN Annual_Mlg IS NOT NULL THEN Annual_Mlg
WHEN Annual_Mlg IS NULL THEN
CASE
WHEN PREV.Annual_Mlg IS NOT NULL
AND NEXT.Annual_Mlg IS NOT NULL
THEN ( PREV.Annual_Mlg + NEXT.Annual_Mlg ) / 2
ELSE 0
END
END AS Annual_Mlg
FROM #table
The above code doesn't work, but I just need to start somehow and that's what I have currently.
I understand what I need to do, I just do not know how to code it in SQL.
After I applied the row_number() function I got this output for the first 2 clients, while for the other 4 clients the row_number() function gave correct output. I have no idea why that is. I thought maybe it is because I used a "full join" earlier to combine the mileage and crashes tables?
Your use of #table tells me that you're using MS SQL Server (a temporary table, probably in a stored procedure).
You want to:
- select all the rows in #table,
- joined with the matching row (if any) for the previous year, and
- joined with the matching row (if any) for the next year.
Then it's easy. Assuming the primary key on your #table is composed of the year and customer columns, something like this ought to do you:
select t.year ,
       t.customer ,
       t.crashes ,
       annual_mileage = coalesce(
                            t.annual_mileage ,
                            ( coalesce( p.annual_mileage, 0 ) +
                              coalesce( n.annual_mileage, 0 )
                            ) / 2
                        )
from #table t                                 -- take all the rows
left join #table p on p.year = t.year - 1     -- with the matching row for
                  and p.customer = t.customer -- the previous year (if any)
left join #table n on n.year = t.year + 1     -- and the matching row for
                  and n.customer = t.customer -- the next year (if any)
Notes:
- What value you default to if the previous or next year doesn't exist is up to you (zero? some arbitrary value?).
- Is the previous/next year guaranteed to be the current year +/- 1? If not, you may have to use derived tables as the source for the prev/next data, selecting the closest previous/next year (that sort of thing rather complicates the query significantly).
Edited To Note:
If you have discontiguous years for each customer such that the "previous" and "next" years for a given customer are not necessarily the current year +/- 1, then something like this is probably the most straightforward way to find the previous/next year.
We use a derived table in our from clause, and assign a sequential number in lieu of year for each customer, using the ranking function row_number(). This query, then,
select row_nbr = row_number() over (
partition by x.customer
order by x.year
) ,
x.*
from #table x
would produce results along these lines:
row_nbr   customer   year   ...
--------------------------------
1         123        1992   ...
2         123        1993   ...
3         123        1995   ...
4         123        2020   ...
1         456        2001   ...
2         456        2005   ...
3         456        2020   ...
And that leads us to this:
select year           = t.year ,
       customer       = t.customer ,
       crashes        = t.crashes ,
       annual_mileage = coalesce(
                            t.annual_mileage ,
                            ( coalesce( p.annual_mileage, 0 ) +
                              coalesce( n.annual_mileage, 0 )
                            ) / 2
                        )
from ( select row_nbr = row_number() over ( partition by x.customer order by x.year ) ,
              x.*
       from #table x
     ) t
-- p and n must come from the same numbered derived table, since #table itself has no row_nbr column
left join ( select row_nbr = row_number() over ( partition by x.customer order by x.year ) ,
                   x.*
            from #table x
          ) p on p.customer = t.customer and p.row_nbr = t.row_nbr - 1
left join ( select row_nbr = row_number() over ( partition by x.customer order by x.year ) ,
                   x.*
            from #table x
          ) n on n.customer = t.customer and n.row_nbr = t.row_nbr + 1
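For testing, the question's sample rows could be loaded into #table along these lines (an illustrative setup, not part of the original answer; the mileage column is named annual_mileage to match the queries above):

create table #table (
    year           int,
    customer       int,
    crashes        int,
    annual_mileage bigint
);

insert into #table (year, customer, crashes, annual_mileage) values
(2009, 123,  5, 3453453),
(2010, 123,  1, null),
(2011, 123,  0, 54545),
(2012, 123, 14, 376457435),
(2013, 123,  3, 63453453),
(2014, 123,  4, null),
(2015, 123, 15, 6346747),
(2016, 123,  0, null),
(2017, 123,  2, 534534),
(2018, 123,  7, null),
(2019, 123, 11, null),
(2020, 123, 15, 565435),
(2021, 123, 12, 474567546),
(2022, 123,  7, null);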

How to bring corresponding data in the column

How can I bring the corresponding data into a new column by comparing the other attributes? In the table below we have 2 weeks of data along with StoreID and PriceType. If the price type is "Regular" then we have to add the "Reduced" price with the same criteria (Year, Week, StoreID) in the new column, and if the price type is "Reduced" then we have to add the "Regular" price with the same criteria (Year, Week, StoreID) in the new column.
Year  Week  StoreID  PriceType  Price
--------------------------------------
2021  10    S        Regular    200
2021  10    S        Reduced    150
2021  10    D        Regular    180
2021  10    D        Reduced    120
2021   9    S        Regular     35
2021   9    D        Reduced     40
It has to be changed like the table below. In the output table below, the "Reduced/Regular" value is 150 in row number 1 because 150 is the corresponding value for 200 with criteria (2021, 10, S), and in the 2nd row the Reduced/Regular value is 200 because 200 is the corresponding value for 150 with criteria (2021, 10, S).
But the last 2 rows, for week 9, will give 0 because we don't have a corresponding criteria row.
Year  Week  StoreID  PriceType  Price  Reduced/Regular
--------------------------------------------------------
2021  10    S        Regular    200    150
2021  10    S        Reduced    150    200
2021  10    D        Regular    180    120
2021  10    D        Reduced    120    180
2021   9    S        Regular     35      0
2021   9    D        Reduced     40      0
Kindly help with this logic. Thanks in advance.
You can use window functions and conditional logic:
select t.*,
       (case when priceType = 'Regular'
             then max(case when priceType = 'Reduced' then price end) over (partition by year, week, storeId)
             else max(case when priceType = 'Regular' then price end) over (partition by year, week, storeId)
        end) as other_price
from t;
Happily, this is standard SQL and will work in any database.
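The query above returns NULL for the week-9 rows, which have no counterpart; if a literal 0 is required, as in the desired output, the same expression can simply be wrapped in COALESCE (a minor variation on the answer above):

select t.*,
       coalesce(case when priceType = 'Regular'
                     then max(case when priceType = 'Reduced' then price end)
                              over (partition by year, week, storeId)
                     else max(case when priceType = 'Regular' then price end)
                              over (partition by year, week, storeId)
                end, 0) as other_price
from t;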

Aggregate payments per year per customer per type

Please consider the following payment data:
customerID  paymentID  paymentType  paymentDate  paymentAmount
---------------------------------------------------------------
1           1          A            2015-11-28    500
1           2          A            2015-11-29   -150
1           3          B            2016-03-07    300
2           4          A            2015-03-03    200
2           5          B            2016-05-25   -100
2           6          C            2016-06-24    700
1           7          B            2015-09-22    110
2           8          B            2016-01-03    400
I need to tally per year, per customer, the sum of the diverse payment types (A = invoice, B = credit note, etc), as follows:
year   customerID   paymentType   paymentSum
-----------------------------------------------
2015   1            A             350    : paymentID 1 + 2
2015   1            B             110    : paymentID 7
2015   1            C               0
2015   2            A             200    : paymentID 4
2015   2            B               0
2015   2            C               0
2016   1            A               0
2016   1            B             300    : paymentID 3
2016   1            C               0
2016   2            A               0
2016   2            B             300    : paymentID 5 + 8
2016   2            C             700    : paymentId 6
It is important that there are values for every category (so for 2015, customer 1 has 0 payment value for type C, but still it is good to see this).
In reality, there are over 10 payment types and about 30 customers. The total date range is 10 years.
Is this possible to do in SQL only, and if so, could somebody show me how? If possible, by using relatively easy queries so that I can learn from it, for instance by storing intermediary results into a #temptable.
Any help is greatly appreciated!
A simple GROUP BY with SUM() on the paymentAmount will give you what you wanted:
select year       = datepart(year, paymentDate),
       customerID,
       paymentType,
       paymentSum = sum(paymentAmount)
from payment_data
group by datepart(year, paymentDate), customerID, paymentType
This is a simple query that generates the required 0s. Note that it may not be the most efficient way to generate this result set. If you already have lookup tables for customers or payment types, it would be preferable to use those rather than the CTEs[1] I use here:
declare @t table (customerID int, paymentID int, paymentType char(1), paymentDate date,
                  paymentAmount int)
insert into @t (customerID, paymentID, paymentType, paymentDate, paymentAmount) values
(1,1,'A','20151128', 500),
(1,2,'A','20151129',-150),
(1,3,'B','20160307', 300),
(2,4,'A','20150303', 200),
(2,5,'B','20160525',-100),
(2,6,'C','20160624', 700),
(1,7,'B','20150922', 110),
(2,8,'B','20160103', 400)
;With Customers as (
select DISTINCT customerID from @t
), PaymentTypes as (
select DISTINCT paymentType from @t
), Years as (
select DISTINCT DATEPART(year,paymentDate) as Yr from @t
), Matrix as (
select
customerID,
paymentType,
Yr
from
Customers
cross join
PaymentTypes
cross join
Years
)
select
m.customerID,
m.paymentType,
m.Yr,
COALESCE(SUM(paymentAmount),0) as Total
from
Matrix m
left join
@t t
on
m.customerID = t.customerID and
m.paymentType = t.paymentType and
m.Yr = DATEPART(year,t.paymentDate)
group by
m.customerID,
m.paymentType,
m.Yr
Result:
customerID  paymentType  Yr    Total
----------- ------------ ----- -------
1           A            2015  350
1           A            2016  0
1           B            2015  110
1           B            2016  300
1           C            2015  0
1           C            2016  0
2           A            2015  200
2           A            2016  0
2           B            2015  0
2           B            2016  300
2           C            2015  0
2           C            2016  700
(We may also want to play games with a numbers table and/or generate actual start and end dates for years if the date processing above needs to be able to use an index)
Note also how similar the top of my script is to the sample data in your question - except it's actual code that generates the sample data. You may wish to consider presenting sample code in such a way in the future since it simplifies the process of actually being able to test scripts in answers.
[1] CTEs - Common Table Expressions. They may be thought of as conceptually similar to temp tables - except we don't actually (necessarily) materialize the results. They are also incorporated into the single query that follows them and the whole query is optimized as a whole.
Your suggestion to use temp tables means that you'd be breaking this into multiple separate queries that then necessarily force SQL to perform the task in an order that we have selected rather than letting the optimizer choose the best approach for the above single query.
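For contrast, this is roughly what that multi-step version would look like when the intermediate results are materialized into temp tables (an illustrative sketch, not part of the original answer), using the same @t sample data as above:

-- Each stage is materialized, so the execution order is now fixed by us rather than the optimizer
select distinct customerID into #customers from @t;
select distinct paymentType into #paymentTypes from @t;
select distinct datepart(year, paymentDate) as Yr into #years from @t;

select c.customerID, p.paymentType, y.Yr
into #matrix
from #customers c
cross join #paymentTypes p
cross join #years y;

select m.customerID, m.paymentType, m.Yr,
       coalesce(sum(t.paymentAmount), 0) as Total
from #matrix m
left join @t t
  on  m.customerID  = t.customerID
  and m.paymentType = t.paymentType
  and m.Yr          = datepart(year, t.paymentDate)
group by m.customerID, m.paymentType, m.Yr
order by m.customerID, m.paymentType, m.Yr;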

I am trying a proper logic for stock management

I have a table StockManagement; it looks like this:
PId   Qty   DateOfBooking   DateOfReturning
---------------------------------------------
1     5     1 Jan 2013      3 Jan 2013
1     5     1 Jan 2013      4 Jan 2013
Now let's suppose I have 10 units of Product A. As my table shows, I have issued 5 of Product A from 1 Jan to 3 Jan and another 5 from 1 Jan to 4 Jan. Now my customer wants to book Product A from 4 Jan to 7 Jan. As you can see in the table, 5 of Product A will be returned on 3 Jan, so I can issue 5 products from 4 Jan. This is what I want to do through a query.
So please help me to get the available quantity between two dates.
select count(s1.qty)
from StockManagement s1
inner join StockManagement s2 on s1.PId = s2.PId
where to_date(s1.DateofBooking,'dd/mm/yyyy')
      not between to_date(s2.DateofBooking,'dd/mm/yyyy')
              and to_date(s2.DateOfReturning,'dd/mm/yyyy')
Try this; meanwhile I will try to post a live demo.
declare @BkDate datetime, @qty int, @Pid int
select @BkDate = '04-Jan-2013', @qty = 5, @Pid = 1

select sum(Qty) as AvailableQty
from StockManagement
where PId = @Pid
  and @BkDate not between DateofBooking and DateOfReturning
having sum(Qty) >= @qty
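The query above only tests the requested start date. If the whole requested range (4 Jan to 7 Jan) has to be checked, a range-overlap test could be used instead. This is only a sketch; @TotalStock (the on-hand quantity of 10 mentioned in the question) is an assumed parameter, since it is not stored in the table:

declare @FromDate datetime, @ToDate datetime, @Pid int, @TotalStock int
select @FromDate = '04-Jan-2013', @ToDate = '07-Jan-2013', @Pid = 1, @TotalStock = 10

-- Two date ranges overlap when each starts on or before the other ends,
-- so this sums the quantity that is still out during any part of the requested range.
select @TotalStock - coalesce(sum(Qty), 0) as AvailableQty
from StockManagement
where PId = @Pid
  and DateofBooking   <= @ToDate
  and DateOfReturning >= @FromDate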