Snowflake SQL - Forward / Back fill a column with multiple values

I am trying to forward and back fill a column that has multiple values. The end result should be a filled column, with each report month repeated once per category.
My current dataset has a report month and a category. The category column can hold a varying number of category values, with null values in between. There is one column for report month, which holds distinct values.
reportmonth  category
2020-01      null
2021-02      5
2021-03      null
2021-04      null
2021-05      10
2021-05      5
2021-06      null
2021-07      null
Here is the dataset that I am expecting:
reportmonth  category
2020-01      5
2021-01      10
2021-02      5
2021-02      10
2021-03      10
2021-03      5
2021-04      5
2021-04      10
2021-05      5
2021-05      10
2021-06      5
2021-06      10
2021-07      5
2021-07      10
I've tried using
first_value(category) ignore nulls OVER (ORDER BY reportmonth ROWS BETWEEN CURRENT ROW AND UNBOUNDED following) AS forward_fill
but this seems to stop once it hits the next category number. It also does not duplicate the report months.
There are other posts / questions similar to this, however none of them need to have the category repeat by the reportmonth.
Any help would be greatly appreciated.

You need to cross join the distinct reportmonth values with the distinct non-null category values. Try the following:
SELECT R.reportmonth, C.category
FROM (
    SELECT DISTINCT reportmonth
    FROM yourTbl
) R
CROSS JOIN (
    SELECT DISTINCT category
    FROM yourTbl
    WHERE category IS NOT NULL
) C
ORDER BY R.reportmonth
The output according to your sample data is every distinct reportmonth paired with both categories (5 and 10).
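To check the cross join against the sample rows, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite stands in for Snowflake here, which is an assumption; the table name yourTbl comes from the answer, and the helper name cross_join_months_categories is made up for illustration. A secondary sort on category is added only to make the row order deterministic.

```python
import sqlite3

# Sample rows copied from the question: (reportmonth, category).
SAMPLE = [
    ("2020-01", None), ("2021-02", 5), ("2021-03", None), ("2021-04", None),
    ("2021-05", 10), ("2021-05", 5), ("2021-06", None), ("2021-07", None),
]

CROSS_JOIN_QUERY = """
    SELECT R.reportmonth, C.category
    FROM (SELECT DISTINCT reportmonth FROM yourTbl) R
    CROSS JOIN (SELECT DISTINCT category
                FROM yourTbl
                WHERE category IS NOT NULL) C
    ORDER BY R.reportmonth, C.category
"""

def cross_join_months_categories(rows):
    """Load rows into an in-memory table and pair every month with every category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTbl (reportmonth TEXT, category INTEGER)")
    conn.executemany("INSERT INTO yourTbl VALUES (?, ?)", rows)
    result = conn.execute(CROSS_JOIN_QUERY).fetchall()
    conn.close()
    return result

filled = cross_join_months_categories(SAMPLE)
for month, category in filled:
    print(month, category)
```

With the seven distinct months and two non-null categories in the sample, this gives fourteen rows, two per month.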


How to get the last day of the month without LAST_DAY() or EOMONTH()?

I have a table t with:
DATE        LOCATION  PRODUCT_ID  AMOUNT
2021-10-29  1         123         10
2021-10-30  1         123         9
2021-10-31  1         123         8
2021-10-29  1         456         100
2021-10-30  1         456         90
2021-10-31  1         456         80
2021-10-29  2         123         18
2021-10-30  2         123         17
2021-11-29  2         456         18
I need to find the AMOUNT on the last day of the month for each combination of LOCATION + PRODUCT_ID.
If a PRODUCT_ID has no entry for that day, the AMOUNT should be NULL.
So the result should look like:
DATE        LOCATION  PRODUCT_ID  AMOUNT
2021-10-31  1         123         8
2021-10-31  1         456         80
2021-10-31  2         123         NULL
2021-11-30  2         456         NULL
Sadly EXASOL has no LAST_DAY() or EOMONTH() function. How can I solve this?
You can get the last day of the month by truncating the date to the first of the month, adding one month, and subtracting one day. In Exasol, DATE_TRUNC, ADD_MONTHS and ADD_DAYS can be combined for this:
case
    when t.date = add_days(add_months(date_trunc('month', t.date), 1), -1)
    then 'Y' else 'N'
end as end_of_month
That being said, if you group your table for all combinations of locations and products, you will not get NULLs for products without sales on the last day of the month as shown in your output table.
When you group your data, any value that does not exist will simply not show up in your output table. If you want to force nulls to show up, you can create a new table that contains all combinations of products, locations, and hard-coded end of month dates.
Then, you can left join your old table with this new hard-coded table by date, location, and product. This method will give you the NULL values you expect.
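The scaffold-and-left-join idea can be sketched as follows. This is only an illustration run on in-memory SQLite from Python, not Exasol: the month-end date is computed with SQLite's date() modifiers instead of being hard-coded, the DATE column is renamed to day, and the names combos, month_ends and month_end_amounts are made up for the example.

```python
import sqlite3

# Rows copied from the question: (day, location, product_id, amount).
ROWS = [
    ("2021-10-29", 1, 123, 10), ("2021-10-30", 1, 123, 9), ("2021-10-31", 1, 123, 8),
    ("2021-10-29", 1, 456, 100), ("2021-10-30", 1, 456, 90), ("2021-10-31", 1, 456, 80),
    ("2021-10-29", 2, 123, 18), ("2021-10-30", 2, 123, 17), ("2021-11-29", 2, 456, 18),
]

MONTH_END_QUERY = """
WITH combos AS (
    -- every location/product pair that appears anywhere in the data
    SELECT DISTINCT location, product_id FROM t
),
month_ends AS (
    -- last day of each month present in the data (SQLite-specific date arithmetic)
    SELECT DISTINCT date(day, 'start of month', '+1 month', '-1 day') AS eom FROM t
)
SELECT m.eom, c.location, c.product_id, t.amount
FROM combos c
CROSS JOIN month_ends m
LEFT JOIN t
       ON t.day = m.eom
      AND t.location = c.location
      AND t.product_id = c.product_id
ORDER BY m.eom, c.location, c.product_id
"""

def month_end_amounts(rows):
    """Return one row per (month end, location, product); amount is None when missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (day TEXT, location INTEGER, product_id INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(MONTH_END_QUERY).fetchall()
    conn.close()
    return result

month_end = month_end_amounts(ROWS)
for row in month_end:
    print(row)
```

Because the scaffold holds all four location/product pairs for both month ends, this returns eight rows. The table in the question lists only the pairs that were active in each month, so reproducing it exactly would need the scaffold to be built per month.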

Aggregate in plsql

ORGANIZATION_ID  BAY_ID  CASCADE_GROUP_ID  DOWNSTEAM_VALUE
1001             100012  1                 2
1001             100014  1                 4
1001             100016  1                 6
1001             100018  1                 8
I need to create a view that aggregates the DOWNSTEAM_VALUE column of the table above. The aggregation should follow BAY_ID order: for each row, the downstream value is the DOWNSTEAM_VALUE of the current BAY_ID plus the DOWNSTEAM_VALUE of every following BAY_ID in ascending order. For example, the first row (BAY_ID 100012) should show 2+4+6+8 = 20, and the next BAY_ID should show 4+6+8 = 18. Since the last BAY_ID has no further values to add, it should show 8.
ORGANIZATION_ID  BAY_ID  CASCADE_GROUP_ID  DOWNSTEAM_VALUE
1001             100012  1                 20
1001             100014  1                 18
1001             100016  1                 14
1001             100018  1                 8
Any help would be really appreciated. Thanks
You can use the SUM analytic function with a windowing clause for that, as below.
select ORGANIZATION_ID
     , BAY_ID
     , CASCADE_GROUP_ID
     , sum(DOWNSTEAM_VALUE) over (
           partition by ORGANIZATION_ID, CASCADE_GROUP_ID
           order by BAY_ID asc
           rows between current row and unbounded following
       ) as DOWNSTEAM_VALUE
from your_table
;
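As a quick check of the windowing clause, the sketch below runs the same kind of query on the four sample rows in an in-memory SQLite database from Python. SQLite (3.25 or newer, for window function support) is used as a stand-in for Oracle, and the output alias downstream_total and the helper name downstream_totals are chosen here for illustration.

```python
import sqlite3

# Rows copied from the question:
# (organization_id, bay_id, cascade_group_id, downsteam_value).
BAYS = [
    (1001, 100012, 1, 2),
    (1001, 100014, 1, 4),
    (1001, 100016, 1, 6),
    (1001, 100018, 1, 8),
]

DOWNSTREAM_QUERY = """
SELECT organization_id,
       bay_id,
       cascade_group_id,
       SUM(downsteam_value) OVER (
           PARTITION BY organization_id, cascade_group_id
           ORDER BY bay_id
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       ) AS downstream_total
FROM your_table
ORDER BY bay_id
"""

def downstream_totals(rows):
    """Sum each row's value with the values of all following bays in its group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (organization_id INTEGER, bay_id INTEGER, "
        "cascade_group_id INTEGER, downsteam_value INTEGER)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(DOWNSTREAM_QUERY).fetchall()
    conn.close()
    return result

totals = downstream_totals(BAYS)
for row in totals:
    print(row)
```

The frame starts at the current row and runs to the end of the partition, so each row gets its own value plus everything after it in BAY_ID order.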

Calculate Balance from Transactions

I have a table that looks like this
ID Type Amount Created
10 4 30,00 2019-11-29 11:34:54.417
1 1 10,50 2019-11-19 11:34:54.417
3 2 16,50 2019-11-17 11:34:54.417
2 4 11,50 2019-11-15 11:34:54.417
4 6 10,00 2019-11-11 11:34:54.417
5 3 8,60 2019-10-19 11:34:54.417
7 1 21,50 2019-05-19 11:34:54.417
8 4 9,00 2019-04-19 11:34:54.417
9 1 8,00 2019-02-19 11:34:54.417
6 1 1,50 2019-01-19 11:34:54.417
Imagine this table backs an e-wallet, and these are transactions with an ID, a Type (withdrawal, reversal, deposit, etc.), an Amount and a Created datetime.
Let's say that all 10 transactions belong to one specific customer. Then if I run
SELECT SUM(Amount) AS Balance
FROM transactions
WHERE Created <= '20191120'
this query will return the Balance of this customer until 2019/11/20.
What i want is to run a select query to this table and keep only the Transactions with Type=4.
E.g.
SELECT ID
, Type
, Amount
, Created
FROM transactions
WHERE type=4
This query returns the following
ID Type Amount Created
2 4 11,50 2019-11-15 11:34:54.417
8 4 9,00 2019-04-19 11:34:54.417
10 4 30,00 2019-11-29 11:34:54.417
What I really want, though, is an extra column in this result set that shows the customer's balance at the point of each Type=4 transaction. For example, when he made the transaction with ID = 2, his balance before it (not counting the current transaction, ID = 2) was (1,50+8,00+9,00+21,50+8,60+10,00); when he made the transaction with ID = 8, his balance was (1,50+8,00); and so on.
A desired Result set would be
ID Type Amount Created Balance
2 4 11,50 2019-11-15 11:34:54.417 58,60
8 4 9,00 2019-04-19 11:34:54.417 9,50
10 4 30,00 2019-11-29 11:34:54.417 97,1
I want to do this in one SELECT query. I have some ideas for doing it in two steps, but that's not what I'm after; I just need to run it once and get all five desired columns.
Looking carefully at your desired output, assuming your DBMS supports window functions, you can do this using a pseudo-cumulative sum:
SELECT ID, Type, Amount, Created, Balance
FROM (
    SELECT ID, Type, Amount, Created,
           -- Sum "amount" of all rows before the current row (exclude current row)
           SUM(Amount) OVER (
               ORDER BY Created
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS Balance
    FROM transactions
) src
WHERE type = 4
ORDER BY id
;
Also, shouldn't the balance for id=2 be 58,6 and the balance for id=10 be 97.1?
You want a cumulative sum and filtering:
SELECT t.*
FROM (SELECT t.*, SUM(Amount) OVER (ORDER BY Created) - Amount AS Balance
      FROM transactions t
      WHERE Created <= '20191120'
     ) t
WHERE type = 4;
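To see the "sum of everything before the current row" idea on the sample data, here is a minimal sketch using an in-memory SQLite database from Python (SQLite 3.25 or newer is assumed for window functions; the original DBMS is not stated). It uses the frame that ends at 1 PRECEDING and no date filter, so that ID 10 is kept. Amounts are stored as integer cents, which is a choice made here to avoid floating-point noise; the helper name balances_before_type4 is made up.

```python
import sqlite3

# Rows copied from the question: (id, type, amount in cents, created).
TRANSACTIONS = [
    (10, 4, 3000, "2019-11-29 11:34:54.417"),
    (1, 1, 1050, "2019-11-19 11:34:54.417"),
    (3, 2, 1650, "2019-11-17 11:34:54.417"),
    (2, 4, 1150, "2019-11-15 11:34:54.417"),
    (4, 6, 1000, "2019-11-11 11:34:54.417"),
    (5, 3, 860, "2019-10-19 11:34:54.417"),
    (7, 1, 2150, "2019-05-19 11:34:54.417"),
    (8, 4, 900, "2019-04-19 11:34:54.417"),
    (9, 1, 800, "2019-02-19 11:34:54.417"),
    (6, 1, 150, "2019-01-19 11:34:54.417"),
]

BALANCE_QUERY = """
SELECT id, type, amount, created, balance
FROM (
    SELECT id, type, amount, created,
           -- running total of all earlier rows; the current row is excluded
           SUM(amount) OVER (
               ORDER BY created
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS balance
    FROM transactions
) src
WHERE type = 4
ORDER BY id
"""

def balances_before_type4(rows):
    """Return the Type=4 rows with the balance (in cents) held just before each one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER, type INTEGER, amount INTEGER, created TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(BALANCE_QUERY).fetchall()
    conn.close()
    return result

type4 = balances_before_type4(TRANSACTIONS)
for tx_id, _type, amount, created, balance in type4:
    print(tx_id, created, amount / 100, balance / 100)
```

The window is computed in the inner query over all transactions, and the Type=4 filter is applied only afterwards; filtering first would shrink the window to Type=4 rows and give the wrong balances.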

Get the latest price SQLITE

I have a table which contains _id, underSubHeadId, wefDate and price.
Whenever a product is created or its price is edited, an entry is also made in this table.
What I want is: if I enter a date, I get the latest price of each distinct underSubHeadId on or before that date.
_id underSubHeadId wefDate price
1 1 2016-11-01 5
2 2 2016-11-01 50
3 1 2016-11-25 500
4 3 2016-11-01 20
5 4 2016-11-11 30
6 5 2016-11-01 40
7 3 2016-11-20 25
8 5 2016-11-15 52
If I enter 2016-11-20 as date I should get
1 5
2 50
3 25
4 30
5 52
I have achieved this result using the ROW_NUMBER function in SQL Server, but I need it in SQLite, which doesn't have such a function.
Also, if a date such as 2016-10-25 (which is earlier than every entry) is entered, I want the price from the first entry.
For example, for underSubHeadId 1 we would get a price of 5, since the nearest and first entry is 2016-11-01.
This is the SQL Server query, which works fine:
select underSubHeadId, price
from (
    select underSubHeadId, price,
           ROW_NUMBER() OVER (Partition By underSubHeadId order by wefDate desc) rn
    from rates
    where wefDate <= '2016-11-19'
) newTable
where newTable.rn = 1
Thank You
This is a little tricky, but here is one way:
select t.*
from t
where t.wefDate = (select max(t2.wefDate)
                   from t t2
                   where t2.underSubHeadId = t.underSubHeadId and
                         t2.wefDate <= '2016-11-20'
                  );
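-- Alternative: max(price) returns the highest price up to the date,
-- which equals the latest price only if prices never decrease.
select underSubHeadId, max(price)
from t
where wefDate <= '2016-11-20'
group by underSubHeadId;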
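The correlated-subquery approach can be tried directly with Python's built-in sqlite3 module. The sketch below uses the table name rates from the question and the sample rows; the COALESCE fallback to the earliest entry is an addition made here to cover the "date before any entry" case the question asks about, not part of the answers above, and the helper name latest_prices is made up.

```python
import sqlite3

# Rows copied from the question: (_id, underSubHeadId, wefDate, price).
RATES = [
    (1, 1, "2016-11-01", 5), (2, 2, "2016-11-01", 50), (3, 1, "2016-11-25", 500),
    (4, 3, "2016-11-01", 20), (5, 4, "2016-11-11", 30), (6, 5, "2016-11-01", 40),
    (7, 3, "2016-11-20", 25), (8, 5, "2016-11-15", 52),
]

LATEST_PRICE_QUERY = """
SELECT r.underSubHeadId, r.price
FROM rates r
WHERE r.wefDate = COALESCE(
    -- latest entry on or before the requested date ...
    (SELECT MAX(r2.wefDate) FROM rates r2
      WHERE r2.underSubHeadId = r.underSubHeadId AND r2.wefDate <= :as_of),
    -- ... or, if there is none, the earliest entry for that id
    (SELECT MIN(r3.wefDate) FROM rates r3
      WHERE r3.underSubHeadId = r.underSubHeadId)
)
ORDER BY r.underSubHeadId
"""

def latest_prices(rows, as_of):
    """Return (underSubHeadId, price) pairs in effect on the given ISO date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rates (_id INTEGER, underSubHeadId INTEGER, wefDate TEXT, price INTEGER)"
    )
    conn.executemany("INSERT INTO rates VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_PRICE_QUERY, {"as_of": as_of}).fetchall()
    conn.close()
    return result

print(latest_prices(RATES, "2016-11-20"))
print(latest_prices(RATES, "2016-10-25"))
```

This relies on wefDate being stored as ISO yyyy-mm-dd text, so that string comparison matches date order, and on each underSubHeadId having at most one entry per date.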

add column based on a column value in one row

I've this table with the following data
user Date Dist Start
1 2014-09-03 150 12500
1 2014-09-04 220 null
1 2014-09-05 100 null
2 2014-09-03 290 18000
2 2014-09-04 90 null
2 2014-09-05 170 null
Based on the value in the Start column, I need to add another column that repeats the non-null value for every row of the same user.
The resultant table should be as below
user Date Dist Start StartR
1 2014-09-03 150 12500 12500
1 2014-09-04 220 null 12500
1 2014-09-05 100 null 12500
2 2014-09-03 290 18000 18000
2 2014-09-04 90 null 18000
2 2014-09-05 170 null 18000
Can someone please help me with this query? I have no idea how to do it.
For the data you have, you can use a window function:
select t.*, min(t.start) over (partition by user) as StartR
from table t
You can readily update using the same idea:
with toupdate as (
    select t.*, min(t.start) over (partition by user) as new_StartR
    from table t
)
update toupdate
    set StartR = new_StartR;
Note: this works for the data in the question and how you have phrased the question. It would not work if there were multiple Start values for a given user, or if there were NULL values that you wanted to keep before the first non-NULL Start value.
You can use COALESCE/ISNULL and a correlated sub-query:
SELECT [user], [Date], [Dist], [Start],
       StartR = ISNULL([Start], (SELECT MIN([Start])
                                 FROM dbo.TableName t2
                                 WHERE t.[User] = t2.[User]
                                   AND t2.[Start] IS NOT NULL))
FROM dbo.TableName t
I have used MIN([Start]) since you haven't said what should happen if there are multiple Start values for one user that are not NULL.
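For a quick check of the window-function variant on the sample rows, here is a minimal sketch using an in-memory SQLite database from Python (SQLite 3.25 or newer is assumed; the original answers target SQL Server). The table name trips, the column names user_id, day and start_r, and the helper name fill_start are all chosen here for illustration.

```python
import sqlite3

# Rows copied from the question: (user, Date, Dist, Start).
TRIPS = [
    (1, "2014-09-03", 150, 12500), (1, "2014-09-04", 220, None), (1, "2014-09-05", 100, None),
    (2, "2014-09-03", 290, 18000), (2, "2014-09-04", 90, None), (2, "2014-09-05", 170, None),
]

FILL_QUERY = """
SELECT user_id, day, dist, start,
       -- MIN ignores NULLs, so each user's single non-null Start is repeated
       MIN(start) OVER (PARTITION BY user_id) AS start_r
FROM trips
ORDER BY user_id, day
"""

def fill_start(rows):
    """Repeat each user's non-null Start value on every row of that user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (user_id INTEGER, day TEXT, dist INTEGER, start INTEGER)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(FILL_QUERY).fetchall()
    conn.close()
    return result

filled_trips = fill_start(TRIPS)
for row in filled_trips:
    print(row)
```

As both answers point out, MIN is only a reasonable choice while each user has a single non-null Start; with several Start values per user it would return the smallest one for every row.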