How to do SUM() OVER() in Pentaho? - pentaho

My data looks like this:
| ID | Values |
|:---:|:------:|
| 1 | 200 |
| 2 | 300 |
| 3 | 650 |
| 4 | 120 |
| 5 | 830 |
I want the equivalent of this T-SQL: SUM(Values) OVER(ORDER BY ID) AS Sum
| ID | Values | Sum |
|:---:|:------:|:----:|
| 1 | 200 | 200 |
| 2 | 300 | 500 |
| 3 | 650 | 1150 |
| 4 | 120 | 1270 |
| 5 | 830 | 2100 |
How can I do this in Pentaho?

Use the "Group by" step with the Cumulative sum option, and leave the Group field section empty so the sum runs over all the rows.
You'll have to feed the data ordered by ID, using a Sort step. In my example I left out the Sort step because I filled the Data grid with data that was already ordered, but in your case you may need to make sure the data is sorted first.
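
To make the expected result concrete, here is a minimal Python sketch of what the sort followed by a cumulative sum with no group field computes. It is only an illustration of the logic, not Pentaho code; the variable names are made up for the example.

```python
from itertools import accumulate

# Rows as (ID, Values) pairs, deliberately out of order to show why sorting matters.
rows = [(3, 650), (1, 200), (5, 830), (2, 300), (4, 120)]

# Stand-in for the Sort step: order the rows by ID.
ordered = sorted(rows, key=lambda r: r[0])

# Stand-in for the cumulative sum with no group field:
# one running total over all rows, in the sorted order.
running = list(accumulate(value for _, value in ordered))

result = [(row_id, value, total) for (row_id, value), total in zip(ordered, running)]
for row in result:
    print(row)
```

If the rows were not sorted first, the running total would follow the arrival order instead of the ID order, which is why the Sort step matters.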

Related

Clean Data Using SQL - Take Column Difference

I have data in SQL as follows:
Actual Table
+-------------+--------+------+
| Id | Weight | Type |
+-------------+--------+------+
| 00011223344 | 35 | A |
| 00011223344 | 10 | A |
| 12311223344 | 100 | B |
| 00034343434 | 25 | A |
| 00034343434 | 25 | A |
| 99934343434 | 200 | C |
| 88855667788 | 100 | D |
+-------------+--------+------+
Column Id is always 11 characters long and has data type varchar. I need to derive two columns, Actual Id and Actual Weight, from the table above.
Actual Id depends on column Id. If the Id starts with 000, then we need to find the Id that does not start with 000 but whose remaining characters (the 8 rightmost characters) are the same. That matching Id is the Actual Id. For example, in the first 3 rows, the first 2 Ids start with 000, and the Id that does not start with 000 but has the same 8 rightmost characters is in the 3rd row (12311223344). So in the derived column Actual Id, the first 2 rows get 12311223344.
Actual Weight depends on two columns, Id and Weight. Rows are grouped by the criterion above. If an Id that does not start with 000 has matching entries that do start with 000, its Weight is recalculated: add up the Weights of the matching 000 rows and subtract that sum from the Weight of the row that does not start with 000.
For example, in the first 3 rows, the 3rd row has an Id starting with 123, and rows 1 and 2 have the same 8 rightmost digits but start with 000 instead of 123. For the rows starting with 000, Actual Weight equals Weight, but for the row starting with 123, Actual Weight is 100-(35+10) = 55.
I am looking for a query that creates these 2 derived columns without creating any other table or view.
Desired Output
+-------------+-------------+--------+---------------+------+
| Id | Actual ID | Weight | Actual Weight | Type |
+-------------+-------------+--------+---------------+------+
| 00011223344 | 12311223344 | 35 | 35 | A |
| 00011223344 | 12311223344 | 10 | 10 | A |
| 12311223344 | 12311223344 | 100 | 55 | B |
| 00034343434 | 99934343434 | 25 | 25 | A |
| 00034343434 | 99934343434 | 25 | 25 | A |
| 99934343434 | 99934343434 | 200 | 150 | C |
| 88855667788 | 88855667788 | 100 | 100 | D |
+-------------+-------------+--------+---------------+------+
Hmmmm . . . If I'm following this:
select t.*,
       (case when id like '000%' then weight
             else weight - sum(case when id like '000%' then weight else 0 end) over (partition by actual_id)
        end) as actual_weight
from (select t.*,
             max(id) over (partition by stuff(id, 1, 3, '')) as actual_id
      from t
     ) t;
Here is a db<>fiddle.
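
For readers who want to check the logic outside the database, the same two steps can be sketched in plain Python. This is an illustration only: it assumes, as the query does, that each 8-character suffix has at most one Id that does not start with 000, and the helper name `suffix` is made up for the example.

```python
from collections import defaultdict

rows = [
    ("00011223344", 35, "A"),
    ("00011223344", 10, "A"),
    ("12311223344", 100, "B"),
    ("00034343434", 25, "A"),
    ("00034343434", 25, "A"),
    ("99934343434", 200, "C"),
    ("88855667788", 100, "D"),
]

def suffix(row_id):
    # Drop the first 3 characters, like stuff(id, 1, 3, '') in the query.
    return row_id[3:]

actual_id = {}                  # suffix -> largest Id sharing that suffix
zero_weight = defaultdict(int)  # suffix -> total Weight of the rows starting with 000
for row_id, weight, _ in rows:
    key = suffix(row_id)
    actual_id[key] = max(actual_id.get(key, row_id), row_id)
    if row_id.startswith("000"):
        zero_weight[key] += weight

result = []
for row_id, weight, kind in rows:
    key = suffix(row_id)
    # Rows starting with 000 keep their Weight; the others lose the 000 total.
    actual_weight = weight if row_id.startswith("000") else weight - zero_weight[key]
    result.append((row_id, actual_id[key], weight, actual_weight, kind))

for row in result:
    print(row)
```

Taking the maximum Id per suffix works here because an Id starting with 000 sorts before any other Id with the same suffix.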

Assign new value to every unique number in SQL Server

I am new to SQL Server and am trying to do some operations.
Sample data:
Amount | BillID
-------+-------
500 | 10009
500 | 1492
350 | 15892
222 | 15596
899 | 20566
350 | 9566
How can I create a new column that holds a serial number according to the Amount column so the output looks like:
Amount | BillID | unique
-------+--------+-------
500 | 10009 | 1
500 | 1492 | 1
350 | 15892 | 2
222 | 15596 | 3
899 | 20566 | 4
350 | 9566 | 2
I would recommend dense_rank():
select t.*, dense_rank() over(order by amount) rn
from mytable t
This assigns an incremental number to each distinct amount. The smallest amount gets rank 1, and the numbers increase with the amount. This is not exactly the output you showed (where there is no apparent logic to the order of the ranks), but I think that's essentially the logic you want.
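
As a rough illustration of the difference, the Python sketch below computes both numberings for the sample data: a dense rank by amount, and a numbering by first appearance, which happens to reproduce the output in the question. The second variant depends on the row order shown in the question, and a SQL table has no guaranteed row order without an ordering column, so treat it as an assumption.

```python
rows = [(500, 10009), (500, 1492), (350, 15892), (222, 15596), (899, 20566), (350, 9566)]

# dense_rank() over (order by amount): the rank is the position of the amount
# among the sorted distinct amounts.
distinct_amounts = sorted({amount for amount, _ in rows})
rank_by_amount = {amount: i for i, amount in enumerate(distinct_amounts, start=1)}
dense = [(amount, bill_id, rank_by_amount[amount]) for amount, bill_id in rows]

# Variant: number each amount by the order in which it first appears.
first_seen = {}
for amount, _ in rows:
    first_seen.setdefault(amount, len(first_seen) + 1)
by_appearance = [(amount, bill_id, first_seen[amount]) for amount, bill_id in rows]

print(dense)
print(by_appearance)
```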

Select statement that displays a count of every unique variable

I have a table that looks like this:
ID | Value | Date
1 | 3000 | 25/06
1 | 3000 | 26/06
1 | 2000 | 12/07
2 | 4000 | 23/12
2 | 4000 | 12/12
3 | 2000 | 01/11
3 | 2000 | 23/04
3 | 4000 | 23/05
3 | 4000 | 04/11
Now I want to display unique values for a specific ID and how many times each specific value appears in the table for a specific ID.
The desired output for
select ### where ID = 1 from tablename; would be:
distinct Value | count
3000 | 2
2000 | 1
and for:
select ### where ID = 3 from tablename;
distinct Value | count
2000 | 2
4000 | 2
Can this be done with a single select statement (for each ID)?
Maybe something like this:
select ID
, Value
, Count(*) AS CountOfValues
from tablename
group by ID, Value
This groups by both ID and Value and counts how many times each value appears within each ID.
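
If you only want one ID at a time, as in the question, a WHERE filter can be added to the grouped query. Below is a small sketch using Python's built-in sqlite3 module with an in-memory copy of the sample table; the function name `value_counts` and the ORDER BY are additions for the example, and other databases may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID INTEGER, Value INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, 3000, "25/06"), (1, 3000, "26/06"), (1, 2000, "12/07"),
        (2, 4000, "23/12"), (2, 4000, "12/12"),
        (3, 2000, "01/11"), (3, 2000, "23/04"), (3, 4000, "23/05"), (3, 4000, "04/11"),
    ],
)

def value_counts(conn, wanted_id):
    # Count how often each Value occurs for a single ID.
    query = """
        SELECT Value, COUNT(*) AS CountOfValues
        FROM tablename
        WHERE ID = ?
        GROUP BY Value
        ORDER BY Value DESC
    """
    return conn.execute(query, (wanted_id,)).fetchall()

print(value_counts(conn, 1))
print(value_counts(conn, 3))
```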

SQL - Want to provide unique rank for same set of values?

I need help with determining the last changed price by dates for which I am trying to generate a Unique-Identifier column, so I can apply partition to this new column and derive additional logic in my programming.
Can you please help me to derive the Unique-Identifier column?
Date | OrderID | Price | Seq_no | Unique-Identifier
1/24/2015 | 568956 | 300 | 1 | 1
1/20/2015 | 568956 | 350 | 1 | 2
1/20/2015 | 568956 | 375 | 2 | 3
1/20/2015 | 568956 | 400 | 3 | 4
1/17/2015 | 568956 | 400 | 1 | 4
1/14/2015 | 568956 | 500 | 1 | 5
1/11/2015 | 568956 | 500 | 1 | 5
1/9/2015 | 568956 | 400 | 1 | 6
1/7/2015 | 568956 | 400 | 1 | 6
1/24/2015 | 568957 | 600 | 1 | 7
1/20/2015 | 568957 | 600 | 1 | 7
1/17/2015 | 568957 | 700 | 1 | 8
1/14/2015 | 568957 | 800 | 1 | 9
1/11/2015 | 568957 | 800 | 1 | 9
1/9/2015 | 568957 | 700 | 1 | 10
1/7/2015 | 568957 | 700 | 1 | 10
I can't simply partition on the Price column. Reason: for OrderID 568956 the same price, 400, was set in two separate periods. I want to keep these two sets apart. If I partition on Price alone, all four rows end up in one set. So I need an identifier that tells these rows apart, and then I can partition on the new Unique-Identifier column.
Set 1:
1/20/2015 568956 400 4
1/17/2015 568956 400 4
Set 2:
1/9/2015 568956 400 6
1/7/2015 568956 400 6
If I partition on Price, I get a single set, which is not what I want:
Set 1:
1/20/2015 568956 400 4
1/17/2015 568956 400 4
1/9/2015 568956 400 4
1/7/2015 568956 400 4
In your select statement do something like this:
SELECT
DISTINCT
ROW_NUMBER() OVER(PARTITION BY Date,OrderID,Price ORDER BY Date DESC) AS RowNum
,Date
,OrderID
,Price
You may have to adjust the PARTITION BY clause depending on how your select statement works, but when I used this it returned a unique row number for each value.
I'm not sure whether ORDER BY will sort that Date value correctly, so you may have to convert it to DATETIME.
You need to identify the groups, and then assign the sequential number. One method is a difference of row numbers. I think this is the logic:
select t.*,
       dense_rank() over (partition by orderid order by grp, price) as newcol
from (select t.*,
             (row_number() over (partition by orderid order by date, seq_no) -
              row_number() over (partition by orderid, price order by date, seq_no)
             ) as grp
      from t
     ) t
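
The underlying idea is to give every run of consecutive rows with the same OrderID and Price its own number. A minimal Python sketch of that idea follows. It is not a translation of the query: it uses `itertools.groupby` on consecutive rows instead of a difference of row numbers, and it assumes the rows are already in the order shown in the question.

```python
from itertools import groupby

# (Date, OrderID, Price, Seq_no), in the order displayed in the question.
rows = [
    ("1/24/2015", 568956, 300, 1), ("1/20/2015", 568956, 350, 1),
    ("1/20/2015", 568956, 375, 2), ("1/20/2015", 568956, 400, 3),
    ("1/17/2015", 568956, 400, 1), ("1/14/2015", 568956, 500, 1),
    ("1/11/2015", 568956, 500, 1), ("1/9/2015", 568956, 400, 1),
    ("1/7/2015", 568956, 400, 1), ("1/24/2015", 568957, 600, 1),
    ("1/20/2015", 568957, 600, 1), ("1/17/2015", 568957, 700, 1),
    ("1/14/2015", 568957, 800, 1), ("1/11/2015", 568957, 800, 1),
    ("1/9/2015", 568957, 700, 1), ("1/7/2015", 568957, 700, 1),
]

result = []
# groupby only merges adjacent rows, so a price that comes back later
# (400 for order 568956) starts a new run with a new identifier.
runs = groupby(rows, key=lambda r: (r[1], r[2]))
for identifier, (_, run) in enumerate(runs, start=1):
    for row in run:
        result.append(row + (identifier,))

for row in result:
    print(row)
```

With the sample data this gives the two sets of price 400 for order 568956 the identifiers 4 and 6, so they can be partitioned separately.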

SQL Group By Having Where Statements

I have an MS Access table tracking product quantities at month end, as shown below.
I need to get the latest quantity for a specified ProductId as of a specified date, e.g.
the Quantity for ProductId 1 on 15-Feb-12 is 100, and the Quantity for ProductId 1 on 15-Mar-12 is 150.
ProductId | ReportingDate | Quantity|
1 | 31-Jan-12 | 100 |
2 | 31-Jan-12 | 200 |
1 | 28-Feb-12 | 150 |
2 | 28-Feb-12 | 250 |
1 | 31-Mar-12 | 180 |
2 | 31-Mar-12 | 280 |
My SQL statement below returns all previous values instead of only the latest one. Could anyone help me troubleshoot the query?
SELECT Sheet1.ProductId, Max(Sheet1.ReportingDate) AS MaxOfReportingDate, Sheet1.Quantity
FROM Sheet1
GROUP BY Sheet1.ProductId, Sheet1.Quantity, Sheet1.ReportingDate, Sheet1.ProductId
HAVING (((Sheet1.ReportingDate)<#3/15/2012#) AND ((Sheet1.ProductId)=1))
Here's @naveen's idea:
SELECT TOP 1 Sheet1.ProductId, Sheet1.ReportingDate AS MaxOfReportingDate, Sheet1.Quantity
FROM Sheet1
WHERE (Sheet1.ProductId = 1)
AND (Sheet1.ReportingDate < #2012/03/15#)
ORDER BY Sheet1.ReportingDate DESC
Note, however, that MS Access returns ties with TOP, so this won't return a single row if you have more than one row per ReportingDate, ProductId combination. (But in that case the data is ambiguous anyway.)
Edit - I meant that if you have a contradiction in your data like the one below, you'll get 2 rows back.
ProductId | ReportingDate | Quantity|
1 | 31-Jan-12 | 100
1 | 31-Jan-12 | 200
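
To check the expected results from the question, here is a small Python sketch of the same "latest row before a given date" logic. It is an illustration, not Access code; the function name `quantity_as_of` is made up, and the strict `<` comparison mirrors the WHERE clause in the query above.

```python
from datetime import date

# (ProductId, ReportingDate, Quantity) from the sample table.
rows = [
    (1, date(2012, 1, 31), 100),
    (2, date(2012, 1, 31), 200),
    (1, date(2012, 2, 28), 150),
    (2, date(2012, 2, 28), 250),
    (1, date(2012, 3, 31), 180),
    (2, date(2012, 3, 31), 280),
]

def quantity_as_of(rows, product_id, as_of):
    # Keep only this product's rows reported before the given date.
    candidates = [r for r in rows if r[0] == product_id and r[1] < as_of]
    if not candidates:
        return None
    # The row with the latest ReportingDate holds the current quantity.
    return max(candidates, key=lambda r: r[1])[2]

print(quantity_as_of(rows, 1, date(2012, 2, 15)))
print(quantity_as_of(rows, 1, date(2012, 3, 15)))
```

Unlike TOP 1 in Access, `max` returns a single row even when two rows share the latest date, so contradictory data would be hidden here rather than returned twice.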