Get specific values from columns to create new rows - Pentaho

I have a problem using Kettle/PDI. I need to take specific values out of certain columns and create new rows containing just those values, one new row per column.
I tried to use the Row Normaliser step, but it didn't work.
Can someone please help me?
Thanks in advance.
e.g.
I have data like this:
n1 | row 1 | value 1.1 | ... | null | null
n1 | row 2 | value 2.1 | ... | null | null
n1 | row 3 | value 3.1 | ... | null | null
n1 | row 4 | value 4.1 | ... | 1200,00 | 1500,00
n2 | row 1 | value 1.1 | ... | null | null
n2 | row 2 | value 2.1 | ... | null | null
n2 | row 3 | value 3.1 | ... | null | null
n2 | row 4 | value 4.1 | ... | 1120,00 | 1350,00
I would like this:
n1 | row 1 | value 1.1 | ... | null | null
n1 | row 2 | value 2.1 | ... | null | null
n1 | row 3 | value 3.1 | ... | null | null
n1 | row 4 | value 4.1 | ... | 1200,00 | 1500,00
n1 | row 5 | 1200,00 | ... | null | null
n1 | row 6 | 1500,00 | ... | null | null
n2 | row 1 | value 1.1 | ... | null | null
n2 | row 2 | value 2.1 | ... | null | null
n2 | row 3 | value 3.1 | ... | null | null
n2 | row 4 | value 4.1 | ... | 1120,00 | 1350,00
n2 | row 5 | 1120,00 | ... | null | null
n2 | row 6 | 1350,00 | ... | null | null

If you want to clone the rows which have a value in a field, this is a way to do that part of the task:
Check whether the value exists and, if so, clone the row. On the cloned row, move the value into the main value field and set the fields you want cleared to null. Then remove the clone-flag field and merge the streams again.
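If the same data happened to live in a database table instead of a PDI stream, a rough SQL sketch of the same reshaping could look like the one below (all table and column names, t, n, row_label, val, amount1, amount2, are made up for illustration, and the new rows would still need row numbers assigned afterwards):
-- keep every original row, then add one row per filled amount column
-- (depending on types, amount1/amount2 may need casting to match val)
select n, row_label, val, amount1, amount2 from t
union all
select n, null, amount1, null, null from t where amount1 is not null
union all
select n, null, amount2, null, null from t where amount2 is not null
order by n;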

Related

Using LAG function with higher offset

Suppose we have the following input table
cat | value | position
------------------------
1 | A | 1
1 | B | 2
1 | C | 3
1 | D | 4
2 | C | 1
2 | B | 2
2 | A | 3
2 | D | 4
As you can see, the values A, B, C, D change position in each category. I want to track this change by adding a change column next to each value; the output should look like this:
cat | value | position | change
---------------------------------
1 | A | 1 | NULL
1 | B | 2 | NULL
1 | C | 3 | NULL
1 | D | 4 | NULL
2 | C | 1 | 2
2 | B | 2 | 0
2 | A | 3 | -2
2 | D | 4 | 0
For example, C was in position 3 in category 1 and moved to position 1 in category 2, so it has a change of 2. I tried implementing this using the LAG() function with an offset of 4, but I failed. How can I write this query?
Use lag() - with the proper partition by clause:
select
    t.*,
    lag(position) over (partition by value order by cat) - position as change
from mytable t
You can use lag and then order by to keep the original order. Here is the demo.
select
    *,
    lag(position) over (partition by value order by cat) - position as change
from yourTable
order by
    cat, position
output:
| cat | value | position | change |
| --- | ----- | -------- | ------ |
| 1 | A | 1 | null |
| 1 | B | 2 | null |
| 1 | C | 3 | null |
| 1 | D | 4 | null |
| 2 | C | 1 | 2 |
| 2 | B | 2 | 0 |
| 2 | A | 3 | -2 |
| 2 | D | 4 | 0 |
I think you just want lag() with the right partition by:
select t.*,
(lag(position) over (partition by value order by cat) - position) as change
from t;
Here is a db<>fiddle.
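For anyone who wants to try these queries locally, a minimal setup might look like this (column types are guesses, the multi-row INSERT syntax works in MySQL/PostgreSQL, and position may need quoting in databases where it is a reserved word):
create table mytable (cat int, value varchar(10), position int);

insert into mytable (cat, value, position) values
    (1, 'A', 1), (1, 'B', 2), (1, 'C', 3), (1, 'D', 4),
    (2, 'C', 1), (2, 'B', 2), (2, 'A', 3), (2, 'D', 4);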

Any way to achieve coalesce row wise?

I have a table
| ID | V1 | V2 |
| 100 | 1 | 1 |
| 100 | null | 1 |
| 101 | null | null |
| 101 | 1 | 1 |
| 102 | 1 | null |
| 102 | 1 | null |
Needed Sample output:
ID 100 has a V1 value in at least one of its rows, so it needs 1.
The same goes for ID 101: it has a V1 value in at least one of its rows, so it needs 1.
ID 102 has no V2 value in either row, so it needs null.
Required output
| ID | V1 | V2 |
| 100 | 1 | 1 |
| 101 | 1 | 1 |
| 102 | 1 | null |
I tried to combine the values into a list and get the max value.
Is there an easier function that can achieve this?
select ID, max(V1) as V1, max(V2) as V2 from table group by ID;
You can use aggregation:
select id, max(v1) as v1, max(v2) as v2
from table t
group by id;
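A quick way to sanity-check this with the sample data (PostgreSQL-style VALUES list, adjust the syntax for your database; the point is that MAX simply ignores NULLs):
with t (id, v1, v2) as (
    values (100, 1, 1), (100, null, 1),
           (101, null, null), (101, 1, 1),
           (102, 1, null), (102, 1, null)
)
select id, max(v1) as v1, max(v2) as v2
from t
group by id
order by id;
-- returns (100, 1, 1), (101, 1, 1), (102, 1, null)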

Oracle SQL left join same table an unknown number of times

I have this table
| old | new |
|------|-------|
| a | b |
| b | c |
| d | e |
| ... | ... |
| aa | bb |
| bb | ff |
| ... | ... |
| 11 | 33 |
| 33 | 523 |
| 523 | 4444 |
| 4444 | 21444 |
The result I want to achieve is
| old | newest |
|------|--------|
| a | e |
| b | e |
| d | e |
| ... | |
| aa | ff |
| bb | ff |
| ... | |
| 11 | 21444 |
| 33 | 21444 |
| 523 | 21444 |
| 4444 | 21444 |
I can hard code the query to get the result that I want.
SELECT
    older.old,
    older.new,
    newer.new firstcol,
    newer1.new secondcol,
    …
    newerX-1.new secondlastcol,
    newerX.new lastcol
FROM Table older
LEFT JOIN Table newer
    ON older.new = newer.old
LEFT JOIN Table newer1
    ON newer.new = newer1.old
…
LEFT JOIN Table newerX-1
    ON newerX-2.new = newerX-1.old
LEFT JOIN Table newerX
    ON newerX-1.new = newerX.old;
and then just take the first value from the right that is not null.
Illustrated here:
| old | new | firstcol | secondcol | thirdcol | fourthcol | | lastcol |
|------|-------|----------|-----------|----------|-----------|-----|---------|
| a | b | c | e | null | null | ... | null |
| b | c | e | null | null | null | ... | null |
| d | e | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| aa | bb | ff | null | null | null | ... | null |
| bb | ff | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| 11 | 33 | 523 | 4444 | 21444 | null | ... | null |
| 33 | 523 | 4444 | 21444 | null | null | ... | null |
| 523 | 4444 | 21444 | null | null | null | ... | null |
| 4444 | 21444 | null | null | null | null | ... | null |
The problem is that the length of "the replacement chain" keeps changing (it can vary from 10 to 100).
There must be a better way to do this?
What you are looking for is a recursive query. Something like this:
with cte (old, new, lev) as
(
    select old, new, 1 as lev from mytable
    union all
    select m.old, cte.new, cte.lev + 1
    from mytable m
    join cte on cte.old = m.new
)
select old, max(new) keep (dense_rank last order by lev) as new
from cte
group by old
order by old;
The recursive CTE creates all iterations (you can see this by replacing the final query with select * from cte). In the final query we then get the last new per old with Oracle's KEEP LAST.
Rextester demo: http://rextester.com/CHTG34988
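To make the intermediate result concrete, running just the CTE as suggested above would produce the following for the numeric chain in the question (shown here only for that chain):
with cte (old, new, lev) as
(
    select old, new, 1 as lev from mytable
    union all
    select m.old, cte.new, cte.lev + 1
    from mytable m
    join cte on cte.old = m.new
)
select * from cte order by old, lev;
-- for the rows 11 -> 33 -> 523 -> 4444 -> 21444 this returns
-- (11, 33, 1), (11, 523, 2), (11, 4444, 3), (11, 21444, 4),
-- (33, 523, 1), (33, 4444, 2), (33, 21444, 3),
-- (523, 4444, 1), (523, 21444, 2),
-- (4444, 21444, 1)
-- and KEEP (DENSE_RANK LAST ORDER BY lev) then keeps the highest lev per old.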
I'm trying to understand how you group your rows to determine different "newest" values. Are these the groupings you want based on the old field?
Group 1 - one letter (a, b, d)
Group 2 - two letters (aa, bb)
Group 3 - any number (11, 33, 523, 4444)
Is this correct? If so, you just need to group them by an expression and then use a window function MAX(). Something like this:
SELECT
    "old",
    MAX("new") OVER (PARTITION BY MyGrouping) AS newest
FROM (
    SELECT
        "old",
        "new",
        CASE
            WHEN NOT IS_NUMERIC("old") THEN 'string' || CHAR_LENGTH("old") -- If string, group by string length
            ELSE 'number' -- Otherwise, group as a number
        END AS MyGrouping
    FROM MyTable
) src
I don't know if Oracle has equivalents of the IS_NUMERIC and CHAR_LENGTH functions, so you need to check on that. If not, replace that expression with something similar, like this:
https://www.techonthenet.com/oracle/questions/isnumeric.php
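For what it's worth, Oracle does have built-ins that could play those roles: REGEXP_LIKE for the numeric check and LENGTH for the string length. A sketch of the same grouping idea in Oracle syntax:
SELECT
    "old",
    MAX("new") OVER (PARTITION BY MyGrouping) AS newest
FROM (
    SELECT
        "old",
        "new",
        CASE
            WHEN NOT REGEXP_LIKE("old", '^[0-9]+$') THEN 'string' || LENGTH("old") -- group strings by length
            ELSE 'number' -- group all numeric values together
        END AS MyGrouping
    FROM MyTable
) src;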

Liquibase script to delete table element based on other table element

I am pretty unfamiliar with Liquibase scripts.
I have two tables, tableA and tableB.
tableB contains elements that have a tableA_fk value, meaning they point to an element of tableA.
tableA contains elements that always come in groups of two. One of the elements points to the pk of the other element (relatedpk).
I want to delete all the elements of tableA that have the field "someValue" equal to NULL and no element of tableB pointing to them.
The elements can only be removed in their groups of two.
Example:
tableA:
+----+---------------------+-----------+-----------+
| pk | name | someValue | relatedpk |
+----+---------------------+-----------+-----------+
| 1 | ElementA | 1 | NULL |
| 2 | ElementA | 1 | 1 |
| 3 | ElementB | NULL | NULL |
| 4 | ElementB | NULL | 3 |
| 5 | ElementC | 3 | NULL |
| 6 | ElementC | 3 | 5 |
| 7 | ElementD | NULL | NULL |
| 8 | ElementD | NULL | 7 |
| 9 | ElementE | NULL | NULL |
| 10 | ElementE | NULL | 9 |
+----+---------------------+-----------+-----------+
tableB:
+----+------------------------------+-----------+
| pk | name | tableA_fk |
+----+------------------------------+-----------+
| 1 | Value1 | 2 |
| 2 | Value2 | 3 |
| 3 | Value3 | 9 |
+----+------------------------------+-----------+
In this example I want to remove ElementD with pk=7,8 from tableA.
Reason:
ElementA cannot be removed because
someValue != null
ElementB cannot be removed because
tableA_fk = 3 for element Value2 in tableB
ElementC cannot be removed because
someValue != null
ElementD can be removed because
someValue = NULL
No element of tableB points to either of these two elements of tableA.
ElementE cannot be removed because
tableA_fk = 9 for element Value3 in tableB
Is it possible to implement something like that in a Liquibase script?
Something like this:
<changeSet id="remove-elements">
    <delete tableName="tableA">
        <where>ConditionToRemoveTheCorrectElements</where>
    </delete>
</changeSet>
Rather than trying to use the <delete tableName> tag, I would suggest that you just write the SQL required and use a <sql> or <sqlFile> change tag.
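As a sketch of what that could look like (the DELETE below assumes the schema exactly as described, treats rows linked via relatedpk in either direction as a pair, and its self-referencing subqueries may need adjusting depending on your database):
<changeSet id="remove-unreferenced-null-elements">
    <sql>
        DELETE FROM tableA a
        WHERE a.someValue IS NULL
          -- nothing in tableB points at this row
          AND NOT EXISTS (SELECT 1 FROM tableB b WHERE b.tableA_fk = a.pk)
          -- and the partner row must not be kept for either of those reasons
          AND NOT EXISTS (
                SELECT 1
                FROM tableA p
                WHERE (p.pk = a.relatedpk OR p.relatedpk = a.pk)
                  AND (p.someValue IS NOT NULL
                       OR EXISTS (SELECT 1 FROM tableB b2 WHERE b2.tableA_fk = p.pk))
          )
    </sql>
</changeSet>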

Marking records with 1 on first occurrence of unique value

I have a table that I'd like to add a column to that shows a 1 on the first occurrence of a given value for the record within the dataset.
So, for example, if I was using the ID field as where to look for unique occurrences, I'd want a "FirstOccur" column (like the one below) putting a 1 on the first occurrence of a unique ID value in the dataset and just ignoring (leaving as null) any other occurrence:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | |
| 1 | |
| 2 | 1 |
| 2 | |
| 3 | 1 |
| 4 | 1 |
| 4 | |
I have a working 2-step approach that first applies some ranking SQL that will give me something like this:
| ID | FirstOccur |
|------|--------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| 4 | 2 |
...and I just apply some update SQL to null out any value above 1 to get the desired result.
I was just wondering if there was a (simpler) one-hit approach.
Assuming you have a creation date, an auto-incremented id, or something else that specifies the ordering, you can do:
update t
set firstoccur = 1
where creationdate = (select min(creationdate)
                      from t as t2
                      where t2.id = t.id);
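If the flag is only needed in query output rather than stored in the table, a one-pass alternative (a sketch using a window function, assuming your database supports ROW_NUMBER and using the same hypothetical creationdate ordering) would be:
select id,
       case when row_number() over (partition by id order by creationdate) = 1
            then 1
       end as firstoccur
from t;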