Pentaho Data Integration: New Row for Each Delimited Value in a Column


I have a stream with a column that holds a variable number of semicolon-delimited values, laid out like this:
activities  ticket_id
1;2;3       1
4;5         2
6;          3
7;8;9;10    4
How can I get a new row for every activity, like this:
activity_id  ticket_id
1            1
2            1
3            1
4            2
and so on?

Simply use the "Split field to rows" step, with the activities column as the field to split and ";" as the delimiter.
Remember that when a value holds a single item followed by the delimiter (such as "6;"), the step outputs an additional row containing an empty string. The step automatically repeats the ticket_id value on each of the new rows.
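As an illustration only (this is not how PDI implements the step), the row-splitting behaviour can be mimicked in a few lines of Python. The function name split_field_to_rows and the dict-based rows are hypothetical; the sample data comes from the question. Python's str.split keeps the empty trailing piece of "6;", which mirrors the extra empty row described in the answer.

```python
def split_field_to_rows(rows, field, delimiter=";"):
    """Return one output row per delimited value in `field`.

    Hypothetical helper: every other field of the input row is copied
    unchanged, the way ticket_id is repeated on each new row.
    """
    result = []
    for row in rows:
        for value in row[field].split(delimiter):
            new_row = dict(row)      # copy ticket_id and any other fields
            new_row[field] = value   # replace the list with a single value
            result.append(new_row)
    return result


tickets = [
    {"activities": "1;2;3", "ticket_id": 1},
    {"activities": "4;5", "ticket_id": 2},
    {"activities": "6;", "ticket_id": 3},
    {"activities": "7;8;9;10", "ticket_id": 4},
]

rows = split_field_to_rows(tickets, "activities")
# "6;" produced an extra row with an empty string; drop it if unwanted.
non_empty = [r for r in rows if r["activities"] != ""]

for r in non_empty:
    print(r["activities"], r["ticket_id"])
```

In PDI itself, the empty rows could likewise be removed with a Filter rows step placed after the split.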


Related

BigQuery: UNNESTING string representation of list of JSONs

I have a STRING column that holds a list ([..., ...]) of JSON objects, which I would like to UNNEST into separate rows.
For example:
ROW  TICKET_ID  SUBJECT       UPDATES (STRING)
1    1          Need help...  [{"Actor":"Tom","Type":"Request"}, {"Actor":"John","Type":"Update"}]
2    2          Something...  [{"Actor":"Kate","Type":"Request"}, {"Actor":"Tim","Type":"Update"}]
I would like it to look like:
ROW  TICKET_ID  SUBJECT       UPDATE
1    1          Need help...  {"Actor":"Tom","Type":"Request"}
2    1          Need help...  {"Actor":"John","Type":"Update"}
3    2          Something...  {"Actor":"Kate","Type":"Request"}
4    2          Something...  {"Actor":"Tim","Type":"Update"}
So far I have tried JSON_EXTRACT_ARRAY() and CROSS JOIN UNNEST(), but I cannot split the updates into separate rows: they end up as separate elements of an array inside the same row.

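Use the query below:
select * except(updates)
from your_table,
unnest(json_extract_array(updates)) as update
Applied to the sample data in your question, this returns one row per element of the JSON array, with the other columns (such as TICKET_ID and SUBJECT) repeated on each row.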
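To make the row multiplication concrete, here is a rough Python equivalent of the query, assuming the table is represented as a list of dicts (the function name unnest_updates is hypothetical). Like JSON_EXTRACT_ARRAY, it turns each array element back into a JSON-formatted string, and like SELECT * EXCEPT(updates), it copies every other column onto each output row.

```python
import json


def unnest_updates(rows, column="updates", alias="update"):
    """Emit one row per element of the JSON array stored as a string in `column`."""
    result = []
    for row in rows:
        for element in json.loads(row[column]):
            # keep every column except the array, like SELECT * EXCEPT(updates)
            new_row = {k: v for k, v in row.items() if k != column}
            # re-serialize compactly so each element is again a JSON string
            new_row[alias] = json.dumps(element, separators=(",", ":"))
            result.append(new_row)
    return result


table = [
    {"ticket_id": 1, "subject": "Need help...",
     "updates": '[{"Actor":"Tom","Type":"Request"}, {"Actor":"John","Type":"Update"}]'},
    {"ticket_id": 2, "subject": "Something...",
     "updates": '[{"Actor":"Kate","Type":"Request"}, {"Actor":"Tim","Type":"Update"}]'},
]

for row in unnest_updates(table):
    print(row)
```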

Choosing which rows to sum and average in either SSRS or SQL

ROW  column 1  column 2
1    A         1
2    A         1
3    A         3
4    A         1
5    A         2
6    B         1
7    B         3
8    B         1
Let's say I have the table shown above. I want to be able to average SELECTED values from column 2. Is there a function in SSRS that lets me choose which values to average? The end goal is to let the user interactively choose which values are averaged.
For example, if I want to compute (Row 1 + Row 2 + Row 4)/3, or (Row 6 + Row 8)/2, how can I let the end user choose those values?
Is there something that I need to do in SQL first to make it easier in SSRS?

The idea is to use a report parameter and a dataset filter.
Add a parameter in SSRS that allows multiple values, and set its available values to the ROW values (1, 2, and so on).
For reference, here is how to add a parameter in SSRS:
https://learn.microsoft.com/en-us/sql/reporting-services/report-design/add-change-or-delete-a-report-parameter-report-builder-and-ssrs?view=sql-server-ver15#:~:text=To%20add%20or%20edit%20a,or%20accept%20the%20default%20name.
After you add the parameter, suppose you already have a dataset based on a SQL query such as:
SELECT *
FROM the_table
Right-click your dataset and open its properties. On the Filters tab, add a filter on the ROW column with the In operator and the parameter you created earlier as the value.
After adding the filter, use that dataset in your report with an expression such as =Avg(Fields!column_2.Value), adjusting the field name to match the one in your dataset.
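The filter-then-average logic is small enough to sketch outside SSRS. The Python function below is a hypothetical stand-in, not SSRS code: `selected` plays the role of the multi-value parameter, the membership test plays the role of the In filter, and the mean plays the role of Avg. The table is the sample from the question.

```python
def average_selected(rows, selected):
    """Average column 2 over the rows whose ROW value is in `selected`.

    Returns None when nothing is selected, to avoid dividing by zero.
    """
    values = [col2 for row_id, _col1, col2 in rows if row_id in selected]
    if not values:
        return None
    return sum(values) / len(values)


table = [
    (1, "A", 1), (2, "A", 1), (3, "A", 3), (4, "A", 1),
    (5, "A", 2), (6, "B", 1), (7, "B", 3), (8, "B", 1),
]

print(average_selected(table, {1, 2, 4}))  # (1 + 1 + 1) / 3
print(average_selected(table, {6, 8}))     # (1 + 1) / 2
```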

sql: select rows with multiple lines of data

In table A, column X contains data with numbers, letters and special characters. Most records have a single line of data, but some have 2 or 3 lines.
Record with one line of data:
1 this is a sample description of data 01/11/2017 # 123'~
Record with two lines of data:
1 this is a sample description
2 of data 22/11/2017 #~ 12##'
I need a SELECT query that returns the records with 2 lines of data in column X of table A.
I use TOAD, and the sample data above is from its Grid pop-up editor.
Thanks

You could select the rows that contain a line break. I don't know your exact data, so the line break may be either chr(10) (line feed) or chr(13) (carriage return):
select *
from tableA
where instr(columnX, chr(10)) > 0;
The solution is taken from this Stack Overflow answer; please don't forget to upvote the linked solution if it helped you.
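The instr() query matches any multi-line value, so records with three lines are returned as well. The Python sketch below contrasts "contains a line break" with "has exactly two lines". The records are hypothetical test data modelled on the question's samples (record 3 is invented to show a three-line value), and the sketch only illustrates the logic, not Oracle syntax.

```python
def line_count(text):
    # splitlines() treats "\n" (chr(10)), "\r" (chr(13)) and "\r\n" as line
    # boundaries, along with a few other Unicode separators
    return len(text.splitlines())


records = {
    1: "this is a sample description of data 01/11/2017 # 123'~",
    2: "this is a sample description\r\nof data 22/11/2017 #~ 12##'",
    3: "line one\nline two\nline three",  # hypothetical three-line record
}

# Same idea as instr(columnX, chr(10)) > 0 OR instr(columnX, chr(13)) > 0
multi_line = [key for key, text in records.items() if "\n" in text or "\r" in text]

# Only the records with exactly two lines
two_lines = [key for key, text in records.items() if line_count(text) == 2]

print(multi_line, two_lines)
```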

Subtract each row's values from the next row's values?

I have a dataset in which, somehow, each row's values have been added on top of the previous row's values, in every column. That means the row with ID 1 holds the original data, but the row with, e.g., ID 10 also contains the data of the previous 9 rows.
What I want now is to recover the original data for every ID. How can I subtract all of the earlier data from, say, ID 10? I would have to subtract the values of the previous row, ID 9, and so on.
I would like to do this in either SQL Server or RapidMiner, since those are the tools I work with. Any ideas?
Here is a sample:
ID col1 col2 col3
1 12 2 3
2 15 5 5
3 20 8 8
So the correct data for the item with ID 3 is not 20, 8, 8 but (20-15), (8-5), (8-5), i.e. 5, 3, 3.
In other words, subtract the previous row from each row, for every item except the first, which stays as it is:
1 12 2 3

Try the Lag Series operator; it should do the job. To get it, install the Series extension from the RapidMiner Marketplace.
The operator copies the selected attributes and shifts every row of the example set down by one position, so the row with ID 2 receives a copy of the values from the row with ID 1, and so on (you can also specify a different lag value). Afterwards you can subtract one value from the other with the Generate Attributes operator.

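I think lag() is the answer to your question. Assuming your table is called your_table (a placeholder) and has the columns from your sample:
select id,
       -- lag(..., 1, 0) returns 0 for the first row, so ID 1 keeps its original values
       col1 - lag(col1, 1, 0) over (order by id) as col1,
       col2 - lag(col2, 1, 0) over (order by id) as col2,
       col3 - lag(col3, 1, 0) over (order by id) as col3
from your_table;
However, sample data would clarify the question.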
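The same "subtract the previous row" idea can be written as a short Python sketch on the sample from the question. Rows are assumed to be (id, col1, col2, col3) tuples, and the function name decumulate is hypothetical; as with lag(col, 1, 0), the first row has nothing before it and is kept unchanged.

```python
def decumulate(rows):
    """Undo a running total by subtracting each row's predecessor (ordered by id)."""
    result = []
    previous = None
    for row_id, *values in sorted(rows):  # sort by id, the first tuple element
        if previous is None:
            result.append((row_id, *values))  # first row: original data
        else:
            diffs = (current - before for current, before in zip(values, previous))
            result.append((row_id, *diffs))
        previous = values  # the next row is compared with this row's running totals
    return result


sample = [(1, 12, 2, 3), (2, 15, 5, 5), (3, 20, 8, 8)]
for row in decumulate(sample):
    print(row)
```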

Within RapidMiner there is the Differentiate operator contained in the Series extension (which is not installed by default and needs to be downloaded from the RapidMiner Marketplace). This can be used to calculate differences between attributes in adjacent examples.

SQL Select where id is in `column`

I have a column that contains multiple numbers separated by commas. Example of one row:
`numbers`:
1,2,6,66,4,9
I want a query that selects the row only if the number 6 (for example) is in the numbers column.
I can't use LIKE because it would also match 66.

You can use LIKE. Add the separator to the beginning and end of the list, then use LIKE to look for the value surrounded by separators. Here is the SQL Server syntax:
where ','+numbers+',' like '%,'+'6'+',%'
SQL Server uses + for string concatenation. Other databases use || or the concat() function.
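The padding trick is language-independent. Here is a Python sketch of the same check, assuming the list has no spaces around the commas (a value stored as "1, 6" would not match); the function name contains_value is hypothetical.

```python
def contains_value(delimited, value, sep=","):
    """Return True if `value` is one of the items in the `sep`-separated string."""
    # Surround both sides with the separator so "6" only matches a whole item,
    # never the inside of "66" (the same idea as ','+numbers+',' LIKE '%,6,%')
    return f"{sep}{value}{sep}" in f"{sep}{delimited}{sep}"


print(contains_value("1,2,6,66,4,9", "6"))
print(contains_value("1,2,66,4,9", "6"))
```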

You should change your database design and add a new table that links each number to the row in your current table. So if your row looks like this:
id numbers
1 1,2,6,66,4,9
You would have a new table that joins those values like so
row_id number
1 1
1 2
1 6
1 66
1 4
1 9
Then you can search for the number 6 in the number column and get the row_id.
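A minimal sketch of the normalized design, using Python's built-in sqlite3 module with an in-memory database. The table name item_numbers is hypothetical, and row 2 is invented to show that 66 no longer matches a search for 6 once each number is stored in its own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (row_id, number) pair instead of a comma-separated string
conn.execute("CREATE TABLE item_numbers (row_id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO item_numbers (row_id, number) VALUES (?, ?)",
    [(1, n) for n in (1, 2, 6, 66, 4, 9)]   # the example row from the question
    + [(2, n) for n in (3, 66)],            # hypothetical row without a 6
)

# Exact equality on an integer column: 66 can never match 6
matches = [
    row_id
    for (row_id,) in conn.execute(
        "SELECT DISTINCT row_id FROM item_numbers WHERE number = ? ORDER BY row_id",
        (6,),
    )
]
print(matches)
conn.close()
```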
