Pentaho Data Integration Generating Dynamic Row

I have a case where I am reading data in through a table input step. The values that are read in would be:
agent, sub agent, merchant, total
1, 2, 2222, 10
2, 2, 2343, 4
1, 3, 1212, 1
What I am trying to accomplish is to check whether there is an agent that never appears with a sub agent equal to itself. So if agent 1 has no row with sub agent 1, then I need to create a row like this:
agent, sub agent, merchant, total
1, 1, null, 0
I am not really sure how to generate this single row as its own row. I have attempted several methods using Filter Rows and Add Constants, but every attempt has either overwritten all existing entries that did not match (changing them to sub agent 1) or created agent_1, sub agent_1, merchant_1, and total_1 fields.

If you copy the stream into three, do a Stream Lookup to check whether a row with the same agent and sub agent exists, and if it doesn't, add the missing row.
Here is my output:
agent sub merchant total
1 2 2222 10
2 2 2343 4
1 3 1212 1
1 1 <null> 0

I think this solution is easier. You might need to adjust the SQL query (I use Postgres).
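A minimal sketch of that idea in Postgres, assuming the table is named merchant_totals and has the columns from the question. It appends one (agent, agent, NULL, 0) row for every agent that never appears with a sub agent equal to itself:
SELECT agent, sub_agent, merchant, total
FROM merchant_totals
UNION ALL
SELECT a.agent,
       a.agent       AS sub_agent,
       NULL::integer AS merchant,   -- no merchant for the generated row
       0             AS total
FROM (SELECT DISTINCT agent FROM merchant_totals) AS a
WHERE NOT EXISTS (SELECT 1
                  FROM merchant_totals m
                  WHERE m.agent = a.agent
                    AND m.sub_agent = a.agent);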

Related

Creating a column by using co-occurrence of each instance

TRANSACTION_NUMBER  UPC_CODE        PURCHASED_UNIT  COPURCHASED_FREQUENCY
T123456             1040-1204-8612  2               2
T123456             4020-4104-2120  1               0
T123456             1040-1204-8612  3               2
T123456             2994-8182-9311  5               0
T191201             9879-8712-3456  2               2
T191201             2387-1928-1247  1               0
T191201             7417-2741-4245  4               0
T191201             9879-8712-3456  2               2
Please refer to the table above:
Let's hypothesize that the data contains information such as 'TRANSACTION_NUMBER' (the unique identifier of each transaction), 'UPC_CODE' (a unique product identifier), and 'PURCHASED_UNIT' (how many units of that particular product were purchased).
My goal is to generate a column called 'COPURCHASED_FREQUENCY'. This column should contain the number of times that pairs of UPCs are co-purchased within the same transaction instance, as indicated by the 'TRANSACTION_NUMBER' column.
The tricky thing is that we cannot perform this operation in either R or Python; instead, this column has to be created by a SQL operation.
I think what I am trying to create is something similar to a co-occurrence count. I cannot think of a particular way to do this as of now, and I would appreciate your help!
That would be achievable by using window functions, and if you want to show 0 for the UPCs that have not been bought more than once, you can use a CASE statement:
select *
     , count(*) over (partition by TRANSACTION_NUMBER, UPC_CODE) as COPURCHASED_FREQUENCY
from yourtable
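A sketch of the CASE variant mentioned above (same assumed table name), so that a UPC appearing only once in a transaction gets 0 instead of 1:
select *
     , case when count(*) over (partition by TRANSACTION_NUMBER, UPC_CODE) > 1
            then count(*) over (partition by TRANSACTION_NUMBER, UPC_CODE)
            else 0
       end as COPURCHASED_FREQUENCY
from yourtable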

Choosing which rows to sum and average in either SSRS or SQL

ROW column 1 column 2
1 A 1
2 A 1
3 A 3
4 A 1
5 A 2
6 B 1
7 B 3
8 B 1
Let's say I have the table shown above. I want to be able to average SELECTED values from column 2. Is there any function in SSRS that lets me select which values to use in the average? The end goal is to let the user interactively choose which values to average.
For example, if I wanted to use (Row 1 + Row 2 + Row 4)/3, or (Row 6 + Row 8)/2, how can I let the end user choose those values to average?
Is there something that I need to do in SQL first to make it easier in SSRS?
The idea is to use a report parameter and a dataset filter.
Add a parameter in SSRS that allows the user to input multiple values, and set the available values to Row 1, Row 2, and so on.
Here, for your reference, is how to add a parameter in SSRS:
https://learn.microsoft.com/en-us/sql/reporting-services/report-design/add-change-or-delete-a-report-parameter-report-builder-and-ssrs?view=sql-server-ver15#:~:text=To%20add%20or%20edit%20a,or%20accept%20the%20default%20name.
After you add the parameter, let's say you already have a dataset backed by a SQL query such as:
SELECT *
FROM the_table
Right-click your dataset, open its properties, and on the Filters tab add a filter: column ROW, operator In, value the parameter that you made earlier.
After you add the filter on your dataset, simply use that dataset in your report and use the expression AVG(Column 2).
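If you would rather do it in SQL first, here is a sketch of that alternative, assuming a SQL Server data source and a multi-value report parameter named @SelectedRows holding the chosen ROW values (SSRS expands the parameter into the IN list):
SELECT AVG(CAST([column 2] AS FLOAT)) AS SelectedAverage  -- cast avoids integer division
FROM the_table
WHERE [ROW] IN (@SelectedRows)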

How to count distinct values in a list

I am fairly new to writing queries in Snowflake and have run into a hiccup. I am trying to count how many times an item appears in a list all in the same column.
I was able to use the flatten function and then tried to add in the count function with no luck.
Here is a dummy version of my data:
Ticket# Tasks
1 ["cut apple","peel orange","slice cheese"]
2 ["slice cheese","peel orange"]
3 ["cut apple"]
4 ["cut apple","slice cheese"]
5 ["cut apple", "chop kiwi"]
Here is what I want the output to look like:
(hopefully auto-populating the distinct list of tasks in descending order)
Tasks Quantity
Cut Apple 4
Slice Cheese 3
Peel Orange 2
Chop Kiwi 1
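For reference, a minimal sketch of the FLATTEN-plus-COUNT combination described in the question, assuming the table is named tickets and Tasks is a VARIANT column holding an array of task strings:
SELECT f.value::string AS Tasks,
       COUNT(*)        AS Quantity
FROM tickets,
     LATERAL FLATTEN(input => Tasks) f
GROUP BY f.value::string
ORDER BY Quantity DESC;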
Step 1: Define a normalized data schema and put the schema into a database.
Step 2: Add your data.
Step 3: Then you will be able to use SQL COUNT with DISTINCT to find the unique rows in your data table(s).
Too long for a comment, but here is some guidance for you to look into; then try to write a sample query. While you have the opportunity to do so while learning, I would look into data normalization and adjust your "Tasks" column.
You should have a secondary lookup table that has a primary key ID and a description of each unique task (you'll see this in the data normalization material). So you can follow along from your data to the concept, I will provide the layout examples and see how that helps you.
Starting with your lookup task table...
Tasks Table
TaskID TaskDescription
1 cut apple
2 peel orange
3 slice cheese
4 chop kiwi
Then, you would have another table that has TicketID, and a third table that holds multiple records for each TicketID.
Ticket Table
TicketID ExPurchaseDate
1 someDate
2 someDate
3 etc...
Now, a detail table per ticket.
TicketTasks Table
TicketTaskID TicketID TaskID
1 1 1
2 1 2
3 1 3
4 2 3
5 2 2
6 3 1
7 4 1
8 4 3
9 5 1
10 5 4
Try to digest this a bit with the normalization, and then look into writing a SQL query with COUNT(*) and GROUP BY. More than happy to help you more after, but I hope this helps guide you some.
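A sketch of what that COUNT(*) / GROUP BY query could look like over the normalized tables above (table and column names as in the layout examples):
SELECT t.TaskDescription AS Tasks,
       COUNT(*)          AS Quantity
FROM TicketTasks tt
JOIN Tasks t ON t.TaskID = tt.TaskID
GROUP BY t.TaskDescription
ORDER BY Quantity DESC;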

How can I select newly created rows without the old, unupdated rows?

I have a database that I'm searching through that is sometimes updated by another person. The way it is updated is terrible, but I can't change it. What happens is the updated numbers contain a "-1" or "-2". For example,
ID
1
2
3
4
Whenever one ID is updated, a new row is created like so:
ID
1
1-1
2
3
4
In this case, 1 was updated. Both 1 and 1-1 show up in the table. If it's updated again, it looks like this:
ID
1
1-1
1-2
2
3
4
It makes me furious but I can't do anything about it. I would like to select the rows in a query such that I get
ID
1-2
2
3
4
Does anybody have any suggestions?
I am assuming your IDs are strings since you can use - in them. You can create a saved query with your entire table and two additional columns:
OriginalID: IIf(InStr([ID],'-')=0,[ID],CInt(Left([ID],InStr([ID],'-')-1)))
and
Version: IIf(InStr([ID],'-')=0,0,CInt(Right([ID],Len([ID])-InStr([ID],'-'))))
This converts the number after the dash to an actual number (and zero for the original version).
Then use
SELECT [OriginalID] & IIF(Max([Version])=0,'','-' & Max([Version])) AS MaxID
FROM [MySavedQuery]
GROUP BY [OriginalID]
I have not had a chance to test this so there may be a parenthesis missing here or there or you may have to add a +1 or -1 to some lengths, but it should get you most of the way there.
First, split the ID into the part before the dash and the part after it, setting the part after the dash to 0 when there is no dash:
SELECT ID,
CLng(IIF(ID Like "*-*", Right(ID, Len(ID) - InStr(1, ID, "-")), 0)) As LastPartID,
CLng(IIF(ID LIKE "*-*", Left(ID, InStr(1, ID, "-") - 1), ID)) As FirstPartID
From MyTable
If you save this as a separate query, the next query is simple:
SELECT FirstPartID & IIF(Max(LastPartID) = 0, "", "-" & Max(LastPartID))
FROM MyQuery
GROUP By FirstPartID
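If you prefer not to save a separate query, here is a sketch (untested Access SQL) that nests the split query inline, using the same MyTable and ID names as above:
SELECT q.FirstPartID & IIF(Max(q.LastPartID) = 0, "", "-" & Max(q.LastPartID)) AS MaxID
FROM (SELECT CLng(IIF(ID Like "*-*", Left(ID, InStr(1, ID, "-") - 1), ID)) AS FirstPartID,
             CLng(IIF(ID Like "*-*", Right(ID, Len(ID) - InStr(1, ID, "-")), 0)) AS LastPartID
      FROM MyTable) AS q
GROUP BY q.FirstPartID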

Access SQL how to make an increment in SELECT query

I have an SQL query giving me X results. I want the query output to have a column called
count, making the query something like this:
count id section
1 15 7
2 3 2
3 54 1
4 7 4
How can I make this happen?
So in your example, "count" is the derived sequence number? I don't see what pattern is used to determine the count must be 1 for id=15 and 2 for id=3.
count id section
1 15 7
2 3 2
3 54 1
4 7 4
If id contained unique values, and you order by id you could have this:
count id section
1 3 2
2 7 4
3 15 7
4 54 1
Looks to me like mikeY's DSum approach could work. Or you could use a different approach to a ranking query as Allen Browne described at this page
Edit: You could use DCount instead of DSum. I don't know how the speed would compare between the two, but DCount avoids creating a field in the table simply to store a 1 for each row.
DCount("*","YourTableName","id<=" & [id]) AS counter
Whether you go with DCount or DSum, the counter values can include duplicates if the id values are not unique. If id is a primary key, no worries.
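A sketch of that DCount expression used in a full query (the table name YourTableName comes from the snippet above; the column names are from the question):
SELECT DCount("*", "YourTableName", "id<=" & [id]) AS counter,
       id,
       section
FROM YourTableName
ORDER BY id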
I frankly don't understand what it is you want, but if all you want is a sequence number displayed on your form, you can use a control bound to the form's CurrentRecord property. A control with the ControlSource =CurrentRecord will have an always-accurate "record number" that is in sequence, and that will update when the form's Recordsource changes (which may or may not be desirable).
You can then use that number to navigate around the form, if you like.
But this may not be anything like what you're looking for -- I simply can't tell from the question you've posted and the "clarifications" in comments.
The only trick I have seen is if you have a sequential id field, you can create a new field in which the value for each record is 1. Then you do a running sum of that field.
Add to your query
DSum("[New field with 1 in it]","[Table Name]","[ID field]<=" & [ID Field])
as counterthing
That should produce a sequential count in Access which is what I think you want.
HTH.
(Stolen from Rob Mills here:
http://www.access-programmers.co.uk/forums/showthread.php?p=160386)
Alright, I guess this comes close enough to constitute an answer: the following link specifies two approaches: http://www.techrepublic.com/blog/microsoft-office/an-access-query-that-returns-every-nth-record/
The first approach assumes that you have an ID value and uses DCount (similar to #mikeY's solution).
The second approach assumes you're OK creating a VBA function that will run once for EACH record in the recordset, and will need to be manually reset (with some VBA) every time you want to run the count - because it uses a "static" value to run its counter.
As long as you have a reasonable number of records (hundreds, not thousands), the second approach looks like the easiest/most powerful to me.
This function can be called from each record if it is available from a module.
Example: incrementingCounterTimeFlaged(10, [anyField]) should give your query rows an int incrementing from 0.
'Provides incrementing int values 0 to n.
'Resets to 0 some seconds after the first call.
'anyfield is not used in the calculation; passing a field forces Access to
'evaluate the function once per row instead of caching a single result.
Function incrementingCounterTimeFlaged(resetAfterSeconds As Integer, anyfield As Variant) As Integer
    Static resetAt As Date
    Static i As Integer
    'If the reset time has passed, schedule the next reset and return 0
    If DateDiff("s", resetAt, Now()) > 0 Then
        resetAt = DateAdd("s", resetAfterSeconds, Now())
        i = 0
        incrementingCounterTimeFlaged = i
    'Otherwise increment and return the counter
    Else
        i = i + 1
        incrementingCounterTimeFlaged = i
    End If
End Function
An auto-increment in plain SQL:
SELECT (SELECT COUNT(*) FROM myTable AS A WHERE A.id <= B.id) AS counter,
       B.id,
       B.Section
FROM myTable AS B
ORDER BY B.id ASC
You can use ROW_NUMBER(), which is available in SQL Server 2008:
SELECT ROW_NUMBER() OVER (ORDER BY ID DESC) AS RowNum,
       ID,
       Section
FROM myTable
Then RowNum gives the sequence of row numbers.