Flatten Invoice Header with Invoice Lines in Kettle - pentaho

If you have an invoice header with several values (invoice #, date, location) and an unknown number of invoice lines with several values (product, price, tax), is there a way to flatten this data into one row that grows as the number of invoice lines varies from invoice to invoice?
Input example:
{"InvoiceRecords": [{
"InvoiceDate": "8/9/2017 12:00:00 AM",
"InvoiceLocation": "002",
"InvoiceNumber": "2004085",
"InvoiceRecordHeaderDetails": [{
"InvNum": "2004085",
"Location": "002",
"InvDate": "8/9/2017 12:00:00 AM"
}],
"InvoiceRecordLineItemDetails": [{
"UniqueID": "3939934",
"InvNum": "2004085",
"LINEITEM": "1",
"CUSTID": "PREAA",
"DEPTID": "320306",
"PRODID": "088856",
"ProdDesc": "STATE UST",
"Unitprice": "0.003",
"QuantShare": "237.5",
"TaxRate": "7.25",
"taxamount": "0.05"
}],
"InvoiceTaxCodeDetails": [{
"InvNum": "2004085",
"LineItem": "1",
"UniqueID": "34",
"taxCode": "SALES TAX",
"taxrate": "7.25",
"maxtax": "0"
}]
}]}
I need all the items on the same row (allowing for more than one line item and/or more than one tax code item on a given invoice record).
Output example (note: "_n" below stands for the undetermined number of invoice lines and tax rows possible):
{"InvoiceRecords": [{
"InvoiceDate": "8/9/2017 12:00:00 AM",
"InvoiceLocation": "002",
"InvoiceNumber": "2004085",
"InvoiceRecordHeaderDetailsInvNum": "2004085",
"InvoiceRecordHeaderDetailsInvNumLocation": "002",
"InvoiceRecordHeaderDetailsInvNumInvDate": "8/9/2017 12:00:00 AM",
"InvoiceRecordLineItemDetailsUniqueID_1": "3939934",
"InvoiceRecordLineItemDetailsInvNum_1": "2004085",
"InvoiceRecordLineItemDetailsLINEITEM_1": "1",
"InvoiceRecordLineItemDetailsCUSTID_1": "PREAA",
"InvoiceRecordLineItemDetailsDEPTID_1": "320306",
"InvoiceRecordLineItemDetailsPRODID_1": "088856",
"InvoiceRecordLineItemDetailsProdDesc_1": "STATE UST",
"InvoiceRecordLineItemDetailsUnitprice_1": "0.003",
"InvoiceRecordLineItemDetailsQuantShare_1": "237.5",
"InvoiceRecordLineItemDetailsTaxRate_1": "7.25",
"InvoiceRecordLineItemDetailstaxamount_1": "0.05",
"InvoiceTaxCodeDetailsInvNum_1": "2004085",
"InvoiceTaxCodeDetailsLineItem_1": "1",
"InvoiceTaxCodeDetailsUniqueID_1": "34",
"InvoiceTaxCodeDetailstaxCode_1": "SALES TAX",
"InvoiceTaxCodeDetailstaxrate_1": "7.25",
"InvoiceTaxCodeDetailsmaxtax_1": "0",
"InvoiceRecordLineItemDetailsUniqueID_n": "3939934",
"InvoiceRecordLineItemDetailsInvNum_n": "2004085",
"InvoiceRecordLineItemDetailsLINEITEM_n": "1",
"InvoiceRecordLineItemDetailsCUSTID_n": "PREAA",
"InvoiceRecordLineItemDetailsDEPTID_n": "320306",
"InvoiceRecordLineItemDetailsPRODID_n": "088856",
"InvoiceRecordLineItemDetailsProdDesc_n": "STATE UST",
"InvoiceRecordLineItemDetailsUnitprice_n": "0.003",
"InvoiceRecordLineItemDetailsQuantShare_n": "237.5",
"InvoiceRecordLineItemDetailsTaxRate_n": "7.25",
"InvoiceRecordLineItemDetailstaxamount_n": "0.05",
"InvoiceTaxCodeDetailsInvNum_n": "2004085",
"InvoiceTaxCodeDetailsLineItem_n": "1",
"InvoiceTaxCodeDetailsUniqueID_n": "34",
"InvoiceTaxCodeDetailstaxCode_n": "SALES TAX",
"InvoiceTaxCodeDetailstaxrate_n": "7.25",
"InvoiceTaxCodeDetailsmaxtax_n": "0"
}]}
Thanks!

There is an example of a similar problem in the samples directory, which sits next to your spoon.bat. Have a look at samples/transformation/XML Add and survive the first shock: it does something much more complex, just to show everything that is possible.
In your case, use a Switch/Case step to split the input stream into header, items and footer, and make sure the InvoiceNumber is kept on each stream (more on this later). Convert the three streams into JSON (with JSON Output or, maybe easier, with a JavaScript step). Then Group by the items by InvoiceNumber. Join the three flows by InvoiceNumber, for which I suggest a stream lookup in the header stream, then another stream lookup in the footer stream. With another JavaScript step, treating the data as strings, you can build the JSON row in the format { header, [item], footer}, which you can then Group by with a concatenation to end up with a single row.
This is some work, but rather standard, except for the tricky part of getting the InvoiceNumber onto the items and footer rows, since it has disappeared from the flow there. For that you can use the fact that the JavaScript step preserves variable values from row to row unless they are redefined. Add a new start script [right-click Script1 on the tab at the top, add a copy, right-click the Script1_0 just created, and define it as Start script].
In this start script:
var PrevInvoiceNumber = -1;
In the main script:
if (InvoiceNumber && PrevInvoiceNumber != InvoiceNumber)
    PrevInvoiceNumber = InvoiceNumber;
Each row should then carry a PrevInvoiceNumber equal to the InvoiceNumber of the invoice it belongs to.
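The carry-forward idea behind the start script and main script can be sketched outside Kettle as well. The following is a minimal Python illustration, not Kettle code; the function name and the sample rows (including the second invoice number) are hypothetical and only serve to show how the last seen InvoiceNumber is copied onto rows that lack one.

```python
def carry_forward_invoice_number(rows):
    """Attach the last seen InvoiceNumber to every row as PrevInvoiceNumber."""
    prev_invoice_number = None  # plays the role of the start script
    result = []
    for row in rows:
        number = row.get("InvoiceNumber")
        # Same test as the main script: update only when a new number shows up.
        if number and number != prev_invoice_number:
            prev_invoice_number = number
        result.append({**row, "PrevInvoiceNumber": prev_invoice_number})
    return result


# Hypothetical flow: a header row followed by item and tax rows without a number.
rows = [
    {"InvoiceNumber": "2004085", "InvoiceLocation": "002"},
    {"InvoiceNumber": None, "PRODID": "088856"},
    {"InvoiceNumber": None, "taxCode": "SALES TAX"},
    {"InvoiceNumber": "2004086", "InvoiceLocation": "003"},
]
filled = carry_forward_invoice_number(rows)
```

This only works if the rows arrive in their original order, which is also what the JavaScript step relies on.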

Related

Add computed field to Query in Grafana using JSON API as data source

What am I trying to achieve:
I would like to have a time series chart showing the total number of members in my club at any time. This member count should be calculated by using the field "Eintrittsdatum" (joining-date) and "Austrittsdatum" (leaving-date). I’m thinking of it as a running sum - every filled field with a joining-date means +1 on the member count, every leaving-date entry is a -1.
Data structure
I’m calling the API of webling.ch with a secret key. This is my data structure with sample data per member:
[
{
"type": "member",
"meta": {
"created": "2020-03-02 11:33:00",
"createuser": {
"label": "Joana Doe",
"type": "user"
},
"lastmodified": "2022-12-06 16:32:56",
"lastmodifieduser": {
"label": "Joana Doe",
"type": "user"
}
},
"readonly": true,
"properties": {
"Mitglieder ID": 99,
"Anrede": "Dear",
"Vorname": "Jon",
"Name": "Doe",
"Strasse": "Doeington Street",
"Adresszusatz": null,
"PLZ": "9999",
"Ort": "Doetown",
"E-Mail": "jon.doe#doenet.net",
"Telefon Privat": null,
"Telefon Geschäft": null,
"Mobile": "099 877 54 54",
"Geschlecht": "m",
"Geburtstag": "1966-03-10",
"Mitgliedschaftstyp": "Aktivmitgliedschaft",
"Eintrittsdatum": "2020-03-01",
"Austrittsdatum": null,
"Passfoto": null,
"Wordpress Benutzername": null,
"Wohnhaft im Glarnerland": false,
"Lat": "43.1563379",
"Long": "6.0474622"
},
"parents": [
240
],
"children": {
},
"links": {
"debitor": [
2124,
3056,
3897
],
"attendee": [
2576
]
},
"id": 1815
}
]
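To make the intended calculation concrete, here is a minimal sketch of the running sum in plain Python, outside Grafana. It assumes each record carries "Eintrittsdatum" and "Austrittsdatum" as ISO date strings under "properties", as in the sample above; the function name and the three sample members are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def member_count_series(members):
    """Return (date, total members) pairs as a running sum of joins and leaves."""
    deltas = Counter()
    for member in members:
        props = member["properties"]
        if props.get("Eintrittsdatum"):
            deltas[props["Eintrittsdatum"]] += 1  # joining date: +1
        if props.get("Austrittsdatum"):
            deltas[props["Austrittsdatum"]] -= 1  # leaving date: -1
    dates = sorted(deltas)  # ISO dates sort chronologically as strings
    totals = accumulate(deltas[d] for d in dates)
    return list(zip(dates, totals))


# Hypothetical members, reduced to the two fields that matter here.
members = [
    {"properties": {"Eintrittsdatum": "2020-03-01", "Austrittsdatum": None}},
    {"properties": {"Eintrittsdatum": "2020-05-10", "Austrittsdatum": "2021-01-31"}},
    {"properties": {"Eintrittsdatum": "2021-01-31", "Austrittsdatum": None}},
]
series = member_count_series(members)
```

The key point is that leaving dates contribute -1 rather than 1 before the cumulative sum is taken.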
Grafana data source
I am using the "JSON API" data source by Marcus Olsson (grafana/grafana-json-datasource on GitHub, a data source plugin for loading JSON APIs into Grafana).
Grafana v9.3.1 (89b365f8b1) on Linux
My current approach
Queries:
Query C - uses a filter on the source-API to only show entries with "Eintrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Eintrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Query D - uses a filter on the source-API to only show entries with "Austrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Austrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Here's a screenshot to clarify things
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-1.png)
Transformations:
My applied transformations
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-2.png)
What's working
I can correctly gather the number of members added/subtracted per day.
What's not working
I can't get the graph to display the way I want: I'd like to have a running sum of these numbers instead of the following two graphs.
Time series graph with merged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-3.png)
Time series graph with unmerged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-4.png)
I can't get the names to display within the tooltip of the data points (really not THAT necessary).

Change JSON Keys in Nested JSON in SQL Table

I have a table with column called tableJson which contains information of the following type:
[
{
"type": "TABLE",
"content": {
"rows":
[{"Date": "2021-09-28","Monthly return": "1.44%"},
{"Date": "2021-11-24", "Yearly return": "0.62%"},
{"Date": "2021-12-03", "Monthly return": "8.57%"},
{},
]
}
}
]
I want to change the key "Monthly return" to "Weekly return" everywhere it occurs in this table column.
Thank you in advance!
I tried different approaches (parsing, reading with OPENJSON, CROSS APPLY) but could not make it work.

Great Expectations Row Based Dimensions

I have data like this:
[ {
"name": "Apple",
"price": 1,
"type": "Food"
},
{
"name": "Apple",
"price": 0.90,
"type": "Food"
},
{
"name": "Apple",
"price": 1000,
"type": "Computer"
},
{
"name": "Apple",
"price": 900,
"type": "Computer"
}
]
Using the Great Expectations automatic profile, a valid range for price would be 0.90 to 1,000. Is it possible to have it slice on the type dimension, so food would be 0.90 to 1 and computer would be 900 to 1000? Or would I need to transform the data first using dbt? I know the column that will create the dimension, but I don't know the particular values.
I have the same question about differences between rows: if the rows had a timestamp, then instead of validating a range of 900 to 1000, it would validate the change in value between consecutive rows, here -100.
I used this approach to first load the data in a pandas data frame:
https://discuss.greatexpectations.io/t/how-can-i-use-the-return-format-unexpected-index-list-to-select-row-from-a-pandasdataset/70/2
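To make the per-type slicing asked about above concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API; it only shows the ranges that a profile sliced on "type" would have to produce. The function name is hypothetical.

```python
from collections import defaultdict


def price_range_by_type(records):
    """Return {type: (min price, max price)} without knowing the types up front."""
    prices = defaultdict(list)
    for record in records:
        prices[record["type"]].append(record["price"])
    return {kind: (min(values), max(values)) for kind, values in prices.items()}


records = [
    {"name": "Apple", "price": 1, "type": "Food"},
    {"name": "Apple", "price": 0.90, "type": "Food"},
    {"name": "Apple", "price": 1000, "type": "Computer"},
    {"name": "Apple", "price": 900, "type": "Computer"},
]
ranges = price_range_by_type(records)
```

The distinct values of "type" are discovered from the data, which matches the case where the dimension column is known but its values are not.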

Capturing Choice set values from the adaptive card

I was wondering how I can capture the selected choice set values in an adaptive card. Is there a way to capture the response after hitting the Action.Submit button?
As you can see below, selection is an array of titles that is built by the initial logic and used as the choices array:
"body": [{
"type": "TextBlock",
"text": "select the recordings you want to delete"
},
{
"type": "Input.ChoiceSet",
"isMultiSelect": true,
"id": "myColor",
"style": "compact",
"value": "1",
"choices": vars.selection
}
]

Append to Specific Array Index from another table dynamically SQL Server

I have a json string that I need to modify
{"RecordCount":3,"Top":10,"Skip":0,"SelectedSort":"Seed asc","value":[{"AccountProductListId":22091612871138,"Name":"April 4th 2018","AccountId":256813438078643,"IsPublic":false,"Comment":"Test order sheet","Quantity":3},{"AccountProductListId":166305848801939,"Name":"test","AccountId":256813438078643,"IsPublic":false,"Comment":"","Quantity":1},{"AccountProductListId":21177711287586,"Name":"Test Order sheet","AccountId":256813438078643,"IsPublic":true,"Comment":"the very first sheet","Quantity":2}]}
Inside value the array looks like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2
}],
What I need to do is append some data from another table:
AccountProductListId ProductID
21177711287586 97096131867163|32721319938943
22091612871138 97096131867163|145461009584740|130005306921282
166305848801939 8744071222157
As you can see, the AccountProductListId is already in the JSON result, so I know which array element each ProductID belongs to. The only problem is that I don't know the syntax to merge the ProductID data into its specific array index. The JSON array could have more than 3 items.
Essentially ending up with something like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3,
"ProductID": "97096131867163|145461009584740|130005306921282"
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1,
"ProductID": "8744071222157"
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2,
"ProductID": "97096131867163|32721319938943"
}],
Any information would be greatly appreciated. Thanks.
Process with SQL Server
Prior to SQL Server 2016, there is no built-in support for reading or writing JSON.
Starting with SQL Server 2016, you can use the OPENJSON rowset function to read JSON and the FOR JSON clause to write JSON.
See https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
The approach would be to use OPENJSON to read the JSON string as a rowset, join it with the table to pick up ProductID, and use FOR JSON to convert the result back to JSON.
Process outside SQL Server
Depending on your situation, it might be simpler to parse the JSON outside SQL Server. If you go that route, you could:
Collect all the AccountProductListId values from the parsed JSON
Send the collected IDs to SQL Server via a stored procedure that takes a TVP (table-valued parameter) input and returns the AccountProductListId -> ProductID mapping
Inject the ProductIDs into the JSON object and serialize it back to a string
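A minimal sketch of this "outside SQL Server" route, using Python as one possible host language. The function names are hypothetical, and the hard-coded mapping merely stands in for whatever the stored procedure would return; the database call itself is not shown.

```python
import json


def collect_list_ids(document):
    """Step 1: gather every AccountProductListId found in the value array."""
    return [item["AccountProductListId"] for item in document["value"]]


def inject_product_ids(json_text, product_ids_by_list_id):
    """Step 3: add ProductID to each array element that has a mapping entry."""
    document = json.loads(json_text)
    for item in document["value"]:
        product_id = product_ids_by_list_id.get(item["AccountProductListId"])
        if product_id is not None:
            item["ProductID"] = product_id
    return json.dumps(document)


source = json.dumps({
    "RecordCount": 3,
    "value": [
        {"AccountProductListId": 22091612871138, "Name": "April 4th 2018"},
        {"AccountProductListId": 166305848801939, "Name": "test"},
        {"AccountProductListId": 21177711287586, "Name": "Test Order sheet"},
    ],
})
list_ids = collect_list_ids(json.loads(source))

# Step 2 placeholder: in practice this mapping comes back from SQL Server.
mapping = {
    21177711287586: "97096131867163|32721319938943",
    22091612871138: "97096131867163|145461009584740|130005306921282",
    166305848801939: "8744071222157",
}
result = json.loads(inject_product_ids(source, mapping))
```

Matching on AccountProductListId rather than on array position means the array can hold any number of items in any order.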