talend - output of tMap to another query - sql

I have a query on a view (which is quite heavy), so I want to avoid running it twice.
The output of this query is transformed and written to a file. The file carries a unique reference number (the reference field in the query).
I need these references as input to the WHERE clause of my second query.
I'm thinking of this flow:
1st subjob:

                                     /-> tOutputFile
tOracleInput -> tMap -> tReplicate --<
                                     \-> tMap (will only map the reference field) -> tSetGlobalVar
                                         (set to a list, and added to globalMap)
When that subjob completes, the next subjob runs:
tOracleInput (builds the WHERE clause from the list in globalMap) -> tMap -> tOutputFile
Does this design look okay? Or am I better off using a subquery on the reference numbers in my 2nd tOracleInput?
SELECT ... FROM table1 WHERE references IN (SELECT references FROM BIGVIEW WHERE ...)

Depending on how many distinct values are retrieved for the reference field, the query could exceed the maximum length allowed by Oracle (an IN list is also limited to 1,000 literal values).
You should consider joining these values with the 2nd tOracleInput using the facilities offered by the "Reload at each row" lookup model.
Learn how it works here.
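As a hedged sketch of that setup (the globalMap key name ref_key is an assumption, not from the post): in the tMap, set the lookup flow's model to "Reload at each row", declare ref_key as a globalMap key fed by the main flow's reference column, and have the lookup tOracleInput's query use it:

"SELECT col1, col2 FROM table1 WHERE references = '" + ((String) globalMap.get("ref_key")) + "'"

The lookup then fires one targeted query per incoming row instead of a single query with a huge IN list.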
Hope this helps.

Related

Retrieving ship-to-party country in SAP QuickViewer

I've created a QuickView where I enter Sales Document data as search criteria.
The selection fields are
VBAP-VBELN, VBAP-POSNR, VBAP-MATNR and VBAP-KWMENG
as the Sales Document Item data.
Furthermore I retrieve the Schedule line date from
VBEP-EDATU
From General Data in Customer Header KNA1, I use
KNA1-KUNNR and KNA1-LAND1
Now, all connections and keys work out. My issue is that I wish to list the Country Key for the ship-to-party rather than the sold-to-party (which is represented by KUNNR). How could this be solved?
Tables are joined as follows:
VBAK-VBELN -> (VBAP-VBELN,-POSNR) -> (VBEP-VBELN, -POSNR)
VBAK-KUNNR -> KNA1-KUNNR
I do know I would probably need a new table in here to retrieve what I'm looking for, but I'm completely blank. Any help would be greatly appreciated.
The ship-to-party is stored in table VBPA (Sales Document Partners) in field KUNNR for partner function SH, so to select it you should join KNA1 not with VBAK but with VBPA-KUNNR, restricting the partner function to SH.
However, SQVI is a very primitive tool which doesn't allow setting complex (or even trivial) conditions, so you should switch to the ABAP Query tool (SQ01).
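Expressed as a plain SQL sketch (a hedged illustration using the tables above; note that PARVW is stored under its internal German key, so the ship-to function is typically 'WE' at the database level rather than the English display key 'SH'):

SELECT v.vbeln, k.kunnr, k.land1
FROM vbpa AS v
JOIN kna1 AS k
  ON k.kunnr = v.kunnr
WHERE v.parvw = 'WE'   -- ship-to party (displayed as SH in English logon)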
But just in case you want to do this in SQVI, here is the workaround:
Join necessary tables in SQVI builder
Add necessary KNA1 fields to layout: KUNNR and LANDX
Go to layout mode and enable selection by partner function PARVW
Run your query by SH partner function
and voilà! You will be shown only the orders that have a ship-to-party specified, along with their countries.

Merge two CSV and collate data

I have two CSV files, the first like so:
Book1:
ID,TITLE,SUBJECT
0001,BLAH,OIL
0002,BLAH,HAMSTER
0003,BLAH,HAMSTER
0004,BLAH,PLANETS
0005,BLAH,JELLO
0006,BLAH,OIL
0007,BLAH,HAMSTER
0008,BLAH,JELLO
0009,BLAH,JELLO
0010,BLAH,HAMSTER
0011,BLAH,OIL
0012,BLAH,OIL
0013,BLAH,OIL
0014,BLAH,JELLO
0015,BLAH,JELLO
0016,BLAH,HAMSTER
0017,BLAH,PLANETS
0018,BLAH,PLANETS
0019,BLAH,HAMSTER
0020,BLAH,HAMSTER
And then a second CSV with items associated with the first list, with ID being the common attribute between the two.
Book2:
ID,ITEM
0001,PURSE
0001,STEAM
0001,SEASHELL
0002,TRUMPET
0002,TRAMPOLINE
0003,PURSE
0003,DOLPHIN
0003,ENVELOPE
0004,SEASHELL
0004,SERPENT
0004,TRUMPET
0005,CAR
0005,NOODLE
0006,CANNONBALL
0006,NOODLE
0006,ORANGE
0006,SEASHELL
0007,CREAM
0007,CANNONBALL
0007,GUM
0008,SERPENT
0008,NOODLE
0008,CAR
0009,CANNONBALL
0009,SERPENT
0009,GRAPE
0010,SERPENT
0010,CAR
0010,TAPE
0011,CANNONBALL
0011,GRAPE
0012,ORANGE
0012,GUM
0012,SEASHELL
0013,NOODLE
0013,CAR
0014,STICK
0014,ORANGE
0015,GUN
0015,GRAPE
0015,STICK
0016,BASEBALL
0016,SEASHELL
0017,CANNONBALL
0017,ORANGE
0017,TRUMPET
0018,GUM
0018,STICK
0018,GRAPE
0018,CAR
0019,CANNONBALL
0019,TRUMPET
0019,ORANGE
0020,TRUMPET
0020,CHERRY
0020,ORANGE
0020,GUM
The real datasets are millions of records, so I'm sorry in advance for my simple example.
The problem I need to solve is merging and collating the data so that I can see which item groupings most commonly appear together on the same ID (e.g. GRAPE, GUM, SEASHELL appear together 340 times; ORANGE and STICK 89 times; etc.).
Then I need to see if there is any change/deviation to the general results in common appearance when grouped by SUBJECT.
Tools I'm familiar with are Excel and SQL, but I also have PowerBI and Alteryx at my disposal.
Full disclosure: Not homework, or work, but a volunteer project, thus my unfamiliarity with this kind of data manipulation.
Thanks in advance.
An Alteryx solution:
Drag the two .csv files onto your canvas (seen as book1.csv and book2.csv in my picture); Alteryx will create "Input" tools for you.
Drag a "Join" tool on and connect the two .csv files to its inputs; select "ID" as the join field; unselect the "Right_ID" as output since it's merely a duplicate of "ID"
Drag a "Summary" tool on and connect the Join tool's output to the Summary tool's input; select all three of the outputs and add as a "group by"... then add the ID column with a "count"
Drag a browse tool on and connect the summary's output to the browse tool's input.
Run the workflow.
After all that, click on the Browse tool and you should see what my screenshot shows (just the first ten rows of output):
+1 for taking on a volunteer project - I think anyone who knows data can have a big impact in support of their favourite group or cause.
I would just pull the 2 files into Power BI as 2 separate tables (Get Data / From File). Create a relationship between the 2 tables based on ID (it might get auto-generated). It should be one to many.
Then I would add a Calculated Column to the Book1 table to concatenate the related ITEM values, e.g.:
Items =
CALCULATE (
    CONCATENATEX (
        DISTINCT ( 'Book2'[ITEM] ),
        'Book2'[ITEM],
        ", ",
        'Book2'[ITEM], ASC
    )
)
Now you can use that Items field in visuals (e.g. a Table), along with Count of ID to get the frequency.
Adding Subject to a copy of the table (e.g. to the Columns well of a Matrix) will produce your grouped scenario, or you could add a Subject Slicer.
As you will be comparing subsets of varying size, I would change Count of ID to Show value as - % of grand total.
A slightly different solution using Alteryx.
With this dataset, there are very few repeating 3- or 4-item groups. You can do the two-item affinity analysis and derive a probability for 3- or 4-item groups, or you can count the 3- and 4-item groups individually. I believe what you want is the latter, as your probability of getting grapes with oranges may be altered by whether you have bananas in the cart or not.
Anyway, I did not join in the subject until after finding all of my combinations. I found all the combinations by taking the Cartesian join of two, then three, then four of the original set. I then removed all duplicates by ensuring items were always in alphabetical order in each row. I then counted occurrences of each combination. More joins can be added in the same pattern to count groups of 5,6,7...
Once you have the counts of occurrences, join back with the subjects, perform the same analysis on each group, and compare to the overall results.
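Since SQL is among the tools you know, here is the same idea as a hedged SQL sketch (table names book1 and book2 are assumed; the a.ITEM < b.ITEM condition plays the role of the alphabetical-order de-duplication, and further self-joins extend it to 3- and 4-item groups):

-- count how often each unordered pair of items shares an ID
SELECT a.ITEM AS item1, b.ITEM AS item2, COUNT(*) AS pair_count
FROM book2 a
JOIN book2 b
  ON a.ID = b.ID
 AND a.ITEM < b.ITEM   -- keeps each unordered pair exactly once
GROUP BY a.ITEM, b.ITEM
ORDER BY pair_count DESC;

-- per-SUBJECT comparison: join back to book1 and group by SUBJECT as well
SELECT s.SUBJECT, a.ITEM AS item1, b.ITEM AS item2, COUNT(*) AS pair_count
FROM book2 a
JOIN book2 b ON a.ID = b.ID AND a.ITEM < b.ITEM
JOIN book1 s ON s.ID = a.ID
GROUP BY s.SUBJECT, a.ITEM, b.ITEM
ORDER BY pair_count DESC;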
I'm supposed to disclose that I work for Alteryx.
First of all, if you are using Windows,
just navigate to the directory which contains the CSVs and run the following command:
copy <pattern> <newFileName>.csv
For example:
copy *.csv merged.csv
Now you have created one CSV file. If the file is too large to process at once, use whatever your programming language offers: in Python you can use generators to process it line by line, or pandas to read it chunk by chunk, which makes it easy.
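A minimal Python sketch of the chunked approach (file name and chunk size assumed):

import pandas as pd

# process the merged file in fixed-size pieces instead of loading it all at once
for chunk in pd.read_csv("merged.csv", chunksize=100000):
    # aggregate or filter each chunk here; len() just demonstrates the loop
    print(len(chunk))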
I hope this helps you.

BigQuery: return nested results without flattening them and without using a table

It is possible to return nested results (RECORD type) if the noflatten_results flag is specified, but is it possible to just view them on screen without writing them to a table first?
For example, here is a simple user table (my actual table is large: 400+ columns with multiple levels of nesting):
ID,
name: {first, last}
I want to view the record of a particular user and display it in my application, so my query is:
SELECT * FROM dataset.user WHERE id=423421 limit 1
Is it possible to return the result directly?
You should write your output to a "temp" table with the noflatten_results option (and with an expiration set so the table is purged after it is used) and serve your client out of this temp table, all on the fly.
Keep in mind that no matter how small the "temp" table is, if you query it (in the second step above) you will be billed for at least 10 MB, so you are better off using the Tabledata.list API for that step (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list), which is free!
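As a hedged sketch with the bq CLI (dataset and table names are assumed; flags as of the legacy SQL tooling):

# run the query without flattening, into a temp table that may hold nested output
bq query --allow_large_results --noflatten_results \
    --destination_table=mydataset.user_tmp \
    "SELECT * FROM dataset.user WHERE id=423421 LIMIT 1"

# give the temp table an expiration (in seconds) so it purges itself
bq update --expiration 3600 mydataset.user_tmp

# read the rows via tabledata.list (free) instead of querying the temp table
bq head -n 1 mydataset.user_tmp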
So if you try to output multiple independently repeated records, it will fail in the interface/BQ console with the error:
Error: Cannot output multiple independently repeated fields at the same time.
and the way to get past this error is to FLATTEN your output.

How to implement a key lookup for a generated keys table in Pentaho Kettle

I just started to use Pentaho Kettle for integration. Seems great so far, quite intuitive compared to Talend, which I was also investigating.
I am trying to migrate some customers without their keys, so all I have is their email addresses.
The customer may already exist in the database, so what I need to do is:
If the customer exists, add its id to the imported field and continue.
But if the customer doesn't exist I need to get the next Hibernate key from the table Hibernate_Sequences and set it as the id.
But I don't want to always allocate a key, so I want to conditionally execute a step to allocate the next key.
So what I want to do is, in the flow, execute the DB procedure that allocates and returns the next key, but only if there's no value in id from the "lookup id" step.
Is this possible?
Just posting my updated flow. The answer was to use a Filter Rows component, which splits the data on true/false. I really had trouble getting the id out of the database stored procedure because of a bug, so I had to use a decimal and then convert it back to an integer (which I also couldn't figure out how to do, so I used a JavaScript component).
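For that conversion, a hedged sketch of what the Modified Java Script Value step could contain (the field name id is an assumption; the new field would be declared with type Integer in the step's fields grid):

// round the decimal id from the stored procedure so it can be emitted as an integer
var id_int = Math.round(id);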
Yes, it is. As per the official documentation (I kept only the relevant part): "Lookup values are added as new fields onto the stream". So you just need to put a "Filter rows" step from the Flow section and check for the "id" field, which is supposed to be added in the "Existing Id Lookup" step.

How to look up technical key of dimension using its natural key?

According to the Wiki:
"The Dimension Lookup/Update step allows you to implement Ralph Kimball's slowly changing dimension for both types: Type I (update) and Type II (insert) ..."
"To do the lookup it uses not only the specified natural keys (with an "equals" condition) but also the specified "Stream datefield" (see below)."
"As a result of the lookup or update operation of this step type, a field is added to the stream containing the technical key of the dimension."
So if I understand that correctly, it should be possible to have the "Dimension Lookup/Update" step look up a dimension's technical/surrogate key using a natural key. In case no entry exists yet, the step could also be configured to add the requested natural key to the dimension table with a unique technical key. But for now I would like to use only the lookup functionality - no update and no insert.
Here's my setup:
This is my dimension table (SCD Type 1) named "dims":
The transformation looks as follows:
But if I run this in Preview mode I get:
What I would like to see is actually the values of id (1, 2, 3) next to the natural keys (a, b, c).
What am I doing wrong here?
Effectively I could achieve this using a join step - but I would like to use the advanced dimension handling functionality after I got this working.
This step expects a table with 3 more attributes:
start_date (date)
end_date (date)
version (int)
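Put together, such a dimension table could look like this minimal SQL sketch (column names are assumptions chosen to match the post):

-- minimal dimension table for Dimension Lookup/Update (names assumed)
CREATE TABLE dims (
    id         INTEGER PRIMARY KEY,  -- technical (surrogate) key
    nat_key    VARCHAR(10),          -- natural key, e.g. 'a', 'b', 'c'
    version    INTEGER,              -- record version kept by the step
    start_date DATE,                 -- validity range start
    end_date   DATE                  -- validity range end
);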
Check that the date settings in the "Dimension Lookup / Update" step match your data. Check the version field too.
Below is an example:
Table:
Settings for the "Dimension Lookup / Update" step:
Preview table (the ids that match the date are returned):