Database lookup in Talend


In my Talend job I am copying data from Excel to a SQL table.
To maintain a foreign key constraint, I have to do a lookup before copying the data.
The task goes like this:
I have to copy data into Table_2 (columns id, keys, value).
My Excel sheet has data for the id and keys columns. Table_1 has two columns, id and value.
For the value column, I want to find the Table_1 entry with the same id as the current record and copy its value into Table_2's value column.
Excel
id  keys
--  ----
1   a
2   b
3   c

Table_1
id  value
--  -----
1   123
2   456
3   789

Desired output, Table_2
id  keys  value
--  ----  -----
1   a     123
2   b     456
3   c     789

Current output, Table_2
id  keys  value
--  ----  -----
1   a     null
2   b     null
3   c     null
How do I properly map this?

You've set the job up exactly as it needs to be, so the job layout isn't the problem here.
If you expect every record in your Excel document to have a matching record in your database table, set the Join Model of the lookup in your tMap to Inner Join.
That then allows you to add a second output that catches everything that isn't joining, which is your issue here (in that output's settings, set "Catch lookup inner join reject" to true).
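Outside Talend, the tMap setup described above corresponds roughly to an inner join for the main output plus an anti-join for the reject output. The following is a minimal Python/sqlite3 sketch of that idea, not Talend code; the in-memory table names `excel` and `table_1` and the unmatched id 4 are assumptions for illustration only.

```python
import sqlite3


def lookup_with_rejects(rows, lookup):
    """Inner-join source rows (id, keys) to lookup rows (id, value) and collect the rejects."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE excel (id INTEGER, keys TEXT)")
    con.execute("CREATE TABLE table_1 (id INTEGER, value INTEGER)")
    con.executemany("INSERT INTO excel VALUES (?, ?)", rows)
    con.executemany("INSERT INTO table_1 VALUES (?, ?)", lookup)

    # Main output: inner join, so only source rows with a matching id come through.
    matched = con.execute(
        "SELECT e.id, e.keys, t.value FROM excel e "
        "JOIN table_1 t ON e.id = t.id ORDER BY e.id"
    ).fetchall()

    # Reject output: source rows that found no partner in the lookup table.
    rejected = con.execute(
        "SELECT e.id, e.keys FROM excel e "
        "WHERE NOT EXISTS (SELECT 1 FROM table_1 t WHERE t.id = e.id) "
        "ORDER BY e.id"
    ).fetchall()
    con.close()
    return matched, rejected


matched, rejected = lookup_with_rejects(
    rows=[(1, "a"), (2, "b"), (3, "c"), (4, "d")],  # id 4 has no lookup row (illustrative)
    lookup=[(1, 123), (2, 456), (3, 789)],
)
print(matched)   # [(1, 'a', 123), (2, 'b', 456), (3, 'c', 789)]
print(rejected)  # [(4, 'd')]
```

If the reject list contains every source row, as the all-null output in the question suggests, the join keys themselves are not comparing equal.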
This should show you everything in your source (not lookup) file that isn't matching. I suspect that everything is failing to match on your join condition (essentially WHERE ExcelDoc.id = Table.Id) even if it looks like it should. This could be down to a mismatch of datatypes as they are read into Talend (so the Java/Talend datatypes such as int/Integer or String rather than the DB types) or because one of your id columns has extraneous whitespace padding.
If you find that the id column in one of your sources does have padding (or the wrong type), you should be able to trim or convert it in an earlier step before the join, or make the transformation directly in the tMap join expression (for example, calling .trim() on a String id).
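A hash lookup like the one tMap builds only matches keys that are exactly equal, so a String "1" never finds an Integer 1 and " 1" never finds "1". This Python sketch mimics that with a dict; `normalize_key` is a hypothetical helper, and the padded, string-typed ids are made-up examples of what might come out of the Excel input.

```python
def normalize_key(raw):
    """Hypothetical helper: coerce an id read from Excel into the lookup's key type (int)."""
    if isinstance(raw, str):
        raw = raw.strip()  # remove padding such as " 1" or "2 "
    return int(raw)


table_1 = {1: 123, 2: 456, 3: 789}  # lookup keyed by integer id
excel_ids = [" 1", "2 ", "3"]       # illustrative raw ids: padded and/or read as strings

naive = [table_1.get(i) for i in excel_ids]                 # every key misses
fixed = [table_1.get(normalize_key(i)) for i in excel_ids]  # every key matches
print(naive)  # [None, None, None]
print(fixed)  # [123, 456, 789]
```

The naive lookup reproduces the "value null null null" symptom; normalizing both sides to one type before joining is the general fix, whichever side turns out to be different.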
The only other thing I'd recommend is that you name your flows between components (just click on them to change the name), especially when you are joining anything in a tMap.

Related

How to get the differences between two - kind of - duplicated tables (sql)

Prologue:
I have two tables in two different databases, one of which is an updated version of the other. For example, imagine that one year ago I duplicated table 1 into the new database (as table 2), and since then I have worked only on table 2, never updating table 1.
I would like to compare the two tables to find the differences that have accumulated over this period (the tables have kept the same structure, so the comparison is meaningful).
My approach was to create a third table, copy both table 1 and table 2 into it, and then count the number of repetitions of every entry.
In my opinion, this, together with a new attribute recording for every entry which table it came from, would do the job.
Problem:
Copying the two tables into the third table, I get the (obvious) error about duplicate key values violating a unique or primary key constraint.
How could I bypass the error, or how could I do the same job better? Any idea is appreciated.
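The tagging idea from the question can work without a physical third table: stacking both tables with UNION ALL inside a query creates no primary key, so there is nothing to violate. This is a minimal sqlite3 sketch assuming both tables can be queried from one connection; the names `t1`, `t2` and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-ins for "table 1" (old copy) and "table 2" (updated copy).
con.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE t2 (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
con.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, "a"), (2, "B"), (4, "d")])

# Stack both tables in a subquery, tag each row with its origin,
# and keep the rows that occur only once (i.e. exist in just one table).
diff = con.execute("""
    SELECT id, name, MIN(src) AS src
    FROM (SELECT id, name, 't1' AS src FROM t1
          UNION ALL
          SELECT id, name, 't2' AS src FROM t2) AS u
    GROUP BY id, name
    HAVING COUNT(*) = 1
    ORDER BY id, src
""").fetchall()
print(diff)  # [(2, 'b', 't1'), (2, 'B', 't2'), (3, 'c', 't1'), (4, 'd', 't2')]
```

A row that was modified shows up twice (once per source, as id 2 does here), while deleted and added rows show up once. This relies on neither table containing duplicate rows, which a primary key guarantees.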

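Something like this should do what you want if A and B have the same structure; otherwise, just select and rename the columns you want to compare. The subquery has to be correlated with the outer row, otherwise NOT EXISTS is false as soon as A contains any row at all:
SELECT *
FROM B
WHERE NOT EXISTS (SELECT *
                  FROM A
                  WHERE A.col = B.col AND ....)
This returns the rows of B that have no identical row in A; swap A and B to get the rows that exist only in A.
If NOT EXISTS doesn't work in your DBMS, you could also use a left outer join comparing the rows' column values and keep only the rows without a partner (use a B column that is never NULL in the WHERE clause):
SELECT
A.*
FROM
A LEFT OUTER JOIN B
ON A.col = B.col AND ....
WHERE B.col IS NULL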
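To check that the corrected NOT EXISTS and LEFT JOIN forms agree, here is a small sqlite3 sketch with made-up rows in tables `a` and `b`; it also shows EXCEPT, the set-difference operator many DBMSs offer (Oracle has traditionally spelled it MINUS).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER, name TEXT)")  # old copy (illustrative data)
con.execute("CREATE TABLE b (id INTEGER, name TEXT)")  # updated copy
con.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
con.executemany("INSERT INTO b VALUES (?, ?)", [(1, "x"), (2, "Y"), (3, "z")])

# Correlated NOT EXISTS: rows of b with no identical row in a.
not_exists = con.execute("""
    SELECT * FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id AND a.name = b.name)
    ORDER BY id
""").fetchall()

# LEFT JOIN variant: keep only the b rows that found no partner (a.id is never NULL here).
left_join = con.execute("""
    SELECT b.* FROM b
    LEFT JOIN a ON a.id = b.id AND a.name = b.name
    WHERE a.id IS NULL
    ORDER BY b.id
""").fetchall()

# Set difference on whole rows.
except_rows = con.execute("SELECT * FROM b EXCEPT SELECT * FROM a ORDER BY 1").fetchall()

print(not_exists, left_join, except_rows)  # all three: [(2, 'Y'), (3, 'z')]
```

One difference to keep in mind: `=` never matches NULL to NULL, whereas EXCEPT treats two NULLs as equal, so the forms can disagree on rows with NULL columns.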

How to update numerical column of one table based on matching string column from another table in SQL

I want to update the numerical columns of one table based on matching string columns from another table, i.e.:
I have a table (let's say table1) with 100 records containing 5 string (text) columns and 10 numerical columns. I have another table (table2) with the same structure (columns) and 20 records. A few of these records contain updated data for table1 (their numerical values have changed), and the rest are new (both text and numerical columns).
I want to update the numerical columns of records in table1 whose text columns match, and insert into table1 the rows from table2 whose text columns are new.
I thought of taking the intersection of the two tables and then updating, but couldn't figure out the logic of how to update the numerical columns.
Note: I don't have any primary or unique key columns.
Please help here.
Thanks in advance.

The simplest solution would be to use two separate queries, such as:
UPDATE b
SET b.[NumericColumn] = a.[NumericColumn]
    -- , b.[NumericColumn2] = a.[NumericColumn2] etc. for each numeric column
FROM [dbo].[SourceTable] a
JOIN [dbo].[DestinationTable] b
  ON a.[StringColumn1] = b.[StringColumn1]
 AND a.[StringColumn2] = b.[StringColumn2] -- etc. for each string column

INSERT INTO [dbo].[DestinationTable] (
    [NumericColumn],
    [StringColumn1],
    [StringColumn2]
    -- , etc.
)
SELECT a.[NumericColumn],
       a.[StringColumn1],
       a.[StringColumn2]
       -- , etc.
FROM [dbo].[SourceTable] a
LEFT JOIN [dbo].[DestinationTable] b
  ON a.[StringColumn1] = b.[StringColumn1]
 AND a.[StringColumn2] = b.[StringColumn2] -- etc.
WHERE b.[NumericColumn] IS NULL
--assumes that [NumericColumn] is non-nullable.
--If there are no non-nullable columns then you
--will have to structure your query differently
This will be effective if you are working with a small dataset that does not change very frequently and you are not worried about high contention.
There are still a number of issues with this approach - most notably what happens if either the source or destination table is accessed and/or modified while the update statement is running. Some of these issues can be worked around other ways but so much depends on the context of how the tables are used that it is difficult to provide a more effective generically-applicable solution.
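The two-query pattern can be tried out in SQLite, which (in versions before 3.33) has no `UPDATE ... FROM`, so this sketch uses correlated subqueries instead. The schema (two text columns `s1`, `s2` identifying a row and one numeric column `n`) and the data are assumptions for illustration. It uses NOT EXISTS for the insert, which sidesteps the non-nullable-column requirement noted in the comment, and wraps both statements in one transaction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative schema: s1 and s2 identify a row, n is the numeric column to refresh.
for name in ("table1", "table2"):
    con.execute(f"CREATE TABLE {name} (s1 TEXT, s2 TEXT, n REAL NOT NULL)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [("x", "p", 1.0), ("y", "q", 2.0)])
con.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [("y", "q", 20.0), ("z", "r", 3.0)])

with con:  # both statements commit together, or neither does
    # 1) Update the numeric column of rows whose text columns match.
    con.execute("""
        UPDATE table1
        SET n = (SELECT t2.n FROM table2 t2
                 WHERE t2.s1 = table1.s1 AND t2.s2 = table1.s2)
        WHERE EXISTS (SELECT 1 FROM table2 t2
                      WHERE t2.s1 = table1.s1 AND t2.s2 = table1.s2)
    """)
    # 2) Insert rows from table2 whose text columns are not in table1 yet.
    con.execute("""
        INSERT INTO table1 (s1, s2, n)
        SELECT t2.s1, t2.s2, t2.n FROM table2 t2
        WHERE NOT EXISTS (SELECT 1 FROM table1 t1
                          WHERE t1.s1 = t2.s1 AND t1.s2 = t2.s2)
    """)

rows = con.execute("SELECT s1, s2, n FROM table1 ORDER BY s1").fetchall()
print(rows)  # [('x', 'p', 1.0), ('y', 'q', 20.0), ('z', 'r', 3.0)]
```

Without a unique key, duplicate text-column combinations in table2 would make the update pick an arbitrary matching row, so it is worth checking for duplicates first.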

Avoid replicating whole tables in MATLAB `join`s

I have a table T of records and fields. I want to create a new field and populate it with the result of a lookup in another table L, using one or more fields in T as a foreign key. In SQL, I can UPDATE the newly created field in T using a JOIN with L. In contrast, MATLAB's join does not update an existing table; it creates a whole new table, which then replaces the original table T. That seems like a lot of data replication just to populate one field. Is this avoided under the hood? Is there a design pattern or idiom that avoids it while staying reasonably readable and not compromising too much on compactness?
While I asked this question in the context of join, I'd be interested in general in strategies for avoiding table replication in all variations of Matlab joins.
I'll describe an example of how, for each record in Table1, the ForeignKey is used to look up Data in Table2.
Table1
-----------------------------
SomeField NewField ForeignKey
--------- -------- ----------
someData1 dummy a
someData2 dummy b
someData3 dummy a
someData4 dummy b
someData5 dummy a
Table2
--------
Key Data
--- ----
a apple
b banana
The following SQL code performs the lookup. The entry in the Data field is then concatenated with the content of field SomeField in Table1 and stored into field NewField.
UPDATE Table1 INNER JOIN Table2
ON Table1.ForeignKey = Table2.Key
SET Table1.NewField = Table1.SomeField & Table2.Data
The updated Table1 is:
Table1
------------------------------------
SomeField NewField ForeignKey
--------- --------------- ----------
someData1 someData1apple a
someData2 someData2banana b
someData3 someData3apple a
someData4 someData4banana b
someData5 someData5apple a
Interestingly, Table1 INNER JOIN Table2 isn't actually created. It is only "virtually" created to enable the calculation with which Table1 is updated. In contrast, MATLAB's join creates the actual joined table, and a separate operation is needed to do the calculation.
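The UPDATE above uses Access syntax (`UPDATE ... INNER JOIN ... SET` and `&` for concatenation). To reproduce the same result elsewhere, this sqlite3 sketch rewrites it portably as a correlated subquery, with `||` for concatenation; the `WHERE EXISTS` clause plays the role of the INNER JOIN by leaving unmatched rows untouched. The column name `Key` is quoted because KEY is an SQLite keyword.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (SomeField TEXT, NewField TEXT, ForeignKey TEXT)")
con.execute('CREATE TABLE Table2 ("Key" TEXT, Data TEXT)')
con.executemany(
    "INSERT INTO Table1 VALUES (?, 'dummy', ?)",
    [(f"someData{i}", k) for i, k in enumerate("ababa", start=1)],
)
con.executemany("INSERT INTO Table2 VALUES (?, ?)", [("a", "apple"), ("b", "banana")])

# Per-row lookup instead of UPDATE ... INNER JOIN; no joined table is materialized.
con.execute("""
    UPDATE Table1
    SET NewField = SomeField || (SELECT Data FROM Table2 WHERE Table2."Key" = Table1.ForeignKey)
    WHERE EXISTS (SELECT 1 FROM Table2 WHERE Table2."Key" = Table1.ForeignKey)
""")
new_field = [r[0] for r in con.execute("SELECT NewField FROM Table1 ORDER BY SomeField")]
print(new_field)
# ['someData1apple', 'someData2banana', 'someData3apple', 'someData4banana', 'someData5apple']
```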

The operation that you're doing must look something like this:
table1 = table({'someData1';'someData2';'someData3';'someData4';'someData5'},...
{'a';'b';'a';'b';'a'},'VariableNames',{'SomeField','ForeignKey'});
table2 = table({'a';'b'},{'apple';'banana'},'VariableNames',{'Key','Data'});
table3 = join(table1,table2,'LeftKeys','ForeignKey','RightKeys','Key')
This produces the following table:
SomeField ForeignKey Data
___________ __________ ________
'someData1' 'a' 'apple'
'someData2' 'b' 'banana'
'someData3' 'a' 'apple'
'someData4' 'b' 'banana'
'someData5' 'a' 'apple'
And then you are applying some sort of operation on the columns SomeField and Data.
I think this join function was added to make things easy for those familiar with SQL but less familiar with MATLAB syntax.
If you are still worried about copying large amounts of data (as I mentioned, this is not the case because of lazy copying), you can obtain the column Data above using the following set-based operations:
[~,index] = ismember(table1.ForeignKey,table2.Key);
data = table2.Data(index);
Here, data is a cell array identical to table3.Data. The index values computed here are essentially what a SQL engine would compute internally for this JOIN operation. If a value of table1.ForeignKey is not in table2.Key, the corresponding index value is 0 (MATLAB indexing starts at 1), so you cannot use index directly for indexing; you need an additional level of indexing to keep only the valid rows:
[valid,index] = ismember(table1.ForeignKey,table2.Key); % index is 0 where there is no match
data1 = table1.SomeField(valid);    % left-table rows that have a match
data2 = table2.Data(index(valid));  % the matching right-table values, in the same order
Note that table/join uses ismember in this exact same way, then copies the left table (which causes the copy to reference the data in the input table due to lazy copying), and adds columns to it for the right table.
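For readers outside MATLAB, the same index-based lookup can be sketched in Python. `ismember_index` is a hypothetical helper, not a library function: it returns 0-based positions and `None` for a miss, where MATLAB's `ismember` returns 1-based positions and 0. Like `ismember`, it keeps the lowest position when a key occurs more than once. The unmatched key `'c'` is made up to exercise the `valid` mask.

```python
def ismember_index(keys, table_keys):
    """Rough analogue of MATLAB's [valid, index] = ismember(keys, table_keys).

    index holds 0-based positions into table_keys, or None where a key is missing.
    """
    position = {}
    for i, k in enumerate(table_keys):
        position.setdefault(k, i)  # keep the first (lowest) position of each key
    index = [position.get(k) for k in keys]
    valid = [i is not None for i in index]
    return valid, index


some_field = ["someData1", "someData2", "someData3", "someData4", "someData5"]
foreign_key = ["a", "b", "a", "c", "a"]  # 'c' has no match (illustrative)
table2_key, table2_data = ["a", "b"], ["apple", "banana"]

valid, index = ismember_index(foreign_key, table2_key)
data1 = [f for f, ok in zip(some_field, valid) if ok]       # like table1.SomeField(valid)
data2 = [table2_data[i] for i in index if i is not None]    # like table2.Data(index(valid))
print(data1)  # ['someData1', 'someData2', 'someData3', 'someData5']
print(data2)  # ['apple', 'banana', 'apple', 'apple']
```

Only the index vector and the selected values are built; neither input table is copied as a whole.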

I’m no expert on tables, but I can give insight into how MATLAB operates with this type of data.
A MATLAB table object contains an array (for example a matrix or a cell array) for each column.
Copying a matrix in MATLAB does not copy the data, because MATLAB uses lazy copying: the copy references the same data as the original until either the copy or the original is changed, at which point a real copy is made. This behavior is well documented (1), (2).
Thus, creating a new table from whole columns of other tables copies the matrices, but these copies don't incur any actual copying of matrix contents; the new table references the data in the original tables.
But if any value in a column is changed, the whole column has to be copied so that the other table does not see the same change. The reference is internal and temporary, and invisible to the user. For all intents and purposes, it looks as if the new table contains a copy of the original data.
However, if the join operation causes rows to be swapped or removed, all of this is a moot point. The data will be copied.

How to capture rejected records in SAP BODS?

I am doing a lookup operation in BODS. The records that match need to be sent to a table and the ones that don't match need to be captured in a file.
Can you please help me out with how to capture the records that failed the lookup? It was easy in IBM DataStage, but in SAP BODS it's quite complex.

A lookup behaves like a left outer join: for non-matching rows it returns NULL in the columns fetched from the right-hand table.
If you are matching records from Table A by doing a lookup into Table B, you must be fetching fields from Table B as well.
Matching records will have values from Table B, and non-matching ones will end up with NULL values in the Table B columns.
Afterwards, separate the data with two Query transforms:
1) where the Table B column is not null: load into the desired table;
2) where the Table B column is null: load into the flat file.
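The same lookup-then-split logic, sketched outside BODS with sqlite3 and Python's csv module: the left join stands in for the lookup, and two filters stand in for the two Query transforms. Table and column names (`table_a`, `table_b`, `code`) are hypothetical, and an in-memory buffer stands in for the flat file.

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_a (id INTEGER, name TEXT)")
con.execute("CREATE TABLE table_b (id INTEGER, code TEXT NOT NULL)")  # looked-up column is never NULL
con.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "one"), (2, "two"), (3, "three")])
con.executemany("INSERT INTO table_b VALUES (?, ?)", [(1, "X"), (3, "Z")])

# The lookup: a left outer join, so unmatched rows get NULL (None) in table_b's columns.
looked_up = con.execute(
    "SELECT a.id, a.name, b.code FROM table_a a "
    "LEFT JOIN table_b b ON a.id = b.id ORDER BY a.id"
).fetchall()

# The two "Query transforms": split on whether the looked-up column is NULL.
to_table = [r for r in looked_up if r[2] is not None]
to_file = [r[:2] for r in looked_up if r[2] is None]

flat_file = io.StringIO()  # stand-in for the reject flat file
csv.writer(flat_file).writerows(to_file)
print(to_table)              # [(1, 'one', 'X'), (3, 'three', 'Z')]
print(flat_file.getvalue())  # 2,two
```

The split is only reliable if the column you test can never be NULL in Table B itself; otherwise a genuine NULL value is indistinguishable from a failed lookup.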

Creating a join list between two tables VBA

Good afternoon,
I am still quite a novice with VBA, but I am trying to create a loop that sifts through a long list of data in a given column (in my case, both tables share one common identifier, the system ID). If a system ID in a column of one table matches one in a column of the other table, a new sheet should be created that combines all of the rows associated with both sets of data into one row.
For example, if my data looked like this:
Table 1
Column A | Column B | Column C
ID       | Name     | Birthday

Table 2
Column A | Column B | Column C
Purchase | Amount   | ID
and the same ID appeared in both Table 1 and Table 2, then for each match I would like all rows associated with that ID joined together.
This would really speed up organizing the information, but I am not sure whether it is possible. Any ideas are welcome!

Since Excel is not a database program like Access, you cannot use SQL-like joins natively; you would have to program your own join function.
(Since I do not have MS Office installed, I can only give you pseudo-code.)
For Each ID in Table1
    For Each ID in Table2
        If Table1.ID = Table2.ID Then
            copy the Table1 row into the next free row of a new sheet
            copy the Table2 row into the same row, next to the Table1 data
        End If
    Next
Next
PS: I assume you are using Excel because of the vocabulary (column, worksheet, ...).
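The nested-loop join from the pseudo-code, written out in Python as a runnable illustration rather than VBA. Each sheet is modeled as a list of rows; the column positions (ID in column A of Table 1, column C of Table 2) follow the question's layout, while the sample rows are invented.

```python
def nested_loop_join(table1, table2, key1, key2):
    """Combine every pair of rows whose key columns match, Table 1 data first."""
    combined = []
    for row1 in table1:          # for each ID in Table 1
        for row2 in table2:      # for each ID in Table 2
            if row1[key1] == row2[key2]:
                combined.append(row1 + row2)  # Table 2 data placed next to Table 1 data
    return combined


table1 = [[101, "Ann", "1990-01-01"], [102, "Bob", "1985-06-15"]]      # ID, Name, Birthday
table2 = [["Book", 12.5, 101], ["Pen", 1.2, 103], ["Lamp", 30.0, 101]]  # Purchase, Amount, ID

joined = nested_loop_join(table1, table2, key1=0, key2=2)
for row in joined:
    print(row)
# [101, 'Ann', '1990-01-01', 'Book', 12.5, 101]
# [101, 'Ann', '1990-01-01', 'Lamp', 30.0, 101]
```

The double loop compares every pair of rows, which is fine for modest sheets; for very long lists, indexing Table 2 by ID first (a dictionary in Python, a `Scripting.Dictionary` in VBA) avoids the quadratic cost.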
