Unable to load distinct records via Lookup in Informatica - SQL

Something very strange is happening and I don't know why. I have created a mapping that transforms the data via an Expression transformation and loads it into the target (a file) based on a Lookup on that same target.
Source table
#CompanyName
Acne Lmtd
Acne Ltd
N/A
None
Abc Ltd
Abc Ltd
X
Mapping:
Source
-> Exp (trim, ...)
-> Lookup (source.company_name = tgt.company_name), return port is CompId
-> Filter (ISNULL(CompId))
-> Target (CompId via sequence generator, CompName)
The above mapping logic inserts duplicate company names as well: the two Abc Ltd records in the source are repeated in the target. I don't know why. I have debugged it, and the filter condition evaluates to true (company_id is null) even when the record has already been inserted into the target.
I also thought it might be a lookup cache issue, so I enabled the dynamic cache as well, but got the same result. It should have worked like this SQL query:
select company_id
from lkptarget
where company_name in (select company_name from source)
Therefore, for the second Abc Ltd the filter condition ISNULL(company_id) should have evaluated to false. But it is evaluating to true. How do I get unique records via the lookup, without using DISTINCT?
Note: the lookup used is already a dynamic lookup.

That was in fact a dynamic cache issue: NewLookupRow gets assigned a value of 0 on duplicates, so I added the condition ISNULL(COMPANYID) AND NEWLOOKUPROW = 1 to the filter, and that finally worked.

The Lookup transformation has no way to know what happens in later transformations in the mapping. It can't see results in the target itself, because the Lookup cache is loaded once at the beginning of the mapping run, using a separate connection to the database. Even if you disable caching (which would mean one query for each Lookup input row), data written to the target is not immediately committed, and so is not visible to other connections.
That's the reason to use a dynamic Lookup cache, which works by adding new rows to the Lookup cache as they are processed. However, in your case there is a catch: the company_id is created after the Lookup (which is the right place to do so), so it can't be added to the Lookup cache.
I think you could configure the Lookup so that:
You activate the options Dynamic Lookup Cache, Update Else Insert and Insert Else Update.
You use company_name for the comparison between source data and Lookup data.
You create a fake company_id field with value 0 before the Lookup and associate it with the corresponding Lookup field.
You check the Disable in comparison checkbox for the company_id field.
You can then use the predefined NewLookupRow field (it appears when you check the Dynamic Lookup Cache option), which has a value of 1 for new rows, 2 for existing rows with updates, and 0 for identical rows.
The Lookup should now output NewLookupRow = 1 for the first Abc Ltd and NewLookupRow = 0 for the second. The filter just after the Lookup should then have a condition like NewLookupRow = 1.
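For comparison, here is a rough SQL sketch of the dedup behaviour the dynamic cache gives you. This is only an illustration, assuming SQL Server syntax and a sequence named company_seq (both assumptions, not part of the original mapping):
-- Keep only the first occurrence of each company_name in this run
-- (the rows the dynamic cache would flag as NewLookupRow = 1),
-- and skip names already present in the target.
INSERT INTO target (company_id, company_name)
SELECT NEXT VALUE FOR company_seq, s.company_name
FROM (
    SELECT company_name,
           ROW_NUMBER() OVER (PARTITION BY company_name ORDER BY company_name) AS rn
    FROM source
) s
WHERE s.rn = 1
  AND NOT EXISTS (SELECT 1 FROM target t WHERE t.company_name = s.company_name);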
For more details you can have a look at the Informatica documentation:
https://docs.informatica.com/data-integration/data-services/10-2/developer-transformation-guide/dynamic-lookup-cache.html

Related

Unable to implement SCD2

I was working on SCD Type 2 and was unable to fully implement it: some scenarios were not being fulfilled. I did it in IICS. I'm finding it very difficult to cover all possible scenarios. Below is the flow:
Src
---> Lkp (on src.id = tgt.id)
---> Expression (flag = IIF(ISNULL(tgt.surrogatekey), 'Insert', IIF(NOT ISNULL(tgt.surrogatekey) AND MD5(other_non_key_cols) <> tgt.md5, 'Update')))
---> on flag = 'Insert': insert into the target (works fine)
---> on flag = 'Update': I pass the updates to two target instances of the same target table; in one I insert the update as a new row, and in the other I set tgt_end_date = lkp_start_date for the previously stored id, and active_ind becomes 'N'.
This works, but not when I receive new updates containing the same records again (duplicates): simply rerunning the mapping inserts duplicates into the target table. The changing of end_date also becomes unstable when I load multiple changes to the same record in one run; it sets all active flags to 'Y', whereas all should be 'N' except the latest, in every run. Could anyone please help with this, even in SQL if you can interpret it that way?
If I've understood your question correctly, in each run you have multiple records in your source that match a single record in your target.
If this is the case, then you need to process your data so that you have a single source record per target record before you put the data through the SCD2 process, e.g. by keeping only the latest change per key, as sketched below.
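For example, a pre-processing step like the following keeps only the latest source record per key. This is just a sketch, assuming SQL Server syntax and a change timestamp column named change_date (an assumed name, since the source layout isn't shown):
-- Keep only the most recent source row per business key before the SCD2 logic
SELECT *
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.change_date DESC) AS rn
    FROM src s
) latest
WHERE rn = 1;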

Access Issue and best function to accomplish task

New user here and I've read many threads, but can't seem to figure out the best way to accomplish my task.
Current issue: I'm using a Switch function in Access to accomplish my goal. Here is what I have, but I'm getting a syntax error:
UPDATE all_rugs_prod
SET construction_facet =
Switch(
construction = Machine Woven, Machine Made,
construction = Machine Made, Machine Made,
construction = Printed, Printed,
construction = Hand Hooked, Hand Hooked
)
all_rugs_prod is the database table,
construction_facet is the field I want the value to be returned in,
construction is the field it is going to search in.
I'm very new to all this, so I need as much help as I can get.
Backdrop: I'm taking, say, database 1, then mapping/matching its fields to database 2. Database 2 has many other fields, added only in database 2, that require data to be populated.
I created an append query from database 1 into database 2 and matched those fields appended from database 1 that match database 2.
My biggest issue is that I need to normalize/map data in database 2. Example: in database 2 there is a field from database 1 that has many different text values. I need to search that field and bring back a predetermined text value based on the predetermined list it fits into. So if in database 2.field7 the text is "aqua blue", I need to normalize/map it to return "blue" in database 2.field8, and so on and so forth. What is the best way to accomplish this? The list, in some cases (say, various colors), is very long. Thanks!
The syntax error arises because you need to enclose literal strings in double quotes, e.g.
"Machine Woven"
Otherwise each word separated by whitespace will be interpreted as a field (as opposed to a literal string), which, if not found in the source dataset, will result in the fields being interpreted as parameters requiring a value to be supplied by the user; more critically, this will result in too many arguments being supplied to the Switch function.
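With the literals quoted, the original query becomes (note that Switch returns Null for any row whose construction value matches none of the expressions):
UPDATE all_rugs_prod
SET construction_facet = Switch(
    construction = "Machine Woven", "Machine Made",
    construction = "Machine Made", "Machine Made",
    construction = "Printed", "Printed",
    construction = "Hand Hooked", "Hand Hooked"
)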
However, since the only records whose value actually changes are those containing "Machine Woven" in the construction field (the other branches map values to themselves), your query could be simplified to:
update all_rugs_prod
set construction_facet = "Machine Made"
where construction = "Machine Woven"
For a situation in which many possible values in place of "Machine Woven" are being mapped to "Machine Made", I would suggest creating a separate mapping table, e.g.:
Mapping_Table
+---------------------+--------------+
| map_from | map_to |
+---------------------+--------------+
| Machine Woven | Machine Made |
| Machine Built | Machine Made |
| Machine Constructed | Machine Made |
+---------------------+--------------+
And then use a simple update query with an inner join to the above mapping table to perform an implicit selection and update to the new value, e.g.:
update
all_rugs_prod inner join mapping_table on
all_rugs_prod.construction = mapping_table.map_from
set
all_rugs_prod.construction_facet = mapping_table.map_to

SSIS Inserting incrementing ID with starting range into multiple tables at a time

Are there one or more reliable ways to solve an easy task?
I've got a number of XML files which will be converted into 6 SQL tables (via SSIS).
Before the end of this process I need to add a new column (in fact, one common to all tables) to each of them.
This column represents an ID with a starting range and a +1 increment step, like (350000, 1).
Yes, I know how to solve it at the SSMS/SQL stage, but I need a solution at SSIS's pre-SQL conversion level.
I'm sure there are well-known patterns to deal with this.
I am going to take a stab at this. Just to be clear, I don't have a lot of information in your question to go on.
Most XML files that I have dealt with have a common element (let's call it a customer) with one-to-many attributes (these can be invoices, addresses, emails, contacts, etc.).
So your table structure will be somewhat star-shaped around the customer.
Your XML will have core customer information on a 1-to-1 basis that can be loaded into a single main table, plus arrays of invoices, addresses, etc. Those arrays become their own tables referencing the customer as a key.
I think you are asking how to create that key.
Load the customer data first and return the identity column to be used as a foreign key when loading the other tables.
I find it easiest to do this in a Script Component. I'm only going to explain how to get the key back; I personally would handle the whole process in C# (deserializing and all).
Add this to the using block:
using System.Data.OleDb;
Add this to your main or row-processing method, depending on where the script task / component is:
string SQL = @"INSERT INTO Customer(CustName, field1, field2,...)
               VALUES (?,?,?,...); SELECT CAST(SCOPE_IDENTITY() AS int);";
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = new OleDbConnection(connectionString); // assumes a connection string available to the script
cmd.CommandType = System.Data.CommandType.Text;
cmd.CommandText = SQL;
cmd.Parameters.AddWithValue("@p1", Row.CustName); // Row.CustName assumes an input column named CustName; OleDb parameters are positional
...
cmd.Connection.Open();
int CustomerKey = (int)cmd.ExecuteScalar(); // ExecuteScalar returns the value in the first row / first column, which in our case is SCOPE_IDENTITY()
cmd.Connection.Close();
Now you can use CustomerKey for all of the other tables.
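A child-table load would then simply bind that key as the foreign key. The Invoice table and its columns below are hypothetical names for illustration:
-- CustomerKey comes from the script variable returned above
INSERT INTO Invoice (CustomerKey, InvoiceNo, Amount)
VALUES (?, ?, ?);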

BigQuery: return nested results without flattening and without using a table

It is possible to return nested results (RECORD type) if the noflatten_results flag is specified, but is it possible to just view them on screen without writing them to a table first?
For example, here is a simple user table (my actual table is large: 400+ columns with multiple levels of nesting):
ID,
name: {first, last}
I want to view a particular user's record and display it in my application, so my query is:
SELECT * FROM dataset.user WHERE id=423421 limit 1
Is it possible to return the result directly?
You should write your output to a "temp" table with the noflatten_results option (with an expiration set so the table is purged after use) and serve your client out of this temp table, all on the fly.
Keep in mind that no matter how small the "temp" table is, if you query it (in the second step above) you will be billed for at least 10 MB, so you are better off using the Tabledata.list API for that step (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list), which is free!
If you try to get repeated records, the query will fail in the interface/BQ console with the error:
Error: Cannot output multiple independently repeated fields at the same time.
The way to get past this error is to FLATTEN your output.
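As a sketch in legacy BigQuery SQL: assuming the user table also had a repeated field called addresses (a hypothetical name, since the real schema isn't shown), flattening one repeated field at a time gets past the error:
-- Legacy SQL: FLATTEN expands one repeated field so it can be output
SELECT id, name.first, name.last, addresses.city
FROM FLATTEN([dataset.user], addresses)
WHERE id = 423421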

SQL Server: Remove substrings from field data by iterating through a table of city names

I have two databases, Database A and Database B.
Database A contains some data which needs to be placed in a table in Database B. However, before that can happen, some of that data must be “cleaned up” in the following way:
The table in Database A which contains the data to be placed in Database B has a field called “Desc.” Every now and then the users of the system put city names in with the data they enter into the “Desc” field. For example: a user may type in “Move furniture to new cubicle. New York. Add electric.”
Before that data can be imported into Database B the word “New York” needs to be removed from that data so that it only reads “Move furniture to new cubicle. Add electric.” However—and this is important—the original data in Database A must remain untouched. In other words, Database A’s data will still read “Move furniture to new cubicle. New York. Add electric,” while the data in Database B will read “Move furniture to new cubicle. Add electric.”
Database B contains a table which has a list of the city names which need to be removed from the “Desc” field data from Database A before being placed in Database B.
How do I construct a stored procedure or function which will grab the data from Database A, then iterate through the Cities table in Database B and if it finds a city name in the “Desc” field will remove it while keeping the rest of the information in that field thus creating a recordset which I can then use to populate the appropriate table in Database B?
I have tried several things but still haven’t cracked it. Yet I’m sure this is probably fairly easy. Any help is greatly appreciated!
Thanks.
EDIT:
The latest thing I have tried to solve this problem is this:
DECLARE @cityName VarChar(50)

While (Select COUNT(*) From ABCScanSQL.dbo.tblDiscardCitiesList) > 0
Begin
    Select @cityName = ABCScanSQL.dbo.tblDiscardCitiesList.CityName FROM ABCScanSQL.dbo.tblDiscardCitiesList

    SELECT JOB_NO, LTRIM(RTRIM(SUBSTRING(JOB_NO, (LEN(job_no) - 2), 5))) AS LOCATION
        , JOB_DESC, [Date_End], REPLACE(Job_Desc, @cityName, ' ') AS NoCity
    FROM fmcs_tables.dbo.Jobt WHERE Job_No like '%loc%'
End
"Job_Desc" is the field which needs to have the city names removed.
This is a data quality issue. You can always make a copy of the [description] in Database A and call it [cleaned_desc].
One simple solution is to write a function that does the following:
1 - Read the data from [tbl_remove_these_words]. These are the phrases you want removed.
2 - Compare the input, @var_description, to the rows in the table.
3 - Upon a match, replace it with an empty string.
This solution depends upon a cleansing table that you maintain and update.
Run an update query that feeds the input from [description] through a call to [fn_remove_these_words] and sets [cleaned_desc] to the output, as sketched below.
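A minimal sketch of that function plus the update, assuming the cleansing table is [tbl_remove_these_words] with a single [phrase] column and the copy column is [cleaned_desc] (all names taken from the description above, not from a real schema):
CREATE FUNCTION dbo.fn_remove_these_words (@var_description VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- Apply REPLACE once per phrase in the cleansing table;
    -- the variable accumulates the cleaned text row by row.
    SELECT @var_description = REPLACE(@var_description, phrase, '')
    FROM dbo.tbl_remove_these_words;

    RETURN LTRIM(RTRIM(@var_description));
END;
GO

-- Populate the copied column; the original [description] stays untouched
UPDATE dbo.source_table
SET cleaned_desc = dbo.fn_remove_these_words([description]);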
Another solution is to look at products like the Melissa Data (DQ) product for SSIS, or Data Quality Services in the SQL Server stack, to give you an application framework for solving the problem.