Convert columns to rows in pairs using Informatica PowerCenter or SQL

I have a requirement where I have to track changes on a set of columns and display the old and new values.
I have a source file with employee attributes. I did a lookup on the employee table and returned the employee attributes on which I am tracking changes. I created a flag by comparing the columns from the source and the lookup, and I also have a Router to filter on update_flag = 'Y'. I need to convert:
employee_id,name,old_department_id,new_department_id,old_salary,new_salary
1,SAM,10,20,100,200
to
employee_id,name,employee_attribute,old_value,new_value
1,SAM,department_id,10,20
1,SAM,salary,100,200

You could use a Router with one output group for each record you want to create. The Router can emit multiple output records for a single input record:
Group1 : old_department_id != new_department_id
Group2 : old_salary != new_salary
After the Router, use an Expression on each group to build the record values you want, such as adding the employee_attribute column, then combine the outputs of the Expressions with a Union.
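If you would rather do it in plain SQL, a minimal sketch of the same unpivot follows, assuming the joined source/lookup rows land in a staging table called emp_changes (a hypothetical name):
-- each branch plays the role of one Router group; the string literal
-- plays the role of the Expression adding employee_attribute
select employee_id, name, 'department_id' as employee_attribute,
       old_department_id as old_value, new_department_id as new_value
from emp_changes
where old_department_id <> new_department_id
union all
select employee_id, name, 'salary',
       old_salary, new_salary
from emp_changes
where old_salary <> new_salary;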


Handling JSON data in Snowflake

I have a table which contains JSON file data in each row; it gets loaded into my Snowflake table every week. I am extracting values from the JSON files into another table. When the data is loaded in JSON format there are multiple entries for the same ID, so when I extract values from the JSON into a table there are duplicate rows. How do I tackle them in order to get only the distinct rows? My select query looks something like this:
select
json_data:data[0].attributes."Additional Invoice?":: string as "Additional Invoice?",
json_data:data[0].attributes."Additional PO?":: string as "Additional PO?",
json_data:data[0].attributes."Aggregate Contract Value":: number as "Aggreagate Contract Value" ,
json_data:data[0].attributes."Annualized Baseline Spend" :: number as "Annualized Baseline Spend",
json_data:data[0].id ::number as ID,
json_data:data[0].type::string as TYPE
from scout_projects order by ID
A screenshot of the scout_projects file is attached.
The attached screenshot is the output from the given query; as you can see, the ID values repeat but there are only 2 unique rows. I want my query to return only those 2 unique rows.
select distinct json_data:data[0].id :: number as ID from scout_projects
What approach should I take?
I tried using a subquery, but it gave me the Snowflake error "single-row subquery returns more than one row", which is obvious, so I need a way around it.
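One possible approach, sketched under the assumption that the duplicate rows are identical across every extracted column: apply DISTINCT to the whole projection rather than a single column, for example
select distinct
    json_data:data[0].id::number as ID,
    json_data:data[0].type::string as TYPE
from scout_projects
order by ID;
If the duplicates differ in some columns, DISTINCT will not collapse them, and you would instead need to pick one row per ID, e.g. with Snowflake's QUALIFY clause and row_number().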

Azure Data Factory Exists Transformation

Is there a way to compare two tables and then use the case() function afterwards?
I am trying to add a new column based on the Exists transformation. In SQL I do it like this:
isnull((select 'YES' from sales where salesperson = t1.salesperson group by salesperson), 'NO') AS registeredSales
T1 is the personnel table.
Or should I bring the table into the join stream and then use the case() function to compare the two columns?
If there's another way to compare these two streams, I would be pleased to hear it.
Thanks.
Flat files in a data lake can also be compared. We can use a Derived Column in the data flow to generate a new column.
I created a data flow demo containing two sources: CustomerSource (customer.csv stored in datalake2) and SalesSource (sales.csv stored in datalake2; it contains only one column), as follows.
Then I join the two sources on the column CustomerId.
Then I use a Select activity to give an alias to the CustomerId from SalesSource.
In the Derived Column, I select Add column and enter the expression iifNull(SalesCustomerID, 'NO', 'YES') to generate a new column named 'registeredSales'.
The last column of the result shows the registeredSales values.
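For reference, a rough SQL equivalent of what this data flow computes (a sketch; the customer and sales names mirror the two sources and are illustrative):
select c.*,
       case when s.CustomerId is null then 'NO' else 'YES' end as registeredSales
from customer c
left join (select distinct CustomerId from sales) s
  on c.CustomerId = s.CustomerId;
The left join plus null check is what the Join, Select, and Derived Column steps express in the data flow.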

Custom column in Custom SQL on BigQuery

I'm stuck on an issue in Tableau when trying to run a Custom SQL query with a string parameter. I'd like to query one column dynamically from a certain table on BigQuery.
My SQL looks like this:
select <Parameters.column for research> as column,
count(*) as N
from table_name
where date=<Parameters.date>
group by 1
Here I'm trying to use the parameter as a column name, but unfortunately I receive a string column holding the parameter's value instead.
Is it possible to execute my request? If it's doable, how should the Custom SQL be written?
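Tableau substitutes parameters as literal values, never as identifiers, which is why the query returns the parameter's text rather than the column. A common workaround, sketched here assuming the candidate columns are known in advance (department and region are made-up names), is to map the parameter onto the columns with a CASE expression:
select case <Parameters.column for research>
         when 'department' then department
         when 'region' then region
       end as research_column,
       count(*) as N
from table_name
where date = <Parameters.date>
group by 1
Every selectable column has to be listed explicitly, but BigQuery then sees ordinary column references, so the query stays valid.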

Regex Table name pattern to exclude certain tables in CaptureChangeMySQL

How can I exclude certain tables (by name) in NiFi's CaptureChangeMySQL processor by passing a table name pattern?
For example, I have 500 tables and their corresponding history tables.
Capture change should work for Employee, Order, etc., but not for their corresponding tables EmployeeHistory, OrderHistory, and so on.
In short, tables with the suffix 'History' should be filtered out by the processor.
I tried
1) $.table_name:equals('DeviceHistory'):not() - did not work
2) ${table_name:equals('*History'):not()} - did not work either
From the NiFi CaptureChangeMySQL processor documentation, the Table Name Pattern property is described as:
A regular expression (regex) for matching CDC events affecting matching tables. The regex must match the table name as it is stored in the database. If the property is not set, no events will be filtered based on table name.
This should be a Java regex string. Looking at the NiFi CaptureChangeMySQL processor source code, here is a code snippet of how this value is used:
// Should we skip this table? Yes if we've specified a DB or table name pattern and they don't match
skipTable = (databaseNamePattern != null && !databaseNamePattern.matcher(data.getDatabase()).matches())
|| (tableNamePattern != null && !tableNamePattern.matcher(data.getTable()).matches());
where tableNamePattern holds Pattern.compile(YOUR_TABLE_NAME_PATTERN).
I wrote a sample program based on this and got the desired behavior using this regex string:
^(?:(?!History).)*$
Here is a demo: https://regex101.com/r/VWuSTy/1/tests
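As a quick sanity check of the pattern outside NiFi, here is an illustrative SQL sketch (assuming MySQL 8.0+, whose ICU regex engine supports the lookahead just as Java's regex does):
select t.name,
       t.name regexp '^(?:(?!History).)*$' as kept
from (select 'Employee' as name
      union all
      select 'EmployeeHistory') t;
-- Employee        -> kept = 1
-- EmployeeHistory -> kept = 0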

Dynamic SQL queries as parameter

I need a report where the user has to choose 2 parameters. The first parameter contains the years (2017, 2016, ...), and the second one contains the process ID. Depending on the process the user chooses, the SQL statement will be one or another. The year parameter is part of the WHERE clause of the SQL contained in the second parameter.
So I have this report with 2 parameters (param_year, Indicador). The query parameter is built using a table datasource, where the IDs column contains the SQL statements and the Values column contains the text the user must select from.
So next I set ${Indicador} as the SQL statement in the JDBC connection that I have made to the database. This gives me an SQL error:
"Failed at query: ${Indicador}".
Any suggestions will be appreciated. Thanks in advance.
Another option is to create multiple datasources in your master/sub report, then select the appropriate datasource using a PRD expression on Master/sub Report -> Attributes -> query -> name attribute.
More detailed explanation:
Create a query (I mean a query as a PRD object, which uses the PRD datasource) for every SQL string you need, and move the SQL strings from the parameter table into Report Designer query definitions.
Replace the SQL strings in your parameter table with the names of the corresponding queries.
Use the value of your parameter (which should be equal to the PRD query name) as the value of Master/sub Report -> Attributes -> query -> name attribute.
You need Pentaho Data Integration to do this kind of dynamic query.
If the table structure (output columns) of both queries is the same, you could put them together into one big SQL statement with UNION ALL and add to each query "WHERE ${Indicador} = ValueToRunThisQuery", as shown below.
The optimizer should be smart enough to know that the not-selected subquery will return zero rows and not even run it. You can supply a few NULL columns if one query has fewer columns, but the data types have to match for the filled columns.
If the output table structure differs between the two queries, they should be in different datasources, or even different reports.
SELECT ID, BLA, BLA, BLA, ONLY_IN_A
FROM TABLE_A
WHERE ${Indicador} = 'S010'
UNION ALL
SELECT ID, BLA, BLA, BLA, NULL
FROM TABLE_B
WHERE ${Indicador} = 'S020'