How to view the problematic records in IIDR CDC - ibm-data-replication

We are using IBM Data Replication CDC. Our target system is IBM Integrated Analytics System (IIAS), and we are using the external table mirror bulk apply method to apply changes to IIAS.
When we get errors we want to see the problematic records, but we couldn't find the file where rejected records are written. Is there a parameter we can set in CDC to specify the path for rejected records, or is there a default path for them?

This is Yukun from the IBM Data Replication team. The following answer may help you with the problem.
Parameter:
You can enable the global_trace_hours parameter, which will write the conflicting rows to the target's trace "on" directory.
The path for rejected records:
Rejected records are written to a .bad file whose location can be configured through the external table option "LOGDIR" in /conf/db2loadoptions.xml.
The rejected-records file has the format "database.schema.external-table-name.file-name.application-handle.id.bad" and is available in the refresh loader path where the external table files are written.
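If it helps to visualize, here is a rough sketch of how LOGDIR appears on a Db2 external table definition; the table, column, and path names are made up, and in a CDC setup the option is normally driven by db2loadoptions.xml rather than written by hand:

```sql
-- Illustrative only: a Db2 external table whose LOGDIR points at the
-- directory where the .log and .bad (rejected-record) files are written.
CREATE EXTERNAL TABLE myschema.orders_ext (
  order_id   INTEGER,
  order_date DATE
)
USING (
  DATAOBJECT '/refresh/loader/path/orders.csv'   -- data file written by the apply
  DELIMITER  ','
  LOGDIR     '/refresh/loader/path/reject'       -- .bad files land here
);
```

With a setting like this, the .bad files following the naming pattern above would appear under the LOGDIR directory rather than alongside the data files.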

Related

Azure Data Factory Copy Activity - Does not fail, despite being unsuccessful

Overview
I have a Data Factory copy activity in a pipeline that moves tabular data (.csv) from a file in Azure Data Lake into a SQL database landing schema.
The table definitions will be refreshed on each load, in principle, so there should not be errors. However, I want to log errors in the case that the table exists and the file cannot be mapped to the destination.
I anticipate the activity failing when it cannot match a column to the destination table definition. Strangely, my process succeeds despite not loading the file into the database.
Testing
I added a column in my source file.
I did not update the destination database table definition.
I truncated the destination table.
I unset fault tolerance.
I deleted all copy activity logs (nascent development environment, so this is OK).
I ran the pipeline.
I checked the Data Factory pipeline output: a 100% success rating!?
I queried the destination table (empty).
I opened each log file and looked for relevant information (nothing, just headers per copy activity call).
Copy activity settings

BigQuery Scheduled Data Transfer throws "Incompatible table partitioning specification." Error - but error message is truncated

I'm using the new BQ Data Transfer UI, and upon scheduling a Data Transfer, the transfer fails.
The error message in Run History isn't terribly helpful, as it appears to be truncated.
Incompatible table partitioning specification. Expects partitioning specification interval(type:hour), but input partitioning specification is ; JobID: xxxxxxxxxxxx
Note the part of the error that says "but input partitioning specification is" with nothing before the semicolon. The error seems to be truncated.
Some details about the run:
The run imports data from a CSV file located in a GCS bucket on a nightly basis. Once the file is successfully ingested, the process deletes it. The target table in BQ is a partitioned table using the default partition pseudo-column (_PARTITIONTIME).
What I have done so far:
Reran the scheduled Data Transfer -- which failed and threw the same error
Deleted the target table in BQ and recreated it with different partition specifications (day, hour, month), then reran the scheduled transfer -- it failed and threw the same error.
Imported the data manually (I downloaded the file from GCS and uploaded it from my local machine) using the BQ UI (Create Table, append to the specific table) -- worked perfectly.
Checked to see whether this was a known issue here on Stack Overflow and only found this (which is now closed) -- close, but not exactly the issue: (BigQuery Data Transfer Service with BigQuery partitioned table)
What I'm holding off on doing, since it would take a bit more work:
Change the schema of the target BQ table to include a column specifically for partitioning.
Include a system-generated timestamp in the original file inside of GCS and ensure the process recognizes this as the partitioning field.
Am I doing something wrong? Or is this a known issue?
Alright, I believe I have solved this. It looks like you need to include runtime parameters in your destination table if the destination table is partitioned.
https://cloud.google.com/bigquery-transfer/docs/gcs-transfer-parameters
Specifically, the section called "Runtime Parameter Examples" here: https://cloud.google.com/bigquery-transfer/docs/gcs-transfer-parameters#loading_a_snapshot_of_all_data_into_an_ingestion-time_partitioned_table
They also advise that minutes cannot be specified in these parameters.
You will need to append the parameters to your destination table details as shown below:
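As a rough sketch of the idea, assuming an hourly ingestion-time partitioned target named mydataset.mytable (the names and columns are placeholders, not from the original post):

```sql
-- Illustrative only: an ingestion-time table partitioned by hour, the kind of
-- target the "Expects partitioning specification interval(type:hour)" error refers to.
CREATE TABLE mydataset.mytable (
  id      INT64,
  payload STRING
)
PARTITION BY TIMESTAMP_TRUNC(_PARTITIONTIME, HOUR);

-- The fix from the answer is then in the transfer configuration: the destination
-- table name carries a runtime parameter so each run loads a specific partition,
-- e.g. mytable${run_date} for a daily partition, or a {run_time} parameter
-- formatted down to the hour for hourly partitions (minutes cannot be specified).
```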

SQL Polybase External Table - Dealing with file metadata

I cannot find any reference for dealing with file metadata when creating an External Table starting from a partitioned source of files. More precisely: I have a set of partitioned parquet files. The partition strategy is in the form:
{YEAR}/{MONTH}/{filename}.parquet
Now I can create an external table referencing the whole set, using LOCATION pointing at the root of the partition hierarchy and relying on a recursive strategy.
LOCATION = 'folder_or_filepath' specifies the folder or the file path and file name for the actual data in Hadoop or Azure blob storage. The location starts from the root folder. The root folder is the data location specified in the external data source.
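For context, a minimal sketch of such an external table over the folder root, assuming an already-defined data source and Parquet file format (all names here are illustrative):

```sql
-- Illustrative only: LOCATION points at the root of the
-- {YEAR}/{MONTH}/{filename}.parquet hierarchy, so all files underneath are
-- read, but YEAR and MONTH from the folder path are not exposed as columns.
CREATE EXTERNAL TABLE dbo.SalesExternal (
  SaleId   INT,
  Amount   DECIMAL(18, 2),
  SaleDate DATE
)
WITH (
  LOCATION    = '/sales/',            -- root of the partition hierarchy
  DATA_SOURCE = MyAzureBlobStorage,   -- assumed external data source
  FILE_FORMAT = MyParquetFileFormat   -- assumed Parquet file format
);
```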
In this context, it would be crucial to be able to access partitioning metadata like {YEAR}, {MONTH} or {filename} and store them as columns in the newly created external table for further use.
From my research, accessing file metadata seems to be a missing feature right now, but I'm not sure.
For sure, it is not possible to leverage the PARTITION BY functionality, as evidenced here:
https://feedback.azure.com/forums/307516-azure-synapse-analytics/suggestions/19520860-polybase-partitioned-by-functionality-when-creati
Is there a mitigation strategy? I'm about to set up a Data Factory Mapping Data Flow to do the dirty work, but I'm still unsure about these two options:
Reducing the partitioned set to a single file, adding metadata columns to each row;
Just adding metadata columns to each file and leaving the partitioned hierarchy as it is.
Bonus: any suggestion?

Azure SQL Database sys.fn_get_audit_file or sys.fn_xe_file_target_read_file troubles

I'm having trouble on an Azure SQL Database where I'm trying to read DB audit logs.
Both functions, sys.fn_get_audit_file and sys.fn_xe_file_target_read_file, should be able to read a file.
But whatever I do, I'm getting blank tables. Even if I specify a non-existent file, I receive a table with zero records instead of an error.
So I'm afraid it's something else.
My login is in the db_owner group.
Any suggestions?
I found that I could only read XEL files by using the same server and the same database context that they were created from. For example, consider the following scenario:
ServerA is the Azure Synapse instance I was creating the audit XEL files from, all related to DatabaseA
ServerB is a normal SQL instance that I want to read the XEL files on
Test 1:
Using ServerB, try to read file directly from blob storage
Result: 0 rows returned, no error message
Test 2:
Using ServerB, download the XEL files locally, and try to read from the local copy
Result: 0 rows returned, no error message
Test 3:
Using ServerA, with the current DB = 'master', try to read file directly from blob storage
Result: 0 rows returned, no error message
Test 4:
Using ServerA, with the current DB = 'DatabaseA', try to read file directly from blob storage
Result: works perfectly
Because I really wanted to read the files from ServerB, I also tried creating a credential there (CREATE CREDENTIAL) that was able to read and write to my blob storage account. Unfortunately, that didn't make any difference: a repeat of Test 1 gave the same result as before.
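For reference, the call that worked in Test 4 looked roughly like this; the storage account and container path are placeholders, not the real ones:

```sql
-- Run on ServerA with the database context set to DatabaseA (Test 4).
-- The same call returned 0 rows and no error from ServerB or from master.
SELECT *
FROM sys.fn_get_audit_file(
    'https://mystorageaccount.blob.core.windows.net/sqldbauditlogs/DatabaseA/*.xel',
    DEFAULT,
    DEFAULT);
```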

Iterating through folder - handling files that don't fit schema

I have a directory containing multiple .xlsx files, and what I want to do is insert the data from these files into a database.
So far I have solved this by using tFileList -> tFileInputExcel -> tPostgresOutput.
My problem begins when one of these files doesn't match the defined schema and returns an error, resulting in an interruption of the workflow.
What I need to figure out is whether it's possible to skip that file (moving it to another folder, for instance) and continue iterating over the rest of the files.
If I check the option "Die on error", the process ends and doesn't process the rest of the files.
I would approach this by making the initial input schema on the tFileInputExcel all strings.
After reading the file, I would then validate the schema using a tSchemaComplianceCheck set to "Use another schema for compliance check".
You should then be able to connect a reject link from the tSchemaComplianceCheck to a tFileCopy configured to move the file to a new directory (if you want it to move the file, just tick "Remove source file").
Here's a quick example:
With the following set as the other schema for the compliance check (notice how it now checks that id and age are Integers):
And then to move the file:
Your main flow from the tSchemaComplianceCheck can carry on using just strings if you are inserting into a database. You might want to use a tConvertType to change things back to the correct data types after this if you are doing any processing that requires proper data types, or if you are using your tPostgresOutput component to create the table as well.