How to validate Column Names and Column Order in Azure Data Factory

I want to read the column names from a file stored in Azure Files and then validate the column names and their sequence, e.g. "First_Column" = "First_Column", "Second_Column" = "Second_Column", etc.; the order should also match. Please suggest a way to do this in Azure Data Factory.

Update:
Alternatively, we can use a Lookup activity to read the headers, but the condition expression gets a little complex.
In the If Condition1 activity we can use the expression:
@and(and(equals(activity('Lookup1').output.firstRow.Prop_0,'First_Column'),equals(activity('Lookup1').output.firstRow.Prop_1,'Second_Column')),equals(activity('Lookup1').output.firstRow.Prop_2,'Third_Column'))
We can also validate the column names and their order in a data flow, using column patterns in a Derived Column transformation.
For example, take a source CSV file whose expected headers are First_Column, Second_Column and Third_Column.
In the source settings I don't select First row as header, so the header row of the CSV file is read into the data flow as data.
Then I use SurrogateKey1 to add a row_no to the data.
At the ConditionalSplit1 transformation, I use row_no == 1 to filter out the header row.
At the DerivedColumn1 transformation, I use several column patterns to validate the column names and their order.
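As a rough sketch of what those column patterns could look like (the 'matched'/'not matched' labels are illustrative), each pattern matches one header position and compares its value against the expected name:
matches: position == 1    column name: $$    value: iif(toString($$) == 'First_Column', 'matched', 'not matched')
matches: position == 2    column name: $$    value: iif(toString($$) == 'Second_Column', 'matched', 'not matched')
matches: position == 3    column name: $$    value: iif(toString($$) == 'Third_Column', 'matched', 'not matched')
Because only the header row reaches this branch, $$ here holds the header text at each position, so any 'not matched' value means a wrong name or a wrong order.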

Related

How to split column value using azure data factory

I have a source CSV file in which one column has multiple values (data separated by commas), so I want to extract that particular column using Data Factory and store those multiple values in a table (in a database) under different column names.
Could you please suggest how I should design that Azure Data Factory pipeline?
You can use the split function in a Data Flow Derived Column transformation to split the column into multiple columns and load them to the sink database as below.
Source transformation:
Derived Column transformation:
Using the split() function, the column is split on the delimiter, which returns an array.
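For example, the new derived columns could be defined roughly like this (Column1 and Column2 are placeholder names; name is the source column, the delimiter is a comma, and data flow array indexing is 1-based):
Column1 : split(name, ',')[1]
Column2 : split(name, ',')[2]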
Derived Column data preview:
Here 2 new columns are added in the Derived Column transformation, which store the split data from the source column (name).
Select transformation (optional):
In the Select transformation, we can remove the columns that are not needed in the sink and keep only the required columns.
Sink:
Connect the sink to the database and map the columns to load the data.

Adding a dynamic column in copy activity of azure data factory

I am using a Data Flow in my Azure Data Factory pipeline to copy data from one Cosmos DB collection to another Cosmos DB collection. I am using the Cosmos DB SQL API as the source and sink datasets.
The problem is that when copying the documents from one collection to the other, I would like to add an additional column whose value is the same as one of the existing keys in the JSON. I am trying the Additional columns option in the source settings, but I am not able to figure out how to assign an existing column's value there. Can anyone help with this?
In the case of a Copy activity, you can assign an existing column's value to a new column under Additional columns by setting the value to $$COLUMN and specifying the name of the source column to be duplicated.
If you are adding the new column in a data flow, you can achieve this with a Derived Column transformation.
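In the copy activity JSON, the source's additionalColumns setting then looks roughly like this (newKey and existingKey are placeholder names):
"additionalColumns": [
    {
        "name": "newKey",
        "value": "$$COLUMN:existingKey"
    }
]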

Create table schema and load data in bigquery table using source google drive

I am creating a table using Google Drive as the source and Google Sheets as the format.
I have selected "Drive" as the value for "Create table from". For the file format, I selected Google Sheet.
I also selected auto-detect schema and the input parameters.
It creates the table, but the first row of the sheet is loaded as data instead of being used as the table fields.
Kindly tell me what I need to do to get the first row of the sheet treated as the table column names and not as data.
It would have been helpful if you could include a screenshot of the top few rows of the file you're trying to upload, at least to show the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar data types while schema auto detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
You have an option to skip the header row under Advanced options. Simply put 1 as the number of header rows to skip (your first row is where your header is). It will skip the first row as data and use it for your column names.
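If you would rather define the table with SQL than through the UI, the equivalent external table DDL should look roughly like this (the dataset name, table name and sheet URL are placeholders):
CREATE OR REPLACE EXTERNAL TABLE my_dataset.my_sheet_table
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/your-sheet-id'],
  skip_leading_rows = 1
);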

How to use/do where in column of a lookup in Splunk Search Query

I want to search with a field that matches any of the values in a lookup table.
For now, I have used the where/in query below. But I still want to query against the lookup table instead of manually putting all those values in double quotes inside the in clause.
| where in(search, "abcd", "bcda", "efsg", "zyca")
First, you need to create a lookup field in the Splunk Lookup manager. Here you can specify a CSV file or KMZ file as the lookup. You will name the lookup definition here too. Be sure to share this lookup definition with the applications that will use it.
Once you have a lookup definition created, you can use it in a query with the Lookup Command. Say you named your lookup definition "my_lookup_csv", and your lookup column in your search is "event_column", and your csv column names are "column1", "column2", etc. Your search query will now end in:
| lookup my_lookup_csv column1 as event_column
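Alternatively, if the goal is simply to keep events whose field matches any value in the lookup (as in the original where/in query), an inputlookup subsearch can build that filter. Assuming the same names as above (my_lookup_csv, column1, event_column) and a placeholder index:
index=your_index [ | inputlookup my_lookup_csv | fields column1 | rename column1 AS event_column ]
The subsearch expands into an OR of event_column=value terms, so you don't have to list the values by hand.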

Talend - Read xml schema of Ldap from an xml file

What am I looking for?
I want to read the schema of an LDAP input from an XML file.
Info:
The user will define the attributes that he wants in the XML file.
The job will retrieve from the LDAP directory only those attributes that are defined in the XML file. How can I do that?
I am new to Talend and I can't find any question on this on SO.
Honestly, this is very painful to do properly and I'd seriously reconsider why you need to limit the columns back from the LDAP service and not just ignore the extraneous columns.
First of all you need to parse your XML input to get the requested columns and drop that into a list and then lob that into the globalMap.
What you're going to have to do is read in the entire output with all the columns from a correctly configured tLDAPInput component but with the schema for the component set to have a single dynamic column.
From here you'll need to use a tJavaRow/tJavaFlex component to loop through the dynamic column's metadata, retrieve each column's name, and, if it matches one of the expected columns from your XML input, write the value into an output column, as in the sketch below.
The output schema for your tJavaRow/tJavaFlex will need to contain as many columns as could possibly be returned (so every LDAP column for your service), populating them only as needed. Alternatively you could output another dynamic column, which means you don't need fixed schema columns, but you'd then have to add a meta column (a column inside the dynamic column) for each matching column name.
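As a rough sketch of the tJavaRow main code, assuming the dynamic input column is named dyn, the attribute list parsed from the XML was stored in the globalMap under the (illustrative) key "expectedColumns", and cn/mail are example output columns:

// Attribute names requested in the XML file, stored earlier in the globalMap.
java.util.List<String> expected = (java.util.List<String>) globalMap.get("expectedColumns");

// The single dynamic column coming from the tLDAPInput component.
routines.system.Dynamic dyn = input_row.dyn;

for (int i = 0; i < dyn.getColumnCount(); i++) {
    String colName = dyn.getColumnMetadata(i).getName();
    if (expected.contains(colName)) {
        Object value = dyn.getColumnValue(i);
        // One branch per LDAP attribute present in the fixed output schema.
        if ("cn".equals(colName)) {
            output_row.cn = value == null ? null : value.toString();
        } else if ("mail".equals(colName)) {
            output_row.mail = value == null ? null : value.toString();
        }
    }
}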