Loading files with dynamically generated columns


I need to create an SSIS project that loads daily batches of 150 files into a SQL Server database. Each batch always contains the same 150 files, and each file in the batch has a unique name. Each file can be either a full or an incremental file. Incremental files have one more column than full files. Each batch contains a control file that states whether each file is full or incremental. See the example files below:
Full File
| SID | Name | DateOfBirth |
|:---: | :----: | :-----------: |
| 1 | Samuel | 20/05/1964 |
| 2 | Dave | 06/03/1986 |
| 3 | John | 15/09/2001 |
Incremental File
| SID | Name | DateOfBirth | DeleteRow |
|:---: | :----: | :-----------: | :----------: |
| 2 | | | 1 |
| 4 | Abil | 19/11/1993 | 0 |
| 5 | Zainab | 26/02/2006 | 0 |
I want to avoid creating 2 packages (full and incremental) for each file.
Is there a way to dynamically generate the column list in each source/destination component based on the file type in the control file? For example, when the file type is incremental, the column list should include the extra column (DeleteRow).

Let's assume my ControlFile.xlsx is:
| Col1 | Col2 |
|:---: | :----: |
| File1.xlsx | Full |
| file2.xlsx | Incremental |
Flow:
1. Create a Data Flow Task (DFT) that captures ControlFile.xlsx in an object variable. Source: the control file connection. Destination: Recordset Destination.
2. Pass this object variable to a Foreach Loop. The loop's variable mapping should capture Col2 of ControlFile.xlsx.
3. Create a Sequence Container just as a starting point. Add two DFTs, one for the full load and one for the incremental load. Use precedence constraints to decide which DFT will run.
4. Inside each DFT, use an Excel Source and an OLE DB Destination.
5. Use a FilePath variable for the connection property of the full-load and incremental Excel connections to make them dynamic.
Step 1: [image: overall package layout]
Step 2: in the "read control file" DFT, read the control file and save it through the Recordset Destination into the RecordOutput variable.
Step 3: your precedence constraints should test for "Full" (full load) and "Incremental" (incremental load).
Use the source and destination connections as shown in the overall layout. It's a bit hard to explain all the steps, but the flow is simple.
One thing to note: the incremental file has an additional column, so you'll need a Derived Column in the full load to get the mapping right.
Also, make sure the DelayValidation property is set to True.
The Foreach Loop container uses the Foreach ADO Enumerator. [images: enumerator properties]

I can think of two solutions.
1) Have a Script Task at the beginning of the package check whether this is an incremental load or a full load. If it is a full load, have it loop through all the files and add a "DeleteRow" column filled with zeros to every file. Then you can use the same column list for both types.
2) Use Biml to generate your packages dynamically from the available metadata.
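To make the first option more concrete, here is a minimal sketch of the padding step in Python rather than in an actual SSIS Script Task. It assumes the files are comma-separated text with a header row (the question does not state the file format), and the `normalize` function name is made up for illustration. The column names come from the example tables in the question.

```python
import csv
import io

# Column layout from the question; DeleteRow exists only in incremental files.
FULL_COLUMNS = ["SID", "Name", "DateOfBirth"]
TARGET_COLUMNS = FULL_COLUMNS + ["DeleteRow"]


def normalize(csv_text, file_type):
    """Return CSV text that always has the incremental layout.

    file_type is the value read from the control file ("Full" or
    "Incremental"). Rows of full files get DeleteRow set to "0".
    Assumption: comma-separated input with a header row.
    """
    is_full = file_type.strip().lower() == "full"
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=TARGET_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if is_full:
            row["DeleteRow"] = "0"
        writer.writerow(row)
    return out.getvalue()


full_file = "SID,Name,DateOfBirth\n1,Samuel,20/05/1964\n2,Dave,06/03/1986\n"
incremental_file = "SID,Name,DateOfBirth,DeleteRow\n2,,,1\n4,Abil,19/11/1993,0\n"

normalized_full = normalize(full_file, "Full")
normalized_incremental = normalize(incremental_file, "Incremental")
print(normalized_full)
```

After this step every file has the same four columns, so a single data flow with a fixed column list can load both types.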

Related

Karate - How to construct two tables, using lines from each to validate against the other [duplicate]

I want to use a single row under Examples in Cucumber, like below:
Examples:
| data1 | data2 | paymentOp |
| MySql | uk1 | ?????????? |
Here paymentOp is a number that I get from a Java method that takes a List as an argument. The method returns each of the numbers that I want to pass as paymentOp.
The obvious way to iterate is to copy the row and paste it again in the table, but I don't want that because the method's result is dynamic and may return 2 or 5 numbers.
Is it possible to achieve this using Karate?
How should I proceed? Any lead here would be much appreciated!

You can combine Examples: with dynamic behavior. Please read this example (especially the second one): https://github.com/intuit/karate/blob/master/karate-demo/src/test/java/demo/outline/examples.feature
Since you have difficulties reading the docs and examples (:P) here is a simple example. Take some time to understand it carefully.
    Feature: scenario outline demo
    Background:
      * def data = { one: 1, two: 2, three: 3 }
    Scenario Outline:
      * match data.<key> == <value>
    Examples:
      | key   | value |
      | one   | 1     |
      | two   | 2     |
      | three | 3     |

How can I Import SAP txt file into SQL Server Express 2017?

For now the closest thing to an ETL process I can get is using the Import Wizard to upload data from a periodic SAP job. The problem is that our SAP implementation outputs a txt file that is failing to import. Please see below a sample of the txt files I am attempting to import.
Sample file
27.07.2018 Dynamic List Display 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Material |Plnt|SLoc|Batch |Created |Created by|Last Chg |Changed by|Year|Pe|PIB|Unrestricted|Stock in tfr|In Qual. Insp.|Restricted|Blocked|Returns|GrV |Stock Category|Reserved qty|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 00R000002|US03|Z101|0021662831|19.12.2017|RFCDWP |27.07.2018|PFPREMOTE1|2018| 7| | 920,000 | 0,000 | 0,000 | 0,000 | 0,000 | 0,000 |1800 57|TWA999999 | 0,000 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The Import Wizard was creating a "Column0". Although I was ignoring that column, its DataType had to be changed to "text stream" when selecting the data source (string(x) might also have worked, but I ignore the column later on anyway, so I didn't care to find out).
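As an alternative to fighting the wizard, the file could be cleaned up before importing. The following is only a sketch, not something from the original post: it assumes every data line starts and ends with a pipe, as in the sample above, and `parse_sap_list` is a made-up helper name. The leading pipe on each line produces an empty first field, which is presumably where "Column0" came from, so the sketch drops it.

```python
def parse_sap_list(text):
    """Parse an SAP list export into a list of row dicts.

    Assumption: the header and data lines start and end with "|";
    the title line and the dashed separator lines do not.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # title line, blank line, or dashed separator
        # split() yields an empty string before the first pipe and after
        # the last one, so [1:-1] removes those two phantom fields.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Shortened version of the sample file, with three of the columns.
sample = """27.07.2018 Dynamic List Display 1
-------------------------------------
| Material |Plnt|Unrestricted|
-------------------------------------
| 00R000002|US03| 920,000 |
-------------------------------------
"""

rows = parse_sap_list(sample)
print(rows)
```

The values are kept as strings. Quantities such as 920,000 use a comma as the decimal separator, so they would still need converting before being loaded into a numeric column.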

Returning a value based on multiple conditions in excel

Consider the following data:
| Item | Overall | Individual | newColumn |
|:---: | :----: | :-----------: | :----------: |
| A | Fail | Pass | blank |
| A | Fail | Fail | blank |
| B | Fail | Pass | issue |
| B | Fail | Pass | issue |
| C | Pass | Pass | blank |
I have the logic built out for the first 3 columns already. There are two levels of fails in this data: overall and individual.
If any of the individual fail, the overall fails. Sometimes the overall can fail even though all the individuals are fine. This logic is already built out.
I am trying to find a formula for the newColumn. If all the individuals are a pass for a given item (example item B), but the overall is still a fail, the cell should return the text "issue". It is ok if it returns issue twice, not sure if you can non-dupe that part. I've tried various forms of countifs/and/ors and creating columns that count distinct values but I always find a scenario where it will break the logic.

Try this:
=IF(COUNTIFS($A$2:$A$6,A2,$C$2:$C$6,"Fail"),"blank",IF(B2="Fail","Issue","blank"))
As required

If you add a new column with the formula:
=IF(B2="Fail",IF(COUNTIFS(A:A,A2,C:C,"fail")=0,"issue",""),"")
This should work under the following assumptions:
For each item, if one of the overall values is "Fail", they are all "Fail"
The only two possible values in columns B and C are "Pass" and "Fail"
If you require the word blank instead of a blank cell then use:
=IF(B2="Fail",IF(COUNTIFS(A:A,A2,C:C,"fail")=0,"issue","blank"),"blank")
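If you want to check the logic outside Excel, here is a small Python sketch of the same rule applied to the sample data from the question. It is only an illustration of what the formula computes: an item gets "issue" when its overall value is Fail and it has no individual Fail at all.

```python
from collections import defaultdict

# (Item, Overall, Individual) rows from the question.
rows = [
    ("A", "Fail", "Pass"),
    ("A", "Fail", "Fail"),
    ("B", "Fail", "Pass"),
    ("B", "Fail", "Pass"),
    ("C", "Pass", "Pass"),
]

# Count individual fails per item, like COUNTIFS(A:A, A2, C:C, "fail").
# COUNTIFS ignores case, hence the lower().
fails = defaultdict(int)
for item, _overall, individual in rows:
    if individual.lower() == "fail":
        fails[item] += 1

# "issue" only when the overall failed although no individual failed.
new_column = [
    "issue" if overall == "Fail" and fails[item] == 0 else "blank"
    for item, overall, _individual in rows
]
print(new_column)  # ['blank', 'blank', 'issue', 'issue', 'blank']
```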

Value to table header in Pentaho

Hi, I'm quite new to Pentaho Spoon and I have a problem.
I have a table like this:
| model | type | color | q |
|:---: | :----: | :----: | :---: |
| 1 | 1 | blue | 1 |
| 1 | 2 | blue | 2 |
| 1 | 1 | red | 1 |
| 1 | 2 | red | 3 |
| 2 | 1 | blue | 4 |
| 2 | 2 | blue | 5 |
I would like to create a separate table (to export as CSV or Excel) for each model, with one row per type, the color values as column headers, and the q value as the cell value:
table-1.csv
| type | blue | red |
|:---: | :----: | :---: |
| 1 | 1 | 1 |
| 2 | 2 | 3 |
table-2.csv
| type | blue |
|:---: | :----: |
| 1 | 4 |
| 2 | 5 |
I tried the Row Denormaliser step, but got nowhere.
Any suggestions?

Typically it's helpful to see what you have done in order to offer help, but I know how counterintuitive the "help" on this step is.
Make sure you sort the rows on Model and Type before sending them to the denormalizer step. Then give this a try: [image: denormalizer step settings]
As for splitting the output into files, there are a few ways to handle that. Take a look at the Switch/Case step using the Model field.
Also, if you haven't found them already, take a look at the sample files that come with the PDI download. They should be in ...pdi-ce-6.1.0.1-196\data-integration\samples. They can be more helpful than the online documentation sometimes.

Row Denormaliser can't be used here if the number of colors is unknown. Also, you can't define the Text file output fields dynamically.
There are a few ways to do this that I can see without using Java or JavaScript steps. One of them is based on the following idea: prepare rows with two columns, Row and Model:
Row              Model
type|blue|red    1
1|1|1            1
2|2|3            1
type|blue        2
1|4              2
2|5              2
Then prepare a filename for each row using the Model field and write all rows with a Text file output step that takes the file name from the filename field. In this case all records are exported into two files without additional effort.
Here you can find a sample transformation: copy-paste me into new transformation
Please note that it's a sample solution that works only with CSV. It also works only if you have the same number of colors for each type inside a model. It's just a hint about how to use Spoon, not a complete solution.
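To show what the transformation has to produce, here is a rough Python sketch of the same idea outside Pentaho: build the header and data lines per model, then write each model's lines to a file named after the model. It is not the linked transformation, just an illustration. The `pivot_by_model` name is made up, and it returns the CSV text per filename instead of writing files.

```python
import csv
import io

# (model, type, color, q) rows from the question's table.
rows = [
    (1, 1, "blue", 1), (1, 2, "blue", 2), (1, 1, "red", 1), (1, 2, "red", 3),
    (2, 1, "blue", 4), (2, 2, "blue", 5),
]


def pivot_by_model(rows):
    """Return {filename: csv_text}, one entry per model, colors as columns."""
    colors = {}  # model -> colors in first-seen order
    cells = {}   # model -> {type: {color: q}}
    for model, type_, color, q in rows:
        model_colors = colors.setdefault(model, [])
        if color not in model_colors:
            model_colors.append(color)
        cells.setdefault(model, {}).setdefault(type_, {})[color] = q

    files = {}
    for model, by_type in cells.items():
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["type"] + colors[model])
        for type_ in sorted(by_type):
            # A missing (type, color) combination becomes an empty cell.
            values = [by_type[type_].get(color, "") for color in colors[model]]
            writer.writerow([type_] + values)
        files[f"table-{model}.csv"] = out.getvalue()
    return files


files = pivot_by_model(rows)
for name, text in files.items():
    print(name)
    print(text)
```

Unlike the sample transformation, this sketch does not need the same number of colors for each type, because missing combinations are written as empty cells.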

Generate automatically all the variables and value labels in SPSS

I have the variable labels and value labels in a table in my database, like this:
| id_variable_label | variable_label | id_value_label | value_label | id_father_label |
|:---: | :----: | :----: | :----: | :----: |
| 1 | father_label | null | null | null |
| null | father_label | 1 | child01 | 1 |
| null | father_label | 2 | child02 | 1 |
Is there a way to automatically generate all the variable and value labels when I import the data from my database through an ODBC connection?

There isn't a direct way to do this, but if you read that table as an SPSS dataset, it would be pretty simple to generate the labels with a little Python code.
Note also that if your labeling is static, you can use APPLY DICTIONARY to copy labels from one dataset to another, so saving one fully labeled file would allow you to propagate that to others that are similarly structured.

You can use SPSS syntax to create variable and value labels.
See the SPSS commands VARIABLE LABELS and VALUE LABELS.
Here's a tutorial that explains how you can use them.
You could generate the syntax from your database.
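As a rough sketch of generating that syntax, the Python below turns rows shaped like the question's table into VARIABLE LABELS and VALUE LABELS commands. The table does not contain the SPSS variable name, so the `var_name` argument (and the name `father` used in the example) is an assumption, as is the `build_label_syntax` helper itself. Database nulls are represented as `None`.

```python
def quote(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + str(text).replace("'", "''") + "'"


def build_label_syntax(var_name, rows):
    """Build SPSS label syntax for one variable.

    rows: dicts keyed by the column names from the question's table.
    var_name: the SPSS variable to label (assumed, not in the table).
    """
    lines = []
    for row in rows:
        # The row with an id_variable_label carries the variable label.
        if row["id_variable_label"] is not None:
            lines.append(f"VARIABLE LABELS {var_name} {quote(row['variable_label'])}.")
    # The remaining rows carry one value label each.
    values = [
        f"{row['id_value_label']} {quote(row['value_label'])}"
        for row in rows
        if row["id_value_label"] is not None
    ]
    if values:
        lines.append(f"VALUE LABELS {var_name} " + " ".join(values) + ".")
    return "\n".join(lines)


table = [
    {"id_variable_label": 1, "variable_label": "father_label",
     "id_value_label": None, "value_label": None, "id_father_label": None},
    {"id_variable_label": None, "variable_label": "father_label",
     "id_value_label": 1, "value_label": "child01", "id_father_label": 1},
    {"id_variable_label": None, "variable_label": "father_label",
     "id_value_label": 2, "value_label": "child02", "id_father_label": 1},
]

syntax = build_label_syntax("father", table)
print(syntax)
```

The printed text can be saved as a syntax file and run after the data import.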
