SSIS Transposing Header Information With Excel Load - sql

I've tried searching, but most of the Excel load questions dealing with header information ask about ignoring the header rather than loading it.
I'm working on a project and have 7 lines of header information I would like loaded into my database table; it looks like this:
Using OpenRowset I load only A3:D13, and when I check the Columns section of the Excel Source I only see these columns:
I needed to add a column for the load date, so I added a Derived Column transformation to my data flow, after which it looks like this:
Finally I add the DB destination and map out the columns, but as you can see I only have 4 input columns for the 7 columns in my table:
and after trying to load the file into my table I get this mess:
I'm thinking it has something to do with how the header is formatted: instead of the normal layout of a heading with the data directly below it, it's set up as a heading with the data in the cell adjacent to it.
Any ideas on what I can do to load this information correctly? Thanks in advance.
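To make the layout concrete: a rough sketch, in Python with openpyxl rather than SSIS, of the pivot I'm after (the file name, cell range, and labels here are made up):

from openpyxl import load_workbook

# Assumption: each of the 7 header lines is a label in column A with its value
# in the adjacent cell in column B, somewhere in rows 3-9 of the sheet.
wb = load_workbook("header_example.xlsx", data_only=True)   # hypothetical file
ws = wb.active

header_record = {}
for row in ws.iter_rows(min_row=3, max_row=9, min_col=1, max_col=2):
    label, value = row[0].value, row[1].value
    if label is not None:
        header_record[str(label).strip()] = value

# header_record is now one wide row, e.g. {"Report Date": ..., "Region": ...}
print(header_record)

The end result I want from SSIS is that same single wide row: one column per header label, plus my load date.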

Related

Create table schema and load data in BigQuery table using source Google Drive

I am creating a table using Google Drive as the source and Google Sheets as the format.
I have selected "Drive" as the value for "Create table from". For File format, I selected Google Sheet.
I also selected Auto Detect Schema and the input parameters.
It's creating the table, but the first row of the sheet is loaded as data instead of being used as the table's field names.
Kindly tell me what I need to do to get the first row of the sheet treated as the table column names and not as data.
It would have been helpful if you could include a screenshot of at least the top few rows of the file you're trying to upload, to see the data types you have in there. BigQuery, at least as of when this response was composed, cannot differentiate between column names and data rows if both have similar data types while schema auto detection is used. For instance, if your data looks like this:
headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b
BigQuery would not be able to detect the column names (at least automatically using the UI options alone) since all the headers and row data are Strings. The "Header rows to skip" option would not help with this.
Schema auto detection should be able to detect and differentiate column names from data rows when you have different data types for different columns though.
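If the headers and the data really are all strings, one workaround is to declare the schema yourself and skip the header row, so auto detection never has to guess. A rough sketch with the google-cloud-bigquery Python client (the project, dataset, table, sheet URL, and field names below are placeholders):

from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials

# Describe the Drive-backed sheet and declare the columns explicitly
# instead of relying on schema auto detection.
external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/<sheet-id>"]
external_config.schema = [
    bigquery.SchemaField("headerA", "STRING"),
    bigquery.SchemaField("headerB", "STRING"),
]
external_config.options.skip_leading_rows = 1  # row 1 is the header, not data

table = bigquery.Table("my_project.my_dataset.my_table")
table.external_data_configuration = external_config
client.create_table(table)

With an explicit schema there is nothing left for auto detection to guess, so the all-strings case above is no longer a problem.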
You have an option to skip the header row in Advanced options. Simply put 1 as the number of rows to skip (your first row is where your header is). It will skip the first row as data and use it for your header (column names) instead.
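The same option in the Python client, as a rough sketch (the sheet URL and table id are placeholders), is skip_leading_rows combined with autodetect:

from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/<sheet-id>"]
external_config.autodetect = True              # let BigQuery infer the column types
external_config.options.skip_leading_rows = 1  # treat row 1 as the header row

table = bigquery.Table("my_project.my_dataset.my_table")
table.external_data_configuration = external_config
client.create_table(table)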

How can I merge header cells in the Excel Writer in Pentaho?

I am trying to merge the header cells' columns into one cell, but when I do that my data also ends up in one column. I want my resulting output to look like the attached screenshot. Kindly help me with this.
Are your columns variable, or do you always have the same output schema?
If it's fixed, then I would use a template where the headers are fixed and start populating from row 5.
Google Spreadsheet input
If you are using the Spreadsheet input, that is not possible in the step.
What I usually do in that kind of situation is create a row with my headers and hide it so the user doesn't get confused by two headers. Then the step will get the result perfectly using the column names provided in the first row. (You can use a formula like =b3 there so it changes with the real header. No problem.)
Excel input
If you are using the Excel input step, you can set the sheet to be read from row 2, column 0, and it should work fine. =)
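If it helps to picture the fixed-template idea from the first suggestion, here is a rough sketch (openpyxl rather than PDI, with made-up header names) of a template with a merged group header that the transformation would then only populate with data rows:

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# One merged "group" header spanning three columns, with the real column
# headers underneath; the ETL only writes data rows below this.
ws.merge_cells("A1:C1")
ws["A1"] = "Sales Summary"                                   # hypothetical group title
ws["A2"], ws["B2"], ws["C2"] = "Region", "Units", "Revenue"  # hypothetical columns

wb.save("template.xlsx")  # template the Excel Writer step can start from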

Pentaho PDI how to validate source Excel metadata for the order and number of columns?

In my case, I need to process input data in Excel (xls and xlsx) format. I need to do a file-level validation of the Excel file for the order and number of columns before processing the row-level data. If this file-level validation fails, the file should be excluded and the concerned people informed by mail.
Please guide me, with some sample or example, on how to validate the Excel files' metadata. I thought of placing a variable in kettle.properties with semicolon-separated header fields and comparing it with the source Excel file, but I haven't found a way to extract only the header row from the file the way I want.
Are the column names on row 1 of your file (or any other row reasonably close to row 1), and do you know at most how many fields there are in each? If so, maybe you can get away with this:
Step 1: You need to understand how many fields there may be, what they may be called, what data types they have, etc.
Step 2: Read the first N rows of the file(s), ensuring the header row will be read; filter out everything that is not the header (how? It depends on the specific structure). Because you don't know what the field names are, just name them field0, ..., field999 or whatever.
Step 3: Work some magic on the headers: filtering based on the position of certain fields, mapping field names to data types, etc.
Step 4: Metadata injection. Using the information you already have from the previous steps, you create a template transformation that is generic in the sense that field names are not set up in the Excel input step. Metadata injection allows you to set up that step at run time, based on the logic you just applied to the headers.
This page has a couple example videos: http://wiki.pentaho.com/display/EAI/ETL+Metadata+Injection
I had to build something like that (only it was CSV files and not XLS) a while back and metadata injection allowed me to load every single file in one go with 100% mapping accuracy. Of course, the magic happens before, when you parse the header row.
Thanks nsousa for your answer.
I got to the required solution with the help of my colleague. Here is what I did:
(1) Read only the 1st row of the source Excel file as normal data (no header, limit 1), where the field names will be called F1, F2, etc.
(2) Concatenate the fields (the data) to get a pattern.
(3) Match this pattern against the actual metadata pattern; if they match, the Excel file passes.
Good trick. Thanks.
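For reference, a minimal sketch of that pattern check outside PDI (Python with openpyxl; the expected header list and file name are made up):

from openpyxl import load_workbook

# Expected header pattern, e.g. taken from a kettle.properties variable
# as a semicolon-separated string.
EXPECTED = "OrderID;CustomerName;OrderDate;Amount"   # hypothetical field list

def header_matches(path):
    # Read only row 1 of the first sheet, concatenate the cells, compare patterns.
    ws = load_workbook(path, read_only=True, data_only=True).active
    first_row = next(ws.iter_rows(min_row=1, max_row=1, values_only=True), ())
    actual = ";".join("" if c is None else str(c).strip() for c in first_row)
    return actual == EXPECTED

if not header_matches("incoming.xlsx"):              # hypothetical file
    print("Metadata mismatch - exclude file and notify by mail")

The mail notification on a mismatch is left out here; in PDI that would just be the failure branch of the validation.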

Validate in Excel before uploading to Access

I am attempting to upload an Excel document into Access. I have used VBA to unhide all columns and rows and then delete columns and rows that are not being used. All of the worksheets upload into Access properly except one. This particular worksheet attempts to upload a field and label it Field 12. I am unable to find a way to delete this field. Any help?
It is probably the first column after your data...
Try deleting the columns to the right of your data, either in VBA or in Excel (not just clearing the contents but an actual delete). I've found this typically happens when the columns to the right of your data contained data at one point and Access/Excel sees them as still containing data. Then try your import again.
Alternatively, you could upload into a new Access staging table before pulling your desired known columns into the final table through an INSERT query. Then you can delete the staging table if you like or delete it before the next import. In this way, each import can have its own "added columns".
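If you'd rather script the cleanup than do it by hand, here is a rough sketch with openpyxl instead of VBA (the file name and the last real column are assumptions):

from openpyxl import load_workbook

wb = load_workbook("upload.xlsx")   # hypothetical workbook
ws = wb.active

LAST_REAL_COL = 11  # assumption: real data ends at column K

# Physically delete any columns past the data, not just their contents,
# so Access no longer sees a phantom "Field 12" on import.
if ws.max_column > LAST_REAL_COL:
    ws.delete_cols(LAST_REAL_COL + 1, ws.max_column - LAST_REAL_COL)

wb.save("upload.xlsx")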

Script Component won't show all the input columns in the script editor

I have a Script Component that is behaving quite weirdly. The component uses a Flat File Source as its input column source, and I need to output multiple columns to multiple database tables. The problem is that after connecting the Flat File Source, configuring all the input columns and the multiple outputs and their output columns, and going into the script editor to configure which output columns get which input, it won't give me all the input columns to work with. There are a total of 26 columns, but it will only let me work with about 20 of them, so there are 6 columns that are missing and that it won't let me use. Do you have any idea what could be causing this?
These images just show a little bit of what I'm doing and the problems:
I think the Script Component doesn't like column names that start with a number. They won't show up as a property on Row.[column]. Are those the columns that are missing?