How to handle an SSIS solution with an Excel source that has multiple headers

I am working on a package that needs to import an Excel sheet with multiple header rows. I am supposed to unpivot the top header so that it becomes a row value, and also unpivot some of the columns under the actual (bottom) header.
So far I have built the package using only the bottom header columns, and hard-coded the top header value into the table with a Derived Column.
Here's an example of what I am looking for
Sample Raw Data and Target Result

You are trying to unpivot (reverse-pivot) the input. In SSIS you can use the Unpivot transformation.
The walkthrough below is a good one; I hope it helps.
https://www.sqlshack.com/an-overview-of-ssis-pivot-and-ssis-unpivot-transformation/
KR,
Alex
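To make the target shape concrete, here is a minimal Python sketch of the unpivot logic. It is not SSIS, only an illustration. The sheet layout, the column names (Region, Q1, Q2) and the top header value are hypothetical, because the sample image is not reproduced here.

```python
def unpivot(top_header, columns, rows, id_columns):
    """Turn every non-id column of every row into its own output record,
    carrying the top header along as an ordinary value."""
    out = []
    for row in rows:
        record = dict(zip(columns, row))
        for col in columns:
            if col in id_columns:
                continue
            out.append({
                **{k: record[k] for k in id_columns},
                "TopHeader": top_header,
                "Attribute": col,
                "Value": record[col],
            })
    return out


# Hypothetical sheet: row 1 holds the top header, row 2 the real column names.
sheet = [
    ["Sales 2020", None, None],
    ["Region", "Q1", "Q2"],
    ["North", 10, 20],
    ["South", 30, 40],
]
top_header = sheet[0][0]
columns = sheet[1]
unpivoted = unpivot(top_header, columns, sheet[2:], id_columns=["Region"])
for record in unpivoted:
    print(record)
```

Each input row produces one output record per unpivoted column, which is the same reshaping the Unpivot transformation performs on the bottom header columns.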

Order of the columns in Apache Zeppelin is wrong when selecting data from a temporary table; how to put a specific column first?

Currently we have a Scala DataFrame whose output shows the id value first (although it was added to the DataFrame last). The other columns appear dynamically, based on the .pivot() function and the data.
When I query the data in the %sql interpreter, the order changes, so the CSV file that I download also has the id column last, which doesn't work for me. I can't simply write the select statement by hand with the id column first, because I can't control the other columns that the pivot produces. Is there any other way to make a specific column go first?
The Scala paragraph is:
resultMean.registerTempTable("mean")
The sql paragraph is:
%sql
select *
from mean
For anyone who reads this in the future: the reason for this behavior was that I misused the DataFrames. In Scala, .show() was applied to one DataFrame, while the export to the temp table was applied to another one. If you face the same problem, double-check that you apply your methods to the same object.
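If the column order really does need to be forced when the other column names are not known in advance, one option is to compute the order from the column list instead of typing it out. The sketch below shows only that list manipulation in plain Python; the column names are hypothetical. Passing the resulting list to a Spark select call is my assumption about how it would be applied, not something taken from the thread.

```python
def put_first(columns, first):
    """Return the column names with `first` moved to the front,
    keeping all other columns in their original order."""
    if first not in columns:
        raise ValueError(f"{first!r} is not one of the columns")
    return [first] + [c for c in columns if c != first]


# Hypothetical column list: pivot-generated names followed by id.
pivoted_columns = ["2019-01", "2019-02", "2019-03", "id"]
ordered = put_first(pivoted_columns, "id")
print(ordered)
```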

(Excel-VBA) Specific data import (in the background) into the active sheet

Would you please help me (a total beginner) prepare a VBA macro that opens a sheet in the background and imports a specific selection, as shown below?
Let's say we have downloaded a word-count analysis (xlsx) like this one from a CAT tool for testing.
Now I need to add a macro to my main sheet that reads the lines whose column A starts with "All". If it is "All", I need to store the columns of that line (specifically columns A-O) in an array or hashtable.
Please take a look at this image, which sums it all up (better than I can explain it :-)
Let me know in case you need to know more details.
All tips / suggestions are greatly appreciated.
Many thanks!
My suggestion (I'm a beginner too) would be to use the Macro Recorder. It is a great tool to learn from (example).
start recording
filter for 'ALL'
copy/paste the cells
stop recording
Then have a look at the recorded code and adjust it :)
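The selection rule from the question (keep rows whose column A starts with "All", and only columns A to O) is simple enough to sketch outside Excel. The following Python is only an illustration of that rule on rows already read into memory; the sample rows are made up, and it does not open a workbook.

```python
def rows_starting_with(rows, prefix="All", width=15):
    """Keep rows whose first cell starts with `prefix`, trimmed to the
    first `width` cells (15 cells correspond to columns A-O)."""
    selected = []
    for row in rows:
        first = row[0] if row else None
        if isinstance(first, str) and first.startswith(prefix):
            selected.append(list(row[:width]))
    return selected


# Made-up rows with 16 cells each, so the last cell (column P) is dropped.
analysis = [
    ["File1.docx"] + list(range(15)),
    ["All"] + list(range(100, 115)),
    [None] * 16,
    ["All (2 files)"] + list(range(200, 215)),
]
kept = rows_starting_with(analysis)
for row in kept:
    print(row)
```

The comparison here is case-sensitive, so "ALL" would not match; that is a choice made for the sketch, not a requirement stated in the thread.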
Looking at your data and the final layout you are looking for, using a Pivot Table would provide you with all of the flexibility you need.
You can:
filter which data to display
generate calculated values based on data in other columns
choose what order your columns are displayed
dynamically change the layout if you decide you want a different view
From your data, I was able to generate the following Pivot Table in about 15 minutes.
There are several good, simple tutorials on building Pivot Tables. A Google search will turn up plenty.
Things you will need to learn about for your particular problem:
Classic display (I used the classic display to get this particular layout)
Calculated Fields (many of the columns in the pivot table are calculated based on your spec). There is a maximum string length of 255 characters for a field calculation, so you may need to rename some of the columns in the original data set.
Of course, basics of Pivot Tables
Loading new data and updating your pivot table
Good Luck!

Query several ranges and add automatically a column to know the source of each row

I am trying to achieve the following in Google Spreadsheets.
First, I want to query several ranges (in different sheets of the same spreadsheet). I tried a formula like this, =query(arrayformula({indirect(E2:E10)}),"select * where Col1 <>''"), with no success.
In E2:E10 I have a list of ranges. Column F contains a name that describes the source of the value in Column E.
My second problem is that I need to add a column to the output of that query that tells me the origin of each row.
If the sources are ranges of 3 columns by country I need to merge those tables and add that country to each row.
All credits to +Ben Liebrand who helped me out here: https://support.google.com/docs/profile/3464
"I just want to start of by saying that the indirect() function does not work in an arrayformula() function as expected. So you will need to take another approach. I can understand what you are trying to do so I added another TAB in your spreadsheet to demonstrate another approach. I know it was initially a specific design you were trying so I made some changes to what you had. Maybe you can take a look at what I have offered and maybe you can tweak your design.
I know what I am offering is just very rough but you will also notice that I removed the end row specifier from your ranges in the range table.
Don't assume my example to be the final result but I was just trying to show that the range you were trying to use with the indirect() function will not work.
So hopefully this will give you a new idea of how you can maybe handle this.
My formula also adds the country to each of the tables in the output. My formula looks like this
=query(ArrayFormula({
if(len(indirect(regexextract(F2,"\w+\!\w+")&":A")),G2,),indirect(F2);
if(len(indirect(regexextract(F3,"\w+\!\w+")&":A")),G3,),indirect(F3);
if(len(indirect(regexextract(F4,"\w+\!\w+")&":A")),G4,),indirect(F4);
if(len(indirect(regexextract(F5,"\w+\!\w+")&":A")),G5,),indirect(F5);
if(len(indirect(regexextract(F6,"\w+\!\w+")&":A")),G6,),indirect(F6);
if(len(indirect(regexextract(F7,"\w+\!\w+")&":A")),G7,),indirect(F7)
})," select * where Col1 <> '' ")
Hope this is of some help to you"
And I hope this is useful to the community.
Gerónimo
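For readers who want to see what the formula above computes, here is a small Python sketch of the same idea: stack several tables, put a source label in front of every row, and drop the rows whose first data cell is empty. The country names and values are hypothetical.

```python
def stack_with_source(tables):
    """Stack (label, rows) pairs into one table, prefixing each row with
    its label and skipping rows whose first cell is empty."""
    out = []
    for label, rows in tables:
        for row in rows:
            if row and row[0] not in ("", None):
                out.append([label, *row])
    return out


# Hypothetical ranges of three columns each, one per country.
spain = [["a", 1, 2], ["", "", ""], ["b", 3, 4]]
chile = [["c", 5, 6]]
stacked = stack_with_source([("Spain", spain), ("Chile", chile)])
for row in stacked:
    print(row)
```

The empty-row check plays the role of the "where Col1 <> ''" clause, which is needed in the spreadsheet because the ranges are open-ended.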

SSIS Check Excel source rows redirect rows to another table on 'x' number of field matches

I work in a sales based environment and our data consists of 'leads'.
Let's say we record CompanyName, PhoneNumber, Address1 & PostCode (ZIP). These rows are seeded with a unique ID in the schema.
The leads come in from various sources, are compiled into a spreadsheet, and are then imported into SQL Server 2012 using SSIS.
After a validation check to see whether the file exists, we use a simple data flow consisting of an Excel Source, Derived Column, Data Conversion and finally an OLE DB Destination.
I'm sure my requirement has a relatively simple solution. I understand that knowing what I need to achieve is the first step. I need to take a sample of the destination data from the last rolling two months; if two or more fields in a source Excel row match the corresponding fields in a row of the destination SQL table, I want to redirect that row to another table.
I am unsure which combination of components I could use to achieve this. I believe Fuzzy Lookup may not be what I need, because I am looking for exact field matches. I have looked at the Lookup component, but I am unsure whether it is the way to go.
Could anyone please provide some advice on how I can best achieve this as simply as possible?
You can use the Lookup to check for matches in your existing table. However, it will be fairly complicated to implement the requirement of checking for any two or more fields matching. Your expression would be long and complex, basically consisting of:
(using pseudocode for readability, where src is the Excel row and dst is the existing table row)
IIF((src.a = dst.a AND src.b = dst.b) OR (src.a = dst.a AND src.c = dst.c) OR (src.b = dst.b AND src.c = dst.c) OR ... and so on
for every combination of two columns you want to test
I would do this by importing the entire spreadsheet into a staging table and doing the existing-rows check in a SQL stored procedure that moves the data to the desired destination table.
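The "two or more fields match" rule does not have to be written as one long OR expression; counting the matching fields gives the same result. Here is a minimal Python sketch of that counting logic, using the field names from the question. The sample rows are made up, and in practice the check would run in the stored procedure against the staging table, not in Python.

```python
FIELDS = ("CompanyName", "PhoneNumber", "Address1", "PostCode")


def count_matches(lead, existing, fields=FIELDS):
    """Number of fields with identical, non-missing values in both rows."""
    return sum(
        1 for f in fields
        if lead.get(f) is not None and lead.get(f) == existing.get(f)
    )


def split_leads(incoming, existing_rows, threshold=2):
    """Separate leads that match any existing row on at least `threshold`
    fields (redirected) from the rest (clean)."""
    clean, redirected = [], []
    for lead in incoming:
        is_duplicate = any(
            count_matches(lead, row) >= threshold for row in existing_rows
        )
        (redirected if is_duplicate else clean).append(lead)
    return clean, redirected


# Made-up data for illustration only.
existing_rows = [
    {"CompanyName": "Acme", "PhoneNumber": "0100", "Address1": "1 High St", "PostCode": "AB1"},
]
incoming = [
    {"CompanyName": "Acme", "PhoneNumber": "0100", "Address1": "2 Low Rd", "PostCode": "ZZ9"},
    {"CompanyName": "Acme", "PhoneNumber": "0999", "Address1": "3 Mid Ln", "PostCode": "YY8"},
]
clean, redirected = split_leads(incoming, existing_rows)
print(len(clean), len(redirected))
```

The comparison is exact, with no trimming or case folding, which matches the stated wish for exact field matches over fuzzy ones.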

How to rearrange data from excel?

I have data in a spreadsheet that I have to upload to SQL. The problem is that the data is quite crude. I need to rearrange the sheets in the Excel file according to how they relate to each other. The first sheet holds the master data, and one column of that sheet is to be linked to the data in the other sheet. All I have is a sheet in which the data is embedded, and the relation between the data is shown with an expander button. How can I rearrange this data quickly? I think it can be done with SQL queries or an SSIS package, but I'm not sure.
You can use an SSIS package with multiple source components (inside a Data Flow) to access each sheet.
Once you have both sources, sort each one on its key columns with a Sort component, then use a Merge Join to combine the two sets of data.
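The reason for sorting first is that a merge join walks both inputs in key order. A minimal Python sketch of that sort-then-merge idea follows. It is an illustration of the technique, not of SSIS internals, and the master and detail rows are hypothetical.

```python
def merge_join(left, right, key):
    """Inner join two lists of dicts on `key` by sorting both inputs
    and then walking them in step."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair
            # every left row with the same key against that run.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


# Hypothetical master sheet and detail sheet linked by "id".
master = [{"id": 2, "name": "B"}, {"id": 1, "name": "A"}]
detail = [{"id": 1, "item": "x"}, {"id": 1, "item": "y"}, {"id": 3, "item": "z"}]
joined = merge_join(master, detail, "id")
for row in joined:
    print(row)
```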