Newbie here, working on something a bit complicated. Not sure how to start or what the best way is; looking for some advice and tips.
So, we have 2 systems running MS Dynamics POS 2009 and have extracts of all data (inventory/stock) in spreadsheets. Both databases have pretty much the same items, but because they have been run separately, all the naming and Part Numbers are in different formats.
I need to create one database (one Excel file) from both, where a partial match on Part Number will be identified and "merged" (keeping the Part Number and Description from sheet1, and updating Stock to sheet1 stock + sheet2 stock).
The problem is that Part Numbers are written in completely different styles (by different people) and can be matched only partially (I guess on the last 3-6 characters of the Part Number).
I am not an Excel expert, so any advice and tips would be appreciated.
I have also thought about loading those Excel sheets into 2 separate SQL databases and doing it from SSMS, as I'm not sure Excel can cope with this.
Thanks
I'm not 100% sure of the source data, but based on the available information, here are some possible steps:
-Create a new Database in SSMS
-Load the data from your Excel extracts with the Import Data tool (right-click on your newly created database, Tasks, Import Data). This will pull up a wizard that will transform your Excel spreadsheet into a table in SQL Server. Do this for all spreadsheets.
http://searchsqlserver.techtarget.com/feature/The-SQL-Server-Import-and-Export-Wizard-how-to-guide
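If you'd rather skip the wizard, a sheet can also be pulled in with plain T-SQL via OPENROWSET, assuming the ACE OLE DB provider is installed and 'Ad Hoc Distributed Queries' is enabled; the file path, sheet name, and target table below are hypothetical:
-- one-off import of a worksheet into a new table (path and sheet name are examples)
SELECT *
INTO dbo.ProductA
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0;Database=C:\extracts\system1.xlsx;HDR=YES',
                'SELECT * FROM [Sheet1$]');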
-You may be able to do some matching based on the start/end characters and use a MERGE statement to get unique data. The MERGE statement allows you to set match criteria and then take certain actions depending on a positive or negative match. For example, if your two POS systems have two spreadsheets of products where there is some overlap, but also some products that are unique to each system, you could start with a source table from the first system and insert into it only the products that are unique to the other system; if there is a match, do nothing. Something like:
MERGE ProductA A
USING ProductB B
ON RIGHT(A.ProductID, 5) = RIGHT(B.ProductID, 5)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Description)
    VALUES (B.ProductID, B.Description);
https://www.simple-talk.com/sql/learn-sql-server/the-merge-statement-in-sql-server-2008/
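Since the original question also wants stock summed when part numbers match, a WHEN MATCHED clause could handle that as well; a sketch assuming both tables have a Stock column (note that MERGE will error if the 5-character suffix matches more than one row on either side):
MERGE ProductA A
USING ProductB B
ON RIGHT(A.ProductID, 5) = RIGHT(B.ProductID, 5)
WHEN MATCHED THEN
    -- combine the stock counts from both systems
    UPDATE SET A.Stock = A.Stock + B.Stock
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Description, Stock)
    VALUES (B.ProductID, B.Description, B.Stock);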
Related
I work in a sales-based environment and our data consists of 'leads'.
Let's say we record CompanyName, PhoneNumber, Address1 & PostCode (ZIP). These rows are seeded with a unique ID in the schema.
The leads come in from various sources, are compiled onto a spreadsheet, and are then imported into SQL 2012 using SSIS.
After a validation check to see if a file exists, we then use a simple data flow which consists of an Excel source, Derived Column, Data Conversion and finally an OLE DB Destination.
My requirement, I'm sure, has a relatively simple solution, and understanding what I need to achieve is the first step. I need to take a sample of data from the last rolling two months; if 2 or more fields in the source Excel file match the corresponding fields in the destination SQL table, then I want to redirect the row to another table.
I am unsure which combination of components I could use to achieve this. I believe Fuzzy Lookup may not be what I am looking for, as I need exact field matches. I have looked at the Lookup component, but I am unsure if this is the way to go.
Could anyone please provide some advice on how I can best achieve this as simply as possible?
You can use the Lookup to check for matches in your existing table. However, it will be fairly complicated to implement the requirement of checking for any two or more fields matching. Your expression would be long and complex, basically consisting of:
(using pseudo code for readability)
IIF((a=a AND b=b) OR (a=a AND c=c) OR (b=b AND c=c) OR ...and so on
for every combination of two columns you want to test
I would do this by importing the entire spreadsheet to a staging table, and doing the existing rows check in a SQL stored proc that moves the data to the desired destination table.
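As a rough sketch of that stored proc's core check, counting field matches with CASE expressions avoids enumerating every pair combination; the staging and destination table names and the CreatedDate column here are assumptions:
-- redirect staged rows that match an existing lead on 2+ fields
-- within the rolling two-month window (all names hypothetical)
INSERT INTO dbo.LeadDuplicates (CompanyName, PhoneNumber, Address1, PostCode)
SELECT s.CompanyName, s.PhoneNumber, s.Address1, s.PostCode
FROM dbo.LeadStaging AS s
WHERE EXISTS (
    SELECT 1
    FROM dbo.Leads AS d
    WHERE d.CreatedDate >= DATEADD(MONTH, -2, GETDATE())
      AND (CASE WHEN d.CompanyName = s.CompanyName THEN 1 ELSE 0 END
         + CASE WHEN d.PhoneNumber = s.PhoneNumber THEN 1 ELSE 0 END
         + CASE WHEN d.Address1    = s.Address1    THEN 1 ELSE 0 END
         + CASE WHEN d.PostCode    = s.PostCode    THEN 1 ELSE 0 END) >= 2
);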
My current setup:
Users need to create reports in Excel which can contain various components like tables, charts etc. These report components are driven off large source tables which are provided to the workbook via a web service. Each source table lives on its own worksheet and the report components live on a separate sheet called "front_sheet".
A greatly simplified example is as follows:
On "input_sheet_1" there is a table which looks like so
The user would then like to create two tables for the report (on "front_sheet") which reference the table on input_sheet_1, that look like this:
These "output" tables contain columns which aren't on the source table (Total Spend) but they may contain more columns such as "Price in euros" where the "price" column is multiplied by some constant.
The table rows are also colour coordinated by their Category. Also there is a "Total" at the bottom of the output tables.
This is easy to do when the input table is static. However I do not know how to deal with this when the input table has a variable number of rows i.e. each time the workbook is refreshed the basket will have different numbers of different items.
Does anyone know how I can achieve this? A requirement is that the user setting up the report does not have to write any VBA at all.
Thanks for taking the time to read this.
I am working with a database with 5 tables, all of which contain different sets of information about clients and their employees. If you drill down in any table for data relating to a particular client, many rows will be returned, according to the number of their employees in the dataset.
If I were to manually run the reports I need, I would query each table, one at a time, for all results where a particular client number is specified. Then I would export each table to a .csv, and then copy those exports into the same excel workbook with 5 tabs (corresponding to the 5 tables in the SQL database). At the end I would have an individual workbook for each client.
A complicating factor is that not every client ID appears in each of the 5 tables. Preferably, I would not export empty datasets, and clients with data in only three of the tables would have only three tabs in the final workbook.
Is there a way of giving SQL Server a list of client IDs for which it should query the 5 tables, export the existing data, and (possibly/hopefully) combine it in a workbook on separate tabs?
Your question is rather vague and broad, but here are the key bits of information you'll need to investigate to get things going:
Create five different datasets, each querying one table.
Create five tablixes, add a PageBreak before each tablix so in Excel they will land on different tabs.
Either set a NoRowsMessage or hide the tablix along these lines using an expression based on the RowNumber function
Create a parameter for selecting the client ID, and use that in your WHERE clause of the datasets.
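As a minimal sketch of that last step, each of the five dataset queries would look something like this, with @ClientID as the report parameter (the table name is hypothetical):
-- one of the five datasets, filtered by the report parameter
SELECT *
FROM dbo.ClientEmployees
WHERE ClientID = @ClientID;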
The tricky bit would be how to generate multiple Excel files. SSRS does one export at a time, so your basic options are:
Put multiple clients in one XLS (i.e. don't use a parameter, but include clientId as a column on the worksheets)
Have the user select one client at a time, and export one XLS at a time.
Automate generating the reports.
I'm trying to split a table into multiple tables based on the value of a given column using Talend Open Studio. Let's say this column can contain any of the integer values 1, 2, 3, etc.; according to this value, the rows should go to table_1, table_2, table_3, etc.
It would be best if I could solve this when the number of different values in that column is not known in advance, but for now we can assume that all the output tables exist already. The bottom line is that the number of different values, and therefore the number of different tables, is high enough that setting up the individual filters manually is not an option.
Is it possible to solve this using Talend Open Studio or any similar open source ETL tool like Pentaho Kettle?
Of course, I could just write a simple script myself, but I would prefer to use a proper ETL tool since the complete ETL process is quite complex.
In PDI (Pentaho Kettle) you could do this with partitioning (a right-click option on the step, IIRC). Partitioning in PDI is designed for exactly this sort of problem.
Yes, it's possible to split the data into different tables based on a single column, but for that you need to create the tables dynamically:
tFileInputDelimited -> tFlowToIterate -> tFixedFlowInput, then use globalMap() to get the column value and use it to separate the data into the different tables, using the globalMap() value (for the column used to separate the data) in the table name.
The first solution that came to my mind was using the replicator to transport the current row to three filters, which act as guards and only let through rows with either 1, 2 or 3 in the given column. Pic: http://i.imgur.com/FmvwU.png
But you could also build the table name dynamically, if that is what you want. Pic: http://i.imgur.com/8LR7Q.png
I'm hoping someone might be able to help me out with this one. I have 24 files in CSV format; they all have the same layout and need to be joined onto some pre-existing data. Each file has a single column that needs to be joined onto the rest of the data, but those columns all have the same names in the original files. I need the columns automatically renamed to the filename as part of the join.
The final name of the column needs to be: Filename - data from another column.
My current approach is to use a foreach container and use the variable generated by the container to name the column, but there's nowhere I can input that value in the join, and even if I did, it'd mess up the output mappings, because the column names would be different.
Does anyone have any thoughts about how to get around these issues? Whoever has an idea will be saving my neck!
EDIT: In case some more detail helps with this... The SSIS version is 2008 and there are only a few hundred rows per file. It's basically a one-time task to collect a full billing history from several bills which are issued monthly.
The source data has three columns, the product number, the product type and the cost.
The destination needs to have 24*3 columns, each of which has a monthly cost for a given product category. There are three product categories and 24 bills (in separate files), hence 24*3.
So hopefully I'm being a bit clearer: all I really need to know is how to change the name of a column using a variable passed in from the Foreach File container.
I think the easiest approach is to create a temp database (a.k.a. staging db), load the data from the source files into it, and define stored procedures where you can pass parameters (e.g. file names) and build your own logic.
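A rough sketch of that logic for this case, assuming all 24 files are loaded into one staging table with a FileName column added during the load (every name below is hypothetical, and product type is kept as a row rather than widened out to the full 24*3 columns): a dynamic PIVOT then produces the filename-based cost columns without hand-coding 24 renames:
-- staging table assumed: dbo.BillStaging(FileName, ProductNumber, ProductType, Cost)
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- build a column list like [file1 - Cost],[file2 - Cost],...
SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME(FileName + ' - Cost')
    FROM dbo.BillStaging
    FOR XML PATH('')), 1, 1, '');

-- pivot one cost column per source file
SET @sql = N'SELECT ProductNumber, ProductType, ' + @cols + N'
FROM (SELECT ProductNumber, ProductType,
             FileName + '' - Cost'' AS ColName, Cost
      FROM dbo.BillStaging) AS src
PIVOT (SUM(Cost) FOR ColName IN (' + @cols + N')) AS p;';

EXEC sp_executesql @sql;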
Cheers Mario