I have an SSIS package that ingests a number of Excel files with similar structures but irregular names and imports them into a SQL table. Along with the data from the Excel files, I have a number of variables that are set differently for each file (User::ExcelFileName, User::VarMonth, User::VarProgram, User::VarYear, etc.). All of the table data from the Excel files goes to the same destination table, but alongside the Excel dataset I want each row to also carry a column for each variable, passed through into SQL. An example of my dataset is below:
Excel:

ID   Name    Foo   Bar
111  Bob     88yu  117
112  Jim     JKL   A TU
113  George  FTD   19900
SSIS variables (set during execution):
User::ExcelFileName = c:\temp\excelfile1.xlsx
User::VarMonth = Jan
User::VarProgram = Daily
User::VarYear = 2023
Desired SQL destination:

ExcelFileName            VarMonth  VarProgram  VarYear  ID   Name    Foo   Bar
c:\temp\excelfile1.xlsx  Jan       Daily       2023     111  Bob     88yu  117
c:\temp\excelfile1.xlsx  Jan       Daily       2023     112  Jim     JKL   A TU
c:\temp\excelfile1.xlsx  Jan       Daily       2023     113  George  FTD   19900
I've tried a few configurations, and I've referenced this post on piping variable data into SQL, but I haven't gotten a working model yet.
Worth noting: the Excel connection is dynamic and runs inside a Foreach Loop container that iterates through my Excel sources.
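For reference, the connection manager's ConnectionString is driven by an expression along these lines (the standard ACE provider string; the Extended Properties flags here are my own setup):

"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + @[User::ExcelFileName] + ";Extended Properties=\"Excel 12.0 XML;HDR=YES\";"

Any advice or guidance would be appreciated!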
It sounds like you want a Derived Column transformation.
In the transformation, just add the new columns you want and map each variable to its column, as in the example below.
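In the Derived Column editor that would be one row per variable, something like this (the column names simply mirror the desired output above):

Derived Column Name   Expression
ExcelFileName         @[User::ExcelFileName]
VarMonth              @[User::VarMonth]
VarProgram            @[User::VarProgram]
VarYear               @[User::VarYear]

Map those new columns to the matching destination columns, and because the variables are re-set on every Foreach Loop iteration, each file's rows pick up that file's values.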
I am using Camelot to extract borderless tables from a PDF file. I've used the parameters below:
budget_tables = camelot.read_pdf(budget_file,pages='all',flavor='stream',edge_tol=80,strip_text='\n')
The issue is that for some tables (there are over 300 tables in this file), values that are too wide end up grouped together in the same cell. So I get output like the sample below, where in some rows each value is in its own column, while in other rows several values are separated by a space and crammed into one cell.
I was thinking I'd have to create a function that goes through the dataframe, checks each cell for the delimiter (' '), splits it, and fills the surrounding empty cells with the pieces (I think I'd still need help with that, since it's not consistent whether the empty cells are to the left or the right; a rough sketch of the idea is below the sample output). But if there is an option in the Camelot call itself that may reduce this kind of output, that's where I'd prefer to start.
Sorry for the bad table formatting below. Any tips on showing this table a bit better would be appreciated. I can't upload images from my workstation.
0 | 1 | 2 | 3 | 4 | 5 | 6
30 Sales of non-financial assets |173,853 |192,108 |176,957 |226,843 |188,370 |74,022
31 Payments for non-financial asset|-1,274,120 |-866,331 |-1,372,111 -1,100,557 -1,359,568 ...
32 Net cash flows from investments |-1,100,267 |-674,223 -1,195,154| |-873,714 -1,171,198 -1,229,102|
33 in non-financial assets
34 Cash flows from investments in
35 financial assets for policy
36 purposes
37 Receipts
38 Repayment of loans | 30,044 | 29,409 | 1,185 |3,235 |6,136 |9,036
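For what it's worth, here is the rough shape of the clean-up function I have in mind. It assumes each Camelot table's .df is a pandas DataFrame and that the spill-over always lands in empty cells to the right of the merged cell, which (as noted) isn't always true:

import re
import pandas as pd

NUM = re.compile(r'^-?[\d,]+$')  # matches values like 173,853 and -1,274,120

def redistribute_row(row):
    # Split any cell holding several space-separated numbers and spill
    # the extras into empty cells to the right in the same row.
    cells = [str(v).strip() for v in row]
    for i, cell in enumerate(cells):
        parts = cell.split()
        if len(parts) > 1 and all(NUM.match(p) for p in parts):
            cells[i] = parts[0]
            extras = parts[1:]
            for j in range(i + 1, len(cells)):
                if not extras:
                    break
                if cells[j] in ('', 'nan'):
                    cells[j] = extras.pop(0)
    return cells

def redistribute(df):
    return pd.DataFrame([redistribute_row(r) for r in df.itertuples(index=False)],
                        columns=df.columns)

# e.g. cleaned = [redistribute(t.df) for t in budget_tables]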
I'd like to read a .xlsx file using python pandas. The problem is that the Excel file has some additional data at the beginning, like a title and a description of the table, before the table content starts. That introduces unnamed columns, because pandas takes the description row as the column headers. The table content only starts a few lines later:
    A                              B      C
1   this is description
2   last updated: Mar 18th,2014
3   Table content
4   Country                        Year   Product_output
5   Canada                         2017   3002
6   Bulgaria                       2016   2201
    ...
The table content starts in line 4, and the columns must be "Country", "Year", "Product_output" instead of "this is description", "unnamed", "unnamed".
When you use the read_excel function, set the skiprows parameter to 3.
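A minimal example, assuming the layout shown in the question ('data.xlsx' is a placeholder for your file):

import pandas as pd

# Skip the three description rows; row 4 then becomes the header row.
df = pd.read_excel('data.xlsx', skiprows=3)
print(df.columns.tolist())  # ['Country', 'Year', 'Product_output']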
Try using the index_col=[0] parameter:
pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1',index_col=[0])
I'm new to SQL. My question: I had some trouble importing data, which led to some discrepancies.
Column A  Column B       Column C  Column D  Column E             Column F
WB-002    "Brown Sales"  14A       140000    12/5/2015            12/5/2016
WB-002    "Johnson Inc"  24B       150000    12/5/2015,2/5/2016
WB-005    "Sonoma Inc"   26C       300000    7/30/2015,7/30/2016
How would I be able to shift the data over one column for the affected rows, past column 1? Or would I have to replace each row's data with the next one over, again and again? The final result I want:
Column A  Column B       Column C  Column D  Column E   Column F
WB-002    "Brown Sales"  14A       140000    12/5/2015  12/5/2016
WB-002    "Johnson Inc"  24B       150000    12/5/2015  2/5/2016
WB-005    "Sonoma Inc"   26C       300000    7/30/2015  7/30/2016
This is too long for a comment.
I don't think SQL Server understands the real CSV format (unless more recent versions have seen improvements in this regard). Alas. You should try re-importing the data (okay, fingers, don't type Postgres, which does understand CSV).
If the file is small enough, then load it into Excel and save it with tab delimiters -- or something that is not a comma. Then you can bring it into SQL Server correctly.
If it is larger, I'm not sure what to do (I guess when I've faced this problem, Excel has always come to the rescue). Depending on your skills, you could pre-process the file with a tool such as Python, grep, or PowerShell (a sketch of the Python route is below). Or you could load each line into SQL Server as a string and then do all the parsing in SQL (not trivial either).
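A minimal sketch of the Python route, assuming the source is a comma-delimited file ('input.csv' and 'output.tsv' are placeholders). Python's csv module does understand quoted fields, so commas inside values survive the round trip:

import csv

# Re-read the real CSV and re-write it tab-delimited for SQL Server.
with open('input.csv', newline='') as src, \
        open('output.tsv', 'w', newline='') as dst:
    writer = csv.writer(dst, delimiter='\t')
    for row in csv.reader(src):
        writer.writerow(row)

The tab-delimited file can then be imported without the comma ambiguity.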
In the meantime, let Microsoft know that the most common export format from their Excel product should be able to be imported into their database product.
I am using MS Access 2007. This is a really simple problem, but I cannot work out how to do it.
I have the following table produced from a query:
1     2     3     4
1000  5500  9500  3000
I want to produce a line chart of the data.
The column headings are, respectively:
SumOfA1 SumOfA2 SumOfA3 SumOfA4
How do I do this?
Here's what Excel can do with it: [screenshot of the resulting Excel line chart omitted]
Excel's built-in functions are effective most of the time. However, some functions feel like they were implemented half-way, and they somehow dictate how you use them. The SUBTOTAL function is one of them.
My data is presented in the following format:
Value Count
100 20
102 3
105 4
102 5
And I want to build a table in this format:
Value Count
100 20
101 0
102 8
103 0
104 0
105 4
I've read this on SO, but my situation is a bit different. A pivot table can give you subtotals for the values that appear in the original data, but I don't want a loop that inserts the missing values into the original data (if there is going to be a loop over the original data anyway, that loop could just as well build the table itself, which is what I'd prefer to avoid altogether). To pin down what I'm after, the equivalent transformation is sketched in pandas below; I'm looking for the Excel-native way to do the same thing.
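(Not something I can use here, but this pandas snippet produces exactly the target table from the sample data:)

import pandas as pd

# Original data: duplicate Values with separate Counts.
df = pd.DataFrame({'Value': [100, 102, 105, 102],
                   'Count': [20, 3, 4, 5]})

# Sum the duplicates, then reindex over the full 100..105 range,
# filling the missing values with 0.
counts = df.groupby('Value')['Count'].sum()
full = counts.reindex(range(counts.index.min(), counts.index.max() + 1),
                      fill_value=0)
print(full.rename_axis('Value').reset_index())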