Data Shifted Over for Specific Columns in SQL Management Studio

I'm new to SQL. I had some trouble importing data, and in some rows two values ended up in the same column:
Column A  Column B       Column C  Column D  Column E             Column F
WB-002    "Brown Sales"  14A       140000    12/5/2015            12/5/2016
WB-002    "Johnson Inc"  24B       150000    12/5/2015,2/5/2016
WB-005    "Sonoma Inc"   26C       300000    7/30/2015,7/30/2016
How would I shift the data over by one column in the affected rows, past column 1? Or would I have to replace each row's data with the next, over and over again? The final result I want:
Column A  Column B       Column C  Column D  Column E   Column F
WB-002    "Brown Sales"  14A       140000    12/5/2015  12/5/2016
WB-002    "Johnson Inc"  24B       150000    12/5/2015  2/5/2016
WB-005    "Sonoma Inc"   26C       300000    7/30/2015  7/30/2016

This is too long for a comment.
I don't think SQL Server understands the real CSV format (unless more recent versions have seen improvements in this regard). Alas. You should try re-importing the data (okay fingers, don't type Postgres which does understand CSV).
If the file is small enough, then load it into Excel and save it with tab delimiters -- or something that is not a comma. Then you can bring it into SQL Server correctly.
If it is larger, I'm not sure what to do (I guess when I've faced this problem, Excel has always come to the rescue). Depending on your skills, you could pre-process the file with a tool such as Python, grep, or PowerShell. Or you could load each line into SQL Server as a single string and then do all the parsing in SQL (not trivial either).
In the meantime, let Microsoft know that the most common export format from their Excel product should be able to be imported into their database product.
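As a rough illustration of the pre-processing route, here is a minimal Python sketch. It assumes the export is tab-delimited and that the only damage is two dates glued together with a comma in one field; the sample text and the helper name repair_row are made up for this example, so adjust them to the real file.

```python
import csv
import io

def repair_row(row, width=6):
    """Split a glued 'date,date' field so the row ends up with `width` columns."""
    fixed = []
    for field in row:
        parts = field.split(",")
        # Only split fields that look like two m/d/yyyy dates joined by a comma.
        if len(parts) == 2 and all(p.count("/") == 2 for p in parts):
            fixed.extend(parts)
        else:
            fixed.append(field)
    # Drop the empty trailing column left behind by the shift, then pad if short.
    while len(fixed) > width and fixed[-1] == "":
        fixed.pop()
    fixed += [""] * (width - len(fixed))
    return fixed

# Hypothetical tab-delimited input mirroring the rows in the question.
raw = (
    'WB-002\t"Brown Sales"\t14A\t140000\t12/5/2015\t12/5/2016\n'
    'WB-002\t"Johnson Inc"\t24B\t150000\t12/5/2015,2/5/2016\t\n'
    'WB-005\t"Sonoma Inc"\t26C\t300000\t7/30/2015,7/30/2016\t\n'
)

rows = [repair_row(r) for r in csv.reader(io.StringIO(raw), delimiter="\t")]

# Write the repaired rows back out as tab-delimited text for re-import.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
print(out.getvalue())
```

For a real file, replace the io.StringIO objects with open(...) handles (using newline="" as the csv module recommends).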

Related

Import/Insert Excel Range and SSIS variables into SQL table?

I have an SSIS package that is to ingest a number of Excel files with similar structures but irregular names and import them into a SQL table. Along with the data from the excel files, I have a number of variables that are set and different with each file (User::ExcelFileName, User::VarMonth, User::VarProgram, User::VarYear, etc). All of the table data from the Excel files are going to the same destination table, but for each row of data alongside the Excel dataset I want to insert a column for each variable to pass through as well into SQL. An example of my dataset is below:
Excel
ID   Name    Foo   Bar
111  Bob     88yu  117
112  Jim     JKL   A TU
113  George  FTD   19900
SSIS Variables (set during execution)
User::ExcelFileName = c:\temp\excelfile1.xlsx
User::VarMonth = Jan
User::VarProgram = Daily
User::VarYear = 2023
Desired SQL Destination:
ExcelFileName            VarMonth  VarProgram  VarYear  ID   Name    Foo   Bar
c:\temp\excelfile1.xlsx  Jan       Daily       2023     111  Bob     88yu  117
c:\temp\excelfile1.xlsx  Jan       Daily       2023     112  Jim     JKL   A TU
c:\temp\excelfile1.xlsx  Jan       Daily       2023     113  George  FTD   19900
I've tried a few configurations and I've referenced this post for piping in variable data into SQL, but I haven't gotten a working model yet.
Worth noting: the Excel connection is dynamic and runs within a Foreach Loop container to iterate through my Excel sources. Any advice or guidance would be appreciated!
It sounds like you want a Derived Column task.
In the task, just add the new columns you want, and map each variable to its column.

Unable to create new features in Machine learning

I have a dataset. I am using pandas dataframe and named it df.
The dataset has 50,000 rows. Here are the first 5:
Name_Restaurant   cuisines_available         Average cost
Food Heart        Japnese, chinese           60$
Spice n Hungary   Indian, American, mexican  42$
kfc, Lukestreet   Thai, Japnese              29$
Brown bread shop  American                   11$
kfc, Hypert mall  Thai, Japnese              40$
I want to create a column that contains the number of cuisines available.
I am trying this code:
df['no._of_cuisines_available']=df['cuisines_available'].str.len()
But instead of showing the number of cuisines, it shows the number of characters.
For example, for the first row the output should be 2, but it's showing 17.
I also need a new column that contains the number of stores for each restaurant. For example, kfc here has 2 stores: "kfc, Lukestreet" and "kfc, Hypert mall". I have no idea how to code this.
i) Number of cuisines: split on the comma and count the pieces.
df['cuisines_available'].str.split(',').apply(len)
ii) Number of stores:
df['Name_Restaurant'].str.split(',', expand=True).melt()['value'].str.strip().value_counts()
What ii) does: it splits the column at ',' and stores each resulting piece in its own column. melt then stacks these into one long column, str.strip removes the surrounding spaces, and value_counts counts how often each entry occurs.
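To see both ideas end to end, here is a small self-contained sketch on the five sample rows from the question. It assumes the text before the first comma in Name_Restaurant is the restaurant name; the brand and no_of_stores column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_Restaurant": [
        "Food Heart", "Spice n Hungary", "kfc, Lukestreet",
        "Brown bread shop", "kfc, Hypert mall",
    ],
    "cuisines_available": [
        "Japnese, chinese", "Indian, American, mexican",
        "Thai, Japnese", "American", "Thai, Japnese",
    ],
})

# i) Split on the comma and count the resulting list items per row.
df["no_of_cuisines_available"] = df["cuisines_available"].str.split(",").str.len()

# ii) Assumption: the part before the first comma identifies the restaurant.
df["brand"] = df["Name_Restaurant"].str.split(",").str[0].str.strip()

# Count how many rows share each brand and attach that count to every row.
df["no_of_stores"] = df.groupby("brand")["brand"].transform("size")

print(df[["Name_Restaurant", "no_of_cuisines_available", "no_of_stores"]])
```

Unlike value_counts, which returns one row per distinct name, the groupby/transform step writes the store count back onto every row, which is what a new column needs.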

Include missing dates with missing values with libreoffice-calc

I searched a lot, but didn't find an answer to the following question:
Financial data often come as daily data but with missing dates (weekends, banking holidays ...). I would like to have those data really on a daily basis with missing values, where originally the dates were missing.
So far I have done this half-manually in LibreOffice Calc, which takes a lot of time. I haven't found a way to really automate it, as there is no fixed rule for which dates are missing.
Example:
I have:
21/12/18 1
27/12/18 2
28/12/18 3
02/01/19 4
I want:
21/12/18 1
22/12/18
23/12/18
24/12/18
25/12/18
26/12/18
27/12/18 2
28/12/18 3
29/12/18
30/12/18
31/12/18
01/01/19
02/01/19 4
I'm not familiar with LibreOffice Calc. In Excel or Google Sheets, I would use a lookup table.
In one tab of the spreadsheet, I would enter the dates I want in column A. In another tab, I would place the actual data I have. Then, in column B of the first tab, I would look up the value for each day from the data on the second tab.
Assuming the value 1 is in B2, put a start date in, say, D2, and enter this in E2:
=IFERROR(INDEX(B:B,MATCH(D2,A:A,0)),"")
then copy both down to suit.
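If a script outside the spreadsheet is an option, the same "full calendar plus lookup" idea can be sketched in Python with pandas. This swaps the spreadsheet formulas for a reindex and is only an illustration on the four sample rows; the variable names are made up.

```python
import pandas as pd

# The data as given, with dates in dd/mm/yy format.
have = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(
        ["21/12/18", "27/12/18", "28/12/18", "02/01/19"], format="%d/%m/%y"
    ),
)

# Build every calendar day between the first and last date.
all_days = pd.date_range(have.index.min(), have.index.max(), freq="D")

# Look up each day in the original data; days with no entry become NaN.
daily = have.reindex(all_days)

print(daily)
```

The result can be written back out with daily.to_csv(...) and opened in Calc, where the NaN entries appear as empty cells.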

Parse data from Morningstar Direct to worksheet

I have to put together a report every quarter using data pulled from Morningstar Direct, and I need to automate the whole process, or at least parts of it. We have put this report together for the last two quarters, using the same format each time, so we already have the general templates for the report. Now I'm just looking for a way to pull the data from Morningstar and put it into the templates correctly.
Does anyone have any general idea where I should start?
A               B           C       D      E    F
Group           Name        Weight  Gross  Net  Contribution
Equity                      25%     10%    8%   .25
                IBM         5%      15%    12%
                AAPL        7%      23%    18%
Fixed Income                25%     5%     4%   .17
                10 Yr Bond  10%     7%     5%
Emerging Mrkts
And it goes on breaking things into more groups, and there are many more holdings within each group.
What I want it to do is search until it finds "Equity", for example, then go over one row, grab the name of the position, its weight, and its net return, and do that for each holding in Equity. Then it should do the same thing for Fixed Income, and so on, selecting the names, weights, and nets for each holding, and then copy and paste them into another workbook.
Is there any way that is possible?
It sounds like you need to parse your information. By using left(), right(), and mid() you can select the good data and ignore the superfluous. You could separate the data in one cell into multiple cells in the desired format.
A               B
Name            Address
John Q. Public  123 My Street, City, State, Zip

E (First Name):
=LEFT(A2,FIND(" ",A2))
F (Middle Initial) (extra work to program missing data):
=MID(A2,LEN(E2)+1,FIND(" ",MID(A2,LEN(E2)-1,99)))
G (Last Name):
=MID(A2,(LEN(E2)+LEN(F2)+2),99)
H (City):
=MID(B2,LEN(H2)+2,FIND(",",MID(B2,LEN(H2)+2,99))-1)
I (State):
=MID(B2,(LEN(I2)+LEN(H2)+4),FIND(",",MID(B2,(LEN(I2)+LEN(H2)+4),99))-1)
J (Zip Code):
=MID(B2,(LEN(H2)+LEN(I2)+LEN(J2)+6),99)
This code will parse the name in the cell A2 and address in cell B2 into separate fields.
Similar cuts should allow you to get rid of the unwanted data.
==================================================================
7/8/2015
Your data seems to be your desired output. If so, please provide sanitized input data for comparison. You probably need to loop through your input to find the groups. When the group changes, prepare the summary figures.
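The "loop through the input and watch for the group to change" idea might look roughly like this in Python. It is a sketch that assumes the layout shown in the question (group rows have an empty Name cell, holding rows have an empty Group cell); holdings_by_group is a made-up helper name, and reading the rows from the actual workbook is left out.

```python
# Rows as (Group, Name, Weight, Gross, Net, Contribution), as in columns A-F.
rows = [
    ("Equity", "", "25%", "10%", "8%", ".25"),
    ("", "IBM", "5%", "15%", "12%", ""),
    ("", "AAPL", "7%", "23%", "18%", ""),
    ("Fixed Income", "", "25%", "5%", "4%", ".17"),
    ("", "10 Yr Bond", "10%", "7%", "5%", ""),
]

def holdings_by_group(rows):
    """Collect (name, weight, net) for each holding under its group heading."""
    result = {}
    current = None
    for group, name, weight, gross, net, contribution in rows:
        if group:
            # A non-empty Group cell starts a new group.
            current = group
            result[current] = []
        elif name and current is not None:
            # A holding row belongs to the most recent group seen.
            result[current].append((name, weight, net))
    return result

groups = holdings_by_group(rows)
for group, holdings in groups.items():
    print(group, holdings)
```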

Repetition while copying data to SQL table from multiple sheets

I have to copy data from multiple Excel sheets into a single SQL table.
Excel inputs:
Sheet1's columns: fname a, b. lname c, d. (2 rows)
Sheet2's columns: city boston, austin, state ma, tx. (2 rows)
My output (tMSSqlOutput) has 4 rows instead of 2:
a c boston ma, a c austin tx, b d boston ma, b d austin tx.
Desired output: a c boston ma, b d austin tx. (2 rows only)
How do I manage this?
As per the comments, you don't have a natural key to join the two data sets. Instead you could generate a sequence for each data set that would increment equally for both data sets and would equate to being your row number on each data set.
First of all, this should set alarm bells ringing about the state of your data and how you can be sure that row n in one data set definitely corresponds to row n in another data set. It smacks of something being badly normalised out without proper keys being added and it can be very dangerous to assume that the resulting data set from this is going to be accurate.
If you absolutely must do this, however, then you should assign a Numeric.sequence to each of your data sets. You can do this in a tMap that precedes your joining tMap:
Notice the "s1" parameter to the Numeric.sequence. If you reuse this elsewhere then it will increment this one rather than starting from 1 so typically you would want to choose a unique name for each sequence you have in your job (although there are obviously occasions where incrementing a previously defined sequence is what you desire).
Once you have defined a unique sequence with the same starting numbers (the second parameter) and the same increment numbers (the third parameter) then you should be able to create a join on these instances:
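Outside of Talend, the same row-number join can be illustrated with a short Python sketch. It is not the tMap configuration itself, just the underlying idea: number each data set with the same start and increment, then join on that number. The with_sequence helper and the sample dictionaries are made up for this example.

```python
from itertools import count

sheet1 = [{"fname": "a", "lname": "c"}, {"fname": "b", "lname": "d"}]
sheet2 = [{"city": "boston", "state": "ma"}, {"city": "austin", "state": "tx"}]

def with_sequence(rows, start=1, step=1):
    """Key each row by its own sequence number (same start and step per data set)."""
    seq = count(start, step)
    return {next(seq): row for row in rows}

left = with_sequence(sheet1)
right = with_sequence(sheet2)

# Join on the generated sequence number instead of taking the cross product.
joined = [{**left[k], **right[k]} for k in left if k in right]

for row in joined:
    print(row)
```

Each data set gets its own counter here, which plays the role of giving each sequence a unique name; sharing one counter would make the second data set start where the first left off and the join would find no matches.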