Repetition while copying data to SQL table from multiple sheets - sql

I have to copy data from multiple excel sheets to the single SQL table.
Excel inputs:
Sheet1's columns: fname a, b. lname c, d. (2 rows)
Sheet2's columns: city boston, austin, state ma, tx. (2 rows)
My output (tMSSqlOutpout) has 4 rows instead of 2.
a c boston ma, a c austin tx, b d boston ma, b d austin tx.
Desired output: a c boston ma, b d austin tx. (2 rows only)
How do I manage this?

As per the comments, you don't have a natural key to join the two data sets. Instead you could generate a sequence for each data set that would increment equally for both data sets and would equate to being your row number on each data set.
First of all, this should set alarm bells ringing about the state of your data and how you can be sure that row n in one data set definitely corresponds to row n in another data set. It smacks of something being badly normalised out without proper keys being added and it can be very dangerous to assume that the resulting data set from this is going to be accurate.
If you absolutely must do this, however, then you should assign a Numeric.sequence to each of your data sets. You can do this in a tMap that precedes your joining tMap:
Notice the "s1" parameter to the Numeric.sequence. If you reuse this elsewhere then it will increment this one rather than starting from 1 so typically you would want to choose a unique name for each sequence you have in your job (although there are obviously occasions where incrementing a previously defined sequence is what you desire).
Once you have defined a unique sequence with the same starting numbers (the second parameter) and the same increment numbers (the third parameter) then you should be able to create a join on these instances:

Related

How do I tally how many times a word appears on a certain row?

I have four sets of data representing a softball schedule. Looks like this:
Day Team 1 Team 2
M A Team B Team
T C Team D Team
....
but four times over. I want to be able to change the schedule and have it automatically tally how many times a team plays on a given day. Ideas?
You would us something like this:
=COUNTIF(2:2,"A Team")
Edit:
You can use a SUMPRODUCT() Function with math operands * and +:
=SUMPRODUCT(($A$2:$A$43=H$1)*(($B$2:$B$43=$G2)+($C$2:$C$43=$G2)))
So how it works:
Since TRUE/FALSE is a Boolean and it can be reduced to 1/0 respectively. Using the * and + operands is like AND and OR respectively.
The SUMPRODUCT iterates through the range and test each criterion inside the () So it first test whether the cell in column A is equal to H1, if so it returns a 1, or a 0 if not. the next part sets up the OR if in the same row the team name is found it also returns a 1. 1 * 1 = 1. SUMPRODUCT keeps track of all the 1 and 0 and adds them together, so you get the count.
If there are other columns that have the team names just add those columns with the + area.
Ok, so let's start with making your table a real table via "Start > format as table" and call your table "data". Then you have three columns called data[Day], data[Team 1] and data[Team 2]. For instance this:
Day Team 1 Team 2
Monday A Team B team
Tuesday C Team D Team
Wednesday C Team A Team
Monday B Team C Team
Now comes the ugly part. You need a matrix of 7*10 (days * teams)
(Cell E1) Team 1 Team 2 Team 3 Team 4 ...
Monday *1
Tuesday
Wednesday
...
Formula *1
=SUMPRODUCT((data[Day]=$E2)*((data[Team 1]=F$1)+(data[Team 2]=F$1)))
Now drag down that formula till Sunday and then copy it to the other teams (when I tried dragging it to the other teams, Excel messed up the column names!).
This will automatically fill the matrix and tell you which team plays how often on a specific day.
What does it do? Basically SUMPRODUCT can not only build products, but it can also evaluate boolean conditions. So if on Monday, Team A plays, then the first column would return (for Team A / Monday):
1*(1+0)
SUMPRODUCT does that for each line in the matrix and then sums up the result.

recoding multiple variables in the same way

I am looking for the shortest way to recode many variables in the same way.
For example I have data frame where columns a,b,c are names of items of survey and rows are observations.
d <- data.frame(a=c(1,2,3), b=c(1,3,2), c=c(1,2,1))
I want to change values of all observations for selected columns. For instance value 1 of column "a" and "c" should be replaced to string "low" and values 2,3 of these columns should be replaced to "high".
I do it often with many columns so I am looking for function which can do it in very simple way, like this:
recode2(data=d, columns=a,d, "1=low, 2,3=high").
Almost ok is function recode from package cars, but if I have 10 columns to recode I have to rewrite it 10 times and it is not as effective as I want.

Parse data from Morningstar Direct to worksheet

I have to put together a report every quarter using data pulled off of Morningstar Direct. I have to automate the whole process, or at least parts of it. We have put this report together for the last two quarters, and we use the same format each time. So, we already have the general templates for the report - now I'm just looking for a way to pull the data from Morningstar and putting into the templates correctly.
Does anyone have any general idea where I should start?
A B C D E F
Group Name Weight Gross Net Contribution
Equity 25% 10% 8% .25
IBM 5% 15% 12%
AAPL 7% 23% 18%
Fixed Income 25% 5% 4% .17
10 Yr Bond 10% 7% 5%
Emerging Mrkts
And it goes on breaking things into more groups, and there are many more holdings within each group.
What I want it to do is search until it finds "Equity", for example, and then go over one row, grab the name of the position, its weight, and its net return, and do that for each holding in Equity. The for it to do the same thing in Fixed Income, and on and on - selecting the names, weights, and nets for each holding. Then copy and pasting them into another workbook.
Anyway that is possible?
It sounds like you need to parse your information. By using left(), right(), and mid() you can select the good data and ignore the superfluous. You could separate the data in one cell into multiple cells in the desired format.
A B
Name Address
John Q. Public 123 My Street, City, State, Zip
E (First Name) F (Middle Initial) (extra work to program missing data)
=LEFT(A2,FIND(" ",A2)) =MID(A2,LEN(E2)+1,FIND(" ",MID(A2,LEN(E2)-1,99)))
G (Last Name) H (City)
=MID(A2,(LEN(E2)+LEN(F2)+2),99) =MID(B2,LEN(H2)+2,FIND(",",MID(B2,LEN(H2)+2,99))-1)
I (State)
=MID(B2,(LEN(I2)+LEN(H2)+4),FIND(",",MID(B2,(LEN(I2)+LEN(H2)+4),99))-1)
J (Zip Code)
=MID(B2,(LEN(H2)+LEN(I2)+LEN(J2)+6),99)
This code will parse the name in the cell A2 and address in cell B2 into separate fields.
Similar cuts should allow you to get rid of the unwanted data.
==================================================================
7/8/2015
Your data seems to be your desired output. If so, please provide sanitized input data for comparison. You probably need to loop through your input to find the groups. When the group changes, prepare the summary figures.

Mark accumulated values on a QlikView column if condition is fulfilled

I have a table in Qlikview with 2 columns:
A B
a 10
b 45
c 30
d 15
Based on this table, I have a formula with full acumulation defined as:
SUM(a)/SUM(TOTAL a)
As a result,
A B D
b 45 45/100=0.45
c 30 75/100=0.75
d 15 90/100=0.90
a 10 100/100=1
My question is. how do I mark in colour the values in column A that have on column D <=0.8)?
The challenge is that D is defined with full accumulation, but if I reference D in a formula, it doesn't consider the full accumulation!
I tried with defining a formula E=if(D>0.8,'Y','N') but this formula doesn't take the visible (accumulated) value for D unfortunately, instead it takes the D with no accumulation. If this worked, I would have tried to hide (not disable) E and reference it from the dimensions column of the table , Text colour option. Any ideas please?? Thanks
You can't get an expression column's value from within a dimension or it's properties, because the expression columns rely on the dimensions provided. It would create an endless loop. Your options are:
Apply your background colour to the expression columns, not the dimensions. This would actually make more sense as the accumulated values would have the colour, not the dimension.
When loading this specific table, have QlikView create a new column that contains the accumulated values of B. This would mean, however, that the order of your chart-table would need to be fixed for the accumulations to make any sense.
Use aggregation to create a temporary table and accumulate the values using RangeSum(). Note this will only accumulate properly if the table is ordered in Ascending order of Column A
=IF(Aggr(RangeSum(Above(Sum(B),0,10)),A)/100>0.8,
rgb(0,0,0),
rgb(255,0,0)
)

Excel: one column has duplicates of each value, I need to take averages of the corresponding two values from the other columns

Example:
column A column B
A 1
A 2
B 2
B 2
C 1
C 1
I would somehow like to get the following result:
column A column B
A 1.5
B 2
C 1
(which are averages of 1 and 2, 2 and 2 and 1 and 1)
How do I achieve that?
Thanks
If you're using Excel 2007 or above, you can also use the shorter AVERAGEIF function:
=AVERAGEIF($A$1:$A:$6,D1,$B$1:$B$6)
Less typing, easier to read..
In D1:D3, type A, B, C. Then in E1, put this formula
=SUMIF($A$1:$A$6,D1,$B$1:$B$6)/COUNTIF($A$1:$A$6,D1)
and fill down to E3. If you want to replace the existing data, copy E1:E3 and paste-special-values over itself. Then delete A:C.
Alternatively, you can add headers to your data, say "Letter" and "Number". Then create a Pivot Table from your data. Put Letter in the rows section and Number in the Data section. Change your Data section from SUM to AVERAGE and you'll get the same result.