Computation and printing in a local Minitab macro is not working right. How do I fix it?

I am running a local macro in Minitab 19.
The output for k7 is not correct: k7 = .276 × 10 should give 2.76, but the macro prints -16.69 for 'LCL'.
How do I fix this? The output and input are below.
Also, it seems that printing constants in Minitab requires the constants to be listed in numerical order. Is that correct?
Thanks.
Mary A. Marion
Output: Data Display
c4 0.972700
sigma 10.0000
B6 1.66900
B5 0.276000
UCL 16.6900
CL 9.72700
LCL -16.6900
MACRO
SQCs k1 k2 k3 k4
# Problem 6.17 %SQCs .9727 10 1.669 .276.
mconstant k1 k2 k3 k4 k5 k6 k7
name k1 'c4'
name k2 'sigma'
name k3 'B6'
name k4 'B5'
name k5 'CL'
name k6 'UCL'
name k7 'LCL'
let k6 = k3*k2
let k5 = k1*k2
let k7 = k4*k2
PRINT k4 k2 k7
PRINT k1 k2 k3 k4 k6 k5 k7
endmacro

Related

How to filter by day with pandas

I am reading a series of data from a file via pd.read_csv().
Then somehow I create a dataframe like the following:
col1 col2
01/01/2001 a1 a2
02/01/2001 b1 b2
03/01/2001 c1 c2
04/01/2001 d1 d2
01/01/2002 e1 e2
02/01/2002 f1 f2
03/01/2002 g1 g2
04/01/2002 h1 h2
What I would like to do is group by the same day (across years) and assign a value to each group, i.e.:
col1
01/01 ax
02/01 bx
03/01 cx
04/01 dx
Does anyone have any clues how to perform this smoothly?
Thanks a lot in advance.
LS
The first thing I'd do is make sure your index holds dates. If you know it does, skip this.
df.index = pd.to_datetime(df.index)
Then you can group by something like [df.index.month, df.index.day] or df.index.strftime('%m-%d'). However, you have to choose whether to aggregate or transform. You didn't specify what you wanted, so I chose first as the aggregation.
df.groupby(df.index.strftime('%m-%d')).first()
col1 col2
01-01 a1 a2
02-01 b1 b2
03-01 c1 c2
04-01 d1 d2
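For reference, here is a minimal self-contained sketch of the same idea; the frame construction is a hypothetical reconstruction of the question's data, and month-first date parsing is assumed (pass dayfirst=True instead if these are dd/mm dates):
import pandas as pd
# Hypothetical reconstruction of the question's frame
df = pd.DataFrame(
    {'col1': ['a1','b1','c1','d1','e1','f1','g1','h1'],
     'col2': ['a2','b2','c2','d2','e2','f2','g2','h2']},
    index=['01/01/2001','02/01/2001','03/01/2001','04/01/2001',
           '01/01/2002','02/01/2002','03/01/2002','04/01/2002'])
# Parse the string index into datetimes
df.index = pd.to_datetime(df.index)
# Group rows that share the same month and day, keeping the first occurrence
print(df.groupby(df.index.strftime('%m-%d')).first())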

Compare Pandas dataframes and add column

I have two dataframes, as below:
df1        df2
A          A   C
A1         A1  C1
A2         A2  C2
A3         A3  C3
A1         A4  C4
A2
A3
A4
Each value in df1's column 'A' has a corresponding value defined in df2's column 'C'.
I want to add a new column 'B' to df1, taking its value from df2's column 'C'.
The final df1 should look like this:
df1
A B
A1 C1
A2 C2
A3 C3
A1 C1
A2 C2
A3 C3
A4 C4
I can loop over df2 and add the values to df1, but it's time-consuming because the data is huge.
for index, row in df2.iterrows():
    df1.loc[df1.A.isin([row['A']]), 'B'] = row['C']
Can someone help me understand how I can solve this without looping over df2?
Thanks
You can use map with a Series:
df1['B'] = df1.A.map(df2.set_index('A')['C'])
print (df1)
A B
0 A1 C1
1 A2 C2
2 A3 C3
3 A1 C1
4 A2 C2
5 A3 C3
6 A4 C4
It is the same as mapping with a dict:
d = df2.set_index('A')['C'].to_dict()
print (d)
{'A4': 'C4', 'A3': 'C3', 'A2': 'C2', 'A1': 'C1'}
df1['B'] = df1.A.map(d)
print (df1)
A B
0 A1 C1
1 A2 C2
2 A3 C3
3 A1 C1
4 A2 C2
5 A3 C3
6 A4 C4
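For a self-contained run, here is a minimal sketch; the frame construction mirrors the question's data:
import pandas as pd
df1 = pd.DataFrame({'A': ['A1','A2','A3','A1','A2','A3','A4']})
df2 = pd.DataFrame({'A': ['A1','A2','A3','A4'], 'C': ['C1','C2','C3','C4']})
# set_index('A') turns df2 into a lookup Series keyed by 'A'; map then
# performs a vectorized lookup for each row of df1. Note that df2['A']
# must be unique, otherwise map raises on the non-unique index.
df1['B'] = df1['A'].map(df2.set_index('A')['C'])
print(df1)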
Timings:
len(df1)=7:
In [161]: %timeit merged = df1.merge(df2, on='A', how='left').rename(columns={'C':'B'})
1000 loops, best of 3: 1.73 ms per loop
In [162]: %timeit df1['B'] = df1.A.map(df2.set_index('A')['C'])
The slowest run took 4.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 873 µs per loop
len(df1)=70k:
In [164]: %timeit merged = df1.merge(df2, on='A', how='left').rename(columns={'C':'B'})
100 loops, best of 3: 12.8 ms per loop
In [165]: %timeit df1['B'] = df1.A.map(df2.set_index('A')['C'])
100 loops, best of 3: 6.05 ms per loop
IIUC you can just merge and rename the column:
df1.merge(df2, on='A', how='left').rename(columns={'C':'B'})
In [103]:
df1 = pd.DataFrame({'A':['A1','A2','A3','A1','A2','A3','A4']})
df2 = pd.DataFrame({'A':['A1','A2','A3','A4'], 'C':['C1','C2','C3','C4']})
merged = df1.merge(df2, on='A', how='left').rename(columns={'C':'B'})
merged
Out[103]:
A B
0 A1 C1
1 A2 C2
2 A3 C3
3 A1 C1
4 A2 C2
5 A3 C3
6 A4 C4
Based on the searchsorted method, here are three approaches with different indexing schemes (note that searchsorted assumes df2.A is sorted):
df1['B'] = df2.C[df2.A.searchsorted(df1.A)].values
df1['B'] = df2.C[df2.A.searchsorted(df1.A)].reset_index(drop=True)
df1['B'] = df2.C.values[df2.A.searchsorted(df1.A)]
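A minimal sketch of the last variant, assuming the question's data (df2.A is already sorted here, as searchsorted requires):
import pandas as pd
df1 = pd.DataFrame({'A': ['A1','A2','A3','A1','A2','A3','A4']})
df2 = pd.DataFrame({'A': ['A1','A2','A3','A4'], 'C': ['C1','C2','C3','C4']})
# For each value of df1.A, searchsorted finds its position in the sorted
# df2.A; indexing df2.C.values by those positions completes the lookup
pos = df2['A'].searchsorted(df1['A'])
df1['B'] = df2['C'].values[pos]
print(df1)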

MS Excel Partial Compare and Copy values from one cell to another

How can I partially compare records in two cells using an Excel formula?
Say I have 'AA BB' in Sheet1 A1 and 'test' in Sheet1 B1, and 'AA VV' in Sheet2 A1. How can I partially compare the two cells using the first word (the word before the space) and produce the result shown in #3 below?
1. Sheet1 looks like this:
A2 B2
----- -----
AA BB test.
2. Sheet2 looks like this:
A2
-----
AA VV
3. I want Sheet3 to look like this:
A2 B2
----- -----
AA VV test.
Compare the first word and, if it matches, return the paired value.
Assuming Sheet 1 has a range of values like the one below, and column A of Sheet 3 is equated to column A of Sheet 2:
A B
--------------------
1 AA BB test.
2 CC DD test..
3 EE FF test...
4 GG HH test....
5 II JJ test.....
You can use this formula on column B of Sheet 3:
=VLOOKUP("*" & MID($A1, 1, FIND(" ", $A1)-1) & "*", Sheet1!$A$1:$B$5, 2, FALSE)

Split value from a data.frame and create additional row to store its component

In R, I have a data frame called df such as the following:
A B C D
a1 b1 c1 2.5
a2 b2 c2 3.5
a3 b3 c3 5 - 7
a4 b4 c4 2.5
I want to split the value in the third row of column D on the dash, and create another row for the second value while retaining the other values in that row.
So I want this:
A B C D
a1 b1 c1 2.5
a2 b2 c2 3.5
a3 b3 c3 5
a3 b3 c3 7
a4 b4 c4 2.5
Any idea how this can be achieved?
Ideally, I would also like to create an extra column specifying whether each split value is a minimum or a maximum.
So this:
A B C D E
a1 b1 c1 2.5
a2 b2 c2 3.5
a3 b3 c3 5 min
a3 b3 c3 7 max
a4 b4 c4 2.5
Thanks.
One option would be to use sub to paste 'min' and 'max' into the 'D' column where a '-' is found, and then use cSplit to split the 'D' column.
library(splitstackshape)
df1$D <- sub('(\\d+) - (\\d+)', '\\1,min - \\2,max', df1$D)
res <- cSplit(cSplit(df1, 'D', ' - ', 'long'), 'D', ',')[is.na(D_2), D_2 := '']
setnames(res, 4:5, LETTERS[4:5])
res
# A B C D E
#1: a1 b1 c1 2.5
#2: a2 b2 c2 3.5
#3: a3 b3 c3 5.0 min
#4: a3 b3 c3 7.0 max
#5: a4 b4 c4 2.5
Here's a dplyrish way:
DF %>%
  group_by(A,B,C) %>%
  do(data.frame(D = as.numeric(strsplit(as.character(.$D), " - ")[[1]]))) %>%
  mutate(E = if (n()==2) c("min","max") else "")
A B C D E
(fctr) (fctr) (fctr) (dbl) (chr)
1 a1 b1 c1 2.5
2 a2 b2 c2 3.5
3 a3 b3 c3 5.0 min
4 a3 b3 c3 7.0 max
5 a4 b4 c4 2.5
Dplyr has a policy against expanding rows, as far as I can tell, so the ugly
do(data.frame(... .$ ...))
construct is required. If you are open to data.table, it's arguably simpler here:
library(data.table)
setDT(DF)[, {
  D = as.numeric(strsplit(as.character(D), " - ")[[1]])
  list(D = D, E = if (length(D)==2) c("min","max") else "")
}, by=.(A,B,C)]
A B C D E
1: a1 b1 c1 2.5
2: a2 b2 c2 3.5
3: a3 b3 c3 5.0 min
4: a3 b3 c3 7.0 max
5: a4 b4 c4 2.5
We can use tidyr::separate_rows. I altered the input to include a negative value to make it more general:
df <- read.table(header=TRUE,stringsAsFactors=FALSE,text=
"A B C D
a1 b1 c1 -2.5
a2 b2 c2 3.5
a3 b3 c3 '5 - 7'
a4 b4 c4 2.5")
library(dplyr)
library(tidyr)
df %>%
  mutate(E = "", E = replace(E, grepl("[^^]-", D), "min - max")) %>%
  separate_rows(D, E, sep = "[^^]-", convert = TRUE)
#> A B C D E
#> 1 a1 b1 c1 -2.5
#> 2 a2 b2 c2 3.5
#> 3 a3 b3 c3 5.0 min
#> 4 a3 b3 c3 7.0 max
#> 5 a4 b4 c4 2.5

Multiple group-by with one common variable with pandas?

I want to mark duplicate values within an ID group. For example
ID A B
i1 a1 b1
i1 a1 b2
i1 a2 b2
i2 a1 b2
should become
ID A An B Bn
i1 a1 2 b1 1
i1 a1 2 b2 2
i1 a2 1 b2 2
i2 a1 1 b2 1
Basically, An and Bn count multiplicity within each ID group. How can I do this in pandas? I've found groupby, but it was quite messy to put everything together. I also tried individual groupbys for ID, A and ID, B. Maybe there is a way to pre-group by ID first and then do all the other variables? (There are many variables and I have very many rows!)
Also I tried individual groupby for ID, A and ID, B
I think this is a straightforward way to tackle it; as you suggest, you can group by each pair separately and then compute the size of the groups, using transform so you can easily add the results back to the original dataframe:
df['An'] = df.groupby(['ID','A'])['A'].transform(np.size)
df['Bn'] = df.groupby(['ID','B'])['B'].transform(np.size)
print df
ID A B An Bn
0 i1 a1 b1 2 1
1 i1 a1 b2 2 2
2 i1 a2 b2 1 2
3 i2 a1 b2 1 1
Of course, with lots of columns you could do:
for col in ['A','B']:
    df[col + 'n'] = df.groupby(['ID',col])[col].transform(np.size)
The duplicated method can also be used to give you something similar, but it will mark observations within a group after the first as duplicates:
for col in ['A','B']:
    df[col + 'n'] = df.duplicated(['ID',col])
print df
ID A B An Bn
0 i1 a1 b1 False False
1 i1 a1 b2 True False
2 i1 a2 b2 False True
3 i2 a1 b2 False False
EDIT: improving performance for large data. I ran this on a large dataset (4 million rows), and it was significantly faster when I avoided transform and used something like the following (it is much less elegant):
for col in ['A','B']:
    x = df.groupby(['ID',col]).size()
    df.set_index(['ID',col], inplace=True)
    df[col + 'n'] = x
    df.reset_index(inplace=True)
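On recent pandas versions, a further option (a sketch, not part of the original answer): passing the string 'size' to transform dispatches to pandas' built-in cythonized aggregation and is typically faster than passing np.size:
import pandas as pd
# Hypothetical reconstruction of the question's frame
df = pd.DataFrame({'ID': ['i1','i1','i1','i2'],
                   'A': ['a1','a1','a2','a1'],
                   'B': ['b1','b2','b2','b2']})
for col in ['A', 'B']:
    # 'size' uses the cythonized groupby path, avoiding a Python-level
    # call per group
    df[col + 'n'] = df.groupby(['ID', col])[col].transform('size')
print(df)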