How to break the line every n-th time when results are written individually into a file? - line

I have this program, which takes values from two separate files (ex1.idl and ex2.idl), performs a calculation, and writes the results into a third file (ex3.txt). It all works, and the output is the results in one large line.
I am looking for an easy way to break the line every 10th element in the output, like this:
100 200 500 600 120 180 400 450 900 100
100 200 700 600 620 580 400 450 900 400
200 200 700 800 620 580 400 450 800 300
with open('ex1.idl') as f1, open('ex2_1.idl') as f2:
    with open('ex3.txt', 'w') as f3:
        start_line = 905  # reading from this line forward
        for i in range(start_line - 1):
            next(f1)
            next(f2)
        f1 = list(map(float, f1.read().split()))
        f2 = list(map(float, f2.read().split()))
        for result in map(lambda v: v[0] / v[1], zip(f1, f2)):
            if f3.count() % 10 != 0:  # this is the broken part: file objects have no count() method
                f3.write(str(result) + ' ')
            else:
                f3.write(str(result) + '\n')
Thanks in advance for any solution or advice.

I figured it out; for future viewers:
for i, result in enumerate(map(lambda v: v[0] / v[1], zip(f1, f2)), start=1):
    if i % 10 != 0:
        f3.write(' ' + '{0:.9f}'.format(result))
    else:
        f3.write(' ' + '{0:.9f}'.format(result) + '\n')
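An alternative is to batch the results into rows of ten up front and join them, which avoids the per-element branching. A small sketch; the helper name `format_rows` is mine, not from the original code:

```python
def format_rows(values, per_line=10, fmt='{0:.9f}'):
    """Format each value with fmt and join them into lines of per_line items."""
    rows = []
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        rows.append(' '.join(fmt.format(v) for v in chunk))
    return '\n'.join(rows)

# e.g. f3.write(format_rows([a / b for a, b in zip(f1, f2)]) + '\n')
```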


MS Access User-Function Runs But Doesn't display value

MS Access newbie here. I've written a VBA UDF to calculate an AQL sample size from the inspection level (the IL variable in my code) and the lot size (the Batch variable in my code).
In an Access userform, the values for my two variables are each displayed in a text box. A third text box calls my UDF, and I want it to auto-update when I add or change either of the two variables, but it never changes. It just stays on zero.
When I update either variable, if I put a stop in the UDF VBA code, the code does run and gives me the correct answer in a Watch in the VBE as I step through it. So the code seems to be OK, I'm assuming. I'm guessing I'm doing something wrong in the properties of the text box I'm trying to return the value in.
The code is below if it helps.
Also, a somewhat related question: I'm guessing you cannot use a UDF in a table? I can use the same UDF in an Excel table, so I thought I should be able to, but it appears I cannot.
Thanks
Public Function SampleSize(IL As String, Batch As Long) As Long
    If IL = "S1" Then
        Select Case Batch
            Case 2 To 50: Sample = 2
            Case 51 To 500: Sample = 3
            Case 501 To 35000: Sample = 5
            Case Is >= 35001: Sample = 8
        End Select
    End If
    If IL = "S2" Then
        Select Case Batch
            Case 2 To 25: Sample = 2
            Case 26 To 150: Sample = 3
            Case 151 To 1200: Sample = 5
            Case 1201 To 35000: Sample = 8
            Case Is >= 35001: Sample = 13
        End Select
    End If
    If IL = "S3" Then
        Select Case Batch
            Case 2 To 15: Sample = 2
            Case 16 To 50: Sample = 3
            Case 51 To 150: Sample = 5
            Case 151 To 500: Sample = 8
            Case 501 To 3200: Sample = 13
            Case 3201 To 35000: Sample = 20
            Case 35001 To 500000: Sample = 32
            Case Is >= 500001: Sample = 50
        End Select
    End If
    If IL = "S4" Then
        Select Case Batch
            Case 2 To 15: Sample = 2
            Case 16 To 25: Sample = 3
            Case 26 To 90: Sample = 5
            Case 91 To 150: Sample = 8
            Case 151 To 500: Sample = 13
            Case 501 To 1200: Sample = 20
            Case 1201 To 10000: Sample = 32
            Case 10001 To 35000: Sample = 50
            Case 35001 To 500000: Sample = 80
            Case Is >= 500001: Sample = 125
        End Select
    End If
    If IL = "G1" Then
        Select Case Batch
            Case 2 To 15: Sample = 2
            Case 16 To 25: Sample = 3
            Case 26 To 90: Sample = 5
            Case 91 To 150: Sample = 8
            Case 151 To 280: Sample = 13
            Case 281 To 500: Sample = 20
            Case 501 To 1200: Sample = 32
            Case 1201 To 3200: Sample = 50
            Case 3201 To 10000: Sample = 80
            Case 10001 To 35000: Sample = 125
            Case 35001 To 150000: Sample = 200
            Case 150001 To 500000: Sample = 315
            Case Is >= 500001: Sample = 500
        End Select
    End If
    If IL = "G2" Then
        Select Case Batch
            Case 2 To 8: Sample = 2
            Case 9 To 15: Sample = 3
            Case 16 To 25: Sample = 5
            Case 26 To 50: Sample = 8
            Case 51 To 90: Sample = 13
            Case 91 To 150: Sample = 20
            Case 151 To 280: Sample = 32
            Case 281 To 500: Sample = 50
            Case 501 To 1200: Sample = 80
            Case 1201 To 3200: Sample = 125
            Case 3201 To 10000: Sample = 200
            Case 10001 To 35000: Sample = 315
            Case 35001 To 150000: Sample = 500
            Case 150001 To 500000: Sample = 800
            Case Is >= 500001: Sample = 1250
        End Select
    End If
    If IL = "G3" Then
        Select Case Batch
            Case 2 To 8: Sample = 3
            Case 9 To 15: Sample = 5
            Case 16 To 25: Sample = 8
            Case 26 To 50: Sample = 13
            Case 51 To 90: Sample = 20
            Case 91 To 150: Sample = 32
            Case 151 To 280: Sample = 50
            Case 281 To 500: Sample = 80
            Case 501 To 1200: Sample = 125
            Case 1201 To 3200: Sample = 200
            Case 3201 To 10000: Sample = 315
            Case 10001 To 35000: Sample = 500
            Case 35001 To 150000: Sample = 800
            Case 150001 To 500000: Sample = 1250
            Case Is >= 500001: Sample = 2000
        End Select
    End If
End Function
You never assign a return value to the function itself!
Place this line just before End Function:
    SampleSize = Sample
End Function
Also, it is good practice to declare variables within the scope of your function:
Dim Sample As Long
When declared, it will be initialized to 0, which will also be your default return value, and it has the same type as your Function.
Undeclared variables are Variant type and initialized to Empty.

Pandas - Pivot/stack/unstack/melt

I have a dataframe that looks like this:
name  value 1  value 2
A     100      101
A     100      102
A     100      103
B     200      201
B     200      202
B     200      203
C     300      301
C     300      302
C     300      303
And I'm trying to get to this:
name  value 1  value 2  value 3  value 4  value 5  value 6
A     100      101      100      102      100      103
B     200      201      200      202      200      203
C     300      301      300      302      300      303
Here is what I have tried so far:
dataframe.stack()
dataframe.unstack()
dataframe.melt(id_vars=['name'])
I need to transpose the data such that:
The first row for each name stays as it is, but every subsequent value associated with the same name is transposed into new columns.
For example, the second row for B should append its values as new columns on B's existing row; it should not form a separate row altogether.
Try:
def fn(x):
    vals = x.values.ravel()
    return pd.DataFrame(
        [vals],
        columns=[f"value {i}" for i in range(1, vals.shape[0] + 1)],
    )

out = (
    df.set_index("name")
    .groupby(level=0)
    .apply(fn)
    .reset_index()
    .drop(columns="level_1")
)
print(out.to_markdown())
print(out.to_markdown())
Prints:
   name  value 1  value 2  value 3  value 4  value 5  value 6
0  A     100      101      100      102      100      103
1  B     200      201      200      202      200      203
2  C     300      301      300      302      300      303
Flatten values for each name
(
    df.set_index('name')
    .groupby(level=0)
    .apply(lambda x: pd.Series(x.values.flat))
    .rename(columns=lambda x: f'value {x + 1}')
    .reset_index()
)
One option using melt, groupby, and pivot_wider (from pyjanitor):
# pip install pyjanitor
import pandas as pd
import janitor
(df
    .melt('name', ignore_index=False)
    .sort_index()
    .drop(columns='variable')
    .assign(header=lambda df: df.groupby('name').cumcount() + 1)
    .pivot_wider('name', 'header', names_sep=' ')
)
name value 1 value 2 value 3 value 4 value 5 value 6
0 A 100 101 100 102 100 103
1 B 200 201 200 202 200 203
2 C 300 301 300 302 300 303
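For completeness, the same reshape can be done with pandas alone, without the pyjanitor dependency. A sketch, rebuilding the sample frame from the question so it is self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("AAABBBCCC"),
    "value 1": [100, 100, 100, 200, 200, 200, 300, 300, 300],
    "value 2": [101, 102, 103, 201, 202, 203, 301, 302, 303],
})

# For each name, flatten the rows in order and relabel as value 1..value 6.
out = (
    df.set_index("name")
    .groupby(level=0)
    .apply(lambda g: pd.Series(
        g.to_numpy().ravel(),
        index=[f"value {i}" for i in range(1, g.size + 1)],
    ))
    .reset_index()
)
```

Because every group returns a Series with the same index, groupby.apply unstacks them into one wide row per name.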

How can I round up an entire column to the next 10?

I'm really struggling to organise my data into 'bins' in a Jupyter Notebook. I need the Length column to be rounded UP to the next multiple of 10, but I can only seem to round it up to the nearest whole number. I would really appreciate some guidance. Thanks in advance. :)
IN[58] df2['Length']
OUT[58] 0 541.56
1 541.73
2 482.22
3 345.45
...
Needs to look something like this:
IN[58] df2['Length']
OUT[58] 0 550
1 550
2 490
3 350
...
Sample
print (df2)
Length
0 541.56
1 541.73
2 482.22
3 500.00 <- whole number for better sample
You can use integer division, multiply by 10, convert to integers, and add 10 if the modulo is not 0:
s = (df2['Length'] // 10 * 10).astype(int) + (df2['Length'] % 10 > 0).astype(int) * 10
print (s)
0 550
1 550
2 490
3 500
Name: Length, dtype: int32
Another option uses the fact that // rounds down, so negating before and after gives rounding up:
s = -(-df2['Length'] // 10 * 10).astype(int)
print (s)
0 550
1 550
2 490
3 500
Name: Length, dtype: int32
Or it is possible to use division with np.ceil:
s = (np.ceil(df2['Length'] / 10) * 10).astype(int)
print (s)
0 550
1 550
2 490
3 500
Name: Length, dtype: int32
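Since the stated goal was binning, the rounded values can feed straight into a groupby. A minimal sketch using the np.ceil variant; the "bin" column name is mine, the frame and column names follow the question:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"Length": [541.56, 541.73, 482.22, 500.00]})

# Round each Length up to the next multiple of 10 and count rows per bin.
df2["bin"] = (np.ceil(df2["Length"] / 10) * 10).astype(int)
counts = df2.groupby("bin").size()
```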

pandas rolling window: Avoiding O(n^2) in for loops

Given the following data frame:
df = pd.DataFrame()
df['A'] = [np.random.randint(1, 100) for i in range(1000)]
df['B'] = [np.random.randint(1, 100) for i in range(1000)]
I would like to compute some statistics based on a rolling window:
that has a 50% overlap
within this window, I would like to break it into 10 smaller non-overlapping windows and compute statistics for each of the 10 windows and save this information to a list.
This is what I mean:
0 100
____________________
0 10
10 20
20 30
30 40
40 50
50 60
60 70
70 80
80 90
90 100
____________________
50 150
____________________
50 60
60 70
70 80
80 90
90 100
100 110
110 120
120 130
130 140
140 150
____________________
100 200
____________________
100 110
110 120
...
Take a window of size 100 data points.
Break that into a small window of 10 data points.
Compute statistics.
Back to 1: Move the window by 50%.
Repeat steps 2 and 3
Back to 1: ...
I have the following code that works.
def rolling_window(df, size=100):
    # Yield (start, end) index pairs with 50% overlap
    start = 0
    while start < len(df):
        yield start, start + size
        start += size // 2

stats = []
for start, end in rolling_window(df):
    step = 10
    time_range = np.arange(start, end + step, step)
    times = zip(time_range[:-1], time_range[1:])
    for s, e in times:
        this_drange = df.loc[s:e, 'B'].max() - df.loc[s:e, 'B'].min()
        stats.append(this_drange)
But the two for loops take 9 hours for 0.5 million rows. How do I modify the code such that it is really fast? Is there a way to vectorize this?
I tried looking at pd.rolling(), but I have no idea how to set it up so that there is a 50% overlap. Also, what I need involves much more than just the overlap.
This should give you some inspiration. I'm not sure it handles all edge cases correctly though..
def thing2(window=100, step=50, subwindow=10, substep=10):
    # Calculate stats for all possible subwindows
    rolled = df['B'].rolling(window=subwindow)
    stats = rolled.max() - rolled.min()
    # Only take the stats of complete subwindows
    stats = stats[subwindow - 1:]
    # Collect the subwindow stats for every "macro" window
    idx, subidx = np.ogrid[:len(df) - window + 1:step, :window:substep]
    linidx = (idx + subidx).ravel()
    return stats.iloc[linidx]
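To sanity-check the vectorized helper (including the edge cases the answer is unsure about), here is a comparison against a brute-force double loop over half-open subwindows. The frame is rebuilt here so the snippet is self-contained, and the function takes df as a parameter rather than a global:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"B": rng.integers(1, 100, size=1000)})

def thing2(df, window=100, step=50, subwindow=10, substep=10):
    # max - min of every complete subwindow, via a single rolling pass
    rolled = df['B'].rolling(window=subwindow)
    stats = rolled.max() - rolled.min()
    stats = stats[subwindow - 1:]
    # gather the subwindow stats belonging to each overlapping macro window
    idx, subidx = np.ogrid[:len(df) - window + 1:step, :window:substep]
    linidx = (idx + subidx).ravel()
    return stats.iloc[linidx]

# Brute force: macro windows of 100 with 50% overlap, each split into
# ten half-open subwindows of 10; max - min of 'B' in each subwindow.
expected = []
for start in range(0, len(df) - 100 + 1, 50):
    for s in range(start, start + 100, 10):
        w = df['B'].iloc[s:s + 10]
        expected.append(w.max() - w.min())
```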

using awk to eliminate records that have a match specified by field 1 and within a defined value of field 2

I have a problem that I am trying to solve with awk. It has an application in selecting good-quality single nucleotide polymorphisms (SNPs) for placing on a SNP chip, where there is a requirement that no SNP is within 60 bp of another SNP. The file looks like this:
comp1008_seq1 20
comp1008_seq1 234
comp1008_seq1 260
comp1008_seq1 500
comp3044_seq1 300
comp3044_seq1 350
comp3044_seq1 460
comp3044_seq1 600
................
I want to only print records that are not within +-60 (based on field 2) when they are from the same component (based on field 1). Therefore, it doesn't matter if they are within +-60 when they are from different components (based on field 1). The output in the above example should look like this:
comp1008_seq1 20
comp1008_seq1 234
comp1008_seq1 500
comp3044_seq1 300
comp3044_seq1 460
comp3044_seq1 600
http://ideone.com/h6oEI
{
    if ($1 != last1 || abs($2 - last2) > 60)
        print
    last1 = $1; last2 = $2
}
function abs(x) {
    return x > 0 ? x : -x
}