My file looks like this:
1000074493 1D # # # # #
1000098165 1D # # # # #
1000105360 1D # # # # #
1000115763 1D 2D # # # #
1000345208 1D # # # # #
1000470774 1D 2D # 4D # #
1000487544 # # 3D # 5D #
1000499657 1D # # # # #
1000531456 1D # # # # #
1000561333 # # # # 5D #
I want to loop per record through fields 2..NF, print $1 and the field if it is not "#", and then stop reading the line but continue with the next line.
In other words: find the first field after the first one which isn't #, print only the first field and that field, and skip to the next line.
So the expected result would be:
1000074493 1D
1000098165 1D
1000105360 1D
1000115763 1D
1000345208 1D
1000470774 1D
1000487544 3D
1000499657 1D
1000531456 1D
1000561333 5D
My code is:
awk '{for(i=2; i<=NF; i++) {if($i != "#" ) print $1,$i }}' $FILE
which gives me:
1000074493 1D
1000098165 1D
1000105360 1D
1000115763 1D
1000115763 1D
1000345208 1D
1000470774 1D
1000470774 2D
1000470774 4D
1000487544 3D
1000487544 5D
1000499657 1D
1000531456 1D
1000561333 5D
What do I need to change?
As your own wording of the question ("stop reading the line") already suggests, the keyword you are looking for is break.
awk '{for(i=2; i<=NF; i++) if($i != "#" ) { print $1,$i; break }}' "$FILE"
Demo: https://ideone.com/hWRM9K
As an aside: avoid useless uses of cat, use lower case for your private shell variables, and quote variables that hold file names.
With awk you can do this:
awk '{gsub(/#/,""); print $1,$2}' file
1000074493 1D
1000098165 1D
1000105360 1D
1000115763 1D
1000345208 1D
1000470774 1D
1000487544 3D
1000499657 1D
1000531456 1D
1000561333 5D
The following, applied to your file, gives the expected result:
awk '{i=2; while ($i == "#") i++; print $1 " " $i}' "$FILE"
Finally found it out by myself:
awk '{for(i=2; i<=NF; i++) {if($i != "#" ) print $1,$i }}' $FILE|awk '$1 != p {print $1,$2}{p=$1}'
If anyone knows how to combine both awk statements into one, it would be appreciated!
I have a list of lists with two values that represent a start time-point and an end time-point. I would like to count how much of the time range between the two points fall into bins.
The bins are 0-300, 300-500 and 500-1200.
I would also like to bin them into 0-50, 50-100, 100-150 and so on.
The question is similar to Python: Checking to which bin a value belongs, but different since it involves a two-point time range which can fall into separate bins at the same time.
I have created a for loop in the code below, which works. But I'm wondering if there is a faster, more pythonic way to calculate this, perhaps using pandas or numpy.
import numpy
x = numpy.array([[100, 150],[100, 125],[290, 310],[277, 330],
[300, 400],[480, 510],[500, 600]])
d = {'0-300': [0], '300-500': [0], '500-1200':[0]}
import pandas as pd
df = pd.DataFrame(data=d)
for i in x:
    start, end = i[0], i[1]
    if start <= 300 and end <= 300:  # time range falls only into the 1st bin
        df['0-300'][0] += end - start
    elif start <= 300 and end > 300:  # time range falls into the 1st and 2nd bins
        df['0-300'][0] += (300 - start)
        df['300-500'][0] += (end - 300)
    elif start >= 300 and end >= 300 and end <= 500:  # time range falls only into the 2nd bin
        df['300-500'][0] += end - start
    elif start <= 500 and end > 500:  # time range falls into the 2nd and 3rd bins
        df['300-500'][0] += (500 - start)
        df['500-1200'][0] += (end - 500)
    elif start > 500:  # time range falls only into the 3rd bin
        df['500-1200'][0] += end - start
df:
0-300 300-500 500-1200
108 160 110
Thanks for reading.
For a generic number of bins, here's a vectorized way leveraging np.add.at to get the counts and then np.add.reduceat for getting binned summations -
import numpy as np
import pandas as pd

# x is the (start, end) array from the question
bins = [0, 300, 500, 1200]  # Declare bins
id_arr = np.zeros(bins[-1], dtype=int)
np.add.at(id_arr, x[:,0], 1)
np.add.at(id_arr, x[:,1], -1)
c = id_arr.cumsum()
out = np.add.reduceat(c, bins[:-1])
# Present in a dataframe format
col_names = [str(i)+'-' + str(j) for i,j in zip(bins[:-1], bins[1:])]
df_out = pd.DataFrame([out], columns=col_names)
Sample output -
In [524]: df_out
Out[524]:
0-300 300-500 500-1200
0 108 160 110
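If you also want the finer 0-50, 50-100, ... binning from the question, the same steps work unchanged; only the bins change. A quick usage sketch (reusing x from the question, and assuming every end point stays below the last bin edge):
import numpy as np
import pandas as pd

x = np.array([[100, 150], [100, 125], [290, 310], [277, 330],
              [300, 400], [480, 510], [500, 600]])

bins = np.arange(0, 1250, 50)            # 0-50, 50-100, ..., 1150-1200
id_arr = np.zeros(bins[-1], dtype=int)   # assumes all end points are smaller than bins[-1]
np.add.at(id_arr, x[:, 0], 1)            # +1 where a time range starts
np.add.at(id_arr, x[:, 1], -1)           # -1 where a time range ends
c = id_arr.cumsum()                      # coverage count per unit of time
out = np.add.reduceat(c, bins[:-1])      # summed coverage per bin

col_names = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
df_out = pd.DataFrame([out], columns=col_names)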
Here is one way of doing it
In [1]: counts = np.zeros(1200, dtype=int)
In [2]: for x_lower, x_upper in x: counts[x_lower:x_upper] += 1
In [3]: d['0-300'] = counts[0:300].sum()
In [4]: d['300-500'] = counts[300:500].sum()
In [5]: d['500-1200'] = counts[500:1200].sum()
In [6]: d
Out[6]: {'0-300': 108, '300-500': 160, '500-1200': 110}
However, if you have many bins, it is better to wrap those three summation steps in a for loop.
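For instance, a minimal sketch of such a loop (reusing x and the bin edges from the question):
import numpy as np

x = np.array([[100, 150], [100, 125], [290, 310], [277, 330],
              [300, 400], [480, 510], [500, 600]])

counts = np.zeros(1200, dtype=int)
for x_lower, x_upper in x:
    counts[x_lower:x_upper] += 1      # per-unit coverage, as above

d = {}
bins = [0, 300, 500, 1200]
for lo, hi in zip(bins[:-1], bins[1:]):
    d['{}-{}'.format(lo, hi)] = int(counts[lo:hi].sum())

print(d)   # {'0-300': 108, '300-500': 160, '500-1200': 110}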
I am currently going through Examples Volume 1 and came across an error with the Dyes example.
When I try to load the inits from the example, it returns "this chain contains uninitialized variables". I am not sure which part is wrong, since at first sight theta, tau.btw and tau.with are all specified and nothing is left out.
I am using the code directly from Examples Vol. 1 under the Help tab. The same error happens with all three choices of priors for the between-variation.
I would really appreciate any advice on the problem. Thanks in advance.
Below is the code I copied directly from the dyes example.
model
{
    for (i in 1 : batches) {
        mu[i] ~ dnorm(theta, tau.btw)
        for (j in 1 : samples) {
            y[i , j] ~ dnorm(mu[i], tau.with)
        }
    }
    theta ~ dnorm(0.0, 1.0E-10)
    # prior for within-variation
    sigma2.with <- 1 / tau.with
    tau.with ~ dgamma(0.001, 0.001)
    # Choice of priors for between-variation
    # Prior 1: uniform on SD
    # sigma.btw ~ dunif(0, 100)
    # sigma2.btw <- sigma.btw * sigma.btw
    # tau.btw <- 1 / sigma2.btw
    # Prior 2: uniform on intra-class correlation coefficient,
    # ICC = sigma2.btw / (sigma2.btw + sigma2.with)
    ICC ~ dunif(0, 1)
    sigma2.btw <- sigma2.with * ICC / (1 - ICC)
    tau.btw <- 1 / sigma2.btw
    # Prior 3: gamma(0.001, 0.001) NOT RECOMMENDED
    # tau.btw ~ dgamma(0.001, 0.001)
    # sigma2.btw <- 1 / tau.btw
}
Data
list(batches = 6, samples = 5,
y = structure(
.Data = c(1545, 1440, 1440, 1520, 1580,
1540, 1555, 1490, 1560, 1495,
1595, 1550, 1605, 1510, 1560,
1445, 1440, 1595, 1465, 1545,
1595, 1630, 1515, 1635, 1625,
1520, 1455, 1450, 1480, 1445), .Dim = c(6, 5)))
Inits1
list(theta=1500, tau.with=1, sigma.btw=1)
Inits2
list(theta=1500, tau.with=1,ICC=0.5)
Inits3
list(theta=1500, tau.with=1, tau.btw=1)
That is not an error per se. Yes, you have provided inits for the parameters of interest.
However, there are the six mu[i] variables, which are not data but are drawn from mu[i] ~ dnorm(theta, tau.btw).
You could provide initial values for these as well, but in my opinion it is best to just click on "gen inits" if you are using WinBUGS from the GUI; this will generate initial values for them.
Given an array of 2D points (#pts x 2) and an array of which points are connected to which (#bonds x 2 int array with indices of pts), how can I efficiently return an array of polygons formed from the bonds?
There can be 'dangling' bonds that don't close a polygon, and these should be ignored.
Here's an example:
import numpy as np
xy = np.array([[2.72,-2.976], [2.182,-3.40207],
[-3.923,-3.463], [2.1130,4.5460], [2.3024,3.4900], [.96979,-.368],
[-2.632,3.7555], [-.5086,.06170], [.23409,-.6588], [.20225,-.9540],
[-.5267,-1.981], [-2.190,1.4710], [-4.341,3.2331], [-3.318,3.2654],
[.58510,4.1406], [.74331,2.9556], [.39622,3.6160], [-.8943,1.0643],
[-1.624,1.5259], [-1.414,3.5908], [-1.321,3.6770], [1.6148,1.0070],
[.76172,2.4627], [.76935,2.4838], [3.0322,-2.124], [1.9273,-.5527],
[-2.350,-.8412], [-3.053,-2.697], [-1.945,-2.795], [-1.905,-2.767],
[-1.904,-2.765], [-3.546,1.3208], [-2.513,1.3117], [-2.953,-.5855],
[-4.368,-.9650]])
BL= np.array([[22,23], [28,29], [8,9],
[12,31], [18,19], [31,32], [3,14],
[32,33], [24,25], [10,30], [15,23],
[5,25], [12,13], [0,24], [27,28],
[15,16], [5,8], [0,1], [11,18],
[2,27], [11,13], [33,34], [26,33],
[29,30], [7,17], [9,10], [26,30],
[17,22], [5,21], [19,20], [17,18],
[14,16], [7,26], [21,22], [3,4],
[4,15], [11,32], [6,19], [6,13],
[16,20], [27,34], [7,8], [1,9]])
I can't tell you how to implement it with numpy, but here's an outline of a possible algorithm:
Add a list of attached bonds to each point.
Remove the points that have only one bond attached and remove that bond as well (these are the dangling bonds); repeat until no such points remain.
Attach two boolean markers to each bond, indicating if the bond has already been added to a polygon in one of the two possible directions. Each bond can only be used in two polygons. Initially set all markers to false.
Select any initial point and repeat the following step until all bonds have been used in both directions:
Select a bond that has not been used (in the respective direction). This is the first edge of the polygon. Of the bonds attached to the end point of the selected one, choose the one with minimal angle in e.g. counter-clockwise direction (a small sketch of this step follows the outline). Add this to the polygon and continue until you return to the initial point.
This algorithm will also produce a large polygon containing all the outer bonds of the network. I guess you will find a way to recognize this one and remove it.
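As a rough illustration (not the poster's code, and the helper name is made up), here is one way the angle-selection step could look in numpy, assuming a plain list of neighbor indices per point; the full implementation in the next answer uses its NL/KL neighbor arrays instead:
import numpy as np

def next_vertex(xy, prev, cur, neighbors):
    # Of the points bonded to `cur`, pick the next polygon vertex by the angle
    # each outgoing bond makes with the incoming bond prev -> cur.
    # Taking the minimum (or maximum) relative angle determines on which side
    # of the incoming bond the walk turns.
    cand = [n for n in neighbors if n != prev] or [prev]   # allow a U-turn on a lone bond
    out_ang = np.arctan2(xy[cand, 1] - xy[cur, 1], xy[cand, 0] - xy[cur, 0])
    in_ang = np.arctan2(xy[prev, 1] - xy[cur, 1], xy[prev, 0] - xy[cur, 0])
    rel = np.mod(out_ang - in_ang, 2 * np.pi)
    return cand[int(np.argmin(rel))]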
For future readers, the bulk of the implementation of Frank's suggestion in numpy is below. The extraction of the boundary follows essentially the same algorithm as walking around a polygon, except using the minimum-angle bond rather than the maximum.
import numpy as np
from matplotlib.path import Path
from matplotlib import patches

# setdiff2d, BL2NLandKL, is_cyclic_permutation and extract_boundary_from_NL
# are the author's helper functions (not shown here).

def extract_polygons_lattice(xy, BL, NL, KL):
    ''' Extract polygons from a lattice of points.

    Parameters
    ----------
    xy : NP x 2 float array
        points living on vertices of dual to triangulation
    BL : Nbonds x 2 int array
        Each row is a bond and contains indices of connected points
    NL : NP x NN int array
        Neighbor list. The ith row has neighbors of the ith particle, padded with zeros
    KL : NP x NN int array
        Connectivity list. The ith row has ones where ith particle is connected to NL[i,j]

    Returns
    ----------
    polygons : list
        list of lists of indices of each polygon
    PPC : list
        list of patches for patch collection
    '''
    NP = len(xy)
    NN = np.shape(KL)[1]

    # Remove dangling bonds
    # dangling bonds have one particle with only one neighbor
    finished_dangles = False
    while not finished_dangles:
        dangles = np.where([np.count_nonzero(row) == 1 for row in KL])[0]
        if len(dangles) > 0:
            # Make sorted bond list of dangling bonds
            dpair = np.sort(np.array([[d0, NL[d0, np.where(KL[d0] != 0)[0]]] for d0 in dangles]), axis=1)
            # Remove those bonds from BL
            BL = setdiff2d(BL, dpair.astype(BL.dtype))
            print('dpair = ', dpair)
            print('ending BL = ', BL)
            NL, KL = BL2NLandKL(BL, NP=NP, NN=NN)
        else:
            finished_dangles = True

    # bond markers for counterclockwise, clockwise
    used = np.zeros((len(BL), 2), dtype=bool)
    polygons = []
    finished = False
    while (not finished) and len(polygons) < 20:
        # Check if all bond markers are used in order A-->B
        todoAB = np.where(~used[:, 0])[0]
        if len(todoAB) > 0:
            bond = BL[todoAB[0]]
            # bb will be list of polygon indices
            # Start with orientation going from bond[0] to bond[1]
            nxt = bond[1]
            bb = [bond[0], nxt]
            dmyi = 1
            # as long as we haven't completed the full outer polygon, add next index
            while nxt != bond[0]:
                n_tmp = NL[nxt, np.argwhere(KL[nxt]).ravel()]
                # Exclude previous boundary particle from the neighbors array, unless it is the only one
                # (It cannot be the only one, if we removed dangling bonds)
                if len(n_tmp) == 1:
                    # The bond is a lone bond, not part of a triangle.
                    neighbors = n_tmp
                else:
                    neighbors = np.delete(n_tmp, np.where(n_tmp == bb[dmyi - 1])[0])
                angles = np.mod(np.arctan2(xy[neighbors, 1] - xy[nxt, 1], xy[neighbors, 0] - xy[nxt, 0]).ravel()
                                - np.arctan2(xy[bb[dmyi - 1], 1] - xy[nxt, 1], xy[bb[dmyi - 1], 0] - xy[nxt, 0]).ravel(),
                                2 * np.pi)
                nxt = neighbors[angles == max(angles)][0]
                bb.append(nxt)

                # Now mark the current bond as used
                thisbond = [bb[dmyi - 1], bb[dmyi]]
                # Get index of used matching thisbond
                mark_used = np.where((BL == thisbond).all(axis=1))[0]
                if len(mark_used) > 0:
                    # print('marking bond [', thisbond, '] as used')
                    used[mark_used, 0] = True
                else:
                    # Used this bond in reverse order relative to BL
                    mark_used = np.where((BL == thisbond[::-1]).all(axis=1))[0]
                    used[mark_used, 1] = True
                dmyi += 1

            polygons.append(bb)
        else:
            # Check for remaining bonds unused in reverse order (B-->A)
            todoBA = np.where(~used[:, 1])[0]
            if len(todoBA) > 0:
                bond = BL[todoBA[0]]
                # bb will be list of polygon indices
                # Start with orientation going from bond[1] to bond[0]
                nxt = bond[0]
                bb = [bond[1], nxt]
                dmyi = 1
                # as long as we haven't completed the full outer polygon, add next index
                while nxt != bond[1]:
                    n_tmp = NL[nxt, np.argwhere(KL[nxt]).ravel()]
                    # Exclude previous boundary particle from the neighbors array, unless it is the only one
                    # (It cannot be the only one, if we removed dangling bonds)
                    if len(n_tmp) == 1:
                        # The bond is a lone bond, not part of a triangle.
                        neighbors = n_tmp
                    else:
                        neighbors = np.delete(n_tmp, np.where(n_tmp == bb[dmyi - 1])[0])
                    angles = np.mod(np.arctan2(xy[neighbors, 1] - xy[nxt, 1], xy[neighbors, 0] - xy[nxt, 0]).ravel()
                                    - np.arctan2(xy[bb[dmyi - 1], 1] - xy[nxt, 1], xy[bb[dmyi - 1], 0] - xy[nxt, 0]).ravel(),
                                    2 * np.pi)
                    nxt = neighbors[angles == max(angles)][0]
                    bb.append(nxt)

                    # Now mark the current bond as used --> note the inversion of the bond order to match BL
                    thisbond = [bb[dmyi], bb[dmyi - 1]]
                    # Get index of used matching [bb[dmyi-1], nxt]
                    mark_used = np.where((BL == thisbond).all(axis=1))[0]
                    if len(mark_used) > 0:
                        used[mark_used, 1] = True
                    dmyi += 1

                polygons.append(bb)
            else:
                # All bonds have been accounted for
                finished = True

    # Check for duplicates (up to cyclic permutations) in polygons
    # Note that we need to ignore the last element of each polygon (which is also the starting pt)
    keep = np.ones(len(polygons), dtype=bool)
    for ii in range(len(polygons)):
        polyg = polygons[ii]
        for p2 in polygons[ii + 1:]:
            if is_cyclic_permutation(polyg[:-1], p2[:-1]):
                keep[ii] = False
    polygons = [polygons[i] for i in np.where(keep)[0]]

    # Remove the polygon which is the entire lattice boundary, except dangling bonds
    boundary = extract_boundary_from_NL(xy, NL, KL)
    print('boundary = ', boundary)
    keep = np.ones(len(polygons), dtype=bool)
    for ii in range(len(polygons)):
        polyg = polygons[ii]
        if is_cyclic_permutation(polyg[:-1], boundary.tolist()):
            keep[ii] = False
        elif is_cyclic_permutation(polyg[:-1], boundary[::-1].tolist()):
            keep[ii] = False
    polygons = [polygons[i] for i in np.where(keep)[0]]

    # Prepare a polygon patch collection
    PPC = []
    for polyINDs in polygons:
        pp = Path(xy[polyINDs], closed=True)
        ppp = patches.PathPatch(pp, lw=2)
        PPC.append(ppp)

    return polygons, PPC