`Bad integer for item 1 of list input` when reading a structured datafile in Fortran - file-io

I am getting the runtime error "Bad integer for item 1 of list input" when reading the model file. The error is reported as: At line 87 of file teleseis3.f (unit=2, file='model')
* < Default structure >
tqp=1.0
tqs=4.0
nl= 4
data vp/ 5.2, 6.24, 6.58, 7.8, 6*0.0/
data vs/ 3.0, 3.60, 3.80, 4.4, 6*0.0/
data den/ 2.4, 2.67, 2.80, 3.3, 6*0.0/
data dep/ 5.5, 9.50, 19.0, 0.0, 6*0.0/
nl1=4
data vp1/ 5.2, 6.40, 8.0, 7*0.0/
data vs1/ 3.0, 3.75, 4.5, 7*0.0/
data den1/ 2.4, 2.70, 3.3, 7*0.0/
data dep1/ 5.0, 30.0, 0.0, 7*0.0/
nl2=3
data vp2/ 6.40, 8.0, 8*0.0/
data vs2/ 3.75, 4.4, 8*0.0/
data den2/ 2.70, 3.3, 8*0.0/
data dep2/ 30.00, 0.0, 8*0.0/
c ------------ End initialization ----------------------------
read(*,*) model
if(model(1:4) .ne. 'none') then
c print*, model
open(2,file=model)
READ(2,'(a40)') name
READ(2,*) TQP,TQS,NL ,(VP (L),VS (L),DEN (L),DEP (L),L=1,NL )
c line 87 is here
READ(2,*) NL1,(VP1(L),VS1(L),DEN1(L),DEP1(L),L=1,NL1)
READ(2,*) NL2,(VP2(L),VS2(L),DEN2(L),DEP2(L),L=1,NL2)
close(2)
endif
The data in the 'model' file are arranged in lines following the 'default structure' pattern:
#model
1.0 4.0 50 5.8 3.2 2.6 15.0
5.8 3.2 2.6 0.0
6.8 3.9 2.9 9.4
8.1 4.5 3.4 0.0
8.10119 4.48486 3.37906 15.6
8.08907 4.47715 3.37688 20
8.07688 4.46953 3.37471 20
8.0554 4.45643 3.37091 35
8.0337 4.44361 3.3671 35
8.0118 4.43108 3.3633 35
7.9897 4.41885 3.3595 35
8.55896 4.64391 3.43578 0
8.64552 4.6754 3.46264 45
8.73209 4.7069 3.48951 45
8.81867 4.7384 3.51639 45
8.90522 4.76989 3.54325 45
9.13397 4.93259 3.72378 0
9.64588 5.22428 3.8498 50
9.90185 5.37014 3.91282 50
10.1578 5.51602 3.97584 50
10.212 5.54311 3.984 35
10.2662 5.5702 3.99214 35
10.7513 5.94508 4.38071 0
10.91 6.09418 4.41241 51
11.0656 6.24046 4.44317 50
11.2449 6.31091 4.50372 100
11.4156 6.37813 4.56307 100
11.5783 6.44232 4.62129 100
11.7336 6.5037 4.67844 100
11.8821 6.5625 4.7346 100
12.0245 6.6189 4.78983 100
12.1613 6.67317 4.84422 100
12.2932 6.72548 4.89783 100
12.4207 6.77606 4.95073 100
12.5447 6.82512 5.00299 100
12.6655 6.87289 5.05469 100
12.7839 6.91957 5.1059 100
12.9004 6.96538 5.15669 100
13.0158 7.01053 5.20713 100
13.1306 7.05525 5.25729 100
13.2453 7.09976 5.30724 100
13.3607 7.14423 5.35706 100
13.4774 7.18892 5.40681 100
13.596 7.23403 5.45657 100
13.6804 7.26597 5.49145 100
13.6875 7.26575 5.50642 100
13.7117 7.26486 5.55641 100
13.7166 7.26466 5.56645 100
4 5.8 3.2 2.6 15.0
5.8 3.2 2.6 0.0
6.8 3.9 2.9 9.4
8.1 4.5 3.4 0.0
3 5.8000 3.2000 2.6000 10.000
6.8000 3.9000 2.9000 22.000
8.1000 4.5000 3.3800 0.0000
However, the error still appears. Please give me ways to correct it.
Here is a test script named 'test.f' that reproduces the error:
PARAMETER (ND0=2048,NM0=6,LK0=10,NL0=48,PI=3.141593,RAD=.0174533)
IMPLICIT COMPLEX*8 (Z)
CHARACTER NAME*40,NAM*4, prefix*80, adel*3, outfile*80, model*80
COMMON /STR0/NL ,VP (NL0),VS (NL0),DEN (NL0),DEP (NL0)
COMMON /STR1/NL1,VP1(NL0),VS1(NL0),DEN1(NL0),DEP1(NL0)
COMMON /STR2/NL2,VP2(NL0),VS2(NL0),DEN2(NL0),DEP2(NL0)
read(*,'(a80)') model
if(model(1:4) .ne. 'none') then
c print*, model
open(2,file=model)
READ(2,'(a40)') name
READ(2,*) TQP,TQS,NL ,(VP (L),VS (L),DEN (L),DEP (L),L=1,NL )
READ(2,*) NL1,(VP1(L),VS1(L),DEN1(L),DEP1(L),L=1,NL1)
print *,NL1,VP1(L),VS1(L),DEN1(L),DEP1(L)
READ(2,*) NL2,(VP2(L),VS2(L),DEN2(L),DEP2(L),L=1,NL2)
close(2)
endif
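One likely cause, offered as a hedged diagnosis: the first line of the model file declares NL=50, but only 48 layer lines appear before the NL1 block, and the arrays are dimensioned with NL0=48. The first list-directed READ would then consume tokens from the NL1 block, so the next READ lands on a real value where it expects the integer NL1. The Python sketch below (a hypothetical diagnostic, not part of the Fortran code) mimics list-directed input on the token stream after the header line to show where the second READ actually starts:

```python
# Hypothetical diagnostic: walk the whitespace-separated tokens of the model
# file the way a Fortran list-directed READ would. `model_text` stands in for
# everything after the '(a40)' header line.
def diagnose(model_text, nl0=48):
    tokens = model_text.split()
    nl = int(tokens[2])                # TQP, TQS, then NL
    if nl > nl0:
        print(f"NL={nl} exceeds the array bound NL0={nl0}")
    pos = 3 + 4 * nl                   # the first READ consumes 3 + 4*NL tokens
    return tokens[pos]                 # first token seen by the NL1 READ

# Toy file: NL says 3, but only 2 layer lines precede the NL1 block, so the
# first READ swallows "2 6.4 3.7 2.7" as layer 3 and the NL1 READ starts on
# the real "30.0", which is exactly the "Bad integer" symptom.
toy = ("1.0 4.0 3  5.0 3.0 2.4 10.0  6.0 3.5 2.7 0.0  "
       "2  6.4 3.7 2.7 30.0  8.0 4.4 3.3 0.0")
print(diagnose(toy))
```

Counting the layer lines in the posted file the same way gives 48 layers, so changing 50 to 48 on the first line (or supplying the missing layers) and enlarging NL0 if more than 48 layers are needed should fix the read.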

Related

Pandas / How to insert variable number of lines inside a DataFrame?

Here is the structure of my dataframe
plan   ADO_ver_x  ADO_incr_x  ADO_ver_y  ADO_incr_y
3ABP3  25.0       4.0         25.0       7.0
I would like to add ADO_incr_y - ADO_incr_x lines, which means in this case the result would be :
plan   ADO_ver_x  ADO_incr_x  ADO_ver_y  ADO_incr_y
3ABP3  25.0       4.0         25.0       5.0
3ABP3  25.0       5.0         25.0       6.0
3ABP3  25.0       6.0         25.0       7.0
Is there a Panda/Pythonic way to do that ?
I was thinking something like :
reps = [ val2-val1 for val2, val1 in zip(df_insert["ADO_incr_y"],df_insert["ADO_incr_x"]) ]
df_insert.loc[np.repeat(df_insert.index.values, reps)]
But I don't get the incremental progression :
4 -> 5, 5 -> 6, 6 -> 7
How can I get the index inside the list comprehension ?
You can repeat the data, then modify with groupby.cumcount():
repeats = df['ADO_incr_y'].sub(df['ADO_incr_x']).astype(int)
out = df.reindex(df.index.repeat(repeats))
out['ADO_incr_x'] += out.groupby(level=0).cumcount()
out['ADO_incr_y'] = out['ADO_incr_x'] + 1
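As a quick check, the approach above can be run end to end on the sample frame (a self-contained sketch, with the values rebuilt from the question):

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"plan": ["3ABP3"], "ADO_ver_x": [25.0], "ADO_incr_x": [4.0],
                   "ADO_ver_y": [25.0], "ADO_incr_y": [7.0]})

repeats = df["ADO_incr_y"].sub(df["ADO_incr_x"]).astype(int)   # 7 - 4 = 3 copies
out = df.reindex(df.index.repeat(repeats))
out["ADO_incr_x"] += out.groupby(level=0).cumcount()           # 4, 5, 6
out["ADO_incr_y"] = out["ADO_incr_x"] + 1                      # 5, 6, 7
out = out.reset_index(drop=True)
print(out)
```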

Avoiding loops in python/pandas

I can do basic stuff in python/pandas, but I still struggle with the "no loops necessary" world of pandas. I tend to fall back to converting to lists, looping like in VBA, and then bringing those lists back into dfs. I know there is a simpler way, but I can't figure it out.
A simple example is a very basic strategy: create a signal of -1 when a series rises above 70 and keep it at -1 until the series breaks below 30, at which point the signal changes to 1 and stays there until a value above 70 appears again, and so on.
I can do this via simple list looping, but I know this is far from "Pythonic"! Can anyone help "translating" this to some nicer code without loops?
#rsi_list is just a list from a df column of numbers. Simple example:
rsi={'rsi':[35, 45, 75, 56, 34, 29, 26, 34, 67, 78]}
rsi=pd.DataFrame(rsi)
rsi_list=rsi['rsi'].tolist()
signal_list=[]
hasShort=0
hasLong=0
for i in range(len(rsi_list)-1):
    if rsi_list[i] >= 70 or hasShort==1:
        signal_list.append(-1)
        if rsi_list[i+1] >= 30:
            hasShort=1
        else:
            hasShort=0
    elif rsi_list[i] <= 30 or hasLong==1:
        signal_list.append(1)
        if rsi_list[i+1] <= 70:
            hasLong=1
        else:
            hasLong=0
    else:
        signal_list.append(0)
#last part just so the list has the same length as the original df, since I put it back as a column
if rsi_list[-1]>=70:
    signal_list.append(-1)
else:
    signal_list.append(1)
First clip the values to a lower bound of 30 and an upper bound of 70, use where to change to NaN all the values that are not exactly 30 or 70, replace 30 with 1 and 70 with -1, propagate these values with ffill, and fillna with 0 for the values before the first 30 or 70.
rsi['rsi_cut'] = (
rsi['rsi'].clip(lower=30,upper=70)
.where(lambda x: x.isin([30,70]))
.replace({30:1, 70:-1})
.ffill()
.fillna(0)
)
print(rsi)
rsi rsi_cut
0 35 0.0
1 45 0.0
2 75 -1.0
3 56 -1.0
4 34 -1.0
5 29 1.0
6 26 1.0
7 34 1.0
8 67 1.0
9 78 -1.0
Edit: maybe a bit easier, use le (less than or equal) and gt (greater than), take the difference, then replace the 0s using the ffill method:
print((rsi['rsi'].le(30).astype(int) - rsi['rsi'].gt(70))
.replace(to_replace=0, method='ffill'))
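The first chain can be checked end to end on the sample data (a self-contained sketch rebuilding the frame from the question):

```python
import pandas as pd

# Sample data from the question
rsi = pd.DataFrame({"rsi": [35, 45, 75, 56, 34, 29, 26, 34, 67, 78]})

signal = (
    rsi["rsi"].clip(lower=30, upper=70)   # squash values into [30, 70]
    .where(lambda x: x.isin([30, 70]))    # keep only the clipped bounds, else NaN
    .replace({30: 1, 70: -1})             # 30 -> long (1), 70 -> short (-1)
    .ffill()                              # carry the last signal forward
    .fillna(0)                            # no signal before the first 30/70 hit
)
print(signal.tolist())
```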

Filtering out a chemical dataset according to information in columns

I'm working with a chemical dataset and I was wondering about the smartest way to do the following thing. My dataset looks something like this:
formula Temperature (Kelvin) (Physical) Property Value
CO2 298 5
CO2 298 7.6
CO2 300 3.2
NaCl 300 3.4
NaCl 296 1.4
H2O 298 7.2
H2O 298 8.3
H2O 293 6.4
ZnO 300 3.10
ZnO 290 1.2
FeO 295 4.6
FeO 290 3.6
Given that Room Temperature := 298K,
what I would like to accomplish is to filter the original dataset in order to have only values reported with Room Temperature when it is available, and if there's no value reported at room temperature, I would like to keep the closest value to the room temperature that is available. According to what I would like to achieve, the sample initial dataset above would become something like
formula Temperature (Kelvin) (Physical) Property Value
CO2 298 5
CO2 298 7.6
NaCl 300 3.4
H2O 298 7.2
H2O 298 8.3
ZnO 300 3.10
FeO 295 4.6
Maybe I should use a lambda expression?
Any suggestions on how to achieve something like this?
Many thanks,
James
We can first filter the "good" ones, i.e., those measured at 298 K. Then we sort the remaining rows by their distance to 298 K and drop duplicates to keep only the closest. Lastly we merge the good ones with these:
# room temp in K
rt = 298
# taking those that have `rt` K temp
good_ones = df[df["Temperature (Kelvin)"].eq(rt)]
good_names = good_ones.formula.unique()
# getting others
others = df[~df.formula.isin(good_names)]
# filtering others according to distance to `rt`
sorter = lambda s: s.sub(rt).abs()
others_filtered = (others
.sort_values("Temperature (Kelvin)", key=sorter)
.drop_duplicates("formula", keep="first"))
# merging them all
result = pd.concat([good_ones, others_filtered]).sort_index(ignore_index=True)
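Run on the sample dataset (rebuilt here so the sketch is self-contained), the chain reproduces the desired output:

```python
import pandas as pd

# Sample dataset from the question
df = pd.DataFrame({
    "formula": ["CO2", "CO2", "CO2", "NaCl", "NaCl", "H2O", "H2O", "H2O",
                "ZnO", "ZnO", "FeO", "FeO"],
    "Temperature (Kelvin)": [298, 298, 300, 300, 296, 298, 298, 293,
                             300, 290, 295, 290],
    "(Physical) Property Value": [5, 7.6, 3.2, 3.4, 1.4, 7.2, 8.3, 6.4,
                                  3.1, 1.2, 4.6, 3.6],
})

rt = 298
good_ones = df[df["Temperature (Kelvin)"].eq(rt)]
others = df[~df.formula.isin(good_ones.formula.unique())]
# stable sort by distance to rt, then keep the first (closest) row per formula
others_filtered = (others
    .sort_values("Temperature (Kelvin)", key=lambda s: s.sub(rt).abs())
    .drop_duplicates("formula", keep="first"))
result = pd.concat([good_ones, others_filtered]).sort_index(ignore_index=True)
print(result)
```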
to get
>>> result
formula Temperature (Kelvin) (Physical) Property Value
0 CO2 298 5.0
1 CO2 298 7.6
2 NaCl 300 3.4
3 H2O 298 7.2
4 H2O 298 8.3
5 ZnO 300 3.1
6 FeO 295 4.6
There's also the apply way:
def filter_temp(gr):
    # get the temp column and a bool series where it equals `rt`
    temps = gr["Temperature (Kelvin)"]
    rt_temps = temps.eq(rt)
    # does any temp match `rt`?
    if rt_temps.any():
        # then return the rows where it matches
        return gr[rt_temps]
    else:
        # otherwise return the closest one
        return gr.loc[[temps.sub(rt).abs().idxmin()]]
result = (df.groupby("formula", as_index=False, group_keys=False)
.apply(filter_temp).sort_index(ignore_index=True))
The idea here is to group the rows by formula. Each group is then filtered to keep all rows with the required room temperature if any or the unique row with the closest temperature. Let's define this function:
def temperature_filter(df, room_temp, temp_col="Temperature (Kelvin)"):
    if room_temp in df[temp_col].values:
        return df[df[temp_col] == room_temp]
    else:
        return df.loc[[abs(df[temp_col] - room_temp).idxmin()]]
It only remains to apply this function to each group:
ROOM_TEMP = 298
df.groupby("formula", sort=False).apply(temperature_filter, ROOM_TEMP).droplevel("formula")
Note that temperature_filter has been written with clarity in mind, but it can also be inlined as a lambda function to reach a one-line solution!
def filter_closest_to_rt(df, rt=298):
    df['tmrt'] = df['Temperature (Kelvin)'].sub(rt).abs()
    return df[df['tmrt'] == df.groupby('formula')['tmrt'].transform('min')].drop(columns='tmrt')
filter_closest_to_rt(df)
formula Temperature (Kelvin) (Physical) Property Value
0 CO2 298 5.0
1 CO2 298 7.6
3 NaCl 300 3.4
4 NaCl 296 1.4
5 H2O 298 7.2
6 H2O 298 8.3
8 ZnO 300 3.1
10 FeO 295 4.6
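A quick self-contained check of this variant. Note that, unlike the desired output, it keeps ties: NaCl at 300 K and 296 K are equally distant from 298 K, which is why both NaCl rows appear in the output above.

```python
import pandas as pd

# Sample dataset from the question
df = pd.DataFrame({
    "formula": ["CO2", "CO2", "CO2", "NaCl", "NaCl", "H2O", "H2O", "H2O",
                "ZnO", "ZnO", "FeO", "FeO"],
    "Temperature (Kelvin)": [298, 298, 300, 300, 296, 298, 298, 293,
                             300, 290, 295, 290],
    "(Physical) Property Value": [5, 7.6, 3.2, 3.4, 1.4, 7.2, 8.3, 6.4,
                                  3.1, 1.2, 4.6, 3.6],
})

def filter_closest_to_rt(df, rt=298):
    df = df.copy()  # work on a copy so the caller's frame is not mutated
    df["tmrt"] = df["Temperature (Kelvin)"].sub(rt).abs()
    closest = df["tmrt"] == df.groupby("formula")["tmrt"].transform("min")
    return df[closest].drop(columns="tmrt")

out = filter_closest_to_rt(df)
print(out)
```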
How about this:
df['val'] = np.abs(df['Temperature (Kelvin)'] - 298)
df = df.sort_values(['formula', 'val'], ascending=[True, True])
df = df.drop_duplicates(subset='formula', keep="first")
To make sure you don't lose any 298 duplicates another solution is:
df['val'] = np.abs(df['Temperature (Kelvin)'] - 298)
the_298s = df[df['Temperature (Kelvin)'] == 298]
others = df[df['Temperature (Kelvin)'] != 298]
others = others.sort_values(['formula', 'val'], ascending=[True, True])
others = others.drop_duplicates(subset='formula', keep="first")
the_298s_formulas = the_298s.formula.unique()
others = others[~ others.formula.isin(the_298s_formulas)]
final_df = the_298s.append(others)

Python - Looping through dataframe using methods other than .iterrows()

Here is the simplified dataset:
Character x0 x1
0 T 0.0 1.0
1 h 1.1 2.1
2 i 2.2 3.2
3 s 3.3 4.3
5 i 5.5 6.5
6 s 6.6 7.6
8 a 8.8 9.8
10 s 11.0 12.0
11 a 12.1 13.1
12 m 13.2 14.2
13 p 14.3 15.3
14 l 15.4 16.4
15 e 16.5 17.5
16 . 17.6 18.6
The simplified dataset is generated by the following code:
ch = ['T']
x0 = [0]
x1 = [1]
string = 'his is a sample.'
for s in string:
    ch.append(s)
    x0.append(round(x1[-1]+0.1,1))
    x1.append(round(x0[-1]+1,1))
df = pd.DataFrame(list(zip(ch, x0, x1)), columns = ['Character', 'x0', 'x1'])
df = df.drop(df.loc[df['Character'] == ' '].index)
x0 and x1 represent the starting and ending position of each Character, respectively. Assume that the distance between any two adjacent characters equals 0.1. In other words, if the difference between the x0 of a character and the x1 of the previous character is 0.1, the two characters belong to the same string. If the difference is larger than 0.1, the character starts a new string, and so on. I need to produce a dataframe of strings and their respective x0 and x1, which is done by looping through the dataframe using .iterrows():
string = []
x0 = []
x1 = []
for index, row in df.iterrows():
    if index == 0:
        string.append(row['Character'])
        x0.append(row['x0'])
        x1.append(row['x1'])
    else:
        if round(row['x0']-x1[-1],1) == 0.1:
            string[-1] += row['Character']
            x1[-1] = row['x1']
        else:
            string.append(row['Character'])
            x0.append(row['x0'])
            x1.append(row['x1'])
df_string = pd.DataFrame(list(zip(string, x0, x1)), columns = ['String', 'x0', 'x1'])
Here is the result:
String x0 x1
0 This 0.0 4.3
1 is 5.5 7.6
2 a 8.8 9.8
3 sample. 11.0 18.6
Is there any other faster way to achieve this?
You could use groupby + agg:
# create diff column
same = (df['x0'] - df['x1'].shift().fillna(df.at[0, 'x0'])).abs()
# create grouper column, had to use this because of problems with floating point
grouper = ((same - 0.1) > 0.00001).cumsum()
# group and aggregate accordingly
res = df.groupby(grouper).agg({ 'Character' : ''.join, 'x0' : 'first', 'x1' : 'last' })
print(res)
Output
Character x0 x1
0 This 0.0 4.3
1 is 5.5 7.6
2 a 8.8 9.8
3 sample. 11.0 18.6
The tricky part is this one:
# create grouper column, had to use this because of problems with floating point
grouper = ((same - 0.1) > 0.00001).cumsum()
The idea is to convert the column of diffs (same) into a True or False column, where every time a True appears it means a new group needs to be created. The cumsum will take care of assigning the same id to each group.
As suggested by @ShubhamSharma, you could do:
# create diff column
same = (df['x0'] - df['x1'].shift().fillna(df['x0'])).abs().round(3).gt(.1)
# create grouper column, had to use this because of problems with floating point
grouper = same.cumsum()
The other part remains the same.
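Putting it all together, the groupby + agg approach can be run end to end on the sample frame (a self-contained sketch using the construction code from the question):

```python
import pandas as pd

# Rebuild the sample frame from the question
ch, x0, x1 = ['T'], [0], [1]
for s in 'his is a sample.':
    ch.append(s)
    x0.append(round(x1[-1] + 0.1, 1))
    x1.append(round(x0[-1] + 1, 1))
df = pd.DataFrame(zip(ch, x0, x1), columns=['Character', 'x0', 'x1'])
df = df.drop(df.loc[df['Character'] == ' '].index)

# a gap larger than 0.1 between a character and its predecessor starts a new word
same = (df['x0'] - df['x1'].shift().fillna(df['x0'])).abs().round(3).gt(0.1)
res = (df.groupby(same.cumsum())
         .agg({'Character': ''.join, 'x0': 'first', 'x1': 'last'})
         .reset_index(drop=True))
print(res)
```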

Bar Plot with more than one variables

I have this set of data:
TestSystems
[1] 0013-021 0013-022 0013-031 0013-032 0013-033 0013-034
Levels: 0013-021 0013-022 0013-031 0013-032 0013-033 0013-034
Utilization
[1] 61.42608 64.95802 31.51387 45.11971 43.66110 63.68363
Availability
[1] 28.92506 32.58015 11.86372 16.22164 36.23264 40.54977
str(TestSystems)
Factor w/ 6 levels "0013-021","0013-022",..: 1 2 3 4 5 6
str(Utilization)
num [1:6] 61.4 65 31.5 45.1 43.7 ...
str(Availability)
num [1:6] 28.9 32.6 11.9 16.2 36.2 ...
I would like to have a plot as below:
http://imgur.com/snPOVW5
The plot was made with other software; I would like to reproduce the same plot in R. I appreciate any help.
Thanks.
I have tried the code below and it works:
library(reshape2)
library(ggplot2)
df <- data.frame(TestSystems, Availability, Utilization)
df.long <- melt(df)
p <- ggplot(df.long, aes(TestSystems, value, fill=variable)) + geom_bar(stat="identity", position="dodge")
How do I display those 12 values at the top of each bar?