Blank values instead of real values in a multiplying subquery - SQL

I have this SQL in Sybase to get the cumulative value of a quantity multiplied by the price, but it gives blank values where I want it to show real values.
Here is my code:
SELECT Pmu.IdVal, Pmu.IdInt, Pmu.IdNumEcrPpal, Pmu.IdSensOpe, Pmu.DtEcr, Pmu.QteEcr, Pmu.PrixAcquis,
(SELECT SUM(p1.QteEcr)
FROM casimir.dbo.Pmu p1
WHERE p1.IdNumEcrPpal < Pmu.IdNumEcrPpal and p1.IdInt = Pmu.IdInt) AS QCP,
(SELECT SUM(p2.QteEcr * p2.PrixAcquis)
FROM casimir.dbo.Pmu p2
WHERE p2.IdNumEcrPpal < Pmu.IdNumEcrPpal and p2.IdInt = Pmu.IdInt) AS PRUP
FROM casimir.dbo.Pmu Pmu
where IdInt = 1733
order by IdNumEcrPpal
Here is the result:
IdVal IdInt IdNumEcrPpal QteEcr PrixAcquis QCP PRUP
650 1733 1074292 69 0.00 {null} {null}
650 1733 1165538 6 0.00 69 0.00
650 1733 1618644 7 0.00 75 0.00
650 1733 1934483 10 0.00 82 0.00
650 1733 1934484 1 0.00 92 0.00
650 1733 2140552 93 0.00 93 0.00
650 1733 2506329 200 0.00 186 0.00
650 1733 2515839 100 0.00 386 0.00
650 1733 2520087 110 0.00 486 0.00
650 1733 2572565 400 0.00 596 0.00
650 1733 2581126 1 0.00 996 0.00
650 1733 2858466 56 0.00 997 0.00
650 1733 2907483 6 0.00 1053 0.00
650 1733 3227255 7 0.00 1059 0.00
650 1733 3440560 173 0.00 1066 0.00
650 1733 3440727 67 0.00 1239 0.00
650 1733 3467592 100 0.00 1306 0.00
650 1733 3482135 100 188.00 1406 0.00
650 1733 3483475 30 185.35 1506
650 1733 3491124 350 0.00 1536
650 1733 3717502 70 0.00 1886
650 1733 3717503 4 0.00 1956
650 1733 4046744 20 65.44 1960
650 1733 4047669 200 0.00 1980
650 1733 4059311 150 67.12 2180
650 1733 4101861 200 0.00 2330
650 1733 4118371 36 0.00 2530
650 1733 4118372 3 0.00 2566
The column PRUP gives me the right values at first but then gives blank values.
Any idea?

You need to use the ISNULL function:
SELECT Pmu.IdVal, Pmu.IdInt, Pmu.IdNumEcrPpal, Pmu.IdSensOpe, Pmu.DtEcr, Pmu.QteEcr, Pmu.PrixAcquis,
isnull((SELECT SUM(p1.QteEcr)
FROM casimir.dbo.Pmu p1
WHERE p1.IdNumEcrPpal < Pmu.IdNumEcrPpal and p1.IdInt = Pmu.IdInt),0) AS QCP,
isnull((SELECT SUM(isnull(p2.QteEcr,1) * isnull(p2.PrixAcquis,1))
FROM casimir.dbo.Pmu p2
WHERE p2.IdNumEcrPpal < Pmu.IdNumEcrPpal and p2.IdInt = Pmu.IdInt),0) AS PRUP
FROM casimir.dbo.Pmu Pmu
where IdInt = 1733
order by IdNumEcrPpal
You can also use the COALESCE function instead.
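For reference, a quick illustration of the difference (a side note: in Sybase/T-SQL, ISNULL takes exactly two arguments, while COALESCE is standard SQL and accepts any number of arguments, returning the first non-NULL one), written as a FROM-less SELECT where the dialect allows it:
SELECT isnull(NULL, 0) AS a,            -- returns 0
       coalesce(NULL, NULL, 0) AS b     -- returns 0, the first non-NULL argument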

I edited my query. Please also refer to the link http://sqlfiddle.com/#!2/c48ec/27
SELECT Pmu.IdVal, Pmu.IdInt, Pmu.IdNumEcrPpal, Pmu.QteEcr, Pmu.PrixAcquis,
(SELECT SUM(p1.QteEcr)
FROM Pmu p1
WHERE p1.IdNumEcrPpal < Pmu.IdNumEcrPpal and p1.IdInt = Pmu.IdInt) AS QCP,
coalesce ((SELECT SUM( coalesce (p2.QteEcr,0) * coalesce (p2.PrixAcquis,0))
FROM Pmu p2
WHERE p2.IdNumEcrPpal < Pmu.IdNumEcrPpal and p2.IdInt = Pmu.IdInt),0) AS PRUP
FROM Pmu Pmu
where IdInt = 1733
order by IdNumEcrPpal
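If the Sybase edition in use supports SQL window functions (an assumption; support varies between ASE, IQ and SQL Anywhere), the running totals can also be written without the correlated subqueries. A sketch keeping the same "strictly earlier rows" semantics:
SELECT Pmu.IdVal, Pmu.IdInt, Pmu.IdNumEcrPpal, Pmu.QteEcr, Pmu.PrixAcquis,
coalesce(SUM(Pmu.QteEcr) OVER (PARTITION BY Pmu.IdInt ORDER BY Pmu.IdNumEcrPpal
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS QCP,
coalesce(SUM(coalesce(Pmu.QteEcr, 0) * coalesce(Pmu.PrixAcquis, 0))
    OVER (PARTITION BY Pmu.IdInt ORDER BY Pmu.IdNumEcrPpal
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS PRUP
FROM casimir.dbo.Pmu Pmu
where IdInt = 1733
order by IdNumEcrPpal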


Jedis benchmarking on local Redis server

I'm using JMH to test the performance of Jedis on a local Redis server (Jedis version 2.9.0, Redis version 6.2.6, CPU Quad-Core Intel Core i5). I use 200 threads to send SET commands through a connection pool.
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.RunnerException;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
@State(Scope.Benchmark)
public class CommonClientBenchmark {
private JedisPool jedisPool;
private final String host = "127.0.0.1";
private final int port = 6379;
@Setup
public void setup() {
JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
jedisPoolConfig.setMaxTotal(200);
jedisPoolConfig.setMaxIdle(200);
jedisPool = new JedisPool(jedisPoolConfig, host, port, 30000);
}
@TearDown
public void tearDown() {
jedisPool.close();
}
@Threads(200)
@Fork(1)
@Benchmark
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 1, time = 30, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 30, timeUnit = TimeUnit.SECONDS)
public void jedisSet() {
try (Jedis jedis = jedisPool.getResource()) {
jedis.set("jedis", "jedis");
}
}
public static void main(String[] args) throws IOException, RunnerException {
CommonClientBenchmark commonClientBenchmark = new CommonClientBenchmark();
commonClientBenchmark.setup();
org.openjdk.jmh.Main.main(args);
}
}
With the code above, I obtain about 25000+ QPS. However, when I decrease the maxTotal and maxIdle parameters of the connection pool from 200 to 100, the resulting QPS is much higher - it reaches about 75000. Could anyone explain this phenomenon? Thanks a lot!
EDIT: I've changed the version of Jedis to 4.1.1 and run multiple benchmark tests; the results are similar. When the size of the connection pool is set to 100 (both maxTotal and maxIdle), I obtain about 25000 ~ 50000 QPS. When I increase the size (both maxTotal and maxIdle) to 200, the QPS rises to 60000 ~ 75000.
I've also used iostat 1 to monitor CPU usage while running the tests, and I found that when the pool size is set to 200, %system is often much higher than when it is set to 100.
connection pool size set to 200:
disk0 cpu load average
KB/t tps MB/s us sy id 1m 5m 15m
4.09 929 3.71 7 87 6 39.58 14.44 8.06
4.00 902 3.52 6 89 5 39.58 14.44 8.06
4.50 8 0.04 5 88 6 38.33 14.60 8.15
4.39 145 0.62 6 89 6 38.33 14.60 8.15
28.00 11 0.30 6 88 5 38.33 14.60 8.15
8.00 1 0.01 5 88 6 38.33 14.60 8.15
0.00 0 0.00 5 88 7 38.33 14.60 8.15
4.00 5 0.02 5 88 7 38.94 15.12 8.37
0.00 0 0.00 5 89 6 38.94 15.12 8.37
0.00 0 0.00 5 88 7 38.94 15.12 8.37
0.00 0 0.00 5 89 6 38.94 15.12 8.37
8.68 222 1.88 5 88 7 38.94 15.12 8.37
5.60 10 0.05 5 87 8 45.20 16.81 9.01
29.65 46 1.33 11 82 7 45.20 16.81 9.01
52.57 7 0.36 8 85 7 45.20 16.81 9.01
28.00 2 0.05 5 87 8 45.20 16.81 9.01
223.33 6 1.31 6 87 7 45.20 16.81 9.01
4.19 1344 5.49 8 85 7 44.54 17.15 9.17
4.61 952 4.29 6 89 5 44.54 17.15 9.17
4.00 690 2.69 6 89 5 44.54 17.15 9.17
connection pool size set to 100:
disk0 cpu load average
KB/t tps MB/s us sy id 1m 5m 15m
4.31 13 0.05 16 59 26 6.55 7.86 7.49
750.67 3 2.20 30 53 17 6.58 7.85 7.48
9.14 225 2.01 23 54 23 6.58 7.85 7.48
37.00 8 0.29 23 56 21 6.58 7.85 7.48
32.00 6 0.19 18 55 26 6.58 7.85 7.48
145.20 10 1.41 22 56 22 6.58 7.85 7.48
0.00 0 0.00 22 56 22 6.46 7.80 7.47
4.00 2660 10.39 24 58 18 6.46 7.80 7.47
4.00 1952 7.62 19 56 25 6.46 7.80 7.47
4.00 1 0.00 19 56 24 6.46 7.80 7.47
4.00 5 0.02 18 56 27 6.46 7.80 7.47
0.00 0 0.00 15 57 28 6.10 7.71 7.44
0.00 0 0.00 18 57 25 6.10 7.71 7.44
256.00 10 2.50 18 56 25 6.10 7.71 7.44
6.29 7 0.04 20 57 23 6.10 7.71 7.44
4.00 5 0.02 20 56 24 6.10 7.71 7.44
17.71 7 0.12 20 56 24 6.01 7.66 7.42
23.00 4 0.09 20 58 23 6.01 7.66 7.42
5.00 4 0.02 23 55 22 6.01 7.66 7.42
4.00 1 0.00 20 56 24 6.01 7.66 7.42
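One way to separate client-side pool contention from what the server itself can sustain (a hedged suggestion, assuming the redis-benchmark utility that ships with Redis is available) is to run the same SET workload without Jedis at all:
redis-benchmark -h 127.0.0.1 -p 6379 -t set -c 200 -n 1000000
redis-benchmark -h 127.0.0.1 -p 6379 -t set -c 100 -n 1000000
Comparing those figures with the JMH results for both pool sizes makes it easier to tell whether the bottleneck is the server or client-side contention.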

Heat Map with DataFrame

I have a pandas data frame of the form
State RF LOG KNN MLP DT LDA AB
0 AR 0.95 0.87 0.81 0.89 0.81 0.84 0.87
1 FL 0.83 0.86 0.85 0.86 0.89 0.82 0.85
2 NJ 0.89 0.81 0.88 0.83 0.89 0.84 0.83
3 NV 0.77 0.72 0.89 0.79 0.79 0.73 0.70
4 TX 0.71 0.70 0.71 0.77 0.70 0.70 0.92
5 CA 0.69 0.81 0.81 0.88 0.88 0.60 0.89
How could I make a heat map, for example with Seaborn, that has the column names [RF, LOG, KNN, MLP, DT, LDA, AB] on the x-axis, the values of the State column [AR, FL, NJ, NV, TX, CA] on the y-axis, and the corresponding values displayed in the squares as the "heat" indicators?
If you index the DataFrame by the State column, you can draw the heat map directly.
import pandas as pd
import numpy as np
import io
import seaborn as sns
sns.set_theme()
data = '''
State RF LOG KNN MLP DT LDA AB
0 AR 0.95 0.87 0.81 0.89 0.81 0.84 0.87
1 FL 0.83 0.86 0.85 0.86 0.89 0.82 0.85
2 NJ 0.89 0.81 0.88 0.83 0.89 0.84 0.83
3 NV 0.77 0.72 0.89 0.79 0.79 0.73 0.70
4 TX 0.71 0.70 0.71 0.77 0.70 0.70 0.92
5 CA 0.69 0.81 0.81 0.88 0.88 0.60 0.89
'''
df = pd.read_csv(io.StringIO(data), delim_whitespace=True)  # the leading row numbers become the index
df.set_index('State', inplace=True)                         # states become the y-axis labels
ax = sns.heatmap(df)
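If you also want the values shown inside the squares, as the question asks, passing annot=True (with a number format via fmt) to sns.heatmap should do it; the cmap below is just an example choice:
ax = sns.heatmap(df, annot=True, fmt=".2f", cmap="viridis")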

count the number of occurrences of a number greater than 1 in a row

I have a table in the following format. There are >500 columns and 113 rows, where column 1 is the identifier. I want to keep only those identifiers for which >90% of the entry values are greater than 1, i.e. for A1, if >90% of the values are greater than 1, then I want to print the total number of entries greater than 1 in the last column and retain the row. Any suggestions please.
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20
A2 14.44 23.73 55.27 68.77 14.18 0.05
A3 5.56 5.69 10.46 10.55 7.49 7.77
A4 1.06 3.62 1.68 1.38 1.90 6.64
A5 0.01 0.00 0.03 0.01 0.00 0.07
A6 0.07 0.72 27.68 19.70 2.33 0.00
A7 5.57 8.95 18.71 6.75 16.76 33.66
A8 0.86 2.30 1.65 0.92 2.01 0.92
A9 20.21 25.59 25.86 21.62 26.75 24.66
A10 28.05 28.26 22.48 27.41 32.28 26.94
A11 0.22 0.83 7.39 5.88 2.05 9.27
A12 13.90 19.43 28.51 25.48 21.44 29.24
A13 15.43 18.39 12.49 14.75 15.79 10.85
A14 3.92 13.00 14.13 8.18 13.92 23.83
A15 0.06 0.02 0.01 0.01 0.04 0.03
A16 0.99 2.46 6.08 4.56 3.81 3.43
A17 1.31 2.05 3.18 1.73 2.80 4.12
A18 3.60 7.90 8.57 5.56 7.18 12.20
A19 44.82 47.53 37.16 42.20 41.51 26.33
A20 1.59 2.88 2.55 3.05 3.08 2.88
I have very limited knowledge. I know how to count an exact match with awk '$0=$0OFS NF-1' FS=1.40, but not with a greater-than or less-than condition.
I primarily want the output in the following format, where the last column indicates the number of entries >1.
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
$ awk '{for(i=1;i<=NF;i++) {if($i+0>1) c++; printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS)}c=0}' file
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
The same program, spread out with comments:
{
for(i=1;i<=NF;i++) { # for each field
if($i+0>1) c++ # if field > 1, count
printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS) # output nicely
}
c=0 # reset counter
}
$ awk 'NR>1{$0=$0"\t"NF-gsub(/^.|[[:space:]]0\./,"&")} 1' file
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
The gsub() returns the count of times it matched its regexp, which is the first character in the line (^.) or any number that starts with 0., so it counts every field on the line except numbers that are 1. or greater. Then just subtract the gsub() return value from the total number of fields NF to get the count of numbers greater than 1 on each line.
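If you also want the >90% filter from the question (keep a row only when more than 90% of its data fields are greater than 1), here is a minimal sketch building on the same counting idea, assuming the first line is a header and field 1 is the Id:
awk 'NR==1 {print; next}                              # pass the header through
     {c=0; for (i=2; i<=NF; i++) if ($i+0 > 1) c++    # count data fields > 1
      if (c > 0.9 * (NF-1)) print $0, c}' file        # keep the row, append the count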

Multiple Regression, Minitab or pandas

I have some data which I want to run multiple regression on.
1. Is multiple regression the right analysis for this problem?
2. Can someone guide me on how to do this in pandas or Minitab using the data set below?
Here is a sample of the data which is for 100 random sales personnel.
The output metric is the amount of revenue per interaction each person has (this can be negative if a customer cancels a sale within 90 days).
The input metrics are the number of sales per unit type out of 100 interactions. Obviously, the more units sold per interaction (3 types of units) the more revenue would be earned per interaction. How can I account for the relationship between these 3 unit type metrics and my output metric? I'd want to be able to say if X1 is 0.75 and X2 is 1.0 and X3 is 0.25 then my Y will be a specific value.
Right now we are driving each metric individually without accounting for their interactions and dependencies, which seems inefficient for predicting potential performance.
Person Y X1 X2 X3
1 ($0.81) 0.43 0.54 0.00
2 $3.75 0.67 1.11 0.11
3 $1.76 0.23 0.70 0.00
4 $2.38 0.87 1.24 0.00
5 $5.06 0.62 1.11 0.37
6 $5.35 0.63 1.13 0.25
7 $2.94 0.64 0.76 0.00
8 $2.84 0.51 0.64 0.00
9 $0.35 0.00 0.90 0.00
10 $2.61 0.53 0.92 0.00
11 ($0.31) 0.40 0.27 0.13
12 $4.78 0.41 0.81 0.00
13 $2.76 0.54 1.09 0.00
14 $5.25 0.82 1.09 0.00
15 $2.23 0.14 0.82 0.14
16 $1.45 0.42 0.84 0.00
17 $3.14 0.28 0.99 0.00
18 $4.21 0.71 0.71 0.71
19 $1.33 0.57 0.57 0.00
20 $2.78 0.58 1.01 0.00
21 $1.71 0.29 1.15 0.00
22 $4.43 0.44 0.73 0.15
23 $4.74 0.73 1.17 0.00
24 $1.30 0.44 0.44 0.00
25 $2.68 0.59 0.74 0.15
26 $1.84 0.30 0.74 0.00
27 $3.88 0.74 1.33 0.00
28 $2.11 0.30 0.74 0.00
29 $4.50 0.30 0.60 0.00
30 $3.46 0.60 1.05 0.00
31 $4.07 0.30 1.20 0.00
32 $3.50 0.90 1.20 0.00
33 $1.21 0.30 0.45 0.00
34 $2.55 0.45 0.60 0.15
35 $4.06 0.76 1.06 0.00
36 $0.44 0.46 0.61 0.00
37 $2.00 0.76 0.46 0.00
38 $0.33 0.15 0.77 0.00
39 $2.24 0.61 0.92 0.00
40 $2.81 0.77 1.54 0.00
41 $1.12 0.00 0.31 0.00
42 $1.30 0.15 0.46 0.31
43 $3.05 0.31 1.69 -0.15
44 $3.59 0.62 0.92 0.00
45 $3.17 0.62 1.39 0.00
46 $0.99 0.31 0.00 0.00
47 $2.00 0.63 0.63 0.47
48 $3.90 0.78 1.10 0.00
49 ($0.26) 0.00 0.32 0.00
50 $5.81 0.48 0.95 0.00
51 $1.91 0.16 0.16 0.00
52 $0.55 0.00 0.48 0.00
53 $1.26 0.32 0.64 0.16
54 $2.63 0.80 0.96 0.00
55 $4.00 0.96 1.28 0.00
56 $6.55 0.96 1.59 0.00
57 $1.85 -0.16 0.32 0.32
58 $4.40 1.12 1.60 0.00
59 $0.78 0.32 0.16 0.16
60 $2.33 0.64 0.48 0.16
61 $4.33 0.32 0.97 0.00
62 $2.73 0.97 1.45 0.16
63 $0.89 0.16 0.32 0.00
64 $1.24 0.16 0.32 0.00
65 $2.38 0.33 0.33 0.00
66 $2.97 0.33 0.82 0.00
67 $4.17 0.33 0.82 0.82
68 $1.79 0.33 0.49 0.00
69 $4.14 0.49 0.82 0.00
70 ($0.02) 0.33 0.99 0.00
71 $4.54 0.33 0.83 0.00
72 $3.31 0.50 0.83 0.00
73 $4.71 0.50 1.17 0.00
74 $2.54 0.50 1.01 0.17
75 $2.82 0.34 0.68 0.00
76 $1.76 0.17 0.68 0.00
77 $0.42 0.17 0.34 0.00
78 $2.46 0.51 0.51 0.00
79 $2.75 0.34 0.34 0.00
80 $2.09 0.35 0.69 0.17
81 $3.11 0.52 1.04 0.00
82 $0.79 0.17 0.70 0.00
83 $3.55 0.70 0.87 0.00
84 $0.81 0.52 1.22 0.00
85 $2.50 0.53 0.70 -0.18
86 $4.38 0.35 1.23 0.00
87 $0.59 0.53 0.88 0.00
88 $0.75 0.00 0.35 0.00
89 $2.03 0.18 0.18 0.00
90 $2.33 0.18 0.18 0.00
91 $3.20 0.18 0.36 0.53
92 $0.01 0.00 0.36 0.00
93 $1.97 0.90 0.72 1.08
94 $2.26 0.54 1.44 0.00
95 $4.85 1.09 2.72 0.00
96 $1.05 0.18 0.91 0.00
97 $1.15 0.18 0.18 0.00
98 $3.10 1.09 1.28 0.00
99 $3.11 0.37 1.10 0.00
100 $0.33 -0.18 0.00 0.18
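A minimal sketch of one way to fit this as a multiple regression with pandas + statsmodels (assumptions: the table above has been saved as sales.csv with columns Person, Y, X1, X2, X3, and the $ signs and accounting parentheses in Y have been converted to plain signed numbers):
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv("sales.csv")                  # Person, Y, X1, X2, X3
X = sm.add_constant(df[["X1", "X2", "X3"]])    # add an intercept column
model = sm.OLS(df["Y"], X).fit()               # ordinary least squares fit
print(model.summary())                         # coefficients, R-squared, p-values
# predicted revenue per interaction for X1=0.75, X2=1.0, X3=0.25
print(model.predict([[1.0, 0.75, 1.0, 0.25]]))
Whether plain multiple regression is the right model (question 1) mostly depends on the relationship being roughly linear and the three X metrics not being too strongly correlated with each other; the coefficients, R-squared and p-values in the summary output are a reasonable first check.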

Extracting the second last line from a table using a specific number followed by an asterisk (e.g. xy.z*)

I'm looking to extract and print a specific line from a table I have in a long log file. It looks something like this:
******************************************************************************
XSCALE (VERSION July 4, 2012) 4-Jun-2013
******************************************************************************
Author: Wolfgang Kabsch
Copy licensed until 30-Jun-2013 to
academic users for non-commercial applications
No redistribution.
******************************************************************************
CONTROL CARDS
******************************************************************************
MAXIMUM_NUMBER_OF_PROCESSORS=16
RESOLUTION_SHELLS= 20 10 6 4 3 2.5 2.0 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8
MINIMUM_I/SIGMA=4.0
OUTPUT_FILE=fae-ip.ahkl
INPUT_FILE= /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
THE DATA COLLECTION STATISTICS REPORTED BELOW ASSUMES:
SPACE_GROUP_NUMBER= 97
UNIT_CELL_CONSTANTS= 128.28 128.28 181.47 90.000 90.000 90.000
***** 16 EQUIVALENT POSITIONS IN SPACE GROUP # 97 *****
If x',y',z' is an equivalent position to x,y,z, then
x'=x*ML(1)+y*ML( 2)+z*ML( 3)+ML( 4)/12.0
y'=x*ML(5)+y*ML( 6)+z*ML( 7)+ML( 8)/12.0
z'=x*ML(9)+y*ML(10)+z*ML(11)+ML(12)/12.0
# 1 2 3 4 5 6 7 8 9 10 11 12
1 1 0 0 0 0 1 0 0 0 0 1 0
2 -1 0 0 0 0 -1 0 0 0 0 1 0
3 -1 0 0 0 0 1 0 0 0 0 -1 0
4 1 0 0 0 0 -1 0 0 0 0 -1 0
5 0 1 0 0 1 0 0 0 0 0 -1 0
6 0 -1 0 0 -1 0 0 0 0 0 -1 0
7 0 -1 0 0 1 0 0 0 0 0 1 0
8 0 1 0 0 -1 0 0 0 0 0 1 0
9 1 0 0 6 0 1 0 6 0 0 1 6
10 -1 0 0 6 0 -1 0 6 0 0 1 6
11 -1 0 0 6 0 1 0 6 0 0 -1 6
12 1 0 0 6 0 -1 0 6 0 0 -1 6
13 0 1 0 6 1 0 0 6 0 0 -1 6
14 0 -1 0 6 -1 0 0 6 0 0 -1 6
15 0 -1 0 6 1 0 0 6 0 0 1 6
16 0 1 0 6 -1 0 0 6 0 0 1 6
ALL DATA SETS WILL BE SCALED TO /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
******************************************************************************
READING INPUT REFLECTION DATA FILES
******************************************************************************
DATA MEAN REFLECTIONS INPUT FILE NAME
SET# INTENSITY ACCEPTED REJECTED
1 0.1358E+03 1579957 0 /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER & RESOLUTION
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 720
DEGREES OF FREEDOM OF CHI^2 FIT 357222.9
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.024
NUMBER OF CYCLES CARRIED OUT 4
CORRECTION FACTORS for visual inspection by XDS-Viewer DECAY_001.cbf
XMIN= 0.6 XMAX= 1799.3 NXBIN= 36
YMIN= 0.00049 YMAX= 0.44483 NYBIN= 20
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF X (fast) & Y(slow) IN THE DETECTOR PLANE
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 7921
DEGREES OF FREEDOM OF CHI^2 FIT 356720.6
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.023
NUMBER OF CYCLES CARRIED OUT 3
CORRECTION FACTORS for visual inspection by XDS-Viewer MODPIX_001.cbf
XMIN= 5.4 XMAX= 2457.6 NXBIN= 89
YMIN= 40.0 YMAX= 2516.7 NYBIN= 89
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER & DETECTOR SURFACE POSITION
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 468
DEGREES OF FREEDOM OF CHI^2 FIT 357286.9
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.022
NUMBER OF CYCLES CARRIED OUT 3
CORRECTION FACTORS for visual inspection by XDS-Viewer ABSORP_001.cbf
XMIN= 0.6 XMAX= 1799.3 NXBIN= 36
DETECTOR_SURFACE_POSITION= 1232 1278
DETECTOR_SURFACE_POSITION= 1648 1699
DETECTOR_SURFACE_POSITION= 815 1699
DETECTOR_SURFACE_POSITION= 815 858
DETECTOR_SURFACE_POSITION= 1648 858
DETECTOR_SURFACE_POSITION= 2174 1673
DETECTOR_SURFACE_POSITION= 1622 2230
DETECTOR_SURFACE_POSITION= 841 2230
DETECTOR_SURFACE_POSITION= 289 1673
DETECTOR_SURFACE_POSITION= 289 884
DETECTOR_SURFACE_POSITION= 841 326
DETECTOR_SURFACE_POSITION= 1622 326
DETECTOR_SURFACE_POSITION= 2174 884
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION PARAMETERS FOR THE STANDARD ERROR OF REFLECTION INTENSITIES
******************************************************************************
The variance v0(I) of the intensity I obtained from counting statistics is
replaced by v(I)=a*(v0(I)+b*I^2). The model parameters a, b are chosen to
minimize the discrepancies between v(I) and the variance estimated from
sample statistics of symmetry related reflections. This model implicates
an asymptotic limit ISa=1/SQRT(a*b) for the highest I/Sigma(I) that the
experimental setup can produce (Diederichs (2010) Acta Cryst D66, 733-740).
Often the value of ISa is reduced from the initial value ISa0 due to systematic
errors showing up by comparison with other data sets in the scaling procedure.
(ISa=ISa0=-1 if v0 is unknown for a data set.)
a b ISa ISa0 INPUT DATA SET
1.086E+00 1.420E-03 25.46 29.00 /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
FACTOR TO PLACE ALL DATA SETS TO AN APPROXIMATE ABSOLUTE SCALE 0.4178E+04
(ASSUMING A PROTEIN WITH 50% SOLVENT)
******************************************************************************
STATISTICS OF SCALED OUTPUT DATA SET : fae-ip.ahkl
FILE TYPE: XDS_ASCII MERGE=FALSE FRIEDEL'S_LAW=TRUE
186 OUT OF 1579957 REFLECTIONS REJECTED
1579771 REFLECTIONS ON OUTPUT FILE
******************************************************************************
DEFINITIONS:
R-FACTOR
observed = (SUM(ABS(I(h,i)-I(h))))/(SUM(I(h,i)))
expected = expected R-FACTOR derived from Sigma(I)
COMPARED = number of reflections used for calculating R-FACTOR
I/SIGMA = mean of intensity/Sigma(I) of unique reflections
(after merging symmetry-related observations)
Sigma(I) = standard deviation of reflection intensity I
estimated from sample statistics
R-meas = redundancy independent R-factor (intensities)
Diederichs & Karplus (1997), Nature Struct. Biol. 4, 269-275.
CC(1/2) = percentage of correlation between intensities from
random half-datasets. Correlation significant at
the 0.1% level is marked by an asterisk.
Karplus & Diederichs (2012), Science 336, 1030-33
Anomal = percentage of correlation between random half-sets
Corr of anomalous intensity differences. Correlation
significant at the 0.1% level is marked.
SigAno = mean anomalous difference in units of its estimated
standard deviation (|F(+)-F(-)|/Sigma). F(+), F(-)
are structure factor estimates obtained from the
merged intensity observations in each parity class.
Nano = Number of unique reflections used to calculate
Anomal_Corr & SigAno. At least two observations
for each (+ and -) parity are required.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
20.00 557 66 74 89.2% 2.7% 3.0% 557 58.75 2.9% 100.0* 45 1.674 25
10.00 5018 417 417 100.0% 2.4% 3.1% 5018 75.34 2.6% 100.0* 2 0.812 276
6.00 18352 1583 1584 99.9% 2.8% 3.3% 18351 65.55 2.9% 100.0* 11* 0.914 1248
4.00 59691 4640 4640 100.0% 3.2% 3.5% 59690 64.96 3.4% 100.0* 4 0.857 3987
3.00 112106 8821 8822 100.0% 4.4% 4.4% 112102 50.31 4.6% 99.9* -3 0.844 7906
2.50 147954 11023 11023 100.0% 8.7% 8.6% 147954 29.91 9.1% 99.8* 0 0.829 10096
2.00 332952 24698 24698 100.0% 21.4% 21.6% 332949 14.32 22.3% 99.2* 1 0.804 22992
1.90 106645 8382 8384 100.0% 56.5% 57.1% 106645 5.63 58.8% 94.7* -2 0.767 7886
1.80 138516 10342 10343 100.0% 86.8% 87.0% 138516 3.64 90.2% 87.9* -2 0.762 9741
1.70 175117 12897 12899 100.0% 140.0% 140.1% 175116 2.15 145.4% 69.6* -2 0.732 12188
1.60 209398 16298 16304 100.0% 206.1% 208.5% 209397 1.35 214.6% 48.9* -2 0.693 15466
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
1.40 33 27 27248 0.1% 42.6% 112.7% 12 0.40 60.3% 88.2 0 0.000 0
1.30 0 0 36205 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.20 0 0 49238 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.10 0 0 68746 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.00 0 0 98884 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
0.90 0 0 147505 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
0.80 0 0 230396 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
total 1579771 119964 778303 15.4% 12.8% 13.1% 1579647 14.33 13.4% 99.9* -1 0.755 111306
========== STATISTICS OF INPUT DATA SET ==========
R-FACTORS FOR INTENSITIES OF DATA SET /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
RESOLUTION R-FACTOR R-FACTOR COMPARED
LIMIT observed expected
20.00 2.7% 3.0% 557
10.00 2.4% 3.1% 5018
6.00 2.8% 3.3% 18351
4.00 3.2% 3.5% 59690
3.00 4.4% 4.4% 112102
2.50 8.7% 8.6% 147954
2.00 21.4% 21.6% 332949
1.90 56.5% 57.1% 106645
1.80 86.8% 87.0% 138516
1.70 140.0% 140.1% 175116
1.60 206.1% 208.5% 209397
1.50 333.4% 342.1% 273340
1.40 42.6% 112.7% 12
1.30 -99.9% -99.9% 0
1.20 -99.9% -99.9% 0
1.10 -99.9% -99.9% 0
1.00 -99.9% -99.9% 0
0.90 -99.9% -99.9% 0
0.80 -99.9% -99.9% 0
total 12.8% 13.1% 1579647
******************************************************************************
WILSON STATISTICS OF SCALED DATA SET: fae-ip.ahkl
******************************************************************************
Data is divided into resolution shells and a straight line
A - 2*B*SS is fitted to log<I>, where
RES = mean resolution (Angstrom) in shell
SS = mean of (sin(THETA)/LAMBDA)**2 in shell
<I> = mean reflection intensity in shell
BO = (A - log<I>)/(2*SS)
# = number of reflections in resolution shell
WILSON LINE (using all data) : A= 14.997 B= 29.252 CORRELATION= 0.99
# RES SS <I> log(<I>) BO
1667 8.445 0.004 2.3084E+06 14.652 49.2
2798 5.260 0.009 1.5365E+06 14.245 41.6
3547 4.106 0.015 2.0110E+06 14.514 16.3
4147 3.480 0.021 1.2910E+06 14.071 22.4
4688 3.073 0.026 7.3586E+05 13.509 28.1
5154 2.781 0.032 4.6124E+05 13.042 30.3
5568 2.560 0.038 3.1507E+05 12.661 30.6
5966 2.384 0.044 2.4858E+05 12.424 29.2
6324 2.240 0.050 1.8968E+05 12.153 28.5
6707 2.119 0.056 1.3930E+05 11.844 28.3
7030 2.016 0.062 9.1378E+04 11.423 29.0
7331 1.926 0.067 5.4413E+04 10.904 30.4
7664 1.848 0.073 3.5484E+04 10.477 30.9
7934 1.778 0.079 2.4332E+04 10.100 31.0
8193 1.716 0.085 1.8373E+04 9.819 30.5
8466 1.660 0.091 1.4992E+04 9.615 29.7
8743 1.609 0.097 1.1894E+04 9.384 29.1
9037 1.562 0.102 9.4284E+03 9.151 28.5
9001 1.520 0.108 8.3217E+03 9.027 27.6
HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF CENTRIC DATA
AS COMPARED WITH THEORETICAL VALUES. (EXPECTED: 1.00)
# RES <I**2>/ <I**3>/ <I**4>/
3<I>**2 15<I>**3 105<I>**4
440 8.445 0.740 0.505 0.294
442 5.260 0.762 0.733 0.735
442 4.106 0.888 0.788 0.717
439 3.480 1.339 1.733 2.278
438 3.073 1.168 1.259 1.400
440 2.781 1.215 1.681 2.269
438 2.560 1.192 1.603 2.405
450 2.384 1.117 1.031 0.891
432 2.240 1.214 1.567 2.173
438 2.119 0.972 0.992 0.933
445 2.016 1.029 1.019 0.986
441 1.926 1.603 1.701 1.554
440 1.848 1.544 1.871 2.076
436 1.778 0.927 0.661 0.435
444 1.716 1.134 1.115 1.197
440 1.660 1.271 1.618 2.890
436 1.609 1.424 1.045 0.941
448 1.562 1.794 1.447 1.423
426 1.520 2.517 1.496 2.099
8355 overall 1.253 1.255 1.455
HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF ACENTRIC DATA
AS COMPARED WITH THEORETICAL VALUES. (EXPECTED: 1.00)
# RES <I**2>/ <I**3>/ <I**4>/
2<I>**2 6<I>**3 24<I>**4
1227 8.445 1.322 1.803 2.340
2356 5.260 1.167 1.420 1.789
3105 4.106 1.010 1.046 1.100
3708 3.480 1.055 1.262 1.592
4250 3.073 0.999 1.083 1.375
4714 2.781 1.061 1.232 1.591
5130 2.560 1.049 1.178 1.440
5516 2.384 1.025 1.117 1.290
5892 2.240 1.001 1.058 1.230
6269 2.119 1.060 1.140 1.233
6585 2.016 1.109 1.344 1.709
6890 1.926 1.028 1.100 1.222
7224 1.848 1.060 1.150 1.348
7498 1.778 1.143 1.309 1.655
7749 1.716 1.182 1.299 1.549
8026 1.660 1.286 1.376 1.538
8307 1.609 1.419 1.481 1.707
8589 1.562 1.663 1.750 2.119
8575 1.520 2.271 2.172 5.088
111610 overall 1.253 1.354 1.804
======= CUMULATIVE INTENSITY DISTRIBUTION =======
DEFINITIONS:
<I> = mean reflection intensity
Na(Z)exp = expected number of acentric reflections with I <= Z*<I>
Na(Z)obs = observed number of acentric reflections with I <= Z*<I>
Nc(Z)exp = expected number of centric reflections with I <= Z*<I>
Nc(Z)obs = observed number of centric reflections with I <= Z*<I>
Nc(Z)obs/Nc(Z)exp versus resolution and Z (0.1-1.0)
# RES 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
440 8.445 0.75 0.95 0.98 1.00 0.98 0.99 1.00 1.00 1.02 1.02
442 5.260 1.18 1.11 1.09 1.09 1.07 1.08 1.08 1.08 1.07 1.06
442 4.106 0.97 1.01 0.98 0.97 0.96 0.94 0.92 0.91 0.92 0.94
439 3.480 0.91 0.88 0.91 0.91 0.89 0.90 0.90 0.89 0.89 0.93
438 3.073 0.92 0.92 0.90 0.93 0.94 0.99 1.02 0.99 0.96 0.96
440 2.781 0.98 1.01 1.02 1.05 1.04 1.03 1.04 1.02 1.01 1.01
438 2.560 1.02 1.10 1.05 1.03 1.01 1.03 1.04 1.01 1.04 1.02
450 2.384 0.78 0.93 0.92 0.93 0.89 0.89 0.92 0.95 0.96 0.95
432 2.240 0.69 0.82 0.84 0.86 0.91 0.92 0.93 0.94 0.95 0.95
438 2.119 0.75 0.87 0.95 1.02 1.09 1.09 1.12 1.12 1.10 1.08
445 2.016 0.86 0.86 0.87 0.90 0.91 0.93 0.98 0.99 1.00 1.00
441 1.926 0.88 0.79 0.79 0.81 0.82 0.84 0.85 0.85 0.86 0.86
440 1.848 1.00 0.89 0.85 0.83 0.85 0.85 0.88 0.90 0.90 0.92
436 1.778 1.03 0.87 0.79 0.79 0.80 0.84 0.85 0.87 0.90 0.92
444 1.716 1.09 0.85 0.81 0.78 0.80 0.80 0.81 0.81 0.84 0.85
440 1.660 1.27 1.01 0.93 0.88 0.85 0.84 0.84 0.85 0.88 0.91
436 1.609 1.34 1.00 0.89 0.83 0.80 0.80 0.80 0.81 0.80 0.83
448 1.562 1.39 1.09 0.93 0.86 0.81 0.78 0.77 0.79 0.78 0.78
426 1.520 1.38 1.03 0.88 0.83 0.82 0.80 0.78 0.76 0.75 0.74
8355 overall 1.01 0.95 0.92 0.91 0.91 0.91 0.92 0.92 0.93 0.93
Na(Z)obs/Na(Z)exp versus resolution and Z (0.1-1.0)
# RES 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
1227 8.445 1.10 1.22 1.21 1.21 1.14 1.10 1.12 1.10 1.11 1.09
2356 5.260 1.15 1.10 1.09 1.03 1.03 1.03 1.01 1.01 1.01 1.00
3105 4.106 0.91 0.96 0.99 1.01 1.02 1.00 1.00 0.99 0.99 1.00
3708 3.480 0.93 0.97 1.00 1.06 1.05 1.04 1.04 1.04 1.04 1.05
4250 3.073 0.94 1.02 1.01 1.00 1.01 1.00 1.00 1.01 1.02 1.02
4714 2.781 1.11 1.04 1.02 1.02 1.02 1.01 1.01 1.01 1.00 1.00
5130 2.560 1.00 1.10 1.06 1.03 1.01 1.02 1.01 1.01 1.01 1.02
5516 2.384 1.09 1.08 1.05 1.04 1.04 1.02 1.01 1.01 1.01 1.01
5892 2.240 0.98 0.99 1.00 1.01 1.01 1.01 1.00 1.00 1.00 1.00
6269 2.119 1.14 1.04 1.02 1.00 1.00 1.00 1.01 1.02 1.02 1.01
6585 2.016 1.17 1.02 1.01 1.02 1.02 1.03 1.02 1.02 1.02 1.02
6890 1.926 1.35 1.07 1.00 0.99 1.00 1.01 1.01 1.00 1.00 1.01
7224 1.848 1.52 1.11 1.01 0.97 0.96 0.98 0.98 0.98 0.98 0.99
7498 1.778 1.80 1.22 1.03 0.97 0.95 0.94 0.95 0.95 0.95 0.96
7749 1.716 2.01 1.28 1.07 0.99 0.94 0.92 0.92 0.92 0.93 0.93
8026 1.660 2.31 1.41 1.13 1.01 0.95 0.92 0.90 0.89 0.89 0.89
8307 1.609 2.62 1.54 1.19 1.04 0.95 0.90 0.88 0.87 0.86 0.87
8589 1.562 2.94 1.69 1.29 1.10 1.00 0.93 0.89 0.86 0.85 0.85
8575 1.520 3.14 1.78 1.34 1.13 1.01 0.93 0.88 0.85 0.83 0.83
111610 overall 1.73 1.24 1.09 1.03 0.99 0.97 0.96 0.96 0.96 0.96
List of 33 reflections *NOT* obeying Wilson distribution (Z> 10.0)
h k l RES Z Intensity Sigma
72 11 61 1.52 17.34 0.2886E+06 0.2367E+05 "alien"
67 53 6 1.50 15.85 0.2638E+06 0.1128E+06 "alien"
35 10 25 3.17 14.39 0.2118E+08 0.2364E+06 "alien"
46 17 99 1.50 14.16 0.2357E+06 0.9588E+05 "alien"
34 32 2 2.75 13.44 0.1239E+08 0.1279E+06 "alien"
79 6 15 1.60 13.10 0.3117E+06 0.2477E+05 "alien"
61 20 33 1.88 12.54 0.8900E+06 0.3054E+05 "alien"
44 4 48 2.30 12.38 0.4695E+07 0.6072E+05 "alien"
66 25 19 1.79 11.89 0.5788E+06 0.2739E+05 "alien"
66 25 11 1.81 11.88 0.5781E+06 0.2771E+05 "alien"
60 43 61 1.50 11.77 0.1959E+06 0.9769E+05 "alien"
72 11 17 1.74 11.64 0.4278E+06 0.2619E+05 "alien"
80 24 26 1.50 11.41 0.1899E+06 0.9793E+05 "alien"
41 21 26 2.59 11.09 0.6988E+07 0.7945E+05 "alien"
44 18 20 2.59 11.08 0.6982E+07 0.7839E+05 "alien"
23 3 62 2.59 11.06 0.6971E+07 0.9154E+05 "alien"
69 7 22 1.80 11.06 0.5383E+06 0.2564E+05 "alien"
73 10 15 1.72 10.98 0.4036E+06 0.2356E+05 "alien"
70 17 35 1.68 10.96 0.3286E+06 0.2415E+05 "alien"
57 24 41 1.88 10.91 0.7746E+06 0.2842E+05 "alien"
82 24 6 1.50 10.74 0.1787E+06 0.1019E+06 "alien"
69 25 62 1.50 10.67 0.1775E+06 0.8689E+05 "alien"
24 20 44 2.91 10.45 0.9641E+07 0.1017E+06 "alien"
66 43 5 1.63 10.37 0.2468E+06 0.2294E+05 "alien"
81 4 29 1.53 10.36 0.1725E+06 0.2364E+05 "alien"
60 40 26 1.72 10.32 0.3792E+06 0.2578E+05 "alien"
39 18 57 2.18 10.24 0.3885E+07 0.5573E+05 "alien"
70 41 15 1.57 10.19 0.1922E+06 0.2281E+05 "alien"
55 36 41 1.79 10.16 0.4942E+06 0.2967E+05 "alien"
37 4 81 1.88 10.15 0.7202E+06 0.3357E+05 "alien"
56 27 5 2.06 10.14 0.1854E+07 0.3569E+05 "alien"
44 39 29 2.06 10.09 0.1844E+07 0.3805E+05 "alien"
65 46 29 1.56 10.06 0.1898E+06 0.2270E+05 "alien"
List of 33 reflections *NOT* obeying Wilson distribution (sorted by resolution)
Ice rings could occur at (Angstrom):
3.897,3.669,3.441, 2.671,2.249,2.072, 1.948,1.918,1.883,1.721
h k l RES Z Intensity Sigma
82 24 6 1.50 10.74 0.1787E+06 0.1019E+06
67 53 6 1.50 15.85 0.2638E+06 0.1128E+06
80 24 26 1.50 11.41 0.1899E+06 0.9793E+05
60 43 61 1.50 11.77 0.1959E+06 0.9769E+05
69 25 62 1.50 10.67 0.1775E+06 0.8689E+05
46 17 99 1.50 14.16 0.2357E+06 0.9588E+05
72 11 61 1.52 17.34 0.2886E+06 0.2367E+05
81 4 29 1.53 10.36 0.1725E+06 0.2364E+05
65 46 29 1.56 10.06 0.1898E+06 0.2270E+05
70 41 15 1.57 10.19 0.1922E+06 0.2281E+05
79 6 15 1.60 13.10 0.3117E+06 0.2477E+05
66 43 5 1.63 10.37 0.2468E+06 0.2294E+05
70 17 35 1.68 10.96 0.3286E+06 0.2415E+05
73 10 15 1.72 10.98 0.4036E+06 0.2356E+05
60 40 26 1.72 10.32 0.3792E+06 0.2578E+05
72 11 17 1.74 11.64 0.4278E+06 0.2619E+05
66 25 19 1.79 11.89 0.5788E+06 0.2739E+05
55 36 41 1.79 10.16 0.4942E+06 0.2967E+05
69 7 22 1.80 11.06 0.5383E+06 0.2564E+05
66 25 11 1.81 11.88 0.5781E+06 0.2771E+05
61 20 33 1.88 12.54 0.8900E+06 0.3054E+05
57 24 41 1.88 10.91 0.7746E+06 0.2842E+05
37 4 81 1.88 10.15 0.7202E+06 0.3357E+05
56 27 5 2.06 10.14 0.1854E+07 0.3569E+05
44 39 29 2.06 10.09 0.1844E+07 0.3805E+05
39 18 57 2.18 10.24 0.3885E+07 0.5573E+05
44 4 48 2.30 12.38 0.4695E+07 0.6072E+05
44 18 20 2.59 11.08 0.6982E+07 0.7839E+05
41 21 26 2.59 11.09 0.6988E+07 0.7945E+05
23 3 62 2.59 11.06 0.6971E+07 0.9154E+05
34 32 2 2.75 13.44 0.1239E+08 0.1279E+06
24 20 44 2.91 10.45 0.9641E+07 0.1017E+06
35 10 25 3.17 14.39 0.2118E+08 0.2364E+06
cpu time used by XSCALE 25.9 sec
elapsed wall-clock time 28.1 sec
I would like to extract the second last line where the 11th column has a number followed by an asterisk (xy.z*). E.g. in this table the line I'm looking for would contain "23.2*" in the 11th column (CC(1/2)). I want the second last line because the last one would be the line that starts with "total", and that was a lot easier to extract with a simple grep command.
So the expected output for the code in this case would be to print the line:
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
In a different file, the second last value in the 11th column followed by an asterisk may correspond to 1.6 in the first column, so the expected output would be:
1.60 216910 5769 5769 100.0% 207.5% 214.7% 216910 1.72 210.4% 26.0* -3 0.654 5204
And so on for all the different possible positions of the asterisk in the table.
I've tried using things like grep "[0-9, 0-9, ., 0-9*]" file.name and various other grep and fgrep approaches, but I'm pretty new to this and can't get it to work.
Any help would be greatly appreciated.
Sam
GNU sed
(for your updated script)
sed -n '/LIMIT/,/=/{/^\s*\(\S*\s*\)\{10\}[0-9.-]*\*/H;x;s/^.*\n\(.*\n.*\)$/\1/;x;/=/{x;P;q}}' file
The output is:
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
To print the entire second last line which matches that regex, you can do something like this:
awk '$11~/[0-9.]+\*/{secondlast=last;last=$0}END{print secondlast}' logFile
This one-liner can do it:
$ awk '{if ($11 ~ /\*/) {i++; a[i]=$0}} END {print a[i -1]}' file
1.50 274090 20781 20874 99.6% 333.7% 341.9% 274015 0.80 347.1% 24.8* 0 0.645 19516
Explanation
It adds to the array a[] all lines that contain * in the 11th field. Then it prints not the last one but the one before it.
Update
Since your log is very big and asterisks appear all around, I have updated my code to:
$ awk '$11 ~ /^[0-9]+\.[0-9]+\*$/ {i++; a[i]=$0} END {print a[i-1]}' file
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
so it only keeps lines whose 11th field has the NNN.XXX* format.
awk '$11~/^[0-9.]+\*$/ {prev=val; val=$11+0} END {print prev}' log
I add 0 to the value of $11 to convert the string "23.2*" to the number 23.2.
Alternatively, when I hear "nth from the end", I think: reverse the input and take the nth from the top:
tac log | awk '$11~/^[0-9.]+\*$/ && ++n == 2 {print $11+0; exit}'
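If you want the whole matching line rather than just the numeric value (which is what the expected output above shows), the same reversed approach works by printing the full record instead:
tac log | awk '$11~/^[0-9.]+\*$/ && ++n == 2 {print; exit}'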