Heat Map with DataFrame - pandas

I have a pandas data frame of the form
State RF LOG KNN MLP DT LDA AB
0 AR 0.95 0.87 0.81 0.89 0.81 0.84 0.87
1 FL 0.83 0.86 0.85 0.86 0.89 0.82 0.85
2 NJ 0.89 0.81 0.88 0.83 0.89 0.84 0.83
3 NV 0.77 0.72 0.89 0.79 0.79 0.73 0.70
4 TX 0.71 0.70 0.71 0.77 0.70 0.70 0.92
5 CA 0.69 0.81 0.81 0.88 0.88 0.60 0.89
How could I make a heat map, for example with Seaborn, that has the column names [RF, LOG, KNN, MLP, DT, LDA, AB] on the X-axis, the values of the State column [AR, FL, NJ, NV, TX, CA] on the Y-axis, and the corresponding values displayed in the squares as the "heat" indicators?

If you set the State column as the index, you can draw the heat map directly.
import pandas as pd
import numpy as np
import io
import seaborn as sns
sns.set_theme()
data = '''
State RF LOG KNN MLP DT LDA AB
0 AR 0.95 0.87 0.81 0.89 0.81 0.84 0.87
1 FL 0.83 0.86 0.85 0.86 0.89 0.82 0.85
2 NJ 0.89 0.81 0.88 0.83 0.89 0.84 0.83
3 NV 0.77 0.72 0.89 0.79 0.79 0.73 0.70
4 TX 0.71 0.70 0.71 0.77 0.70 0.70 0.92
5 CA 0.69 0.81 0.81 0.88 0.88 0.60 0.89
'''
df = pd.read_csv(io.StringIO(data), delim_whitespace=True)
df.set_index('State', inplace=True)
ax = sns.heatmap(df)
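The question also asks for the values to be displayed inside the squares. Seaborn's annot and fmt keyword arguments do exactly that, so the last line above could be replaced with, for example:
ax = sns.heatmap(df, annot=True, fmt=".2f")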

Related

Performance Issue with Tensorflow Hub and Universal Sentence Encoder

I am training Top2vec models using universal-sentence-encoder_4.
On larger data sets (e.g. > 500k rows), the code is dismally slow.
Here's the notebook:
https://github.com/ddangelov/Top2Vec/blob/master/notebooks/CORD-19_top2vec.ipynb
top2vec_trained = Top2Vec(documents=papers_filtered_df.text.tolist(), split_documents=True, embedding_batch_size=64, embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4", workers=8)
Here's the code snippet I profiled from Top2vec.py:
import cProfile, pstats, io
from pstats import SortKey
pr = cProfile.Profile()
pr.enable()
for ind in range(0, batches):
    document_vectors.append(self.embed(train_corpus[current:current + batch_size]))
    current += batch_size
if extra > 0:
    document_vectors.append(self.embed(train_corpus[current:current + extra]))
document_vectors = self._l2_normalize(np.array(np.vstack(document_vectors)))
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
Here's the profiler output:
3638257 function calls (3568425 primitive calls) in 154.779 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
6983 0.015 0.000 152.706 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py:743(_call_attribute)
6983 0.019 0.000 152.692 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
6983 0.129 0.000 152.643 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:887(__call__)
6983 0.086 0.000 152.281 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:934(_call)
6983 0.068 0.000 152.188 0.022 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2951(__call__)
6983 0.081 0.000 149.721 0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:1782(_call_flat)
6983 0.205 0.000 149.416 0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:454(call)
6983 0.034 0.000 149.041 0.021 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:29(quick_execute)
6983 148.998 0.021 148.998 0.021 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
6983 0.048 0.000 1.928 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:3219(_maybe_define_function)
6/4 0.148 0.025 1.925 0.481 {built-in method numpy.core._multiarray_umath.implement_array_function}
1 0.000 0.000 1.882 1.882 <__array_function__ internals>:2(vstack)
1 0.000 0.000 1.877 1.877 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/shape_base.py:223(vstack)
6987 1.833 0.000 1.833 0.000 {built-in method numpy.array}
1 0.000 0.000 1.736 1.736 <__array_function__ internals>:2(atleast_2d)
1 0.015 0.015 1.734 1.734 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/shape_base.py:82(atleast_2d)
6984 0.004 0.000 1.718 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/_asarray.py:110(asanyarray)
6983 0.046 0.000 1.381 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2648(canonicalize_function_inputs)
6983 0.080 0.000 1.311 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2817(_convert_inputs_to_signature)
6983 0.009 0.000 0.652 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py:178(wrapped)
6983 0.076 0.000 0.643 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1636(convert_to_tensor)
6983 0.015 0.000 0.472 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2051(captured_inputs)
6983 0.005 0.000 0.405 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:340(_constant_tensor_conversion_function)
6983 0.007 0.000 0.400 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:170(constant)
6983 0.016 0.000 0.393 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:271(_constant_impl)
6983 0.009 0.000 0.367 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:302(_constant_eager_impl)
6983 0.343 0.000 0.359 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:74(convert_to_eager_tensor)
27932 0.024 0.000 0.322 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:353(flatten)
27932 0.298 0.000 0.298 0.000 {built-in method tensorflow.python.util._pywrap_utils.Flatten}
6983 0.027 0.000 0.275 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_cache.py:250(make_cache_key)
6983 0.169 0.000 0.232 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2058(<listcomp>)
6983 0.014 0.000 0.219 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1157(flatten_up_to)
6983 0.006 0.000 0.181 0.000 {built-in method builtins.any}
13966 0.012 0.000 0.175 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:689(pack_sequence_as)
13966 0.010 0.000 0.175 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2851(<genexpr>)
6983 0.051 0.000 0.172 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2196(_build_call_outputs)
6983 0.018 0.000 0.165 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_spec.py:124(is_compatible_with)
6983 0.016 0.000 0.164 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_cache.py:154(lookup)
13966 0.032 0.000 0.163 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:649(_pack_sequence_as)
13966/6983 0.058 0.000 0.158 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1029(assert_shallow_structure)
13966 0.044 0.000 0.151 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/ops/handle_data_util.py:25(copy_handle_data)
6983 0.031 0.000 0.146 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_spec.py:70(is_compatible_with)
1 0.000 0.000 0.141 0.141 <__array_function__ internals>:2(concatenate)
6983 0.030 0.000 0.137 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:185(called_without_tracing)
13966 0.020 0.000 0.119 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_cache.py:98(__eq__)
6983 0.051 0.000 0.116 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_cache.py:265(_make_execution_context)
251402 0.060 0.000 0.112 0.000 {built-in method builtins.isinstance}
6983 0.014 0.000 0.106 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_trace_type.py:314(make_function_signature)
1 0.000 0.000 0.099 0.099 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/top2vec/Top2Vec.py:812(_l2_normalize)
1 0.026 0.026 0.099 0.099 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/preprocessing/_data.py:1731(normalize)
6983 0.023 0.000 0.093 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:180(_get_detector)
6983 0.048 0.000 0.090 0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_EncodeArg}
48881 0.089 0.000 0.089 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:175(__eq__)
13966 0.016 0.000 0.082 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_trace_type.py:201(__eq__)
6983 0.006 0.000 0.082 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1534(_autopacking_conversion_function)
6983 0.011 0.000 0.075 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1524(_should_not_autopack)
6983 0.033 0.000 0.075 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:183(_sequence_like)
1256940 0.063 0.000 0.063 0.000 {built-in method builtins.callable}
69830/20949 0.045 0.000 0.062 0.000 {built-in method builtins.hash}
34915 0.062 0.000 0.062 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:927(executing_eagerly)
20949 0.007 0.000 0.061 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_cache.py:95(__hash__)
6983 0.019 0.000 0.060 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1320(shape)
1 0.000 0.000 0.059 0.059 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/validation.py:486(check_array)
6983 0.058 0.000 0.058 0.000 /Users/davidlaxer/anaconda3/lib/python3.8/weakref.py:422(__contains__)
6983 0.015 0.000 0.053 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:2087(executing_eagerly)
55866 0.016 0.000 0.052 0.000 /Users/davidlaxer/anaconda3/lib/python3.8/abc.py:96(__instancecheck__)
41898 0.039 0.000 0.050 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1193(dtype)
41898 0.026 0.000 0.049 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:253(_yield_value)
69830 0.049 0.000 0.049 0.000 {built-in method tensorflow.python.util._pywrap_utils.IsNestedOrComposite}
6983 0.005 0.000 0.048 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1242(<listcomp>)
13966 0.016 0.000 0.044 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:343(__eq__)
20949 0.006 0.000 0.043 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_trace_type.py:210(__hash__)
27932/13966 0.027 0.000 0.042 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:999(_yield_flat_up_to)
6983 0.020 0.000 0.039 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_spec.py:229(__tf_tracing_type__)
55866 0.035 0.000 0.035 0.000 {built-in method _abc._abc_instancecheck}
6983 0.011 0.000 0.035 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py:761(__init__)
6983 0.013 0.000 0.034 0.000 {built-in method builtins.all}
69830 0.029 0.000 0.032 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:258(_yield_sorted_items)
13967 0.031 0.000 0.031 0.000 {built-in method builtins.getattr}
6983 0.006 0.000 0.030 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py:32(is_traceback_filtering_enabled)
20949 0.009 0.000 0.030 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:188(__ne__)
13966 0.016 0.000 0.029 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:6285(get_default_graph)
6983 0.013 0.000 0.029 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:609(_packed_nest_with_indices)
1 0.000 0.000 0.029 0.029 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/validation.py:90(_assert_all_finite)
1 0.000 0.000 0.028 0.028 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/extmath.py:869(_safe_accumulator_op)
1 0.000 0.000 0.028 0.028 <__array_function__ internals>:2(sum)
1 0.000 0.000 0.028 0.028 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/fromnumeric.py:2111(sum)
1 0.000 0.000 0.028 0.028 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/fromnumeric.py:70(_wrapreduction)
1 0.028 0.028 0.028 0.028 {method 'reduce' of 'numpy.ufunc' objects}
20949 0.008 0.000 0.028 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:340(__hash__)
6983 0.013 0.000 0.024 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:155(is_compatible_with)
13966 0.023 0.000 0.024 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:678(as_dtype)
20949 0.009 0.000 0.024 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:63(base_dtype)
6983 0.013 0.000 0.023 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py:771(<listcomp>)
6983 0.005 0.000 0.022 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/util/nest.py:146(is_namedtuple)
118711 0.020 0.000 0.020 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1530(<genexpr>)
104747 0.019 0.000 0.019 0.000 {built-in method builtins.len}
6983 0.014 0.000 0.019 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_trace_type.py:61(__init__)
6983 0.017 0.000 0.017 0.000 {built-in method tensorflow.python.util._pywrap_utils.IsNamedtuple}
13966 0.010 0.000 0.015 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2968(input_signature)
6983 0.008 0.000 0.015 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:5418(_distribution_strategy_stack)
13966 0.015 0.000 0.015 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:850(experimental_get_tracing_count)
20949 0.015 0.000 0.015 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:50(_is_ref_dtype)
6983 0.008 0.000 0.014 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py:112(get)
6983 0.010 0.000 0.014 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/six.py:582(iterkeys)
6983 0.013 0.000 0.014 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2865(<listcomp>)
6983 0.014 0.000 0.014 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:162(called_without_tracing)
1 0.000 0.000 0.014 0.014 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/sklearn/utils/extmath.py:51(row_norms)
1 0.000 0.000 0.014 0.014 <__array_function__ internals>:2(einsum)
1 0.000 0.000 0.014 0.014 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/numpy/core/einsumfunc.py:997(einsum)
1 0.014 0.014 0.014 0.014 {built-in method numpy.core._multiarray_umath.c_einsum}
6983 0.006 0.000 0.014 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/tape.py:220(could_possibly_record)
13966 0.010 0.000 0.013 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_trace_type.py:158(_has_same_structure)
6983 0.011 0.000 0.013 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:290(__init__)
13966 0.013 0.000 0.013 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:5814(get_default)
13966 0.012 0.000 0.012 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:949(device_name)
6983 0.011 0.000 0.011 0.000 /Users/davidlaxer/anaconda3/lib/python3.8/weakref.py:382(__getitem__)
6983 0.011 0.000 0.011 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:1203(function_call_options)
6983 0.008 0.000 0.011 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/saved_model/save_context.py:59(in_save_context)
41898 0.011 0.000 0.011 0.000 {method '_datatype_enum' of 'tensorflow.python.framework.ops.EagerTensor' objects}
6983 0.005 0.000 0.011 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/ops/gradients_util.py:1016(PossibleTapeGradientTypes)
6983 0.010 0.000 0.010 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:539(__exit__)
13976 0.010 0.000 0.010 0.000 {built-in method builtins.hasattr}
6983 0.007 0.000 0.009 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py:1107(is_compatible_with)
6983 0.009 0.000 0.009 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py:196(__init__)
6983 0.009 0.000 0.009 0.000 {built-in method tensorflow.python.util._pywrap_utils.IsTensor}
6983 0.004 0.000 0.008 0.000 <string>:1(__new__)
6983 0.008 0.000 0.008 0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_TapeSetIsEmpty}
13966 0.007 0.000 0.007 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:546(ensure_initialized)
20949 0.007 0.000 0.007 0.000 {built-in method tensorflow.python.util._pywrap_utils.IsCompositeTensor}
6983 0.006 0.000 0.007 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2973(flat_input_signature)
13966 0.007 0.000 0.007 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/tensor_spec.py:60(dtype)
6983 0.005 0.000 0.007 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2640(_validate_inputs)
6983 0.006 0.000 0.006 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py:50(__init__)
6983 0.006 0.000 0.006 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_cache.py:60(__init__)
6983 0.006 0.000 0.006 0.000 {method '_shape_tuple' of 'tensorflow.python.framework.ops.EagerTensor' objects}
6983 0.006 0.000 0.006 0.000 {built-in method tensorflow.python.util._pywrap_utils.IsMutableMapping}
6983 0.006 0.000 0.006 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function_trace_type.py:38(__init__)
6983 0.006 0.000 0.006 0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_TapeSetPossibleGradientTypes}
27932 0.005 0.000 0.005 0.000 {method 'append' of 'list' objects}
6983 0.005 0.000 0.005 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/framework/c_api_util.py:100(has_been_garbage_collected)
27932 0.005 0.000 0.005 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:2057(context_safe)
13966 0.005 0.000 0.005 0.000 {built-in method tensorflow.python.util._pywrap_utils.IsTypeSpec}
6983 0.005 0.000 0.005 0.000 {method 'acquire' of '_thread.RLock' objects}
13966 0.005 0.000 0.005 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2552(input_signature)
6983 0.005 0.000 0.005 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/function.py:216(__exit__)
13966 0.005 0.000 0.005 0.000 /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages/tensorflow/python/eager/context.py:885(_handle)
...
When I increased 'embedding_batch_size=256', it crashed macOS 12.3.1:
panic(cpu 8 caller 0xffffff8020d449ad): userspace watchdog timeout: no successful checkins from WindowServer in 120 seconds
service: logd, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 seconds ago
service: WindowServer, total successful checkins since wake (127621 seconds ago): 12751, last successful checkin: 120 seconds ago
service: remoted, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 seconds ago
service: opendirectoryd, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 seconds ago
Panicked task 0xffffff95dce7f9c0: 3 threads: pid 133: watchdogd
Backtrace (CPU 8), panicked thread: 0xffffff87770d6550, Frame : Return Address
0xffffffd15babb670 : 0xffffff801d883e2d
0xffffffd15babb6c0 : 0xffffff801d9e3cb6
0xffffffd15babb700 : 0xffffff801d9d350d
0xffffffd15babb750 : 0xffffff801d823a60
0xffffffd15babb770 : 0xffffff801d8841fd
0xffffffd15babb890 : 0xffffff801d8839b6
0xffffffd15babb8f0 : 0xffffff801e116573
0xffffffd15babb9e0 : 0xffffff8020d449ad
0xffffffd15babb9f0 : 0xffffff8020d44600
0xffffffd15babba10 : 0xffffff8020d439c7
0xffffffd15babbb40 : 0xffffff801e0848cc
0xffffffd15babbca0 : 0xffffff801d98a1a6
0xffffffd15babbdb0 : 0xffffff801d88a6db
0xffffffd15babbe10 : 0xffffff801d85ed03
0xffffffd15babbe60 : 0xffffff801d875259
0xffffffd15babbef0 : 0xffffff801d9b61a8
0xffffffd15babbfa0 : 0xffffff801d824246
Kernel Extensions in backtrace:
com.apple.driver.watchdog(1.0)[01A90A91-CE41-37C4-A5C0-BBD735087472]#0xffffff8020d42000->0xffffff8020d44fff
Process name corresponding to current thread (0xffffff87770d6550): watchdogd
Boot args: chunklist-security-epoch=0 -chunklist-no-rev2-dev
Mac OS version:
21E258
Kernel version:
Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64
Kernel UUID: B6F8637B-0844-355F-8C82-60FA06149384
KernelCache slide: 0x000000001d600000
KernelCache base: 0xffffff801d800000
Kernel slide: 0x000000001d610000
Kernel text base: 0xffffff801d810000
__HIB text base: 0xffffff801d700000
System model name: iMac20,2 (Mac-AF89B6D9451A490B)
System shutdown begun: NO
Hibernation exit count: 0
System uptime in nanoseconds: 580860938653267
Last Sleep: absolute base_tsc base_nano
Uptime : 0x0002104a39e99311
Sleep : 0x00014c05bb0263eb 0x00000000b1a4f4fc 0x00014bf3f0948456
Wake : 0x00014c05c311c37a 0x00000000b156c318 0x00014c05bfe0324e
Compressor Info: 32% of compressed pages limit (OK) and 99% of segments limit (BAD) with 8 swapfiles and OK swap space
Zone info:
Foreign : 0xffffff80354ab000 - 0xffffff80354b9000
Native : 0xffffff81102b8000 - 0xffffffa1102b8000
Readonly: 0xffffff85dcf84000 - 0xffffff877691d000
Metadata: 0xffffffe9d1f0a000 - 0xffffffe9f2cb9000
Bitmaps : 0xffffffe9f2cb9000 - 0xffffffea12cb9000
last started kext at 356398810041: #filesystems.afpfs 11.3.1 (addr 0xffffff7fb6c29000, size 282624)
last stopped kext at 580792842657981: >!AThunderboltEDMSink 5.0.3 (addr 0xffffff7fb65c3000, size 32768)
loaded kexts:
#filesystems.afpfs 11.3.1
#nke.asp_tcp 8.2.1
>!AHIDALSService 1
>!APlatformEnabler 2.7.0d0
>AGPM 127
>X86PlatformShim 1.0.0
>!AUpstreamUserClient 3.6.9
...
Tensorflow-macos & Tensorflow-metal
% pip show tensorflow-macos
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-macos
Version: 2.8.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt
Required-by:
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow-metal
WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages)
Name: tensorflow-metal
Version: 0.4.0
Summary: TensorFlow acceleration for Mac GPUs.
Home-page: https://developer.apple.com/metal/tensorflow-plugin/
Author:
Author-email:
License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved.
Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages
Requires: six, wheel
Required-by:
Any suggestions on how to increase performance?
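Since essentially all of the profiled time is inside TFE_Py_Execute (the embedding call itself rather than Top2Vec's bookkeeping), one way to narrow this down is to benchmark the encoder in isolation and compare batch sizes. A minimal sketch, assuming tensorflow_hub is installed and reusing the local model path from the question (the sentences are placeholder data):
import time
import tensorflow_hub as hub

# SavedModel path taken from the question
embed = hub.load("/Users/davidlaxer/Downloads/universal-sentence-encoder_4")

sentences = ["some placeholder text to embed"] * 4096   # stand-in corpus

for batch_size in (32, 64, 128, 256):
    start = time.perf_counter()
    for i in range(0, len(sentences), batch_size):
        _ = embed(sentences[i:i + batch_size])
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.1f}s for {len(sentences)} sentences")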

count the number of occurrences of a number greater than 1 in a row

I have a table in the following format. There are >500 columns and 113 rows, where column 1 is the identifier. I want to keep only those identifiers for which >90% of the entry values are greater than 1, i.e. for A1, if >90% of the values are greater than 1, I want to print the total number of entries greater than 1 in the last column and retain the row. Any suggestions please.
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20
A2 14.44 23.73 55.27 68.77 14.18 0.05
A3 5.56 5.69 10.46 10.55 7.49 7.77
A4 1.06 3.62 1.68 1.38 1.90 6.64
A5 0.01 0.00 0.03 0.01 0.00 0.07
A6 0.07 0.72 27.68 19.70 2.33 0.00
A7 5.57 8.95 18.71 6.75 16.76 33.66
A8 0.86 2.30 1.65 0.92 2.01 0.92
A9 20.21 25.59 25.86 21.62 26.75 24.66
A10 28.05 28.26 22.48 27.41 32.28 26.94
A11 0.22 0.83 7.39 5.88 2.05 9.27
A12 13.90 19.43 28.51 25.48 21.44 29.24
A13 15.43 18.39 12.49 14.75 15.79 10.85
A14 3.92 13.00 14.13 8.18 13.92 23.83
A15 0.06 0.02 0.01 0.01 0.04 0.03
A16 0.99 2.46 6.08 4.56 3.81 3.43
A17 1.31 2.05 3.18 1.73 2.80 4.12
A18 3.60 7.90 8.57 5.56 7.18 12.20
A19 44.82 47.53 37.16 42.20 41.51 26.33
A20 1.59 2.88 2.55 3.05 3.08 2.88
I have very limited knowledge. I know how to count an exact match with awk '$0=$0OFS NF-1' FS=1.40, but not with a greater-than or less-than condition.
I primarily want the output in the following format, where the last column indicates the number of entries >1.
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
$ awk '{for(i=1;i<=NF;i++) {if($i+0>1) c++; printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS)}c=0}' file
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
{
    for(i=1;i<=NF;i++) {                              # for each field
        if($i+0>1) c++                                # if field > 1, count
        printf "%-5s%s", $i, (i==NF? OFS c ORS: OFS)  # output nicely
    }
    c=0                                               # reset counter
}
$ awk 'NR>1{$0=$0"\t"NF-gsub(/^.|[[:space:]]0\./,"&")} 1' file
Id M1 M2 M3 M4 M5 M6
A1 0.82 0.73 1.40 0.52 1.84 3.20 3
A2 14.44 23.73 55.27 68.77 14.18 0.05 5
A3 5.56 5.69 10.46 10.55 7.49 7.77 6
A4 1.06 3.62 1.68 1.38 1.90 6.64 6
A5 0.01 0.00 0.03 0.01 0.00 0.07 0
A6 0.07 0.72 27.68 19.70 2.33 0.00 3
A7 5.57 8.95 18.71 6.75 16.76 33.66 6
A8 0.86 2.30 1.65 0.92 2.01 0.92 3
A9 20.21 25.59 25.86 21.62 26.75 24.66 6
A10 28.05 28.26 22.48 27.41 32.28 26.94 6
A11 0.22 0.83 7.39 5.88 2.05 9.27 4
A12 13.90 19.43 28.51 25.48 21.44 29.24 6
A13 15.43 18.39 12.49 14.75 15.79 10.85 6
A14 3.92 13.00 14.13 8.18 13.92 23.83 6
A15 0.06 0.02 0.01 0.01 0.04 0.03 0
A16 0.99 2.46 6.08 4.56 3.81 3.43 5
A17 1.31 2.05 3.18 1.73 2.80 4.12 6
A18 3.60 7.90 8.57 5.56 7.18 12.20 6
A19 44.82 47.53 37.16 42.20 41.51 26.33 6
A20 1.59 2.88 2.55 3.05 3.08 2.88 6
The gsub() returns the number of times it matched its regexp, which is the first character in the line (^.) or any number starting with 0., so the match count covers every field on the line except the numbers that start with 1. or greater. Then just subtract the gsub() return value from the total number of fields NF to get the count of numbers greater than 1 on each line.
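Neither one-liner applies the >90% filter the question also asks for (keep an identifier only when more than 90% of its values exceed 1). For completeness, a minimal pandas sketch that adds the count column and applies that filter, assuming the table is whitespace-separated in a file named file:
import pandas as pd

df = pd.read_csv("file", sep=r"\s+")            # first column is the Id
values = df.drop(columns="Id")
df["N_gt1"] = (values > 1).sum(axis=1)          # number of entries > 1 per row
keep = (values > 1).mean(axis=1) > 0.9          # rows where >90% of entries exceed 1
print(df[keep].to_string(index=False))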

Multiple Regression, Minitab or pandas

I have some data which I want to run multiple regression on.
1. Is multiple regression the right analysis for this problem?
2. Can someone guide me on how to do this in pandas or Minitab using the data set below?
Here is a sample of the data which is for 100 random sales personnel.
The output metric is the amount of revenue per interaction each person has (this can be negative if a customer cancels a sale within 90 days).
The input metrics are the number of sales per unit type out of 100 interactions. Obviously, the more units sold per interaction (3 types of units) the more revenue would be earned per interaction. How can I account for the relationship between these 3 unit type metrics and my output metric? I'd want to be able to say if X1 is 0.75 and X2 is 1.0 and X3 is 0.25 then my Y will be a specific value.
Right now we are driving each metric individually without accounting for their interactions and dependencies, which seems inefficient for predicting potential performance.
Person Y X1 X2 X3
1 ($0.81) 0.43 0.54 0.00
2 $3.75 0.67 1.11 0.11
3 $1.76 0.23 0.70 0.00
4 $2.38 0.87 1.24 0.00
5 $5.06 0.62 1.11 0.37
6 $5.35 0.63 1.13 0.25
7 $2.94 0.64 0.76 0.00
8 $2.84 0.51 0.64 0.00
9 $0.35 0.00 0.90 0.00
10 $2.61 0.53 0.92 0.00
11 ($0.31) 0.40 0.27 0.13
12 $4.78 0.41 0.81 0.00
13 $2.76 0.54 1.09 0.00
14 $5.25 0.82 1.09 0.00
15 $2.23 0.14 0.82 0.14
16 $1.45 0.42 0.84 0.00
17 $3.14 0.28 0.99 0.00
18 $4.21 0.71 0.71 0.71
19 $1.33 0.57 0.57 0.00
20 $2.78 0.58 1.01 0.00
21 $1.71 0.29 1.15 0.00
22 $4.43 0.44 0.73 0.15
23 $4.74 0.73 1.17 0.00
24 $1.30 0.44 0.44 0.00
25 $2.68 0.59 0.74 0.15
26 $1.84 0.30 0.74 0.00
27 $3.88 0.74 1.33 0.00
28 $2.11 0.30 0.74 0.00
29 $4.50 0.30 0.60 0.00
30 $3.46 0.60 1.05 0.00
31 $4.07 0.30 1.20 0.00
32 $3.50 0.90 1.20 0.00
33 $1.21 0.30 0.45 0.00
34 $2.55 0.45 0.60 0.15
35 $4.06 0.76 1.06 0.00
36 $0.44 0.46 0.61 0.00
37 $2.00 0.76 0.46 0.00
38 $0.33 0.15 0.77 0.00
39 $2.24 0.61 0.92 0.00
40 $2.81 0.77 1.54 0.00
41 $1.12 0.00 0.31 0.00
42 $1.30 0.15 0.46 0.31
43 $3.05 0.31 1.69 -0.15
44 $3.59 0.62 0.92 0.00
45 $3.17 0.62 1.39 0.00
46 $0.99 0.31 0.00 0.00
47 $2.00 0.63 0.63 0.47
48 $3.90 0.78 1.10 0.00
49 ($0.26) 0.00 0.32 0.00
50 $5.81 0.48 0.95 0.00
51 $1.91 0.16 0.16 0.00
52 $0.55 0.00 0.48 0.00
53 $1.26 0.32 0.64 0.16
54 $2.63 0.80 0.96 0.00
55 $4.00 0.96 1.28 0.00
56 $6.55 0.96 1.59 0.00
57 $1.85 -0.16 0.32 0.32
58 $4.40 1.12 1.60 0.00
59 $0.78 0.32 0.16 0.16
60 $2.33 0.64 0.48 0.16
61 $4.33 0.32 0.97 0.00
62 $2.73 0.97 1.45 0.16
63 $0.89 0.16 0.32 0.00
64 $1.24 0.16 0.32 0.00
65 $2.38 0.33 0.33 0.00
66 $2.97 0.33 0.82 0.00
67 $4.17 0.33 0.82 0.82
68 $1.79 0.33 0.49 0.00
69 $4.14 0.49 0.82 0.00
70 ($0.02) 0.33 0.99 0.00
71 $4.54 0.33 0.83 0.00
72 $3.31 0.50 0.83 0.00
73 $4.71 0.50 1.17 0.00
74 $2.54 0.50 1.01 0.17
75 $2.82 0.34 0.68 0.00
76 $1.76 0.17 0.68 0.00
77 $0.42 0.17 0.34 0.00
78 $2.46 0.51 0.51 0.00
79 $2.75 0.34 0.34 0.00
80 $2.09 0.35 0.69 0.17
81 $3.11 0.52 1.04 0.00
82 $0.79 0.17 0.70 0.00
83 $3.55 0.70 0.87 0.00
84 $0.81 0.52 1.22 0.00
85 $2.50 0.53 0.70 -0.18
86 $4.38 0.35 1.23 0.00
87 $0.59 0.53 0.88 0.00
88 $0.75 0.00 0.35 0.00
89 $2.03 0.18 0.18 0.00
90 $2.33 0.18 0.18 0.00
91 $3.20 0.18 0.36 0.53
92 $0.01 0.00 0.36 0.00
93 $1.97 0.90 0.72 1.08
94 $2.26 0.54 1.44 0.00
95 $4.85 1.09 2.72 0.00
96 $1.05 0.18 0.91 0.00
97 $1.15 0.18 0.18 0.00
98 $3.10 1.09 1.28 0.00
99 $3.11 0.37 1.10 0.00
100 $0.33 -0.18 0.00 0.18
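As a sketch of one way to do this in Python (question 2): ordinary least squares with an intercept can be fitted with pandas plus statsmodels once the accounting-style Y values are converted to plain numbers. A minimal example, assuming the table above is saved as whitespace-separated text in a file named sales.txt (the filename is an assumption):
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sales.txt", sep=r"\s+")
# Convert "$3.75" / "($0.81)" to 3.75 / -0.81
df["Y"] = (df["Y"]
           .str.replace(r"[$,]", "", regex=True)
           .str.replace(r"\((.*)\)", r"-\1", regex=True)
           .astype(float))

X = sm.add_constant(df[["X1", "X2", "X3"]])     # intercept plus the three unit-type rates
model = sm.OLS(df["Y"], X).fit()
print(model.summary())                          # coefficients, R^2, p-values

# Predicted revenue per interaction for X1=0.75, X2=1.0, X3=0.25 (the example in the question)
print(model.predict([[1.0, 0.75, 1.0, 0.25]]))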

Improve the speed of loop performance

I am trying to build a sampler for my Markov chain Monte Carlo code using pyMC. With the sampled parameters of the model, each time the output is built by calling getLensing from the nfw class and compared to the observed data. My problem is that my code is very slow when it computes the model outputs. I have, for instance, 24000 data points, and for each of them I have a probability distribution (e.g. obj_pdf) which I marginalize over (integrate) in the inner loop, so each time it takes at least an hour to compute all the outputs of the model.
import numpy as np
z=np.arange(0,1.5,0.001)
z_h=0.15
for j in range(pos.shape[0]):
    value1=0;value2=0
    pdf=obj_pdf[j,:]/sum(obj_pdf[j,:])
    for i in range(len(z)):
        if (z[i]>z_h) :
            g1,g2=nfw.getLensing( pos[j,:], z[i])
            value1+=g1*pdf[i]
            value2+=g2*pdf[i]
    if (j<1):
        value=np.array([value1,value2])
    else:
        value=np.vstack((value, np.array([value1,value2])))
So if I want to re-sample the input parameters, for instance, 100000 times, it would take months to do the MCMC calculation. Is there any smart way to speed up my code and loops?
Do I need to use something like numpy.vectorize, or would that still not improve the speed of my code? How about Cython, would it increase the performance of the code? In case it helps, how does it work?
I ran python -m cProfile mycode.py to see what was causing my code to be slow, and these were the results:
12071 0.004 0.000 0.004 0.000 {min}
2 0.000 0.000 0.000 0.000 {next}
1 0.000 0.000 0.000 0.000 {numexpr.interpreter._set_num_threads}
8 0.002 0.000 0.002 0.000 {numpy.core.multiarray.arange}
132424695 312.210 0.000 312.210 0.000 {numpy.core.multiarray.array}
73498 3.933 0.000 3.933 0.000 {numpy.core.multiarray.concatenate}
99151506 201.497 0.000 201.497 0.000 {numpy.core.multiarray.copyto}
99151500 164.303 0.000 164.303 0.000 {numpy.core.multiarray.empty_like}
28 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
2 0.000 0.000 0.000 0.000 {numpy.core.multiarray.set_string_function}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.set_typeDict}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.where}
14 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros}
14 0.000 0.000 0.000 0.000 {numpy.core.umath.geterrobj}
7 0.000 0.000 0.000 0.000 {numpy.core.umath.seterrobj}
270 0.000 0.000 0.000 0.000 {numpy.lib._compiled_base.add_docstring}
6 0.011 0.002 0.011 0.002 {open}
1 0.000 0.000 0.000 0.000 {operator.div}
2 0.000 0.000 0.000 0.000 {operator.mul}
1918 0.000 0.000 0.000 0.000 {ord}
2 0.000 0.000 0.000 0.000 {posix.WEXITSTATUS}
2 0.000 0.000 0.000 0.000 {posix.WIFEXITED}
1 0.000 0.000 0.000 0.000 {posix.WIFSIGNALED}
9 0.002 0.000 0.002 0.000 {posix.access}
3 0.000 0.000 0.000 0.000 {posix.close}
5 0.002 0.000 0.002 0.000 {posix.fdopen}
1 0.002 0.002 0.002 0.002 {posix.fork}
4 0.000 0.000 0.000 0.000 {posix.getcwd}
6 0.000 0.000 0.000 0.000 {posix.getpid}
1 0.000 0.000 0.000 0.000 {posix.getuid}
1 0.000 0.000 0.000 0.000 {posix.listdir}
6 0.000 0.000 0.000 0.000 {posix.lstat}
4 0.043 0.011 0.043 0.011 {posix.open}
2 0.000 0.000 0.000 0.000 {posix.pipe}
2 0.004 0.002 0.004 0.002 {posix.popen}
1 0.007 0.007 0.007 0.007 {posix.read}
205 0.059 0.000 0.059 0.000 {posix.stat}
3 0.000 0.000 0.000 0.000 {posix.sysconf}
2 0.000 0.000 0.000 0.000 {posix.uname}
4 0.004 0.001 0.004 0.001 {posix.unlink}
3 0.000 0.000 0.000 0.000 {posix.urandom}
1 0.000 0.000 0.000 0.000 {posix.waitpid}
1 0.000 0.000 0.000 0.000 {pow}
1522 0.004 0.000 0.004 0.000 {range}
73 0.000 0.000 0.000 0.000 {repr}
99151501 2102.879 0.000 6380.906 0.000 {scipy.integrate._quadpack._qagse}
1776 0.002 0.000 0.002 0.000 {setattr}
32 0.000 0.000 0.000 0.000 {sorted}
24500 18.861 0.001 18.861 0.001 {sum}
184 0.000 0.000 0.000 0.000 {sys._getframe}
1 0.000 0.000 0.000 0.000 {sys.getfilesystemencoding}
2 0.000 0.000 0.000 0.000 {sys.settrace}
1 0.000 0.000 0.000 0.000 {tables.utilsextension._broken_hdf5_long_double}
1 0.000 0.000 0.000 0.000 {tables.utilsextension.blosc_compressor_list}
2 0.000 0.000 0.000 0.000 {tables.utilsextension.get_hdf5_version}
1 0.000 0.000 0.000 0.000 {tables.utilsextension.get_pytables_version}
2 0.000 0.000 0.000 0.000 {tables.utilsextension.which_lib_version}
27 0.000 0.000 0.000 0.000 {thread.allocate_lock}
6 0.000 0.000 0.000 0.000 {thread.get_ident}
4 0.000 0.000 0.000 0.000 {thread.start_new_thread}
1 0.000 0.000 0.000 0.000 {time.localtime}
2 0.000 0.000 0.000 0.000 {time.time}
105 0.000 0.000 0.000 0.000 {unichr}
229 0.000 0.000 0.000 0.000 {vars}
49300 2.127 0.000 2.127 0.000 {zip}
Here is some code. I'd be amazed if the timings went from 60 to 59 minutes though.
import numpy as np
z_h=0.15
z=np.arange(z_h, 1.5,0.001) #start the range from what you need (not exactly
z=z[1:] # needed because you said if (z[i]>z_h), range gives (z[i]>=z_h)
value=np.array([])
for j in range(pos.shape[0]):
    value1=0;value2=0
    pdf=obj_pdf[j,:]/sum(obj_pdf[j,:])
    posj=pos[j,:] #precalculate
    for i,zi in enumerate(z): #use enumerate if you need value and index
        g1,g2=nfw.getLensing( posj, zi)
        value1+=g1*pdf[i]
        value2+=g2*pdf[i]
    value=np.append(value, np.array([value1,value2])) # use a proper append function
Like the others, I assume getLensing is eating up your CPU cycles.
According to the first answer to this, np.vectorize will not speed up your function.
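One more detail about the accumulation itself: np.append (like the np.vstack in the original loop) reallocates and copies the whole result array on every iteration, and with the default axis=None it also flattens the (value1, value2) pairs into a 1-D array. Collecting the pairs in a Python list and converting once at the end avoids both problems, and keeping the full z grid with an index mask keeps pdf[i] aligned with the same positions as in the question. A minimal sketch, assuming pos, obj_pdf and nfw are defined as in the question:
import numpy as np

z_h = 0.15
z = np.arange(0, 1.5, 0.001)           # full grid, as in the question
use = np.flatnonzero(z > z_h)          # indices that actually contribute

results = []
for j in range(pos.shape[0]):
    pdf = obj_pdf[j, :] / obj_pdf[j, :].sum()
    posj = pos[j, :]
    value1 = value2 = 0.0
    for i in use:
        g1, g2 = nfw.getLensing(posj, z[i])
        value1 += g1 * pdf[i]
        value2 += g2 * pdf[i]
    results.append((value1, value2))

value = np.array(results)              # shape (pos.shape[0], 2), allocated once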

Extracting the second last line from a table using a specific number followed by an asterisk (e.g. xy.z*)

I'm looking to extract and print a specific line from a table I have in a long log file. It looks something like this:
******************************************************************************
XSCALE (VERSION July 4, 2012) 4-Jun-2013
******************************************************************************
Author: Wolfgang Kabsch
Copy licensed until 30-Jun-2013 to
academic users for non-commercial applications
No redistribution.
******************************************************************************
CONTROL CARDS
******************************************************************************
MAXIMUM_NUMBER_OF_PROCESSORS=16
RESOLUTION_SHELLS= 20 10 6 4 3 2.5 2.0 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8
MINIMUM_I/SIGMA=4.0
OUTPUT_FILE=fae-ip.ahkl
INPUT_FILE= /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
THE DATA COLLECTION STATISTICS REPORTED BELOW ASSUMES:
SPACE_GROUP_NUMBER= 97
UNIT_CELL_CONSTANTS= 128.28 128.28 181.47 90.000 90.000 90.000
***** 16 EQUIVALENT POSITIONS IN SPACE GROUP # 97 *****
If x',y',z' is an equivalent position to x,y,z, then
x'=x*ML(1)+y*ML( 2)+z*ML( 3)+ML( 4)/12.0
y'=x*ML(5)+y*ML( 6)+z*ML( 7)+ML( 8)/12.0
z'=x*ML(9)+y*ML(10)+z*ML(11)+ML(12)/12.0
# 1 2 3 4 5 6 7 8 9 10 11 12
1 1 0 0 0 0 1 0 0 0 0 1 0
2 -1 0 0 0 0 -1 0 0 0 0 1 0
3 -1 0 0 0 0 1 0 0 0 0 -1 0
4 1 0 0 0 0 -1 0 0 0 0 -1 0
5 0 1 0 0 1 0 0 0 0 0 -1 0
6 0 -1 0 0 -1 0 0 0 0 0 -1 0
7 0 -1 0 0 1 0 0 0 0 0 1 0
8 0 1 0 0 -1 0 0 0 0 0 1 0
9 1 0 0 6 0 1 0 6 0 0 1 6
10 -1 0 0 6 0 -1 0 6 0 0 1 6
11 -1 0 0 6 0 1 0 6 0 0 -1 6
12 1 0 0 6 0 -1 0 6 0 0 -1 6
13 0 1 0 6 1 0 0 6 0 0 -1 6
14 0 -1 0 6 -1 0 0 6 0 0 -1 6
15 0 -1 0 6 1 0 0 6 0 0 1 6
16 0 1 0 6 -1 0 0 6 0 0 1 6
ALL DATA SETS WILL BE SCALED TO /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
******************************************************************************
READING INPUT REFLECTION DATA FILES
******************************************************************************
DATA MEAN REFLECTIONS INPUT FILE NAME
SET# INTENSITY ACCEPTED REJECTED
1 0.1358E+03 1579957 0 /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER & RESOLUTION
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 720
DEGREES OF FREEDOM OF CHI^2 FIT 357222.9
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.024
NUMBER OF CYCLES CARRIED OUT 4
CORRECTION FACTORS for visual inspection by XDS-Viewer DECAY_001.cbf
XMIN= 0.6 XMAX= 1799.3 NXBIN= 36
YMIN= 0.00049 YMAX= 0.44483 NYBIN= 20
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF X (fast) & Y(slow) IN THE DETECTOR PLANE
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 7921
DEGREES OF FREEDOM OF CHI^2 FIT 356720.6
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.023
NUMBER OF CYCLES CARRIED OUT 3
CORRECTION FACTORS for visual inspection by XDS-Viewer MODPIX_001.cbf
XMIN= 5.4 XMAX= 2457.6 NXBIN= 89
YMIN= 40.0 YMAX= 2516.7 NYBIN= 89
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER & DETECTOR SURFACE POSITION
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 468
DEGREES OF FREEDOM OF CHI^2 FIT 357286.9
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.022
NUMBER OF CYCLES CARRIED OUT 3
CORRECTION FACTORS for visual inspection by XDS-Viewer ABSORP_001.cbf
XMIN= 0.6 XMAX= 1799.3 NXBIN= 36
DETECTOR_SURFACE_POSITION= 1232 1278
DETECTOR_SURFACE_POSITION= 1648 1699
DETECTOR_SURFACE_POSITION= 815 1699
DETECTOR_SURFACE_POSITION= 815 858
DETECTOR_SURFACE_POSITION= 1648 858
DETECTOR_SURFACE_POSITION= 2174 1673
DETECTOR_SURFACE_POSITION= 1622 2230
DETECTOR_SURFACE_POSITION= 841 2230
DETECTOR_SURFACE_POSITION= 289 1673
DETECTOR_SURFACE_POSITION= 289 884
DETECTOR_SURFACE_POSITION= 841 326
DETECTOR_SURFACE_POSITION= 1622 326
DETECTOR_SURFACE_POSITION= 2174 884
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION PARAMETERS FOR THE STANDARD ERROR OF REFLECTION INTENSITIES
******************************************************************************
The variance v0(I) of the intensity I obtained from counting statistics is
replaced by v(I)=a*(v0(I)+b*I^2). The model parameters a, b are chosen to
minimize the discrepancies between v(I) and the variance estimated from
sample statistics of symmetry related reflections. This model implicates
an asymptotic limit ISa=1/SQRT(a*b) for the highest I/Sigma(I) that the
experimental setup can produce (Diederichs (2010) Acta Cryst D66, 733-740).
Often the value of ISa is reduced from the initial value ISa0 due to systematic
errors showing up by comparison with other data sets in the scaling procedure.
(ISa=ISa0=-1 if v0 is unknown for a data set.)
a b ISa ISa0 INPUT DATA SET
1.086E+00 1.420E-03 25.46 29.00 /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
FACTOR TO PLACE ALL DATA SETS TO AN APPROXIMATE ABSOLUTE SCALE 0.4178E+04
(ASSUMING A PROTEIN WITH 50% SOLVENT)
******************************************************************************
STATISTICS OF SCALED OUTPUT DATA SET : fae-ip.ahkl
FILE TYPE: XDS_ASCII MERGE=FALSE FRIEDEL'S_LAW=TRUE
186 OUT OF 1579957 REFLECTIONS REJECTED
1579771 REFLECTIONS ON OUTPUT FILE
******************************************************************************
DEFINITIONS:
R-FACTOR
observed = (SUM(ABS(I(h,i)-I(h))))/(SUM(I(h,i)))
expected = expected R-FACTOR derived from Sigma(I)
COMPARED = number of reflections used for calculating R-FACTOR
I/SIGMA = mean of intensity/Sigma(I) of unique reflections
(after merging symmetry-related observations)
Sigma(I) = standard deviation of reflection intensity I
estimated from sample statistics
R-meas = redundancy independent R-factor (intensities)
Diederichs & Karplus (1997), Nature Struct. Biol. 4, 269-275.
CC(1/2) = percentage of correlation between intensities from
random half-datasets. Correlation significant at
the 0.1% level is marked by an asterisk.
Karplus & Diederichs (2012), Science 336, 1030-33
Anomal = percentage of correlation between random half-sets
Corr of anomalous intensity differences. Correlation
significant at the 0.1% level is marked.
SigAno = mean anomalous difference in units of its estimated
standard deviation (|F(+)-F(-)|/Sigma). F(+), F(-)
are structure factor estimates obtained from the
merged intensity observations in each parity class.
Nano = Number of unique reflections used to calculate
Anomal_Corr & SigAno. At least two observations
for each (+ and -) parity are required.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
20.00 557 66 74 89.2% 2.7% 3.0% 557 58.75 2.9% 100.0* 45 1.674 25
10.00 5018 417 417 100.0% 2.4% 3.1% 5018 75.34 2.6% 100.0* 2 0.812 276
6.00 18352 1583 1584 99.9% 2.8% 3.3% 18351 65.55 2.9% 100.0* 11* 0.914 1248
4.00 59691 4640 4640 100.0% 3.2% 3.5% 59690 64.96 3.4% 100.0* 4 0.857 3987
3.00 112106 8821 8822 100.0% 4.4% 4.4% 112102 50.31 4.6% 99.9* -3 0.844 7906
2.50 147954 11023 11023 100.0% 8.7% 8.6% 147954 29.91 9.1% 99.8* 0 0.829 10096
2.00 332952 24698 24698 100.0% 21.4% 21.6% 332949 14.32 22.3% 99.2* 1 0.804 22992
1.90 106645 8382 8384 100.0% 56.5% 57.1% 106645 5.63 58.8% 94.7* -2 0.767 7886
1.80 138516 10342 10343 100.0% 86.8% 87.0% 138516 3.64 90.2% 87.9* -2 0.762 9741
1.70 175117 12897 12899 100.0% 140.0% 140.1% 175116 2.15 145.4% 69.6* -2 0.732 12188
1.60 209398 16298 16304 100.0% 206.1% 208.5% 209397 1.35 214.6% 48.9* -2 0.693 15466
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
1.40 33 27 27248 0.1% 42.6% 112.7% 12 0.40 60.3% 88.2 0 0.000 0
1.30 0 0 36205 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.20 0 0 49238 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.10 0 0 68746 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.00 0 0 98884 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
0.90 0 0 147505 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
0.80 0 0 230396 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
total 1579771 119964 778303 15.4% 12.8% 13.1% 1579647 14.33 13.4% 99.9* -1 0.755 111306
========== STATISTICS OF INPUT DATA SET ==========
R-FACTORS FOR INTENSITIES OF DATA SET /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
RESOLUTION R-FACTOR R-FACTOR COMPARED
LIMIT observed expected
20.00 2.7% 3.0% 557
10.00 2.4% 3.1% 5018
6.00 2.8% 3.3% 18351
4.00 3.2% 3.5% 59690
3.00 4.4% 4.4% 112102
2.50 8.7% 8.6% 147954
2.00 21.4% 21.6% 332949
1.90 56.5% 57.1% 106645
1.80 86.8% 87.0% 138516
1.70 140.0% 140.1% 175116
1.60 206.1% 208.5% 209397
1.50 333.4% 342.1% 273340
1.40 42.6% 112.7% 12
1.30 -99.9% -99.9% 0
1.20 -99.9% -99.9% 0
1.10 -99.9% -99.9% 0
1.00 -99.9% -99.9% 0
0.90 -99.9% -99.9% 0
0.80 -99.9% -99.9% 0
total 12.8% 13.1% 1579647
******************************************************************************
WILSON STATISTICS OF SCALED DATA SET: fae-ip.ahkl
******************************************************************************
Data is divided into resolution shells and a straight line
A - 2*B*SS is fitted to log<I>, where
RES = mean resolution (Angstrom) in shell
SS = mean of (sin(THETA)/LAMBDA)**2 in shell
<I> = mean reflection intensity in shell
BO = (A - log<I>)/(2*SS)
# = number of reflections in resolution shell
WILSON LINE (using all data) : A= 14.997 B= 29.252 CORRELATION= 0.99
# RES SS <I> log(<I>) BO
1667 8.445 0.004 2.3084E+06 14.652 49.2
2798 5.260 0.009 1.5365E+06 14.245 41.6
3547 4.106 0.015 2.0110E+06 14.514 16.3
4147 3.480 0.021 1.2910E+06 14.071 22.4
4688 3.073 0.026 7.3586E+05 13.509 28.1
5154 2.781 0.032 4.6124E+05 13.042 30.3
5568 2.560 0.038 3.1507E+05 12.661 30.6
5966 2.384 0.044 2.4858E+05 12.424 29.2
6324 2.240 0.050 1.8968E+05 12.153 28.5
6707 2.119 0.056 1.3930E+05 11.844 28.3
7030 2.016 0.062 9.1378E+04 11.423 29.0
7331 1.926 0.067 5.4413E+04 10.904 30.4
7664 1.848 0.073 3.5484E+04 10.477 30.9
7934 1.778 0.079 2.4332E+04 10.100 31.0
8193 1.716 0.085 1.8373E+04 9.819 30.5
8466 1.660 0.091 1.4992E+04 9.615 29.7
8743 1.609 0.097 1.1894E+04 9.384 29.1
9037 1.562 0.102 9.4284E+03 9.151 28.5
9001 1.520 0.108 8.3217E+03 9.027 27.6
HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF CENTRIC DATA
AS COMPARED WITH THEORETICAL VALUES. (EXPECTED: 1.00)
# RES <I**2>/ <I**3>/ <I**4>/
3<I>**2 15<I>**3 105<I>**4
440 8.445 0.740 0.505 0.294
442 5.260 0.762 0.733 0.735
442 4.106 0.888 0.788 0.717
439 3.480 1.339 1.733 2.278
438 3.073 1.168 1.259 1.400
440 2.781 1.215 1.681 2.269
438 2.560 1.192 1.603 2.405
450 2.384 1.117 1.031 0.891
432 2.240 1.214 1.567 2.173
438 2.119 0.972 0.992 0.933
445 2.016 1.029 1.019 0.986
441 1.926 1.603 1.701 1.554
440 1.848 1.544 1.871 2.076
436 1.778 0.927 0.661 0.435
444 1.716 1.134 1.115 1.197
440 1.660 1.271 1.618 2.890
436 1.609 1.424 1.045 0.941
448 1.562 1.794 1.447 1.423
426 1.520 2.517 1.496 2.099
8355 overall 1.253 1.255 1.455
HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF ACENTRIC DATA
AS COMPARED WITH THEORETICAL VALUES. (EXPECTED: 1.00)
# RES <I**2>/ <I**3>/ <I**4>/
2<I>**2 6<I>**3 24<I>**4
1227 8.445 1.322 1.803 2.340
2356 5.260 1.167 1.420 1.789
3105 4.106 1.010 1.046 1.100
3708 3.480 1.055 1.262 1.592
4250 3.073 0.999 1.083 1.375
4714 2.781 1.061 1.232 1.591
5130 2.560 1.049 1.178 1.440
5516 2.384 1.025 1.117 1.290
5892 2.240 1.001 1.058 1.230
6269 2.119 1.060 1.140 1.233
6585 2.016 1.109 1.344 1.709
6890 1.926 1.028 1.100 1.222
7224 1.848 1.060 1.150 1.348
7498 1.778 1.143 1.309 1.655
7749 1.716 1.182 1.299 1.549
8026 1.660 1.286 1.376 1.538
8307 1.609 1.419 1.481 1.707
8589 1.562 1.663 1.750 2.119
8575 1.520 2.271 2.172 5.088
111610 overall 1.253 1.354 1.804
======= CUMULATIVE INTENSITY DISTRIBUTION =======
DEFINITIONS:
<I> = mean reflection intensity
Na(Z)exp = expected number of acentric reflections with I <= Z*<I>
Na(Z)obs = observed number of acentric reflections with I <= Z*<I>
Nc(Z)exp = expected number of centric reflections with I <= Z*<I>
Nc(Z)obs = observed number of centric reflections with I <= Z*<I>
Nc(Z)obs/Nc(Z)exp versus resolution and Z (0.1-1.0)
# RES 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
440 8.445 0.75 0.95 0.98 1.00 0.98 0.99 1.00 1.00 1.02 1.02
442 5.260 1.18 1.11 1.09 1.09 1.07 1.08 1.08 1.08 1.07 1.06
442 4.106 0.97 1.01 0.98 0.97 0.96 0.94 0.92 0.91 0.92 0.94
439 3.480 0.91 0.88 0.91 0.91 0.89 0.90 0.90 0.89 0.89 0.93
438 3.073 0.92 0.92 0.90 0.93 0.94 0.99 1.02 0.99 0.96 0.96
440 2.781 0.98 1.01 1.02 1.05 1.04 1.03 1.04 1.02 1.01 1.01
438 2.560 1.02 1.10 1.05 1.03 1.01 1.03 1.04 1.01 1.04 1.02
450 2.384 0.78 0.93 0.92 0.93 0.89 0.89 0.92 0.95 0.96 0.95
432 2.240 0.69 0.82 0.84 0.86 0.91 0.92 0.93 0.94 0.95 0.95
438 2.119 0.75 0.87 0.95 1.02 1.09 1.09 1.12 1.12 1.10 1.08
445 2.016 0.86 0.86 0.87 0.90 0.91 0.93 0.98 0.99 1.00 1.00
441 1.926 0.88 0.79 0.79 0.81 0.82 0.84 0.85 0.85 0.86 0.86
440 1.848 1.00 0.89 0.85 0.83 0.85 0.85 0.88 0.90 0.90 0.92
436 1.778 1.03 0.87 0.79 0.79 0.80 0.84 0.85 0.87 0.90 0.92
444 1.716 1.09 0.85 0.81 0.78 0.80 0.80 0.81 0.81 0.84 0.85
440 1.660 1.27 1.01 0.93 0.88 0.85 0.84 0.84 0.85 0.88 0.91
436 1.609 1.34 1.00 0.89 0.83 0.80 0.80 0.80 0.81 0.80 0.83
448 1.562 1.39 1.09 0.93 0.86 0.81 0.78 0.77 0.79 0.78 0.78
426 1.520 1.38 1.03 0.88 0.83 0.82 0.80 0.78 0.76 0.75 0.74
8355 overall 1.01 0.95 0.92 0.91 0.91 0.91 0.92 0.92 0.93 0.93
Na(Z)obs/Na(Z)exp versus resolution and Z (0.1-1.0)
# RES 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
1227 8.445 1.10 1.22 1.21 1.21 1.14 1.10 1.12 1.10 1.11 1.09
2356 5.260 1.15 1.10 1.09 1.03 1.03 1.03 1.01 1.01 1.01 1.00
3105 4.106 0.91 0.96 0.99 1.01 1.02 1.00 1.00 0.99 0.99 1.00
3708 3.480 0.93 0.97 1.00 1.06 1.05 1.04 1.04 1.04 1.04 1.05
4250 3.073 0.94 1.02 1.01 1.00 1.01 1.00 1.00 1.01 1.02 1.02
4714 2.781 1.11 1.04 1.02 1.02 1.02 1.01 1.01 1.01 1.00 1.00
5130 2.560 1.00 1.10 1.06 1.03 1.01 1.02 1.01 1.01 1.01 1.02
5516 2.384 1.09 1.08 1.05 1.04 1.04 1.02 1.01 1.01 1.01 1.01
5892 2.240 0.98 0.99 1.00 1.01 1.01 1.01 1.00 1.00 1.00 1.00
6269 2.119 1.14 1.04 1.02 1.00 1.00 1.00 1.01 1.02 1.02 1.01
6585 2.016 1.17 1.02 1.01 1.02 1.02 1.03 1.02 1.02 1.02 1.02
6890 1.926 1.35 1.07 1.00 0.99 1.00 1.01 1.01 1.00 1.00 1.01
7224 1.848 1.52 1.11 1.01 0.97 0.96 0.98 0.98 0.98 0.98 0.99
7498 1.778 1.80 1.22 1.03 0.97 0.95 0.94 0.95 0.95 0.95 0.96
7749 1.716 2.01 1.28 1.07 0.99 0.94 0.92 0.92 0.92 0.93 0.93
8026 1.660 2.31 1.41 1.13 1.01 0.95 0.92 0.90 0.89 0.89 0.89
8307 1.609 2.62 1.54 1.19 1.04 0.95 0.90 0.88 0.87 0.86 0.87
8589 1.562 2.94 1.69 1.29 1.10 1.00 0.93 0.89 0.86 0.85 0.85
8575 1.520 3.14 1.78 1.34 1.13 1.01 0.93 0.88 0.85 0.83 0.83
111610 overall 1.73 1.24 1.09 1.03 0.99 0.97 0.96 0.96 0.96 0.96
List of 33 reflections *NOT* obeying Wilson distribution (Z> 10.0)
h k l RES Z Intensity Sigma
72 11 61 1.52 17.34 0.2886E+06 0.2367E+05 "alien"
67 53 6 1.50 15.85 0.2638E+06 0.1128E+06 "alien"
35 10 25 3.17 14.39 0.2118E+08 0.2364E+06 "alien"
46 17 99 1.50 14.16 0.2357E+06 0.9588E+05 "alien"
34 32 2 2.75 13.44 0.1239E+08 0.1279E+06 "alien"
79 6 15 1.60 13.10 0.3117E+06 0.2477E+05 "alien"
61 20 33 1.88 12.54 0.8900E+06 0.3054E+05 "alien"
44 4 48 2.30 12.38 0.4695E+07 0.6072E+05 "alien"
66 25 19 1.79 11.89 0.5788E+06 0.2739E+05 "alien"
66 25 11 1.81 11.88 0.5781E+06 0.2771E+05 "alien"
60 43 61 1.50 11.77 0.1959E+06 0.9769E+05 "alien"
72 11 17 1.74 11.64 0.4278E+06 0.2619E+05 "alien"
80 24 26 1.50 11.41 0.1899E+06 0.9793E+05 "alien"
41 21 26 2.59 11.09 0.6988E+07 0.7945E+05 "alien"
44 18 20 2.59 11.08 0.6982E+07 0.7839E+05 "alien"
23 3 62 2.59 11.06 0.6971E+07 0.9154E+05 "alien"
69 7 22 1.80 11.06 0.5383E+06 0.2564E+05 "alien"
73 10 15 1.72 10.98 0.4036E+06 0.2356E+05 "alien"
70 17 35 1.68 10.96 0.3286E+06 0.2415E+05 "alien"
57 24 41 1.88 10.91 0.7746E+06 0.2842E+05 "alien"
82 24 6 1.50 10.74 0.1787E+06 0.1019E+06 "alien"
69 25 62 1.50 10.67 0.1775E+06 0.8689E+05 "alien"
24 20 44 2.91 10.45 0.9641E+07 0.1017E+06 "alien"
66 43 5 1.63 10.37 0.2468E+06 0.2294E+05 "alien"
81 4 29 1.53 10.36 0.1725E+06 0.2364E+05 "alien"
60 40 26 1.72 10.32 0.3792E+06 0.2578E+05 "alien"
39 18 57 2.18 10.24 0.3885E+07 0.5573E+05 "alien"
70 41 15 1.57 10.19 0.1922E+06 0.2281E+05 "alien"
55 36 41 1.79 10.16 0.4942E+06 0.2967E+05 "alien"
37 4 81 1.88 10.15 0.7202E+06 0.3357E+05 "alien"
56 27 5 2.06 10.14 0.1854E+07 0.3569E+05 "alien"
44 39 29 2.06 10.09 0.1844E+07 0.3805E+05 "alien"
65 46 29 1.56 10.06 0.1898E+06 0.2270E+05 "alien"
List of 33 reflections *NOT* obeying Wilson distribution (sorted by resolution)
Ice rings could occur at (Angstrom):
3.897,3.669,3.441, 2.671,2.249,2.072, 1.948,1.918,1.883,1.721
h k l RES Z Intensity Sigma
82 24 6 1.50 10.74 0.1787E+06 0.1019E+06
67 53 6 1.50 15.85 0.2638E+06 0.1128E+06
80 24 26 1.50 11.41 0.1899E+06 0.9793E+05
60 43 61 1.50 11.77 0.1959E+06 0.9769E+05
69 25 62 1.50 10.67 0.1775E+06 0.8689E+05
46 17 99 1.50 14.16 0.2357E+06 0.9588E+05
72 11 61 1.52 17.34 0.2886E+06 0.2367E+05
81 4 29 1.53 10.36 0.1725E+06 0.2364E+05
65 46 29 1.56 10.06 0.1898E+06 0.2270E+05
70 41 15 1.57 10.19 0.1922E+06 0.2281E+05
79 6 15 1.60 13.10 0.3117E+06 0.2477E+05
66 43 5 1.63 10.37 0.2468E+06 0.2294E+05
70 17 35 1.68 10.96 0.3286E+06 0.2415E+05
73 10 15 1.72 10.98 0.4036E+06 0.2356E+05
60 40 26 1.72 10.32 0.3792E+06 0.2578E+05
72 11 17 1.74 11.64 0.4278E+06 0.2619E+05
66 25 19 1.79 11.89 0.5788E+06 0.2739E+05
55 36 41 1.79 10.16 0.4942E+06 0.2967E+05
69 7 22 1.80 11.06 0.5383E+06 0.2564E+05
66 25 11 1.81 11.88 0.5781E+06 0.2771E+05
61 20 33 1.88 12.54 0.8900E+06 0.3054E+05
57 24 41 1.88 10.91 0.7746E+06 0.2842E+05
37 4 81 1.88 10.15 0.7202E+06 0.3357E+05
56 27 5 2.06 10.14 0.1854E+07 0.3569E+05
44 39 29 2.06 10.09 0.1844E+07 0.3805E+05
39 18 57 2.18 10.24 0.3885E+07 0.5573E+05
44 4 48 2.30 12.38 0.4695E+07 0.6072E+05
44 18 20 2.59 11.08 0.6982E+07 0.7839E+05
41 21 26 2.59 11.09 0.6988E+07 0.7945E+05
23 3 62 2.59 11.06 0.6971E+07 0.9154E+05
34 32 2 2.75 13.44 0.1239E+08 0.1279E+06
24 20 44 2.91 10.45 0.9641E+07 0.1017E+06
35 10 25 3.17 14.39 0.2118E+08 0.2364E+06
cpu time used by XSCALE 25.9 sec
elapsed wall-clock time 28.1 sec
I would like to extract the second last line where the 11th column has a number followed by an asterisk (xy.z*). E.g. in this table the line I'm looking for would contain "23.2*" from the 11th column (CC(1/2)). I would like the second last because the last would be the line that starts with total, and this was a lot easier to extract with a simple grep command.
So the expected output for the code in this case would be to print the line:
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
In a different file, the second last value in the 11th column with an asterisk after it may correspond to 1.60 in the first column, so the expected output would be:
1.60 216910 5769 5769 100.0% 207.5% 214.7% 216910 1.72 210.4% 26.0* -3 0.654 5204
And so on for all the different possible positions of the asterisk in the table.
I've tried using things like grep "[0-9, 0-9, ., 0-9*]" file.name and various other grep and fgrep things but I'm pretty new to this and can't get it to work.
Any help would be greatly appreciated.
Sam
GNU sed
(for your updated script)
sed -n '/LIMIT/,/=/{/^\s*\(\S*\s*\)\{10\}[0-9.-]*\*/H;x;s/^.*\n\(.*\n.*\)$/\1/;x;/=/{x;P;q}}' file
The output is:
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
To print the entire second last line which matches that regex, you can do something like this:
awk '$11~/[0-9.]+\*/{secondlast=last;last=$0}END{print secondlast}' logFile
This one liner can do it:
$ awk '{if ($11 ~ /\*/) {i++; a[i]=$0}} END {print a[i -1]}' file
1.50 274090 20781 20874 99.6% 333.7% 341.9% 274015 0.80 347.1% 24.8* 0 0.645 19516
Explanation
It adds to the array a[] all lines that contain * in the 11th field. Then it prints not the last but the previous one.
Update
Since your log is very big and asterisks appear all around, I updated my code to:
$ awk '{if ($11 == /[0-9]*.[0-9]*\*/) {i++; a[i]=$0}} END {print a[i -1]}' a
0.90 0 0 147505 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
so it looks for lines with NNN.XXX* format.
awk '$11~/^[0-9.]+\*$/ {prev=val; val=$11+0} END {print prev}' log
I add 0 to the value of $11 to convert the string "23.2*" to the number 23.2.
Alternately, when I hear "nth from the end", I think: reverse it and take the nth from the top:
tac log | awk '$11~/^[0-9.]+\*$/ && ++n == 2 {print $11+0; exit}'
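If a short Python script is acceptable instead of awk/sed, the same "collect the matches, take the second last" idea looks like this; a minimal sketch, assuming the log is in a file named log and that the whole line should be printed, as in the expected output above:
import re

matches = []
with open("log") as fh:
    for line in fh:
        fields = line.split()
        # 11th column must be a number followed by an asterisk, e.g. "23.2*"
        if len(fields) >= 11 and re.fullmatch(r"[0-9.]+\*", fields[10]):
            matches.append(line.rstrip("\n"))

if len(matches) >= 2:
    print(matches[-2])                 # second last matching line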