Extracting Multiple Tables On Different Pages From Multiple Page PDF With Camelot - pdf

My PDF contains 16 tables on 3 pages, which I want to output to an Excel file as a single worksheet using Camelot. I can extract each page individually with no problems, but I cannot figure out how to handle all 3 pages in one pass. My code is shown below:
# Read Obslog Page 1 to extract all the required tables
obstables = camelot.read_pdf(filepath,
    pages='1', \
    flavor='stream', \
    edge_tol=500, \
    strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
    table_areas=[' 15, 750, 575, 680', \
                 ' 15, 680, 575, 570', \
                 ' 15, 570, 575, 460', \
                 ' 15, 460, 575, 380', \
                 ' 15, 380, 575, 300', \
                 ' 15, 300, 575, 240', \
                 ' 15, 240, 575, 180', \
                 ' 15, 180, 575, 110'], \
    columns=['','','','','','','',''])
# Read Obslog Page 2 to extract all the required tables
obstables1 = camelot.read_pdf(filepath,
    pages='2', \
    flavor='stream', \
    edge_tol=500, \
    strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
    table_areas=[' 20, 820, 575, 750', \
                 ' 20, 730, 140, 655', \
                 ' 20, 635, 270, 560', \
                 ' 20, 540, 270, 470'], \
    columns=['','','',''])
# Read Obslog Page 3 to extract all the required tables
obstables2 = camelot.read_pdf(filepath,
    pages='3', \
    flavor='stream', \
    edge_tol=500, \
    strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
    table_areas=[' 15, 820, 575, 750', \
                 ' 15, 730, 575, 660', \
                 ' 15, 640, 575, 570', \
                 ' 15, 560, 150, 500', \
                 ' 15, 480, 575, 390',] \
    columns=['','','','',''])
When I try to execute the script, the first line of the page 2 'table_areas' definition gives me the following syntax error:
table_areas=[' 15, 820, 575, 750',
^^^^^^^^^^^^^^^^^^^^^^^^
I cannot see any syntax problem with this line.
I get the same error if I try to use the 'tables.append' option (as suggested by Anakin87 on 12/7/2021 in answer to a similar post), in this case replacing the Camelot calls for pages 2 and 3 with the following code:
obstables._tables.append(camelot.read_pdf(filepath,
    pages='2', \
    flavor='stream', \
    edge_tol=500, \
    strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
    table_areas=[' 20, 820, 575, 750', \
                 ' 20, 730, 140, 655', \
                 ' 20, 635, 270, 560', \
                 ' 20, 540, 270, 470'], \
    columns=['','','','']))
obstables._tables.append(camelot.read_pdf(filepath,
    pages='3', \
    flavor='stream', \
    edge_tol=500, \
    strip_text=' °, kn, m, µbar, mbar, in³, psi,\n', \
    table_areas=[' 15, 820, 575, 750', \
                 ' 15, 730, 575, 660', \
                 ' 15, 640, 575, 570', \
                 ' 15, 560, 150, 500', \
                 ' 15, 480, 575, 390',] \
    columns=['','','','','']))
Appending all the tables seems a good option, as the final output will be concatenated into a single dataframe before output to an Excel worksheet; however, at the moment I am stuck on the cause of the syntax error.

After going through all the code, the error turned out to be a simple rookie mistake! I was trying to find the syntax error on the first line of the table_areas definition; in fact I had left a comma in the last line of the definition before the ']'. I was slightly misled by the error message, which pointed to the first line of the table_areas definition rather than the last. Because I had copy/pasted the code, this was also why the 'tables.append' option failed. The offending line was
' 15, 480, 575, 390',] \
which should have read
' 15, 480, 575, 390'], \
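For completeness, once the bracket is fixed, the tables from all three reads can be combined into a single dataframe for the Excel output. A minimal pandas sketch; the small DataFrames below are hypothetical stand-ins for each Camelot table's .df attribute, and the output filename is an assumption:

```python
import pandas as pd

# Hypothetical stand-ins for the .df of each table in the three TableLists
page1_tables = [pd.DataFrame({'A': [1, 2]}), pd.DataFrame({'A': [3]})]
page2_tables = [pd.DataFrame({'A': [4]})]
page3_tables = [pd.DataFrame({'A': [5, 6]})]

# With real Camelot results this list would be built as:
#   frames = [t.df for t in obstables] + [t.df for t in obstables1] + [t.df for t in obstables2]
frames = page1_tables + page2_tables + page3_tables

# Concatenate all tables into one DataFrame, renumbering the rows
combined = pd.concat(frames, ignore_index=True)

# Write everything to a single Excel worksheet (needs openpyxl installed)
# combined.to_excel('obslog.xlsx', sheet_name='Obslog', index=False)
```
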

Related

vhost in rabbitmq is not starting

My RabbitMQ was working fine. Suddenly, one of the vhosts stopped starting up. On restarting the RabbitMQ server, the admin UI shows an error starting up the vhost, and all of its queues are in the down state. Here is the error when I restart the vhost. Please suggest a fix; our production message broker is down and we need immediate help.
Rabbitmq ver. - 3.8.3
Erlang ver. - 22.3
Trying to restart vhost 'r_t' on node 'rabbit#myserver' ...
Error:
Failed to start vhost 'r_t' on node 'rabbit#myserver'Reason: {:shutdown, {:failed_to_start_child,
:rabbit_vhost_process, {:error, {{{:badarg, [{:erlang, :binary_to_term, [<<131, 104, 6, 100, 0, 13, 98,
97, 115, 105, 99, 95, 109, 101, 115, 115, 97, 103, 101, 104, 4, 100, 0, 8, 114, 101, 115, 111, 117, 114,
99, 101, 109, 0, ...>>], []}, {:rabbit_queue_index, :parse_pub_record_body, 2, [file:
'src/rabbit_queue_index.erl', line: 783]}, {:rabbit_queue_index, :"-segment_entries_foldr/3-fun-0-", 4,
[file: 'src/rabbit_queue_index.erl', line: 1111]}, {:array, :sparse_foldr_3, 6, [file: 'array.erl', line:
1847]}, {:array, :sparse_foldr_2, 8, [file: 'array.erl', line: 1836]}, {:rabbit_queue_index,
:scan_queue_segments, 3, [file: 'src/rabbit_queue_index.erl', line: 741]}, {:rabbit_queue_index,
:queue_index_walker_reader, 2, [file: 'src/rabbit_queue_index.erl', line: 728]}, {:rabbit_queue_index,
:"-queue_index_walker/1-fun-1-", 2, [file: 'src/rabbit_queue_index.erl', line: 710]}]}, {:gen_server2,
:call, [#PID<10691.1882.0>, :out, :infinity]}}, {:child, :undefined, :msg_store_persistent,
{:rabbit_msg_store, :start_link, [:msg_store_persistent,
'/var/lib/rabbitmq/mnesia/rabbit#1myserver/msg_stores/vhosts/1SLGRHB3T7STV1U1TEB4MR6QS', [],
{#Function<2.23124100/1 in :rabbit_queue_index>, {:start, [{:resource, "r_t", :queue,
"product.import_royn_se"}, {:resource, "r_t", :queue, "customer.import_ronin_es"}, {:resource, "r_t",
...}, {:resource, ...}, {...}, ...]}}]}, :transient, 30000, :worker, [:rabbit_msg_store]}}}}}
I got a workaround.
We exported and saved the vhost definitions for the existing vhost that was not starting, then deleted this vhost, created the same vhost again, and imported the definitions back. This way we got all the queues back with the same features.
To dump all messages:
from os import listdir
from os.path import isfile, join
import re

base_path = '/docker/rabbitmq-nfe/data/mnesia/rabbit#rabbitmq_nfe/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues'
list_queues = listdir(base_path)
# GET ALL QUEUES
for queue in list_queues:
    dir_queue = '%s/%s' % (base_path, queue)
    list_files = [f for f in listdir(dir_queue) if isfile(join(dir_queue, f)) and f not in ['.queue_name', 'journal.jif']]
    list_files.sort()
    name_queue = open('%s/.queue_name' % dir_queue, 'r').read().split('QUEUE: ')[-1][:-1]
    payload_queue = []
    # GET ALL FILES QUEUES
    for file in list_files:
        path_file = '%s/%s' % (dir_queue, file)
        binary_file = open(path_file, 'rb')  # open in binary mode so the raw bytes can be decoded
        string_file = binary_file.read()
        string_file_decoded = string_file.decode('iso 8859-1')
        # GET ALL PAYLOAD FILES QUEUES
        list_payload_queue = ['{%s}' % f for f in re.split(r'\{(.*?)\}', string_file_decoded) if 'chave' in f and 'rede' in f]
        for idx, payload in enumerate(list_payload_queue):
            if payload.count('{') > 1:
                list_payload_queue[idx] = ['{%s' % f for f in payload.split('{') if 'chave' in f and 'rede' in f][0]
        payload_queue = payload_queue + list_payload_queue
    # SAVE BACKUP QUEUES
    print('FILA: %s ARQUIVOS: %s' % (name_queue, str(len(payload_queue))))
    with open('/tmp/%s.log' % name_queue, 'w') as f:
        for line in payload_queue:
            f.write("%s\n" % line)

Export inference graph - ValueError: The passed save_path is not a valid checkpoint

Thanks for any help you can give.
Here is my code to setup variables and paths:
# by Chengwei
import os
import re
import numpy as np

# dir where the model will be saved
output_directory = './fine_tuned_model'

lst = os.listdir('training')
lst = [l for l in lst if 'model.ckpt-' in l and '.meta' in l]
steps = np.array([int(re.findall(r'\d+', l)[0]) for l in lst])
last_model = lst[steps.argmax()].replace('.meta', '')
last_model_path = os.path.join('/training/', last_model)
print(last_model_path)
And here is my code for exporting the inference graph:
!python /content/drive/'My Drive'/object_detection/models/research/object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path /content/drive/'My Drive'/object_detection/models/research/object_detection/samples/configs/export_graph_ssd_mobilenet_v2_coco.config \
--output_directory output_directory \
--inference_graph_path output_inference_graph \
--trained_checkpoint_prefix last_model_path
I get the following error:
Traceback (most recent call last):
File "/content/drive/My Drive/object_detection/models/research/object_detection/export_inference_graph.py", line 83, in <module>
tf.app.run()
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/content/drive/My Drive/object_detection/models/research/object_detection/export_inference_graph.py", line 79, in main
FLAGS.inference_graph_path)
File "/content/drive/My Drive/object_detection/models/research/object_detection/exporter.py", line 625, in export_inference_graph
side_input_types=side_input_types)
File "/content/drive/My Drive/object_detection/models/research/object_detection/exporter.py", line 538, in _export_inference_graph
trained_checkpoint_prefix=checkpoint_to_use)
File "/content/drive/My Drive/object_detection/models/research/object_detection/exporter.py", line 423, in write_graph_and_checkpoint
saver.restore(sess, trained_checkpoint_prefix)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1282, in restore
checkpoint_prefix)
ValueError: The passed save_path is not a valid checkpoint:
I've tried playing around with the paths to make sure there were no errors there. I've looked at similar threads and followed the suggestions there, but in all the other threads the ValueError points to a specific path/file, whereas this one doesn't.
Please help if you can.
Sorry for the late response, but I hope I can still help you.
I had the same problem. The reason you get this error is that when you cancelled training your neural network, not all data had been written to the file, so the checkpoint is inconsistent. You can easily solve this by using a model with a lower number. Example: my highest number is 640. The second highest number is 417. model.ckpt-640 is inconsistent, therefore I will export the graph using model.ckpt-417.
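Picking the second-highest checkpoint can be sketched as a small change to the step-selection code from the question. The file list below is a hypothetical stand-in for the filtered result of os.listdir('training'):

```python
import re

# Hypothetical .meta file list, standing in for os.listdir('training')
# filtered to 'model.ckpt-*.meta' entries
lst = ['model.ckpt-128.meta', 'model.ckpt-640.meta', 'model.ckpt-417.meta']

# Sort the checkpoint step numbers and skip the highest one,
# which may be inconsistent if training was interrupted
steps = sorted(int(re.findall(r'\d+', l)[0]) for l in lst)
second_best = steps[-2]
checkpoint_prefix = 'model.ckpt-%d' % second_best
```

The resulting prefix would then be passed as --trained_checkpoint_prefix instead of the latest model.
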

Errors occurred while converting 32-bit float TensorFlow model into 8-bit fixed TensorFlow model

I am following the procedure listed in the GitHub Quantization-aware training guide, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize.
To quantize my own TF model, landing_retrained_graph.pb, I fed it through the quantization instructions:
freeze_graph \
--input_graph=landing_retrained_graph.pb \
--input_checkpoint=checkpoint \
--output_graph=landing_frozen_eval_graph.pb --output_node_names=outputs
Then the error below pops up.
(ztdl) Jisoos-MacBook-Pro:tf_files jisooyu$ freeze_graph \
> --input_graph=landing_retrained_graph.pb \
> --input_checkpoint=checkpoint \
> --output_graph=landing_frozen_eval_graph.pb --output_node_names=outputs
Traceback (most recent call last):
File "/Users/jisooyu/anaconda3/envs/ztdl/bin/freeze_graph", line 11, in <module>
sys.exit(run_main())
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py", line 487, in run_main
app.run(main=my_main, argv=[sys.argv[0]] + unparsed)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py", line 486, in <lambda>
my_main = lambda unused_args: main(unused_args, flags)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py", line 378, in main
flags.saved_model_tags, checkpoint_version)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py", line 338, in freeze_graph
input_graph_def = _parse_input_graph_proto(input_graph, input_binary)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py", line 253, in _parse_input_graph_proto
text_format.Merge(f.read(), input_graph_def)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 128, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length))
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 98, in _prepare_value
return compat.as_str_any(val)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/util/compat.py", line 117, in as_str_any
return as_str(value)
File "/Users/jisooyu/anaconda3/envs/ztdl/lib/python3.7/site-packages/tensorflow/python/util/compat.py", line 87, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 55: invalid continuation byte
Any suggestions to fix the error are appreciated.

Extract lines corresponding to the minimum value in the last column

I need help extracting all the lines from the file that have the minimum number in the last column, i.e. 7 in this case.
The sample file is as below:
File-1.txt
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Here, I want to extract all the lines that have 7, the least (minimum) value in the last column, and save the output into another file, File-2.txt, extracting only the values enclosed in [], as shown below.
File-2.txt
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
I could use awk to get the least value, 7, from the last column using the code below:
awk 'BEGIN{getline;min=max=$NF}
NF{
max=(max>$NF)?max:$NF
min=(min>$NF)?$NF:min
}
END{print min,max}' File-1.txt
and to print only the values in square brackets [] by using the awk code below:
awk 'NR > 1 {print $1}' RS='[' FS=']' File-1.txt
but I am stuck on using the least value obtained from the first awk script (7 in this case) to extract the corresponding numbers enclosed in [], as shown in File-2.txt.
Any help in resolving this problem will be appreciated.
@Asha: try:
awk '{Q=$NF;gsub(/.*\[|\]/,"");$NF="";A[Q]=A[Q]?A[Q] ORS $0:$0;MIN=MIN<Q?(MIN?MIN:Q):Q} END{print A[MIN]}' Input_file
Will add a description shortly too.
EDIT: Following is the description:
awk '{
Q=$NF;                     ##### Save the last field of the line in variable Q.
gsub(/.*\[|\]/,"");        ##### Use awk's global substitution to remove everything up to [ and the closing ], as per the required output.
$NF="";                    ##### Nullify the last column of each line, as it is not needed in the output.
A[Q]=A[Q]?A[Q] ORS $0:$0;  ##### Append the line to array A, indexed by Q (the last-column value saved earlier).
MIN=MIN<Q?(MIN?MIN:Q):Q}   ##### Track the minimum last-column value seen so far in variable MIN.
END{print A[MIN]}          ##### In the END block, print the lines stored in A under index MIN.
' Input_file               ##### Mention the Input_file here.
Reading the same file twice, instead of using an array, is in practice a bit slower since the file is read 2 times, but it has zero memory overhead.
awk -F'[][]' 'FNR==NR{if(min > $NF || min==""){ min=$NF} next }
$NF==min{ print $2 }' file file
Explanation
awk -F'[][]' 'FNR==NR{      # In this block we read the file
                            # and find the minimum
    if(min > $NF || min==""){
        min=$NF             # NF gives the number of fields; assign the value of $NF to variable min
    }
    next
}
$NF==min{                   # Here we read the file a 2nd time; if the last field equals the minimum
    print $2                # print the bracketed part
}' file file
Input
$ cat file
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Output
$ awk -F'[][]' 'FNR==NR{ if(min > $NF || min==""){ min=$NF } next }
$NF==min{ print $2 }' file file
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
Using sort as a helper to get neat code:
$ sort -t\] -nk 2 your_file |awk '$NF!=L && L{exit}{L=$NF;print $2}' FS='[][]'
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
Reading once (e.g. for streaming/piped input) with minimum memory use:
awk -F'[][]' '
    # initialise the running minimum just above the first value
    NR == 1 { m = $3 + 1 }
    # append to the buffer if equal to the minimum, or start a new buffer if lower
    $3 <= m { b = ( $3 == m ? b "\n" : "" ) $2; m = $3 }
    # at the end, print the buffer
    END { print b }
' YourFile
$ awk -F'[][]' -vmin=99999 '$NF<=min{min=$NF;print $2}'
-F'[][]' sets FS to the regexp [][], which means "either [ or ]", i.e. the input string will be split into 3 fields.
-vmin=99999 sets the variable min to 99999; the minimum value of the last field will be stored in this variable.
$NF <= min {min = $NF; print $2}: if the current last field is less than or equal to the value stored in min,
then update min and output what we need. (Note this relies on the input being sorted by the last column, as in the sample.)
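For comparison, the same minimum-filtering extraction can be sketched in Python; the sample lines below are stand-ins for reading the real File-1.txt:

```python
import re

# Sample lines in the File-1.txt format (stand-ins for reading the real file)
lines = [
    'VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7',
    'VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7',
    'VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8',
]

# Parse each line into (bracket contents, trailing count)
parsed = []
for line in lines:
    m = re.search(r'\[(.*)\]\s*(\d+)$', line)
    if m:
        parsed.append((m.group(1), int(m.group(2))))

# Keep only the rows whose last column equals the minimum
least = min(count for _, count in parsed)
result = [path for path, count in parsed if count == least]
```

The contents of result would then be written out line by line to File-2.txt.
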

matplotlib error: ValueError: x and y must have same first dimension

I am trying to graph two lists with matplotlib but I am getting an error regarding the dimension of x and y. One of the lists contains dates and the other numbers, you can see the content of the lists, I have printed them below.
I have tried checking the length of the lists with len() and they seem to be equal, so I am a bit lost. I have checked several threads on this error without much luck.
Note: "query" contains my SQL query which I have not included for simplicity.
##### My code
t = 0
for row in query:
    data = query[t]
    date.append(data[0])
    close.append(data[1])
    t = t + 1

print "date = ", date
print "close = ", close
print "date length = ", len(date)
print "close length = ", len(close)

def plot2():
    plt.plot(date, close)
    plt.show()

plot2()
#####
Output of my script:
date = [datetime.datetime(2010, 1, 31, 22, 0), datetime.datetime(2010, 1, 31, 22, 1), datetime.datetime(2010, 1, 31, 22, 2), datetime.datetime(2010, 1, 31, 22, 3), datetime.datetime(2010, 1, 31, 22, 4), datetime.datetime(2010, 1, 31, 22, 5), datetime.datetime(2010, 1, 31, 22, 6), datetime.datetime(2010, 1, 31, 22, 7), datetime.datetime(2010, 1, 31, 22, 8), datetime.datetime(2010, 1, 31, 22, 9), datetime.datetime(2010, 1, 31, 22, 10)]
close = [1.5945, 1.5946, 1.59465, 1.59505, 1.59525, 1.59425, 1.5938, 1.59425, 1.59425, 1.5939, 1.5939]
date length = 11
close length = 11
Traceback (most recent call last):
File "script.py", line 234, in <module>
plot2()
File "script.py", line 231, in plot2
plt.plot(date, close)
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2467, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 3893, in plot
for line in self._get_lines(*args, **kwargs):
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 322, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 300, in _plot_args
x, y = self._xy_from_xy(x, y)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 240, in _xy_from_xy
raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension
Thanks in advance.
Works for me with your data.
Change your code and put the print statements inside the function.
def plot2():
    print "date = ", date
    print "close = ", close
    print "date length = ", len(date)
    print "close length = ", len(close)
    plt.plot(date, close)
    plt.show()
There must be something happening in your code that you do not show.
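To illustrate the kind of hidden problem meant here, a minimal sketch (with hypothetical stand-in lists; matplotlib itself is omitted): if either list is mutated between the print statements and the plot call, the first dimensions no longer match.

```python
# Hypothetical stand-ins for the date and close lists in the question
date = [1, 2, 3]
close = [0.5, 0.6, 0.7]

# At this point both lists report the same length, matching the printed output
lengths_match_before = len(date) == len(close)

# If any code mutates one of the lists between the print statements and the
# call to plot2(), the lengths diverge, and plt.plot(date, close) would raise
# "ValueError: x and y must have same first dimension"
close.append(0.8)
lengths_match_after = len(date) == len(close)
```
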