I am using Jython and wish to import a text file that contains many configuration values such as:
QManager = MYQM
ProdDBName = MYDATABASE
etc.
...and then I read the file line by line.
What I can't figure out is this: as I read each line, I assign whatever is before the = sign to a local loop variable named MYVAR and whatever is after the = sign to a local loop variable MYVAL. How do I ensure that, once the loop finishes, I have a set of global variables such as QManager, ProdDBName, etc.?
I've been working on this for days; I really hope someone can help.
Many thanks,
Bret.
See other question: Properties file in python (similar to Java Properties)
In my opinion, automatically setting global variables is not a good idea. I would prefer a global ConfigParser object or a dictionary. If your config file is similar to a Windows .ini file, you can read it and set some global variables with something like:
def read_conf():
    global QManager
    import ConfigParser
    conf = ConfigParser.ConfigParser()
    conf.read('my.conf')
    QManager = conf.get('QM', 'QManager')
    print('Conf option QManager: [%s]' % (QManager))
(This assumes you have a [QM] section in your my.conf config file.)
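To pick up every option of the [QM] section at once, rather than declaring one global per option, ConfigParser can hand the section back as a dictionary. This is a minimal sketch assuming Python 3's configparser (Jython 2.x has the same class in the ConfigParser module, but no read_string); the sample config text and the read_section helper are hypothetical. ConfigParser lowercases option names by default, so optionxform is overridden to keep QManager's capitalization.

```python
import configparser

# Hypothetical config contents; with a real file you would call conf.read('my.conf').
SAMPLE_CONF = """
[QM]
QManager = MYQM
ProdDBName = MYDATABASE
"""

def read_section(text, section):
    """Return all options of one section as a plain dict."""
    conf = configparser.ConfigParser()
    # By default option names are lowercased; str keeps them exactly as written.
    conf.optionxform = str
    conf.read_string(text)
    return dict(conf.items(section))

CONFIG = read_section(SAMPLE_CONF, "QM")
print(CONFIG["QManager"])  # MYQM
```

A single module-level CONFIG dictionary like this gives the "global" access the question asks for without creating one variable per option.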
If you want to parse the config file without the help of ConfigParser or a similar module, try:
my_options = {}
f = open('my.conf')
for line in f:
    if '=' in line:
        k, v = line.split('=', 1)
        k = k.strip()
        v = v.strip()
        print('debug [%s]:[%s]' % (k, v))
        my_options[k] = v
f.close()
print('-' * 20)
# this will show the value just read
print('Option QManager: [%s]' % (my_options['QManager']))
# this will fail with a KeyError exception:
# you must be aware of non-existing values or values
# where the case differs
print('Option qmanager: [%s]' % (my_options['qmanager']))
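The same loop can be wrapped in a function that accepts any iterable of lines (an open file object works too), with dict.get supplying a default instead of raising KeyError for missing or differently-cased keys. A minimal sketch; the parse_options name and the skipping of lines starting with # or ; are assumptions, not part of the answer above.

```python
def parse_options(lines):
    """Parse 'key = value' lines into a dict; other lines are skipped."""
    options = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comment lines (assumed '#' or ';'), and lines without '='.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options

sample = ["QManager = MYQM\n", "ProdDBName = MYDATABASE\n", "# a comment\n"]
my_options = parse_options(sample)

print(my_options.get("QManager"))               # MYQM
print(my_options.get("qmanager", "<not set>"))  # <not set>
```

With a real file this becomes `with open('my.conf') as f: my_options = parse_options(f)`. If you really want module-level globals as in the question, `globals().update(my_options)` at module level would create them, although the dictionary itself is usually easier to work with.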
I'm trying to move over 200 PDF files, each to a separate folder; the folders are already created and each leads to a folder named 2018. The destination path for each is like GFG-0777>>2018. Each PDF has a unique GFG-0### name that matches the folders I already created that lead to the 2018 destination folders. I'm not sure how to iterate and get each PDF into the right folder... :/
I've tried shutil.move, which I think is best, but I think I have issues with the paths.
import os
import shutil
srcDir = r'C:\Complete'
#print (srcDir)
dstDir = r'C:\Python27\end_dir'
dirList = os.listdir(srcDir)
for f in dirList:
    fp = [f for f in dirList if ".pdf" in f] #list comprehension to iterate task (flat for loop)
    for file in fp:
        dst = (srcDir+"/"+file[:-4]+"/"+dstDir+"/"+"2018")
        shutil.move(os.path.join(srcDir, dst, dstDir))
error: shutil.move(os.path.join(srcDir, dst, dstDir))
TypeError: move() missing 1 required positional argument: 'dst'
AFAICT you are calling
shutil.move(os.path.join(srcDir, dst, dstDir))
without a destination.
According to the documentation, shutil.move(src, dst) needs both a source and a destination:
https://docs.python.org/3/library/shutil.html#shutil.move
I guess your idea was to somehow create one string containing both the dst and the src:
dst = (srcDir+"/"+file[:-4]+"/"+dstDir+"/"+"2018")
What you actually want is something along this line:
dst_dir = dstDir+"/"+"2018"
src_dir = srcDir+"/"+file[:-4]
shutil.move(src_dir,dst_dir)
The above code is just for demonstration.
If this does not work, you could post the output of tree or ls -la for a small part of your srcDir and dstDir, and we could work something out.
I managed to work it out. Thanks for pointing out the problem with building one string from src and dst. After removing that string, I tweaked a bit more and found I had too many "file" variables in the code. I had to rename two of them to "file1" and add a comma in shutil.move between src and dst.
Thanks again
import os
import shutil
srcDir = r'C:\Complete'
#print (srcDir)
dstDir = r'C:\Python27\end_dir'
dirList = os.listdir(srcDir)
# list comprehension to iterate task (flat for loop); built once, outside any loop
fp = [f for f in dirList if ".pdf" in f]
for file in fp:
    if ' ' in file:  # removing space in some of the pdf names, noticed during fp print
        file1 = file.split(' ')[0]  # removing space, continued
    else:
        file1 = file[:-4]  # removing .pdf
    final = dstDir + "\\" + file1 + "\\2018"
    print(srcDir + "\\" + file1 + " " + final)
    shutil.move(srcDir + "\\" + file, final)
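The working loop above can also be written with os.path helpers instead of hand-built "\\" strings, which keeps it portable and avoids the [:-4] slicing. A minimal sketch, assuming Python 3 (for exist_ok); the move_pdfs name and the year parameter are hypothetical, and creating a missing 2018 folder is an added assumption (the question says those folders already exist, in which case makedirs does nothing).

```python
import os
import shutil

def move_pdfs(src_dir, dst_root, year="2018"):
    """Move each PDF in src_dir into dst_root/<name>/<year>/ and return the new paths."""
    moved = []
    for name in os.listdir(src_dir):
        if not name.lower().endswith(".pdf"):
            continue  # leave non-PDF files where they are
        stem = os.path.splitext(name)[0]
        # As in the code above, keep only the part before the first space.
        folder = stem.split(" ")[0]
        target_dir = os.path.join(dst_root, folder, year)
        os.makedirs(target_dir, exist_ok=True)  # assumption: create the folder if missing
        target = os.path.join(target_dir, name)
        shutil.move(os.path.join(src_dir, name), target)
        moved.append(target)
    return moved
```

For the paths in the question this would be called as `move_pdfs(r'C:\Complete', r'C:\Python27\end_dir')`.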
I use a config file (.ini) to store my SQL queries, and then I get a query by its key. Everything works fine until I create a query with parameters, for example:
;the ini file
product_by_cat = select * from products where cat =%s
I use:
config = configparser.ConfigParser()
args = ('cat1')
config.read(path_to_ini_file)
query = config.get(section_where_are_stored_thequeries, key_of_the_query)
complete_query = query % args
I get the error:
TypeError: not all arguments converted during string formatting
So it seems to try to format the string when retrieving the value from the ini file.
Any suggestions for my problem?
You can use the str.format method like this.
ini file:
product_by_cat = select * from products where cat ={}
Python:
complete_query = query.format(args)
Depending on the version of ConfigParser (Python 2 or Python 3), you may need to double the % like this, or it throws an error:
product_by_cat = select * from products where cat =%%s
A better way, though, is to use the raw version of the config parser, so the % character isn't interpreted:
config = configparser.RawConfigParser()
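Both fixes can be checked side by side: RawConfigParser returns %s untouched, while the default ConfigParser turns %% in the file back into a single %. A minimal sketch assuming Python 3's configparser; the [queries] section name, the ini text and the load_query helper are hypothetical stand-ins for the real file.

```python
import configparser

# Hypothetical ini contents with a printf-style placeholder in the value.
INI_TEXT = """
[queries]
product_by_cat = select * from products where cat =%s
"""

def load_query(ini_text, section, key):
    """Read a query without interpolation, so '%s' comes back untouched."""
    config = configparser.RawConfigParser()
    config.read_string(ini_text)
    return config.get(section, key)

query = load_query(INI_TEXT, "queries", "product_by_cat")
complete_query = query % ("cat1",)  # a one-element tuple of parameters
print(complete_query)  # select * from products where cat =cat1

# With the default ConfigParser, '%%' in the file is read back as a single '%'.
escaped = configparser.ConfigParser()
escaped.read_string("[queries]\nproduct_by_cat = select * from products where cat =%%s\n")
print(escaped.get("queries", "product_by_cat"))  # select * from products where cat =%s
```

If the query is sent to a database, passing the parameter to the driver's execute() call instead of %-formatting it into the SQL string avoids SQL injection; the placeholder style (%s, ?, ...) depends on the driver.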
I'm trying to automate writing CSV files to an RSQLite DB.
I am doing so by indexing csvFiles, which is a list of data.frame variables stored in the environment.
I can't seem to figure out why my dbWriteTable() code works perfectly fine when I enter it manually but not when I try to index the name and value fields.
### CREATE DB ###
mydb <- dbConnect(RSQLite::SQLite(),"")
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in 1:length(csvFiles)) {
  dbWriteTable(mydb,name = csvFiles[i], value = csvFiles[i], overwrite=T)
  i=i+1
}
# EXAMPLE CODE THAT SUCCESSFULLY MANUAL IMPORTS INTO mydb
dbWriteTable(mydb,"DEPARTMENT",DEPARTMENT)
When I run the for loop above, I'm given this error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'DEPARTMENT': No such file or directory
# note that 'DEPARTMENT' is the value of csvFiles[1]
Here's the dput output of csvFiles:
c("DEPARTMENT", "EMPLOYEE_PHONE", "PRODUCT", "EMPLOYEE", "SALES_ORDER_LINE",
"SALES_ORDER", "CUSTOMER", "INVOICES", "STOCK_TOTAL")
I've researched this error and it seems to be related to my working directory; however, I don't really understand what to change, as I'm not even trying to manipulate files from my computer, simply data.frames already in my environment.
Please help!
Simply use get() for the value argument: you are passing a string where a data frame object is expected. Notice that your manual version does not quote DEPARTMENT for value.
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in seq_along(csvFiles)) {
  dbWriteTable(mydb,name = csvFiles[i], value = get(csvFiles[i]), overwrite=T)
}
Alternatively, consider building a list of named data frames with mget and looping element-wise over the list's names and data frame elements with Map:
dfs <- mget(csvFiles)
output <- Map(function(n, d) dbWriteTable(mydb, name = n, value = d, overwrite=T), names(dfs), dfs)
I want to convert my PDF files to .txt files. I used the pdfminer3k module and pdf2txt.py; however, I got an error.
pdf2txt.py -o file.txt -t tag file.pdf
This is the command I ran at the cmd prompt.
Traceback (most recent call last):
File "C:\Python36\lib\site.py", line 67, in <module>
import os
File "C:\Python36\lib\os.py", line 409
yield from walk(new_path, topdown, onerror, followlinks)
^
SyntaxError: invalid syntax
This is the error message that I got.
Could you help me fix this problem?
Added for reference, a great resource:
http://www.degeneratestate.org/posts/2016/Jun/15/extracting-tabular-data-from-pdfs/
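The -t flag sets the type of output. The options are text, tag, xml, and html.
tag produces tagged (XML-like) output. Replace tag with text in your command and try it.
The order of the optional arguments also matters.
You also must invoke python explicitly; your command line doesn't know what import means, yet some of your environment seems to be set up. (The SyntaxError on yield from also suggests that a Python 2 interpreter is loading Python 3.6's standard library, since yield from is Python 3-only syntax.) My example is for the Windows cmd prompt from the Anaconda3\Scripts directory. If you are in a Jupyter notebook or a Python console, you should be able to import the pdf2txt.py script with import pdf2txt.
Also make sure file.pdf can be found: run the command from the directory that contains it, or pass its full path. Appending your PDF directory to sys.path only changes where Python looks for modules, not where it looks for input files.
Try python pdf2txt.py -t text -o file.txt file.pdf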
Or, if you are brave, this is how to do it programmatically. The trouble with XML output is that, if you want to get the text, the characters from the XML tree are returned in an arbitrary order. You can get it to work, but you need to build the string character by character, which is not that hard; it's just time-consuming to work out the logic.
# Imports for pdfminer3k (the older pdfminer API used below)
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextBoxHorizontal, LTTextLine

filesin = 'file.pdf'    # hypothetical path of the PDF to read
obj_textbox_line = {}   # maps each text object to its text

fp = open(filesin, 'rb')
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
doc.initialize('')
rsrcmgr = PDFResourceManager(caching=False)
laparams = LAParams(all_texts=True)
laparams.boxes_flow = -0.2
laparams.paragraph_indent = 0.2
laparams.detect_vertical = False
#laparams.heuristic_word_margin = 0.03
laparams.word_margin = 0.2
laparams.line_margin = 0.3
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)
#process_pdf(rsrcmgr, device, pdfparse, pagenos,caching=c, check_extractable=True)
for p, page in enumerate(doc.get_pages()):
    if p == 0:  # temporary: only process page 1
        interpreter.process_page(page)
        layout = device.get_result()
        alltextinbox = ''
        # This is a rich environment, so categorization of this object hierarchy is needed
        for c, lt_obj in enumerate(layout):
            #print(type(lt_obj),"This is type ",c,"th object on the ",p,"th page")
            if isinstance(lt_obj, (LTTextBoxHorizontal, LTTextBox, LTTextLine)):
                print("Type ,", type(lt_obj), " and text ..", lt_obj.get_text())
                obj_textbox_line.update({lt_obj: lt_obj.get_text()})
    elif p != 0:
        pass
fp.close()
#print(obj_textbox_line)
#call the column finder here
#check_matching("example", "example1")
#text_doc_df = pd.DataFrame(obj_textbox_line,columns=['text'])
#print (text_doc_df)
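Building the string "character by character", as described above, mostly comes down to sorting: group characters into lines by their vertical position, then order each line left to right. A minimal sketch, assuming the characters have already been collected as (x0, y0, text) tuples (for example from pdfminer LTChar objects' x0, y0 and get_text()); the chars_to_lines name and the y_tolerance value are hypothetical.

```python
def chars_to_lines(chars, y_tolerance=2.0):
    """Rebuild text lines from (x0, y0, text) tuples given in arbitrary order.

    Assumes PDF coordinates, where y grows upwards, so the top line has the
    largest y0. Characters whose y0 differs by less than y_tolerance from the
    current line's y0 are treated as sitting on that line.
    """
    lines = []  # each entry: [line_y0, [(x0, text), ...]]
    # Visit characters top to bottom.
    for x0, y0, text in sorted(chars, key=lambda c: -c[1]):
        if lines and abs(lines[-1][0] - y0) < y_tolerance:
            lines[-1][1].append((x0, text))
        else:
            lines.append([y0, [(x0, text)]])
    # Within each line, order characters left to right and join them.
    return ["".join(t for _, t in sorted(members)) for _, members in lines]

# Characters of "Hi" over "ok", deliberately shuffled.
print(chars_to_lines([(20, 100, "i"), (10, 80, "o"), (10, 100, "H"), (20, 80, "k")]))
```

Spaces are not reconstructed here; a real version would also compare the gap between neighbouring characters to insert them.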
I'm working on a generic row/column matcher. If you don't want to bother, you can buy a pro converter for about 150 bucks.
So here's the issue, guys:
I have a very simple little program that reads some setup details from a file (to make it reusable for other sets of data) and stores them in variables.
It then uses one of those variables to open another file that I need to write some results to, as well as various search parameters.
When I pass the variable to the open() function, it fails, saying it can't find the file, but when I pass the exact same information as a written string instead of a variable, it works.
Is this a known problem, or am I just doing something wrong?
The code (problem line marked):
def urlTrawl(filename):
    import urllib
    read = open(getMediaPath(filename), "rt")
    baseurl = read.readline()
    orgurl = read.readline()
    lasturlfile = read.readline()
    linksfile = read.readline()
    read.close()
    webpage = ""
    links = ""
    counter = 0
    lasturl = ""
    nexturl = ""
    url = ""
    connection = ""
    try:
        read = open(lasturlfile, "rt")
        lasturl = read.readline()
    except IOError:
        print "IOError"
    webpage = connection.read()
    connection.close()
    file = open(linksfile, "wt")  # <-- problem line
    file.close()
    file = open(lasturlfile, "wt")
    file.write(nexturl)
    return 1
The information being passed in:
http://www.questionablecontent.net/
http://www.questionablecontent.net/view.php?comic=2480
C:\\Users\\James\\Desktop\\comics\\qclast.txt
C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt
strip\"
src=\"
\"
Pevious
Next
f=\"
\"
EDIT: removed working code to narrow down the problem area, and updated the code to use a direct reference rather than a relative one.
I found the problem in the end.
The problem was that it was reading in the \n at the end of each line in my details file, and of course the \n isn't anywhere in the website data I'm reading. Removing the last character of each read did the trick:
baseurl = baseurl[:-1]
orgurl = orgurl[:-1]
lasturlfile = lasturlfile[:-1]
linksfile = linksfile[:-1]
search1 = search1[:-1]
search2 = search2[:-1]
search3 = search3[:-1]
search4 = search4[:-1]
search5 = search5[:-1]
search6 = search6[:-1]
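Slicing with [:-1] assumes every line really ends with a newline; if the last line of the details file has none, its final character is lost. Stripping only the line-ending characters avoids that. A minimal sketch; the read_setup helper and the two-line sample are hypothetical, and io.StringIO (Python 3) only stands in for the real file, whose readline() results in Jython can be cleaned with the same rstrip call.

```python
import io

def read_setup(f):
    """Return one value per line with the line ending removed."""
    # rstrip("\r\n") removes only trailing newline characters and leaves
    # a last line that has no newline untouched.
    return [line.rstrip("\r\n") for line in f]

# Hypothetical two-line details file; note the missing "\n" after the last line.
details = io.StringIO(
    "http://www.questionablecontent.net/\n"
    "C:\\Users\\James\\Desktop\\comics\\qclast.txt"
)
baseurl, lasturlfile = read_setup(details)

# By contrast, slicing assumes a newline is always present:
print(lasturlfile[:-1])  # drops the final "t" of "qclast.txt"
```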
I might not be right, but I think this is what's happening.
You're saying this works fine:
file = open('C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt', "wt")
But this doesn't:
# After reading three lines
linksfile = read.readline()
file = open(linksfile, "wt")
There is a difference between these two. In the first piece of code, the double backslashes are escapes. They resolve to single backslashes when Python is done parsing. Like so:
>>> print 'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt'
C:\Users\James\Desktop\comics\comiclinksqc.txt
But when you read that same text from the file, there's no parsing of the text. That means that the string stored in your variable still has double backslashes.
Try this command out. I bet it fails the same way as when you read the file path in:
file = open(r'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt', "wt")
The r stands for "raw"; it prevents Python from interpreting escape characters. If it does fail the same way, then the double backslashes are your problem. To fix it, remove the double backslashes in your file:
C:\Users\James\Desktop\comics\comiclinksqc.txt
This isn't a problem in CPython 2.7, and I'm betting it's not one in 3.x either. CPython interprets double backslashes in such a way that they are effectively a single backslash (in most cases, at least). So this may be an issue specific to Jython.
If unclean paths cause errors, you might want to consider doing something to clean them up. os.path.abspath might be helpful, although I can't say if Jython's implementation works as well as CPython's:
>>> print os.path.abspath(r'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt')
C:\Users\James\Desktop\comics\comiclinksqc.txt
>>> print os.path.abspath(r'C:/Users/James/Desktop/comics/comiclinksqc.txt')
C:\Users\James\Desktop\comics\comiclinksqc.txt
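If you only want to clean a path rather than make it absolute, normpath collapses repeated separators and, for Windows paths, turns / into \. A minimal sketch in Python 3: ntpath (the Windows flavour of os.path) is imported directly only so the example behaves the same on any OS; on Windows, os.path.normpath is the same function. Whether Jython's implementation behaves identically is not verified here.

```python
import ntpath  # Windows path rules, importable on any OS

doubled = r"C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt"
mixed = r"C:/Users/James/Desktop/comics/comiclinksqc.txt"

# normpath removes empty path components (from doubled backslashes)
# and converts forward slashes to backslashes.
clean_doubled = ntpath.normpath(doubled)
clean_mixed = ntpath.normpath(mixed)
print(clean_doubled)
print(clean_mixed)
```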
I am trying to create a script that lists the data source names and shows the connection pool utilization (pooled connections, free pool size, etc.).
But I am facing an issue when listing the connection pool: if the data source name contains a space, like "Default Datasource",
then it is listed as "Default Datasource (with a leading double quote), and the data source name is not passed correctly to the next function.
datasource = AdminConfig.list('DataSource', AdminConfig.getid('/Cell:'
    + cell + '/')).splitlines()
for datasourceID in datasource:
    datasourceName = datasourceID.split('(')[0]
    print datasourceName
I'd appreciate any help; if possible, drop me a mail at bubuldey#gmail.com.
Regards,
Bubul
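Since the listed ID of a name with a space starts with a double quote, one way to get the clean name is to strip surrounding quotes before splitting on "(". A minimal sketch written to run under both Python 3 and wsadmin's Jython 2.x (assuming str.strip accepts a character argument there); the datasource_name helper and the sample IDs are hypothetical, not real AdminConfig output.

```python
def datasource_name(config_id):
    """Extract the display name from a configuration ID string.

    IDs for names containing spaces come back wrapped in double quotes,
    so strip whitespace and quotes before cutting at the first '('.
    """
    return config_id.strip().strip('"').split('(')[0]

# Hypothetical AdminConfig.list output with one quoted and one plain ID.
sample_ids = ('"Default Datasource(cells/myCell|resources.xml#DataSource_1)"\n'
              'OtherDS(cells/myCell|resources.xml#DataSource_2)')
names = [datasource_name(i) for i in sample_ids.splitlines()]
```

In the loop from the question, this would replace `datasourceID.split('(')[0]` with `datasource_name(datasourceID)`.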