Moving 230 individual pdf files to already created folders - gae-python27

I'm trying to move over 200 pdf files, each into a separate folder that is already created and named 2018. The destination path for each looks like GFG-0777 >> 2018. Each pdf has a unique GFG-0### name that matches a folder I already created, which contains the 2018 destination folder. I'm not sure how to iterate and get each pdf into the right folder.
I've tried shutil.move, which I think is the right tool, but I believe I'm having issues with the paths.
import os
import shutil
srcDir = r'C:\Complete'
#print (srcDir)
dstDir = r'C:\Python27\end_dir'
dirList = os.listdir(srcDir)
for f in dirList:
    fp = [f for f in dirList if ".pdf" in f]  # list comprehension to iterate task (flat for loop)
    for file in fp:
        dst = (srcDir + "/" + file[:-4] + "/" + dstDir + "/" + "2018")
        shutil.move(os.path.join(srcDir, dst, dstDir))
error: shutil.move(os.path.join(srcDir, dst, dstDir))
TypeError: move() missing 1 required positional argument: 'dst'

As far as I can tell, you are calling
shutil.move(os.path.join(srcDir, dst, dstDir))
with only one argument, so there is no destination.
According to the documentation, shutil.move needs both a source and a destination:
https://docs.python.org/3/library/shutil.html#shutil.move
I guess your idea was to somehow build a single string containing both the source and the destination:
dst = (srcDir+"/"+file[:-4]+"/"+dstDir+"/"+"2018")
What you actually want is something along these lines:
dst_dir = dstDir+"/"+"2018"
src_dir = srcDir+"/"+file[:-4]
shutil.move(src_dir, dst_dir)
The code above is just for demonstration.
If this does not work, you could post the output of tree or ls -la for a small part of your srcDir and dstDir and we could work something out.
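If it helps, here is a slightly fuller sketch of the same idea (untested; the folder layout is an assumption based on your description, and os.path.join avoids the manual slash concatenation):

import os
import shutil

srcDir = r'C:\Complete'           # assumed source folder from the question
dstDir = r'C:\Python27\end_dir'   # assumed destination root from the question

for f in os.listdir(srcDir):
    if not f.lower().endswith('.pdf'):
        continue
    name = os.path.splitext(f)[0]                # e.g. GFG-0777
    dst_dir = os.path.join(dstDir, name, '2018')
    if os.path.isdir(dst_dir):                   # only move pdfs whose 2018 folder already exists
        shutil.move(os.path.join(srcDir, f), dst_dir)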

I managed to work it out. Thanks for pointing out the issue with building a single string from src and dst. After splitting that apart I tweaked a bit more, but found I had too many uses of "file" in the code. I had to rename two of them to "file1" and add a comma in the shutil.move call between src and dst.
Thanks again
import os
import shutil
srcDir = r'C:\Complete'
#print (srcDir)
dstDir = r'C:\Python27\end_dir'
dirList = os.listdir(srcDir)
fp = [f for f in dirList if ".pdf" in f]  # list comprehension to collect the pdf names (flat for loop)
for file in fp:
    if ' ' in file:  # removing the space in some of the pdf names, noticed when printing fp
        file1 = file.split(' ')[0]  # removing space continued
    else:
        file1 = file[:-4]  # removing .pdf
    final = dstDir + "\\" + file1 + "\\2018"
    print(srcDir + "\\" + file1 + " " + final)
    shutil.move(srcDir + "\\" + file, final)

Related

Skip csv header row using boto3 in lambda and copy_from in psycopg2

I'm loading a csv into memory from s3 and then I need to insert it into postgres. I think the problem is I'm not using the right call for the s3 object or something as I don't appear to be able to skip the header line. On my local machine I would just load the file from the directory:
cur = DBCONN.cursor()
for filename in absolute_file_paths('/path/to/file/csv.log'):
    print('Importing: ' + filename)
    with open(filename, 'r') as log:
        next(log)  # Skip the header row.
        cur.copy_from(log, 'vesta', sep='\t')
DBCONN.commit()
I have the below in lambda which I would like to work kind of like above, but it's different with s3. What is the correct way to have the below work like above? Or perhaps - what IS the correct way to do this?
s3 = boto3.client('s3')
#Load the file from s3 into memory
obj = s3.get_object(Bucket=bucket, Key=key)
contents = obj['Body']
next(contents, None) # Skip the header row - this does not seem to work
cur = DBCONN.cursor()
cur.copy_from(contents, 'my_table', sep='\t')
DBCONN.commit()
Seemingly, my problem had something to do with an incredibly wide csv file (I have over 200 columns), and somehow that kept the next() call from giving the next row. So I will say that IF your file is not that wide, the code I placed in the question should work. Below, however, is how I got it to work: basically by reading the file into memory and then writing it back to an in-memory file after skipping the header row. This honestly seems a little like overkill, so I'd be happy if someone could provide something more efficient, but seeing as how I spent the last eight hours on this, I'm just happy to have SOMETHING that works.
s3 = boto3.client('s3')
...
def remove_header(contents):
    # Reformat the file, removing the header row
    data = csv.reader(io.StringIO(contents), delimiter='\t')  # read data in
    mem_file = io.StringIO()  # create in-memory file object
    next(data)  # skip header row
    writer = csv.writer(mem_file, delimiter='\t')  # set up the csv writer
    writer.writerows(data)  # write the data to the in-memory file
    mem_file.getvalue()  # Get the string from the buffer
    mem_file.seek(0)  # Go back to the beginning of the memory stream
    return mem_file
...
# Load the file from s3 into memory
obj = s3.get_object(Bucket=bucket, Key=key)
contents = obj['Body'].read().decode('utf-8')
mem_file = remove_header(contents)

# Insert into postgres
try:
    cur = DBCONN.cursor()
    cur.copy_from(mem_file, 'my_table', sep='\t')
    DBCONN.commit()
except BaseException as e:
    DBCONN.rollback()
    raise e
Or, if you want to do it with pandas:
def remove_header_pandas(contents):
    df = pd.read_csv(io.StringIO(contents), sep='\t')
    mem_file = io.StringIO()
    df.to_csv(mem_file, header=False, index=False)  # remove header
    mem_file.getvalue()
    mem_file.seek(0)
    return mem_file
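For what it's worth, a possibly lighter alternative (just a sketch, not tested against the original data; bucket, key and DBCONN are assumed to exist as in the snippets above): since the whole object is read into memory anyway, the header can be dropped with a single str.partition call before handing the rest to copy_from.

import io
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=key)          # bucket/key as in the snippets above
contents = obj['Body'].read().decode('utf-8')

# Drop everything up to and including the first newline (the header row)
_, _, body = contents.partition('\n')

cur = DBCONN.cursor()
cur.copy_from(io.StringIO(body), 'my_table', sep='\t')
DBCONN.commit()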

How to iterate over a list of csv files and compile files with common filenames into a single csv as multiple columns

I am currently iterating through a list of csv files and want to combine those with common filename strings into a single csv file, merging the data from each new csv file in as a set of two new columns. I am having trouble with the final part of this: the append command adds the data as rows at the bottom of the csv. I have tried pd.concat, but must be going wrong somewhere. Any help would be much appreciated.
Note: the code is using Python 2, just for compatibility with the software I am using; a Python 3 solution is welcome if it translates.
Here is the code I'm currently working with:
rb_headers = ["OID_RB", "Id_RB", "ORIG_FID_RB", "POINT_X_RB", "POINT_Y_RB"]
for i in coords:
    if fnmatch.fnmatch(i, '*RB_bank_xycoords.csv'):
        df = pd.read_csv(i, header=0, names=rb_headers)
        df2 = df[::-1]
        # Export the inverted RB csv file as a new csv to the original folder, overwriting the original
        df2.to_csv(bankcoords+i, index=False)

# Iterate through csvs to combine those with similar key strings in their filenames and merge them into a single csv
files_of_interest = {}
forconc = []
for filename in coords:
    if filename[-4:] == '.csv':
        key = filename[:39]
        files_of_interest.setdefault(key, [])
        files_of_interest[key].append(filename)
for key in files_of_interest:
    buff_df = pd.DataFrame()
    for filename in files_of_interest[key]:
        buff_df = buff_df.append(pd.read_csv(filename))
    files_of_interest[key] = buff_df
redundant_headers = ["OID", "Id", "ORIG_FID", "OID_RB", "Id_RB", "ORIG_FID_RB"]
outdf = buff_df.drop(redundant_headers, axis=1)
If you only want to merge into one file:
paths_list = ['path1', 'path2', ...]
dfs = [pd.read_csv(f, header=None, sep=";") for f in paths_list]
dfs = pd.concat(dfs, ignore_index=True)
dfs.to_csv(...)
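Since the question asks for the second file's data to come in as new columns rather than new rows, here is a small sketch of that variant (the file names are made up for illustration): passing axis=1 to pd.concat lines the frames up side by side instead of stacking them.

import pandas as pd

# Hypothetical pair of files sharing a common filename key
paths_list = ['site01_LB_bank_xycoords.csv', 'site01_RB_bank_xycoords.csv']
dfs = [pd.read_csv(p) for p in paths_list]

# axis=1 appends the second file's data as extra columns instead of extra rows
combined = pd.concat(dfs, axis=1)
combined.to_csv('site01_combined.csv', index=False)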

How to add new file to dataframe

I have a folder where CSV files are stored. At certain intervals a new CSV file (same format) is added to the folder.
I need to detect the new file and add its contents to a dataframe.
My current code reads all CSV files at once and stores them in a dataframe, but the dataframe should get updated with the contents of the new CSV whenever a new file is added to the folder.
import os
import glob
import pandas as pd
os.chdir(r"C:\Users\XXXX\CSVFILES")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
#combine all files in the list
df = pd.concat([pd.read_csv(f) for f in all_filenames ])
Let's say you have a path to the folder where new csv files are downloaded:
path_csv = r"C:\........\csv_folder"
I assume that your dataframe (the one you want to append to) is created and that you load it into your script (you have probably updated it before, saved to some csv in another folder). Let's assume you do this:
path_saved_df = r"C:/..../saved_csv"  # The path to which you've saved the previously read csv:s
filename = "my_old_files.csv"
df_old = pd.read_csv(path_saved_df + '/' + filename, sep="<your separator>")  # e.g. sep=";"
Then, to read only the latest csv added to the folder in path_csv, you simply do the following:
list_of_csv = glob.glob(path_csv + "\\*.csv")
latest_csv = max(list_of_csv, key=os.path.getctime)  # max ensures you only read the latest file
new_file = pd.read_csv(latest_csv, sep="<your separator>", encoding="iso-8859-1")  # change encoding if you need to
Your new dataframe is then
New_df = pd.concat([df_old,new_file])
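Putting the pieces together, a rough end-to-end sketch (the paths here are assumptions; only the newest csv is read, and the result is written back so the next run can pick it up again):

import glob
import os
import pandas as pd

path_csv = r"C:\Users\XXXX\CSVFILES"              # folder being watched (from the question)
path_saved_df = r"C:\saved_csv\my_old_files.csv"  # hypothetical location of the running dataframe

df_old = pd.read_csv(path_saved_df)

list_of_csv = glob.glob(os.path.join(path_csv, "*.csv"))
latest_csv = max(list_of_csv, key=os.path.getctime)  # newest file by creation time
new_file = pd.read_csv(latest_csv)

new_df = pd.concat([df_old, new_file], ignore_index=True)
new_df.to_csv(path_saved_df, index=False)  # persist for the next run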

Jython not finding file when using variable to pass file name

So here's the issue:
I have a very simple little program that reads in some setup details from a file (to make it reusable for other sets of data) and stores them in variables.
It then uses one of those variables to open another file that I need to write some results to, as well as various search parameters.
When I pass the variable to the open() function, it fails saying it can't find the file, but when I pass the exact same information as a written string instead of a variable, it works.
Is this a known problem, or am I just doing something wrong?
The code (the problem line is marked with a comment):
def urlTrawl(filename):
    import urllib
    read = open(getMediaPath(filename), "rt")
    baseurl = read.readline()
    orgurl = read.readline()
    lasturlfile = read.readline()
    linksfile = read.readline()
    read.close()

    webpage = ""
    links = ""
    counter = 0
    lasturl = ""
    nexturl = ""
    url = ""
    connection = ""

    try:
        read = open(lasturlfile, "rt")
        lasturl = read.readline()
    except IOError:
        print "IOError"

    webpage = connection.read()
    connection.close()

    file = open(linksfile, "wt")  # <-- problem line
    file.close()

    file = open(lasturlfile, "wt")
    file.write(nexturl)

    return 1
The information being passed in
http://www.questionablecontent.net/
http://www.questionablecontent.net/view.php?comic=2480
C:\\Users\\James\\Desktop\\comics\\qclast.txt
C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt
strip\"
src=\"
\"
Pevious
Next
f=\"
\"
EDIT: removed working code to narrow down the problem area, and updated the code to use a direct reference rather than a relative one.
I found the problem in the end.
The problem was that it was reading in the \n at the end of each line in my details file, and of course the \n isn't anywhere in the website data I'm reading. Removing the last character of each read did the trick:
baseurl = baseurl[:-1]
orgurl = orgurl[:-1]
lasturlfile = lasturlfile[:-1]
linksfile = linksfile[:-1]
search1 = search1[:-1]
search2 = search2[:-1]
search3 = search3[:-1]
search4 = search4[:-1]
search5 = search5[:-1]
search6 = search6[:-1]
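As a side note (not from the original post), rstrip('\n') does the same thing a little more safely, since it only removes a newline that is actually present; slicing with [:-1] would eat a real character on a final line that has no trailing newline.

baseurl = read.readline().rstrip('\n')
orgurl = read.readline().rstrip('\n')
lasturlfile = read.readline().rstrip('\n')
linksfile = read.readline().rstrip('\n')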
I might not be right, but I think this is what's happening.
You're saying this works fine:
file = open('C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt', "wt")
But this doesn't:
# After reading three lines
linksfile = read.readline()
file = open(linksfile, "wt")
There is a difference between these two. In the first piece of code, the doubled backslashes are escape sequences; they resolve to single backslashes once Python has parsed the string literal. Like so:
>>> print 'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt'
C:\Users\James\Desktop\comics\comiclinksqc.txt
But when you read that same text from the file, there's no parsing of the text. That means that the string stored in your variable still has double slashes.
Try this command out. I bet it fails the same way as when you read the file path in:
file = open(r'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt', "wt")
The r stands for "raw"; it prevents Python from interpreting escape characters. If it does fail the same way, then the double slashes are your problem. To fix it, in your file, you need to remove the double slashes:
C:\Users\James\Desktop\comics\comiclinksqc.txt
This isn't a problem in CPython 2.7, and I'm betting it's not in 3.x either: CPython handles doubled slashes in such a way that they are effectively a single slash (in most cases, at least). So this may be an issue specific to Jython.
If unclean paths cause errors, you might want to consider doing something to clean them up. os.path.abspath might be helpful, although I can't say if Jython's implementation works as well as CPython's:
>>> print os.path.abspath(r'C:\\Users\\James\\Desktop\\comics\\comiclinksqc.txt')
C:\Users\James\Desktop\comics\comiclinksqc.txt
>>> print os.path.abspath(r'C:/Users/James/Desktop/comics/comiclinksqc.txt')
C:\Users\James\Desktop\comics\comiclinksqc.txt
I am trying to create a script which will list the datasource names and show the connection pool utilization (pooled connections, free pool size, etc.).
But I am facing an issue when listing the connection pool: if a data source name has a space in it, like "Default Datasource",
then it comes out as "Default Datasource and the datasource name is not parsed correctly for the next function.
datasource = AdminConfig.list('DataSource', AdminConfig.getid('/Cell:' + cell + '/')).splitlines()
for datasourceID in datasource:
    datasourceName = datasourceID.split('(')[0]
    print datasourceName
I request your help; if possible, drop me a mail at bubuldey#gmail.com
Regards,
Bubul

Jython - importing a text file to assign global variables

I am using Jython and wish to import a text file that contains many configuration values such as:
QManager = MYQM
ProdDBName = MYDATABASE
etc.
.. and then I am reading the file line by line.
What I am unable to figure out is this: as I read each line, I assign whatever is before the = sign to a local loop variable named MYVAR and whatever is after the = sign to a local loop variable named MYVAL. How do I ensure that once the loop finishes I have a set of global variables such as QManager, ProdDBName, etc.?
I've been working on this for days - I really hope someone can help.
Many thanks,
Bret.
See other question: Properties file in python (similar to Java Properties)
Automatically setting global variables is not a good idea in my opinion; I would prefer a global ConfigParser object or a dictionary. If your config file is similar to a Windows .ini file, then you can read it and set some global variables with something like:
def read_conf():
    global QManager
    import ConfigParser
    conf = ConfigParser.ConfigParser()
    conf.read('my.conf')
    QManager = conf.get('QM', 'QManager')
    print('Conf option QManager: [%s]' % (QManager))
(this assumes you have a [QM] section in your my.conf config file)
If you want to parse the config file without the help of ConfigParser or a similar module, then try:
my_options = {}

f = open('my.conf')
for line in f:
    if '=' in line:
        k, v = line.split('=', 1)
        k = k.strip()
        v = v.strip()
        print('debug [%s]:[%s]' % (k, v))
        my_options[k] = v
f.close()

print('-' * 20)

# this will show the value just read
print('Option QManager: [%s]' % (my_options['QManager']))

# this will fail with KeyError exception
# you must be aware of non-existing values or values
# where case differs
print('Option qmanager: [%s]' % (my_options['qmanager']))
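If you really do want module-level globals after the loop, one possibility (a minimal sketch, not part of the original answer) is to push the parsed dictionary into globals(); the keys then become module-level names, with the usual caveat that misspelled or missing options simply become missing names.

# Assumes my_options was filled by the loop above
globals().update(my_options)

# The values are now available as plain global names
print('QManager: [%s]' % (QManager))
print('ProdDBName: [%s]' % (ProdDBName))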