How to read different file format data and use it for compression - file-io

fob = open('this.txt','rb')
fob1 = open('that.txt','wb')
content = ''
for i in fob:
    content += i
fob1.write(content)
fob.close()
fob1.close()
This is code that reads a txt file and stores it in another txt file. How do I read any other kind of file? It might even be a jpeg file, a pdf file, or some other format. Please help me.
Thanks in advance.

Your code reads a *.txt file line by line (and copies it).
If you want to read a different type of file byte by byte and print its bits, you can do this:
f = open('test.gnu','rb')
flag = 1
while flag:
    byte = f.read(1)
    flag = (byte != "")
    if flag:
        # do something with the byte, e.g.
        # print its bits:
        print '{0:08b}'.format(ord(byte))
f.close()
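If the goal is just to copy an arbitrary binary file (jpeg, pdf, or anything else), reading it in fixed-size chunks is usually enough. A minimal sketch, where the file names are only placeholders:
src = open('input.pdf', 'rb')   # any binary file: jpeg, pdf, ...
dst = open('copy.pdf', 'wb')
while True:
    chunk = src.read(4096)      # read up to 4096 bytes at a time
    if not chunk:               # an empty result means end of file
        break
    dst.write(chunk)
src.close()
dst.close()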
Or, if you want to zip and unzip files, you can use the "zipfile" package:
http://docs.python.org/2/library/zipfile
For example code covering various compression formats, see:
http://pymotw.com/2/compression.html
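As a rough sketch of zipfile usage (the archive name and member file are just placeholders, not from the original question), compressing a file and reading it back looks like this:
import zipfile

zf = zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED)
zf.write('this.txt')            # compress this.txt into the archive
zf.close()

zf = zipfile.ZipFile('archive.zip', 'r')
data = zf.read('this.txt')      # raw bytes of the stored member
zf.close()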

Related

How can I read two input files and create one output file via Python?

So I have two files and I need to create one output file from them. My prof. wants me to use a while loop to process the first file and a for loop for the second file, as well as a try/except block to read from the input and write to the output. I think I have a general idea for the initial code, but I'm still lost.
#reads the file
n1 = open('nameslist1.txt', 'r')
n2 = open('nameslist2.txt', 'r')
print(n1.read())
print(n2.read())
n1.close()
n2.close()
#writes the file
n1_o = open('allnames.txt', 'w')
n2_o = open('allnames.txt', 'w')
n1_o.write('nameslist1.txt')
n2_o.write('nameslist2.txt')
n1_o.close()
n2_o.close()
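A minimal sketch of what the assignment seems to ask for might look like the following (the file names come from the code above; the exact loop and try/except structure the professor expects is an assumption):
try:
    out = open('allnames.txt', 'w')
    n1 = open('nameslist1.txt', 'r')
    line = n1.readline()
    while line:                    # while loop over the first file
        out.write(line)
        line = n1.readline()
    n1.close()
    n2 = open('nameslist2.txt', 'r')
    for line in n2:                # for loop over the second file
        out.write(line)
    n2.close()
    out.close()
except IOError as e:
    print('Could not read or write a file:', e)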

Skip csv header row using boto3 in lambda and copy_from in psycopg2

I'm loading a csv from s3 into memory and then I need to insert it into postgres. I think the problem is that I'm not using the right call on the s3 object, because I don't appear to be able to skip the header line. On my local machine I would just load the file from a directory:
cur = DBCONN.cursor()
for filename in absolute_file_paths('/path/to/file/csv.log'):
    print('Importing: ' + filename)
    with open(filename, 'r') as log:
        next(log) # Skip the header row.
        cur.copy_from(log, 'vesta', sep='\t')
    DBCONN.commit()
I have the below in lambda which I would like to work kind of like above, but it's different with s3. What is the correct way to have the below work like above? Or perhaps - what IS the correct way to do this?
s3 = boto3.client('s3')
#Load the file from s3 into memory
obj = s3.get_object(Bucket=bucket, Key=key)
contents = obj['Body']
next(contents, None) # Skip the header row - this does not seem to work
cur = DBCONN.cursor()
cur.copy_from(contents, 'my_table', sep='\t')
DBCONN.commit()
Seemingly, my problem had something to do with an incredibly wide csv file (it has over 200 columns), and somehow that kept the next() call from returning the next row. So, if your file is not that wide, the code I placed in the question should work. Below is how I got it to work: basically by reading the file into memory and then writing it back to an in-memory file after skipping the header row. This honestly seems a little like overkill, so I'd be happy if someone could provide something more efficient, but seeing as I spent the last eight hours on this, I'm just happy to have something that works.
s3 = boto3.client('s3')
...
def remove_header(contents):
    # Reformat the file, removing the header row
    data = csv.reader(io.StringIO(contents), delimiter='\t') #read data in
    mem_file = io.StringIO() #create in memory file object
    next(data) #skip header row
    writer = csv.writer(mem_file, delimiter='\t') #set up the csv writer
    writer.writerows(data) #write the data in memory to the in mem file
    mem_file.getvalue() # Get the string from the buffer
    mem_file.seek(0) # Go back to the beginning of the memory stream
    return mem_file
...
#Load the file from s3 into memory
obj = s3.get_object(Bucket=bucket, Key=key)
contents = obj['Body'].read().decode('utf-8')
mem_file = remove_header(contents)
#Insert into postgres
try:
    cur = DBCONN.cursor()
    cur.copy_from(mem_file, 'my_table', sep='\t')
    DBCONN.commit()
except BaseException as e:
    DBCONN.rollback()
    raise e
Or, if you want to do it with pandas:
def remove_header_pandas(contents):
    df = pd.read_csv(io.StringIO(contents), sep='\t')
    mem_file = io.StringIO()
    df.to_csv(mem_file, sep='\t', header=False, index=False) # drop the header; keep the tab separator so copy_from still works
    mem_file.seek(0)
    return mem_file
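A simpler variant of the same idea (an assumption, not tested against the original setup): since copy_from only needs a file-like object, splitting off the first line avoids the csv reader/writer round-trip entirely.
import io

def remove_header_simple(contents):
    # Drop everything up to and including the first newline (the header row)
    _, _, body = contents.partition('\n')
    return io.StringIO(body)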

How to read the newly appended lines of a growing log file continuously in julia?

There is a shell command:
tail -n0 -f /path/to/growing/log
to display the newly appended lines of a file continuously.
Please guide me in achieving the objective in Julia!
Just repeatedly read the file:
file = open("/path/to/growing/log")
seekend(file) # ignore contents that are already there to match the `-n0` option
while true
    sleep(0.2)
    data = read(file, String)
    !isempty(data) && print(data)
end

Moving 230 individual pdf files to already created folders

I'm trying to move over 200 pdf files, each to a separate folder that is already created and named 2018. The destination path for each is like GFG-0777 >> 2018. Each pdf has a unique GFG-0### name that matches the folders I already created, which lead to the 2018 destination folders. I'm not sure how to iterate and get each pdf into the right folder.
I've tried shutil.move, which I think is the right tool, but I think I have issues with the paths.
import os
import shutil
srcDir = r'C:\Complete'
#print (srcDir)
dstDir = r'C:\Python27\end_dir'
dirList = os.listdir(srcDir)
for f in dirList:
    fp = [f for f in dirList if ".pdf" in f] #list comprehension to iterate task (flat for loop)
    for file in fp:
        dst = (srcDir+"/"+file[:-4]+"/"+dstDir+"/"+"2018")
        shutil.move(os.path.join(srcDir, dst, dstDir))
error: shutil.move(os.path.join(srcDir, dst, dstDir))
TypeError: move() missing 1 required positional argument: 'dst'
AFAICT you are calling
shutil.move(os.path.join(srcDir, dst, dstDir))
without a destination (the to argument).
According to the documentation you need to have a from and to folder.
https://docs.python.org/3/library/shutil.html#shutil.move
I guess your idea was to somehow create a single string containing both the src and the dst:
dst = (srcDir+"/"+file[:-4]+"/"+dstDir+"/"+"2018")
What you actually want is something along this line:
dst_dir = dstDir+"/"+"2018"
src_dir = srcDir+"/"+file[:-4]
shutil.move(src_dir,dst_dir)
The above code is just for demonstration.
If this does not work, you could post a tree or ls -la listing of a small part of your srcDir and dstDir and we could work something out.
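For completeness, a fuller sketch of the whole loop could look like this (the directory names are taken from the question and the folder layout is an assumption, so treat it as untested):
import os
import shutil

srcDir = r'C:\Complete'
dstDir = r'C:\Python27\end_dir'

for name in os.listdir(srcDir):
    if name.lower().endswith('.pdf'):
        folder = os.path.splitext(name)[0]           # e.g. GFG-0777
        src = os.path.join(srcDir, name)
        dst = os.path.join(dstDir, folder, '2018')   # existing destination folder
        shutil.move(src, dst)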
#nmanh
I managed to work it out. Thanks for calling out the issue with building a single string from src and dst. After removing that string I tweaked things a bit more, but found I had too many uses of "file" in the code; I had to rename two of them to "file1" and add a comma in shutil.move between src and dst.
Thanks again
import os
import shutil
srcDir = r'C:\Complete'
#print (srcDir)
dstDir = r'C:\Python27\end_dir'
dirList = os.listdir(srcDir)
for file in dirList:
    fp = [f for f in dirList if ".pdf" in f] #list comprehension to iterate task (flat for loop)
    for file in fp:
        if ' ' in file: #removing space in some of pdf names noticed during fp print
            file1 = file.split(' ')[0] # removing space continued
        else:
            file1 = file[:-4] # removing .pdf
        final = dstDir+"\\"+file1+"\\2018"
        print (srcDir+"\\"+file1+" "+final)
        shutil.move(srcDir+"\\"+file,final)

How to modify a line in a file with Erlang OTP module

I got a big file and I would like to replace the first line with other content.
When I use {ok, IoDev} = file:open("/root/FileName", [write, raw, binary]), the whole content is removed.
But when I use {ok, IoDev} = file:open("/root/FileName", [append, raw, binary]) and file:pwrite(S, {bof,0}, <<"new content\n">>), I got the result {error, badarg}.
If I set Location to 0: file:pwrite(S, 0, <<"new content\n">>), the string is appended at the tail of the file.
You seem to be confused with the actual file API.
file:open/2 will truncate the file if you pass [write, raw, binary] as you do:
(about write mode): The file is opened for writing. It is created if it does not exist. If the file exists, and if write is not combined with read, the file will be truncated.
So you need to pass either [write, read] or [write, append] as documented.
file:pwrite/3 also works exactly as documented. It allows you to write at a given position in the file. In particular, you cannot pass {bof, 0} as second argument since you opened the file in raw mode:
If IoDevice has been opened in raw mode, some restrictions apply: Location is only allowed to be an integer; and the current position of the file is undefined after the operation.
The following sample code shows how they work:
ok = file:write_file("/tmp/file", "This is line 1.\nThis is line 2.\n"),
{ok, F} = file:open("/tmp/file", [read, write, raw, binary]),
ok = file:pwrite(F, 0, <<"This is line A.\n">>),
ok = file:close(F),
{ok, Content} = file:read_file("/tmp/file"),
io:put_chars(Content),
ok = file:delete("/tmp/file").
It will output:
This is line A.
This is line 2.
This works because text "This is line A.\n" is exactly as long as "This is line 1.\n". It does not really replace the line, but just bytes. If you need to replace the first line with content that has a different length, you need to rewrite the whole content of the file. A common approach is indeed to write a new file and swap them eventually. If the file is small enough, however, you can read it entirely in memory and rewrite it. file:read_file/1 and file:write_file/2 would work:
replace_first_line(Path, NewLine) ->
    {ok, Content} = file:read_file(Path),
    [FirstLine | Tail] = binary:split(Content, <<"\n">>),
    NewContent = [NewLine, <<"\n">> | Tail],
    ok = file:write_file(Path, NewContent).
The question is not really related to Erlang but to general file operations.
Replacing a line in a file requires rewriting the whole file. The easiest way to do so is to write all the new content to a new file and then move the new file over the old one.