Return a file with only the URLs in which keywords were found - scrapy

I am trying to use scrapy to scrape a list of URLs and output only the URLs in which a specific keyword was found. I tested the if statement in the scrapy shell and it seems to work (I tried if(..): exit() and it did exit the shell), but the following code produces an empty file, even with the same URL for which the if statement was true in the shell.
def parse(self, response):
    filename = f'file.txt'
    if (response.css('*').re('keyword')):
        with open(filename, 'wb') as f:
            f.write(response.url)
        self.log(f'Saved file {filename}')

You open the file in write-binary mode ('wb') and then try to write a string to it; this should raise an error.
Instead, open the file in append mode: with open(filename, 'a') as f:
Or just in write mode: with open(filename, 'w') as f:
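A minimal sketch of the append approach, kept free of Scrapy so it can be run on its own. The helper name append_url_if_match and its parameters are hypothetical, and it uses a plain substring check rather than the regex from the question; inside parse you would presumably pass response.text and response.url.

```python
def append_url_if_match(page_text, url, filename, keyword):
    """Append url to filename when keyword occurs in page_text.

    Returns True if the URL was written, False otherwise.
    """
    if keyword not in page_text:
        return False
    # Text append mode: accepts str and keeps URLs written by earlier calls.
    with open(filename, "a", encoding="utf-8") as f:
        f.write(url + "\n")
    return True
```

Append mode matters when several pages are parsed in one run: with 'w', each matching page would overwrite the previous one, leaving only the last URL in the file.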

Related

Upload small file to FastAPI endpoint but UploadFile content is empty

I am trying to upload file (csv) to FastAPI POST endpoint.
This is the server code:
@app.post("/csv/file/preview")
async def post_csv_file_preview(file: UploadFile):
    """
    :param file:
    :return:
    """
    contents = file.file.read()
    print(contents)
But contents is empty if the file is small. If I make the file larger (by adding new lines to the CSV file), without any other changes, it works.
If I directly get the request body, the file content is returned normally.
print(await request.body())
The problem occurs only on my production server, not locally:
PYTHONPATH=/var/www/api/current /var/www/api/current/venv/bin/python /var/www/api/current/venv/bin/uvicorn main:app --reload --port=8004
I don't understand why this happens.
Thanks to the comment from Chris, I solved my problem by adding the .seek() method before reading the file contents.
@app.post("/csv/file/preview")
def post_csv_file_preview(file: UploadFile):
    """
    :param file:
    :return:
    """
    file.file.seek(0)
    contents = file.file.read()
    print(contents)
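The effect of seek(0) can be illustrated without FastAPI. The sketch below uses io.BytesIO as a stand-in for the uploaded file object; it does not reproduce FastAPI's internals and only shows that read() returns nothing when the stream position is already at the end. The helper name read_all_from_start is hypothetical.

```python
import io


def read_all_from_start(stream):
    """Rewind a seekable binary stream and return its full contents."""
    stream.seek(0)
    return stream.read()


buf = io.BytesIO()
buf.write(b"a,b\n1,2\n")   # after writing, the position is at the end
leftover = buf.read()      # reads from the current position: nothing is left
contents = read_all_from_start(buf)  # rewinding makes the data readable again
```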

I'm trying to use input to output to a filename

I'm trying to store user input in a variable and create a file using that input as the filename
The only examples I've seen are print(input)
I'm new to Python but trying to write a functional program
thanks
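This is a nice beginning for you:
def create_file():
    fn = input('Enter file name: ').strip()
    try:
        # open the file for reading if it already exists
        file = open(fn, 'r')
    except IOError:
        # otherwise create it by opening it for writing
        file = open(fn, 'w')
    return file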
In Python you can use the open() function to create a file (assuming it will be a text file).
It is described in the Python documentation under built-in functions.
Assuming the name the user typed is stored in a variable called filename (avoid calling it input, since that would shadow the built-in function), you could do it like so:
file = open(filename, 'w+')
This gives you a file object that you can write to using its write() method.
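Putting the two answers together, a minimal sketch might look like the following. The function name create_text_file and its text parameter are hypothetical, chosen for illustration; the file name is passed in as an argument so the function can be used with or without input().

```python
def create_text_file(file_name, text=""):
    """Create (or overwrite) a text file and return the name that was used."""
    # The with block closes the file even if write() raises.
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(text)
    return file_name


# Interactive use (commented out so the sketch runs unattended):
# create_text_file(input("Enter file name: ").strip(), "hello\n")
```

Note that mode 'w' truncates an existing file; use 'a' instead if existing content should be kept.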

Is it possible to get the name of the file when opened with filedialog.askopenfile

I am trying to open a file with file = filedialog.askopenfile(initialdir='./'), but I need also to know the name of the file opened for other purposes. I know that if the user selects a file, file is not None, otherwise it's something like this:
<_io.TextIOWrapper name='/Users/u/Desktop/e/config.py' mode='r' encoding='US-ASCII'>
but _io.TextIOWrapper objects are not subscriptable.
Following a suggestion, I discovered that there is another function similar to filedialog.askopenfile, namely askopenfilename, which returns just the name of the file instead of opening it. If we also want to open the file, we can open and read it manually:
file_name = filedialog.askopenfilename(initialdir='./')
if file_name != '':
    with open(file_name, 'r') as file:
        string = ''
        for line in file:
            string += str(line)
        print(string)
Although this works well, I still think tkinter should provide this functionality through filedialog.askopenfile directly.
Looking through the Python installation directory, we can find the filedialog.py file, which contains the definitions of both functions; they are very similar:
def askopenfilename(**options):
    "Ask for a filename to open"
    return Open(**options).show()
askopenfile
def askopenfile(mode = "r", **options):
    "Ask for a filename to open, and returned the opened file"
    filename = Open(**options).show()
    if filename:
        return open(filename, mode)
    return None
As we can see, the first returns the result of the call to the show function, whereas the second returns an opened file.
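Since askopenfile returns the result of open(filename, mode), as the source above shows, the returned object is an ordinary file object, and those expose the path they were opened with through the name attribute (the same value visible in the repr quoted in the question). The sketch below demonstrates this with a plain open() call rather than a dialog; the helper name opened_file_name is hypothetical.

```python
import os


def opened_file_name(file_obj):
    """Return (path, base name) for an open file object, or None if no file was chosen."""
    if file_obj is None:  # askopenfile returns None when the dialog is cancelled
        return None
    return file_obj.name, os.path.basename(file_obj.name)
```

With a dialog, this would presumably be used as opened_file_name(filedialog.askopenfile(initialdir='./')).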

No such file or directory when opening file in memory with ZipRuby Zip::File

I'm consuming an api that replies with a zip file in the contents of the body of the http response. I'm able to unzip the file and write each file to disk using the example at the zip-ruby wiki (https://bitbucket.org/winebarrel/zip-ruby/wiki/Home):
Zip::Archive.open('filename.zip') do |ar| # except I'm opening from a string in memory
  ar.each do |zf|
    if zf.directory?
      FileUtils.mkdir_p(zf.name)
    else
      dirname = File.dirname(zf.name)
      FileUtils.mkdir_p(dirname) unless File.exist?(dirname)
      open(zf.name, 'wb') do |f|
        f << zf.read
      end
    end
  end
end
However, I don't want to write the files to disk. Instead, I want to create an ActiveRecord object and set a Paperclip file attachment:
asset = Asset.create(:name => zf.name)
asset.file = open(zf.name, 'r')
asset.save
What's odd is the open statement in the first example that writes the file to disk works consistently. However, when I want to just open the zf (Zip::File) as a generic File in memory, I will sometimes get:
*** Errno::ENOENT Exception: No such file or directory - assets/somefilename.png
How can I assign the Zip::File zipruby creates to the paperclip file without getting this error?

Using fileinput.input() to read gzip files

I'm using fileinput to read some large data:
import gzip
import fileinput
f=gzip.open('/scratch/try.fastq.gz','r')
for line in fileinput.input(f):
    print line
However I got errors like:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/lib/python2.6/fileinput.py", line 253, in next
    line = self.readline()
  File "/share/lib/python2.6/fileinput.py", line 345, in readline
    self._file = open(self._filename, self._mode)
IOError: [Errno 2] No such file or directory: '#HWI-ST150_0129:2:1:13466:2247#0/1\n'
Can't fileinput take a file object as input? If not, how can I use fileinput to handle a gzip file?
thx
Nope, the first argument to fileinput.input should be a list of filenames. What you want can be achieved with
for line in gzip.open('/scratch/try.fastq.gz'):
    print line
fileinput exists to support the idiom where a program reads from a list of files, probably supplied on the command line, or standard input if no files have been specified. If you still want to use it, even though it's useless in your example, you should do
for line in fileinput.input(['/scratch/try.fastq.gz'], openhook=gzip.open):
    print line
As other sources have said, the value for openhook must be a function, but that doesn't mean you can't call a function to return a function. For example, if you want to support multiple different types of incoming files you could write something like this:
import fileinput
import gzip

def get_open_handler(compressed):
    if compressed:
        # mode comes in as 'r' by default, but that means binary to `gzip`
        return lambda file_name, mode: gzip.open(file_name, mode='rt')
    else:
        # the default mode of 'r' means text for `open`
        return open

# get args here
for line in fileinput.input(args.files, openhook=get_open_handler(args.compressed)):
    print(line)
As you can see, we are calling a function from openhook, but that function returns another function. In this case, we are fixing the mode of gzip.open, but we can do anything we want, including using functools.partial to bind some values to a function so that when the default filename and mode get passed to the function assigned to openhook, the function will do what you want.
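The functools.partial variant mentioned above could be sketched as follows (Python 3). The names open_gzip_text and read_gzip_lines are hypothetical. Because fileinput passes the file name and mode positionally, partial is used here to bind an extra encoding argument rather than the mode itself; the wrapper then ignores the incoming mode and forces text mode.

```python
import fileinput
import gzip
from functools import partial


def open_gzip_text(file_name, mode, encoding):
    # fileinput passes mode='r', which gzip would treat as binary,
    # so the incoming mode is ignored and text mode is forced.
    return gzip.open(file_name, mode="rt", encoding=encoding)


def read_gzip_lines(paths, encoding="utf-8"):
    """Return all lines from the gzip files in paths, without trailing newlines."""
    # partial binds encoding; fileinput supplies file_name and mode.
    hook = partial(open_gzip_text, encoding=encoding)
    with fileinput.input(files=paths, openhook=hook) as stream:
        return [line.rstrip("\n") for line in stream]
```

Binding mode directly with partial(gzip.open, mode='rt') would not work here, since the positional mode from fileinput would collide with the bound keyword.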