wait until the blob storoage folder is created - pandas

I would like to download a picture into a blob folder.
Before that I need to create the folder first.
Below codes are what I am doing.
The issue is the folder needs time to be created.
When it comes to with open(abs_file_name, "wb") as f:
it can not find the folder.
I am wondering whether there is an 'await' to get to know the completion of the folder creation, then do the write operation.
for index, row in data.iterrows():
url = row['Creatives']
file_name = url.split('/')[-1]
r = requests.get(url)
abs_file_name = lake_root + file_name
dbutils.fs.mkdirs(abs_file_name)
if r.status_code == 200:
with open(abs_file_name, "wb") as f:
f.write(r.content)

The final sub folder will not be created when using dbutils.fs.mkdirs() on blob storage.
It creates a file with the final sub folder name which would be considered as a directory, but it is not a directory. Look at the following demonstration:
dbutils.fs.mkdirs('/mnt/repro/s1/s2/s3.csv')
When I try to open this file, the error says that this is a directory.
This might be the issue with the code. So, try using the following code instead:
for index, row in data.iterrows():
url = row['Creatives']
file_name = url.split('/')[-1]
r = requests.get(url)
abs_file_name = lake_root + 'fail' #creates the fake directory (to counter the problem we are facing above)
dbutils.fs.mkdirs(abs_file_name)
if r.status_code == 200:
with open(lake_root + file_name, "wb") as f:
f.write(r.content)

Related

Getting the root directory of a file in Google Drive using API

I need to get the full path of folders where a file is located in Google Drive. I'm getting the files themselves using the Google Drive API, but I need information about it's parent folders
I'm using the following code tothe the list of spreadsheets in a Shared Drive:
from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools
# Change the value of SCOPES to 'https://www.googleapis.com/auth/drive'
# if you want to be able to read and write to the user's Google Drive.
SCOPES = 'https://www.googleapis.com/auth/drive'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
creds = tools.run_flow(flow, store)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
folder_id = "1Z1GzY-D3I3qwQu3oxIW-L1a9nXgD0PXl"
query = "mimeType='application/vnd.google-apps.spreadsheet'"
query+= "and fullText contains 'CLAS' and trashed = false"
# query += " and parents in '" + folder_id + "'"
spreadsheets = []
# Initialize the page token
next_page_token = None
# Loop until all pages of results have been retrieved
while True:
# Execute the list request
response = DRIVE.files().list(
q=query,
corpora='drive',
includeItemsFromAllDrives=True,
driveId='0AEJNMySKcEzsUk9PVA',
supportsAllDrives=True,
# orderBy='folder',
pageSize=1000,
fields='nextPageToken, files(id, name, parents, mimeType, webViewLink)',
pageToken=next_page_token,
).execute()
# Append the results to the list
spreadsheets.extend(response.get('files', []))
# Check if there is another page of results
next_page_token = response.get('nextPageToken', None)
if next_page_token is None:
break
# Set the page token for the next iteration
# parameters['pageToken'] = next_page_token
# Print the number of results
print(f'Last spreadsheet found: {spreadsheets[-1]["name"]}. Number of spreadsheets: {len(spreadsheets)}')
This returns a list of dictionaries with the specified fields. I would like to know the names of the parent folders for each file, for which I'm trying:
from googleapiclient.errors import HttpError
for item in spreadsheets:
if 'parents' in item:
parent_folders_list = []
parent_id = item['parents'][0]
try:
while parent_id:
folder=DRIVE.files().get(fileId=parent_id, fields='name, id, parents').execute()
parent_folders_list.append(folder.get("parents", []))
if parent_id:
parent_id = parent_id[0]
except HttpError as error:
print('An error occurred: %s' % error)
print(f'{item["name"]} is in {parent_folders_list}')
And I've been able to identify that parent_id is correctly retrieved, and that I am able to access it, as I was able to open it in the browser. However, I get back errors 'File Not Found' for all parent_id. I wonder if the DRIVE.files().get(fileId=) is the correct way to get back a folder using the API.
Any help would be greatly appreciated.

WDIO-test reads the file incorrectly

I am writing a test that needs to read a file. I wrote the following function:
upload_file = (file_name, input_class) ->
await browser.executeScript("window.document.getElementsByClassName('" +
input_class + "')[0].style.display = 'block'", [])
file_path = path.join(FILE_PATH, file_name)
await $('input.' + input_class).then((res) ->
return res.setValue(file_path)
)
However, the test fails to test the script
In the code that is executed after the file is uploaded to the site, I output the file that was read. All information is correct except file size which is 0
Please help me

Dropbox - Automatic Refresh token Using oauth 2.0 with offlineaccess

I now: the automatic token refreshing is not a new topic.
This is the use case that generate my problem: let's say that we want extract data from Dropbox. Below you can find the code: for the first time works perfectly: in fact 1) the user goes to the generated link; 2) after allow the app coping and pasting the authorization code in the input box.
The problem arise when some hours after the user wants to do the same operation. How to avoid or by-pass the newly generation of authorization code and go straight to the operation?enter code here
As you can see in the code in a short period is possible reinject the auth code inside the code (commented in the code). But after 1 hour or more this is not loger possible.
Any help is welcome.
#!/usr/bin/env python3
import dropbox
from dropbox import DropboxOAuth2FlowNoRedirect
'''
Populate your app key in order to run this locally
'''
APP_KEY = ""
auth_flow = DropboxOAuth2FlowNoRedirect(APP_KEY, use_pkce=True, token_access_type='offline')
target='/DVR/DVR/'
authorize_url = auth_flow.start()
print("1. Go to: " + authorize_url)
print("2. Click \"Allow\" (you might have to log in first).")
print("3. Copy the authorization code.")
auth_code = input("Enter the authorization code here: ").strip()
#auth_code="3NIcPps_UxAAAAAAAAAEin1sp5jUjrErQ6787_RUbJU"
try:
oauth_result = auth_flow.finish(auth_code)
except Exception as e:
print('Error: %s' % (e,))
exit(1)
with dropbox.Dropbox(oauth2_refresh_token=oauth_result.refresh_token, app_key=APP_KEY) as dbx:
dbx.users_get_current_account()
print("Successfully set up client!")
for entry in dbx.files_list_folder(target).entries:
print(entry.name)
def dropbox_list_files(path):
try:
files = dbx.files_list_folder(path).entries
files_list = []
for file in files:
if isinstance(file, dropbox.files.FileMetadata):
metadata = {
'name': file.name,
'path_display': file.path_display,
'client_modified': file.client_modified,
'server_modified': file.server_modified
}
files_list.append(metadata)
df = pd.DataFrame.from_records(files_list)
return df.sort_values(by='server_modified', ascending=False)
except Exception as e:
print('Error getting list of files from Dropbox: ' + str(e))
#function to get the list of files in a folder
def create_links(target, csvfile):
filesList = []
print("creating links for folder " + target)
files = dbx.files_list_folder('/'+target)
filesList.extend(files.entries)
print(len(files.entries))
while(files.has_more == True) :
files = dbx.files_list_folder_continue(files.cursor)
filesList.extend(files.entries)
print(len(files.entries))
for file in filesList :
if (isinstance(file, dropbox.files.FileMetadata)) :
filename = file.name + ',' + file.path_display + ',' + str(file.size) + ','
link_data = dbx.sharing_create_shared_link(file.path_lower)
filename += link_data.url + '\n'
csvfile.write(filename)
print(file.name)
else :
create_links(target+'/'+file.name, csvfile)
#create links for all files in the folder belgeler
create_links(target, open('links.csv', 'w', encoding='utf-8'))
listing = dbx.files_list_folder(target)
#todo: add implementation for files_list_folder_continue
for entry in listing.entries:
if entry.name.endswith(".pdf"):
# note: this simple implementation only works for files in the root of the folder
res = dbx.sharing_get_shared_links(
target + entry.name)
#f.write(res.content)
print('\r', res)

xhtml2pdf in django - issue link_callback

I am trying to generate a pdf from an html-template in django. Therefore i use pretty much the basic methods found in the web. The issue is that in can get it to generate the content of my template, but not the images from my media directory. I always get the following error:
SuspiciousFileOperation at /manager/createpdf/
The joined path is located outside of the base path component
Since i can get some result, i assume that nothing is wrong with my view. Here is my render_to_pdf
def render_to_pdf(template_src, context_dict):
template = get_template(template_src)
html = template.render(context_dict)
result = BytesIO()
pdf = pisa.pisaDocument(BytesIO(html.encode('UTF-8')), result, link_callback=link_callback)
if not pdf.err:
return HttpResponse(result.getvalue(), content_type='application/pdf')
return None
and the link_callback:
def link_callback(uri, rel):
result = finders.find(uri)
if result:
if not isinstance(result, (list, tuple)):
result = [result]
result = list(os.path.realpath(path) for path in result)
path=result[0]
else:
sUrl = settings.STATIC_URL
sRoot = settings.STATIC_ROOT
mUrl = settings.MEDIA_URL
mRoot = settings.MEDIA_ROOT
if uri.startswith(mUrl):
path = os.path.join(mRoot, uri.replace(mUrl, ""))
elif uri.startswith(sUrl):
path = os.path.join(sRoot, uri.replace(sUrl, ""))
else:
return uri
# make sure that file exists
if not os.path.isfile(path):
raise Exception( 'media URI must start with %s or %s' % (sUrl, mUrl))
return path
I am pretty much sure that the link_callback doesnt do it's purpose. But my knowledge is to little to patch it. I also assume that i configured the media directory correctly. I can access the media files in other views/templates.
Help is very appreciated, since i spend quiet some hours on this issue... A big thx to all the are going to contribute here!
OK, i found it! In the finders.py i checked the find-method. It turns out that the find method only looks for files in the static directory and disregards the media directory. i just deleted all the lins and it works now. Here is the code:
def link_callback(uri, rel):
sUrl = settings.STATIC_URL
sRoot = settings.STATIC_ROOT
mUrl = settings.MEDIA_URL
mRoot = settings.MEDIA_ROOT
if uri.startswith(mUrl):
path = os.path.join(mRoot, uri.replace(mUrl, ""))
elif uri.startswith(sUrl):
path = os.path.join(sRoot, uri.replace(sUrl, ""))
else:
return uri
if not os.path.isfile(path):
raise Exception( 'media URI must start with %s or %s' % (sUrl, mUrl))
return path

Download File from s3 using Boto via proxy server

I have a file to this address:
http://s3.amazonaws.com/bucket-name/sdile_pr_2_1_1/pr/0/2/1/1/dile_0_2_1_1.nc
in a s3 bucket, that i want to make accessible via a flask app.
to do so i created a function that looks like this:
#app.route('/select/dile')
def select_dile_by_uri():
uri=request.args.get('uri')
if uri is not None:
if uri.startswith("http://s3.amazonaws.com/"):
path = uri.replace("http://s3.amazonaws.com/","")
bname, kstr = path.split("/",1) # split the bname from the key string
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
try:
bucket = conn.get_bucket(bname)
except:
print "BUCKET NOT FOUND"
return str("ERROR: bucket "+bname+" not found")
else:
print "BUCKET CONNECTED"
try:
key = bucket.get_key(kstr)
print "KEY: ", key
except:
print "KEY NOT FOUND"
return str("ERROR: key "+kstr+"not found")
else:
try:
key.open_read() # opens the file
headers = dict(key.resp.getheaders()) # request the headers
return Response(key, headers=headers) # return a response
except S3ResponseError as e:
return Response(e.body, status=e.status, headers=key.resp.getheaders())
abort(400)
the download works, but the name of the downloaded file appears to be only "dile" instead of dile_0_2_1_1.nc .
How come ? is there something i needed to set?
what i needed to do was add a field into the headers, specifically:
headers["Content-Disposition"] = "inline; filename=myfilename"
where -myfilename- is the name you want the file to have.