Why does urllib3 fail to download a list of files if authentication is required and the headers aren't re-created?

*NOTE: I'm posting this for two reasons:
It took me maybe 3-4hrs to figure out a solution (this is my first urllib3 project) and hopefully this will help others who run into this.
I'm curious why urllib3 behaves as described below, as it is (to me anyway) very un-intuitive.*
I'm using urllib3 to first load a list of files and then to download the files that are on the list. The server the files are on requires authentication.
The behavior I ran into is that if I don't re-make the headers before requesting each file through the PoolManager, only the first file downloads correctly. The contents of every subsequent file are an error message from the server saying that authentication failed.
However, if I add a line that regenerates the headers (see the line under the comment in the code snippet below), the download works as expected. Is this intended behavior, and if so, can anyone explain why the headers can't be re-used (all they contain is my username/password, which doesn't change)?
import shutil
import tqdm
import urllib3

http = urllib3.PoolManager(num_pools=10, maxsize=10, block=True)
myHeaders = urllib3.util.make_headers(basic_auth=f'{username}:{password}')
files = http.request('GET', url, headers=myHeaders)
file_list = files.data.decode('utf-8')
file_list = file_list.split('<a href="')
file_list_a = [file.split('">')[0] for file in file_list if file.startswith('https://')]
for path in tqdm.tqdm(file_list_a, desc='Downloading'):
    output_fn = get_output_filename(path, output_dir)
    #___---^^^Re-make headers^^^---___
    myHeaders = urllib3.util.make_headers(basic_auth=f'{username}:{password}')
    with open(output_fn, 'wb') as out:
        r = http.request('GET', path, headers=myHeaders, preload_content=False)
        shutil.copyfileobj(r, out)
Thanks in advance,
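A possible explanation, offered as an assumption rather than a confirmed diagnosis: some urllib3 releases strip the Authorization header in place when a request is redirected to a different host, which would leave the shared dict without credentials for every later request. If that is what is happening here, passing a fresh copy of the headers to each request should have the same effect as re-making them. The sketch below only simulates such a mutation with a hypothetical `request_that_strips_auth` function (it is not urllib3 code), so it runs without a network:

```python
import base64


def make_basic_auth_headers(username, password):
    """Build a Basic auth header dict, similar in shape to make_headers(basic_auth=...)."""
    token = base64.b64encode(f"{username}:{password}".encode("latin-1")).decode("ascii")
    return {"authorization": f"Basic {token}"}


def request_that_strips_auth(headers):
    """Hypothetical stand-in for a client that mutates the dict it is given."""
    headers.pop("authorization", None)


base_headers = make_basic_auth_headers("user", "secret")

# Sharing one dict: the first call removes the credentials for every later call.
shared = dict(base_headers)
request_that_strips_auth(shared)
shared_still_has_auth = "authorization" in shared

# Passing a copy per request: the original dict keeps its credentials.
for _ in range(3):
    request_that_strips_auth(dict(base_headers))
base_still_has_auth = "authorization" in base_headers
```

In the download loop this would correspond to passing `headers=dict(myHeaders)` instead of rebuilding the headers on every iteration.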

Getting the source of a webpage with Selenium

After the browser finishes loading a page, I can right-click on the page, and select 'Save Page As'. There, I get an 'HTML Only' option as well as 'Web Page, Complete' option. Of course, the second option creates a directory (to save all the .js files etc.), but interestingly, the main source file is also different. That is, 'HTML Only' creates a file named (e.g.) site.html, while the 'Complete' option creates site.html as well as a site/ directory. The two site.html files are different. Why is that?
Anyway, I am trying to fetch (with Selenium) the second file; that is, I need to get a file identical to the site.html file saved by the 'Complete' option. It doesn't work. I get a different version of the HTML source file (I use Selenium's page_source property).
If there's a way to get it, in an automated way, without Selenium, I'm also interested.
There's a non-Selenium way of downloading html files using the requests library. Here's a method that will download a file to a folder of your choice:
import os
import requests

def download_file_to(file_url, destination_folder, new_file_name=None):
    if new_file_name:
        file_name = new_file_name
    else:
        file_name = file_url.split("/")[-1]
    r = requests.get(file_url)
    file_path = os.path.join(destination_folder, file_name)
    with open(file_path, "wb") as code:
        code.write(r.content)
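One caveat with the snippet above: `file_url.split("/")[-1]` keeps any query string and yields an empty name for URLs that end in a slash. A minimal sketch of a more defensive helper, assuming only the last path segment is wanted; `filename_from_url` and the `index.html` fallback are hypothetical choices for illustration:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(file_url, default="index.html"):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(file_url).path
    name = posixpath.basename(unquote(path))
    # Fall back to a default when the URL ends in "/" or has no path at all.
    return name or default
```

The result could be used in place of the `split("/")` expression when no `new_file_name` is given.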
The standard way of getting the page source with Selenium is:
driver.page_source
Is that the source you're looking for?

Can Karate read files from outside the classpath with a relative path instead of an absolute path?

I'm trying to read a properties file for karate-config.js. It works when I provide the absolute path on my local machine, but it doesn't work when I provide a relative path. Is there any way around this? Thanks!
var config = karate.read("file:/repo/tests/utils/al_dev.json"); // This doesn't work
var config = karate.read("file:~/repo/tests/utils/al_dev.json"); // This doesn't work
var config = karate.read("file:/Users/user1/IdeaProjects/repo/tests/utils/al_dev.json"); // This works
I was able to get it working. I had to update the path to reflect the project structure and it worked.
var config = karate.read("file:../../utils/al_dev.json");
Project structure:
project1 ->
tests ->
Utils ->
Services ->
client 1 ->
client 2 ->
I took advantage of the answer written by Peter Thomas and it worked for me: I could read a JSON body from a file somewhere else in the project, which did not need to be in the same folder as the features. This is a sample of the code I used:
Scenario: POST with json file reading from anywhere.
  Given path "/api/apitesting/v1/transactions"
  And def projectPath = karate.properties['user.home']
  And def filePath = projectPath + "/IdeaProjects/01 Courses TM/karate/src/test/resources/data/TestBodyAnywhere.json"
  And def requestBody = read('file:' + filePath)
  And request requestBody
  When method post
  Then status 201
As you can see, I used user.home instead of user.dir (which used to be recommended, but it points to the folder you are running from rather than to anything outside it). user.home points to your user's home directory, so it would be something like C:/Users/MyUser. From there you can build the path to your file in a new variable. Finally, remember to use the 'file:' prefix inside the read method and concatenate it with your path variable.
Hope it helps. ;)
Best regards!
Sorry, Karate (Java) can't resolve these special OS paths. I guess you know that this is not recommended, best practice is all test resources be kept under the project root. Anyway, here is a workaround:
* def home = java.lang.System.getProperty('user.home')
* def temp = read('file:' + home + '/repo/tests/utils/al_dev.json')

S3 Boto3 Stubber doesn't have mapping for download file?

Currently writing tests and trying to make use of the Stubber provided by botocore.
I'm trying:
import boto3
from botocore.stub import Stubber

client = boto3.client("s3")
response = {'Body': 'content'}
expected_params = {'Bucket': 'a_bucket_name', 'Key': 'a_path', 'Filename': 'a_target'}
with Stubber(client) as stubber:
    stubber.add_response('download_file', response, expected_params)
    download_file(client, "a_bucket_name", "a_path", "a_target")
Here download_file is my own function that just wraps the client's download_file call. It works in practice.
However, the test fails on the stubber.add_response due to a 'OperationNotFound' error. I stepped through using the debugger, and the issue appears here in the stub API:
if not hasattr(self.client, method):
    raise ValueError(
        "Client %s does not have method: %s"
        % (self.client.meta.service_model.service_name, method))
# Create a successful http response
http_response = AWSResponse(None, 200, {}, None)
operation_name = self.client.meta.method_to_api_mapping.get(method)  # <-- Error here
self._validate_response(operation_name, service_response)
There doesn't seem to be a mapping between the two in the dictionary, is this a failure of the stub API or am I missing something?
I've just found this issue, so looks like for once it really is the library and not me:
https://github.com/boto/botocore/issues/974
That's because download_file and upload_file are customizations which live in boto3. They call out to one or many requests under the hood. Right now there's not a great story for supporting customizations other than recording underlying commands they use and adding them to the stubber. There's an external library that can handle that for you, though we don't support it ourselves.
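Since download_file has no entry in the stubber's operation mapping, one alternative (not taken from the linked issue, just a common testing pattern) is to replace the client with a mock and assert on the call. The sketch below assumes the wrapper has the shape described in the question; its body is a guess:

```python
from unittest import mock


def download_file(client, bucket, key, filename):
    """Assumed shape of the question's wrapper around the client call."""
    client.download_file(Bucket=bucket, Key=key, Filename=filename)


def test_download_file_calls_client():
    # MagicMock stands in for boto3.client("s3"); no AWS access is needed.
    client = mock.MagicMock()
    download_file(client, "a_bucket_name", "a_path", "a_target")
    client.download_file.assert_called_once_with(
        Bucket="a_bucket_name", Key="a_path", Filename="a_target"
    )


test_download_file_calls_client()
```

This checks only that the wrapper forwards its arguments; it does not exercise the transfer logic that download_file runs under the hood.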

Using Leigh version of S3Wrapper.cfc Can't get past Init

I am new to S3 and need to use it for image storage. I found a half dozen versions of an S3Wrapper for CF, but it appears that the only one set up for v4 is one modified by Leigh:
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
I dropped it in the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error :
So I changed the line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh's code and changing things is a bad idea, since I have lurked here for years and know Leigh's code is solid.
Does anyone know if there are any examples of how to use this anywhere? If not, what am I doing wrong? If it makes a difference, I am using Lucee 5 and not Adobe's CF engine.
UPDATE :
I followed Leigh's directions and the error is now gone. I added some more code to my test page, which now looks like this :
<cfscript>
    s3 = createObject('component','com.S3v4').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
    bucket = "imgbkt.domain.com";
    obj = "fake.ping";
    region = "s3-us-west-1";
    test = s3.getObject(bucket,obj,region);
    writeDump(test);
    test2 = s3.getObjectLink(bucket,obj,region);
    writeDump(test2);
    writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj or region I get :
JIC I did go to AWS and get new keys:
Leigh, if you are still around, or anyone who has used one of the S3 wrappers, any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The link I receive from getObjectLink is not valid and getObject never downloads an object. I thought I would try the putObject method
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there is any additional information, I received this :
I did find this article https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html but it is pretty old, and since S3 specifically suggests using dots in bucket names I don't think that it is relevant any longer. There is obviously something I am doing wrong, but I have spent hours trying to resolve this and I can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns an HTTP URL for the file fake.ping in the bucket imgbkt.domain.com in the region s3-us-west-1. This link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of the S3v4.cfc with the filename fake.ping by default. Finally the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will be downloaded completely by the time the function returns the file path.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large, since fileRead/fileReadBinary reads the whole file into memory. Use fileOpen to stream the content.)
Does that help you?

How to set image path for fckeditor?

I am using fckeditor for PHP. I have set an absolute path for image uploading. I can upload images, but I am unable to use images that were uploaded. Can anyone help me find my problem?
Here is the code I have changed in my config.php file:
// Path to user files relative to the document root.
$Config['UserFilesPath'] = '/userfiles/' ;
// Fill the following value it you prefer to specify the absolute path for the
// user files directory. Useful if you are using a virtual directory, symbolic
// link or alias. Examples: 'C:\\MySite\\userfiles\\' or '/root/mysite/userfiles/'.
// Attention: The above 'UserFilesPath' must point to the same directory.
$Config['UserFilesAbsolutePath'] = '/var/www/host/mysite//userfiles/' ;
I just solved this frustrating problem after a full day of searching on Google.
The solution is here. Look for:
Returning Full URLs
You can configure the File Browser to return full URLs to FCKeditor, like "http://www.example.com/userfiles/", instead of absolute URLs, like "/userfiles/". To do that, you must configure the connector, combining the UserFilesPath and UserFilesAbsolutePath settings:
UserFilesPath: include here the full URL for the user files directory. For example, set it to "http://www.example.com/userfiles/".
UserFilesAbsolutePath: include here the server path to reach the above URL directory. For example, in a Windows environment, you could have something like "C:/inetpub/mysite/userfiles/", while on Linux, something like "/usr/me/public_html/mysite/userfiles/".
Just adjust the above settings to your installation values and the File Browser will start returning full URLs to the editor.
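To make the relationship between the two settings concrete, here is a small Python sketch (not FCKeditor's own code) of the mapping they describe: a file is written under the server path and reported to the editor under the URL prefix. The function name `uploaded_file_locations` is hypothetical, and the `image` subfolder is an assumption for illustration:

```python
import posixpath


def uploaded_file_locations(user_files_path, user_files_absolute_path, file_name,
                            resource_type="image"):
    """Return (path on disk, URL handed to the editor) for one uploaded file."""
    disk_path = posixpath.join(user_files_absolute_path, resource_type, file_name)
    url = user_files_path.rstrip("/") + "/" + resource_type + "/" + file_name
    return disk_path, url


disk_path, url = uploaded_file_locations(
    "http://www.example.com/userfiles/",
    "/usr/me/public_html/mysite/userfiles/",
    "myimage.jpg",
)
```

If the two settings point at different directories, the file would be saved in one place and linked from another, which would match the symptom of uploads succeeding while the images fail to display.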
For your localhost :
$Config['UserFilesPath'] = 'http://localhost/mywebsite/userfiles/' ;
$Config['UserFilesAbsolutePath'] = 'C:\\wamp\\www\\mywebsite\\userfiles\\' ;
and in order to get your images from there, use :
$path = 'http://localhost/mywebsite/userfiles/image/myimage.jpg';
Now, For your web server:
$Config['UserFilesPath'] = 'http://localhost/mywebsite/userfiles/' ; // if your webserver named localhost as mine
$Config['UserFilesAbsolutePath'] = '/var/www/vhosts/mywebsite.com/httpdocs/' ;
and the images path remains the same as above.
Check the permission of the folder
Full Subject: FCKeditor 2.x: File/image/video upload in different folders for different applications using a single FCKeditor, by making $Config['UserFilesPath'] fully dynamic in a secure way
It can be done in many ways. I am explaining a process that I applied according to my PHP applications' code structure. I followed the same code structure/framework for different applications, with each application as a sub-folder on my server. So there is a logical need to use one single FCKeditor and configure it so that it works properly for all the applications. The content part of FCKeditor is fine: it can easily be reused by different applications or projects from a single FCKeditor component. The problem arises with file upload (images, videos or any other documents). To make it work for different projects, the files must be uploaded to separate folders for different projects. For that, $Config['UserFilesPath'] must be configured with a dynamic folder path, meaning a different folder path for each project, while calling the same FCKeditor component in the same location. I am explaining several different processes together, step by step. They worked fine for me with FCKeditor version 2.5.1 (VersionBuild 17566), and I hope they will work for others as well. If they do not work for other developers, they may need to tweak the process according to their project code structure, folder write permissions and FCKeditor version.
1) In the fckeditor\editor\filemanager\connectors\php\config.php file
a) Go to the point after global $Config ; and $Config['Enabled'] = false ;
i) There, if you want a session-dependent secure method for a single-site setup only (i.e. one FCKeditor for each project domain or subdomain, not one FCKeditor for multiple projects), place this code:
if(!isset($_SESSION)){
    session_start();
}
if(isset($_SESSION['SESSION_SERVER_RELATIVEPATH']) && $_SESSION['SESSION_SERVER_RELATIVEPATH']!="") {
    $relative_path=$_SESSION['SESSION_SERVER_RELATIVEPATH'];
    include_once($_SERVER['DOCUMENT_ROOT'].$relative_path."configurations/configuration.php");
}
N.B.: Here, $_SESSION['SESSION_SERVER_RELATIVEPATH'] is the relative folder path of the project with respect to the webroot; it should look like "/project/folder/path/". Set this session variable in a common file in your project where the session is started. There should also be a configurations/configuration.php file as the configuration file in your project. If its name or path is different, you have to use the corresponding path here instead of configurations/configuration.php
ii) If you want to use a single FCKeditor component for different projects represented as different sub-folders, in a session-dependent secure way (assuming a different session_name for each project, to differentiate their sessions on a single server), use the code below. It will not work if the projects are represented as sub-domains or different domains; in that case you have to use the session-independent way (iii) provided below (though it is insecure). Place this code:
if(!isset($_SESSION)){
    session_name($_REQUEST['param_project_to_fck']);
    session_start();
}
if(isset($_SESSION['SESSION_SERVER_RELATIVEPATH']) && $_SESSION['SESSION_SERVER_RELATIVEPATH']!="") {
    $relative_path=$_SESSION['SESSION_SERVER_RELATIVEPATH'];
    include_once($_SERVER['DOCUMENT_ROOT'].$relative_path."configurations/configuration.php");
}
Please read N.B. at the end of previous point, i.e. point (i)
iii) If you want to use a single FCKeditor component for different projects represented as sub-folders as well as sub-domains or domains (though it is not fully secure), place this code:
if(isset($_REQUEST['param_project_to_fck']) && $_REQUEST['param_project_to_fck']!=""){ //base64 encoded relative folder path of the project corresponding to the webroot; should be like "/project/folder/path/" before encoding
    $relative_path=base64_decode($_REQUEST['param_project_to_fck']);
    include_once($_SERVER['DOCUMENT_ROOT'].$relative_path."configurations/configuration.php");
}
Please read N.B. at the end of point (i)
b) Now, whichever case you selected, find this code:
// Path to user files relative to the document root.
$Config['UserFilesPath'] = '/userfiles/' ;
and replace it with the following code:
if(isset($SERVER_RELATIVEPATH) && $SERVER_RELATIVEPATH==$relative_path) { //to make it relatively secure, so that hackers cannot automatically create an upload folder on the server using a direct link and cannot upload files there
    $Config['Enabled'] = true ;
    $file_upload_relative_path=$SERVER_RELATIVEPATH;
}else{
    $Config['Enabled'] = false ;
    exit();
}
// Path to user files relative to the document root.
//$Config['UserFilesPath'] = '/userfiles/' ;
//$Config['UserFilesPath'] = $file_upload_relative_path.'userfiles/' ;
$Config['UserFilesPath'] = '/userfiles'.$file_upload_relative_path;
Here $SERVER_RELATIVEPATH is the relative path and it must be set in your project's configuration file included previously.
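The check in step (b) can be modelled with a short Python sketch (the PHP above is the actual method; this is only an illustration of its logic). It shows why option (iii) is described as not fully secure: base64 is an encoding, not a secret, so anyone can craft the parameter, and the comparison against the configured path is what rejects unknown values. `encode_project_path` and `is_upload_allowed` are hypothetical names, and the strict decoding is an assumption that is tighter than PHP's default base64_decode:

```python
import base64


def encode_project_path(relative_path):
    """Encode a project path for the param_project_to_fck URL parameter."""
    return base64.b64encode(relative_path.encode("utf-8")).decode("ascii")


def is_upload_allowed(param_value, configured_relative_path):
    """Decode the request parameter and compare it with the configured path."""
    try:
        decoded = base64.b64decode(param_value, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    return decoded == configured_relative_path
```

For example, `is_upload_allowed(encode_project_path("/project/folder/path/"), "/project/folder/path/")` is true, while any other encoded path is rejected.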
Here you can set $Config['UserFilesPath'] to any other dynamic folder path using the $file_upload_relative_path variable. On my Bluehost Linux server there was a folder permission conflict between the project root folder (0755 permission) and the userfiles folder under it with its subfolders (which should be 0777 as per the FCKeditor code), so uploading files to those folders was not allowed. So I created a userfiles folder at the server webroot (outside the project root folder), set its permission to 0777, and used this $Config setting:
$Config['UserFilesPath'] = '/userfiles'.$file_upload_relative_path;
But, if you have no problem with write permission in the project's subfolders in your case, then you can use the previous line (commented out in the previous code segment):
$Config['UserFilesPath'] = $file_upload_relative_path.'userfiles/' ;
Mind that you must disable the existing $Config['UserFilesPath'] = '/userfiles/' ; in this file, either by replacing it or by commenting it out, if it exists elsewhere in the file.
2) If you choose 1) (a) (ii) or (iii) method then open
(a) fckeditor\editor\filemanager\browser\default\browser.html file.
Search for this line: var sConnUrl = GetUrlParam( 'Connector' ) ;
Put these commands after that line:
var param_project_to_fck = GetUrlParam( 'param_project_to_fck' ) ;
Now, Search for this line: sUrl += '&CurrentFolder=' + encodeURIComponent( this.CurrentFolder ) ;
Put this command after that line:
sUrl += '&param_project_to_fck=' + param_project_to_fck ;
(b) Now, open the fckeditor\editor\filemanager\browser\default\frmupload.html file.
Search for this line (it should be in the SetCurrentFolder() function):
sUrl += '&CurrentFolder=' + encodeURIComponent( folderPath ) ;
Put this command after that line:
sUrl += '&param_project_to_fck='+window.parent.param_project_to_fck;
3) Now where you want to show the FCKeditor in your project, you have to put those lines first in the corresponding php file/page:
include_once(Absolute/Folder/path/for/FCKeditor/."fckeditor/fckeditor.php") ;
$oFCKeditor = new FCKeditor(Field_name_for_editor_content_area) ;
$oFCKeditor->BasePath = http_full_path_for_FCKeditor_location.'fckeditor/' ;
$oFCKeditor->Height = 400;
$oFCKeditor->Width = 600;
$oFCKeditor->Value =Your_desired_content_to_show_in_editor;
$oFCKeditor->Create() ;
a) Now, if you choose 1) (a) (ii) or (iii) method then place the following code segment before that line: $oFCKeditor->Create() ;
$oFCKeditor->Config["LinkBrowserURL"] = ($oFCKeditor->BasePath)."editor/filemanager/browser/default/browser.html?Connector=../../connectors/php/connector.php&param_project_to_fck=".base64_encode($SERVER_RELATIVEPATH);
$oFCKeditor->Config["ImageBrowserURL"] = ($oFCKeditor->BasePath)."editor/filemanager/browser/default/browser.html?Type=Image&Connector=../../connectors/php/connector.php&param_project_to_fck=".base64_encode($SERVER_RELATIVEPATH);
$oFCKeditor->Config["FlashBrowserURL"] = ($oFCKeditor->BasePath)."editor/filemanager/browser/default/browser.html?Type=Flash&Connector=../../connectors/php/connector.php&param_project_to_fck=".base64_encode($SERVER_RELATIVEPATH);
b) If you chose the 1) (a) (ii) method, then in the above code segment, just replace every occurrence of base64_encode($SERVER_RELATIVEPATH) with base64_encode(session_name())
And you are done.