xmllint doesn't work with https - warning: failed to load external entity

xmllint works fine with http://somesite.xml, but it doesn't work with https://somesite.xml:
xmllint https://somesite.xml
warning: failed to load external entity "https://somesite.xml"

As a workaround, you could use another utility like curl or wget to download the file first, then pipe it to xmllint.
curl --silent "https://somesite.xml" | xmllint -
Notes:
Use - ("hyphen/minus") for xmllint's filename argument to get its XML input from the standard input stream instead of from a file or URL.
You might want to use --silent (-s) to suppress curl progress/error messages, to prevent those from being parsed by xmllint.
Quotes might be required around the URL if it contains special characters.
This should work for xmllint's XML input over HTTPS, but I'm not sure about a DTD or schema referenced over HTTPS; you might need to download that to a local file first with a separate curl or wget command.
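For example, to validate against a schema that is itself served over HTTPS, a rough sketch (the schema URL here is a made-up placeholder) is to fetch the schema to a local file first and point xmllint at it:
curl --silent -o schema.xsd "https://somesite/schema.xsd"
curl --silent "https://somesite.xml" | xmllint --noout --schema schema.xsd -
For a DTD, xmllint's --dtdvalid option can be used the same way with a local file.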

Error - ERROR 1: Unable to find driver `‘ESRI'. while running ogr2ogr command

I have installed gdal
$ conda install -c esri gdal
And then tried to run the command to merge 2 shapefiles
$ ogr2ogr -f ‘ESRI Shapefile’ n4600e00800_30.tif_highlight-1.shp n4600e00900_30.tif_highlight-1.shp
but I am getting the error:
ERROR 1: Unable to find driver `‘ESRI'.
I'm not sure if the driver needs to be installed separately, as I couldn't find much on this error.
The ‘...’ quotes are the problem. Try double quotes: -f "ESRI Shapefile"
I'm not sure this is going to do what you expect. ogr2ogr converts between formats, it doesn't merge files. In your case it will result in a shapefile called n4600e00800_30.tif_highlight-1.shp that is identical to n4600e00900_30.tif_highlight-1.shp. Transformations between the same format are generally only useful if you add a filter, are changing the coordinate reference system, etc.
What you are probably looking for is ogrmerge.py, which should be callable from your CLI after installing GDAL. Your command would then look like this:
ogrmerge.py -o merged.shp n4600e00800_30.tif_highlight-1.shp n4600e00900_30.tif_highlight-1.shp
You can also add -f "ESRI Shapefile" to ensure that it writes the correct format, but it will guess the format from the extension; with .shp it's a pretty safe bet that it will get it right.
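If you want the features of both inputs appended into a single output layer rather than one layer per input, ogrmerge.py also has a -single option; a sketch using the same filenames as above:
ogrmerge.py -single -f "ESRI Shapefile" -o merged.shp n4600e00800_30.tif_highlight-1.shp n4600e00900_30.tif_highlight-1.shp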

Side Runner argument for 'binary' not working

I am trying to specify the binary path to a specific browser location as described on this website: https://www.selenium.dev/selenium-ide/docs/en/introduction/command-line-runner. However, the example just does not work. Running the command (with the proper path)
selenium-side-runner -c "goog:chromeOptions.binary='/path/to/non-standard/Chrome/install'"
Generates an error:
TypeError: Target browser must be a string, but is <undefined>; did you forget to call forBrowser()?
Any ideas as to what is going on here?
I had this error too! That section of documentation may be outdated. I resolved the error by using additional params instead.
Here is the format:
selenium-side-runner --browserName=chrome --chromeOptions="binary='/path/to/non-standard/Chrome/install'"
And here is my example:
selenium-side-runner --browserName=chrome --chromeOptions="binary='C:\Program Files (x86)\Google\Chrome\Application'" sitegrammarstorevariable.side
I have an Applications folder in my $HOME directory on the Mac and this worked for me:
eval selenium-side-runner -c \"browserName=chrome goog:chromeOptions.binary=\'$HOME/Applications/Google/Google Chrome.app/Contents/MacOS/Google Chrome\'\" *.side
eval is used to generalize the command to any user's home directory.
You have to set both the browser name and the path to the binary this way to make it function properly.

scp fails with "protocol error: filename does not match request"

I have a script that uses SCP to pull a file from a remote Linux host on AWS. After running the same code nightly for about 6 months without issue, it started failing today with protocol error: filename does not match request. I reproduced the issue on some simpler filenames below:
$ scp -i $IDENT $HOST_AND_DIR/"foobar" .
# the file is copied successfully
$ scp -i $IDENT $HOST_AND_DIR/"'foobar'" .
protocol error: filename does not match request
# used to work, i swear...
$ scp -i $IDENT $HOST_AND_DIR/"'foobarbaz'" .
scp: /home/user_redacted/foobarbaz: No such file or directory
# less surprising...
The reason for my single quotes was that I was grabbing a file with spaces in the name originally. To deal with the spaces, I had done $HOST_AND_DIR/"'foo bar'" for many months, but starting today, it would only accept $HOST_AND_DIR/"foo\ bar". So, my issue is fixed, but I'm still curious about what's going on.
I Googled the error message, but I don't see any real mentions of it, which surprises me.
Both hosts involved have OpenSSL 1.0.2g in the output of ssh -v localhost, and bash --version says GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
Any ideas?
I ended up having a look through the source code and found the commit where this error is thrown:
GitHub Commit
remote->local directory copies satisfy the wildcard specified by the
user.
This checking provides some protection against a malicious server
sending unexpected filenames, but it comes at a risk of rejecting
wanted files due to differences between client and server wildcard
expansion rules.
For this reason, this also adds a new -T flag to disable the check.
They have added a new -T flag that disables this check, so the old behaviour is still available. However, it's still worth finding out why the filenames we're using are being flagged as restricted.
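For example, reusing the command from the question, the single-quoted form should be accepted again with -T (only do this if you trust the remote server):
scp -T -i $IDENT $HOST_AND_DIR/"'foo bar'" .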
In my case, I had [] characters in the filename that needed to be escaped using one of the options listed here. For example:
scp USERNAME@IP_ADDR:"/tmp/foo\[bar\].txt" /tmp

wget downloading only PDFs from website

I am trying to download all PDFs from http://www.fayette-pva.com/.
I believe the problem is that when hovering over the link to download the PDF, Chrome shows the URL in the bottom left-hand corner without a .pdf file extension. I saw and used another forum answer for a similar case, but there the URLs did end in .pdf when hovering over the PDF links. I have tried the same code that is in the link below, but it doesn't pick up the PDF files.
Here is the code I have been testing with:
wget --no-directories -e robots=off -A.pdf -r -l1 \
http://www.fayette-pva.com/sales-reports/salesreport03-feb-09feb2015/
I am using this on a single page which I know has a PDF on it.
The complete command should be something like
wget --no-directories -e robots=off -A.pdf -r http://www.fayette-pva.com/
Related answer: WGET problem downloading pdfs from website
I am not sure whether downloading the entire website would work, or whether it would take forever. How do I get around this and download only the PDFs?
Yes, the problem is precisely what you stated: The URLs do not contain regular or absolute filenames, but are calls to a script/servlet/... which hands out the actual files.
The solution is to use the --content-disposition option, which tells wget to honor the Content-Disposition field in the HTTP response, which carries the actual filename:
HTTP/1.1 200 OK
(...)
Content-Disposition: attachment; filename="SalesIndexThru09Feb2015.pdf"
(...)
Connection: close
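If you want to confirm that a given download link really sends this header, you could check it with curl (the URL below is just a placeholder for one of the report links):
curl -s -D - -o /dev/null "http://www.fayette-pva.com/some-report-link" | grep -i content-disposition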
This option is supported in wget at least since version 1.11.4, which is already 7 years old.
So you would do the following:
wget --no-directories --content-disposition -e robots=off -A.pdf -r \
http://www.fayette-pva.com/
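To verify the behaviour on the single page from the question before crawling the whole site, the same flags can be combined with -l1:
wget --no-directories --content-disposition -e robots=off -A.pdf -r -l1 \
http://www.fayette-pva.com/sales-reports/salesreport03-feb-09feb2015/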

How to download CSV file with poltergeist using Capybara on phantomjs?

For an integration test, I need to download a CSV file using the Poltergeist driver with Capybara. In Selenium (for example the Firefox/Chrome webdriver), I can specify a download directory and it works fine. But with Poltergeist, is there a way to specify the download directory or any special configuration? Basically, I need to know how downloading works with Poltergeist, Capybara, and PhantomJS.
I can read the server response headers as a Hash using Ruby, but I cannot read the response body to get the file content. Any clue or help, please?
Finally, I solved the download part by simply using curl inside the Ruby code, without using any webdriver. The idea is simple: first I submitted the login form via curl and saved the cookie, then I submitted (again via curl) the CSV export form using the saved cookie, like this:
post_data = "p1=d1&p2=d2&p3=d3"
`curl -c cookie.txt -d "userName=USERNAME&password=PASSWORD" LOGIN_SUBMIT_URL`
csv_data = `curl -X POST -b cookie.txt -d '#{post_data}' SUBMIT_URL_FOR_DOWNLOAD_CSV`
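If you don't need the CSV in a Ruby variable, the same two-step flow also works directly from the shell; -o saves the response body to a file (the URLs remain placeholders from the snippet above):
curl -c cookie.txt -d "userName=USERNAME&password=PASSWORD" LOGIN_SUBMIT_URL
curl -X POST -b cookie.txt -d "p1=d1&p2=d2&p3=d3" -o report.csv SUBMIT_URL_FOR_DOWNLOAD_CSV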