Wget file format - pdf

I have to download all of a site's content and then search the downloaded folder for "*.pdf" files. I am downloading the site using wget -r --no-parent http://www.example.com/ but the problem is that sometimes a link looks like this:
http://www.foodmanufuture.eu/dpubs?f=K20
and the PDF is saved under the name "dpubs?f=K20" with no file extension, rather than "dpubs?f=K20.pdf". Is there a way to check how many PDF files I have in this folder?

Have you tried the --content-disposition flag? From the man page:
If this is set to on, experimental (not fully-functional) support for "Content-Disposition" headers is enabled. This can currently result in extra round-trips to the server for a "HEAD" request, and is known to suffer from a few bugs, which is why it is not currently enabled by default. This option is useful for some file-downloading CGI programs that use "Content-Disposition" headers to describe what the name of a downloaded file should be.
So it tries to ask the server for a filename. I tried it for the URL you gave and it seemed to work.

You could use the file command, which identifies a file by its content rather than its name:
file filename
Like this:
file pdfurl-guide
pdfurl-guide: PDF document, version 1.5
You could use:
file *
to find out exactly which files in your folder are PDF files.
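
The same content-based check can be scripted to get a count. The following is a minimal sketch in Python, assuming a file counts as a PDF when it starts with the "%PDF-" signature; the names is_pdf and find_pdfs are made up for illustration, and unlike file * it walks subdirectories as well.

```python
from pathlib import Path

# Signature that a well-formed PDF file starts with.
PDF_MAGIC = b"%PDF-"


def is_pdf(path):
    """Return True if the file at `path` begins with the PDF signature."""
    try:
        with Path(path).open("rb") as fh:
            return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
    except OSError:
        # Unreadable files are simply not counted.
        return False


def find_pdfs(root):
    """Recursively collect regular files under `root` that look like PDFs."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_pdf(p))
```

Calling len(find_pdfs("some/download/folder")) would then give the number of PDF files regardless of how they were named.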

CMake FetchContent with HTTP redirection

Trying to download a public file from SharePoint via CMake's FetchContent.
My URL is like the following:
https://myorg.sharepoint.com/:u:/s/myfolder/EdajXJq3IV5HrSs9bKhEFoYByaMZHBYHyftA9GKLAGZ5wA?e=QPdu1N&Download=1
Note I added &Download=1 to the path given by SharePoint to access the file directly. However, my link gets redirected every time I use it. I'm able to download the file using wget and curl:
curl -v -L --cookie tmp.cookie 'https://link.from.above' --output myfile.txt
wget 'https://link.from.above'
Now I am trying to do the same using CMake:
FetchContent_Declare(${MY_TARGET}
  URL ${FILE_URL}
)
But that doesn't work. I guess it has something to do with redirection / cookies.
Same problem here. We are currently using the following workaround (which is far from perfect):
1) Check if the file exists in a specific location in the source directory
2) If it does not exist, download the file using curl
3) Pass the path to the downloaded file to FetchContent_Declare
If anyone has better ideas, I would like to hear them!
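
The download step of that workaround could also be scripted. The following is a minimal sketch in Python rather than curl, assuming the redirect only needs cookies to be kept between requests; fetch_if_missing is a hypothetical helper name, and this has not been tried against SharePoint.

```python
import shutil
from http.cookiejar import CookieJar
from pathlib import Path
from urllib.request import HTTPCookieProcessor, build_opener


def fetch_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        # Check for the file first, as in the workaround.
        return dest
    # urllib follows redirects by default; the cookie jar keeps any cookies
    # set along the way, roughly like `curl -L --cookie`.
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with opener.open(url) as resp, tmp.open("wb") as out:
        shutil.copyfileobj(resp, out)
    # Rename only after a complete download so a partial file is never reused.
    tmp.replace(dest)
    return dest
```

The returned path is what would then be handed to FetchContent_Declare as the URL.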

lessc Option --source-map-rootpath seems not to work

I use lessc 2.7.3. I generate CSS files via a makefile and use the following paths:
The makefile is in themes/bodensee
The CSS is generated in themes/bodensee/css
The Less files are in themes/bodensee/less
The maps are in the same folder as the CSS files.
My problem is that the map URL in the CSS files is missing the themes/bodensee path, so requests for the css.map files fail with "file not found".
lessc -s less/wlb.less --clean-css="--s0 --advanced" --source-map-rootpath=themes/bodensee/ --source-map="css/wlb.css.map" css/wlb.css
The CSS file now contains sourceMappingURL=css/wlb.css.map. The rootpath does not have any effect.
I also tried a made-up rootpath and searched for it in the file; it does not appear anywhere. But the option name is correct: when I misspell the option, LESS reports an error.
What am I missing?
From the documentation of the --source-map-rootpath option:
Specifies a rootpath that should be prepended to each of the less file paths inside the sourcemap and also to the path to the map file specified in your output css.
Because the basepath defaults to the directory of the input less file, the rootpath defaults to the path from the sourcemap output file to the base directory of the input less file.
Use this option if for instance you have a css file generated in the root on your web server but have your source less/css/map files in a different folder. So for the option above you might have
The problem was indeed related to the Clean-CSS plugin.
I now call
lessc --source-map --clean-css="--s0 --advanced" -s less/wlb.less css/wlb.css
which works.
There is a standalone clean-css program, but that does not generate sources for the Less files. It's not clear if the lessc plugin and the standalone tool are the same or different implementations but both use node.
The standalone cleancss tool removes the source map URL generated by lessc by default (I did not play around with the dozens of options).
These Node tools develop very fast, and manuals and tutorials are often outdated. That's why my makefile stopped working. Developers of these tools should really consider leaving working parameters and features alone and keeping their code compatible.
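
Since the behaviour of these options seems to shift between versions, it may help to check what a build actually wrote. The following is a minimal sketch in Python (not part of the original makefile; source_mapping_url is a made-up name) that pulls the sourceMappingURL comment out of generated CSS text, assuming the usual /*# sourceMappingURL=... */ comment form.

```python
import re

# Matches the source map comment, in both the current "#" and the older "@" form.
_SOURCE_MAP_RE = re.compile(r"/\*[#@]\s*sourceMappingURL=(\S+?)\s*\*/")


def source_mapping_url(css_text):
    """Return the last sourceMappingURL found in `css_text`, or None."""
    matches = _SOURCE_MAP_RE.findall(css_text)
    return matches[-1] if matches else None
```

Reading css/wlb.css and passing its text to this function shows whether the rootpath ended up in the URL, or whether a later minification step removed the comment altogether.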

PHP extension not loaded

Using a .user.ini file containing extension=geoip.so (or mysqli.so), I'm trying unsuccessfully to load the relevant module: it never shows up in the phpinfo() page of PHP 7.1 (or even PHP 5.4).
1) The .user.ini file is working correctly because I'm able to modify the variable memory_limit.
2) The phpinfo() function correctly shows the extension_dir folder containing .so extensions that I want to load (in the php.ini file this variable is not present, however).
3) The PHP error log contains no message.
Any suggestion is welcome.
The .user.ini files can only set certain PHP ini settings. It just so happens that the extension setting is not one of them. In fact, according to the manual, the extension setting is only valid in the core php.ini file. So put the extension=geoip.so in your main php.ini file.
As a side note: I use Ubuntu/Debian for most of what I do with PHP. The standard PHP distro that is available through the Debian package archives has extra code compiled into it that allows for a distributed configuration. The way this works is the SAPI module scans a conf.d directory and includes any ini files. Typically when you package an external PHP extension for Debian (which I might add is a pain - I've done it for my own extensions) you include a little ini file that includes the extension (e.g. extension=myext.so). The package installs it in the distributed config directory and it is included into the php.ini file when PHP spins up. Perhaps you meant to install a Debian-based config like this?
Another side note: Since you are probably using a CGI SAPI and might want different sites to load different modules (exclusively), you could perhaps look into getting the Web server to point the CGI PHP at a different php.ini file. I'm just presuming you want to achieve something like this. However loading modules for certain directories using .user.ini files is just not possible.
Try disabling or reconfiguring SELinux, and check the SELinux audit log.

How to download a file through a link with its extension type with scrapy

I am using scrapy to scrape a website and I can download the file from the page; however, everything that is downloaded is saved as a plain text file. How do I download it with its extension type? I am downloading scripts, so having the proper extension on my download is necessary.
For example, if I am downloading exploits from exploit-db, the page I go to in order to download them would be, for example: https://www.exploit-db.com/exploits/19832/
and the link I would extract from there to download from is https://www.exploit-db.com/download/19832 which will, if I click on it normally, download a Ruby file. But through scrapy it gets saved as a text file. Is there a way to download it as a .rb file through scrapy?
Just save it as filename.rb. All files are text/binary files. Extension is there just to tell your operating system what to use to understand that file.
(In some operating systems extension isn't even required since files have headers at the beginning of the file telling what they are)
You can try this:
scrapy shell https://www.exploit-db.com/download/19832
Then in the shell or your spider just do:
with open('ruby_file.rb', 'wb') as ruby_file:
    ruby_file.write(response.body)
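
If hard-coding the name is not practical, the name the browser uses usually comes from the Content-Disposition response header. The following is a minimal sketch, assuming the server sends that header (not verified for exploit-db); filename_from_content_disposition is a made-up helper name, and it relies only on the standard library.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, fallback):
    """Return the filename named in a Content-Disposition header, else `fallback`."""
    if not header_value:
        return fallback
    if isinstance(header_value, bytes):
        # Scrapy exposes header values as bytes.
        header_value = header_value.decode("latin-1")
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    # Keep only the last path component so a hostile header cannot
    # point outside the current directory.
    name = os.path.basename(name) if name else ""
    return name or fallback
```

In the shell or a spider callback this could be used as filename_from_content_disposition(response.headers.get('Content-Disposition'), 'ruby_file.rb'), with the result passed to open() in place of the fixed name.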

Trying to create .htaccess file using ftp doesn't work

I want to create a .htaccess file in a specific directory. I'm using Notepad++ and their plug-in for FTP (NppFTP). I'm able to create any other files and see them in the folder but when I try to create a .htaccess I don't see that file in the directory. I get no errors, it is like nothing happened.
I tried to create this file using an FTP program; it showed the file and right away it disappeared. My guess is that this is because it is a special file used by the system and prefixed by a dot (.).
What is a way to edit that file?
This is probably because your ".htaccess" file is a hidden file and your system is set up not to display hidden files.
Have a look in your Notepad++ settings to see if there's an option to make hidden files visible.
In addition to that, check the Windows folder options for the same setting!
Switch on showing hidden files in your FTP client.