Make Indexed File Downloadable In Apache Solr

I am trying to index a PDF file in Solr, which I have done successfully using the command
curl "http://localhost:8983/solr/update/extract?literal.id=id&commit=true" -F "myfile=@filename.pdf"
I am able to see the file contents and search, but when I try to click on the file name it shows
HTTP ERROR 404
Problem accessing /solr/collection1/id. Reason:
not found
What I want is a link which allows downloading the file. I know Solr merely indexes the file and stores its contents. I was wondering if there is a way I can add an attribute such as the file location, as you have done, and proceed from there. Can you please share what you have done? If you need any more clarity regarding my problem, do ask.

We have the actual files hosted through a separate web application, so they can be downloaded with auditing and additional security.
You can always host these files directly through an HTTP server.
If the file names match the ids, it is as easy as appending id.extension to the fixed HTTP hosting URL.
Otherwise, index the path of the file with an additional parameter, e.g. literal.url.
url will then be a Solr field which is returned with the Solr response.
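For example, a minimal sketch of such an indexing call, assuming your schema has a stored string field named url (the field name and hosting URL here are placeholders):
curl "http://localhost:8983/solr/update/extract?literal.id=id&literal.url=http://files.example.com/id.pdf&commit=true" -F "myfile=@filename.pdf"
The url field then comes back with each matching document, and the front end can render it as a download link.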

Related

Can I transfer images between Shopify sites?

I'm doing some work for a client who has an existing Shopify website. They want to make some big changes to the site, so I have set up a new development site in Shopify, exported all of the products/pages/blog posts to it, and am now working on getting all the new functionality/design working on the dev site.
Once the new build is finished, though, I want to transfer everything back over to their current site. Products/pages/blog posts will be fine (I've written a custom export/import tool using their API), but what about images?
I am uploading lots of images to the dev site and I am worried they will be deleted when development is finished and I shut down the dev site. Is it possible to transfer images from one site to another?
Ideally I would keep the same URLs on Shopify's CDN when doing so, although if I have to change the URLs, I can probably do an automated replace on the CSV files that will get uploaded.
There are going to be hundreds of images involved, and they will be used in various places throughout the site, including in the rich text areas of pages/blogs, so it's not going to be practical to do this manually; it must be something I can automate.
Thanks for any help.
When you export products as a CSV, you get links to your images. You could write a script to download each of the images in the CSV. Just redirect the output of curl to save the image.
curl link_url > imagename
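As a minimal sketch, suppose you have extracted the image URLs from the CSV into a file image_urls.txt, one URL per line (the file name and the column extraction are up to you):
while read -r url; do
  # strip any query string so the saved name matches the CDN file name
  curl -sS "$url" -o "$(basename "${url%%\?*}")"
done < image_urls.txt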
Have you tried transferring the files between the two servers using FTP? If you have SSH access:
log in to the server via SSH
change to the directory containing the files, or the desired location
FTP into the other server using ftp <name_or_IP_address_of_other_server> and your login details
use cd to move to your location / desired destination
use the binary command so the transfer does not corrupt the images
use hash if you want a progress indicator
if sending a file from the server you SSHed into, issue the put <filename> command; if you want to pull a file from the other server to the one you are logged into, use get <filename> instead
wait for the transfer to complete; it might take a while
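A minimal sketch of that session (host names, paths, and file names below are placeholders):
ssh user@source-server.example.com
cd /var/www/images                   # location of the files on the source server
ftp destination-server.example.com   # log in with your FTP credentials
cd /var/www/images                   # destination directory on the other server
binary                               # binary mode so images are not corrupted
hash                                 # print '#' marks as a progress indicator
put product-photo-001.jpg            # push a file; use get to pull one instead
bye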

List of served files in Apache

I am doing some reverse engineering on a website.
We are using a LAMP stack under CentOS 5, without any commercial/open source framework (Symfony, Laravel, etc.), just plain PHP with an in-house framework.
I wonder if there is any way to know which files on the server have been used to serve a request.
For example, let's say I am requesting http://myserver.com/index.php.
Let's assume that 'index.php' calls other PHP scripts (e.g. to connect to the database and retrieve some info), includes a couple of other HTML files, etc.
How can I get the list of those accessed files?
I already tried enabling the server-status handler in Apache, and although it is working I can't get what I want (I also passed the 'refresh' parameter).
I also used lsof -c httpd, as suggested in other forums, but it produces a very big output and I can't find what I'm looking for.
I also read the Apache logs, but I am only getting the requests that the server handled.
Some other users suggested adding PHP directives like 'self', but that means I would need to know beforehand which files to modify to include that directive (which I don't), and that is precisely what I am trying to find out.
Is it actually possible to trace the internal activity of the server and get those file names and locations?
Regards.
Not that I have tried this, but it looks like mod_log_config is the answer to my own question.
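A minimal sketch of what that could look like, assuming the default mod_log_config module is loaded; the %f format code logs the filesystem path Apache mapped each request to (the format name and log path are placeholders). Note this records the file Apache served, not files included by PHP at runtime:
LogFormat "%h %t \"%r\" -> %f" servedfiles
CustomLog /var/log/httpd/served_files.log servedfiles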

Server side: detecting if a user is downloading (Save as...) or viewing a file in the browser

I'm writing an Apache2 module.
By default, when viewed in a web browser, the module would only print the first lines of a large file and convert them to HTML.
If the user chooses 'Save as...', the whole raw file would be downloaded.
Is it possible to detect this choice on the server side? (For example, is there a specific HTTP header set?)
Note: I would like to avoid any parameter in the GET URL (e.g. "http://example.org/file?mode=raw").
Pierre
Added my own answer to close the question: as @alexeyten said, there is no difference. I ended up with JavaScript code that alters the index.html file generated by Apache.

Remote streaming with Solr

I'm having trouble using remote streaming with Apache Solr.
We previously had Solr running on the same server where the files to be indexed are located, so all we had to do was pass it the path of the file we wanted to index.
We used something like this:
stream.file=/path/to/file.pdf
This worked fine. We have now moved Solr so that it runs on a different server to the website that uses it. This was because it was using up too many resources.
I'm now using the following to point Solr in the direction of the file:
stream.file=http://www.remotesite.com/path/to/file.pdf
When I do this Solr reports the following error:
http:/www.remotesite.com/path/to/file.pdf (No such file or directory)
Note that it is stripping one of the slashes from http://.
How can I get Solr to index a file at a certain URL, as I'm trying to do above? The enableRemoteStreaming parameter is already set to true.
Thank you
For remote streaming, you would need to enable it in solrconfig.xml:
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
and use stream.url (rather than stream.file) for URLs.
If remote streaming is enabled and URL content is called for during
request handling, the contents of each stream.url and stream.file
parameters are fetched and passed as a stream.
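For example, the request from the question would become something like this (assuming the extracting request handler is used for the PDF, as in the first question; literal.id is a placeholder):
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&stream.url=http://www.remotesite.com/path/to/file.pdf"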

Sitefinity: Need to make a PDF available on a very specific URL but can't do it

I have a website on SiteFinity 4.4. I need to make a document available on a very specific URL, i.e.
http://www.example.com/reports/the-report.pdf
If I just create a directory in the root of the site, it does not work (503 error). When I try to use the 302Redirect.xml file to redirect the URL to the PDF, it does not work either (same error). The link has already been published and has to be exactly as specified. How do I solve this?
Any help would be greatly appreciated.
Sitefinity wouldn't block a folder. Adding a physical folder and dropping the report in the proper place should work, so it probably means you'll have to check your server configuration.
Anyway, the fastest way outside Sitefinity would be to just create an IIS rewrite rule. Make http://www.example.com/reports/the-report.pdf the pattern and redirect it to the URL of the document in the Sitefinity library.
When you upload a document to the library in Sitefinity, it gives you a direct URL, something like /docs/defaultlibrary/document. You can verify the URL by going to Content >> Documents and Files and choosing "Embed link to this file". That gives you a pop-up with the URL.
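A minimal sketch of such a rule in web.config, assuming the IIS URL Rewrite module is installed (the target path below is illustrative; use the direct URL from the "Embed link to this file" dialog):
<system.webServer>
  <rewrite>
    <rules>
      <rule name="ReportPdf" stopProcessing="true">
        <match url="^reports/the-report\.pdf$" />
        <!-- replace the url below with the document's direct URL from Sitefinity -->
        <action type="Redirect" url="/docs/defaultlibrary/the-report" redirectType="Found" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
redirectType="Found" issues a 302, matching the 302Redirect behaviour the question was after.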