Remote streaming with Solr

I'm having trouble using remote streaming with Apache Solr.
We previously had Solr running on the same server as the files to be indexed, so all we had to do was pass it the path of the file we wanted to index.
We used something like this:
stream.file=/path/to/file.pdf
This worked fine. We have now moved Solr to a different server from the website that uses it, because it was using up too many resources.
I'm now using the following to point Solr in the direction of the file:
stream.file=http://www.remotesite.com/path/to/file.pdf
When I do this Solr reports the following error:
http:/www.remotesite.com/path/to/file.pdf (No such file or directory)
Note that it is stripping one of the slashes from http://.
How can I get Solr to index a file at a given URL, as I'm trying to do above? The enableRemoteStreaming parameter is already set to true.
Thank you

For remote streaming, you need to enable it in solrconfig.xml:
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
and use stream.url rather than stream.file for URLs. stream.file expects a local filesystem path, which is why Solr is treating your URL as a file name. From the docs:
If remote streaming is enabled and URL content is called for during
request handling, the contents of each stream.url and stream.file
parameters are fetched and passed as a stream.
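For example, a minimal sketch against the extract handler (host, port, and document id are placeholders for your setup):

curl "http://solrhost:8983/solr/update/extract?literal.id=doc1&commit=true&stream.url=http://www.remotesite.com/path/to/file.pdf"

Solr fetches the PDF from that URL itself, so the file no longer needs to live on the Solr server's filesystem.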

Related

Apache file upload (resource PUT DELETE) without CGI/PHP

Is it possible to configure Apache to support CRUD operations on file resources? GET works out of the box; how can you make PUT and DELETE work?
I would need to upload a file by HTML form and/or XMLHttpRequest2.
No PHP. No CGI. Just plain Apache by configuration.
Is this supported in other web servers? I'm trying to find a static REST interface for file resource management without CGI/PHP/Connector/Reverse Proxy/FTP server.
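One possibility, not from this thread: plain Apache can handle PUT and DELETE via mod_dav. A minimal sketch, assuming Apache 2.4 with mod_dav and mod_dav_fs loaded (all paths are hypothetical):

DavLockDB /var/lock/apache2/DavLock
<Directory "/var/www/files">
    DAV On
    # GET stays open; lock the modifying methods behind auth in production
    <LimitExcept GET HEAD OPTIONS>
        Require all granted
    </LimitExcept>
</Directory>

Note that XMLHttpRequest2 can issue PUT and DELETE directly, but a plain HTML form can only submit GET and POST.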

Make Indexed File Downloadable In Apache Solr

I am trying to index a PDF file in Solr, which I have done successfully using a command like
curl "http://localhost:8983/solr/update/extract?literal.id=id&commit=true" -F "myfile=@filename.pdf"
I am able to see the file contents and search, but when I try to click on file name it shows
HTTP ERROR 404
Problem accessing /solr/collection1/id. Reason:
not found
What I want is a link that allows downloading the file. I know Solr merely indexes the file and stores it; I was wondering if there is a way to add an attribute such as location, as you have done, and proceed from there. Can you please share what you have done? If you need any more clarity regarding my problem, do ask.
We have the actual files hosted through a separate web application so they can be downloaded with auditing and additional security.
You can always host the files directly through an HTTP server.
If the file names are based on the id, it is as easy as appending id.extension to the fixed base URL of the HTTP host.
Otherwise, index the path of the file with an additional parameter, e.g. literal.url.
The url will then be a Solr field available in the Solr response.
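For example, a minimal sketch (the id, host names, and the url field are placeholders; the field must exist in your schema):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.url=http://files.example.com/doc1.pdf&commit=true" -F "myfile=@doc1.pdf"

A search result for doc1 will then carry the url field, which your front end can render as the download link.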

Apache redirect for single XML file

I have a number of subdomains which use a crossdomain.xml file, and I'm looking for a simple way of managing them all, since the file gets semi-regularly updated. One way I've thought of is a PHP script which pushes and overwrites the XML file. The other, which I much prefer, is an Apache redirect on a single file.
So, the question is: how would I, across multiple domains, redirect an XML file on dom1.domain.com and dom2.wirewax.com to the same crossdomain.xml file without Flash getting upset, i.e. not a 302 HTTP redirect but internal file fetching?
You can write a PHP script that fetches the content from a single location (a database or a text file) and sends it as-is to Flash. Yes, the script itself needs to be copied to all hosts.
If you have all websites hosted on same webserver, perhaps mod_alias could help:
Alias /crossdomain.xml /path/to/shared/crossdomain.xml
I have not personally tested this. The reference page includes instructions for setting up the shared directory so that it can be read by multiple hosts.
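For example, a minimal sketch, assuming Apache 2.4 and a shared directory readable by every virtual host (paths are hypothetical); placed in the global config, it is inherited by all virtual hosts:

Alias /crossdomain.xml /srv/shared/crossdomain.xml
<Directory "/srv/shared">
    Require all granted
</Directory>

Since Alias is an internal mapping, the client sees a plain 200 response for /crossdomain.xml on every domain, with no redirect for Flash to choke on.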

Pyramid/Pylons: How to check if an uploaded file is complete in a POST request?

I'm building a web tool which allows users to upload PDFs to a server using their web browsers. The server is based on Python (Paste + Pyramid).
The problem I have right now is the following: If a user uploads a rather large file (let's say 100 MB) and they cancel the upload before it is completed, my handler code on the server is still called (instead of the request being aborted).
The problem is that the request.POST['myfile'].file is incomplete when that happens. This effectively means that the PDF file is corrupted if I simply write it to some place on the server.
When I watch the server's log, it shows a "broken pipe" exception within the Paste server; however I have no idea how to catch that exception and have it prevent my view/handler code from executing and storing the incomplete file.
It seems that the paster HTTP server does not correctly validate the uploaded form data and simply passes the request down the WSGI pipeline even if the connection (HTTP POST) was closed by the user.
I worked around this issue by simply setting up NGINX to act as a reverse proxy. This also adds some security benefits as it might be better tested than paster.
Update:
My main problem was that I was using runserver (the built-in web server of manage.py). After some trial and error we ended up using WSGI.
More specifically, uWSGI with Nginx as the web server. Static content is served directly by Nginx, while dynamic pages are piped through uWSGI and handled by the Python web app.
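A minimal sketch of that Nginx side, with hypothetical paths and ports:

server {
    listen 80;
    client_max_body_size 100m;       # allow large PDF uploads through
    location /static/ {
        alias /srv/app/static/;      # static content served by Nginx directly
    }
    location / {
        include uwsgi_params;        # dynamic pages piped to uWSGI
        uwsgi_pass 127.0.0.1:8001;
    }
}

Because Nginx buffers the request body by default, the app only sees the upload once it has fully arrived, which sidesteps the broken-pipe case.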
Unless you are doing something fancy (like tracking upload progress), your Pylons controller should not be invoked until the entire file has been uploaded.
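If you still want a defensive check inside the view itself, a minimal sketch (not from this thread; the 'myfile' field name and output path are hypothetical, and note that reading request.body pulls the whole upload into memory):

from pyramid.httpexceptions import HTTPBadRequest
from pyramid.response import Response

def upload_view(request):
    # A truncated upload shows up as a body shorter than the declared length.
    declared = request.content_length or 0
    if len(request.body) < declared:
        raise HTTPBadRequest('incomplete upload')
    field = request.POST['myfile']                  # assumed form field name
    data = field.file.read()
    with open('/tmp/received.pdf', 'wb') as out:    # hypothetical destination
        out.write(data)
    return Response('stored %d bytes' % len(data))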

How do I get a status report of all files currently being uploaded via a HTTP form on an Apache Server?

How do I get a status report of all files currently being uploaded via HTTP form based file upload on an Apache Server?
I don't believe you can do this with Apache itself. The upload looks like nothing more than a POST as far as Apache is concerned. There are modules and other servers that do special processing for uploads, so you may have some luck there. It would probably be easier to keep track of it in your application.
Check out SWFUpload; it uses Flash (in a nice way) to assist with managing multiple uploads.
There are events you can monitor for how many files of a set have been uploaded.