Alternate docroot not working on GlassFish 4

I'm trying to set up an alternate docroot to serve uploaded documents from. I have included the following in my glassfish-web.xml:
<context-root>/dom</context-root>
<property description="Uploaded Images" name="alternatedocroot_1" value="from=/uploads/* dir=C:/Test" />
I then stored a test PDF called cars.pdf in the Test folder.
To access it, I am typing the following into my browser:
http://localhost:8080/uploads/cars.pdf
This, however, simply gives me a 404 error. I've tried googling and searching here, but nothing seems to work. Can someone tell me what I'm doing wrong?
Thanks Steve

GlassFish appends the full request path, including /uploads, to dir, so with dir=C:/Test the file is looked up in C:/Test/uploads rather than in C:/Test itself. You should use
<property description="Uploaded Images" name="alternatedocroot_1" value="from=/uploads/* dir=C:\Test\" />
and then drop your files into C:\Test\uploads\
or, for example, use
<property description="Uploaded Images" name="alternatedocroot_1" value="from=/uploads/* dir=C:\" />
and then drop your files into C:\uploads\
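To make the mapping concrete, here is a minimal sketch of a complete descriptor. It assumes the file is WEB-INF/glassfish-web.xml and that the application is deployed under the /dom context root from the question. The comments reflect my understanding that the from pattern is matched against the request path after the context root and that this path is then appended to dir; please verify this against your GlassFish version.

```xml
<!-- WEB-INF/glassfish-web.xml (sketch; DOCTYPE declaration omitted for brevity) -->
<glassfish-web-app>
    <context-root>/dom</context-root>
    <!-- A request for /dom/uploads/cars.pdf matches from=/uploads/*.
         The matched path /uploads/cars.pdf is appended to dir, so the
         file is looked up at C:/Test/uploads/cars.pdf. -->
    <property name="alternatedocroot_1"
              value="from=/uploads/* dir=C:/Test"/>
</glassfish-web-app>
```

Under these assumptions the PDF would be stored as C:/Test/uploads/cars.pdf and requested as http://localhost:8080/dom/uploads/cars.pdf, i.e. with the context root in the URL.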

Related

Why does Nutch (v2.3) crawl only the seed URL, instead of crawling an entire website?

I am trying to crawl an entire, specific website (ignoring external links) using Nutch 2.3 with HBase 0.94.14.
I have followed a step-by-step tutorial (can find it here) on how to set up and use these tools. However, I haven't been able to achieve my goal. Instead of crawling the entire website whose URL I've written in the seed.txt file, Nutch only retrieves that base URL in the first round. I need to run further crawls in order for Nutch to retrieve more URLs.
The problem is I don't know how many rounds I need in order to crawl the entire website, so I need a way to tell Nutch to "keep crawling until the entire website has been crawled" (in other words, "crawl the entire website in a single round").
Here are the key steps and settings I have followed so far:
Put base URL in the seed.txt file.
http://www.whads.com/
Set up Nutch's nutch-site.xml configuration file. After finishing the tutorial, I added a few more properties following suggestions on other StackOverflow questions (none of them, however, seem to have solved the problem for me).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>test-crawler</value>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>true</value>
</property>
<property>
<name>db.ignore.internal.links</name>
<value>false</value>
</property>
<property>
<name>fetcher.max.crawl.delay</name>
<value>-1</value>
</property>
<property>
<name>fetcher.threads.per.queue</name>
<value>50</value>
<description></description>
</property>
<property>
<name>generate.count.mode</name>
<value>host</value>
</property>
<property>
<name>generate.max.count</name>
<value>-1</value>
</property>
</configuration>
Added "accept anything else" rule to Nutch's regex-urlfilter.txt configuration file, following suggestions on StackOverflow and Nutch's mailing list.
# Already tried these two filters (one at a time,
# and each one combined with the 'anything else' one)
#+^http://www.whads.com
#+^http://([a-z0-9]*.)*whads.com/
# accept anything else
+.
Crawling: I have tried using two different approaches (both yielding the same result, with only one URL generated and fetched on the first round):
Using bin/nutch (following the tutorial):
bin/nutch inject urls
bin/nutch generate -topN 50000
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb -all
Using bin/crawl:
bin/crawl urls whads 1
Am I still missing something? Am I doing something wrong? Or is it that Nutch can't crawl an entire website in one go?
Thank you so much in advance!
Please update your configuration as follows:
<property>
<name>db.ignore.external.links</name>
<value>false</value>
</property>
You are currently ignoring external links, i.e. external URLs are not crawled.
After playing around with Nutch for a few more days and trying everything I found on the Internet, I ended up giving up. Some people said it is no longer possible to crawl an entire website in one go with Nutch.
So, in case anyone with the same problem stumbles upon this question, do what I did: drop Nutch and use something like Scrapy (Python). You need to set up the spiders manually, but it works like a charm, is far more extensible and faster, and the results are better.
Did you try using -1 as the last argument? I can see you are using 1 at the end, which runs the crawl for only one round.

Tomcat8 remove unnecessary app name in the path

I am using Tomcat 8. I deployed a WAR file named admin.war. This resulted in my URL being
http://localhost:8080/admin.
However, I want the URL to be http://localhost:8080. So I tried adding the following inside /conf/server.xml, as mentioned here.
<Context path="" docBase="Advocatoree" debug="0" reloadable="true">
However, this did not work. Is there an alternative?
Try adding a file called ROOT.xml in <catalina_home>/conf/Catalina/localhost/
with the following content:
<Context
docBase="yourAppName"
path=""
reloadable="true"
/>
Now your application is the default application on your server, and you can access it at http://localhost:8080
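A hedged variant of the above: as far as I know, the Tomcat configuration reference says that the path attribute is ignored in a context file under conf/Catalina/localhost (the path is derived from the file name, and ROOT.xml maps to the empty path), and that docBase should only be set when the application lives outside the Host's appBase. A minimal sketch under those assumptions, with /opt/apps/admin.war as a hypothetical location outside webapps:

```xml
<!-- <catalina_home>/conf/Catalina/localhost/ROOT.xml -->
<!-- The file name ROOT.xml selects the empty context path, so no path attribute is set. -->
<!-- /opt/apps/admin.war is a hypothetical location outside the webapps directory. -->
<Context docBase="/opt/apps/admin.war" reloadable="true" />
```

If admin.war stays inside webapps, Tomcat may also deploy it a second time under /admin; renaming the file to ROOT.war is probably the simpler route in that case.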

Change Solr base context path

I've installed Solr (5.3.1 and 5.5.0) on an Ubuntu machine.
With Apache I've set up:
ProxyPass /MySolr http://{url}:8984/solr
ProxyPassReverse /MySolr http://{url}:8984/solr
So, when I load {url}/MySolr, the Dashboard doesn't load because of one JSON request:
http://{url}/solr/admin/cores?wt=json&indexInfo=false&_=...
That's normal because the correct URL to load would be:
http://{url}/MySolr/admin/cores?wt=json&indexInfo=false&_=...
When I look at the other resources, Solr gets the correct URL, like:
http://{url}/MySolr/js/scripts/segments.js?_=5.5.0
Any idea?
This will not work, probably because this JSON request uses an absolute path, and Solr doesn't know about your proxy mapping. If you want to change the context path, you need to change it in the Jetty configuration so that Solr starts under the new context.
The first step is to create a symbolic link named MySolr pointing to the solr directory (located by default in $SOLR_INSTALL_DIR/server/).
Now change the Jetty configuration file $SOLR_INSTALL_DIR/server/contexts/solr-jetty-context.xml to point to the new context path, like this:
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
<Set name="contextPath"><Property name="hostContext" default="/MySolr"/></Set>
<Set name="war"><Property name="jetty.base"/>/solr-webapp/webapp</Set>
<Set name="defaultsDescriptor"><Property name="jetty.base"/>/etc/webdefault.xml</Set>
<Set name="extractWAR">false</Set>
</Configure>
Now just restart Solr, and you can access it using the new base context path.

Sitecore redirect on errors

I know that I can extend Sitecore.Pipelines.HttpRequest.ExecuteRequest and override methods like RedirectOnItemNotFound to redirect to my custom 404 page etc. I was wondering if there is a way to redirect to a custom page (that would sit in Sitecore) for all errors except 404 and 500?
There is a RedirectOnNoAccess method for 403 errors, I guess, but I am looking for a way to redirect on all errors like 400, 401, 403, 405 etc.
Sitecore v7.2
Cheers
You don't need to extend the ExecuteRequest processor; there are settings in the Sitecore section of config to handle these:
<!-- ITEM NOT FOUND HANDLER
Url of page handling 'Item not found' errors
-->
<setting name="ItemNotFoundUrl" value="/sitecore/service/notfound.aspx"/>
<!-- LINK ITEM NOT FOUND HANDLER
Url of page handling 'Link item not found' errors
-->
<setting name="LinkItemNotFoundUrl" value="/sitecore/service/notfound.aspx"/>
<!-- LAYOUT NOT FOUND HANDLER
Url of page handling 'Layout not found' errors
-->
<setting name="LayoutNotFoundUrl" value="/sitecore/service/nolayout.aspx"/>
<!-- ACCESS DENIED HANDLER
Url of page handling 'Acess denied' errors
-->
<setting name="NoAccessUrl" value="/sitecore/service/noaccess.aspx"/>
Update these values to point to the correct path. This can be a Sitecore item path, e.g. /errors/404, as long as that item exists in Sitecore. Slightly annoyingly, a URL parameter is added to the path; you will need to extend the processor if you want to get rid of it. If you have a multi-site implementation, this will still work, but you need to make sure that the structure is the same for all sites, since you are using a relative path. The Error Manager module is essentially a wrapper around these same settings, but it is better in that it handles multi-site setups and shows the error page without making a 302 redirect first.
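Rather than editing the stock values in place, these settings are often overridden from a patch file. The following is a minimal sketch, assuming a hypothetical file App_Config/Include/ErrorPages.config and hypothetical item paths /errors/404 and /errors/403 that exist in your content tree:

```xml
<!-- App_Config/Include/ErrorPages.config (hypothetical file name) -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Point 'Item not found' at a Sitecore item instead of the default .aspx page -->
      <setting name="ItemNotFoundUrl">
        <patch:attribute name="value">/errors/404</patch:attribute>
      </setting>
      <!-- Point 'Access denied' at a Sitecore item; /errors/403 is an assumed path -->
      <setting name="NoAccessUrl">
        <patch:attribute name="value">/errors/403</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

Keeping the overrides in a separate include file leaves the original web.config untouched, which tends to make upgrades easier.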
If you need to handle other errors, fall back to using the httpErrors section in config to define those. The values can also be set through IIS (although that just updates the web.config anyway).
<system.webServer>
<httpErrors errorMode="DetailedLocalOnly" defaultResponseMode="ExecuteURL" defaultPath="/errors/404">
<remove statusCode="404" subStatusCode="-1" />
<remove statusCode="405" subStatusCode="-1" />
<remove statusCode="500" subStatusCode="-1" />
<error statusCode="404" prefixLanguageFilePath="" path="/errors/404" responseMode="ExecuteURL" />
<error statusCode="405" prefixLanguageFilePath="" path="/errors/405" responseMode="ExecuteURL" />
<error statusCode="500" prefixLanguageFilePath="" path="/errors/static/500.html" responseMode="ExecuteURL" />
</httpErrors>
</system.webServer>
http://www.iis.net/configreference/system.webserver/httperrors
https://msdn.microsoft.com/en-us/library/ms690497(v=vs.90).aspx
These paths can point into Sitecore, using the URL path of an item, or to static HTML files on disk. Again, this works in multi-site as long as the structure is the same for all sites, since the path can be relative. It is generally recommended that the 500 page be a static HTML page; otherwise there is the possibility of an infinite loop (e.g. database goes down, show 500, fetch content from Sitecore, but database is down...).
Even if you use the Error Manager module, or use the Sitecore settings, I recommend that you have a 404 and 500 page defined in config. By default Sitecore will only handle dynamic and extensionless URL requests, so if a request is made for /file.txt, /style.css, /script.js or /document.pdf then you will get a standard IIS error page.
<preprocessRequest>
<processor type="Sitecore.Pipelines.PreprocessRequest.FilterUrlExtensions, Sitecore.Kernel">
<param desc="Allowed extensions (comma separated)">aspx, ashx, asmx</param>
<param desc="Blocked extensions (comma separated)">*</param>
<param desc="Blocked extensions that stream files (comma separated)">*</param>
<param desc="Blocked extensions that do not stream files (comma separated)"></param>
</processor>
</preprocessRequest>
You could allow all requests to go through Sitecore, but this seems a bit heavy-handed, and you would be making them run through additional pipelines. Setting the above will mean your static content is also gracefully handled.
You can definitely use the execute request pipeline to handle 403 and 401 errors as this pipeline is called early enough.
There is a great module that already does this on the marketplace, which you may be able to adapt to your needs.
https://marketplace.sitecore.net/en/Modules/Sitecore_Error_Manager.aspx
http://ctor.io/handling-404-and-other-errors-with-sitecore-items/

Running jsp files from /srv/http using Apache HTTP server and Tomcat

I'd like to run JSP files directly from /srv/http without deploying them the Tomcat way. For example, I want to be able to create a symbolic link to my webapp directory (e.g. /home/user/myapp/) in /srv/http and access one of the app's pages through http://localhost/myapp/page.jsp.
Is this possible and how would I set this up?
NOTE: This is not for production. We have to use JSP at university and I want to be able to quickly test my pages.
Open the server.xml of your Tomcat. Assuming you are using Tomcat 6.x+, it would be at /tomcatDir/conf/server.xml.
Add a Context entry with your path:
<Context path="/myapp" docBase="yourPathGoesHere" debug="0" reloadable="true" />
Restart Tomcat if already running.
What I did for the moment was create a symlink in /var/lib/tomcatX/webapps pointing to my project path. This is not the answer I was looking for, but it is a way to deploy an app without much work.
(X in the above path means your Tomcat version)
If you set <Host name="localhost" appBase="/srv/http"> then all of the directories in it will be deployed as web applications.
If you want /srv/http to be the ROOT application/directory, add a file tomcat/conf/Catalina/localhost/ROOT.xml
containing a Context with docBase="/srv/http", rather than adding a Context definition to server.xml; the latter has been strongly discouraged for years.
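For the /home/user/myapp example from the question, a per-application context file is one possible sketch. It assumes a file named myapp.xml (the file name determines the /myapp context path) and that the user running Tomcat can read the directory:

```xml
<!-- tomcat/conf/Catalina/localhost/myapp.xml -->
<!-- The file name myapp.xml maps the application to the /myapp context path. -->
<!-- /home/user/myapp is the project directory mentioned in the question. -->
<Context docBase="/home/user/myapp" reloadable="true" />
```

With Tomcat's default connector this would be reachable at http://localhost:8080/myapp/page.jsp; serving it on port 80 through Apache HTTP server would need a separate proxy setup that this sketch does not cover.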