Jackrabbit indexing configuration whitelisting (Magnolia CMS 5.5.5 full-text search)

I want to whitelist which properties are indexed, searched, and shown in the excerpt of a Magnolia search.
I am changing the indexing_configuration.xml in my website workspace.
Removing the index and restarting Magnolia did not change anything.
So far I have this in my indexing_configuration.xml (alongside other rules).
These are the String properties I want to include in my excerpt; everything else should be excluded:
<index-rule nodeType="nt:hierarchyNode">
<property boost="10" useInExcerpt="true">introTitle</property>
<property boost="1.0" useInExcerpt="true">introAbstract</property>
<property boost="1.0" useInExcerpt="true">contentText</property>
<property boost="1.0" useInExcerpt="true">subText</property>
<property boost="10" useInExcerpt="true">title</property>
<!-- exclude jcr:* and mgnl:* properties -->
<property isRegexp="true" nodeScopeIndex="false" useInExcerpt="false">.*:.*</property>
</index-rule>
<index-rule nodeType="mgnl:contentNode">
<property boost="5" nodeScopeIndex="false" useInExcerpt="true">introTitle</property>
<property boost="2" nodeScopeIndex="false" useInExcerpt="true">introAbstract</property>
<property boost="2" nodeScopeIndex="false" useInExcerpt="true">contentText</property>
<property boost="2" nodeScopeIndex="false" useInExcerpt="true">subText</property>
<property boost="5" nodeScopeIndex="false" useInExcerpt="true">title</property>
<!-- exclude jcr:* and mgnl:* properties -->
<property isRegexp="true" nodeScopeIndex="false" useInExcerpt="false">.*:.*</property>
</index-rule>
How can I get this to work as intended? Thanks for your help.

The most likely cause is that Magnolia/Jackrabbit is not seeing your new configuration. Did you change your repository configuration (workspace.xml in the website workspace) to point to the new indexing configuration?
The default looks like this:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index" />
<!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
<param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
<!-- remaining params omitted -->
</SearchIndex>
You need to point it to your new file.
Also, I am not sure why you are defining index rules on nt:hierarchyNode or mgnl:contentNode rather than on the more specific mgnl:page or mgnl:component.
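As a rough sketch of that change, the SearchIndex element in the website workspace's workspace.xml could reference a file in the workspace home instead of the classpath default. The location ${wsp.home}/indexing_configuration.xml is an assumption for illustration; use whatever path your edited file actually lives at.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumed location: the edited file placed in the workspace home directory -->
  <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
  <!-- remaining params unchanged -->
</SearchIndex>
```

After changing the path, the existing index directory still has to be removed so that the workspace is re-indexed with the new rules on the next start.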


Ignite QueryEntity Based Configuration for C++?

<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="mycache"/>
<!-- Configure query entities -->
<property name="queryEntities">
<list>
<bean class="org.apache.ignite.cache.QueryEntity">
<!-- Setting indexed type's key class -->
<property name="keyType" value="java.lang.Long"/>
<!-- Setting indexed type's value class -->
<property name="valueType"
value="org.apache.ignite.examples.Person"/>
<!-- Defining fields that will be either indexed or queryable.
Indexed fields are added to 'indexes' list below.-->
<property name="fields">
<map>
<entry key="id" value="java.lang.Long"/>
<entry key="name" value="java.lang.String"/>
<entry key="salary" value="java.lang.Long"/>
</map>
</property>
<!-- Defining indexed fields.-->
<property name="indexes">
<list>
<!-- Single field (aka. column) index -->
<bean class="org.apache.ignite.cache.QueryIndex">
<constructor-arg value="id"/>
</bean>
<!-- Group index. -->
<bean class="org.apache.ignite.cache.QueryIndex">
<constructor-arg>
<list>
<value>id</value>
<value>salary</value>
</list>
</constructor-arg>
<constructor-arg value="SORTED"/>
</bean>
</list>
</property>
</bean>
</list>
</property>
</bean>
I understand that the above XML configuration can be used to define an SQL entity with indexes in Ignite. The documentation is easier to follow from a code perspective, in Java or .NET, because the API is available there. Since we do most of our development in C++, where that API is not available, we would like a few more details on using the XML configuration. Could anyone please answer the points below?
1. Where can this configuration file be used: on the server side, the client side (thin and thick), or both?
2. Is it possible to change the field names, types, and indexes once the entity has been created and data has been loaded into it?
3. <property name="valueType" value="org.apache.ignite.examples.Person"/> If I am not mistaken, the value here is taken from a namespace in a DLL (for example in C#), but how does Ignite know where the DLL or namespace is located in order to load it? Where do the binaries need to be kept?
4. In the case of C++, what binary file can be used to define the value type: a .lib, a .dll, or something else?
1. The C++ thick client can use the XML config; see IgniteConfiguration.springCfgPath.
2. Think of the CacheConfiguration as the "starting" config for a cache. Most of it can't be changed later. A few things, like the set of SQL columns or indexes, can be changed via SQL DDL: ALTER TABLE..., CREATE INDEX..., etc. If something isn't available in the DDL, assume that it can't be changed without recreating the cache.
3. Check out this. The value type name will be mapped by each platform component (Java, C++, .NET) according to the binary marshaller configuration. For example, it's common to use BinaryBasicNameMapper, which maps all platform type names (with namespaces/packages) to simple names, so that different namespace/package naming conventions don't create a problem. When a class is needed to deserialize a value, it will be loaded via the regular platform-specific mechanism for loading code. For Java, that is the classpath. For C++, I guess it's LD_LIBRARY_PATH. In any case, Ignite really has nothing to do with that.
4. Again, Ignite has nothing to do with that. Whatever works on your platform to load code can be used.
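To illustrate the name mapping mentioned above, here is a minimal sketch of a binary configuration that maps type names to simple names. It assumes the fragment sits inside the node's IgniteConfiguration bean and that every node and platform uses the same mapper settings; it is not taken from the asker's setup.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="binaryConfiguration">
    <bean class="org.apache.ignite.configuration.BinaryConfiguration">
      <!-- Map fully qualified names to simple names, e.g.
           org.apache.ignite.examples.Person becomes Person -->
      <property name="nameMapper">
        <bean class="org.apache.ignite.binary.BinaryBasicNameMapper">
          <property name="simpleName" value="true"/>
        </bean>
      </property>
    </bean>
  </property>
  <!-- other IgniteConfiguration properties go here -->
</bean>
```

With a mapper like this, a C++ type that reports the name Person and a Java class in any package named Person would resolve to the same binary type.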
After a few experiments I found the solution, and it is actually easy.
The value given in the valueType property is mapped directly to the binary type name used when the object is created from code.
For example, with the configuration below
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="TEST"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="atomicityMode" value="TRANSACTIONAL"/>
<property name="writeSynchronizationMode" value="FULL_SYNC"/>
<!-- Configure type metadata to enable queries. -->
<property name="queryEntities">
<list>
<bean class="org.apache.ignite.cache.QueryEntity">
<property name="keyType" value="java.lang.Long"/>
<property name="valueType" value="TEST"/>
<property name="fields">
<map>
<entry key="ID" value="java.lang.Long"/>
<entry key="DATE" value="java.lang.String"/>
</map>
</property>
</bean>
</list>
</property>
</bean>
the C++ code below works:
// The name returned by GetTypeName must match valueType in the XML.
template<>
struct BinaryType<examples::TEST> : BinaryTypeDefaultAll<examples::TEST>
{
    static void GetTypeName(std::string& dst)
    {
        dst = "TEST";
    }

    // Field names must match the keys in the QueryEntity "fields" map.
    static void Write(BinaryWriter& writer, const examples::TEST& obj)
    {
        writer.WriteInt64("ID", obj.Id);
        writer.WriteString("DATE", obj.dt);
    }

    // Read the fields back under the same names they were written with.
    static void Read(BinaryReader& reader, examples::TEST& dst)
    {
        dst.Id = reader.ReadInt64("ID");
        dst.dt = reader.ReadString("DATE");
    }
};

Ignite: Configuring persistence to a custom directory

I want to provide a custom directory to persist the data. My persistence configuration is:
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
</bean>
</property>
As mentioned in the documentation, by default it persists under the ${IGNITE_HOME}/work/db directory on each node. I can change the directory by calling the setStoragePath() method, but how do I configure it through XML?
I have searched but couldn't find it in the documentation. Please help me find the right XML key for this setting.
Thanks!
The right key is the storagePath property of DataStorageConfiguration:
<property name="storagePath" value="$ENV_VAR/relative/path"/>
Javadoc link: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/DataStorageConfiguration.html#getStoragePath--
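Placed in the configuration from the question, this could look like the sketch below. The directory /opt/ignite/persistence is a hypothetical path chosen for illustration.

```xml
<property name="dataStorageConfiguration">
  <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
    <!-- Hypothetical directory; corresponds to setStoragePath() in code -->
    <property name="storagePath" value="/opt/ignite/persistence"/>
    <property name="defaultDataRegionConfiguration">
      <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
        <property name="persistenceEnabled" value="true"/>
      </bean>
    </property>
  </bean>
</property>
```

Note that the write-ahead log locations are separate properties (walPath and walArchivePath) on the same bean, so they may need to be set as well if all persistence files should live outside the work directory.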

Adding new facet to DSpace has no effect (DSpace 4.1)

I changed the discovery.xml file as described in the documentation to add a new facet on dc.type to our DSpace. After reindexing and clearing the cache, I see the new search filter in the advanced search but not as a facet.
These are the changes I made to discovery.xml:
I added the filter to sidebarFacets and searchFilters:
<ref bean="searchFilterType" />
and this is the filter:
<bean id="searchFilterType" class="org.dspace.discovery.configuration.DiscoverySearchFilterFacet">
<property name="indexFieldName" value="type"/>
<property name="metadataFields">
<list>
<value>dc.type</value>
</list>
</property>
</bean>
Thanks in advance
The following modifications to discovery.xml on the latest DSpace master branch worked on my local setup:
https://github.com/bram-atmire/DSpace/commit/3f084569cf1bbc6c6684d114a09a1617c8d3de5d
One reason why the facet wouldn't appear in your setup could be that you did not add it to both the "defaultConfiguration" and the specific configuration for the DSpace homepage.
After building and deploying, a forced discovery re-index using the following command made the facet appear:
./dspace index-discovery -f
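As a sketch of what "both configurations" means, the facet reference would be added to the sidebarFacets list of each configuration bean. The bean id homepageConfiguration and the class name are written from memory of the stock discovery.xml and should be treated as assumptions; check them against your own file.

```xml
<bean id="defaultConfiguration"
      class="org.dspace.discovery.configuration.DiscoveryConfiguration">
  <property name="sidebarFacets">
    <list>
      <!-- existing facet refs stay here -->
      <ref bean="searchFilterType"/>
    </list>
  </property>
  <!-- other properties unchanged -->
</bean>

<!-- Assumed id of the homepage-specific configuration -->
<bean id="homepageConfiguration"
      class="org.dspace.discovery.configuration.DiscoveryConfiguration">
  <property name="sidebarFacets">
    <list>
      <!-- existing facet refs stay here -->
      <ref bean="searchFilterType"/>
    </list>
  </property>
  <!-- other properties unchanged -->
</bean>
```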
Here is an example facet that I have configured in our instance. Try setting facetLimit, sortOrder, and splitter, then re-index and see whether that resolves the issue.
<bean id="searchFilterGeographic"
class="org.dspace.discovery.configuration.HierarchicalSidebarFacetConfiguration">
<property name="indexFieldName" value="geographic-region"/>
<property name="metadataFields">
<list>
<value>dc.coverage.spatial</value>
</list>
</property>
<property name="facetLimit" value="5"/>
<property name="sortOrder" value="COUNT"/>
<property name="splitter" value="::"/>
</bean>

Access activemq Poolable Connection factory as OSGI service

I am using Fuse 6.0 and ActiveMQ 5.8. Instead of defining an ActiveMQ pooled connection factory in each bundle, it makes sense to define it in a common bundle and expose it as an OSGi service. I created a Blueprint file in FUSE_HOME/etc and exposed an OSGi service like this.
<osgix:cm-properties id="prop" persistent-id="xxx.xxx.xxx.properties" />
<bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
<property name="brokerURL" value="${xxx.url}" />
<property name="userName" value="${xxx.username}" />
<property name="password" value="${xxx.password}" />
</bean>
<bean id="pooledConnectionFactory" class="org.apache.activemq.pool.PooledConnectionFactory" init-method="start" destroy-method="stop">
<property name="maxConnections" value="${maxconnections}" />
<property name="connectionFactory" ref="jmsConnectionFactory" />
</bean>
<service ref="pooledConnectionFactory" interface="javax.jms.ConnectionFactory">
<service-properties>
<entry key="name" value="localhost"/>
</service-properties>
</service>
Then I try to access this service in both Blueprint files and Spring context files like this
<reference id="pooledConnectionFactory" interface="javax.jms.ConnectionFactory"/>
<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
<property name="connectionFactory" ref="pooledConnectionFactory"/>
<property name="concurrentConsumers" value="${xxx.concurrentConsumers}"/>
</bean>
<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
<property name="configuration" ref="jmsConfig"/>
</bean>
but I am getting the following exception during bundle startup.
Failed to add Connection ID:PLNL6237-55293-1401929434025-11:1201, reason: java.lang.SecurityException: User name [null] or password is invalid.
I even added the compendium definition to my bundles.
How can I solve this problem? Any help is appreciated.
I found this online: https://issues.apache.org/jira/i#browse/SM-2183
Do I need to upgrade?
It looks to me like you're using the property placeholders incorrectly. First of all, you should know that osgix:cm-properties only exposes the properties at the persistent id that you specify. You can treat it like a java.util.Properties object, and even inject it into a bean as one. This does mean, however, that it makes no attempt to resolve the ${...} placeholders.
To resolve the placeholders, use Spring's property placeholder configurer.
<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="properties" ref="prop"/>
</bean>
P.S. The persistent id of cm-properties is the name of the file, not including the file type. You don't need the .properties at the end.
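Putting both points together, a minimal sketch for a Spring DM context could look like the following. It assumes the osgix namespace is already declared on the root element and keeps the masked names from the question (xxx.xxx.xxx, xxx.url, and so on); a pure Blueprint file would need Blueprint's own placeholder mechanism instead.

```xml
<!-- Persistent id is the file name without the .properties suffix -->
<osgix:cm-properties id="prop" persistent-id="xxx.xxx.xxx"/>

<!-- Resolves ${...} placeholders against the properties exposed above -->
<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
  <property name="properties" ref="prop"/>
</bean>

<bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <property name="brokerURL" value="${xxx.url}"/>
  <property name="userName" value="${xxx.username}"/>
  <property name="password" value="${xxx.password}"/>
</bean>
```

If the placeholders are not resolved, userName is never set to a real value, which would be consistent with the "User name [null]" message in the exception.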

How do i exclude everything but text/html from a heritrix crawl?

On the Heritrix use cases page there is a use case called "Only Store Successful HTML Pages".
My problem: I don't know how to implement it in my cxml file. Especially this step:
Adding the ContentTypeRegExpFilter to the ARCWriterProcessor => set its regexp setting to text/html.*. ...
There is no ContentTypeRegExpFilter in the sample cxml files.
Kris's answer is only half the truth (at least with Heritrix 3.1.x, which I'm using). A DecideRule returns ACCEPT, REJECT, or NONE. If a rule returns NONE, it has "no opinion" (like ACCESS_ABSTAIN in Spring Security). ContentTypeMatchesRegexDecideRule, like all other MatchesRegexDecideRule variants, can be configured to return a decision when a regex matches (via the two properties "decision" and "regex"). With decision set to ACCEPT, the rule returns ACCEPT if the regex matches but NONE if it does not. And, as we have seen, NONE is not an opinion, so shouldProcessRule will evaluate to ACCEPT because no decision has been made.
So to archive only responses with a text/html* Content-Type, configure a DecideRuleSequence where everything is REJECTed by default and only selected entries are ACCEPTed.
It looks like this:
<bean id="warcWriter" class="org.archive.modules.writer.WARCWriterProcessor">
<property name="shouldProcessRule">
<bean class="org.archive.modules.deciderules.DecideRuleSequence">
<property name="rules">
<list>
<!-- Begin by REJECTing all... -->
<bean class="org.archive.modules.deciderules.RejectDecideRule" />
<bean class="org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule">
<property name="decision" value="ACCEPT" />
<property name="regex" value="^text/html.*" />
</bean>
</list>
</property>
</bean>
</property>
<!-- other properties... -->
</bean>
To keep images, movies, etc. from being downloaded at all, configure the "scope" bean with a MatchesListRegexDecideRule that REJECTs URLs with well-known file extensions, like:
<!-- ...and REJECT those from a configurable (initially empty) set of URI regexes... -->
<bean class="org.archive.modules.deciderules.MatchesListRegexDecideRule">
<property name="decision" value="REJECT"/>
<property name="listLogicalOr" value="true" />
<property name="regexList">
<list>
<value>.*(?i)(\.(avi|wmv|mpe?g|mp3))$</value>
<value>.*(?i)(\.(rar|zip|tar|gz))$</value>
<value>.*(?i)(\.(pdf|doc|xls|odt))$</value>
<value>.*(?i)(\.(xml))$</value>
<value>.*(?i)(\.(txt|conf|pdf))$</value>
<value>.*(?i)(\.(swf))$</value>
<value>.*(?i)(\.(js|css))$</value>
<value>.*(?i)(\.(bmp|gif|jpe?g|png|svg|tiff?))$</value>
</list>
</property>
</bean>
The use cases you cite are somewhat out of date and refer to Heritrix 1.x (filters have been replaced with decide rules, and the configuration framework is very different). Still, the basic concept is the same.
The cxml file is basically a Spring configuration file. You need to set the shouldProcessRule property on the ARCWriter bean to a ContentTypeMatchesRegexDecideRule.
A possible ARCWriter configuration:
<bean id="warcWriter" class="org.archive.modules.writer.ARCWriterProcessor">
<property name="shouldProcessRule">
<bean class="org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule">
<property name="decision" value="ACCEPT" />
<property name="regex" value="^text/html.*" />
</bean>
</property>
<!-- Other properties that need to be set ... -->
</bean>
This will cause the processor to handle only those items that match the DecideRule, which in turn passes only those whose content type (MIME type) matches the provided regular expression.
Be careful about the 'decision' setting. Are you ruling things in or out? (My example rules things in; anything not matching is ruled out.)
As shouldProcessRule is inherited from Processor, this can be applied to any processor.
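As an illustration of applying the same property to a different processor, the sketch below restricts the HTML link extractor to text/html responses. The bean id extractorHtml is assumed to match the one in the default crawler-beans.cxml, and the rule is wrapped in a reject-first DecideRuleSequence so that non-matching content is not processed when the regex rule returns no decision.

```xml
<bean id="extractorHtml" class="org.archive.modules.extractor.ExtractorHTML">
  <property name="shouldProcessRule">
    <bean class="org.archive.modules.deciderules.DecideRuleSequence">
      <property name="rules">
        <list>
          <!-- Start by rejecting everything... -->
          <bean class="org.archive.modules.deciderules.RejectDecideRule"/>
          <!-- ...then accept only text/html content types -->
          <bean class="org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule">
            <property name="decision" value="ACCEPT"/>
            <property name="regex" value="^text/html.*"/>
          </bean>
        </list>
      </property>
    </bean>
  </property>
  <!-- other properties unchanged -->
</bean>
```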
More information about configuring Heritrix 3 can be found on the Heritrix 3 Wiki (the user guide on crawler.archive.org is about Heritrix 1)