SOLR IndexerOperationListener.afterPrepareContext(IndexerOperationListener.java:97) - indexing

I get this error when I try to run a full index in hybris (SOLR).
I have 5 identical indexes over 5 product catalogs; only the range of products to be indexed changes. The full index works correctly for 2 of them, but for the other 3 I get this error.
The error occurs when running a Solr full index in hybris (example log from the cronjob):
22.08.12 04:49:53:119 INFO Started indexer cronjob.
22.08.12 04:49:53:389 WARN Error during indexer call: fmfIndex
I have a standalone configuration with two nodes, one master and one slave, set up as shown in the Backoffice configuration screenshot.
I would like to know whether this is a SOLR configuration problem in hybris, on the machines, or something else.

Related

Element Range Index & availability for search

MarkLogic 9.0.8.2
We have around 20M records in our database in XML format.
To work with facets, we have created an element range index on the given element.
It is working fine, so no issue there.
The real problem is that we now want to deploy the same code to different environments: System Test (ST), UAT, and Production.
Before deploying code, we have to make sure the given index exists, so we create it 1-2 days in advance.
We noticed that until full indexing is completed, we can't deploy our code; otherwise it starts showing errors like this:
<error:code>XDMP-ELEMRIDXNOTFOUND</error:code>
<error:name/>
<error:xquery-version>1.0-ml</error:xquery-version>
<error:message>No element range index</error:message>
<error:format-string>XDMP-ELEMRIDXNOTFOUND: cts:element-reference(fn:QName("","tc"), ("type=string", "collation=http://marklogic.com/collation/")) -- No string element range index for tc collation=http://marklogic.com/collation/ </error:format-string>
And once indexing is finished, the same code runs as expected.
Especially in ST/UAT, we are fine with getting partial data from unfinished indexing.
Is there any way we can achieve this? Otherwise we are losing too much time just waiting for the index to finish.
This happens every time we ship a new feature that depends on a new index.
You can only use a range index if it exists and is available. It is not available until all matching records have been indexed.
You should create your indexes earlier and allow enough time to finish reindexing before deploying code that uses them. Maybe make your code deployment depend upon the reindexing status and not allow for it to be deployed until it has completed.
If the new versions of your applications can function without the indexes (a value query instead of a range query), or you are fine with queries returning inaccurate results, then you could enable/disable the sections of code that use them with feature flags, or wrap them in try/catch. But you really should just create the indexes earlier in your deployment cycle.
Otherwise, if you are performing tests without a complete and functioning environment, what are you really testing?

Apache Pulsar topic replication with increase in cluster size

I want to understand how namespace/topic replication works in Apache Pulsar, and what effect a change in cluster size has on the replication factor of existing and new namespaces/topics.
Consider the following scenario:
I am starting with a single node with the following broker configuration:
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=1
# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=1
# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=1
After a few months I decide to increase the cluster size to two with the following configuration for the new broker:
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=2
# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=2
# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=2
In the above scenario what will be the behaviour of the cluster:
Does this change the replication factor(RF) of the existing topics?
Do newly created topics have the old RF or the new specified RF?
How does the namespace/topic(Managed Ledger) -> Broker ownership work?
Please note that the two broker nodes have different configurations at this point.
TIA
What you are changing is the default replication settings (ensemble, write, ack). You shouldn't be using different defaults on different brokers, because then you'll get inconsistent behavior depending on which broker the client connects to.
The replication settings are controlled at namespace level. If you don't explicitly set them, you get the default settings. However, you can change the settings on individual namespaces using the CLI or the REST interface. If you start with settings of (1 ensemble, 1 write, 1 ack) on the namespace and then change to (2 ensemble, 2 write, 2 ack), then the following happens:
All new topics in the namespace use the new settings, storing 2 copies of each message
All new messages published to existing topics in the namespace use the new settings, storing 2 copies. Messages that are already stored in existing topics are not changed. They still have only 1 copy.
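For example, with the pulsar-admin CLI the namespace-level persistence policies can be changed and verified like this (the namespace name public/default is just a placeholder; these commands assume a running cluster):

```shell
# Set ensemble, write quorum and ack quorum to 2 for the namespace
pulsar-admin namespaces set-persistence public/default \
    --bookkeeper-ensemble 2 \
    --bookkeeper-write-quorum 2 \
    --bookkeeper-ack-quorum 2 \
    --ml-mark-delete-max-rate 0

# Inspect the persistence policies currently applied to the namespace
pulsar-admin namespaces get-persistence public/default
```

Because the policy lives on the namespace, every broker serves it consistently regardless of that broker's own defaults.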
An important point to note is that the number of brokers doesn't affect message replication. In Pulsar, the broker just handles the serving (producing/consuming) of messages. Brokers are stateless and can be scaled horizontally. The messages are stored on BookKeeper nodes (bookies), so the replication settings (ensemble, write, ack) refer to BookKeeper nodes, not brokers. The Pulsar website has a diagram that illustrates this separation.
So, to move from a setting of (1 ensemble, 1 write, 1 ack) to (2 ensemble, 2 write, 2 ack), you need to add a Bookkeeper node to your cluster (assuming you start with just 1), not another broker.

Using index in DSE graph

I'm trying to get the list of persons in a DataStax graph that share the same address with other persons, where the number of persons at an address is between 3 and 5.
This is the query:
g.V().hasLabel('person').match(__.as('p').out('has_address').as('a').dedup().count().as('nr'),__.as('p').out('has_address').as('a')).select('nr').is(inside(3,5)).select('p','a','nr').range(0,20)
At first run I've noticed this error messages:
Could not find an index to answer query clause and graph.allow_scan is disabled: ((label = person))
I've enabled graph.allow_scan=true and now it's working.
I'm wondering how I can create an index to be able to run this query without enabling allow_scan=true.
Thanks
You can create an index by adding it to the schema using a command like this:
schema.vertexLabel('person').index('address').materialized().by('has_address').add()
Full documentation on adding indexes is available here: https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/using/createIndexes.html
You should not enable graph.allow_scan=true as under the covers it is turning on ALLOW FILTERING on the CQL queries. This will cause a lot of cluster scans and will inevitably time out with any real amount of data in the system. You should never enable this in any sort of production environment.
I am not sure that indexing is the solution to your problem.
The best way to do this would be to reify addresses as nodes and look for address nodes with an indegree between 3 and 5.
You can then use an index on the textual fields of your address nodes.
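With addresses modeled as vertices (as they already are here, via has_address edges), a traversal along these lines is a sketch of that approach. The 'address' vertex label is an assumption, and note that Gremlin's inside() predicate is exclusive on both ends, so inside(2, 6) matches counts of 3, 4 and 5:

```groovy
// Sketch: addresses referenced by 3 to 5 persons ('address' label assumed)
g.V().hasLabel('address').
  filter(__.in('has_address').count().is(inside(2, 6))).
  range(0, 20)
```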

DatastaxEnteprise: node vs instance, correct AMI image, why do I need storage

Currently, we are evaluating DataStax Enterprise as our provider of Cassandra and Spark. We are considering deploying a DataStax cluster on AWS.
I have following questions:
1) In step 1 of Datastax on EC2 installation manual, I need to choose correct AMI Image: Currently there are 7 of them. Which is the correct one:
(DataStax Auto-Clustering AMI 2.5.1-pv, DataStax Auto-Clustering AMI 2.6.3-1204-pv, DataStax Auto-Clustering AMI 2.6.3-1404-pv....)
2) The moment we launch the cluster, do we pay only for the AWS instances, or also a DataStax Enterprise licensing fee? I know there is a 30-day enterprise free trial, but nowhere in the installation process did I see a step where we can request the free trial. Is there an online calculator we can use to estimate the monthly cost of a cluster (based on the instance types we create)?
3) In step 3 of the installation process, Configure Instance Details, I am confused by the terms instance and node. What is the difference between them? What happens if I choose:
a) 1 instance, --totalnodes 3 (in the user data)
b) 3 instances, --totalnodes 3
c) 1 instance, --totalnodes 0 --analyticsnodes 3
d) 3 instances, --totalnodes 0 --analyticsnodes 3
4) We are interested in the use case where each of our 3 Cassandra nodes runs Spark. Is the proper user data configuration:
--totalnodes 0 --analyticsnodes 3
Are we then going to have 0 nodes with only Cassandra, and 3 nodes running both Cassandra and Spark? What number of instances should we specify then?
5) In step 4 of the installation process, Add Storage, we are asked to add storage to the instance. But why do we need this storage? When choosing an instance type, for example m3.large, I already know that my instance has 32GB of SSD storage; what is this then?
Thank you for your answers. If there is some email list to which I can send these questions, I would appreciate it.
Use whichever AMI has the highest version number and the virtualization type you prefer (-pv or -hvm): http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html
You only pay for EC2 usage. DSE is free for testing and development. You do not need to request a trial license. If you want a production license or if you want to become a startup member, contact DataStax.
The AMI will install one "DSE node" per "EC2 instance". So if you want a six node cluster you need to specify 6 instances. To use your examples:
a) 1 instance, --totalnodes 3 (in the user data)
This won't work
b) 3 instances, --totalnodes 3
This will give you a three node Cassandra cluster (running on three instances). You have not specified search or analytics nodes so by default you will just get Cassandra nodes.
c) 1 instance, --totalnodes 0 --analyticsnodes 3
Won't work. The number of total nodes should equal the number of instances, and the number of analytics nodes can't be greater than the number of total nodes.
d) 3 instances, --totalnodes 0 --analyticsnodes 3
Won't work. Number of analytics nodes can't be greater than number of total nodes.
If you want a three-node cluster and you want all of them running both Cassandra and Spark use this:
3 instances, --totalnodes 3 --analyticsnodes 3
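For reference, a complete user-data line for the Auto-Clustering AMI looks roughly like this. The cluster name is a placeholder, and depending on the AMI version you may also need to pass your DataStax account credentials; check the AMI documentation for the exact options your version supports:

```
--clustername myDseCluster --totalnodes 3 --analyticsnodes 3 --version enterprise
```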
Adding storage is optional, and only possible with certain instance types. You should notice with m3.large that there is a default storage config and you can't actually make any changes to it.
Hope this helps!

Changing index in Neo4j affects the search in cypher

I am using Neo4j 2.0.
I have a label PROD. All nodes having label PROD have a property name. I have created an index on name like this :
CREATE INDEX ON :PROD(name)
After creating the index, if I change the value of the name property from "old" to "new", the following query works fine in a small test dataset, but not in our production data, which has 700 nodes labeled PROD (out of roughly a million nodes with other labels).
MATCH (n:PROD) WHERE n.name="new" RETURN n;
I have also created a legacy index on the same field, and after dropping and re-indexing the node on modification, it works perfectly on both the test and production datasets.
Is there a way to ensure the schema index is updated? What am I doing wrong? Why does the above query fail on the larger dataset?
You can use the :schema command in Neo4j browser or schema in neo4j-shell. The output should indicate which indexes have been created and which of them are already online.
Additionally there's a programmatic way to wait until index population has finished.
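For example, in neo4j-shell (assuming a 2.x install) you can inspect index state and block until population finishes:

```
neo4j-sh (?)$ schema        // shows each index and whether it is ONLINE or POPULATING
neo4j-sh (?)$ schema await  // blocks until all indexes are online
```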
Consider upgrading to 2.1.3; indexing has improved since 2.0.