Liferay 6.2 Lucene replication in a cluster

I'd welcome any help with a simple issue: I have a clustered environment and I have enabled Lucene replication in the properties (lucene.replicate.write=true). Now, all the tutorials are instructing me to reindex Lucene.
Should I run it on one node? On both? Simultaneously or sequentially?
This question has been asked in Liferay Forum as well: https://www.liferay.com/community/forums/-/message_boards/view_message/69175435.
Thank you!

Basically, what I did at first was the following:
cluster.link.enabled=true
lucene.replicate.write=true
and the result was that replication was NOT WORKING.
What I tried next was to work around this issue and continue clustering the rest of the portal, which in the end helped Lucene as well. My steps were to:
deploy cluster activation keys
deploy ehcache-cluster-web.war
portal-ext.properties:
cluster.link.enabled=true
cluster.link.autodetect.address=<COMMONLY_ACCESSIBLE_IP_AND_PORT>
lucene.commit.batch.size=1
lucene.commit.time.interval=5000
lucene.replicate.write=true
ehcache.cluster.link.replication.enabled=true
cluster.link.channel.properties.control=<PATH_TO_XML>
cluster.link.channel.properties.transport.0=<PATH_TO_XML>
portal.instance.protocol=http
portal.instance.http.port=8080
setenv.sh
-Djava.net.preferIPv4Stack=true
-Djgroups.bind_addr=<IP_OF_THE_NODE>
edit the clusterlink_control and clusterlink_transport files according to the Liferay tutorials
with the servers shut down, delete the contents of data/lucene (see the sketch after this list), and then run a reindex from the Control Panel on one node
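A rough sketch of that cleanup step, assuming a standard bundle layout with the Liferay home at /opt/liferay (the path is an assumption; adjust it to your installation):
# run on every node while the application servers are stopped
rm -rf /opt/liferay/data/lucene/*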
In the end, Lucene replication IS WORKING. A couple of things I think are significant. First, the portal.properties explanation of the lucene.commit.* keys is hard to comprehend; by trial and error I found out that these two keys are in an AND relation. Also, I found out about the portal.instance.* keys, which are used for multiple purposes in clustering and can matter if you have load balancers and/or Apache servers between the nodes and autodetection fails.
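To spell out that AND relation as I read it from trial and error (my interpretation, not official documentation), with the values used above:
# commit only after at least this many buffered index changes ...
lucene.commit.batch.size=1
# ... AND at least this many milliseconds since the last commit
lucene.commit.time.interval=5000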

There are multiple ways to configure search clustering in Liferay. If you use the lucene.replicate.write=true way, you're looking at several reindexing runs: On every restart of a server you must reindex that server's documents, as it might have missed indexing requests when it was down.
So, the short answer: don't worry, reindex both. Sooner or later you'll do it anyway, no matter whether only one is needed right now.


nixos etcd.pem (kubernetes)

While trying to install Kubernetes on nixos, using the following stanza:
services.kubernetes.masterAddress = "XXXXXX";
users.users.XXXXXX.extraGroups = [ "kubernetes" ];
services.kubernetes = {
  roles = ["master" "node"];
};
I hit the following issue:
open /var/lib/kubernetes/secrets/etcd.pem: no such file or directory
I recognize this as a TLS/SSL certificate, but how should I go about generating that file?
The article you used is really old; it was published on 2017-07-21, so almost 2.5 years ago. You can be pretty sure it's outdated in one way or another, although the overall NixOS approach to setting up a Kubernetes cluster from an end-user perspective may not have changed much during this time.
So, after familiarizing myself with it a bit more... I see that this is actually yet another approach to installing a Kubernetes cluster, and it has nothing to do with "the hard way" I mentioned in my previous comment. On the contrary, it's the easiest Kubernetes cluster setup I've ever seen. You don't have to do anything but add a single entry to your configuration.nix, run nixos-rebuild switch, and you can expect everything to be up and running. But there is really a lot, not just a few things, that NixOS takes care of "under the hood". Generating the proper certificates is just one of many steps involved in setting up a Kubernetes cluster. Keep in mind that installing Kubernetes from scratch is a pretty complex task. Take a brief look at this article and you'll see what I mean. It is an amazing resource for educational purposes, as there is probably no better way to understand something in depth than to build it from scratch, in the most manual way possible.
On the other hand, if you just need to set up a working Kubernetes cluster relatively quickly, Kubernetes the Hard Way won't be your choice. Fortunately, there are a few solutions that let you set up a Kubernetes cluster relatively quickly and simply.
One of them is Minikube.
Another one, which lets you set up a multi-node Kubernetes cluster, is kubeadm.
Going back to NixOS, I'm really impressed by how simple it is to set up a Kubernetes cluster on this system, provided everything works as expected. But what if it doesn't (and this is mainly what your question was about)? You may try to debug it on your own and look for a workaround, or simply create an issue on the NixOS project GitHub page like this one. As you can see, someone has already reported exactly the same problem as yours. They say that it works properly on the 18.09 release, so you're probably using a newer version such as 19.03. You can further read that there were some major changes, like moving to mandatory PKI in 19.03.
Take a closer look at this issue if you're particularly interested in running Kubernetes on NixOS, as there are a few pieces of advice and workarounds described there:
https://github.com/NixOS/nixpkgs/issues/59364#issuecomment-485122860
https://github.com/NixOS/nixpkgs/issues/59364#issuecomment-485249797
First of all, make sure that your masterAddress is set properly, i.e. as a hostname, not an IP address. Since you only put "XXXXXX" there, I can't guess what is currently set. It's quite likely that if you set it to, e.g., localhost, the appropriate certificate will be generated properly:
services.kubernetes = {
  roles = ["master"];
  masterAddress = "localhost";
};
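Once you've adjusted the configuration, a quick way to verify is (a rough check; the certificate path comes from the error message above, and kubectl may first need to be pointed at the cluster-admin kubeconfig):
sudo nixos-rebuild switch
# the file from the original error should now exist
ls -l /var/lib/kubernetes/secrets/etcd.pem
# once the services settle, the node should eventually report Ready
kubectl get nodes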
You may also want to familiarize yourself with this info in the NixOS docs related to Kubernetes.
Let me know if it helped.

What's Elasticsearch and is it safe to delete logstash?

I have an internal Apache server for testing purposes, not client-facing.
I wanted to upgrade the server to Apache 2.4, but there is no space left, so I was trying to delete some files on the server.
After checking file sizes, I found that the folder /var/lib/elasticsearch takes 80 GB of space. For example, /var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-2015.12.08 alone takes 60 GB. I'm not sure what Elasticsearch is. Is it safe if I delete this logstash index? Thanks!
Elasticsearch is a search engine, somewhat like a NoSQL database, and it stores its data in indices. What you are seeing is the data of one index.
Probably someone was using it around 2015, when that timestamped index was created.
I would just delete it.
I'm afraid that only you can answer that question. One use for Logstash + Elasticsearch is to help make sense of system logs. That combination isn't normally set up by default, so I presume someone set it up at some point for some reason, and it has obviously done some logging. Only you can know whether it is still being used, or whether it is safe to delete.
As other answers pointed out, Elasticsearch is a distributed search engine, and I believe an earlier user was pushing application or system logs to this Elasticsearch instance using Logstash. If you can find the source application, check whether the log files are still there; if so, you can go ahead and delete your index. I highly doubt anyone still needs logs from back in 2015, but it is really your call to check your application's archiving requirements and then take the necessary action.
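If you do decide to delete it, doing so through Elasticsearch's REST API rather than removing files from disk keeps the cluster metadata consistent. A rough sketch, assuming Elasticsearch is listening on the default localhost:9200:
# list all indices and their sizes first
curl -s 'http://localhost:9200/_cat/indices?v'
# delete the old Logstash index once you're sure nobody needs it
curl -XDELETE 'http://localhost:9200/logstash-2015.12.08'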

How to know when the DataImportHandler for Solr has finished indexing?

Is it possible to know when Solr has finished indexing my data?
I work with SolrCloud 4.9.0 and ZooKeeper as the config file manager.
I have the data.import file, but it only shows when the indexing STARTED, not when it ended.
You can get the DataImportHandler status using:
<MY_SERVER>/solr/dataimport?command=status
By reading the status you can tell whether the import is still running. A similar procedure (with a different URL) is suggested in the "Solr in Action" book for checking whether a backup is still running.
Another option would involve the use of listeners, as advised here.
I also use the /dataimport?command=status way to check if the job is done or not, and while it works, sometimes I get the impression it is a bit flaky.
There are listeners you can use (see here). I would really like to use those, but of course you need to write Java code, manage your jar in Solr, etc., so it is a bit of a PITA.
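For scripted checks, here is a minimal polling sketch of the status approach described above (host, port and core name are placeholders; it assumes the JSON response contains "busy" while an import is running):
#!/bin/sh
# poll the DataImportHandler status until the import is no longer busy
URL='http://localhost:8983/solr/collection1/dataimport?command=status&wt=json'
while curl -s "$URL" | grep -q '"busy"'; do
  sleep 10
done
echo "DataImportHandler reports idle - import finished"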

What is the significance of the data-config.xml file in Solr?

And when should I use it? How is it configured? Can anyone please explain it to me in detail?
The data-config.xml file is the configuration file for Solr's DataImportHandler (DIH). DIH is one way of getting data into Solr: it lets one of the servers connect through JDBC (or through a few other plugins) to a database server or a set of files and import them into Solr.
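For illustration, a minimal sketch of what such a file can look like for a single JDBC entity; the driver, connection details, table and field names below are placeholders, and the file is referenced from the /dataimport request handler defined in solrconfig.xml:
<dataConfig>
  <!-- connection details below are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="product"
            query="SELECT id, name, description FROM products">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>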
DIH has a few issues (for example, the non-distributed way it works), so it's usually suggested that you write the indexing code yourself (and POST it to Solr from a suitable client, such as SolrJ, Solarium, SolrClient, MySolr, etc.).
It has been mentioned that the DIH functionality really should be moved into a separate application, but that hasn't happened yet as far as I know.
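If you do go the write-your-own-indexing route suggested above, even a plain HTTP client is enough; for example, a hedged sketch of posting one document to the standard JSON update handler (host, collection name and fields are placeholders):
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"1","title":"example document"}]'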

Using SQL for cleaning up JIRA database

Has anyone had luck with removing a large number of issues from a JIRA database directly, instead of using the frontend? Deleting 60,000 issues with the bulk tools is not really feasible.
Last time I tried it, JIRA went nuts because of its own way of handling indexes.
How about doing a backup to XML, editing the XML, and reimporting?
We got gutsy, did a truncate on the jiraissues table, and then used the rebuild index feature on the frontend. It looks like it's working!
This is old, but I see that this question was just edited recently, so to chime in:
Writing directly to the JIRA database is problematic. The reindex feature suggested in the Oct 14 08 answer just rebuilds the Lucene index, so it is unlikely to clean up everything that needs to be cleaned up from the database on a modern JIRA instance. Off the top of my head, this will probably leave data lying around in the following tables, among others:
custom field data (customfieldvalue table)
issue links (issuelink table)
versions and components (nodeassociation table, which contains other stuff too, so be careful!)
remote issue links or wiki mentions (remotelink table)
If one has already done such a manual delete on production, it's always a good idea to run the database integrity checker (YOURJIRAURL/secure/admin/IntegrityChecker!default.jspa) to make sure that nothing got seriously broken.
Fast forwarding to 2014, the best solution is to write a quick shell script that uses the REST API to delete all of the required issues. (The JIRA CLI plugin is usually a good option for automating certain types of tasks too, but as far as I can tell, it does not currently support the deletion of issues, so the REST API is your best bet.)
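For reference, a rough sketch of such a script against the JIRA REST API (base URL, credentials and the JQL are placeholders; it requires curl and jq, and it deletes permanently, so try it on a test copy first):
#!/bin/sh
BASE='https://jira.example.com'
AUTH='admin:secret'
JQL='project = OLDPROJ'

# collect the keys of the issues to delete (first 1000; loop with startAt for more)
KEYS=$(curl -s -u "$AUTH" -G "$BASE/rest/api/2/search" \
         --data-urlencode "jql=$JQL" \
         --data-urlencode "fields=key" \
         --data-urlencode "maxResults=1000" | jq -r '.issues[].key')

for KEY in $KEYS; do
  echo "Deleting $KEY"
  curl -s -u "$AUTH" -X DELETE "$BASE/rest/api/2/issue/$KEY?deleteSubtasks=true" > /dev/null
done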