DSE: Synonym Search (Thesaurus)

I am currently trying to get synonym search to work on multiple nodes, ideally via DataStax Studio. The installation was done through OpsCenter with 3 nodes using DSE Search + Graph. I have seen various posts on this, and the approaches consist of directly changing schema.xml. However, as I am using multiple nodes, I am unsure whether that is the right way to approach this.
I have also tried to find information in the DataStax docs but couldn't locate what I needed, so any advice or direction on this would be greatly appreciated. Ideally, I would like to be able to do synonym search through the Graph (Gremlin) interface.

For Graph, the only option is to create the necessary search indexes, fetch the existing schema.xml via dsetool get_core_schema, make the modifications, and load it back with dsetool reload_core. The core reload needs to be done in every data center, but not on every machine within a data center.
Don't forget that before reloading the core with the modified schema, you need to upload the synonym files with dsetool write_resource.
See the DSE Search documentation for the full list of options for the dsetool command.
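A rough outline of that workflow (the core name my_ks.my_table, the file synonyms.txt, and the exact option spellings are assumptions; check dsetool help on your DSE version before running):

    # Upload the synonym list as a Solr resource for the core
    # (my_ks.my_table and synonyms.txt are placeholder names)
    dsetool write_resource my_ks.my_table name=synonyms.txt file=/path/to/synonyms.txt

    # Dump the current schema so it can be edited
    dsetool get_core_schema my_ks.my_table > schema.xml

    # Edit schema.xml and add a synonym filter to the analyzer chain of the
    # relevant field type, for example:
    #   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
    #           ignoreCase="true" expand="true"/>

    # Reload the core with the modified schema and reindex
    # (repeat once per data center)
    dsetool reload_core my_ks.my_table schema=schema.xml reindex=true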

Related

Importing RedisTimeSeries data into Grafana

I've got a process storing RedisTimeSeries data in a Redis instance on Docker, and I can access the data just fine with the RedisInsight CLI.
I can also add Redis as a data source to Grafana, and I've imported the dashboards.
But when I actually try to pull the data into a Grafana dashboard, the query just sits there.
TS.RANGE with a range of - +, or two explicit timestamps, also produces nothing (I do get results when entering it into the CLI, but not as a CLI query in Grafana).
What could I be missing?
The command you should be using in the Grafana dashboard for retrieving and visualising time series stored in Redis with RedisTimeSeries is TS.RANGE for a specific key, or TS.MRANGE combined with a filter that selects the set of time series matching that filter. The full list of RedisTimeSeries commands is at https://oss.redislabs.com/redistimeseries/commands/. (You're using TS.INFO, which only retrieves the metadata of a time series key, not the actual samples within it.)
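For reference, a minimal sketch of what those two commands look like (the key name temperature:room1 and the room=1 label are made-up examples; the same key or filter goes into the Grafana query editor):

    # All samples of a single time series, from the oldest (-) to the newest (+)
    redis-cli TS.RANGE temperature:room1 - +

    # All series whose labels match a filter, e.g. every series labelled room=1
    redis-cli TS.MRANGE - + FILTER room=1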
So I looked into this a bit more. Moderators deleted my last answer because it didn't 'answer' the question.
There is a GitHub issue for this, and one of the developers also responded: it is broken and has been for a while. Grafana doesn't seem to want to maintain this data source at the moment. IMHO they should remove the RedisTimeSeries support from their plugin library if it isn't fully baked.
Redis data source issue for TS.RANGE: https://github.com/RedisGrafana/grafana-redis-datasource/issues/254
Are you trying to display a graph (e.g. number of people vs time)? If so, perhaps TS.INFO is not the right command and you should use something like TS.MRANGE.
Take a look at https://redislabs.com/blog/how-to-use-the-new-redis-data-source-for-grafana-plug-in/ for some more examples.

Elasticsearch access control based on field value

I am currently investigating the ELK (Elasticsearch, Logstash, Kibana) stack for centralized log file analysis.
The plan is to store logs of multiple applications in the same Elasticsearch cluster using logstash and day-based indexes.
All documents contain a field called application, e.g. "application": "superapp".
Now we are looking for a way to implement access control like this:
A) Superuser: is able to see log entries of all applications.
B) Developer: can only see log entries of the applications they are allowed to see. For example, the dev team for application "superapp" should only be able to see the entries for this application.
To wrap it up: we need access control based on the value in the field application.
While reading the documentation for Elasticsearch and Shield I could not find an obvious way to do it.
Any ideas how we could realize this in a way that would also work with Kibana 3 and 4?
My first idea was to use aliases which are automatically assigned to documents using index templates. I am wondering if this is the right direction.
I asked this question here on the elasticsearch Google Group and got this reply:
"You can separate out the different types of logs into their own indices which would make things much easier, you could also setup an alias with a filter and then provide access to that alias to certain users.
Currently KB isn't multi-tenanted but it is a feature that is going to be added, you'd have to setup multiple instances with each going to their own alias."
To sum it up: multi-tenancy needs to be addressed at both the frontend (Kibana) and the backend (Elasticsearch).
Frontend: Use Proxies for Kibana
https://github.com/salyh/elastic-defender
https://github.com/fangli/kibana-authentication-proxy
Backend: Several approaches using filtered aliases and alias templates (a minimal filtered-alias sketch follows the links below)
Limiting Indexes and Operations
Faking Index per User with Aliases:
http://engineering.aweber.com/using-elasticsearchs-aliases/
http://opennomad.com/content/controlling-access-elasticsearch-filtered-aliases-nginx-and-tokens
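As a minimal sketch of the filtered-alias idea (the index and alias names are placeholders to adapt to your day-based logstash indexes): create one alias per application that filters on the application field, point each team's Kibana instance at its alias, and restrict which aliases a user may query with Shield or a proxy.

    # Add a per-application filtered alias to a daily index
    # ("logstash-2015.01.01" and "superapp-logs" are placeholder names)
    curl -XPOST 'http://localhost:9200/_aliases' -d '{
      "actions": [
        { "add": {
            "index": "logstash-2015.01.01",
            "alias": "superapp-logs",
            "filter": { "term": { "application": "superapp" } }
        } }
      ]
    }'

Newer Elasticsearch versions can also attach such an alias automatically to every new day-based index via an index template, which matches the idea in the question.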

Merge two Endeca Servers (Endeca 3.1) into one. Including their current data

Let me explain in more detail:
1st: I'm running Endeca 3.1, so Endeca Server here refers to 3.0's Data Domain.
I'm required to use an Endeca Server currently present on Endeca (a downloaded demo VM). All the info on it, including groups, attributes and data, must be merged into our Endeca Server. (It could also be the other way around; I could merge my Endeca Server into this one.)
So far, I've tried to do the following:
1) Clone the Endeca Server
2) Use the putCollection sconfig operation to create a collection on it with the same name I have on mine.
3) Load configurations using the LoadCollection & LoadAttributes graphs from the OEID POC Template 3.1. I point to the new collection in the Configuration.xls file.
This is where I encounter an issue. The LoadAttributes graph gets a timeout (T/O) message from the server's web service, and then the config WSDL becomes inaccessible for a while. I can't get beyond this point.
I've been able to load data into the collection, but I need to load the attributes first.
Thanks in advance for your replies.
Regards
There are a few techniques.
Have you tried exporting the data domain and then importing it?
You can use the endeca-cmd tool to export to a file, and then import from that file (a rough sketch is below). This would enable you to add two data stores into one server.
If you want to combine 2 datastores then that is a different question.
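A very rough sketch of the export/import route. The sub-command names and the data-domain names demo_dd / demo_dd_imported are assumptions from memory; check endeca-cmd --help on your installation for the exact spelling and the option that points at the export location.

    # On the server with the demo data: export the data domain to an offline copy
    # ("demo_dd" is a placeholder; the export path option is omitted here)
    endeca-cmd export-dd demo_dd

    # On the target server: recreate the data domain from the copied export files
    endeca-cmd import-dd demo_dd_imported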
The simplest approach in 3.1, if the data collections are small, is to extract them as CSV (via a data table), convert to XLS, and add them via self-provisioning into separate collections within a single data store. If you are running in the VM this is potentially the easiest approach.
This can also be done using Integrator.
You don't need to load the attributes unless you are using multi-value types. You can call against the Conversation Web Service to extract data and then load it using 'bulk load'. I would not worry too much about creating the attributes unless this becomes essential due to their type or complexity. If you cannot call against the Conversation Web Service, then again extract as CSV and load using Integrator.

Can't migrate custom Plone file types to Blobs

We have custom content types that were created as extensions of the ATTypes, two of them extend the ATFile type and one extends the ATImage type. We recently upgraded from Plone 4.2 to Plone 4.3.2. Just discovered we are not using Blob storage at all. No wonder our Data.fs is HUGE. So, I have been trying to migrate these custom types.
I have followed all of the steps explained in this example and the product's notes from pypi, these Plone instructions, and used the example from the pypi page for archetypes.schemaextender (Sorry, since I'm still a noob my reputation won't let me post more than 2 links).
In the end, I created an extender script that just extends the ATFile type changing the FileField to BlobField. It seems to be working for new items. I can add a new CustomFileType and it appears to be uploading the file to blob, and my new upload field is showing (I changed the description as a quick way to verify which one it was using).
However, I am having a problem migrating all existing content items to move the binary files over to blob. I tried the generic migrate() script, then I created my own migrator and walker as suggested in the above resources. It doesn't seem like it is doing anything, though. When printing results for each item it tries to migrate, I do see this returned for each item:
DEBUG ATCT.migration Migrating /site/path/to/custom/file/filename.ext (CustomFile -> Blob)
When I navigate to the custom file type in the site, where it usually shows the link to the file, it is just empty. Then going to edit, it treats it as if there is no file there. As a check, I disabled the extender, restarted, and reloaded the custom file. The file was there now. So it looks like the script I am running just isn't moving that file over to where it should be now.
I feel like I am missing something simple, and it is right there, but I can't seem to find it. All of this is learn as I go and a bit over my head, so hopefully someone can easily set me straight.
If I need to provide any additional information leave a comment and I will try to provide what you need.
UPDATE
I used the Red Turtle objects as examples to migrate my custom types, as suggested by keul. I still was not able to get the file to migrate to blob within the type itself. So, I tried a different approach: I created a new custom type, "CustomBlob", that mimics the setup of my CustomFile type, and extended only this new type to be blob-aware. Then I migrated the CustomFiles to CustomBlob, did a complete clear and rebuild, and packed the ZEO database. The migration seemed to work for the most part: the blobstorage grew by an expected amount and the new types worked. However, the Data.fs didn't go down in size. I would have thought that the binary files stored in Data.fs would be removed during the migration. Am I understanding this incorrectly? How can I remove these files so the Data.fs size goes down appropriately?
Not sure if this is the best solution, but here is how I was able to get this to work.
I created temporary content types parallel to each type (for CustomImage I made CustomImageBlob, and so on). I made the new types blob-aware only and migrated all types to their parallel type. Then I enabled the extender for the original types to make them blob-aware, and migrated back. It is a little redundant and time-consuming, but I just could not get the files to migrate to blob when migrating a type to itself.
Providing this as the best answer so far in case it helps someone else, or might encourage someone to find a better solution. Thanks for the tip keul, it definitely helped me get to this solution.

Query RavenDB without using the studio interface

I am trying to view my sagas in the RavenDB management studio, and even on the initial page all I see is a "Querying documents..." box with a continuously moving progress bar. I cannot seem to get past it; going from page to page, it does not go away. Is there a way to pull all of the saga data into a list so I can look at it? It appears the issue is that saga documents are continuously being added.
I've looked into the HTTP API and the Linq adapters, but I guess I am looking for something that already exists that can easily peer into the server much like the silverlight studio, except not such a pain. I more or less just want to pull a snapshot of all the documents into some kind of readable list.
I find LINQPad 4 convenient; the RavenDB driver for LINQPad can be found here:
https://github.com/ronnieoverby/RavenDB-Linqpad-Driver
For the command line: cURL using dynamic indexes, as explained here (a rough example follows the link):
http://ravendb.net/docs/http-api/indexes/dynamic-indexes
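For example, with the older HTTP API that page describes, a dynamic index over a collection can be queried directly. The database name testDB and collection name Sagas below are placeholders:

    # Query the dynamic index over the "Sagas" collection and return a page of results
    curl "http://localhost:8080/databases/testDB/indexes/dynamic/Sagas?pageSize=128"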
In the browser, go to http://localhost:8080/docs
You might need to install JsonView, but that should give you what you want.
If anyone wants to know how to browse the data through a REST call:
"localhost:8080/databases/{database-name}/docs/{dataset-name}/id"
For example:
"localhost:8080/databases/testDB/docs/Sites/1"
will give the JSON data for the "Sites" document, and
"localhost:8080/databases/testDB/docs/"
will give the JSON data for all the documents in testDB.
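The same URLs can be fetched from the command line and pretty-printed without installing anything in the browser, for example:

    # Single document by id (same example names as above)
    curl "http://localhost:8080/databases/testDB/docs/Sites/1" | python -m json.tool

    # First page of documents in the database
    curl "http://localhost:8080/databases/testDB/docs/" | python -m json.tool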