RedisGraph CREATE INDEX command timed out

The command times out when creating an index.
When I try to create an index on facilityNumber:
GRAPH.QUERY GRAPH_NAME "CREATE INDEX ON :node(facilityNumber)"
I get a timeout exception:
CLI ERROR: Command timed out. Blocking commands are not supported
More context:
My graph is constructed using Redis Labs' bulk insert Python script.
The graph consists of 1214 nodes and 152846 relations.
The nodes do contain facilityNumber when queried.
RedisGraph is running in Docker, using the redislabs/redismod image.

On which type of machine are you running Docker?
Also, what happens when you switch to the redisgraph Docker image instead of redislabs/redismod?
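One way to rule out the CLI or GUI client itself: issue the same command through the plain redis-py client, which does not apply any blocking-command handling of its own. This is only a sketch, assuming redis-py is installed and RedisGraph is reachable on localhost:6379 (adjust to your Docker port mapping):

import redis

# hypothetical connection details; adjust host/port to your Docker setup
r = redis.Redis(host="localhost", port=6379, socket_timeout=None)

# send the index creation exactly as typed in the CLI
result = r.execute_command(
    "GRAPH.QUERY", "GRAPH_NAME", "CREATE INDEX ON :node(facilityNumber)"
)
print(result)

If this succeeds, the timeout is coming from the client rather than from RedisGraph.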

Related

Error When Creating Search Index in Cassandra Enterprise (DSE)

I'm trying to create a search index on my table in DSE 6.8. This is my table in the test keyspace:
CREATE TABLE users (
    username text,
    first_name text,
    last_name text,
    password text,
    email text,
    last_access timeuuid,
    PRIMARY KEY(username)
);
I tried this query:
CREATE SEARCH INDEX ON test.users;
and this is the response:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Search statements are not supported on this node"
I think there must be something I should change in some file for DSE to support search statements. I've already set SOLR_ENABLED to 1 in /etc/default/dse. I'm totally new to this, and I don't know if there's something wrong with my table or anything else.
Can anyone suggest what might be causing this error? Thanks in advance.
As the error message suggests, you can only create a Search index on DSE nodes running in Search mode.
Check the node's workload by running the command below. It will tell you if the node is running in pure Cassandra mode or Search mode.
$ dsetool status
If you installed DSE from the binary tarball, it doesn't use /etc/default/dse. Instead, start DSE as a standalone process with the -s flag to run it in Search mode:
$ dse cassandra -s
Cheers!
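Once dsetool status reports a Search workload, a rough way to verify end to end from Python is the cassandra-driver (the contact point below is a placeholder; DSE Search exposes the index through the solr_query column used in the test query):

from cassandra.cluster import Cluster

# placeholder contact point; point this at your DSE node
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("test")

# only succeeds when the node is running with a Search workload
session.execute("CREATE SEARCH INDEX ON test.users")

# query through the search index via the solr_query column
rows = session.execute(
    "SELECT username, email FROM users WHERE solr_query = 'last_name:Smith'"
)
for row in rows:
    print(row.username, row.email)

cluster.shutdown()

Note that the search index is built asynchronously, so the query may return nothing until the build finishes.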

Can't access external Hive metastore with PySpark

I am trying to run a simple piece of code that just shows the databases I created previously on my hive2 server. (Note: I have tried this in both Python and Scala, with the same results.)
If I log into a hive shell and list my databases, I see a total of 3 databases.
When I start the Spark shell (2.3) via pyspark, I do the usual and add the following property to my SparkSession:
sqlContext.setConf("hive.metastore.uris","thrift://*****:9083")
And restart the SparkContext within my session.
If I run either of the following lines to see all the configs:
pyspark.conf.SparkConf().getAll()
spark.sparkContext._conf.getAll()
I can indeed see that the parameter has been added. I then start a new HiveContext:
hiveContext = pyspark.sql.HiveContext(sc)
But if I list my databases:
hiveContext.sql("SHOW DATABASES").show()
It does not show the same results as the hive shell.
I'm a bit lost. For some reason it looks like Spark is ignoring the config parameter, even though I am sure the URI I'm using is my metastore's, because the address I get from running:
hive -e "SET" | grep metastore.uris
is the same address I also get if I run:
ses2 = spark.builder.master("local").appName("Hive_Test").config('hive.metastore.uris','thrift://******:9083').getOrCreate()
ses2.sql("SET").show()
Could it be a permissions issue? For example, some databases might not be visible outside of the hive shell/user.
Thanks
Managed to solve the issue: because of a communication issue, Hive was not hosted on that machine. I corrected the code and everything works fine.
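For anyone else who hits this: Hive settings such as hive.metastore.uris are generally picked up when the first SparkSession/HiveContext is created, so setting them afterwards on an existing context may be ignored. A minimal sketch (the thrift address is a placeholder; use the value from hive -e "SET" | grep metastore.uris):

from pyspark.sql import SparkSession

# placeholder metastore address
spark = (
    SparkSession.builder
    .master("local")
    .appName("Hive_Test")
    .config("hive.metastore.uris", "thrift://<metastore-host>:9083")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW DATABASES").show()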

Pig script on AWS EMR with Tez occasionally fails with OutOfMemoryError

I have a Pig script running on an EMR cluster (emr-5.4.0) using a custom UDF. The UDF is used to look up some dimensional data, for which it imports a (somewhat) large amount of text data.
In the pig script, the UDF is used as follows:
DEFINE LookupInteger com.ourcompany.LookupInteger(<some parameters>);
The UDF stores some data in Map<Integer, Integer>
On some input data the aggregation fails with an exception as follows:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.split(String.java:2377)
at java.lang.String.split(String.java:2422)
[...]
at com.ourcompany.LocalFileUtil.toMap(LocalFileUtil.java:71)
at com.ourcompany.LookupInteger.exec(LookupInteger.java:46)
at com.ourcompany.LookupInteger.exec(LookupInteger.java:19)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:379)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:347)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.genericGetNext(POBinCond.java:76)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.getNextInteger(POBinCond.java:118)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:347)
This does not occur when the Pig aggregation is run with MapReduce, so a workaround for us is to replace pig -x tez with pig -x mapreduce.
As I'm new to Amazon EMR and to Pig with Tez, I'd appreciate some hints on how to analyse or debug the issue.
EDIT:
It looks like strange runtime behaviour when running the Pig script on the Tez stack.
Please note that the Pig script is using:
replicated joins (the smaller relations to be joined need to fit into memory), and
the already mentioned UDF, which initialises a Map<Integer, Integer>, producing the aforementioned OutOfMemoryError.
We found another workaround for the Tez backend: using increased values for mapreduce.map.memory.mb and mapreduce.map.java.opts (the latter set to 0.8 times mapreduce.map.memory.mb). Those values are bound to the EC2 instance type and are usually fixed (see the AWS EMR task configuration).
By (temporarily) doubling the values, we were able to make the pig script succeed.
We ran on m3.xlarge core instances, which have the following default values:
mapreduce.map.java.opts := -Xmx1152m
mapreduce.map.memory.mb := 1440
Pig startup command (with the doubled values):
pig -Dmapreduce.map.java.opts=-Xmx2304m \
-Dmapreduce.map.memory.mb=2880 -stop_on_failure -x tez ... script.pig
EDIT:
One colleague came up with the following idea:
Another workaround for the OutOfMemoryError: GC overhead limit exceeded could be to add explicit STORE and LOAD statements for the problematic relations; that would make Tez flush the data to storage. This could also help in debugging the issue, as the (temporary, intermediate) data can be inspected with other Pig scripts.

Redshift drop/create/select query failing in Data Pipeline

I'm trying to run a daily migration script in Redshift using Data Pipeline.
The script works as expected when I run it directly using SQL Workbench/J, but fails when triggered through Data Pipeline.
I have reproduced the problem with this simple code:
drop table if exists image_stg;
create table image_stg (like image_full);
select * from image_stg;
When I run it in Data Pipeline, I get this error:
[Amazon](500310) Invalid operation: relation "image_stg" does not exist;
I also got this error once, for the exact same code, without changing anything:
[Amazon](500310) Invalid operation: Relation with OID 108425 does not exist.;
I've found this thread on the AWS forums, but it didn't help: Pipeline started failing on simple Redshift SqlActivity and temp table
What is causing this error? Is there a workaround?
I've contacted Amazon, and it looks like a problem in Data Pipeline.
They did suggest a workaround that seems to work in my case: Change the JDBC connection string from jdbc:redshift://… to jdbc:postgresql://… .
I had the same problem when creating a temporary table in Redshift via Data Pipeline, but the workaround of changing the connection string from jdbc:redshift://… to jdbc:postgresql://… didn't work for me. My last resort was to create the table as a physical table and drop it after use, through Data Pipeline.
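If it helps to sanity-check the statements outside Data Pipeline: Redshift speaks the PostgreSQL wire protocol, which is why the jdbc:postgresql:// string works at all. A rough sketch of the same three statements through psycopg2 (host, database and credentials are placeholders):

import psycopg2

# placeholder connection details for the Redshift cluster
conn = psycopg2.connect(
    host="<cluster-endpoint>",
    port=5439,
    dbname="<database>",
    user="<user>",
    password="<password>",
)

with conn, conn.cursor() as cur:
    cur.execute("drop table if exists image_stg;")
    cur.execute("create table image_stg (like image_full);")
    cur.execute("select * from image_stg;")
    print(cur.fetchall())  # empty result set for the freshly created table

conn.close()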

SonarQube 5.1.1 analysis results for different branches giving timeouts with MySQL DB

We are using SonarQube 5.1.1 with a MySQL database. We are facing timeout issues with the database. We ran the MySQL tuning primer script and made some changes to the InnoDB timeout (increased it in /etc/my.cnf), but it made no difference. One of the suggestions from the MySQL tuner output is:
"of 7943 temp tables, 40% were created on disk"
Note: BLOB and TEXT columns are not allowed in memory tables.
Are there any suggestions for dealing with Sonar analysis results for a bunch of different branches?
Perhaps using Postgres instead of MySQL?
We get errors as shown below:
Failed to process analysis report 8 of project "X"
org.apache.ibatis.exceptions.PersistenceException:
Error committing transaction. Cause: org.apache.ibatis.executor.BatchExecutorException:
org.sonar.core.issue.db.IssueMapper.insert (batch index #1) failed.
Cause: java.sql.BatchUpdateException: Lock wait timeout exceeded; try
restarting transaction
Cause: org.apache.ibatis.executor.BatchExecutorException: org.sonar.core.issue.db.IssueMapper.insert (batch index #1) failed.
Cause: java.sql.BatchUpdateException: Lock wait timeout exceeded; try
restarting transaction
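For what it's worth, the "Lock wait timeout exceeded" part of the stack trace above is governed by MySQL's innodb_lock_wait_timeout, and the "temp tables created on disk" warning relates to tmp_table_size / max_heap_table_size (BLOB and TEXT columns always spill to disk regardless). A quick diagnostic sketch to confirm the values the server is actually running with, via pymysql (connection details are placeholders):

import pymysql

# placeholder credentials for the SonarQube schema
conn = pymysql.connect(host="localhost", user="sonar", password="<password>", database="sonar")

with conn.cursor() as cur:
    # timeout behind "Lock wait timeout exceeded; try restarting transaction"
    cur.execute("SHOW VARIABLES LIKE 'innodb_lock_wait_timeout'")
    print(cur.fetchone())
    # in-memory temp table limits behind the "created on disk" warning
    cur.execute("SHOW VARIABLES LIKE 'tmp_table_size'")
    print(cur.fetchone())
    cur.execute("SHOW VARIABLES LIKE 'max_heap_table_size'")
    print(cur.fetchone())

conn.close()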