I've got SAP HANA Spark Controller 2.0.0 running on HDP 2.4.3 with Spark 1.6.2.
In its configuration I have set these parameters:
sap.hana.es.enable.cache=true
sap.hana.es.cache.max.capacity=500
sap.hana.hadoop.datastore=Hive
I've got HANA 1.00.122 connected to that Spark Controller, set the enable_remote_cache parameter to true in indexserver.ini, and imported one of the exposed Hive tables as a virtual table in HANA.
Then I ran SELECT statements against that virtual table, but every time I see that no cache is created (nothing appears in the Storage tab of the Spark UI), nor is it ever hit (query runtime doesn't drop, and the job goes through the same stages every run).
Adding WITH HINT (USE_REMOTE_CACHE) to the queries doesn't help either.
Are there any other settings I forgot to make?
In order to enable remote caching for federated queries to Hive from HANA, you must also set the HANA parameter enable_remote_cache = true.
For more info see the bottom of this page:
https://help.sap.com/viewer/6437091bdb1145d9be06aeec79f06363/2.0.1.0/en-US/1fcb5331b54e4aae82c0340a8a9231b4.html
According to SAP, the HANA version must be 2.0 or later for caching to work.
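For illustration, here is a minimal sketch of both steps over JDBC. It assumes the parameter lives in the smart_data_access section of indexserver.ini (check the HANA documentation for the exact section in your release); the host, credentials, schema and virtual table names are placeholders, and the HANA JDBC driver (ngdbc.jar) must be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RemoteCacheCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the HANA instance.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sap://hana-host:30015/", "SYSTEM", "password");
             Statement st = conn.createStatement()) {

            // Assumption: enable_remote_cache sits in the smart_data_access
            // section of indexserver.ini; adjust the section name if your docs differ.
            st.execute("ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') "
                     + "SET ('smart_data_access', 'enable_remote_cache') = 'true' WITH RECONFIGURE");

            // Query the virtual table with the remote-cache hint appended to the statement.
            try (ResultSet rs = st.executeQuery(
                    "SELECT COUNT(*) FROM \"MYSCHEMA\".\"V_HIVE_TABLE\" WITH HINT (USE_REMOTE_CACHE)")) {
                while (rs.next()) {
                    System.out.println("row count: " + rs.getLong(1));
                }
            }
        }
    }
}
```

If caching does kick in, repeated runs of the same statement should hit the cache and the cached data should become visible in the Storage tab of the Spark UI, as described in the question.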
Can I get some advice on whether it is possible to proceed as in the steps below?
SQL Server data is loaded into the Ignite cluster.
The data in SQL Server is then changed.
-> Is there any way to reflect this changed data other than reloading it from SQL Server?
When Ignite is used as a cache in front of the database and changes are made directly to the DB without going through the Ignite cluster, can those changes be reflected in the already loaded Ignite cache data?
Is it possible to update only the changed values without loading the data again?
If so, which part should I configure?
I suppose the real question is: how to propagate changes that are applied to SQL Server first into the Apache Ignite cluster. The short answer is that you need to do it yourself, i.e. you need to implement some synchronization logic between the two databases. This should not be a complex task if most of the data updates come through Ignite and SQL Server-first updates are rare.
As for the general approach, you can look at implementations of the Change Data Capture (CDC) pattern. There are multiple articles on how you can achieve this with external tools, for example, CDC Between MySQL and GridGain With Debezium or this video.
It's worth mentioning that Apache Ignite is currently working on its own native implementation of CDC.
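If you do end up implementing the synchronization yourself, a very rough sketch of a polling approach could look like the following. It assumes the SQL Server table has a last-modified timestamp column for detecting changes; the table, cache, column and connection details are all placeholders, not anything prescribed by Ignite:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class SqlServerToIgniteSync {
    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start();                       // start/connect an Ignite node
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("personCache");

        // In real code this watermark would be persisted between runs.
        Timestamp lastSync = Timestamp.valueOf("2024-01-01 00:00:00");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://sqlserver-host;databaseName=mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT id, name FROM dbo.person WHERE last_modified > ?")) {

            ps.setTimestamp(1, lastSync);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Overwrite only the entries that changed in SQL Server.
                    cache.put(rs.getInt("id"), rs.getString("name"));
                }
            }
        }
    }
}
```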
Take a look at Ignite's external storage integration, and the read/write through features. See: https://ignite.apache.org/docs/latest/persistence/external-storage
and https://ignite.apache.org/docs/latest/persistence/custom-cache-store
examples here: https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/datagrid/store
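As a rough illustration of the read-through side (not taken from those examples; the table, cache and connection details are made up), a CacheStore that loads missing keys from SQL Server on a cache miss could look roughly like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;

import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

// Loads a value from SQL Server when it is not found in the cache (read-through).
public class PersonStore extends CacheStoreAdapter<Integer, String> {

    @Override
    public String load(Integer key) {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://sqlserver-host;databaseName=mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT name FROM dbo.person WHERE id = ?")) {
            ps.setInt(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void write(Cache.Entry<? extends Integer, ? extends String> entry) {
        // Write-through: propagate cache updates back to SQL Server (omitted here).
    }

    @Override
    public void delete(Object key) {
        // Write-through delete: remove the row from SQL Server (omitted here).
    }

    // Cache configuration wiring the store in, with read/write-through enabled.
    public static CacheConfiguration<Integer, String> cacheConfig() {
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("personCache");
        cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(PersonStore.class));
        cfg.setReadThrough(true);
        cfg.setWriteThrough(true);
        return cfg;
    }
}
```

Note that read/write-through only covers updates that go through Ignite or keys that are not yet cached; it does not by itself invalidate entries that were changed directly in SQL Server, which is why the CDC approach above is still needed for that case.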
I have a Hive table, let's call it table A. My requirement is to capture all the DML and DDL operations on table A in table B. Is there any way to do this?
Thanks in advance.
I have not come across any such tool; however, Cloudera Navigator helps to manage this. Refer to the detailed documentation.
Cloudera Navigator
Cloudera Navigator auditing supports tracking access to:
HDFS entities accessed by HDFS, Hive, HBase, Impala, and Solr services
HBase and Impala
Hive metadata
Sentry
Solr
Cloudera Navigator Metadata Server
Alternatively, if you are not using the Cloudera distribution, you can still access the Hive metastore log file under /var/log/hive/hadoop-cmf-hive-HIVEMETASTORE.log.out and check the changes applied to the different tables.
I haven't used Apache Atlas yet, but from the documentation it looks like it has an audit store and a Hive bridge. That works for operational events as well.
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/atlas-overview/content/apache_atlas_features.html
I know the question is a little strange. I love Hadoop & HDFS, but I've recently been working on Spark SQL with a Hive metastore.
I want to use Spark SQL as a single SQL engine to run OLAP queries across different data sources like RDBMS, Mongo, Elastic ... without an ETL process. I then register the different schemas as external tables in the metastore with the corresponding Hive storage handlers.
Moreover, HDFS is not used as a data source in my work, and MapReduce has already been replaced by the Spark engine. That sounds to me like Hadoop/HDFS is useless except as a base for the Hive installation, and I don't want to install all of it.
I wonder: if I only start the Hive metastore service, without Hadoop/HDFS, to support Spark SQL, what kind of issues will I run into? Would I be putting myself into the jungle?
What you need is "Hive Local Mode" (search for "Hive, Map-Reduce and Local-Mode" in the page).
Also this may help.
This configuration is only suggested if you are experimenting locally. But in this case you only need the metastore.
Also, from here:
Spark SQL uses a Hive metastore even when we don't configure it to. When not configured, it uses a default Derby DB as the metastore.
So this seems to be quite legitimate:
Arrange your metastore in Hive
Start Hive in local mode
And make Spark use the Hive metastore
Use Spark as an SQL engine for all data sources supported by Hive (see the sketch below).
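As a minimal sketch of the last two steps (the metastore URI, warehouse path and table name are placeholders, and this assumes a Hive metastore service is already running and reachable on its Thrift port):

```java
import org.apache.spark.sql.SparkSession;

public class SparkWithHiveMetastore {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-over-hive-metastore")
                .master("local[*]")                                                // no YARN cluster needed
                .config("hive.metastore.uris", "thrift://metastore-host:9083")     // placeholder metastore host
                .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")  // local, non-HDFS path
                .enableHiveSupport()
                .getOrCreate();

        // Tables registered in the metastore (e.g. via Hive storage handlers) are now visible.
        spark.sql("SHOW TABLES").show();
        spark.sql("SELECT * FROM some_external_table LIMIT 10").show();            // placeholder table

        spark.stop();
    }
}
```

In this setup Spark only needs to reach the metastore's Thrift endpoint and the external data sources themselves; with a local warehouse path, no HDFS or YARN services are required.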
I am trying to connect Tableau Desktop to a Hadoop Hive connection. I want to change the default Hive execution engine to Tez. How can I change this parameter through Tableau Desktop?
One thing I have figured out is that you can use the Initial SQL option to set the default engine to Tez (for example, set hive.execution.engine=tez;). The problem with this, however, is that it only fires with the first Tableau query; for subsequent queries the engine-related changes are reverted. I am researching this with Tableau and will post once I hear anything concrete.
I am working on a remote database which has several master tables. The metadata & the actual data in these tables change rarely.
When querying the DB involving these tables and using certain features (e.g. Ctrl+Space to auto-complete a table/column name), it takes too long to query the remote DB to fetch this data, since it's not cached locally.
Is there any extension/plug-in/configuration in SQL Developer to cache this metadata locally?
(Oracle SQLDeveloper Version 1.5.1 Build MAIN-5440)
Try updating to version 2.1.
Use a tool like SQuirreL to build your queries and then copy them into SQL Developer. SQuirreL caches the metadata.
You could create a view on the local DB; that would keep the metadata local.