Impala connection management best practice

I want to use Cloudera Impala within Cloudera Hadoop 2.6.0-cdh5.10.0 in order to execute queries from Java with the best performance under high-load conditions.
I have already read the official documentation https://www.cloudera.com/documentation/enterprise/5-10-x/topics/impala_jdbc.html but there are a few points I did not understand clearly.
I use the Hive JDBC dependency to connect to Impala:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.1.0-cdh5.10.0</version>
    <classifier>standalone</classifier>
</dependency>
Now please tell me what is the best way to create and dispose of Impala connections:
Should I use DriverManager.getConnection inside a try-with-resources block? Will that create a new physical connection for each query? In other words, is creating a connection this way a heavy operation?
Is there a connection pool on the server side? Should I use DataSource.getConnection (as in Java EE) or something else in order to use that pool?
Or should I use a third-party library such as org.apache.commons:commons-dbcp2 to create a connection pool on the client side? (A sketch of the per-query and pooled approaches follows below.)
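For concreteness, here is a minimal sketch of the per-query and the pooled approaches, assuming the Hive JDBC driver from the dependency above; the host name, port 21050 (Impala's usual HiveServer2-compatible port), and the auth=noSasl setting are illustrative assumptions, not taken from the question:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.commons.dbcp2.BasicDataSource;

public class ImpalaConnections {

    private static final String URL =
            "jdbc:hive2://impala-host:21050/;auth=noSasl"; // host/port are assumptions

    // Approach 1: one physical connection per query. Each getConnection()
    // call opens a new TCP connection and session, so it is relatively heavy.
    static void queryPerConnection() throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // older drivers may not self-register
        try (Connection conn = DriverManager.getConnection(URL);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } // everything is closed here, including the physical connection
    }

    // Approach 2: a client-side pool; physical connections are reused.
    private static final BasicDataSource POOL = new BasicDataSource();
    static {
        POOL.setDriverClassName("org.apache.hive.jdbc.HiveDriver");
        POOL.setUrl(URL);
        POOL.setMaxTotal(16); // tune for your load
    }

    static void queryFromPool() throws Exception {
        try (Connection conn = POOL.getConnection(); // borrowed from the pool
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } // close() here returns the connection to the pool instead of closing it
    }
}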
Thanks for your time

Related

Select data from DB using RabbitMQ

I need advice on a design pattern for using RabbitMQ to select data from a database.
RabbitMQ looks like a very good solution for inserting and updating data in the database, but what about selecting data from it?
In my case I have a REST API module and a database module connected to MariaDB, and they communicate via queues:
REST API module -> Database module -> MariaDB
But I need to select configuration from the database via the database module. I could use RPC as a solution, but perhaps there is a better way?
Can you advise?
In general, some sort of RPC is the way to go.
However: the point of a queue (asynchronous tasks) is the opposite of a database select (return my data now). If direct database select requests perform adequately, use them and avoid the extra complexity. Or use some caching system for your config. This might not fit your system architecture and load needs, but it is simpler.
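To illustrate the RPC option, here is a minimal client-side sketch with the RabbitMQ Java client (amqp-client 5.x); the host and the config.rpc queue name that the database module would consume from are hypothetical:

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ConfigRpcClient {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Exclusive auto-named queue on which the reply will arrive.
            String replyQueue = channel.queueDeclare().getQueue();
            String corrId = UUID.randomUUID().toString();
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .correlationId(corrId)
                    .replyTo(replyQueue)
                    .build();

            // "config.rpc" is a hypothetical queue the database module listens on.
            channel.basicPublish("", "config.rpc", props,
                    "get-config".getBytes(StandardCharsets.UTF_8));

            // Block until the reply carrying our correlation id comes back.
            BlockingQueue<String> response = new ArrayBlockingQueue<>(1);
            channel.basicConsume(replyQueue, true, (consumerTag, delivery) -> {
                if (corrId.equals(delivery.getProperties().getCorrelationId())) {
                    response.offer(new String(delivery.getBody(), StandardCharsets.UTF_8));
                }
            }, consumerTag -> { });
            System.out.println("config = " + response.take());
        }
    }
}

Note how this makes the trade-off visible: the caller still blocks waiting for its data, so the queue adds plumbing without adding asynchrony, which is exactly why a direct select or a cache may be the simpler choice here.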

SQL connection pooling in Azure Functions

In traditional web servers you would have a SQL connection pool and persistent connections to the database.
But I am thinking of creating my entire application as Azure Functions.
Will the functions create a new connection to the SQL server every time they are invoked?
Azure Functions doesn't currently have SQL as an option for an input or output binding, so you'd need to use the SqlClient classes directly to make your connections and issue your queries.
As long as you follow best practices of disposing your SQL connections (see this for example: C# SQLConnection pooling), you should get pooling by default.
Here's a full example of inserting records into SQL from a function: https://www.codeproject.com/articles/1110663/azure-functions-tutorial-sql-database
Although this is already answered, I believe this answer can provide more information.
If you are not using a connection pool, you are probably creating a connection every time the function is invoked. Creating a connection has an associated cost, so for warmed-up instances it is recommended to use a connection pool. The maximum number of connections should also be chosen cautiously, since several function app instances can run in parallel (depending on your plan).
Here is an example of the connection-pool pattern.
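A minimal sketch of that pattern, expressed here with JDBC and commons-dbcp2; the environment variable, table name, and pool size are illustrative assumptions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.commons.dbcp2.BasicDataSource;

public class FunctionWithPool {

    // One pool per function-app instance, created once and reused across
    // invocations on a warmed-up instance instead of reconnecting every call.
    private static final BasicDataSource POOL = new BasicDataSource();
    static {
        POOL.setDriverClassName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        POOL.setUrl(System.getenv("SQL_JDBC_URL")); // e.g. jdbc:sqlserver://... (assumption)
        POOL.setMaxTotal(8); // keep modest: several instances may run in parallel
    }

    // Called once per invocation; borrows a pooled connection rather than opening one.
    public String run(int id) throws Exception {
        try (Connection conn = POOL.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM items WHERE id = ?")) { // hypothetical table
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}

The key point is that the pool lives in a static field, so it is shared by every invocation that lands on the same warmed-up instance.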

Connection - MongoDB & SQL Server

I need to create a connection between MongoDB and SQL Server where I want to replicate a subset of my database from SQL Server into MongoDB. Can anyone comment on the feasibility of this, and how to do it?
Right now I am using SymmetricDS for the replication but am unable to...
Please suggest whether SymmetricDS can serve this purpose.
Here is how you target MongoDB:
http://www.symmetricds.org/doc/3.8/html/user-guide.html#_mongodb
If you need more flexibility than straight table-to-table mapping, then you would write your own data loader using the MongoDatabaseWriter as a pattern.
https://github.com/JumpMind/symmetric-ds/tree/0c5cc1c24b42a64405f4b79c3cb6b594a35467f2/symmetric-client/src/main/java/org/jumpmind/symmetric/io
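If a one-shot copy of a table is enough, a hand-rolled loader is not much code. Here is a minimal sketch with plain JDBC and the MongoDB Java driver (mongodb-driver-sync); the connection strings, credentials, and the customers table/collection are illustrative assumptions:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.Date;
import org.bson.Document;

public class SqlToMongoCopier {
    public static void main(String[] args) throws Exception {
        try (Connection sql = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=mydb", "user", "password"); // assumptions
             MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
             Statement stmt = sql.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM customers")) { // hypothetical table

            MongoCollection<Document> coll =
                    mongo.getDatabase("mydb").getCollection("customers");
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                Document doc = new Document();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    Object value = rs.getObject(i);
                    // Normalize JDBC temporal types the BSON codecs may not know.
                    if (value instanceof Timestamp) {
                        value = new Date(((Timestamp) value).getTime());
                    }
                    doc.append(meta.getColumnLabel(i), value);
                }
                coll.insertOne(doc);
            }
        }
    }
}

This only covers a one-time copy; for ongoing replication you still need change capture, which is exactly what SymmetricDS's triggers provide.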
I found an easy way around for the data exchange from SQL to MongoDB using:
SQLtoMongo C# Tool
KNIME Analytics Platform (very easy to implement - open source)
But I am still looking for something trigger-based to replicate things easily.

Execute SQL statement before normal execution with AOP

I'm trying to create a multitenant application with Spring.
I'm trying to have a different schema for each tenant on a PostgreSQL database.
I first created a TenantAwareDataSource extending org.springframework.jdbc.datasource.AbstractDataSource, where basically I manage a Map of org.apache.commons.dbcp.BasicDataSource instances, configuring setConnectionInitSqls() for each tenant. (The data source the project had before was org.apache.commons.dbcp.BasicDataSource.)
But then, discussing it with a friend, we came up with the idea of changing the schema for every statement executed with an aspect (AOP), simply adding a set search_path to the statement just before normal execution.
This could greatly simplify the problems related to having too many connections to the database (a connection pool for every tenant at any given time).
Has anybody executed additional statements using AOP?
Any pitfalls to overcome?
I'm thinking of putting back org.apache.commons.dbcp.BasicDataSource and intercepting java.sql.Statement.exe*(..) (see the sketch below).
I'm not very experienced with Spring persistence. Or SQL statement execution interception for that matter (haha).
Is it ok?
I found this article but I don't think I need to obtain a reference for each connection.
Am I right?
Also found this one. The author is using org.springframework.jdbc.core.JdbcOperations. Not sure it's the case in my Spring Roo generated project.
Thank you all.
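For what it's worth, here is a minimal sketch of the aspect idea, with the caveat that Spring AOP can only intercept Spring beans, not raw java.sql.Statement objects, so the pointcut below targets repository methods instead; the package name and the tenant holder are hypothetical:

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.jdbc.core.JdbcOperations;

@Aspect
public class SearchPathAspect {

    // Hypothetical holder for the tenant resolved earlier (e.g. in a servlet filter).
    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    public static void setTenant(String tenant) {
        CURRENT_TENANT.set(tenant);
    }

    private final JdbcOperations jdbc;

    public SearchPathAspect(JdbcOperations jdbc) {
        this.jdbc = jdbc;
    }

    // The pointcut is an assumption about the package layout; adjust as needed.
    @Around("execution(* com.example.repository..*.*(..))")
    public Object withTenantSchema(ProceedingJoinPoint pjp) throws Throwable {
        // Identifiers cannot be bound as parameters, so the tenant name must
        // be validated against a whitelist before being concatenated into SQL.
        String tenant = CURRENT_TENANT.get();
        // Caveat: this only hits the same physical connection as the
        // intercepted call when both run inside one transaction (e.g.
        // @Transactional); otherwise the pool may hand out a different one.
        jdbc.execute("SET search_path TO " + tenant);
        return pjp.proceed();
    }
}

Note the transaction caveat in the comments: without a surrounding transaction pinning the connection to the thread, the SET search_path and the repository's query may run on different pooled connections.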

EF and TransactionScope for both SQL Server and Oracle without escalating/spanning to DTC?

Can anyone update me on this topic?
I want to support both SQL Server and Oracle in my application.
Is it possible to have the following code (in BL) working for both SQL Server and Oracle without escalating/spanning to distributed transactions (DTC) ?
// dbcontext is created before, same dbcontext will be used by both repositories
using (var ts = new TransactionScope())
{
    // create order - make use of dbcontext, possibly to call SaveChanges here
    orderRepository.CreateOrder(order);
    // update inventory - make use of same dbcontext, possibly to call SaveChanges here
    inventoryRepository.UpdateInventory(inventory);
    ts.Complete();
}
As of today, end of August 2013, I understand that it works for SQL Server 2008+ ... but what about Oracle? I found this thread... it looks like Oracle promotes to distributed transactions, but it is still not clear to me.
Does anyone have experience with writing apps to support both SQL Server and Oracle with Entity Framework to enlighten me?
Thanks!
Update: Finally I noticed EF6 comes with Improved Transaction Support. This, in addition to Remus' recommendations, could be the solution for me.
First: never use var ts = new TransactionScope(). It is the one-liner that kills your app. Always use the explicit constructor that lets you specify the isolation level. See using new TransactionScope() Considered Harmful.
Now, about your question: the logic for not promoting two connections in the same scope to DTC relies heavily on the drivers/providers cooperating to inform System.Transactions that the two distinct connections are capable of managing the distributed transaction just fine on their own, because the resource manager involved is the same. SqlClient for SQL Server 2008 and later is a driver that is capable of this logic. The Oracle driver you use is not (and I'm not aware of any version that is, btw).
Ultimately it is really, really basic: if you do not want a DTC, do not create one! Make sure you use exactly one connection in the scope. It is clearly arguable that you do not need two connections. In other words, get rid of the two separate repositories in your data model. Use only one repository for Orders, Inventory, and whatever else. You are shooting yourself in the foot with them and asking for pixie-dust solutions.
Update: Oracle driver 12c r1:
"Transaction and connection association: ODP.NET connections, by default, detach from transactions only when connection objects are closed or transaction objects are disposed"
Nope, DTC is needed for distributed transactions - and something spanning two different database technologies like this is a distributed transaction. Sorry!