Join on Hive on different keyspace tables on DSE

Is it possible to run a JOIN query on DSE Hive across tables in different Cassandra keyspaces?
I'm trying to execute the query below with no success:
hive> select * from mykeyspace1.table1 a JOIN keyspace_185.table_508 b on a.companyid=b.companyid limit 10;
There are two keyspaces, mykeyspace1 and keyspace_508.
In my case the MapReduce job runs without errors but returns no results.
Thanks in advance!

It works for me in a simple test: select a.name, b.state from test7.test1 a join test8.test1 b on a.name = b.name; Maybe something is wrong with the data or the join condition.
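One way to check the "something wrong with the data" theory is to compare a raw join against a join on normalized keys. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for Hive, and the table names, sample rows, and the case/whitespace mismatch are all assumptions made up for the example, not taken from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the two Hive tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (companyid TEXT, name TEXT);
    CREATE TABLE table2 (companyid TEXT, state TEXT);
    INSERT INTO table1 VALUES ('C1', 'Acme'), ('C2', 'Globex');
    -- Same ids, but stored with different case and trailing whitespace.
    INSERT INTO table2 VALUES ('c1', 'NY'), ('C2 ', 'CA');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Join on the keys exactly as stored.
raw_matches = scalar(
    "SELECT COUNT(*) FROM table1 a JOIN table2 b "
    "ON a.companyid = b.companyid"
)

# Join on trimmed, upper-cased keys.
normalized_matches = scalar(
    "SELECT COUNT(*) FROM table1 a JOIN table2 b "
    "ON UPPER(TRIM(a.companyid)) = UPPER(TRIM(b.companyid))"
)

print("raw matches:", raw_matches)
print("normalized matches:", normalized_matches)
```

If the normalized join finds rows and the raw join does not, the join itself is fine and the key values differ between the two tables.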

Related

How to compute loop join in Dataiku

With Dataiku, I am trying to compute multiple joins against the same table in BigQuery. For example, my query would be (in pseudocode):
For i = 1 to 24 :
CREATE TABLE table0 as
SELECT
A.*,
B.column as column_i
FROM
table0 AS A
LEFT JOIN table_i AS B
ON A.id=B.id
How can I do this in a simple way? I tried with a SQL script and with a notebook, but it seems that Dataiku doesn't support the DECLARE statement for my variable i.
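One possible workaround, sketched here in plain Python, is to generate a single statement with all the LEFT JOINs chained instead of looping in SQL. The helper name build_join_query is hypothetical, the table and column names are the placeholders from the pseudocode above, and how the resulting string is executed inside Dataiku is left open.

```python
def build_join_query(base_table="table0", n_tables=24):
    """Build one SELECT that left-joins table_1 .. table_n onto base_table."""
    select_cols = ["A.*"]
    joins = []
    for i in range(1, n_tables + 1):
        alias = f"B{i}"
        # One extra output column per joined table, as in the pseudocode.
        select_cols.append(f"{alias}.column AS column_{i}")
        joins.append(f"LEFT JOIN table_{i} AS {alias} ON A.id = {alias}.id")
    return (
        "SELECT\n  " + ",\n  ".join(select_cols)
        + f"\nFROM {base_table} AS A\n"
        + "\n".join(joins)
    )


# Small example with 3 tables instead of 24 to keep the output short.
query = build_join_query(n_tables=3)
print(query)
```

Because every join is keyed on A.id, chaining the joins in one statement should give the same columns as rewriting table0 24 times, without needing a loop variable in SQL.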

Using a parameter for the JOIN clause in Snowflake gives the wrong result. Why?

I'm using DBeaver with a connection to a Snowflake database.
I want to select data with a JOIN clause,
but I need to do it with parameters.
My code is:
select count(*) from my_table as a ${join}
var join = 'LEFT JOIN table_b AS b ON a.ID = b.ID AND b.NAME = a.NAME'
When I run the select statement in DBeaver, I get a pop-up asking me to fill in the ${join} value.
I put the value in the text box and the command runs. I get the WRONG result (1,254,242)!
But when I run the following command:
select count(*) from my_table as a LEFT JOIN table_b AS b ON a.ID = b.ID AND b.NAME = a.NAME
I get the correct result (900,254).
Can anybody help, please? Thank you.

General approach to troubleshooting such scenarios:
Check whether you are connecting to the same database/schema from the client tool and the web UI:
SELECT current_database(), current_schema();
Check whether you are using the same user and role (access restrictions or row-level security could affect the number of rows):
SELECT current_user(), current_role();
Check the exact query text sent by the client tool in the Snowflake History tab and compare it against the one run manually.

Hive Index Performance

I have two tables, Table A and Table B, which are 100 GB and 35 GB in size respectively. Both tables have a compact index on the same column, prodID.
My problem is that I get the same response time with or without the index for the query below. It takes 30 minutes to process.
select a.* from TableA a inner join TableB b on a.prodID=b.prodID;
I have a 19-node cluster. Can you please advise whether I am missing any configuration or doing something wrong?
Regards,
Prabu

I think you should try putting the large table, i.e. Table A, last in the join, or stream Table A, to improve performance. You can try the following query to stream the table:
select /*+STREAMTABLE(a)*/ a.* from TableA a inner join TableB b on a.prodID=b.prodID;
Please refer to Tips using joins in hive for more information.

SQL join conditions order performance

Does the order of JOIN conditions affect performance? I have two tables, A and B. I'm trying to join them like this:
SELECT * FROM A
INNER JOIN B on B.ID_A = A.ID
In this case Firebird uses a NATURAL plan instead of using the foreign key.
SELECT * FROM A
INNER JOIN B on A.ID = B.ID_A
works well.
Is that normal?

I guess you are using a Firebird version older than 2.5.4 (probably 2.5.3). It had the bug http://tracker.firebirdsql.org/browse/CORE-4530, which was fixed in 2.5.4. Please upgrade to Firebird 2.5.5 and check whether your problem disappears.

I'm not familiar with Firebird, but usually the order doesn't matter for an inner join.
In recent versions of Oracle, SQL Server, MySQL, or PostgreSQL it won't affect performance. You can use the explain plan to check whether the database chooses the right way to join.

This query is OK, but make sure B.ID_A is indexed (for fast performance).
Don't use * in the SQL query; select only the required columns.
Your question is about performance, so I assume you have a lot of data,
in which case you should use LIMIT with your query.
Example (MySQL syntax; Firebird uses FIRST/SKIP or ROWS rather than LIMIT):
SELECT A.ID, A.column1, B.column2 FROM A
INNER JOIN B on B.ID_A = A.ID LIMIT 0,100
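The advice about checking the plan can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 as a stand-in, so it says nothing about Firebird's optimizer or the bug mentioned above. It only shows the general technique of printing the plan for both orderings of the ON condition and confirming the results match. The sample rows and the index name idx_b_id_a are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, column1 TEXT);
    CREATE TABLE B (
        ID INTEGER PRIMARY KEY,
        ID_A INTEGER REFERENCES A(ID),
        column2 TEXT
    );
    -- Index on the foreign-key column, as recommended above.
    CREATE INDEX idx_b_id_a ON B (ID_A);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (10, 1, 'b1'), (11, 1, 'b2'), (12, 2, 'b3');
""")

# The same join with the two sides of the ON condition swapped.
q1 = "SELECT A.ID, A.column1, B.column2 FROM A INNER JOIN B ON B.ID_A = A.ID"
q2 = "SELECT A.ID, A.column1, B.column2 FROM A INNER JOIN B ON A.ID = B.ID_A"

for q in (q1, q2):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + q)]
    print(q)
    print("  plan:", plan)

rows1 = sorted(conn.execute(q1).fetchall())
rows2 = sorted(conn.execute(q2).fetchall())
print("same rows:", rows1 == rows2)
```

If the two plans differ on your own database for an otherwise identical join, that points at an optimizer issue rather than at the query.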

SQL JOIN that uses OR in the ON statement

I’m running a SQL query on Google BigQuery and want to do this kind of SQL command:
SELECT ... FROM A JOIN B
ON A.col1=B.col1 AND (A.col2=B.col2 OR A.col3=B.col3)
This fails though with the error:
Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name.
Is there a way to rewrite the SQL to get this kind of functionality?

Turns out this works. In Google BigQuery (legacy SQL) the comma between the two subqueries is equivalent to a UNION ALL. Not sure how to do it if you just want a UNION, since DISTINCT is actually not supported in BigQuery. Luckily it's enough for me as is.
SELECT ... FROM
(SELECT ... FROM A JOIN B ON A.col1=B.col1 AND A.col2=B.col2),
(SELECT ... FROM A JOIN B ON A.col1=B.col1 AND A.col3=B.col3)

This should work:
SELECT ... FROM A CROSS JOIN B
WHERE A.col1=B.col1 AND (A.col2=B.col2 OR A.col3=B.col3)
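The two rewrites above are not quite interchangeable: with UNION ALL, a pair of rows that satisfies both col2 and col3 conditions shows up twice, while the CROSS JOIN with WHERE returns it once, like the original OR. The sketch below illustrates this with Python's built-in sqlite3 as a stand-in for BigQuery (SQLite accepts OR inside ON, so all three forms can be compared). The sample data is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col1 INT, col2 INT, col3 INT);
    CREATE TABLE B (col1 INT, col2 INT, col3 INT);
    INSERT INTO A VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300);
    INSERT INTO B VALUES
        (1, 10, 100),   -- matches A on col2 AND col3
        (2, 20, 999),   -- matches A on col2 only
        (3, 999, 300),  -- matches A on col3 only
        (3, 999, 999);  -- matches A on neither
""")


def col1_values(sql):
    """Return the sorted A.col1 values produced by a query."""
    return sorted(row[0] for row in conn.execute(sql))


# The original query with OR inside the ON clause.
or_rows = col1_values(
    "SELECT A.col1 FROM A JOIN B "
    "ON A.col1 = B.col1 AND (A.col2 = B.col2 OR A.col3 = B.col3)"
)

# Rewrite 1: two joins combined with UNION ALL.
union_all_rows = col1_values(
    "SELECT A.col1 FROM A JOIN B ON A.col1 = B.col1 AND A.col2 = B.col2 "
    "UNION ALL "
    "SELECT A.col1 FROM A JOIN B ON A.col1 = B.col1 AND A.col3 = B.col3"
)

# Rewrite 2: CROSS JOIN with the condition moved to WHERE.
cross_rows = col1_values(
    "SELECT A.col1 FROM A CROSS JOIN B "
    "WHERE A.col1 = B.col1 AND (A.col2 = B.col2 OR A.col3 = B.col3)"
)

print("OR in ON:       ", or_rows)
print("UNION ALL:      ", union_all_rows)
print("CROSS JOIN/WHERE:", cross_rows)
```

In this data the row with col1 = 1 matches on both columns, so the UNION ALL version returns it twice while the other two return it once.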
