Is there a SQL WITH clause equivalent in Hive?

I could not find the answer in the specs.
So I wonder: can I do something like this in Hive?
insert into table my_table
with a as
(
select *
from ...
where ...
),
b as
(
select *
from ...
where ...
)
select
a.a,
a.b,
a.c,
b.a,
b.b,
b.c
from a join b on (a.a=b.a);

The WITH clause (common table expressions) is available in Hive as of version 0.13.0. Its usage is documented in the Hive language manual.

Hadoop Hive WITH Clause Syntax and Examples
With the Hive WITH clause you can reuse a piece of a query's result within the same query. It can also make a Hive query easier to read: move complex or repetitive code into the WITH clause, then refer to the resulting logical table in your SELECT statements.
Hive WITH clause example with the SELECT statement
WITH t1 as (SELECT 1),
t2 as (SELECT 2),
t3 as (SELECT 3)
SELECT * from t1
UNION ALL
SELECT * from t2
UNION ALL
SELECT * from t3;
Hive WITH Clause in INSERT Statements
You can use the WITH clause while inserting data into a table. For example:
WITH t11 as (SELECT 10),
t12 as (SELECT 20),
t13 as (SELECT 3)
INSERT INTO t1
SELECT * from t11
UNION ALL
SELECT * from t12
UNION ALL
SELECT * from t13;

I guess you could always use subqueries:
insert into table my_table
select
a.a,
a.b,
a.c,
b.a,
b.b,
b.c
from
(
select *
from ...
where ...
) a
join
(
select *
from ...
where ...
) b
on a.a = b.a;
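To check that the WITH form and the subquery form really are interchangeable, here is a minimal sketch that runs both against the same data. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the tables src_a and src_b, their columns, and the filter conditions are all made up for illustration.

```python
import sqlite3

# SQLite stands in for Hive; src_a, src_b and their contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_a (a INTEGER, b TEXT);
    CREATE TABLE src_b (a INTEGER, c TEXT);
    INSERT INTO src_a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO src_b VALUES (2, 'p'), (3, 'q'), (4, 'r');
""")

# Form 1: named subqueries declared up front with WITH.
cte_query = """
    WITH a AS (SELECT * FROM src_a WHERE a > 1),
         b AS (SELECT * FROM src_b WHERE a < 4)
    SELECT a.a, a.b, b.c
    FROM a JOIN b ON a.a = b.a
    ORDER BY a.a
"""

# Form 2: the same subqueries written inline in the FROM clause.
subquery_query = """
    SELECT a.a, a.b, b.c
    FROM (SELECT * FROM src_a WHERE a > 1) a
    JOIN (SELECT * FROM src_b WHERE a < 4) b ON a.a = b.a
    ORDER BY a.a
"""

cte_rows = conn.execute(cte_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
print(cte_rows == subquery_rows)
```

The two forms differ only in where the subqueries are written, so the choice is mostly about readability.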

Related

Can I run more than one main SELECT statement on the two WITH tables?

Hi all,
Is it possible to run more than one SELECT statement after using WITH?
The first SELECT statement works fine, but as soon as I add another SELECT statement I get an error.
with
a as (select a,b,c from Table1 with(readuncommitted)),
b as (select d,e,f from Table2 with(readuncommitted))
select * from a
select * from b
expected output:
Table 1
a
Table 2
b
CTEs are in scope only for the single statement that follows the WITH clause, so the first query can see them but the second cannot. You could perhaps use a union query here:
SELECT a, b, c, 'Table1' AS src FROM a
UNION ALL
SELECT d, e, f, 'Table2' FROM b;
Or, you could move the b CTE to before the second query:
WITH a AS (
SELECT a, b, c
FROM Table1
WITH(readuncommitted)
)
SELECT * FROM a;
WITH b AS (
SELECT d, e, f
FROM Table2
WITH(readuncommitted)
)
SELECT * FROM b;
Hi DasD,
You cannot run multiple SELECT statements against one CTE definition, but you can define more than one CTE and use them in a single statement, like this:
with
a as (select a,b,c from Table1 with(readuncommitted)),
b as (select d,e,f from Table2 with(readuncommitted))
select * from a,b
You have to tell the database what you want from both tables.
Since both have the same structure, you can use UNION to stack them vertically:
with
a as (select a,b,c from Table1 with(readuncommitted)),
b as (select d,e,f from Table2 with(readuncommitted))
select * from a
UNION
select * from b
From the docs:
"A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE statement that references some or all the CTE columns."
Source
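The scoping rule quoted from the docs can be seen in a small experiment. This is a sketch using Python's built-in sqlite3 module rather than SQL Server, so the readuncommitted table hint is dropped and the sample rows are invented. The first statement sees both CTEs, a separate second statement does not, and the UNION ALL form returns both sets in one statement.

```python
import sqlite3

# SQLite stands in for SQL Server; the table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (a, b, c);
    CREATE TABLE Table2 (d, e, f);
    INSERT INTO Table1 VALUES (1, 2, 3);
    INSERT INTO Table2 VALUES (4, 5, 6);
""")

with_clause = """
    WITH a AS (SELECT a, b, c FROM Table1),
         b AS (SELECT d, e, f FROM Table2)
"""

# The statement that carries the WITH clause can see the CTEs.
first = conn.execute(with_clause + "SELECT * FROM a").fetchall()

# A separate statement cannot: the name b no longer exists.
try:
    conn.execute("SELECT * FROM b")
    second_error = None
except sqlite3.OperationalError as exc:
    second_error = str(exc)

# One statement using both CTEs, with a label column to tell the rows apart.
combined = conn.execute(with_clause + """
    SELECT a, b, c, 'Table1' AS src FROM a
    UNION ALL
    SELECT d, e, f, 'Table2' FROM b
""").fetchall()

print(first, second_error, combined)
```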

BigQuery INSERT INTO with WITH clause giving error

I am trying to insert data into a BigQuery table.
My query is complex and involves a WITH clause, and it throws an error for every combination I have tried. I have written a similar query in Hive, and that works like a charm.
Any suggestion on how I can achieve this is highly appreciated:
bq query --use_legacy_sql=false \
'with mapping_table as (SELECT t1.a, t2.b, t2.c from table1 as t1 inner join table2 on t2 group by )
INSERT OVERWRITE TABLE my-bq-dev.myschema.mytable PARTITION(CREATE_DT)
SELECT A, B, C ...... from TABLEX LEFT OUTER JOIN TABLEY ON'
Note that the error is not caused by the syntax of the query itself: the query above works fine without INSERT OVERWRITE.
INSERT OVERWRITE TABLE ... is not BigQuery SQL.
Could you take a look at the example below to see how INSERT INTO works with a WITH clause?
create temp table t as select 1 x;
insert into t
with data as (select 2 x)
select * from data;
select * from t;
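The placement of WITH relative to INSERT is the part that differs between dialects. As a rough illustration only, the sketch below replays the example above in Python's built-in sqlite3 module, which is an assumption-laden stand-in and not BigQuery or Hive. SQLite happens to accept both orderings, so it shows the two shapes side by side, with the values chosen arbitrarily.

```python
import sqlite3

# SQLite is only a stand-in here; BigQuery and Hive each accept their own ordering.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

# WITH placed after the INSERT target, as in the BigQuery example above.
conn.execute("""
    INSERT INTO t
    WITH data AS (SELECT 2 AS x)
    SELECT * FROM data
""")

# WITH placed before INSERT, the order used in the Hive examples.
conn.execute("""
    WITH data AS (SELECT 3 AS x)
    INSERT INTO t
    SELECT * FROM data
""")

rows = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x")]
print(rows)
```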
The WITH clause should come after the INSERT line:
INSERT OVERWRITE TABLE my-bq-dev.myschema.mytable PARTITION(CREATE_DT)
with mapping_table as (SELECT t1.a, t2.b, t2.c from table1 as t1 inner join table2 on t2 group by )
SELECT A, B, C ...... from TABLEX LEFT OUTER JOIN TABLEY ON

Impala - CREATE TABLE after a WITH clause

I have a query with several WITH clauses, then a CREATE TABLE :
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
), TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
(
CREATE TABLE TABLE_3 AS
(
SELECT TABLE_1.*, TABLE_2.*
FROM TABLE_1
INNER JOIN TABLE_2 ON TABLE_2.key = TABLE_1.key
)
)
However I have the following error :
Encountered: CREATE Expected: SELECT, VALUES, WITH CAUSED BY: Exception: Syntax error
So I tried to put the CREATE statement first :
CREATE TABLE_3 AS
(
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
), TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
(
SELECT TABLE_1.*, TABLE_2.*
FROM TABLE_1
INNER JOIN TABLE_2 ON TABLE_2.key = TABLE_1.key
)
)
But now I have the following error :
AnalysisException: Could not resolve table reference: 'TABLE_1'
Note that:
The above query works without the CREATE statement.
My actual situation is more complex than this simple example, and I would like to keep the WITH clauses for clarity.
Hmmm. I think this will work:
CREATE TABLE TABLE_3 AS
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
),
TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
)
SELECT TABLE_1.*, TABLE_2.*
FROM TABLE_1 INNER JOIN
TABLE_2
ON TABLE_2.key = TABLE_1.key;
Of course, you will have other problems, such as the key column being duplicated in the results -- and that should generate another error. In practice, you should select exactly the columns that you want.
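Here is a small sketch of that shape: CREATE TABLE ... AS first, then the WITH clause as part of the SELECT, with the columns listed explicitly so the key appears only once. It runs on Python's built-in sqlite3 module as a stand-in for Impala, and the columns left_val and right_val are invented for the example.

```python
import sqlite3

# SQLite stands in for Impala; left_val and right_val are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE somewhere_1 (key INTEGER, left_val TEXT);
    CREATE TABLE somewhere_2 (key INTEGER, right_val TEXT);
    INSERT INTO somewhere_1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO somewhere_2 VALUES (2, 'B'), (3, 'C');

    -- CREATE TABLE ... AS comes first; the WITH clause belongs to the SELECT.
    CREATE TABLE table_3 AS
    WITH table_1 AS (SELECT * FROM somewhere_1),
         table_2 AS (SELECT * FROM somewhere_2)
    SELECT table_1.key AS key,
           table_1.left_val AS left_val,
           table_2.right_val AS right_val
    FROM table_1
    INNER JOIN table_2 ON table_2.key = table_1.key;
""")

# Column names of the new table, then its contents.
columns = [row[1] for row in conn.execute("PRAGMA table_info(table_3)")]
rows = conn.execute("SELECT * FROM table_3").fetchall()
print(columns, rows)
```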
Alternatively, in dialects that support SELECT ... INTO (SQL Server, for example), you can also do:
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
),
TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
)
SELECT TABLE_1.*, TABLE_2.* INTO TABLE_3
FROM TABLE_1 INNER JOIN
TABLE_2
ON TABLE_2.key = TABLE_1.key
It is generally advisable to keep the table's DDL at hand, create the table explicitly, and then run INSERT INTO TABLE ... SELECT * FROM the CTE.

sqlite combine select all from multiple columns

How to combine many select queries into 1 statement in SQLite?
SELECT * FROM Table1 WHERE condition1
SELECT * FROM Table2 WHERE condition2
SELECT * FROM Table3 WHERE condition3
...
Use UNION to combine the results of the three queries.
UNION removes duplicates, leaving only unique rows in the final result. If you want to keep the duplicate rows, use UNION ALL.
SELECT * FROM Table1 WHERE condition1
UNION
SELECT * FROM Table2 WHERE condition2
UNION
SELECT * FROM Table3 WHERE condition3
Caveat: the number of columns (and their data types) must match in each SELECT statement.
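A quick way to see the difference between UNION and UNION ALL, and what happens when the column counts differ, is the sketch below. It uses Python's built-in sqlite3 module, and the single column v and its values are invented for the example.

```python
import sqlite3

# Hypothetical single-column tables that share the value 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (v INTEGER);
    CREATE TABLE Table2 (v INTEGER);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES (2), (3);
""")

# UNION drops the duplicate 2; UNION ALL keeps it.
union_rows = conn.execute(
    "SELECT v FROM Table1 UNION SELECT v FROM Table2 ORDER BY v"
).fetchall()
union_all_rows = conn.execute(
    "SELECT v FROM Table1 UNION ALL SELECT v FROM Table2 ORDER BY v"
).fetchall()

# Two columns on the left and one on the right is rejected.
try:
    conn.execute("SELECT v, v FROM Table1 UNION SELECT v FROM Table2")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

print(union_rows, union_all_rows, mismatch_error)
```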
UPDATE 1
Based on your comment below, you are looking for a JOIN:
SELECT a.*, b.*, c.*
FROM table1 a
INNER JOIN table2 b
ON a.ColName = b.ColName
INNER JOIN table3 c
ON a.ColName = c.ColName
-- WHERE .. add conditions here ..
To learn more about joins, see the link below:
Visual Representation of SQL Joins
You can also query like this:
SELECT * FROM users where bday > '1970-05-20'
UNION ALL SELECT * FROM users where bday < '1970-01-20'

SQL: how to find unused primary key

I've got a table with > 1'000'000 entries; this table is referenced from about 130 other tables. My problem is that a lot of those million entries are old and unused.
What's the fastest way to find the entries not referenced by any of the other tables? I don't want to do a
select * from (
select * from table-a TA
minus
select * from table-a TA where TA.id in (
select "ID" from (
(select distinct FK-ID "ID" from table-b)
union all
(select distinct FK-ID "ID" from table-c)
...
Is there an easier, more general way?
Thank you all!
You could do this:
select * from table_a a
where not exists (select * from table_b where fk_id = a.id)
and not exists (select * from table_c where fk_id = a.id)
and not exists (select * from table_d where fk_id = a.id)
...
try :
select a.*
from table_a a
left join table_b b on a.id=b.fk_id
left join table_c c on a.id=c.fk_id
left join table_d d on a.id=d.fk_id
left join table_e e on a.id=e.fk_id
......
where b.fk_id is null
and c.fk_id is null
and d.fk_id is null
and e.fk_id is null
.....
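The NOT EXISTS form and the LEFT JOIN ... IS NULL form are both anti-joins and should return the same rows. The sketch below checks that on a tiny made-up data set with two referencing tables instead of 130, using Python's built-in sqlite3 module rather than Oracle. It says nothing about which form is faster on a real system.

```python
import sqlite3

# Hypothetical data: ids 1 and 3 are referenced, ids 2 and 4 are not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY);
    CREATE TABLE table_b (fk_id INTEGER);
    CREATE TABLE table_c (fk_id INTEGER);
    INSERT INTO table_a VALUES (1), (2), (3), (4);
    INSERT INTO table_b VALUES (1), (1);
    INSERT INTO table_c VALUES (3);
""")

# Anti-join written with NOT EXISTS.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT a.id FROM table_a a
    WHERE NOT EXISTS (SELECT 1 FROM table_b WHERE fk_id = a.id)
      AND NOT EXISTS (SELECT 1 FROM table_c WHERE fk_id = a.id)
    ORDER BY a.id
""")]

# The same anti-join written with LEFT JOIN and IS NULL filters.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT a.id FROM table_a a
    LEFT JOIN table_b b ON a.id = b.fk_id
    LEFT JOIN table_c c ON a.id = c.fk_id
    WHERE b.fk_id IS NULL AND c.fk_id IS NULL
    ORDER BY a.id
""")]

print(not_exists_ids, left_join_ids)
```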
you might also try:
select a.*
from table_a a
left join
(select b.fk_id from table_b b union
select c.fk_id from table_c c union
...) table_union on a.id=table_union.fk_id
where table_union.fk_id is null
This is more SQL oriented and it will not take forever like the above solution.
Not sure about efficiency but:
select * from table_a
where id not in (
select id from table_b
union
select id from table_c )
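One thing to watch with the NOT IN form: if the subquery returns even one NULL, NOT IN matches no rows at all. The sketch below shows this on made-up data with Python's built-in sqlite3 module, assuming the referencing column is called fk_id as in the other answers, and shows that filtering out NULLs restores the expected result.

```python
import sqlite3

# Hypothetical data: table_b references id 1 and also holds one NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY);
    CREATE TABLE table_b (fk_id INTEGER);
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES (1), (NULL);
""")

# The NULL in the subquery makes every NOT IN comparison unknown.
not_in_ids = [r[0] for r in conn.execute("""
    SELECT id FROM table_a
    WHERE id NOT IN (SELECT fk_id FROM table_b)
    ORDER BY id
""")]

# Excluding NULLs from the subquery gives the unreferenced ids.
filtered_ids = [r[0] for r in conn.execute("""
    SELECT id FROM table_a
    WHERE id NOT IN (SELECT fk_id FROM table_b WHERE fk_id IS NOT NULL)
    ORDER BY id
""")]

print(not_in_ids, filtered_ids)
```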
If your concern is allowing the database to continue normal operations while you do the house keeping you could split it into multiple stages:
insert into tblIds
select id from table_a
union
select id from table_b
as many times as you need, and then:
delete from table_a where id not in ( select id from tblIds )
Of course sometimes doing a lot of processing takes a lot of time.
I like @Patrick's answer above, but I would like to add to that.
Rather than building the 130-step query by hand, you could build these INSERT statements by scanning sysObjects, finding key relations and generating your INSERT statements.
That would not only save you time, but should also help you to know for sure whether you've covered all the tables - maybe there are 131, or only 129.
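As a rough sketch of that idea, the code below discovers the referencing tables from the catalog and generates the query instead of typing it by hand. It is written against SQLite, where PRAGMA foreign_key_list plays the role that sysObjects or the equivalent catalog views would play elsewhere. The helper name build_unused_query and the sample schema are hypothetical, and the generated query is the NOT EXISTS form rather than the INSERT statements mentioned above.

```python
import sqlite3

def build_unused_query(conn, parent, parent_key="id"):
    """Build a NOT EXISTS query from the foreign keys that reference `parent`."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    clauses = []
    for table in tables:
        # Each row is (id, seq, referenced_table, from_column, to_column, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2].lower() == parent.lower():
                clauses.append(
                    f'NOT EXISTS (SELECT 1 FROM "{table}" c '
                    f'WHERE c."{fk[3]}" = p."{parent_key}")')
    where = "\n  AND ".join(clauses) or "1 = 1"
    return (f'SELECT p."{parent_key}" FROM "{parent}" p\n'
            f'WHERE {where}\n'
            f'ORDER BY p."{parent_key}"')

# Hypothetical schema: two tables declare a foreign key to table_a.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY);
    CREATE TABLE table_b (fk_id INTEGER REFERENCES table_a(id));
    CREATE TABLE table_c (fk_id INTEGER REFERENCES table_a(id));
    INSERT INTO table_a VALUES (1), (2), (3), (4);
    INSERT INTO table_b VALUES (1);
    INSERT INTO table_c VALUES (3);
""")

query = build_unused_query(conn, "table_a")
unused_ids = [r[0] for r in conn.execute(query)]
print(query)
print(unused_ids)
```

Counting the generated NOT EXISTS clauses also answers the "131 or 129" question, provided every relationship is declared as a foreign key in the schema.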
I'm inclined to Marcelo Cantos' answer (and have upvoted it), but here is an alternative in an attempt to circumvent the problem of not having indexes on the foreign keys...
WITH
ids_a AS
(
SELECT id FROM myTable
)
,
ids_b AS
(
SELECT id FROM ids_a WHERE NOT EXISTS (SELECT * FROM table_a WHERE fk_id = ids_a.id)
)
,
ids_c AS
(
SELECT id FROM ids_b WHERE NOT EXISTS (SELECT * FROM table_b WHERE fk_id = ids_b.id)
)
,
...
,
ids_z AS
(
SELECT id FROM ids_y WHERE NOT EXISTS (SELECT * FROM table_y WHERE fk_id = ids_y.id)
)
SELECT * FROM ids_z
All I'm trying to do is suggest an order to Oracle to minimise its effort. Unfortunately, Oracle will compile this to something very similar to Marcelo Cantos' answer, and it may not perform any differently.