Impala - CREATE TABLE after a WITH clause - sql

I have a query with several WITH clauses, then a CREATE TABLE :
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
), TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
(
CREATE TABLE TABLE_3 AS
(
SELECT TABLE_1.*, TABLE_2.*
FROM TABLE_1
INNER JOIN TABLE_2 ON TABLE_2.key = TABLE_1.key
)
)
However I have the following error :
Encountered: CREATE Expected: SELECT, VALUES, WITH CAUSED BY: Exception: Syntax error
So I tried to put the CREATE statement first :
CREATE TABLE_3 AS
(
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
), TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
(
SELECT TABLE_1.*, TABLE_2.*
FROM TABLE_1
INNER JOIN TABLE_2 ON TABLE_2.key = TABLE_1.key
)
)
But now I have the following error :
AnalysisException: Could not resolve table reference: 'TABLE_1'
Note that :
The above query WORKS without the "CREATE" statement
My present situation is more complex than this simpe example, and I would like to keep the WITH statements, for clarity.

Hmmm. I think this will work:
CREATE TABLE TABLE_3 AS
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
),
TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
)
SELECT TABLE_1.*, TABLE_2.*
FROM TABLE_1 INNER JOIN
TABLE_2
ON TABLE_2.key = TABLE_1.key;
Of course, you will have other problems, such as the key column being duplicated in the results -- and that should generate another error. In practice, you should select exactly the columns that you want.

Alternatively, you can also do...
WITH TABLE_1 AS (
SELECT * FROM SOMEWHERE_1
),
TABLE_2 AS (
SELECT * FROM SOMEWHERE_2
)
SELECT TABLE_1.*, TABLE_2.* INTO TABLE_3
FROM TABLE_1 INNER JOIN
TABLE_2
ON TABLE_2.key = TABLE_1.key
It is much advised to always have a DDL handy and run INSERT INTO TABLE SELECT * FROM CTE

Related

How to join select with itself in postgresql?

How to join select with itself in postgresql?
SELECT *
FROM (
SELECT src, dst FROM records
) as t1
JOIN t1 t2
USING(src)
UPDATE:
my table doesn't exists already and I create a table with "SELECT" and I want join this selected table with itself.
Use a Common Table Expressiom:
with t1 as
(
SELECT src, dst FROM records
)
SELECT *
FROM t1 JOIN t1 t2
USING(src)
You should fix the question. But one obvious problem is that you cannot re-use a table alias to define another table in the same from clause where it is defined. Hence, I think you want:
SELECT r1.src, r1.dst, r2.src, r2.dst
FROM records r1 JOIN
records r2
USING (src);

Is there sql WITH clause equivalent in hive?

Failed to find the answer in the specs.
So, I wonder: Can I do something like that in hive?
insert into table my_table
with a as
(
select *
from ...
where ...
),
b as
(
select *
from ...
where ...
)
select
a.a,
a.b,
a.c,
b.a,
b.b,
b.c
from a join b on (a.a=b.a);
With is available in Hive as of version 0.13.0. Usage documented here.
Hadoop Hive WITH Clause Syntax and Examples
With the Help of Hive WITH clause you can reuse piece of query result in same query construct. You can also improve the Hadoop Hive query using WITH clause. You can simplify the query by moving complex, complicated repetitive code to the WITH clause and refer the logical table created in your SELECT statements.
Hive WITH clause example with the SELECT statement
WITH t1 as (SELECT 1),
t2 as (SELECT 2),
t3 as (SELECT 3)
SELECT * from t1
UNION ALL
SELECT * from t2
UNION ALL
SELECT * from t3;
Hive WITH Clause in INSERT Statements
You can use the WITH clause while inserting data to table. For example:
WITH t11 as (SELECT 10),
t12 as (SELECT 20),
t13 as (SELECT 3)
INSERT INTO t1
SELECT * from t11
UNION ALL
SELECT * from t12
UNION ALL
SELECT * from t13;
I guess you could always use subqueries:
insert into table my_table
select
a.a,
a.b,
a.c,
b.a,
b.b,
b.c
from
(
select *
from ...
where ...
) a
join
(
select *
from ...
where ...
) b
on a.a = b.a;

SQL: how to find unused primary key

I've got a table with > 1'000'000 entries; this table is referenced from about 130 other tables. My problem is that a lot of those 1-mio-entries is old and unused.
What's the fastet way to find the entries not referenced by any of the other tables? I don't like to do a
select * from (
select * from table-a TA
minus
select * from table-a TA where TA.id in (
select "ID" from (
(select distinct FK-ID "ID" from table-b)
union all
(select distinct FK-ID "ID" from table-c)
...
Is there an easier, more general way?
Thank you all!
You could do this:
select * from table_a a
where not exists (select * from table_b where fk_id = a.id)
and not exists (select * from table_c where fk_id = a.id)
and not exists (select * from table_d where fk_id = a.id)
...
try :
select a.*
from table_a a
left join table_b b on a.id=b.fk_id
left join table_c c on a.id=c.fk_id
left join table_d d on a.id=d.fk_id
left join table_e e on a.id=e.fk_id
......
where b.fk_id is null
and c.fk_id is null
and d.fk_id is null
and e.fk_id is null
.....
you might also try:
select a.*
from table_a a
left join
(select b.fk_id from table_b b union
select c.fk_id from table_c c union
...) table_union on a.id=table_union.fk_id
where table_union.fk_id is null
This is more SQL oriented and it will not take forever like the above solution.
Not sure about efficiency but:
select * from table_a
where id not in (
select id from table_b
union
select id from table_c )
If your concern is allowing the database to continue normal operations while you do the house keeping you could split it into multiple stages:
insert into tblIds
select id from table_a
union
select id from table_b
as may times as you need and then:
delete * from table_a where id not in ( select id from tableIds )
Of course sometimes doing a lot of processing takes a lot of time.
I like #Patrick's answer above, but I would like to add to that.
Rather than building the 130-step query by hand, you could build these INSERT statements by scanning sysObjects, finding key relations and generating your INSERT statements.
That would not only save you time, but should also help you to know for sure whether you've covered all the tables - maybe there are 131, or only 129.
I'm inclined to Marcelo Cantos' answer (and have upvoted it), but here is an alternative in an attempt to circumvent the problem of not having indexes on the foreign keys...
WITH
ids_a AS
(
SELECT id FROM myTable
)
,
ids_b AS
(
SELECT id FROM ids_a WHERE NOT EXISTS (SELECT * FROM table_a WHERE fk_id = ids_a.id)
)
,
ids_c AS
(
SELECT id FROM ids_b WHERE NOT EXISTS (SELECT * FROM table_b WHERE fk_id = ids_b.id)
)
,
...
,
ids_z AS
(
SELECT id FROM ids_y WHERE NOT EXISTS (SELECT * FROM table_y WHERE fk_id = ids_y.id)
)
SELECT * FROM ids_z
All I'm trying to do is to suggest an order to Oracle to minimise its efforts. Unfortunately Oracle will compile this to comething very similar to Marcelo Cantos' answer and it may not performa any differently.

How to do a Select in a Select

I have a table containing a unique ID field. Another field (REF) contains a reference to another dataset's ID field.
Now I have to select all datasets where REF points to a dataset that doesn't exist.
SELECT * FROM table WHERE ("no dataset with ID=REF exists")
How can I do this?
3 ways
SELECT * FROM YourTable y WHERE NOT EXISTS
(SELECT * FROM OtherTable o WHERE y.Ref = o.Ref)
SELECT * FROM YourTable WHERE Ref NOT IN
(SELECT Ref FROM OtherTable WHERE Ref IS NOT NULL)
SELECT y.* FROM YourTable y
LEFT OUTER JOIN OtherTable o ON y.Ref = o.Ref
WHERE o.Ref IS NULL
See also Five ways to return all rows from one table which are not in another table
Try this:
SELECT * FROM TABLE WHERE NOT EXISTS
(SELECT * FROM OtherTable WHERE TABLE.Ref = OtherTable.ID)
I think this should work
SELECT * FROM table WHERE id NOT IN (SELECT ref_id FROM ref_table)
or with JOIN
SELECT table.*
FROM table LEFT JOIN ref_table ON table.id = ref_table.ref_id
WHERE ref_table.ref_id IS NULL
SELECT
table1.*
FROM
table1
LEFT JOIN table2 ON table1.id = table2.ref
WHERE
table2.ref IS NULL
You can do a subquery like:
select * from table where somefield not in (select otherfield from sometable where ID=REF)
SELECT *
FROM table
WHERE ((SELECT COUNT(*) FROM table2 WHERE table2.id = table.ref) = 0)
Something like that :
SELECT * FROM table WHERE ID NOT IN(SELECT REF FROM Table2 )
Yes you can use
select * from x where not exist ( select * from y )

Self-join of a subquery

I was wondering, is it possible to join the result of a query with itself, using PostgreSQL?
You can do so with WITH:
WITH subquery AS(
SELECT * FROM TheTable
)
SELECT *
FROM subquery q1
JOIN subquery q2 on ...
Or by creating a VIEW that contains the query, and joining on that:
SELECT *
FROM TheView v1
JOIN TheView v2 on ...
Or the brute force approach: type the subquery twice:
SELECT *
FROM (
SELECT * FROM TheTable
) sub1
LEFT JOIN (
SELECT * FROM TheTable
) sub2 ON ...
Do you mean, the result of a query on a table, to that same table. If so, then Yes, it's possible... e.g.
--Bit of a contrived example but...
SELECT *
FROM Table
INNER JOIN
(
SELECT
UserID, Max(Login) as LastLogin
FROM
Table
WHERE
UserGroup = 'SomeGroup'
GROUP BY
UserID
) foo
ON Table.UserID = Foo.UserID AND Table.Login = Foo.LastLogin
Yes, just alias the queries:
SELECT *
FROM (
SELECT *
FROM table
) t1
JOIN (
SELECT *
FROM table
) t2
ON t1.column < t2.other_column