I am trying to insert data into a BigQuery table.
My query is complex and involves a WITH clause, and it throws an error for every combination I have tried. I have written a similar query in Hive and it works like a charm.
Any suggestions on how I can achieve this are highly appreciated:
bq query --use_legacy_sql=false \
'with mapping_table as (SELECT t1.a, t2.b, t2.c from table1 as t1 inner join table2 on t2 group by )
INSERT OVERWRITE TABLE my-bq-dev.myschema.mytable PARTITION(CREATE_DT)
SELECT A, B, C ...... from TABLEX LEFT OUTER JOIN TABLEY ON'
Note that the error is not caused by the rest of the syntax: the query above works fine without INSERT OVERWRITE.
INSERT OVERWRITE TABLE ... is not BigQuery SQL.
Could you take a look at the example below to see how INSERT INTO works with a WITH clause?
create temp table t as select 1 x;
insert into t
with data as (select 2 x)
select * from data;
select * from t;
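For anyone who wants to try the shape of that example without a BigQuery project, here is a minimal sketch that runs the same statements against an in-memory SQLite database from Python. SQLite is only a stand-in here (an assumption on my part, it is not BigQuery), but it accepts the same INSERT INTO ... WITH ... SELECT ordering.

```python
import sqlite3

def insert_with_cte():
    """Run the INSERT INTO ... WITH ... SELECT example against SQLite."""
    conn = sqlite3.connect(":memory:")
    # Same starting point as the example: a temp table holding one row.
    conn.execute("CREATE TEMP TABLE t AS SELECT 1 AS x")
    # The WITH clause sits between INSERT INTO and the final SELECT.
    conn.execute(
        "INSERT INTO t "
        "WITH data AS (SELECT 2 AS x) "
        "SELECT * FROM data"
    )
    rows = conn.execute("SELECT x FROM t ORDER BY x").fetchall()
    conn.close()
    return rows

print(insert_with_cte())
```

The table should end up holding both the original row and the row produced by the CTE.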
it should be
INSERT OVERWRITE TABLE my-bq-dev.myschema.mytable PARTITION(CREATE_DT)
with mapping_table as (SELECT t1.a, t2.b, t2.c from table1 as t1 inner join table2 on t2 group by )
SELECT A, B, C ...... from TABLEX LEFT OUTER JOIN TABLEY ON
I have a table with 100 strings that I would like to use in a WHERE column IN (value, value, etc.) clause. Something like: select cookies from table where field in (select * from table)
I don't think Hive supports subqueries in IN clauses, but you can accomplish the same with an inner join:
select table1.cookies
from table1 join table2 on table1.field = table2.field
Hive has supported subqueries in the WHERE clause (IN, NOT IN, EXISTS, NOT EXISTS) since version 0.13.
So you can use that version or later.
Or you can try this query:
select * from table1 t1 JOIN (select 100_string_column as col2 from table2 where (whatever your condition is)) t2 ON t1.<matching_column> = t2.col2
Hope this helps...!!!
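One thing worth checking before swapping IN for a join: the two only return the same rows when the joined column has no duplicates on the right-hand side. Below is a small sketch of that, run on SQLite through Python rather than Hive (that substitution, and the sample rows, are my assumptions; table and column names follow the answers above).

```python
import sqlite3

def compare_in_and_join():
    """Return rows from the IN form, the plain join, and the de-duplicated join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (cookies TEXT, field TEXT);
        CREATE TABLE table2 (field TEXT);
        INSERT INTO table1 VALUES ('c1', 'a'), ('c2', 'b'), ('c3', 'z');
        -- 'b' appears twice on purpose to show the duplicate effect.
        INSERT INTO table2 VALUES ('a'), ('b'), ('b');
    """)
    in_rows = conn.execute(
        "SELECT cookies FROM table1 "
        "WHERE field IN (SELECT field FROM table2) ORDER BY cookies"
    ).fetchall()
    join_rows = conn.execute(
        "SELECT table1.cookies FROM table1 "
        "JOIN table2 ON table1.field = table2.field ORDER BY table1.cookies"
    ).fetchall()
    distinct_join_rows = conn.execute(
        "SELECT table1.cookies FROM table1 "
        "JOIN (SELECT DISTINCT field FROM table2) t2 "
        "ON table1.field = t2.field ORDER BY table1.cookies"
    ).fetchall()
    conn.close()
    return in_rows, join_rows, distinct_join_rows

in_rows, join_rows, distinct_join_rows = compare_in_and_join()
print(in_rows, join_rows, distinct_join_rows)
```

The plain join repeats 'c2' once per matching row in table2, while joining against a DISTINCT subquery gives the same result as the IN form.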
I failed to find the answer in the specs.
So, I wonder: can I do something like this in Hive?
insert into table my_table
with a as
(
select *
from ...
where ...
),
b as
(
select *
from ...
where ...
)
select
a.a,
a.b,
a.c,
b.a,
b.b,
b.c
from a join b on (a.a=b.a);
WITH is available in Hive as of version 0.13.0. Usage is documented in the Hive language manual, under Common Table Expression.
Hadoop Hive WITH Clause Syntax and Examples
With the Hive WITH clause, you can reuse a piece of a query's result within the same query. It can also make a Hive query easier to maintain: you move complex, repetitive code into the WITH clause and then refer to the logical table it defines from your SELECT statements.
Hive WITH clause example with the SELECT statement
WITH t1 as (SELECT 1),
t2 as (SELECT 2),
t3 as (SELECT 3)
SELECT * from t1
UNION ALL
SELECT * from t2
UNION ALL
SELECT * from t3;
Hive WITH Clause in INSERT Statements
You can also use the WITH clause when inserting data into a table. For example:
WITH t11 as (SELECT 10),
t12 as (SELECT 20),
t13 as (SELECT 3)
INSERT INTO t1
SELECT * from t11
UNION ALL
SELECT * from t12
UNION ALL
SELECT * from t13;
I guess you could always use subqueries:
insert into table my_table
select
a.a,
a.b,
a.c,
b.a,
b.b,
b.c
from
(
select *
from ...
where ...
) a
join
(
select *
from ...
where ...
) b
on a.a = b.a;
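If you want to convince yourself that the WITH form and the subquery form are interchangeable, a quick check along these lines may help. It is only a sketch: it runs on SQLite via Python instead of Hive, and the source tables src_a and src_b, their rows, and the a > 0 filter are all made up to fill in the elided "from ... where ..." parts.

```python
import sqlite3

CTE_SQL = """
    WITH a AS (SELECT * FROM src_a WHERE a > 0),
         b AS (SELECT * FROM src_b WHERE a > 0)
    SELECT a.a, a.b, a.c, b.a, b.b, b.c
    FROM a JOIN b ON a.a = b.a
"""

SUBQUERY_SQL = """
    SELECT a.a, a.b, a.c, b.a, b.b, b.c
    FROM (SELECT * FROM src_a WHERE a > 0) a
    JOIN (SELECT * FROM src_b WHERE a > 0) b
      ON a.a = b.a
"""

def cte_and_subquery_rows():
    """Run both query shapes on the same hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_a (a INTEGER, b TEXT, c TEXT);
        CREATE TABLE src_b (a INTEGER, b TEXT, c TEXT);
        INSERT INTO src_a VALUES (1, 'a1', 'x'), (2, 'a2', 'y');
        INSERT INTO src_b VALUES (1, 'b1', 'p'), (3, 'b3', 'q');
    """)
    cte_rows = conn.execute(CTE_SQL).fetchall()
    subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
    conn.close()
    return cte_rows, subquery_rows

cte_rows, subquery_rows = cte_and_subquery_rows()
print(cte_rows == subquery_rows, cte_rows)
```

Only the row with a = 1 exists on both sides, so both shapes return that single joined row.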
I want to insert data into a table from a select. This works fine so far...
INSERT INTO table_2
SELECT t.id, 1
FROM table_1 t
WHERE t.title LIKE '%search%';
But when I run this a second time, the statement raises an exception, because some of the rows already exist.
What can I do to get around this?
Thanks for your help,
Urkman
You can insert rows where they don't already exist, by adding that as a clause.
insert into table_2
select t.id, 1
from table_1 t
where t.title like '%search%'
and not exists (select t2.id from table_2 t2 where t2.id = t.id);
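Here is a rough way to see the NOT EXISTS guard make the statement safe to rerun. The sketch uses SQLite from Python as a stand-in for whatever database you are on, and the column name flag plus the sample titles are invented for the demo.

```python
import sqlite3

INSERT_MISSING = """
    INSERT INTO table_2
    SELECT t.id, 1
    FROM table_1 t
    WHERE t.title LIKE '%search%'
      AND NOT EXISTS (SELECT t2.id FROM table_2 t2 WHERE t2.id = t.id)
"""

def run_insert_twice():
    """Run the guarded insert twice and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (id INTEGER PRIMARY KEY, title TEXT);
        -- 'flag' is a hypothetical name for the second column.
        CREATE TABLE table_2 (id INTEGER PRIMARY KEY, flag INTEGER);
        INSERT INTO table_1 VALUES
            (1, 'search engine'), (2, 'other'), (3, 'deep search');
    """)
    conn.execute(INSERT_MISSING)
    # The second run finds no missing ids, so no primary key clash occurs.
    conn.execute(INSERT_MISSING)
    rows = conn.execute("SELECT id, flag FROM table_2 ORDER BY id").fetchall()
    conn.close()
    return rows

print(run_insert_twice())
```

Each matching id lands in table_2 exactly once, no matter how many times the statement runs.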
I have two tables in a database that share a primary key, and I want to find the rows of one that have no match in the other. For example,
Table1 has columns (ID, Name) and sample data: (1, John), (2, Peter), (3, Mary)
Table2 has columns (ID, Address) and sample data: (1, address2), (2, address2)
How do I write a SQL query that fetches the rows from Table1 whose ID is not in Table2? In this case, (3, Mary) should be returned.
PS: The ID is the primary key for those two tables.
Try this
SELECT ID, Name
FROM Table1
WHERE ID NOT IN (SELECT ID FROM Table2)
Use LEFT JOIN
SELECT a.*
FROM table1 a
LEFT JOIN table2 b
on a.ID = b.ID
WHERE b.id IS NULL
There are basically three approaches to this: NOT EXISTS, NOT IN, and LEFT JOIN / IS NULL.
LEFT JOIN with IS NULL
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON r.value = l.value
WHERE r.value IS NULL
NOT IN
SELECT l.*
FROM t_left l
WHERE l.value NOT IN
(
SELECT value
FROM t_right r
)
NOT EXISTS
SELECT l.*
FROM t_left l
WHERE NOT EXISTS
(
SELECT NULL
FROM t_right r
WHERE r.value = l.value
)
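To sanity-check that the three forms agree, here is a small sketch that runs each of them on the Table1 / Table2 sample data from the question. It uses SQLite from Python purely as a convenient test bed (my assumption, not any particular production database), so it says nothing about relative performance.

```python
import sqlite3

QUERIES = {
    "left_join_is_null": """
        SELECT a.ID, a.Name
        FROM Table1 a
        LEFT JOIN Table2 b ON a.ID = b.ID
        WHERE b.ID IS NULL
    """,
    "not_in": """
        SELECT ID, Name
        FROM Table1
        WHERE ID NOT IN (SELECT ID FROM Table2)
    """,
    "not_exists": """
        SELECT t1.ID, t1.Name
        FROM Table1 t1
        WHERE NOT EXISTS (SELECT NULL FROM Table2 t2 WHERE t2.ID = t1.ID)
    """,
}

def rows_missing_from_table2():
    """Run all three anti-join forms and return their results by name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Address TEXT);
        INSERT INTO Table1 VALUES (1, 'John'), (2, 'Peter'), (3, 'Mary');
        INSERT INTO Table2 VALUES (1, 'address2'), (2, 'address2');
    """)
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results

for name, rows in rows_missing_from_table2().items():
    print(name, rows)
```

On this data, where ID is a non-null primary key in both tables, all three forms return only the (3, Mary) row.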
Which one is better? The answer is best broken down by RDBMS vendor. Generally speaking, avoid select ... where ... in (select ...) when the number of rows returned by the subquery is unknown. Some vendors limit the size of an IN list; Oracle, for example, caps an IN list of literal values at 1,000 expressions, although that cap applies to value lists rather than subqueries. The best thing to do is to try all three and compare the execution plans.
Specifically for PostgreSQL, the execution plans of NOT EXISTS and LEFT JOIN / IS NULL are the same. I personally prefer NOT EXISTS because it shows the intent better: semantically, you want the records in A whose primary key does not exist in B.
Old but still gold, specific to PostgreSQL though: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
Fast Alternative
I ran some tests (on PostgreSQL 9.5) using two tables with ~2M rows each. The query below performed at least 5x better than the other queries proposed:
-- Count
SELECT count(*) FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2;
-- Get full row
SELECT table1.* FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2 JOIN table1 ON t1_not_in_t2.id=table1.id;
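If you just want to see what the EXCEPT variant returns, the sketch below runs both queries on a tiny data set. It uses SQLite from Python rather than Postgres (my substitution), so it shows the result only and makes no claim about speed. As far as I know SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped here; the sample rows are invented.

```python
import sqlite3

COUNT_SQL = """
    SELECT count(*) FROM (
        SELECT id FROM table1 EXCEPT SELECT id FROM table2
    ) t1_not_in_t2
"""

FULL_ROW_SQL = """
    SELECT table1.* FROM (
        SELECT id FROM table1 EXCEPT SELECT id FROM table2
    ) t1_not_in_t2
    JOIN table1 ON t1_not_in_t2.id = table1.id
"""

def except_results():
    """Return the count and the full rows of table1 ids absent from table2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, address TEXT);
        INSERT INTO table1 VALUES (1, 'John'), (2, 'Peter'), (3, 'Mary');
        INSERT INTO table2 VALUES (1, 'address2'), (2, 'address2');
    """)
    count = conn.execute(COUNT_SQL).fetchone()[0]
    rows = conn.execute(FULL_ROW_SQL).fetchall()
    conn.close()
    return count, rows

print(except_results())
```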
Keeping in mind the points made in @John Woo's comment/link above, this is how I typically would handle it:
SELECT t1.ID, t1.Name
FROM Table1 t1
WHERE NOT EXISTS (
SELECT TOP 1 NULL
FROM Table2 t2
WHERE t1.ID = t2.ID
)
SELECT COUNT(ID) FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b) --For count
SELECT ID FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b) --For results
I want to write the following (pseudo) SQL statement in MS Access:
Select C
from MyTable
where (A, B) IN (select distinct A,B from MyTable);
I tried that but got the complaint "You have written a subquery that can return more than one field without using the EXISTS reserved word in the main query's FROM clause."
I appreciate any feedback.
You can use an inner join as a filter:
select c
from MyTable t1
inner join
(
select distinct
a
, b
from OtherTable
) t2
on t1.a = t2.a
and t1.b = t2.b
(I'm assuming you have two tables because the query doesn't make much sense for one table. Obviously, all combinations of A and B that are in Table1 will "also" be in Table1.)
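A quick way to see the join acting as a two-column filter is sketched below. It runs on SQLite from Python instead of MS Access (an assumption made so the example is runnable anywhere), and the rows in MyTable and OtherTable are invented.

```python
import sqlite3

FILTER_SQL = """
    SELECT t1.c
    FROM MyTable t1
    INNER JOIN (
        SELECT DISTINCT a, b
        FROM OtherTable
    ) t2
      ON t1.a = t2.a
     AND t1.b = t2.b
    ORDER BY t1.c
"""

def filter_by_pair():
    """Return c for every MyTable row whose (a, b) pair appears in OtherTable."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MyTable (a INTEGER, b TEXT, c TEXT);
        CREATE TABLE OtherTable (a INTEGER, b TEXT);
        INSERT INTO MyTable VALUES (1, 'x', 'c1'), (1, 'y', 'c2'), (2, 'x', 'c3');
        -- (1, 'x') is listed twice; DISTINCT keeps it from doubling 'c1'.
        INSERT INTO OtherTable VALUES (1, 'x'), (1, 'x'), (2, 'x');
    """)
    rows = conn.execute(FILTER_SQL).fetchall()
    conn.close()
    return rows

print(filter_by_pair())
```

The DISTINCT in the derived table matters: without it, a pair that appears twice in OtherTable would return its matching row twice.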