Loading data from multiple tables into a single table in Hive

I have an empty Hive table with a particular structure.
I have 10 other tables with the same structure, data types, and schema but different table names.
I loaded the data of one table into the empty table using "insert into"; say it has 10 million records.
Now I am loading the second table into this table using "insert into".
When I do a count(*), it does not show the total count of records.
It displays only the record count of the last loaded table.
Why is that? I want all the records to be loaded.
Any help would be great!

Do this:
insert into table table_name
Select * from
(
SELECT b.var1 FROM tmp_table1 b
UNION ALL
SELECT c.var1 FROM tmp_table2 c
UNION ALL
SELECT d.var1 FROM tmp_table3 d
UNION ALL
SELECT e.var1 FROM tmp_table4 e
UNION ALL
SELECT f.var1 FROM tmp_table5 f
UNION ALL
SELECT g.var1 FROM tmp_table6 g
UNION ALL
SELECT h.var1 FROM tmp_table7 h
) CombinedTable

Since all the tables have the same schema, it is better to copy their files into the new empty table. This is the better solution if the Hive table has no partitions.
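The single-statement UNION ALL load suggested above can be sanity-checked with a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for Hive, which is an assumption made only so the example runs anywhere; Hive's storage behaviour is not reproduced. The source table names and row counts are made up for illustration. Separately, if the count only ever reflects the last load, it may be worth checking that the statement actually run was not INSERT OVERWRITE, which replaces existing data in Hive, whereas INSERT INTO appends.

```python
import sqlite3

def load_with_union_all(row_counts):
    """Create one source table per entry in row_counts, load them all into
    table_name with a single INSERT ... SELECT ... UNION ALL, and return
    the resulting row count of the target table."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive (assumption)
    conn.execute("CREATE TABLE table_name (var1 INTEGER)")
    selects = []
    for i, n in enumerate(row_counts, start=1):
        src = f"tmp_table{i}"  # illustrative source table name
        conn.execute(f"CREATE TABLE {src} (var1 INTEGER)")
        conn.executemany(f"INSERT INTO {src} VALUES (?)", [(k,) for k in range(n)])
        selects.append(f"SELECT var1 FROM {src}")
    # One statement loads every source table, so no load can replace another.
    conn.execute("INSERT INTO table_name " + " UNION ALL ".join(selects))
    total = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
    conn.close()
    return total

print(load_with_union_all([3, 5, 2]))
```

The returned count is the sum of the source row counts, which is the behaviour the question expects from the combined load.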

Related

BigQuery using Dynamic sql as the input source of a query

How can I use a dynamic query as an input source for a larger query?
I have a query that takes the union of values from different datasets/tables scattered around, and the list is growing, so I'm thinking of using a dynamic query rather than writing a query for each table, like this:
DECLARE query STRING DEFAULT "";
DECLARE tables ARRAY<STRING> DEFAULT ["table1", "table2"...];
DECLARE tables_size INT64;
DECLARE i INT64 DEFAULT 0;
SET tables_size = ARRAY_LENGTH(tables);
WHILE i < tables_size DO
IF (i = tables_size -1) THEN
BEGIN
SET query = CONCAT(query, " SELECT id, name FROM ", tables[OFFSET(i)]);
BREAK;
END;
ELSE
SET query = CONCAT(query, " SELECT id, name FROM ", tables[OFFSET(i)], ' UNION ALL ');
END IF;
SET i = i + 1;
END WHILE;
EXECUTE IMMEDIATE query;
My goal is to use the output of the executed query as a FROM clause for a larger query.
It will be something like
Select A, B, C, D ... From *EXECUTE IMMEDIATE query* LEFT JOIN ... ON..
Is there a way to inject an output of a dynamic query as a table for another query?
I don't see TABLE as a variable type for bigquery so that was not my option.
I'm getting a bit tired of copy pasting table names to the exact query every time a new table is introduced to this logic.
SELECT id, name FROM table1 UNION ALL
SELECT id, name FROM table2 UNION ALL
SELECT id, name FROM table3...
Is there a simple way to do this? Or is there a reason not to use dynamic queries, such as performance?
Hope one of these is helpful:
1. Wildcard tables
If the tables you want to union have a common prefix, you can consider using a wildcard table like below. I think this is a more concise form than UNION ALL:
-- Sample Tables
CREATE TABLE IF NOT EXISTS testset.table1 AS SELECT 1 AS id, 'aaa' AS name;
CREATE TABLE IF NOT EXISTS testset.table2 AS SELECT 2 AS id, 'bbb' AS name;
CREATE TABLE IF NOT EXISTS testset.table3 AS SELECT 3 AS id, 'ccc' AS name;
--- Wildcard tables
SELECT * FROM `testset.table*` WHERE _TABLE_SUFFIX IN ('1', '2', '3');
2. Dynamic SQL & Temp Table
You can't inject dynamic SQL directly into another query, but you can use a temp table to emulate it.
2.1 Dynamic SQL
More concise dynamic query to union all tables:
DECLARE tables DEFAULT ["testset.table1", "testset.table2", "testset.table3"];
SELECT ARRAY_TO_STRING(ARRAY_AGG(FORMAT('SELECT id, name FROM %s', t)), ' UNION ALL\n')
FROM UNNEST(tables) t;
2.2 Using a temp table
I think you can modify your larger query to use a dynamically generated temp table.
DECLARE tables DEFAULT ["testset.table1", "testset.table2", "testset.table3"];
CREATE TABLE IF NOT EXISTS testset.table1 AS SELECT 1 AS id, 'aaa' AS name;
CREATE TABLE IF NOT EXISTS testset.table2 AS SELECT 2 AS id, 'bbb' AS name;
CREATE TABLE IF NOT EXISTS testset.table3 AS SELECT 3 AS id, 'ccc' AS name;
EXECUTE IMMEDIATE (
SELECT "CREATE TEMP TABLE IF NOT EXISTS union_tables AS \n"
|| ARRAY_TO_STRING(ARRAY_AGG(FORMAT('SELECT id, name FROM %s', t)), ' UNION ALL\n') FROM UNNEST(tables) t
);
-- your larger query using a temp table
SELECT * FROM union_tables;
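If the statement text is assembled on the client side instead of in a BigQuery script, the same string-building step can be sketched in Python. This is a minimal sketch under assumptions: the function name build_union_all and the name-validation pattern are hypothetical, and the resulting string would still have to be submitted to BigQuery by whatever client is in use. It mirrors the ARRAY_TO_STRING(ARRAY_AGG(FORMAT(...))) expression above.

```python
import re

# Conservative pattern for dataset.table style names (an assumption for this
# sketch); anything else is rejected so arbitrary text cannot reach the SQL.
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")

def build_union_all(tables, columns=("id", "name")):
    """Return one SELECT per table, joined with UNION ALL."""
    if not tables:
        raise ValueError("at least one table is required")
    for t in tables:
        if not _NAME.match(t):
            raise ValueError(f"unexpected table name: {t!r}")
    cols = ", ".join(columns)
    # Joining with a separator avoids the special case for the last table
    # that the WHILE loop in the question has to handle.
    return " UNION ALL\n".join(f"SELECT {cols} FROM {t}" for t in tables)

print(build_union_all(["testset.table1", "testset.table2", "testset.table3"]))
```

Adding a new table then only means appending one name to the list.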

SELECT VALUES in Teradata

I know that it's possible in other SQL flavors (T-SQL) to "select" provided data without a table. Like:
SELECT *
FROM (VALUES (1,2), (3,4)) tbl
How can I do this using Teradata?
Teradata has strange syntax for this:
select t.*
from (select * from (select 1 as a, 2 as b) x
union all
select * from (select 3 as a, 4 as b) x
) t;
I don't have access to a TD system to test, but you might be able to remove one of the nested SELECTs from the answer above:
select x.*
from (
select 1 as a, 2 as b
union all
select 3 as a, 4 as b
) x
If you need to generate some random rows, you can always do a SELECT from a system table, like sys_calendar.calendar:
SELECT 1, 2
FROM sys_calendar.calendar
SAMPLE 10;
Updated example:
SELECT TOP 1000 -- Limit to 1000 rows (you can use SAMPLE too)
ROW_NUMBER() OVER() MyNum, -- Sequential numbering
MyNum MOD 7, -- Modulo operator
RANDOM(1,1000), -- Random number between 1,1000
HASHROW(MyNum) -- Rowhash value of given column(s)
FROM sys_calendar.calendar; -- Use as table to source rows
A couple of notes:
- Make sure you pick a system table that will always be present and have rows.
- If you need more rows than are available in the source table, do a UNION ALL to get more rows.
- You can always easily create a one-column table and populate it to whatever number of rows you want by INSERT/SELECT into it:
CREATE TABLE DummyTable (c1 INT); -- Create table
INSERT INTO DummyTable(1); -- Seed table
INSERT INTO DummyTable SELECT * FROM DummyTable; -- Run this to duplicate rows as many times as you want
Then use this table to create whatever resultset you want, similar to the query above with sys_calendar.calendar.
I don't have a TD system to test so you might get syntax errors...but that should give you a basic idea.
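The row-doubling idea behind the DummyTable statements can be checked with a small sketch. It uses Python's standard-library sqlite3 module rather than Teradata (an assumption made so the example is runnable; Teradata syntax and behaviour may differ), and the helper name doubled_row_count is hypothetical. Each self-INSERT/SELECT doubles the table, so n runs on a one-row seed give 2^n rows.

```python
import sqlite3

def doubled_row_count(doublings):
    """Seed a one-column table with one row, run the self INSERT/SELECT
    `doublings` times, and return the final row count."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Teradata (assumption)
    conn.execute("CREATE TABLE DummyTable (c1 INTEGER)")
    conn.execute("INSERT INTO DummyTable VALUES (1)")  # seed row
    for _ in range(doublings):
        # Each run appends a copy of every existing row.
        conn.execute("INSERT INTO DummyTable SELECT * FROM DummyTable")
    count = conn.execute("SELECT COUNT(*) FROM DummyTable").fetchone()[0]
    conn.close()
    return count

print(doubled_row_count(10))
```

Ten runs are enough for about a thousand rows and twenty for about a million, so the seed table grows quickly.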
I am a bit late to this thread, but recently got the same error.
I solved this by simply using
select distinct 1 as a, 2 as b from DBC.tables
union all
select distinct 3 as a, 4 as b from DBC.tables
Here, DBC.tables is a DB backend table with only a few rows, so the query also runs fast.

SELECT * FROM (SELECT)

There is a table "MAIN_TABLE" with columns "Table_Unique_Code" and "Table_name".
There are also several tables containing the required data.
The task is to create an SQL query with a parameter ("Table_Unique_Code") that selects all data from the table determined by that "Table_Unique_Code".
Something like
SELECT * FROM (*determine the name of the table by Table_Unique_code here*);
I tried
SELECT * FROM (SELECT table_name FROM MAIN_TABLE WHERE Table_Unique_Code=?)
but it doesn't work.
I work with OracleDB.
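The attempted query returns the table name as a data value; it does not make the database read from that table, and a table identifier cannot be supplied through a bind parameter. One common way around this is a two-step lookup: fetch the name with a parameterised query, then build the second statement from it. The sketch below illustrates the idea with Python's standard-library sqlite3 module rather than Oracle (an assumption made so the example runs anywhere); the demo tables and function names are hypothetical. In Oracle the second step would typically be dynamic SQL in PL/SQL, for example EXECUTE IMMEDIATE or a ref cursor opened for a query string.

```python
import sqlite3

def demo_connection():
    """Build a small in-memory database; all table contents are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MAIN_TABLE (Table_Unique_Code TEXT PRIMARY KEY, Table_name TEXT);
        CREATE TABLE orders (id INTEGER, item TEXT);
        CREATE TABLE customers (id INTEGER, name TEXT);
        INSERT INTO MAIN_TABLE VALUES ('ORD', 'orders'), ('CUS', 'customers');
        INSERT INTO orders VALUES (1, 'pen'), (2, 'ink');
        INSERT INTO customers VALUES (7, 'Ada');
    """)
    return conn

def select_all_by_code(conn, code):
    # Step 1: the code is ordinary data, so it can be a bind parameter.
    row = conn.execute(
        "SELECT Table_name FROM MAIN_TABLE WHERE Table_Unique_Code = ?", (code,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown code: {code!r}")
    table_name = row[0]
    # Step 2: confirm the name is a real table before putting it into SQL text.
    known = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table_name,)
    ).fetchone()
    if known is None:
        raise LookupError(f"no such table: {table_name!r}")
    quoted = '"' + table_name.replace('"', '""') + '"'  # quote the identifier
    return conn.execute(f"SELECT * FROM {quoted}").fetchall()

print(select_all_by_code(demo_connection(), "ORD"))
```

Checking the looked-up name against the catalog before use keeps the dynamically built statement limited to tables that actually exist.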

(Teradata Version)- get all records plus all corresponding records in another table

Can the following query be optimised for Teradata?
We need all records from small table A, plus all corresponding records from large table B that match on a non-unique key.
Or, in other words: everything except the records from B that have no match in A.
Maybe something with a JOIN? Or a subselect that is a non-correlated query; does that also apply to Teradata?
SELECT a.nonunique
, a.colX
FROM small_tab a
UNION ALL
SELECT b.nonunique
, b.colY
FROM large_tab b
WHERE EXISTS (
SELECT 1
FROM small_tab a
WHERE a.nonunique = b.nonunique
);
Thanks for the help!
=========UPDATE====
Based on quanos' answer in this MySQL question, would the following statement with a non-correlated subquery also be faster in Teradata?
SELECT a.nonunique
, a.colX
FROM small_tab a
UNION ALL
SELECT b.nonunique
, b.colY
FROM large_tab b
WHERE b.nonunique IN
(
SELECT DISTINCT nonunique
FROM small_tab
GROUP BY nonunique
)
I cannot test in Teradata currently; I only have an Oracle instance at home.
I'm not sure whether it is a typo, but you have a redundant SELECT query after the WHERE clause. Also, you will have to use the same column name in the SELECT query as the one used in the WHERE clause.
The query below works fine in Teradata.
SELECT a.nonunique, a.colX
FROM small_tab a
UNION ALL
SELECT b.nonunique, b.colY
FROM large_tab b
WHERE b.id IN (
SELECT id
FROM small_tab)
Hope it helps. If you have any questions about the query above, please let me know.
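Whether the correlated EXISTS form and the non-correlated IN form return the same rows can be checked on a tiny data set. The sketch below uses Python's standard-library sqlite3 module rather than Teradata (an assumption made so it is runnable), with made-up rows in small_tab and large_tab. It only compares results; it says nothing about which form is faster on Teradata, which would need the EXPLAIN output on the real system.

```python
import sqlite3

SETUP = """
CREATE TABLE small_tab (nonunique INTEGER, colX TEXT);
CREATE TABLE large_tab (nonunique INTEGER, colY TEXT);
INSERT INTO small_tab VALUES (1, 'a1'), (1, 'a2'), (2, 'a3');
INSERT INTO large_tab VALUES (1, 'b1'), (2, 'b2'), (2, 'b3'), (9, 'b4');
"""

EXISTS_QUERY = """
SELECT a.nonunique, a.colX FROM small_tab a
UNION ALL
SELECT b.nonunique, b.colY FROM large_tab b
WHERE EXISTS (SELECT 1 FROM small_tab a WHERE a.nonunique = b.nonunique)
"""

IN_QUERY = """
SELECT a.nonunique, a.colX FROM small_tab a
UNION ALL
SELECT b.nonunique, b.colY FROM large_tab b
WHERE b.nonunique IN (SELECT nonunique FROM small_tab)
"""

def run(query):
    """Run a query against a fresh copy of the sample data; return sorted rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = sorted(conn.execute(query).fetchall())
    conn.close()
    return rows

print(run(EXISTS_QUERY) == run(IN_QUERY))
```

On this sample both forms keep every row of small_tab and drop only the large_tab row whose key (9) has no match, even though key 1 appears twice in small_tab.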

Join on all fields without listing them?

The goal is to retrieve all exact records (each field is the same) from table_a that exist in table_b; however, there are many fields (let's say 100), which I don't want to type/list out.
Is there a way to compare tables based on whole records? Or have it auto-recognize and join on fields when not specified?
SELECT * FROM table_a
WHERE EXISTS (
select * from table_b
-- where table_a.field1 = table_b.field1
-- and ...
-- and table_a.field100 = table_b.field100
);
try:
select * from A
intersect
select * from B
see: http://www.postgresql.org/docs/9.1/static/queries-union.html
modified as suggested by user2989408
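The INTERSECT suggestion can be tried on a small example. This sketch uses Python's standard-library sqlite3 module rather than PostgreSQL (an assumption made so it is runnable), with three made-up columns standing in for the 100 fields. It also runs an explicit equality join for contrast, because the two differ on NULLs and duplicates: INTERSECT treats two NULLs as matching and returns each distinct row once, while `=` never matches NULL and the join repeats duplicated rows.

```python
import sqlite3

SETUP = """
CREATE TABLE table_a (id INTEGER, f1 TEXT, f2 TEXT);
CREATE TABLE table_b (id INTEGER, f1 TEXT, f2 TEXT);
INSERT INTO table_a VALUES (1, 'x', NULL), (2, 'y', 'q'), (2, 'y', 'q'), (3, 'z', 'r');
INSERT INTO table_b VALUES (1, 'x', NULL), (2, 'y', 'q'), (4, 'w', 's');
"""

def fetch(query):
    """Run a query against a fresh copy of the sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

# Whole-row comparison without listing any column.
intersect_rows = fetch("SELECT * FROM table_a INTERSECT SELECT * FROM table_b")

# Explicit column-by-column join, shown only for contrast.
join_rows = fetch("""
    SELECT a.* FROM table_a a
    JOIN table_b b ON a.id = b.id AND a.f1 = b.f1 AND a.f2 = b.f2
""")

print(intersect_rows)
print(join_rows)
```

Both tables need the same columns in the same order for `SELECT *` on each side to line up.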