Dynamic hive view creation - hive

I have 10 Hive tables; some columns are shared by some of the tables, and I need to union them all into one table, so I have been writing UNION ALL statements.
Now, the columns in these 10 tables might change over time. How can I build the UNION ALL query dynamically based on the tables?
CREATE VIEW comb_view AS
SELECT col11, col12, col13,.....
From
(SELECT col11,col12,NULL col13 from tab1
UNION ALL
SELECT col11,NULL col12, col13 from tab2
.
.
.
.
) c
while ensuring the column positions and data types match across the branches.
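For example, with the positions aligned and each NULL placeholder cast to the matching type, the hand-written version looks roughly like this (tab1/tab2/tab3 and the column types are only for illustration):
CREATE VIEW comb_view AS
SELECT col11, col12, col13
FROM
(SELECT col11, col12, CAST(NULL AS STRING) AS col13 FROM tab1
UNION ALL
SELECT col11, CAST(NULL AS STRING) AS col12, col13 FROM tab2
UNION ALL
SELECT col11, col12, col13 FROM tab3
) c;
Maintaining this by hand every time a table changes is what I would like to avoid.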

Related

Union all with loop in oracle select statement

I'm working in an Oracle DB that has 20 tables with the same structure but split by year. It starts with ft_expenses_2002 and goes up to ft_expenses_2021 (the year in which I'm writing this). I need to put all these tables together before doing some maths, and my first approach was to use UNION ALL statements. It worked, but I'm wondering if it's possible to do something more elegant, like using a FOR LOOP. It would not only make the query far more elegant but would also avoid future maintenance, because every year a new table with the "_new_year" suffix will be created.
Create a view for all the tables:
CREATE VIEW ft_expenses AS
SELECT * FROM ft_expenses_2002
UNION ALL SELECT * FROM ft_expenses_2003
UNION ALL SELECT * FROM ft_expenses_2004
UNION ALL SELECT * FROM ft_expenses_2005
UNION ALL SELECT * FROM ft_expenses_2006
UNION ALL SELECT * FROM ft_expenses_2007
UNION ALL SELECT * FROM ft_expenses_2008
UNION ALL SELECT * FROM ft_expenses_2009
UNION ALL SELECT * FROM ft_expenses_2010
UNION ALL SELECT * FROM ft_expenses_2011
-- ...
UNION ALL SELECT * FROM ft_expenses_2021
Then just do your query using the view.
Next year, when you add a 2022 table, just recreate the view with the extra table added.
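For example (amount is just a placeholder column name here):
SELECT SUM(amount) AS total_expenses FROM ft_expenses;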
Alternatively, create a table from the originals so that everything is in one table that you can query directly:
CREATE TABLE ft_expenses (year, col1, col2, col3) AS
SELECT 2002, col1, col2, col3 FROM ft_expenses_2002
UNION ALL SELECT 2003, col1, col2, col3 FROM ft_expenses_2003
UNION ALL SELECT 2004, col1, col2, col3 FROM ft_expenses_2004
UNION ALL SELECT 2005, col1, col2, col3 FROM ft_expenses_2005
UNION ALL SELECT 2006, col1, col2, col3 FROM ft_expenses_2006
UNION ALL SELECT 2007, col1, col2, col3 FROM ft_expenses_2007
UNION ALL SELECT 2008, col1, col2, col3 FROM ft_expenses_2008
-- ...
UNION ALL SELECT 2021, col1, col2, col3 FROM ft_expenses_2021
Then drop the individual tables (make sure you backed everything up first) and create views if you still need to access them by the original names:
CREATE VIEW ft_expenses_2002 (col1, col2, col3) AS
SELECT col1, col2, col3 FROM ft_expenses WHERE year = 2002;
CREATE VIEW ft_expenses_2003 (col1, col2, col3) AS
SELECT col1, col2, col3 FROM ft_expenses WHERE year = 2003;
-- ...
CREATE VIEW ft_expenses_2021 (col1, col2, col3) AS
SELECT col1, col2, col3 FROM ft_expenses WHERE year = 2021;
I found here a really good and short way to solve my problem; it was:
SELECT
GROUP_CONCAT(
CONCAT(
'SELECT * FROM `',
TABLE_NAME,
'`') SEPARATOR ' UNION ALL ')
FROM
`INFORMATION_SCHEMA`.`TABLES`
WHERE
`TABLE_NAME` LIKE 'ft_expenses_%'
INTO @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
Depends on your definition of "elegant".
It's certainly possible to use dynamic SQL to look for every table whose name fits a particular pattern. My guess is that the simplest thing to do would be to create a view that does the UNION ALL, write your processing code against that view, and then have a bit of dynamic SQL that can rebuild the view based on what tables exist. You could then run that procedure when new tables are created (if you can hook into that process), or from a DDL trigger, or just schedule it to run in the early morning hours every year/month/day, depending on how likely it is that a new table is going to suddenly show up.
create or replace procedure build_view
as
    l_sql_stmt varchar2(32000);
    type typ_table_names is table of varchar2(256);
    l_tables typ_table_names;
begin
    l_sql_stmt := 'create or replace view my_view as ';
    -- collect every table matching the naming pattern
    -- (table names are stored in upper case in user_tables)
    select table_name
      bulk collect into l_tables
      from user_tables
     where table_name like 'FT_EXPENSES%';
    -- stitch the SELECTs together with UNION ALL
    for i in 1 .. l_tables.count
    loop
        l_sql_stmt := l_sql_stmt ||
            ' select * from ' || l_tables(i);
        if( i != l_tables.count )
        then
            l_sql_stmt := l_sql_stmt ||
                ' union all ';
        end if;
    end loop;
    dbms_output.put_line( l_sql_stmt );
    execute immediate l_sql_stmt;
end;
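If you want the rebuild to run on a schedule, a rough DBMS_SCHEDULER sketch could look like this (the job name and the daily 03:00 schedule are just placeholders):
begin
    dbms_scheduler.create_job(
        job_name        => 'REBUILD_FT_EXPENSES_VIEW',  -- placeholder name
        job_type        => 'STORED_PROCEDURE',
        job_action      => 'BUILD_VIEW',
        start_date      => systimestamp,
        repeat_interval => 'FREQ=DAILY;BYHOUR=3',       -- every day at 03:00
        enabled         => true
    );
end;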
Your physical setup of the tables is state of the art for Oracle 6 (late 1980s).
You have two possibilities to upgrade: either the DIY UNION ALL view as proposed in the other answers, or simply follow the development that Oracle has implemented since that version.
Note, I would not recommend following the advice to put all the data in one plain table - why? You would see no difference in queries covering all the data, but you would spot a significant decrease in performance for queries on one or a few years.
What has Oracle done on this topic since release 6?
In Oracle 7 (1990s) partitioned views were introduced, which is a similar idea to the proposed UNION ALL view.
Starting with Oracle 8 the partitioning concept was introduced, and it has been improved in each following release.
So if you want to leverage current Oracle features, partitioning should be applied:
managing the data in acceptably sized chunks
providing flexibility in the access
Here is an example of how you could migrate in Oracle 19c.
I assume that your table contains a column trans_dt holding a DATE whose year matches the table's year.
Start with the oldest table and change it to a partitioned table:
alter table ft_expenses_2002 modify
partition by range(trans_dt) interval(NUMTOYMINTERVAL(1,'YEAR'))
( partition p_init values less than (DATE'2002-01-01')
) online
Rename the table, eliminating the year:
rename ft_expenses_2002 to ft_expenses;
Now the table is partitioned and contains two partitions: the initial one and the partition for the year 2002.
select PARTITION_NAME from user_tab_partitions where table_name = 'FT_EXPENSES' order by PARTITION_POSITION;
PARTITION_NAME
----------------
P_INIT
SYS_P2340
For each following year, perform the steps below.
Add a new partition
alter table ft_expenses
exchange partition for( DATE'2003-01-01' ) with table ft_expenses_2003
Note that you use the for syntax to address the partition, so there is no need to know the partition name.
Additionally, recent versions can create the partition as part of the exchange statement, so there is no need to lock the table anymore.
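Once everything is in the single partitioned table, a query restricted to one year only touches that year's partition, which is where the performance benefit mentioned above comes from. For example (using the assumed trans_dt column):
select count(*)
from ft_expenses
where trans_dt >= DATE'2020-01-01'
  and trans_dt <  DATE'2021-01-01';  -- pruned to the single 2020 partition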
Final Notes
You may include indexes in the reorganization.
As always, back up all tables before you start.
Test carefully to check possible limitations.

UNION two SELECT queries but result set is smaller than one of them

In a SQL Server statement there is
SELECT id, book, acnt, prod, category from Table1 <where clause...>
UNION
SELECT id, book, acnt, prod, category from Table2 <where clause...>
The first query returned 131,972 rows of data; the 2nd one, 147,692 rows. I didn't notice any rows shared between these two tables, so I expected the result set after UNION to be the sum, 131,972 + 147,692 = 279,384.
However, the result set after UNION is 133,857 rows. Even if they have overlapping rows that I accidentally missed, the result should be at least as large as the larger of the two result sets. I can't figure out where the number 133,857 came from.
Is my understanding of SQL UNION correct? I am using SQL Server in this case.
To expand on the comment given under the question, which I think states what you already know:
UNION takes care of duplicates within a single table as well.
Just take a look at an example:
SETUP:
create table tbl1 (col1 int, col2 int);
insert into tbl1 values
(1,2),
(3,4);
create table tbl2 (col1 int, col2 int);
insert into tbl2 values
(1,2),
(1,2),
(1,2),
(3,4);
Query
select * from tbl1
union
select * from tbl2;
will produce output
col1 | col2
-----|------
1 | 2
3 | 4
DB fiddle
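If you want to keep every row from both inputs (which is what the expected 279,384 assumes), use UNION ALL instead, since it does not remove duplicates:
select * from tbl1
union all
select * from tbl2;
On the example tables above this returns all 6 rows instead of the 2 distinct ones.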

Avoid duplicates on import of updated excel-sheets. Unique-Index can only hold 10 fields max

I am facing the following situation:
I import an Excel sheet; then some columns are modified (e.g. "comments").
After a while, I receive an updated Excel sheet containing the records from the old sheet as well as new ones.
I do not want to import the records that already exist in the database.
Step-by-Step:
Initial Excel-sheet
col1 col2 comments
A A
A B
After import, some fields will get manipulated
col1 col2 comments
A A looks good
A B fine with me
Then I receive an excel sheet with updates
col1 col2 comments
A A
A B
A C
After this update-step, the database should look like
col1 col2 comments
A A looks good
A B fine with me
A C
I was planning to simply create a unique index on all fields that won't get manipulated, so only the new records would get imported (something like
ALTER TABLE tbl ADD CONSTRAINT unique_key UNIQUE (col1, col2)
). My problem now is that Access only allows composite indexes of at most 10 fields, and my tables all have around 11-20 columns...
I could maybe import the updated xls into a temporary table and do something like
INSERT INTO tbl_old SELECT col1,col2, "" FROM tbl_new WHERE (col1,col2) NOT IN (SELECT col1,col2 FROM tbl_old UNION SELECT col1,col2 FROM tbl_new)
But I'm wondering if there isn't a more straightforward way...
Any ideas how I can solve that?
Try the EXISTS condition:
INSERT INTO tbl_old (col1, col2, comments)
SELECT col1, col2, Null
FROM tbl_new
WHERE NOT EXISTS (SELECT col1, col2 FROM tbl_old WHERE tbl_old.col1 = tbl_new.col1 AND tbl_old.col2 = tbl_new.col2);
Assuming you will use a SQL approach:
INSERT INTO table_old (col1, col2)
SELECT col1, col2 FROM table_new
EXCEPT
SELECT col1, col2 FROM table_old
:)
It will insert NULL into the comments column though. Use this:
INSERT INTO table_old
SELECT * FROM table_new
EXCEPT
SELECT * FROM table_old
to avoid null values. Also, both tables have to have the same number of columns. For Oracle, go with MINUS instead of EXCEPT. An equivalent query can be written with a LEFT OUTER JOIN:
INSERT INTO table_old (col1, col2)
SELECT N.col1, N.col2
FROM table_new N
LEFT OUTER JOIN table_old O ON O.col1 = N.col1 AND O.col2 = N.col2
WHERE O.col1 IS NULL
This joins on both key columns and will also leave NULL in the comments column, as we are inserting only col1 and col2. All inserts were tested on the provided table examples.
I would just put a PK ID column in those tables.
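In Access DDL that could look something like the following sketch (the names are placeholders; COUNTER is the Access/Jet autonumber type):
ALTER TABLE tbl ADD COLUMN id COUNTER;
ALTER TABLE tbl ADD CONSTRAINT pk_tbl PRIMARY KEY (id);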

SQL - Select null as FIELDNAME?

I have come across some code that says
select null as UnitCost,
null as MarkUp
What exactly is this doing? Is it creating the field names UnitCost and MarkUp?
Why would you use "select null as ..."?
New to this, sorry.
Thanks.
That code is aliasing the value null and calling it UnitCost/MarkUp. It is not selecting that column from any available table.
You would usually see this when a statement requires matching column sets, e.g. union all.
select id, col1, col2, null as UnitCost, null as MarkUp
from table_1
union all
select id, null as col1, null as col2, UnitCode, MarkUp
from table_2
It just populates NULL for those selected columns.
It is useful when you do an INSERT ... SELECT: the table you are inserting into may have more columns than the table you are selecting from, so for convenience you can use select null as col_name to make the number of columns you select match the number of columns in the target table.
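A small sketch of that pattern (the table and column names are made up):
-- the target table has UnitCost and MarkUp columns that the source table lacks
INSERT INTO order_lines (id, product, UnitCost, MarkUp)
SELECT id, product, null AS UnitCost, null AS MarkUp
FROM staging_order_lines;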

Display multiple queries with different row types as one result

In PostgreSQL 8.3 on Ubuntu, I have 3 tables, say T1, T2, T3, with different schemas.
Each of them contains (a few) records related to an object whose ID I know.
Using psql, I frequently run these 3 queries:
SELECT field-set1 FROM T1 WHERE ID='abc';
SELECT field-set2 FROM T2 WHERE ID='abc';
SELECT field-set3 FROM T3 WHERE ID='abc';
and just look at the results; seeing them is enough for me.
Is it possible to have a procedure/function/macro etc., with one parameter id,
that just runs the three SELECTs one after another,
displaying the results on the screen?
field-set1, field-set2 and field-set3 are completely different.
There is no reasonable way to JOIN the tables T1, T2, T3; these are unrelated data.
I do not want JOIN.
I want to see the three resulting sets on the screen.
Any hint?
Quick and dirty method
If the row types (data types of all columns in sequence) don't match, UNION will fail.
However, in PostgreSQL you can cast a whole row to its text representation:
SELECT t1::text AS whole_row_in_text_representation FROM t1 WHERE id = 'abc'
UNION ALL
SELECT t2::text FROM t2 WHERE id = 'abc'
UNION ALL
SELECT t3::text FROM t3 WHERE id = 'abc';
Only one ; goes at the end, and even that one is optional for a single statement.
A more refined alternative
But it also needs a lot more code. Pick the table with the most columns first, cast every individual column to text and give it a generic name. Add NULL values for the other tables with fewer columns. You can even insert headers between the tables:
SELECT '-t1-'::text AS c1, '---'::text AS c2, '---'::text AS c3 -- table t1
UNION ALL
SELECT '-col1-'::text, '-col2-'::text, '-col3-'::text -- 3 columns
UNION ALL
SELECT col1::text, col2::text, col3::text FROM t1 WHERE id = 'abc'
UNION ALL
SELECT '-t2-'::text, '---'::text, '---'::text -- table t2
UNION ALL
SELECT '-col_a-'::text, '-col_b-'::text, NULL::text -- 2 columns, 1 NULL
UNION ALL
SELECT col_a::text, col_b::text, NULL::text FROM t2 WHERE id = 'abc'
...
Put a UNION ALL in between and give all the columns the same name:
SELECT field-set1 AS fieldset FROM T1 WHERE ID='abc'
UNION ALL
SELECT field-set2 AS fieldset FROM T2 WHERE ID='abc'
UNION ALL
SELECT field-set3 AS fieldset FROM T3 WHERE ID='abc';
and execute it as one statement.