Is it possible to add data to a Hive table column-wise?
For example:
Let's say I have a Hive table (t1) like this.
table:
col1 col2
1 20
34 78
67 89
I have some separate data like this:
col3
10
20
30
You can think of col3 as an array or a table. What I want to achieve is something like this for the same table t1:
col1 col2 col3
1 20 10
34 78 20
67 89 30
Is this possible in Hive? I saw the syntax for adding a new column with an ALTER statement, but I could not find how to add the data as well.
Using Teradata: I have two tables with 10 records and 3 variables. All columns and values are the same except for three values in one variable.
My task is to make code changes for table2 where both records are matched, without hard-coding any value.
The second table was created from the first table, so there is no way to pick values by a join, etc.
Code:
Create multiset table table2 as (
Select * from table1 )
With data primary index(var1);
Eg:
Var1 Var2 Var3
1 Abc 20
2 Cde 30
3 kgk 87
4 kjj 98
5 gvy 67
6 jbn 78
7 hvb 56
8 ihg 62
9 jhn 22
10 hbn 34

Var1 Var2 Var3
1 Abc 20
2 Cde 30
3 kgk 87
4 kjj 98
5 gvy 67
6 jbn 78
7 hvb 56
8 ihg 77
9 jhn 56
10 hbn 23
I'm not sure what you want, but you can find all the matching records using EXISTS as follows:
select t.* from table2 t
where exists
(select 1 from table1 tt
where t.var1 = tt.var1 and t.var2 = tt.var2)
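To try the EXISTS approach on the sample data, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Teradata. Which listing is table1 and which is table2 is an assumption (the question does not label them), and the extra var3 comparison is also an addition for illustration: with only var1 and var2 compared, every row matches, because the two tables differ only in var3.

```python
import sqlite3

# Sample rows from the question; assumed assignment: first listing = table1.
rows1 = [(1, "Abc", 20), (2, "Cde", 30), (3, "kgk", 87), (4, "kjj", 98),
         (5, "gvy", 67), (6, "jbn", 78), (7, "hvb", 56), (8, "ihg", 62),
         (9, "jhn", 22), (10, "hbn", 34)]
# Second listing differs only in var3 for rows 8, 9 and 10.
rows2 = rows1[:7] + [(8, "ihg", 77), (9, "jhn", 56), (10, "hbn", 23)]

conn = sqlite3.connect(":memory:")
for name, rows in (("table1", rows1), ("table2", rows2)):
    conn.execute(f"create table {name} (var1 integer, var2 text, var3 integer)")
    conn.executemany(f"insert into {name} values (?, ?, ?)", rows)

# The query from the answer: match on var1 and var2 only.
matched_on_two = conn.execute("""
    select t.* from table2 t
    where exists (select 1 from table1 tt
                  where t.var1 = tt.var1 and t.var2 = tt.var2)
    order by t.var1
""").fetchall()

# Variant (assumption): also compare var3, so changed rows drop out.
matched_on_three = conn.execute("""
    select t.* from table2 t
    where exists (select 1 from table1 tt
                  where t.var1 = tt.var1 and t.var2 = tt.var2
                    and t.var3 = tt.var3)
    order by t.var1
""").fetchall()

print(len(matched_on_two), len(matched_on_three))
```

Swapping EXISTS for NOT EXISTS in the second query would list the three changed rows instead, without hard-coding any value.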
I am trying to create a table in SQL that turns the current column headings into row values, listing each heading next to the value from the existing row.
Current layout:
Code 1 2 3
ABC 50 80 90
DEF 40 20 70
but I want to show the values as
Target layout:
ABC 1 50
ABC 2 80
ABC 3 90
DEF 1 40
DEF 2 20
DEF 3 70
I'm not even sure if this is possible, but I would appreciate assistance.
You can use union all:
select code, 1, col1 from t union all
select code, 2, col2 from t union all
select code, 3, col3 from t ;
Some databases support lateral joins, which can be more efficient than union all because the table is scanned only once.
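A quick way to check the union all version against the sample data is the sketch below, which runs it in SQLite through Python's sqlite3 module. The column names col1, col2, col3 follow the answer (the question shows them as 1, 2, 3), and the aliases heading and value are made up here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (code text, col1 integer, col2 integer, col3 integer)")
conn.executemany("insert into t values (?, ?, ?, ?)",
                 [("ABC", 50, 80, 90), ("DEF", 40, 20, 70)])

# One select per source column; the literal 1/2/3 carries the old heading.
unpivoted = conn.execute("""
    select code, 1 as heading, col1 as value from t union all
    select code, 2, col2 from t union all
    select code, 3, col3 from t
    order by code, heading
""").fetchall()

for row in unpivoted:
    print(*row)
```

Each added column costs one more select, so this stays manageable for a handful of columns but gets verbose for wide tables.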
I have a temporary external table and have loaded data from HDFS into it. Now I am inserting the same data into my partitioned main external table. The data is inserted successfully, but when I query the main table by column, I get different values for the columns.
I loaded the data into my temporary table from a CSV file that contains four fields.
col1=id
col2=visitDate
col3=comment
col4=age
Below are the queries and their results:
Temporary table:
create external table IF NOT EXISTS incr_dummy1(id string,visitDate string,comment string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;
MAIN Table:
create external table IF NOT EXISTS dummy1(id string,comment string)
PARTITIONED BY (visitDate string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
;
Result:
Temporary table:
select * from incr_dummy1;
1 11 a 20
2 12 b 3
1 13 c 34
4 14 d 23
5 15 e 45
6 16 f 65
7 17 g 78
8 18 h 9
9 19 i 12
10 20 j 34
select visitDate,age from incr_dummy1;
11 20
12 3
13 34
14 23
15 45
16 65
17 78
18 9
19 12
20 34
Main Table:
select * from dummy1;
1 11 a 20
2 12 b 3
1 13 c 34
4 14 d 23
5 15 e 45
6 16 f 65
7 17 g 78
8 18 h 9
9 19 i 12
10 20 j 34
select visitDate,age from dummy1;
a 20
b 3
c 34
d 23
e 45
f 65
g 78
h 9
i 12
j 34
So in the main external table above, the value of the "comment" column is returned when I query the "visitDate" column.
Please let me know what mistake I am making here.
As far as I can see, the column order is not the same in the temporary and final tables.
When inserting data from the temporary table into the final table, check that the columns in the select statement are in the correct order (partition columns need to be at the end of the selected columns).
hive> insert into dummy1 partition(visitDate,age) select id,comment,visitDate,age from incr_dummy1;
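The underlying behaviour is that insert ... select maps columns by position, not by name. The sketch below reproduces the symptom with Python's sqlite3 module standing in for Hive; partitioning is not modelled, and the main table simply lists its columns in the order Hive exposes them (regular columns first, partition columns last). Only the first two sample rows are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Temporary table: same column order as the CSV file.
conn.execute("create table incr_dummy1 (id text, visitDate text, comment text, age text)")
# Main table: partition columns (visitDate, age) come after the regular ones.
conn.execute("create table dummy1 (id text, comment text, visitDate text, age text)")
conn.executemany("insert into incr_dummy1 values (?, ?, ?, ?)",
                 [("1", "11", "a", "20"), ("2", "12", "b", "3")])

# Positional insert: source column 3 (comment) lands in target column 3 (visitDate).
conn.execute("insert into dummy1 select * from incr_dummy1")
wrong = conn.execute("select visitDate, age from dummy1 order by id").fetchall()

# Listing the source columns in the target's order fixes the mapping.
conn.execute("delete from dummy1")
conn.execute("insert into dummy1 select id, comment, visitDate, age from incr_dummy1")
right = conn.execute("select visitDate, age from dummy1 order by id").fetchall()

print(wrong)
print(right)
```

The first result shows the comment values under visitDate, as in the question; the second shows the expected visit dates.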
If you are still having issues, it is better to check the following.
Because this is an external partitioned table (dropping the table does not drop the data on HDFS), check whether the HDFS directory contains any extra files that were not deleted.
Then drop the table, delete the HDFS directory, create the table again and rerun your job.
Update:
Option1:
Is it possible to match the column order in the temporary table with the final table? If yes, change the order of the columns.
Option2:
Use a subquery with a quoted identifier (a regex column specification) to exclude the original columns and pass only the aliased columns to the final select query.
hive> set hive.support.quoted.identifiers=none;
hive> insert into dummy1 partition(visitDate,age)
select `(visitDate|age)?+.+` from -- exclude the visitDate,age columns.
(select *,visitDate vis_dat,age age_n from incr_dummy1)t;
We have a table Property in our database containing a counter in every row stored in the NextVoucherNumber integer column. There are about 2000 rows there.
ID ... {other columns} ... NextVoucherNumber
-----------------------------------------------
1 112
2 34
3 29
4 9456
.... ....
2000 233
We have an issue with concurrent access to the table.
To improve performance, we would like to extract that column to a separate table PropertyVoucherNumbers with a 1:1 relation between the rows.
ID NextVoucherNumber
------------------------
1 112
2 34
3 29
4 9456
.... ....
2000 233
Alternatively, we could maintain a sequence for every row:
Seq_VoucherNumber_1, Seq_VoucherNumber_2, ... Seq_VoucherNumber_2000.
It looks like the same triggers would work, with just a little dynamic SQL.
Could you please describe what problems we would face using the second solution?
Can you suggest a better solution?
I've hit a roadblock: I have rows with different combinations of variables that lead to differing outputs (Value).
Ex)
TableTest
TypeID PopularityID CriteriaID ExposureID Value
10 20 5 12 2
10 20 4 4 0.90
14 20 2 10 1.21
15 32 5 8 0.90
18 20 3 7 51
I want to pull only the unique combinations of rows that give me the highest Value and the lowest Value. A quick note: there might be duplicates in the Value column, in which case SQL can just pull out all the matching rows.
Easy peasy
select distinct *
from TableTest
where value in (select max(value) from TableTest)
or value in (select min(value) from TableTest)
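To see how the query above handles the duplicate minimum, here is a small sketch that runs it on the sample rows in SQLite via Python's sqlite3 module. The column types and the order by clause are assumptions added for a stable, readable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table TableTest (TypeID integer, PopularityID integer,
                CriteriaID integer, ExposureID integer, Value real)""")
conn.executemany("insert into TableTest values (?, ?, ?, ?, ?)", [
    (10, 20, 5, 12, 2.0),
    (10, 20, 4, 4, 0.90),
    (14, 20, 2, 10, 1.21),
    (15, 32, 5, 8, 0.90),
    (18, 20, 3, 7, 51.0),
])

# Rows whose Value equals the overall maximum or the overall minimum.
extremes = conn.execute("""
    select distinct *
    from TableTest
    where Value in (select max(Value) from TableTest)
       or Value in (select min(Value) from TableTest)
    order by Value, TypeID
""").fetchall()

for row in extremes:
    print(*row)
```

Both rows with Value 0.90 come back alongside the single row with Value 51, which matches the "pull out all the appropriate rows" requirement.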