I have a temporary external table that I loaded with data from HDFS. I am now inserting the same data into my main external table, which is partitioned. The insert succeeds, but when I query the main table by column name, I get different values for the columns.
I loaded the data into the temporary table from a CSV file that contains four fields:
col1=id
col2=visitDate
col3=comment
col4=age
Below are the queries and their results:
Temporary table:
create external table IF NOT EXISTS incr_dummy1(id string,visitDate string,comment string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;
MAIN Table:
create external table IF NOT EXISTS dummy1(id string,comment string)
PARTITIONED BY (visitDate string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
;
Result:
Temporary table:
select *from incr_dummy1;
1 11 a 20
2 12 b 3
1 13 c 34
4 14 d 23
5 15 e 45
6 16 f 65
7 17 g 78
8 18 h 9
9 19 i 12
10 20 j 34
select visitDate,age from incr_dummy1;
11 20
12 3
13 34
14 23
15 45
16 65
17 78
18 9
19 12
20 34
Main Table:
select *from dummy1;
1 11 a 20
2 12 b 3
1 13 c 34
4 14 d 23
5 15 e 45
6 16 f 65
7 17 g 78
8 18 h 9
9 19 i 12
10 20 j 34
select visitDate,age from dummy1;
a 20
b 3
c 34
d 23
e 45
f 65
g 78
h 9
i 12
j 34
So in the main external table above, the value of the "comment" column is returned when I query the "visitDate" column.
What mistake am I making here?
As far as I can see, the column order is not the same in the temporary and final tables.
When inserting data from the temporary table into the final table, make sure the columns in the select statement are in the correct order (the partition columns need to be at the end of the select list).
hive> insert into dummy1 partition(visitDate,age) select id,comment,visitDate,age from incr_dummy1;
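The positional mapping behind this advice can be reproduced outside Hive. The following is a minimal sketch using Python's built-in sqlite3 module rather than Hive itself; the target table here is an ordinary table whose last two columns stand in for the partition columns, which is an assumption made only for illustration. It shows that insert ... select matches columns by position, not by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging table in the question's column order.
conn.execute(
    "CREATE TABLE incr_dummy1 (id TEXT, visitDate TEXT, comment TEXT, age TEXT)"
)
# Stand-in for the main table: data columns first, "partition" columns last.
conn.execute(
    "CREATE TABLE dummy1 (id TEXT, comment TEXT, visitDate TEXT, age TEXT)"
)
conn.execute("INSERT INTO incr_dummy1 VALUES ('1', '11', 'a', '20')")

# SELECT * follows the staging order, so values land in the wrong columns.
conn.execute("INSERT INTO dummy1 SELECT * FROM incr_dummy1")
wrong = conn.execute("SELECT visitDate, age FROM dummy1").fetchall()

# Listing the columns in the target's order fixes the mapping.
conn.execute("DELETE FROM dummy1")
conn.execute(
    "INSERT INTO dummy1 SELECT id, comment, visitDate, age FROM incr_dummy1"
)
right = conn.execute("SELECT visitDate, age FROM dummy1").fetchall()

print(wrong)  # visitDate shows the comment value
print(right)  # visitDate shows the real visit date
```

With the mismatched order, visitDate returns the comment value, which is the same symptom described in the question.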
If you are still having issues, it is worth checking the following.
Because this is an external partitioned table (dropping the table does not delete the data on HDFS), check the HDFS directory for any leftover files that have not been deleted.
Then drop the table, delete the HDFS directory, create the table again, and rerun your job.
Update:
Option1:
If it is possible to match the column order of the temporary table to that of the final table, change the order of the columns.
Option2:
Use a subquery with a quoted-identifier regex to exclude the original columns and pass only the aliased columns to the final select query.
hive> set hive.support.quoted.identifiers=none;
hive> insert into dummy1 partition(visitDate,age)
select `(visitDate|age)?+.+` from -- exclude the visitDate and age columns
(select *,visitDate vis_dat,age age_n from incr_dummy1)t;
I have a table in PostgreSQL that has these columns and some rows like this:
st  epochnum  satnum  l1  l2  c1  p1  p2
1   1         1       10  11  12  13  14
1   1         2       15  16  17  18  19
1   2         1       20  21  22  23  24
1   2         2       25  26  27  28  29
20  1         1       30  41  52  63  74
20  1         2       75  76  87  88  null
20  2         1       ...
I want to get pairs of rows that have the same values for epochnum and satnum but different values of "st". I also have a list that specifies which "st" pairs should be subtracted. It is just another table that looks like this:
st1  st2
1    20
For each pair in this table, the rows of the first table with the same epochnum and satnum have to be subtracted in l1, l2, c1, p1 and p2, and the result inserted into a new table like this:
epochnum  st1  st2  satnum  dl1  dl2  dc1  dp1  dp2
1         1    20   1       20   30   40   50   60
1         1    20   2       65   65   75   75   null
...
The actual data has more than 400000 rows with matching epochnum and satnum values like this. I tried writing a Java program in NetBeans that loops over the rows, runs a query for each one, and builds the new table.
But I think this is inefficient and takes unnecessarily long because of the large number of queries issued from Java.
I wonder whether this can be done with just a few queries, perhaps by creating extra tables. I haven't come up with a good solution yet.
Are you looking for joins like this?
select t1.epochnum, p.st1, p.st2, t1.satnum,
       (t2.l1 - t1.l1) as dl1,
       (t2.l2 - t1.l2) as dl2,
       (t2.c1 - t1.c1) as dc1,
       (t2.p1 - t1.p1) as dp1,
       (t2.p2 - t1.p2) as dp2
from t t1 join
     t t2
     on t1.epochnum = t2.epochnum and
        t1.satnum = t2.satnum join
     pairs p
     on t1.st = p.st1 and t2.st = p.st2;
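As a rough check of the self-join idea, here is a minimal sketch that runs it with Python's sqlite3 module instead of PostgreSQL. The table names t and pairs come from the answer; the result table name diffs and the output column aliases are assumptions for illustration. A single create table ... as select does the work that the per-row Java loop was doing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (st INT, epochnum INT, satnum INT, "
    "l1 INT, l2 INT, c1 INT, p1 INT, p2 INT)"
)
conn.execute("CREATE TABLE pairs (st1 INT, st2 INT)")

# A few of the sample rows from the question (epochnum 1 only).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 10, 11, 12, 13, 14),
        (1, 1, 2, 15, 16, 17, 18, 19),
        (20, 1, 1, 30, 41, 52, 63, 74),
        (20, 1, 2, 75, 76, 87, 88, None),
    ],
)
conn.execute("INSERT INTO pairs VALUES (1, 20)")

# "diffs" is a hypothetical name for the new table.
conn.execute(
    """
    CREATE TABLE diffs AS
    SELECT t1.epochnum, p.st1, p.st2, t1.satnum,
           t2.l1 - t1.l1 AS dl1,
           t2.l2 - t1.l2 AS dl2,
           t2.c1 - t1.c1 AS dc1,
           t2.p1 - t1.p1 AS dp1,
           t2.p2 - t1.p2 AS dp2
    FROM pairs p
    JOIN t t1 ON t1.st = p.st1
    JOIN t t2 ON t2.st = p.st2
             AND t2.epochnum = t1.epochnum
             AND t2.satnum = t1.satnum
    """
)

rows = conn.execute("SELECT * FROM diffs ORDER BY epochnum, satnum").fetchall()
for row in rows:
    print(row)
```

A difference involving a null input comes out as null, so no special handling is needed for missing measurements.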
Using Teradata: I have two tables with 10 records and 3 variables. All columns and values are the same except for three values in one variable.
My task is to change the code for table2 so that the records in both tables match, without hard-coding any value.
The second table was created from the first table, so there is no way to pick up the values with a join, etc.
Code :
Create multiset table table2 as (
Select * from table1 )
With data primary index(var1);
Eg:
Var1  Var2  Var3
1     Abc   20
2     Cde   30
3     kgk   87
4     kjj   98
5     gvy   67
6     jbn   78
7     hvb   56
8     ihg   62
9     jhn   22
10    hbn   34

Var1  Var2  Var3
1     Abc   20
2     Cde   30
3     kgk   87
4     kjj   98
5     gvy   67
6     jbn   78
7     hvb   56
8     ihg   77
9     jhn   56
10    hbn   23
Not sure what you want but you can find all the matching records using exists as follows:
select t.* from table2 t
where exists
(select 1 from table1 tt
where t.var1 = tt.var1 and t.var2 = tt.var2)
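To see what the exists filter returns on the sample data, here is a small sketch using Python's sqlite3 module instead of Teradata. Which listing belongs to table1 and which to table2 is not stated in the question, so assigning the first listing to table1 is an assumption, and the extra var3 comparison is a hypothetical variation rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(f"CREATE TABLE {name} (var1 INT, var2 TEXT, var3 INT)")

common = [
    (1, "Abc", 20), (2, "Cde", 30), (3, "kgk", 87), (4, "kjj", 98),
    (5, "gvy", 67), (6, "jbn", 78), (7, "hvb", 56),
]
# Assumption: the first listing in the question is table1, the second table2.
table1_rows = common + [(8, "ihg", 62), (9, "jhn", 22), (10, "hbn", 34)]
table2_rows = common + [(8, "ihg", 77), (9, "jhn", 56), (10, "hbn", 23)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", table1_rows)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", table2_rows)

# The answer's query: rows of table2 whose var1 and var2 also occur in table1.
matched_on_keys = conn.execute(
    """
    SELECT t.* FROM table2 t
    WHERE EXISTS (SELECT 1 FROM table1 tt
                  WHERE t.var1 = tt.var1 AND t.var2 = tt.var2)
    ORDER BY t.var1
    """
).fetchall()

# Hypothetical variation: also require var3 to match.
matched_on_all = conn.execute(
    """
    SELECT t.* FROM table2 t
    WHERE EXISTS (SELECT 1 FROM table1 tt
                  WHERE t.var1 = tt.var1 AND t.var2 = tt.var2
                    AND t.var3 = tt.var3)
    ORDER BY t.var1
    """
).fetchall()

print(len(matched_on_keys), len(matched_on_all))
```

Matching on var1 and var2 alone keeps every row of the sample, while adding var3 leaves out the three rows whose values differ.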
I want to sort a string column that can contain both numbers and letters.
SQL Script:
select distinct a.UoA, b.rating , b.tot from omt_source a left join
wlm_progress_Scored b
on a.UoA = b.UoA
where a.UoA in (select UoA from UserAccess_dev
where trim(App_User) = lower(:APP_USER))
order by
regexp_substr(UoA, '^\D*') ,
to_number(regexp_substr(UoA, '\d+'))--);
Output I'm currently getting:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
23
26B
26A
27
28
30
31
32
33
34B
34A
But, I want 26 and 34 to be in this order
26A
26B
34A
34B
Any suggestions would be very helpful.
Thanks
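Your first order by key, regexp_substr(UoA, '^\D*'), extracts the leading non-digit prefix, and the second key sorts on the numeric part. Rows such as 26A and 26B tie on both keys, so their relative order is not defined. Keep both keys and add the UoA field itself as a final tie-breaker, i.e.
order by
regexp_substr(UoA, '^\D*'),
to_number(regexp_substr(UoA, '\d+')),
UoA;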
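The effect of a three-part sort key (non-digit prefix, numeric part, then the full string as a tie-breaker) can be tried outside the database. This is a minimal Python sketch of that ordering, not Oracle code; the function name uoa_sort_key and the sample values are made up for illustration.

```python
import re


def uoa_sort_key(uoa):
    """Return (leading non-digit prefix, first number, full string)."""
    prefix = re.match(r"\D*", uoa).group()
    digits = re.search(r"\d+", uoa)
    # Values without any digits sort before numbered ones in this sketch.
    number = int(digits.group()) if digits else -1
    return (prefix, number, uoa)


values = ["10", "2", "26B", "26A", "1", "34B", "27", "34A"]
ordered = sorted(values, key=uoa_sort_key)
print(ordered)
```

The numeric key keeps 2 ahead of 10, and the final string key is what decides between 26A and 26B.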
I need to create a stored procedure in SQL Server that accepts the following two parameters:
A select statement returning 1 column.
A number of columns.
The stored procedure would then run the select statement and return the result of the select statement with the values of the single column split into the given amount of columns per row.
Here are some examples:
exec stored_proc 'select id from table where id between 1 and 20', 5
The result of the select would be:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
The result of the stored procedure call would be:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Or the call could be:
exec stored_proc 'select id from table where id between 1 and 20', 10
Giving the result of:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
Though I'm not sure you should be doing this in SQL, it can be done.
I think the way to do it would be to create a cursor and use its iterations to build a dynamic SQL statement.
During each iteration, add each piece of data as a new column (field), and when you reach the number of columns, add something like Union Select
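If the reshaping can happen in the calling application rather than in SQL Server, the logic is only a few lines. The sketch below is not the cursor and dynamic SQL approach described above; it is a plain Python illustration of splitting a single column of values into rows of a given width, and the function name split_into_columns is hypothetical.

```python
def split_into_columns(values, num_columns):
    """Split a flat list of values into rows of at most num_columns items."""
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    return [
        values[start:start + num_columns]
        for start in range(0, len(values), num_columns)
    ]


ids = list(range(1, 21))  # stands in for the single-column query result
rows = split_into_columns(ids, 5)
for row in rows:
    print(*row)
```

If the number of values is not a multiple of the column count, the last row is simply shorter.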
I need your help with a SQL query in an Oracle database. I have data that I want to partition into groups, with a new group starting wherever event = "Start". E.g. rows 1-6 are one group and rows 7-9 are another. I want to ignore rows with event = "Ignore". Finally, I want to calculate max(Value)-min(Value) for each group. The data has no column I can group by.
Can this be achieved? Is it possible to use something like partition by Event = Start? Sample data is below:
Row Event Value Required Result is max-min of value
1 Start 10
2 A 11
3 B 12
4 C 13
5 D 14
6 E 15 5
--------------------------------------------
7 Start 16
8 A 18
9 B 20 4
--------------------------------------------
10 Start 27
11 A 30
12 B 33
13 C 34 7
--------------------------------------------
14 Ignore 35
--------------------------------------------
15 Ignore 36
--------------------------------------------
16 Start 33
17 A 34
18 B 35
19 C 36
20 D 37
21 E 38 5
--------------------------------------------
Yes, you can do this in SQL.
The following query first finds the group that a row is in, by finding the largest Start row id at or before the row. This version uses a correlated subquery for this calculation.
It then does the grouping on the id and does the calculation.
-- "row" stands for the row id column; ROW is a reserved word in Oracle,
-- so the real column needs a different name or a quoted identifier.
select groupid, max(value) - min(value)
from (select t.*,
             (select max(row) from t t2 where t2.row <= t.row and t2.event = 'Start'
             ) as groupid
      from t
     ) t
where event <> 'Ignore'
group by groupid;
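For a quick check against the sample data, here is a minimal sketch of the same grouping idea run with Python's sqlite3 module instead of Oracle. The column name row_id is an assumption (chosen to avoid the keyword row), the comparison uses <= so that each Start row falls into its own group, and a group by on the computed group id is added to get one result per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_id INTEGER, event TEXT, value INTEGER)")

# Sample data from the question.
data = [
    (1, "Start", 10), (2, "A", 11), (3, "B", 12), (4, "C", 13),
    (5, "D", 14), (6, "E", 15),
    (7, "Start", 16), (8, "A", 18), (9, "B", 20),
    (10, "Start", 27), (11, "A", 30), (12, "B", 33), (13, "C", 34),
    (14, "Ignore", 35), (15, "Ignore", 36),
    (16, "Start", 33), (17, "A", 34), (18, "B", 35), (19, "C", 36),
    (20, "D", 37), (21, "E", 38),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

query = """
SELECT groupid, MAX(value) - MIN(value) AS value_range
FROM (SELECT t.*,
             (SELECT MAX(t2.row_id)
              FROM t t2
              WHERE t2.row_id <= t.row_id AND t2.event = 'Start') AS groupid
      FROM t) g
WHERE event <> 'Ignore'
GROUP BY groupid
ORDER BY groupid
"""
result = conn.execute(query).fetchall()
for groupid, value_range in result:
    print(groupid, value_range)
```

Each group is identified by the row id of its Start row, and the Ignore rows are filtered out before aggregation so they do not stretch the range of the preceding group.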