Match data values for two tables in Teradata Sql - sql

Using Teradata: I have two tables with 10 records and 3 variables. All columns and values are the same except for three values in one variable.
My task is to change the code for table2 so that the records in both tables match, without hard-coding any value.
The second table was created from the first table, so there is no way to pick values by join etc.
Code:
Create multiset table table2 as (
Select * from table1 )
With data primary index(var1);
Eg:
Var1  Var2  Var3
1     Abc   20
2     Cde   30
3     kgk   87
4     kjj   98
5     gvy   67
6     jbn   78
7     hvb   56
8     ihg   62
9     jhn   22
10    hbn   34
Var1  Var2  Var3
1     Abc   20
2     Cde   30
3     kgk   87
4     kjj   98
5     gvy   67
6     jbn   78
7     hvb   56
8     ihg   77
9     jhn   56
10    hbn   23

Not sure what you want but you can find all the matching records using exists as follows:
select t.* from table2 t
where exists
(select 1 from table1 tt
where t.var1 = tt.var1 and t.var2 = tt.var2)
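If the goal is to bring table2 back in line with table1 without typing any value by hand, one option is a correlated update. Below is a minimal sketch run against SQLite through Python's sqlite3 module, not Teradata, so the exact update syntax may need adjusting. It assumes the first example table is table1 (the reference) and the second is table2, which the question does not state explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (var1 INTEGER PRIMARY KEY, var2 TEXT, var3 INTEGER)")
conn.execute("CREATE TABLE table2 (var1 INTEGER PRIMARY KEY, var2 TEXT, var3 INTEGER)")

table1_rows = [
    (1, "Abc", 20), (2, "Cde", 30), (3, "kgk", 87), (4, "kjj", 98), (5, "gvy", 67),
    (6, "jbn", 78), (7, "hvb", 56), (8, "ihg", 62), (9, "jhn", 22), (10, "hbn", 34),
]
# Assumption: table2 is a copy of table1 whose var3 differs for var1 = 8, 9, 10.
changed = {8: 77, 9: 56, 10: 23}
table2_rows = [(v1, v2, changed.get(v1, v3)) for v1, v2, v3 in table1_rows]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", table1_rows)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", table2_rows)

# Rows that match on var1 and var2 but disagree on var3.
mismatched = conn.execute("""
    SELECT t2.var1, t2.var3, t1.var3
    FROM table2 t2
    JOIN table1 t1 ON t1.var1 = t2.var1 AND t1.var2 = t2.var2
    WHERE t2.var3 <> t1.var3
    ORDER BY t2.var1
""").fetchall()

# Copy var3 from table1 into table2 for exactly those rows; no literal values.
conn.execute("""
    UPDATE table2
    SET var3 = (SELECT t1.var3 FROM table1 t1
                WHERE t1.var1 = table2.var1 AND t1.var2 = table2.var2)
    WHERE EXISTS (SELECT 1 FROM table1 t1
                  WHERE t1.var1 = table2.var1 AND t1.var2 = table2.var2
                    AND t1.var3 <> table2.var3)
""")

# Any row left in table2 that is not in table1 would show up here.
remaining = conn.execute("SELECT * FROM table2 EXCEPT SELECT * FROM table1").fetchall()
print(mismatched, remaining)
```

The EXISTS guard keeps the update from touching rows that already agree, and from nulling out rows that have no partner in table1.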

Related

How to: For each unique id, for each unique version, grab the best score and organize it into a table

Just wanted to preface this by saying that while I do have a basic understanding, I am still fairly new to using BigQuery tables and SQL statements in general.
I am trying to make a new view out of a query that grabs the best test score for each version by each employee:
select emp_id,version,max(score) as score from `project.dataset.table` where type = 'assessment_test' group by version,emp_id order by emp_id
I'd like to take the results of that query and make a new table with one row per employee id and one column per version holding the best score for that row's emp_id. I know that I can manually make a table for each version by including a "where version = a", "where version = b", etc. and then joining all of the tables at the end, but that doesn't seem like the most elegant solution, and there are about 20 different versions in total.
Is there a way to programmatically create a column for each unique version, or at the very least use my initial query as a subquery and just reference it, something like this:
with a as (
select id,version,max(score) as score
from `project.dataset.table`
where type = 'assessment_test' and version is not null and score is not null and id is not null
group by version,id
order by id),
version_a as (select score from a where version = 'version_a')
version_b as (select score from a where version = 'version_b')
version_c as (select score from a where version = 'version_c')
select
a.id as id,
version_a.score as version_a,
version_b.score as version_b,
version_c.score as version_c
from
a,
version_a,
version_b,
version_c
Example Picture: left table is example data, right table is expected output
Example Data:
id  version  score
1   a        88
1   b        93
1   c        92
2   a        89
2   b        99
2   c        78
3   a        95
3   b        83
3   c        89
4   a        90
4   b        90
4   c        86
5   a        82
5   b        78
5   c        98
1   a        79
1   b        97
1   c        77
2   a        100
2   b        96
2   c        85
3   a        83
3   b        87
3   c        96
4   a        84
4   b        80
4   c        77
5   a        95
5   b        77
Expected Output:
id  a score  b score  c score
1   88       97       92
2   100      99       85
3   95       87       96
4   90       90       86
5   95       78       98
Thanks in advance and feel free to ask any clarifying questions
Use the approach below:
select * from your_table
pivot (max(score) score for version in ('a', 'b', 'c'))
Applied to the sample data in your question, this produces one row per id with the best score for each version.
If the versions are not known in advance, use the following:
execute immediate (select '''
select * from your_table
pivot (max(score) score for version in (''' || string_agg(distinct "'" || version || "'") || "))"
from your_table
)
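For engines without a PIVOT operator, the same reshaping can be written as conditional aggregation, with one MAX(CASE ...) column per version. The sketch below is only an illustration in SQLite via Python's sqlite3 module, not BigQuery. It reads the distinct versions first and builds the column list from them, in the same spirit as the execute immediate variant above. The score_<version> column names and the quote_ident helper are made up for this example.

```python
import sqlite3

# Sample rows (id, version, score) copied from the question.
rows = [
    (1, "a", 88), (1, "b", 93), (1, "c", 92), (2, "a", 89), (2, "b", 99), (2, "c", 78),
    (3, "a", 95), (3, "b", 83), (3, "c", 89), (4, "a", 90), (4, "b", 90), (4, "c", 86),
    (5, "a", 82), (5, "b", 78), (5, "c", 98), (1, "a", 79), (1, "b", 97), (1, "c", 77),
    (2, "a", 100), (2, "b", 96), (2, "c", 85), (3, "a", 83), (3, "b", 87), (3, "c", 96),
    (4, "a", 84), (4, "b", 80), (4, "c", 77), (5, "a", 95), (5, "b", 77),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, version TEXT, score INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

# Discover the versions at run time instead of listing them by hand.
versions = [v for (v,) in conn.execute(
    "SELECT DISTINCT version FROM your_table ORDER BY version")]


def quote_ident(name):
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# One MAX(CASE ...) column per version; version values are bound as parameters.
columns = ", ".join(
    f"MAX(CASE WHEN version = ? THEN score END) AS {quote_ident('score_' + v)}"
    for v in versions
)
sql = f"SELECT id, {columns} FROM your_table GROUP BY id ORDER BY id"
pivoted = conn.execute(sql, versions).fetchall()

for row in pivoted:
    print(row)
```

On the sample data this reproduces the expected output table from the question, for example (1, 88, 97, 92) for id 1.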

SQL: Subtracting certain rows with restrictions from a data table into a new table

I have a data table in PostgreSQL which has these columns and some rows like this:
st  epochnum  satnum  l1  l2  c1  p1  p2
1   1         1       10  11  12  13  14
1   1         2       15  16  17  18  19
1   2         1       20  21  22  23  24
1   2         2       25  26  27  28  29
20  1         1       30  41  52  63  74
20  1         2       75  76  87  88  null
20  2         1       ...
I want to get pairs of rows that have the same value for epochnum and satnum but different values in "st". I also have a list that specifies which "st" pairs should be subtracted. It's just another table that looks like this:
st1  st2
1    20
For each pair in this table, the rows in the first table with the same epochnum and satnum have to be subtracted in l1, l2, c1, p1 and p2, and the result inserted into a new table like this:
epochnum  st1  st2  satnum  dl1  dl2  dc1  dp1  dp2
1         1    20   1       20   30   40   50   60
1         1    20   2       60   60   70   70   null
...
The actual data has more than 400000 rows that share epochnums and satnums like this. I have tried programming it in Java (NetBeans), using loops to run a query for each row and build the new table.
But I think that is inefficient and takes unnecessarily long because of the large number of queries that have to be run from Java.
I wonder if there is a way to do this with just a few queries, or by creating extra tables. I haven't come up with a good solution yet.
Are you looking for joins like this?
select t1.epochnum, p.st1, p.st2, t1.satnum,
(t2.l1 - t1.l1) as dl1,
(t2.l2 - t1.l2) as dl2,
(t2.c1 - t1.c1) as dc1,
(t2.p1 - t1.p1) as dp1,
(t2.p2 - t1.p2) as dp2
from t t1 join
t t2
on t1.epochnum = t2.epochnum and
t1.satnum = t2.satnum join
pairs p
on t1.st = p.st1 and t2.st = p.st2
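To fill the new table in one statement instead of one query per row, the join can feed an INSERT ... SELECT directly. Here is a rough sketch in SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The table names t and pairs follow the query above, while the target table name diffs is an assumption. It uses only the six complete sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (st INTEGER, epochnum INTEGER, satnum INTEGER,
                    l1 INTEGER, l2 INTEGER, c1 INTEGER, p1 INTEGER, p2 INTEGER);
    CREATE TABLE pairs (st1 INTEGER, st2 INTEGER);
    -- 'diffs' is a hypothetical name for the new table.
    CREATE TABLE diffs (epochnum INTEGER, st1 INTEGER, st2 INTEGER, satnum INTEGER,
                        dl1 INTEGER, dl2 INTEGER, dc1 INTEGER, dp1 INTEGER, dp2 INTEGER);
""")

conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 11, 12, 13, 14),
    (1, 1, 2, 15, 16, 17, 18, 19),
    (1, 2, 1, 20, 21, 22, 23, 24),
    (1, 2, 2, 25, 26, 27, 28, 29),
    (20, 1, 1, 30, 41, 52, 63, 74),
    (20, 1, 2, 75, 76, 87, 88, None),  # p2 is NULL in the sample data
])
conn.execute("INSERT INTO pairs VALUES (1, 20)")

# One set-based statement: pair up rows by (epochnum, satnum) and subtract.
conn.execute("""
    INSERT INTO diffs
    SELECT t1.epochnum, p.st1, p.st2, t1.satnum,
           t2.l1 - t1.l1, t2.l2 - t1.l2, t2.c1 - t1.c1,
           t2.p1 - t1.p1, t2.p2 - t1.p2
    FROM pairs p
    JOIN t t1 ON t1.st = p.st1
    JOIN t t2 ON t2.st = p.st2
             AND t2.epochnum = t1.epochnum
             AND t2.satnum = t1.satnum
""")

result = conn.execute("SELECT * FROM diffs ORDER BY epochnum, satnum").fetchall()
for row in result:
    print(row)
```

A NULL on either side makes the difference NULL, which matches the null in the dp2 column of the desired output. Rows of st 1 with no st 20 partner (epochnum 2 here) are dropped by the inner join.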

Why main table and temporary table giving different results?

I have a temporary external table into which I have loaded data from HDFS. Now I am inserting the same data into my partitioned main external table. The data gets inserted successfully, but when I query the main table by column I get different values for the columns.
I loaded the data from a CSV file with four fields into my temporary table.
col1=id
col2=visitDate
col3=comment
col4=age
Below are the queries and their results:
Temporary table:
create external table IF NOT EXISTS incr_dummy1(id string,visitDate string,comment string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;
MAIN Table:
create external table IF NOT EXISTS dummy1(id string,comment string)
PARTITIONED BY (visitDate string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
;
Result:
Temporary table:
select *from incr_dummy1;
1 11 a 20
2 12 b 3
1 13 c 34
4 14 d 23
5 15 e 45
6 16 f 65
7 17 g 78
8 18 h 9
9 19 i 12
10 20 j 34
select visitDate,age from incr_dummy1;
11 20
12 3
13 34
14 23
15 45
16 65
17 78
18 9
19 12
20 34
Main Table:
select *from dummy1;
1 11 a 20
2 12 b 3
1 13 c 34
4 14 d 23
5 15 e 45
6 16 f 65
7 17 g 78
8 18 h 9
9 19 i 12
10 20 j 34
select visitDate,age from dummy1;
a 20
b 3
c 34
d 23
e 45
f 65
g 78
h 9
i 12
j 34
So in the main external table above, querying the "visitDate" column returns the values of the "comment" column.
What mistake am I making here?
As far as I can see, the column order is not the same in the temporary and final tables.
When inserting data from the temporary table into the final table, make sure the columns in the select statement are in the correct order (the partition columns need to be at the end of the select list).
hive> insert into dummy1 partition(visitDate,age) select id,comment,visitDate,age from incr_dummy1;
If you are still having issues, it's worth checking the following.
Because this is an external partitioned table (dropping the table does not delete the data on HDFS), check whether the HDFS directory contains leftover files that were not deleted.
Then drop the table, delete the HDFS directory, create the table again and rerun your job.
Update:
Option 1:
Is it possible to match the column order of the temporary table with the final table? If yes, change the order of the columns.
Option 2:
Use a subquery with a quoted-identifier regex to exclude the original columns and pass only the aliased columns to the final select query.
hive> set hive.support.quoted.identifiers=none;
hive> insert into dummy1 partition(visitDate,age)
select `(visitDate|age)?+.+` from -- exclude visitDate,age columns.
(select *,visitDate vis_dat,age age_n from incr_dummy1)t;
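The underlying issue is that INSERT ... SELECT matches columns by position, not by name. The small sketch below reproduces that effect in SQLite through Python's sqlite3 module. It is only an analogy: SQLite has no partitions, so the main table is modelled as a plain table with the partition columns visitDate and age placed last, which is an assumption about how Hive lays out the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Temporary table: id, visitDate, comment, age (as in the question).
    CREATE TABLE incr_dummy1 (id TEXT, visitDate TEXT, comment TEXT, age TEXT);
    -- Stand-in for the main table: regular columns first, partition columns last.
    CREATE TABLE dummy1 (id TEXT, comment TEXT, visitDate TEXT, age TEXT);
    INSERT INTO incr_dummy1 VALUES ('1', '11', 'a', '20'), ('2', '12', 'b', '3');
""")

# SELECT * feeds columns by position, so visitDate and comment swap places.
conn.execute("INSERT INTO dummy1 SELECT * FROM incr_dummy1")
wrong = conn.execute("SELECT visitDate, age FROM dummy1 ORDER BY id").fetchall()

# Listing the columns in the target's order fixes the mapping.
conn.execute("DELETE FROM dummy1")
conn.execute("INSERT INTO dummy1 SELECT id, comment, visitDate, age FROM incr_dummy1")
right = conn.execute("SELECT visitDate, age FROM dummy1 ORDER BY id").fetchall()

print(wrong)
print(right)
```

The first result shows the comment values under visitDate, which is the same symptom as in the question. The second shows the dates once the select list follows the target's column order.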

Repetition of column when using join between two tables

Running a select query in Postgres that joins 8 or 9 tables gives this output:
1. A 2 34
2. A 2 56
3. B 3 34
4. B 3 56
whereas I need the output in one of two forms, either
1. A 2 34
2. A 2 34
3. B 3 56
4. B 3 56
or
A 2 34
B 3 56
What can I do?
Using distinct?
select distinct * from table

Postgresql: Merge multiple geometries into single geometry using Join

Say I have two tables, geom_levels and taz_geoms where taz_geoms has the columns as follows:
taz_geoms
id(int) state(int) county(int) taz(int) geom(geometry(MultiPolygon,4326))
and geom_levels looks like this:
geom_levels
TAZ COUNTY STATE DISTRICT
1 1 29 1
2 1 29 1
3 1 29 1
4 2 29 2
5 2 29 2
6 2 29 2
7 2 29 3
8 3 29 3
9 3 29 3
10 3 29 4
11 3 29 4
12 3 29 4
13 4 29 5
14 4 29 5
15 4 29 5
16 4 29 6
17 4 29 6
How would I go about combining these taz geometries into county, state, and district geometries? I would like to have a county_geoms, state_geoms, and district_geoms table. I have seen that you can use ST_UNION with a geom array, but how would I generate such an array for counties or districts?
I was thinking something like this for counties:
SELECT ST_UNION(SELECT geom from taz_geoms GROUP BY county);
and for districts:
SELECT ST_UNION(SELECT geom from taz_geoms t LEFT JOIN geom_levels gl ON gl.taz = t.taz GROUP BY district);
But those options do not seem possible.
Ideas?
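Try with:
SELECT ST_UNION( ARRAY( 'YOUR SELECT geoms QUERY' ) );
In your case, for a single district (the subquery cannot GROUP BY district while selecting the unaggregated geom, so filter instead):
SELECT ST_UNION(ARRAY(SELECT t.geom FROM taz_geoms t JOIN geom_levels gl ON gl.taz = t.taz WHERE gl.district = 1));
ST_UNION can also be used as an aggregate, which gives one merged geometry per district:
SELECT gl.district, ST_UNION(t.geom) AS geom FROM taz_geoms t JOIN geom_levels gl ON gl.taz = t.taz GROUP BY gl.district;
I had the same problem and got it to work in PostgreSQL using the ARRAY() function ;)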