I have two queries that are supposed to return equivalent results. However, the second query returns only partial results (less than 10% of the total).
The first query returns more than 4 million rows:
SELECT id, amount
FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id;
The second returns only 18 thousand records:
CREATE VOLATILE TABLE vt AS
(
SELECT id, amount
FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
)
WITH DATA
NO PRIMARY INDEX
ON COMMIT PRESERVE ROWS;
SELECT *
FROM vt ;
Why does the second query return fewer records?
When you do a SHOW TABLE vt; you'll notice that it's created as a SET table, which doesn't store duplicate rows. There are only 18 thousand distinct (id,amount) combinations.
Either add DISTINCT to your first Select or use CREATE MULTISET VOLATILE TABLE.
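To see the effect in miniature, here is a rough sketch using Python's built-in sqlite3 module. SQLite has no SET tables, so DISTINCT stands in for the duplicate elimination that a Teradata SET table performs; the toy rows are made up for illustration and are not from the question.

```python
import sqlite3

# Hypothetical toy data. SQLite has no SET tables, so DISTINCT is used
# to imitate the duplicate elimination of a Teradata SET table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, amount INTEGER);
    INSERT INTO table1 VALUES (1), (1), (2), (3);
    INSERT INTO table2 VALUES (1, 10), (1, 10), (2, 20);
""")

join_sql = """
    SELECT t1.id AS id, t2.amount AS amount
    FROM table1 t1 LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
"""

# What a MULTISET table would keep: every joined row, duplicates included.
all_rows = conn.execute(join_sql).fetchall()

# What a SET table would keep: one row per distinct (id, amount) pair.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, amount FROM (" + join_sql + ")"
).fetchall()

print(len(all_rows), len(distinct_rows))
```

The gap between the two counts is the number of duplicate rows that the SET table drops.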
I have two tables.
Table 1
Id
UpdateId
Name
Table 2
Table1ID
UpdateID
Address
Each time a user makes an update, the system inserts a record into table1. For table2, the system inserts a record only when the address changes.
Sample data
Table 1
1,1,name1
1,2,name1
1,3,name1update
1,4,name1update
1,5,name1
1,6,name2
Table 2
1,1,address
1,4,addressupdate
I want to get the following result:
1,1,name1,address
1,2,name1,address
1,3,name1update,address
1,4,name1update,addressupdate
1,5,name1,addressupdate
1,6,name2,addressupdate
How can I write the join condition to achieve this?
You can use a correlated subquery. Here is standard syntax, but it can be easily adapted to any database:
select t1.*,
(select t2.address
from table2 t2
where t2.table1id = t1.id and
t2.updateid <= t1.updateid
order by t2.updateid desc
fetch first 1 row only
) as address
from table1 t1;
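As a quick check of the correlated subquery against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite, which does not support "fetch first 1 row only", so LIMIT 1 is swapped in; the column names are taken from the table definitions in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, updateid INTEGER, name TEXT);
    CREATE TABLE table2 (table1id INTEGER, updateid INTEGER, address TEXT);
    INSERT INTO table1 VALUES
        (1, 1, 'name1'), (1, 2, 'name1'), (1, 3, 'name1update'),
        (1, 4, 'name1update'), (1, 5, 'name1'), (1, 6, 'name2');
    INSERT INTO table2 VALUES (1, 1, 'address'), (1, 4, 'addressupdate');
""")

# For each table1 row, pick the most recent address whose updateid
# is not later than the row's own updateid.
rows = conn.execute("""
    SELECT t1.id, t1.updateid, t1.name,
           (SELECT t2.address
            FROM table2 t2
            WHERE t2.table1id = t1.id
              AND t2.updateid <= t1.updateid
            ORDER BY t2.updateid DESC
            LIMIT 1) AS address   -- SQLite stand-in for FETCH FIRST 1 ROW ONLY
    FROM table1 t1
    ORDER BY t1.id, t1.updateid
""").fetchall()

for row in rows:
    print(row)
```

On this data, rows 1 to 3 pick up 'address' and rows 4 to 6 pick up 'addressupdate', which matches the result requested in the question.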
You can use a left join when you want to keep all rows from the left table t1, even those with no matching updateid in t2. Unmatched rows get NULL for address.
select t1.id,t1.updateid,t1.name,t2.address from table1 t1
left join table2 t2
on t2.updateid= t1.updateid
I have two tables:
One is a base table and the second is a transaction table. I want to compare a value in the base table with the grouped sum of values in the second table.
Table1(T1Id,Amount1,...)
Table2(T2Id,T1ID,Amount2)
I want those rows from Table1 where the SUM of Table2's Amount2 is greater than or equal to Table1's Amount1.
* T1ID relates the two tables.
* The SQL query has many joins with other tables for data retrieval.
One approach uses a join:
SELECT t1.T1Id, t1.Amount1
FROM Table1 t1
INNER JOIN Table2 t2
ON t1.T1Id = t2.T1ID
GROUP BY
t1.T1Id, t1.Amount1
HAVING
SUM(t2.Amount2) >= t1.Amount1;
We can also try doing this via a correlated subquery:
SELECT t1.T1Id, t1.Amount1
FROM Table1 t1
WHERE t1.Amount1 <= (SELECT SUM(t2.Amount2) FROM Table2 t2
WHERE t1.T1Id = t2.T1ID);
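Here is a small sketch that runs both variants side by side with Python's built-in sqlite3 module. The sample amounts are hypothetical and chosen only to cover three cases: a sum that meets Amount1, a sum that falls short, and a Table1 row with no Table2 rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (T1Id INTEGER, Amount1 INTEGER);
    CREATE TABLE Table2 (T2Id INTEGER, T1ID INTEGER, Amount2 INTEGER);
    -- Hypothetical data: id 1 sums to 100, id 2 sums to 20, id 3 has no rows.
    INSERT INTO Table1 VALUES (1, 100), (2, 50), (3, 10);
    INSERT INTO Table2 VALUES (1, 1, 60), (2, 1, 40), (3, 2, 20);
""")

# Variant 1: join, group, and filter the groups with HAVING.
join_rows = conn.execute("""
    SELECT t1.T1Id, t1.Amount1
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.T1Id = t2.T1ID
    GROUP BY t1.T1Id, t1.Amount1
    HAVING SUM(t2.Amount2) >= t1.Amount1
    ORDER BY t1.T1Id
""").fetchall()

# Variant 2: correlated subquery; SUM over no rows is NULL, so the
# comparison is not true and such rows are excluded here as well.
subquery_rows = conn.execute("""
    SELECT t1.T1Id, t1.Amount1
    FROM Table1 t1
    WHERE t1.Amount1 <= (SELECT SUM(t2.Amount2) FROM Table2 t2
                         WHERE t1.T1Id = t2.T1ID)
    ORDER BY t1.T1Id
""").fetchall()

print(join_rows, subquery_rows)
```

Both variants leave out Table1 rows that have no transactions. If those rows should be treated as a sum of zero, a LEFT JOIN with COALESCE(SUM(...), 0) would be one way to adapt the query.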
I would use something similar to the query below:
SELECT
a.T1Id, a.Amount1, SUM(b.Amount2)
FROM Table1 a
INNER JOIN Table2 b on b.T1Id = a.T1Id
GROUP BY a.T1Id, a.Amount1
HAVING SUM(b.Amount2) >= a.Amount1;
Basically what the query above does is give you the ID, Amount from table 1 and the summed amount from table 2. The HAVING clause at the end of query filters out those records where the summed amount from the second table is smaller than the amount from the first one.
If you want to add further table joins to the query, you can add as many as you need. I would recommend that Table1 hold a reference ID for each table you join to it.
I'm trying to count the records in Hive table t1 whose profile_email appears in Hive table t2. Multiple records can have the same profile_email in t1, but t2.profile_email is unique. I would expect a result count below 11,681,830 (since some t1.profile_email values are not in t2). Instead, the count massively blows up. How is this possible with an inner join, and how do I fix it?
select count(*) from t1;
#11,681,830
select count(*) from t2;
#1,661,773
SELECT count (*) FROM t1
inner JOIN t2 ON t1.profile_email = t2.profile_email
#1,519,465,221
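Under plain equality semantics, an inner join can return more rows than t1 holds only if some joined value occurs more than once in t2, so the uniqueness assumption is worth verifying. The sketch below uses Python's built-in sqlite3 module rather than Hive, and the repeated empty string is purely a hypothetical example of such a value; nothing in the question confirms what the actual duplicate is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (profile_email TEXT);
    CREATE TABLE t2 (profile_email TEXT);
    -- Hypothetical data: the empty string repeats on both sides.
    INSERT INTO t1 VALUES ('a@x.com'), ('a@x.com'), (''), (''), ('');
    INSERT INTO t2 VALUES ('a@x.com'), ('b@x.com'), (''), ('');
""")

# Each t1 row is repeated once per matching t2 row, so duplicated keys
# in t2 multiply the result.
join_count = conn.execute("""
    SELECT COUNT(*) FROM t1
    INNER JOIN t2 ON t1.profile_email = t2.profile_email
""").fetchone()[0]

# Diagnostic: list the key values that occur more than once in t2.
duplicate_keys = conn.execute("""
    SELECT profile_email, COUNT(*) AS n
    FROM t2
    GROUP BY profile_email
    HAVING COUNT(*) > 1
""").fetchall()

print(join_count, duplicate_keys)
```

Here t1 has 5 rows but the join returns more, and the diagnostic query points at the repeated value. The same GROUP BY ... HAVING COUNT(*) > 1 check should be expressible in Hive against the real t2.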
I am attempting a very basic difference function in PostgreSQL. Table 1 and Table 2 have identical columns. The only difference is that Table 1 has some surplus rows. I would like to select the surplus rows only:
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT * from table2);
The query above returns nothing when I know there are surplus rows.
I think you are looking for except:
select t1.*
from table1 t1
except
select t2.*
from table2 t2;
Note that the two tables must have the same number of columns, and the columns must all be of the same type. You can review the documentation here.
If you wish to use NOT EXISTS, you are missing the correlation on the tables' keys in the inner WHERE clause. Try:
SELECT *
FROM table1 t1
WHERE NOT EXISTS (SELECT * from table2 t2 WHERE t2.id = t1.id);
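To illustrate why the original query returns nothing, here is a minimal sketch with Python's built-in sqlite3 module. It assumes SQLite rather than PostgreSQL, and the columns id and val and the three sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, val TEXT);
    CREATE TABLE table2 (id INTEGER, val TEXT);
    -- Hypothetical data: row 3 is the surplus row in table1.
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b');
""")

# Uncorrelated: the subquery returns rows regardless of the outer row,
# so NOT EXISTS is false every time and nothing comes back.
uncorrelated = conn.execute(
    "SELECT * FROM table1 WHERE NOT EXISTS (SELECT * FROM table2)"
).fetchall()

# Correlated: the subquery is evaluated per outer row via t2.id = t1.id.
correlated = conn.execute("""
    SELECT * FROM table1 t1
    WHERE NOT EXISTS (SELECT * FROM table2 t2 WHERE t2.id = t1.id)
""").fetchall()

# EXCEPT compares whole rows instead of a key column.
except_rows = conn.execute(
    "SELECT * FROM table1 EXCEPT SELECT * FROM table2"
).fetchall()

print(uncorrelated, correlated, except_rows)
```

Note the difference in semantics: the correlated NOT EXISTS matches on id only, while EXCEPT compares every column and also removes duplicate rows from its result.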
SELECT *
FROM t1, t2 , t3
WHERE t1.row_id = t2.invoice_id(+)
and t2.voi_id = t3.row_id(+)
and type = 'Dec'
order by 1
I have 3 indexes, one for each column in the join, but it seems that the explain plan uses a full table scan on the tables without using the indexes:
Plan
1 Every row in the table t1 is read.
2 The rows were sorted to support the join at step 5.
3 Every row in the table t2 is read.
4 The rows were sorted to support the join at step 5.
5 Join the sorted results sets provided from steps 2, 4.
6 Rows were returned by the SELECT statement.
It depends on the row counts and sizes of the tables. Your query fetches all rows from t1 (because it is a left join, every row from t1 with type='Dec' will be shown). That's why TABLE ACCESS FULL on table t1 is normal.
If the row count in t1 is more than 20-30% of the row count in t2 (the percentage depends on the size of t2), then TABLE ACCESS FULL on t2 followed by a hash join is also a normal scenario.