Left join on dates all dates - sql

I have several tables with dates that I'm trying to join to make one large table where the data is grouped by date.
I'm accomplishing this right now by LEFT JOINing to subselects generated from the tables that I need to join to (a lot of them are the same table with different WHERE clauses, and they involve SUM and COUNT, so I think I have to use subselects). The problem is that if one of the dates doesn't exist in the first table, then it doesn't show up in the result, even if there are rows with that date in the subsequent tables it's joined to. I'm joining based upon DATE(datetime_column).
So it's like
SELECT a.date, col1
FROM a
LEFT JOIN (SELECT date, col2 FROM a1) a2 ON DATE(a.date)=DATE(a2.date)
LEFT JOIN (SELECT date, col3 FROM a3) a4 ON DATE(a.date)=DATE(a4.date)
Make sense? Probably not..

There are basically two ways to do so:
You can use a FULL OUTER JOIN
Full outer join
Conceptually, a full outer join combines the effect of applying both
left and right outer joins. Where records in the FULL OUTER JOINed
tables do not match, the result set will have NULL values for every
column of the table that lacks a matching row. For those records that
do match, a single row will be produced in the result set (containing
fields populated from both tables).
...
Some database systems do not support the full outer join functionality
directly, but they can emulate it through the use of an inner join and
UNION ALL selects of the "single table rows" from left and right
tables respectively. The same example can appear as follows:
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName, department.DepartmentID
FROM employee
INNER JOIN department ON employee.DepartmentID = department.DepartmentID
UNION ALL
SELECT employee.LastName, employee.DepartmentID, CAST(NULL AS VARCHAR(20)), CAST(NULL AS INTEGER)
FROM employee
WHERE NOT EXISTS (SELECT * FROM department WHERE employee.DepartmentID = department.DepartmentID)
UNION ALL
SELECT CAST(NULL AS VARCHAR(20)), CAST(NULL AS INTEGER),
department.DepartmentName, department.DepartmentID
FROM department
WHERE NOT EXISTS (SELECT * FROM employee WHERE employee.DepartmentID = department.DepartmentID)
Otherwise, you can make a master view, which contains all the distinct keys of all the tables, and LEFT JOIN it with all the tables.
select *
from (
SELECT date
FROM a
union
SELECT date
FROM a1
union
SELECT date
FROM a3
) d
LEFT JOIN a using (date)
LEFT JOIN a1 using (date)
LEFT JOIN a3 using (date)
Sometimes I prefer the second way to the FULL OUTER JOIN, because FULL OUTER JOIN is not supported on many RDBMSs, and many of those that do support it do not optimize it well. Oracle's current version, for example, just treats a full outer join as the equivalent query shown in the citation, which is very costly for performance.
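As a minimal sketch of the second approach, the snippet below runs it in an in-memory SQLite database through Python's sqlite3 module. The column names (dt, amount) and the sample rows are assumptions made for illustration, and only two of the tables are used. Each table is aggregated per day in its own subselect, and the list of days comes from a UNION over both tables, so a day that exists only in a1 still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: a datetime column and a value to aggregate.
    CREATE TABLE a  (dt TEXT, amount INTEGER);
    CREATE TABLE a1 (dt TEXT, amount INTEGER);
    INSERT INTO a  VALUES ('2024-01-01 09:00:00', 10), ('2024-01-01 17:30:00', 5);
    INSERT INTO a1 VALUES ('2024-01-01 12:00:00', 7), ('2024-01-02 08:15:00', 3);
""")

rows = conn.execute("""
    SELECT d.day, s.total_a, s1.cnt_a1
    FROM (
        -- Master list: every distinct day found in any of the tables.
        SELECT DATE(dt) AS day FROM a
        UNION
        SELECT DATE(dt) FROM a1
    ) AS d
    LEFT JOIN (SELECT DATE(dt) AS day, SUM(amount) AS total_a
               FROM a GROUP BY DATE(dt)) AS s ON s.day = d.day
    LEFT JOIN (SELECT DATE(dt) AS day, COUNT(*) AS cnt_a1
               FROM a1 GROUP BY DATE(dt)) AS s1 ON s1.day = d.day
    ORDER BY d.day
""").fetchall()

for row in rows:
    print(row)
```

The second day has no rows in a, so its sum comes back as NULL (None in Python) instead of the whole day being dropped.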

Try using the OUTER JOIN to fetch all the records from main table and only matching records from the sub/child table.
SELECT a.Col1, b.Col1 FROM a LEFT OUTER JOIN b ON a.Col2=b.Col2
Refer to Join (SQL) for details on joins.

You have another option for this, which is not using a join at all. You can bring the results together using unions and aggregations:
SELECT date, max(col1) as col1, max(col2) as col2, max(col3) as col3
FROM ((select date, col1, NULL as col2, NULL as col3 from a1) union all
(SELECT date, NULL, col2, NULL FROM a2) union all
(SELECT date, NULL, NULL, col3 FROM a3)
) t
group by date
Often the solution is the second one given by Alessandro (the first version is very cumbersome). One caveat. His solution pulls the dates from the data. Sometimes you want to generate the master list, perhaps from a calendar table or perhaps by generating the list of dates (the specifics for that depend entirely on the database).
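To illustrate the caveat about generating the master list, here is a sketch that builds the list of dates with a recursive CTE instead of pulling it from the data. It uses SQLite through Python's sqlite3 module; the syntax for generating dates differs between databases, and the date range, the dt column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data table with gaps in its dates.
    CREATE TABLE a1 (dt TEXT, col1 INTEGER);
    INSERT INTO a1 VALUES ('2024-01-01', 4), ('2024-01-03', 9);
""")

rows = conn.execute("""
    WITH RECURSIVE calendar(day) AS (
        SELECT '2024-01-01'
        UNION ALL
        -- Add one day at a time until the end of the range is reached.
        SELECT DATE(day, '+1 day') FROM calendar WHERE day < '2024-01-04'
    )
    SELECT c.day, a1.col1
    FROM calendar AS c
    LEFT JOIN a1 ON a1.dt = c.day
    ORDER BY c.day
""").fetchall()

for row in rows:
    print(row)
```

Days with no data at all (here 2024-01-02 and 2024-01-04) still get a row, which the data-driven master list cannot provide.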

Related

INTERSECT two tables of 500 million rows in Vertica

I am very new to Vertica and am looking for efficient ways to compare two tables averaging 500-800 million rows each. I have a process that gets data from a Vertica view and dumps it into SQL Server, for a later merge into the final table in SQL Server. For a few large tables combined, it dumps about 3 billion rows daily. Instead of dumping all the data, I want to take a daily snapshot, compare it with the previous day's snapshot on the Vertica side only, and then push only the changed rows to SQL Server.
Let's say the previous snapshot is stored in tableA and today's snapshot in tableB. The PK on both tables is a column named OrderId.
The simplest way I can think of is
Select * from tableB
Where OrderId NOT IN (
SELECT * from tableA
INTERSECT
SELECT * from tableB
)
So my questions are:
Is there any other/better option in vertica to get only changed rows between two tables? Or should I
even consider doing this compare on vertica side?
How much doing such comparison should take?
What should I consider to improve the performance of such query?
If your columns have no NULL values, then a massive LEFT JOIN would seem to do what you want:
select b.*
from tableB b left join
tableA a
on b.OrderId = a.OrderId and
b.col1 = a.col1 and
. . . -- for all the columns you care about
where a.OrderId is null;
However, I think you want except:
select b.*
from tableB b
except
select a.*
from tableA a;
I imagine this would have reasonable performance.
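A small sketch of both variants, run in SQLite through Python's sqlite3 module rather than in Vertica. The status and qty columns and the sample rows are assumptions. Order 5 is identical in both snapshots but has a NULL status, which shows why the join version is only safe when the compared columns have no NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical snapshots: tableA is yesterday, tableB is today.
    CREATE TABLE tableA (OrderId INTEGER PRIMARY KEY, status TEXT, qty INTEGER);
    CREATE TABLE tableB (OrderId INTEGER PRIMARY KEY, status TEXT, qty INTEGER);
    INSERT INTO tableA VALUES (1, 'open', 5), (2, 'open', 1), (3, 'closed', 2), (5, NULL, 3);
    INSERT INTO tableB VALUES (1, 'open', 5), (2, 'shipped', 1), (4, 'open', 8), (5, NULL, 3);
""")

# EXCEPT compares whole rows and treats two NULLs as equal.
changed_except = conn.execute("""
    SELECT * FROM tableB
    EXCEPT
    SELECT * FROM tableA
    ORDER BY 1
""").fetchall()

# The join compares column by column; NULL = NULL is not true,
# so the unchanged order 5 is reported as well.
changed_join = conn.execute("""
    SELECT b.OrderId
    FROM tableB AS b
    LEFT JOIN tableA AS a
           ON b.OrderId = a.OrderId AND b.status = a.status AND b.qty = a.qty
    WHERE a.OrderId IS NULL
    ORDER BY b.OrderId
""").fetchall()

print(changed_except)
print(changed_join)
```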
Do you have a primary key in the two tables?
Then my technique, for a complete Change Data Capture, is:
SELECT
'I' AS to_do
, newrows.*
FROM tb_today newrows
LEFT
JOIN tb_yesterday oldrows USING(id)
WHERE oldrows.id IS NULL
UNION ALL
SELECT
'U' AS to_do
, newrows.*
FROM tb_today newrows
JOIN tb_yesterday oldrows USING(id)
WHERE oldrows.fname <> newrows.fname
OR oldrows.lname <> newrows.lname
OR oldrows.bdate <> newrows.bdate
OR oldrows.sal <> newrows.sal
[...]
OR oldrows.lastcol <> newrows.lastcol
UNION ALL
SELECT
'D' AS to_do
, oldrows.*
FROM tb_yesterday oldrows
LEFT
JOIN tb_today newrows USING(id)
WHERE newrows.id IS NULL
;
Just leave out the last leg of the UNION SELECT if you don't want to cater for DELETEs ('D')
Good luck
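The three legs of this change data capture query can be tried out on a toy data set. The sketch below uses SQLite through Python's sqlite3 module, with a reduced, assumed column list (id, fname, sal) and made-up rows; it is not tuned for Vertica or for tables of the size discussed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical snapshots with a shared primary key.
    CREATE TABLE tb_yesterday (id INTEGER PRIMARY KEY, fname TEXT, sal INTEGER);
    CREATE TABLE tb_today     (id INTEGER PRIMARY KEY, fname TEXT, sal INTEGER);
    INSERT INTO tb_yesterday VALUES (1, 'Ann', 100), (2, 'Bob', 200), (3, 'Cy', 300);
    INSERT INTO tb_today     VALUES (1, 'Ann', 100), (2, 'Bob', 250), (4, 'Dee', 400);
""")

changes = conn.execute("""
    -- Inserts: present today, missing yesterday.
    SELECT 'I' AS to_do, n.id, n.fname, n.sal
    FROM tb_today AS n
    LEFT JOIN tb_yesterday AS o ON o.id = n.id
    WHERE o.id IS NULL
    UNION ALL
    -- Updates: same key, at least one differing column.
    SELECT 'U', n.id, n.fname, n.sal
    FROM tb_today AS n
    JOIN tb_yesterday AS o ON o.id = n.id
    WHERE o.fname <> n.fname OR o.sal <> n.sal
    UNION ALL
    -- Deletes: present yesterday, missing today.
    SELECT 'D', o.id, o.fname, o.sal
    FROM tb_yesterday AS o
    LEFT JOIN tb_today AS n ON n.id = o.id
    WHERE n.id IS NULL
    ORDER BY 2
""").fetchall()

for change in changes:
    print(change)
```

Note that the <> comparisons in the update leg do not detect a change from or to NULL, so nullable columns would need extra handling.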
You can also do it nicely using joins:
SELECT b.*
FROM tableB AS b
LEFT JOIN tableA AS a ON a.id = b.id
WHERE a.id IS NULL
The above query returns only the rows of tableB whose id has no match in tableA, i.e. rows whose id is present in both tables are skipped.

Left Join with Distinct Clause

Below is my insert query.
INSERT INTO /*+ APPEND*/ TEMP_CUSTPARAM(CUSTNO, RATING)
SELECT DISTINCT Q.CUSTNO, NVL(((NVL(P.RATING,0) * '10.0')/100),0) AS RATING
FROM TB_ACCOUNTS Q LEFT JOIN TB_CUSTPARAM P
ON P.TEXT_PARAM IN (SELECT DISTINCT PRDCD FROM TB_ACCOUNTS)
AND P.TABLENAME='TB_ACCOUNTS' AND P.COLUMNNAME='PRDCD';
In the previous version of the query, the join condition was P.TEXT_PARAM=Q.PRDCD, but the insert into TEMP_CUSTPARAM failed due to a violation of the unique constraint on CUSTNO.
The insert query is taking ages to complete. I would like to know how to use DISTINCT with a LEFT JOIN statement.
Thanks.
SELECT T1.Col1, T2.Col2 FROM Table1 T1
LEFT JOIN
(SELECT DISTINCT Id, Col1, Col2 FROM Table2
) T2 ON T2.Id = T1.Id
You are missing criteria to join TB_ACCOUNTS records with their related TB_ACCOUNTS/PRDCD TB_CUSTPARAM records and thus cross join them instead. I guess you want:
INSERT /*+ APPEND */ INTO TEMP_CUSTPARAM(CUSTNO, RATING)
SELECT DISTINCT
Q.CUSTNO,
NVL(P.RATING, 0) * 0.1 AS RATING
FROM TB_ACCOUNTS Q
LEFT JOIN TB_CUSTPARAM P ON P.TEXT_PARAM = Q.PRDCD
AND P.TABLENAME = 'TB_ACCOUNTS'
AND P.COLUMNNAME = 'PRDCD';
If the query is taking ages to complete, first check the execution plan. The plan may give you some hints: if you see a cartesian join on two non-trivial tables, the query should probably be revisited.
Then ask yourself what you expect from the query.
Do you expect one record per CUSTNO? Or can a customer have more than one rating?
One rating per customer could make sense from a business point of view. To get a unique customer list with ratings:
1) first get a unique CUSTNO - note that this is in general not done with a DISTINCT clause but, if there are several rows per customer, with a filter predicate, e.g. selecting the most recent row.
2) then join to the rating table
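The two steps above can be sketched as follows, using SQLite through Python's sqlite3 module instead of Oracle. The opened column, the lower-case table names and the sample rows are hypothetical, COALESCE stands in for NVL, and the rating scaling from the question is left out. "Most recent account" is only one possible filter predicate; the right one depends on the business rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: customer 1 holds two products.
    CREATE TABLE tb_accounts  (custno INTEGER, prdcd TEXT, opened TEXT);
    CREATE TABLE tb_custparam (text_param TEXT, rating INTEGER);
    INSERT INTO tb_accounts VALUES
        (1, 'SAV', '2020-01-01'), (1, 'LOAN', '2022-05-01'),
        (2, 'SAV', '2021-03-01'), (3, 'XYZ', '2021-01-01');
    INSERT INTO tb_custparam VALUES ('SAV', 20), ('LOAN', 50);
""")

# DISTINCT alone still yields two rows for customer 1 (one per rating).
with_distinct = conn.execute("""
    SELECT DISTINCT q.custno, COALESCE(p.rating, 0) AS rating
    FROM tb_accounts AS q
    LEFT JOIN tb_custparam AS p ON p.text_param = q.prdcd
    ORDER BY q.custno, rating
""").fetchall()

# Step 1: keep only the most recent account per customer.
# Step 2: join that single row to the rating table.
one_per_customer = conn.execute("""
    SELECT q.custno, COALESCE(p.rating, 0) AS rating
    FROM tb_accounts AS q
    LEFT JOIN tb_custparam AS p ON p.text_param = q.prdcd
    WHERE q.opened = (SELECT MAX(x.opened) FROM tb_accounts AS x
                      WHERE x.custno = q.custno)
    ORDER BY q.custno
""").fetchall()

print(with_distinct)
print(one_per_customer)
```

With the first result, an insert into a table with a unique constraint on the customer number would still fail for customer 1; the second result has exactly one row per customer.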

How to compare two tables in Postgresql?

I have two identical tables:
A : id1, id2, qty, unit
B: id1, id2, qty, unit
The set of (id1,id2) is identifying each row and it can appear only once in each table.
I have 140 rows in table A and 141 rows in table B.
I would like to find all the keys (id1,id2) that do not appear in both tables. There is 1 for sure, but there can be more (for example, if each table has completely different data).
I wrote this query:
(TABLE a EXCEPT TABLE b)
UNION ALL
(TABLE b EXCEPT TABLE a) ;
But it's not working. It compares the whole table where I don't care if qty or unit are different, I only care about id1,id2.
Use a full outer join:
select a.*,b.*
from a full outer join b
on a.id1=b.id1 and a.id2=b.id2
This shows both tables side by side, with gaps where there is an unmatched row.
select a.*,b.*
from a full outer join b
on a.id1=b.id1 and a.id2=b.id2
where a.id1 is null or b.id1 is null;
That will only show unmatched rows.
Or you can use NOT IN:
select * from a
where (id1,id2) not in
( select id1,id2 from b )
That will show rows from a not matched by b.
Or get the same result using a join:
select a.*
from a left outer join b
on a.id1=b.id1 and a.id2=b.id2
where b.id1 is null
Sometimes the join is faster than the NOT IN.
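Here is a sketch of the key-only comparison on made-up data, run in SQLite through Python's sqlite3 module rather than PostgreSQL. Two anti-joins are combined to cover both directions without a FULL OUTER JOIN, which older SQLite versions lack. The question's own EXCEPT idea also works once it is restricted to the key columns. The sample rows are assumptions: key (1,1) differs in qty and unit but exists in both tables, and key (2,1) exists only in b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id1 INTEGER, id2 INTEGER, qty INTEGER, unit TEXT);
    CREATE TABLE b (id1 INTEGER, id2 INTEGER, qty INTEGER, unit TEXT);
    INSERT INTO a VALUES (1, 1, 10, 'kg'), (1, 2, 5, 'kg');
    INSERT INTO b VALUES (1, 1, 99, 'lb'), (1, 2, 5, 'kg'), (2, 1, 7, 'kg');
""")

# Keys that appear in exactly one of the tables, tagged with their source.
only_in_one = conn.execute("""
    SELECT 'a' AS src, a.id1, a.id2
    FROM a LEFT JOIN b ON a.id1 = b.id1 AND a.id2 = b.id2
    WHERE b.id1 IS NULL
    UNION ALL
    SELECT 'b', b.id1, b.id2
    FROM b LEFT JOIN a ON a.id1 = b.id1 AND a.id2 = b.id2
    WHERE a.id1 IS NULL
""").fetchall()

# EXCEPT restricted to the key columns ignores qty and unit.
b_minus_a = conn.execute("""
    SELECT id1, id2 FROM b
    EXCEPT
    SELECT id1, id2 FROM a
""").fetchall()

print(only_in_one)
print(b_minus_a)
```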
Here is an example of using EXCEPT to see what records are different. Reverse the select statements to see what is different. a except s / then s except a
SELECT
a.address_entrytype,
a.address_street,
a.address_city,
a.address_state,
a.address_postal_code,
a.company_id
FROM
prospects.address a
except
SELECT
s.address_entrytype,
s.address_street,
s.address_city,
s.address_state,
s.address_postal_code,
s.company_id
FROM
prospects.address_short s

Count rows after joining three tables in PostgreSQL

Suppose I have three tables in PostgreSQL:
table1 - id1, a_id, updated_by_id
table2 - id2, a_id, updated_by_id
Users - id, display_name
Suppose I am using the following query:
select count(t1.id1) from table1 t1
left join table2 t2 on (t1.a_id=t2.a_id)
full outer join users u1 on (t1.updated_by_id=u1.id)
full outer join users u2 on (t2.updated_by_id=u2.id)
where u1.id=100;
I get 50 as count.
Whereas with:
select count(t1.id1) from table1 t1
left join table2 t2 on (t1.a_id=t2.a_id)
full outer join users u1 on (t1.updated_by_id=u1.id)
full outer join users u2 on (t2.updated_by_id=u2.id)
where u2.id=100;
I get only 25 as count.
What is my mistake in the second query? What can I do to get the same count?
My requirement is that there is a single user table, referenced by multiple tables. I want to take the complete list of users and get the count of ids from different tables.
But the table on which I have joined alone returns the proper count but rest of them don't return the proper count. Can anybody suggest a way to modify my second query to get the proper count?
To simplify your logic, aggregate first, join later.
Guessing missing details, this query would give you the exact count, how many times each user was referenced in table1 and table2 respectively for all users:
SELECT *
FROM users u
LEFT JOIN (
SELECT updated_by_id AS id, count(*) AS t1_ct
FROM table1
GROUP BY 1
) t1 USING (id)
LEFT JOIN (
SELECT updated_by_id AS id, count(*) AS t2_ct
FROM table2
GROUP BY 1
) t2 USING (id);
In particular, avoid multiple 1-n relationships multiplying each other when joined together:
Two SQL LEFT JOINS produce incorrect result
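To make the multiplication effect concrete, the sketch below compares "join first, count later" with "aggregate first, join later" on made-up data. It runs in SQLite through Python's sqlite3 module, and the sample rows are assumptions: user 100 is referenced three times in table1 and twice in table2, and user 200 is never referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE table1 (id1 INTEGER PRIMARY KEY, updated_by_id INTEGER);
    CREATE TABLE table2 (id2 INTEGER PRIMARY KEY, updated_by_id INTEGER);
    INSERT INTO users  VALUES (100, 'alice'), (200, 'bob');
    INSERT INTO table1 VALUES (1, 100), (2, 100), (3, 100);
    INSERT INTO table2 VALUES (1, 100), (2, 100);
""")

# Joining both 1-n relationships first yields 3 x 2 = 6 rows for user 100,
# so both counts are inflated.
join_first = conn.execute("""
    SELECT u.id, COUNT(t1.id1), COUNT(t2.id2)
    FROM users AS u
    LEFT JOIN table1 AS t1 ON t1.updated_by_id = u.id
    LEFT JOIN table2 AS t2 ON t2.updated_by_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Aggregating each table on its own keeps the counts independent.
aggregate_first = conn.execute("""
    SELECT u.id, t1.t1_ct, t2.t2_ct
    FROM users AS u
    LEFT JOIN (SELECT updated_by_id AS id, COUNT(*) AS t1_ct
               FROM table1 GROUP BY updated_by_id) AS t1 ON t1.id = u.id
    LEFT JOIN (SELECT updated_by_id AS id, COUNT(*) AS t2_ct
               FROM table2 GROUP BY updated_by_id) AS t2 ON t2.id = u.id
    ORDER BY u.id
""").fetchall()

print(join_first)
print(aggregate_first)
```

Users without any reference get NULL counts in the second query; wrapping the counts in COALESCE(..., 0) would turn them into zeros.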
To retrieve a single or few users only, LATERAL joins will be faster (Postgres 9.3+):
SELECT *
FROM users u
LEFT JOIN LATERAL (
SELECT count(*) AS t1_ct
FROM table1
WHERE updated_by_id = u.id
) t1 ON true
LEFT JOIN LATERAL (
SELECT count(*) AS t2_ct
FROM table2
WHERE updated_by_id = u.id
) t2 ON true
WHERE u.id = 100;
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Explain perceived difference
The particular mismatch you report is due to the specifics of a FULL OUTER JOIN:
First, an inner join is performed. Then, for each row in T1 that does
not satisfy the join condition with any row in T2, a joined row is
added with null values in columns of T2. Also, for each row of T2 that
does not satisfy the join condition with any row in T1, a joined row
with null values in the columns of T1 is added.
So you get NULL values appended on the respective other side for missing matches. count() does not count NULL values. So you can get a different result depending on whether you filter on u1.id=100 or u2.id=100.
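The counting behaviour can be seen in isolation with a small sketch. It uses a LEFT JOIN instead of a FULL OUTER JOIN so that it also runs on older SQLite versions (via Python's sqlite3 module); the NULL padding for unmatched rows works the same way. The sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY);
    CREATE TABLE table1 (id1 INTEGER PRIMARY KEY, updated_by_id INTEGER);
    INSERT INTO users  VALUES (1), (2);
    -- Only user 1 is referenced; user 2 gets a NULL-padded row in the join.
    INSERT INTO table1 VALUES (10, 1), (11, 1);
""")

all_rows, non_null_ids = conn.execute("""
    SELECT COUNT(*), COUNT(t1.id1)
    FROM users AS u
    LEFT JOIN table1 AS t1 ON t1.updated_by_id = u.id
""").fetchone()

print(all_rows, non_null_ids)
```

COUNT(*) counts every joined row, including the padded one, while COUNT(t1.id1) skips the row where id1 is NULL.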
This is just to explain, you don't need a FULL JOIN here. Use the presented alternatives instead.

In SQL, a Join is actually an Intersection? And it is also a linkage or a "Sideway Union"?

I always thought of a Join in SQL as some kind of linkage between two tables.
For example,
select e.name, d.name from employees e, departments d
where e.deptID = d.deptID
In this case, it is linking two tables to show each employee with a department name instead of a department ID. It is kind of like a "linkage" or a "sideways union".
But, after learning about inner join vs outer join, it shows that a Join (Inner join) is actually an intersection.
For example, when one table has the ID 1, 2, 7, 8, while another table has the ID 7 and 8 only, the way we get the intersection is:
select * from t1, t2 where t1.ID = t2.ID
to get the two records of "7 and 8". So it is actually an intersection.
So we have the "Intersection" of 2 tables. Compare this with the "Union" operation on 2 tables. Can a Join be thought of as an "Intersection"? But what about the "linking" or "sideway union" aspect of it?
You're on the right track; the rows returned by an INNER JOIN are those that satisfy the join conditions. But this is like an intersection only because you're using equality in your join condition, applied to columns from each table.
Also be aware that INTERSECT is already an SQL operation, and it has another meaning: it's not the same as JOIN.
An SQL JOIN can produce a new type of row, which has all the columns from both joined tables. For example: col4, col5, and col6 don't exist in table A, but they do exist in the result of a join with table B:
SELECT a.col1, a.col2, a.col3, b.col4, b.col5, b.col6
FROM A INNER JOIN B ON a.col2=b.col5;
An SQL INTERSECT returns rows that are common to two separate tables, which must already have the same columns.
SELECT col1, col2, col3 FROM A
INTERSECT
SELECT col1, col2, col3 FROM B;
This happens to produce the same result as the following join:
SELECT a.col1, a.col2, a.col3
FROM A INNER JOIN B ON a.col1=b.col1 AND a.col2=b.col2 AND a.col3=b.col3;
Not every brand of database supports the INTERSECT operator.
A join 'links' or erm... joins the rows from two tables. I think that's what you mean by 'sideways union' although I personally think that is a terrible way to phrase it. But there are different types of joins that do slightly different things:
An inner join is indeed an intersection.
A full outer join is a union.
This page on Jeff Atwood's blog describes other possibilities.
An outer join is not related to UNION or UNION ALL.
For example, a 'null' would not occur as a result of Union or Union All operation, but it results from an Outer Join.
INNER JOIN treats two NULLs as two different values. So, if you join based on a nullable column, and if both tables have NULL values in that column, then INNER JOIN will ignore those rows.
Therefore, to correctly retrieve all common rows between two tables, INTERSECT should be used. INTERSECT treats two NULLs as the same value.
Example (SQLite):
Create two tables with nullable columns:
CREATE TABLE Table1 (id INT, firstName TEXT);
CREATE TABLE Table2 (id INT, firstName TEXT);
Insert NULL values:
INSERT INTO Table1 VALUES (1, NULL);
INSERT INTO Table2 VALUES (1, NULL);
Retrieve common rows using INNER JOIN (This shows no output):
SELECT * FROM Table1 INNER JOIN Table2 ON
Table1.id=Table2.id AND Table1.firstName=Table2.firstName;
Retrieve common rows using INTERSECT (This correctly shows the common row):
SELECT * FROM Table1 INTERSECT SELECT * FROM Table2;
Conclusion:
Even though INTERSECT and INNER JOIN can often be used to get the same results, they are not the same, and you should pick one or the other depending on the situation.