GROUP BY in CROSS APPLY - sql

Suppose we have two tables:
create table A (
fkb int,
groupby int
);
create table B (
id int,
search int
);
insert into A values (1, 1);
insert into B values (1, 1);
insert into B values (2, 1);
then the following query
select B.id, t.max_groupby - B.search diff
from B
cross apply (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
) t
returns the expected result:
id diff
---------
1 0
2 NULL
However, when I add group by A.fkb inside the cross apply, the B row that has no corresponding A.fkb disappears.
select B.id, t.max_groupby - B.search diff
from B
cross apply (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
group by A.fkb
) t
I tested this on SQL Server as well as on PostgreSQL (with cross join lateral instead of cross apply). Why does the group by make the row disappear? It seems that cross apply behaves as an outer join in the first case and as an inner join in the second, but it is not clear to me why.

You can see this when you look at the result of the inner query separately:
select max(A.groupby) max_groupby
from A
where A.fkb = 2;
returns a single row with max_groupby = null:
max_groupby
-----------
(null)
However, as there is no row with A.fkb = 2, grouping by it yields an empty result, which you can see when you run:
select max(A.groupby) max_groupby
from A
where A.fkb = 2
group by A.fkb
and thus the cross join does not return any rows for fkb = 2.
You need to use an outer join in order to include the row from B.
In Postgres you would have to write this as:
select B.id, t.max_groupby - B.search diff
from B
left join lateral (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
group by A.fkb
) t on true
I don't know what the equivalent to left join lateral would be in SQL Server. on true would need to be written as on 1=1.
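The difference between the two inner queries can be checked in isolation. Here is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine (the question used SQL Server and PostgreSQL); the variable names scalar_rows and grouped_rows are mine. It only shows that an aggregate without GROUP BY always returns one row, while the grouped version returns no row when the filter matches nothing:

```python
import sqlite3

# In-memory database holding only table A from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table A (fkb int, groupby int);
    insert into A values (1, 1);
""")

# Aggregate without GROUP BY: always exactly one row, NULL if nothing matched.
scalar_rows = conn.execute(
    "select max(groupby) from A where fkb = 2"
).fetchall()

# Aggregate with GROUP BY: one row per group, so zero rows if nothing matched.
grouped_rows = conn.execute(
    "select max(groupby) from A where fkb = 2 group by fkb"
).fetchall()

print(scalar_rows)   # [(None,)]
print(grouped_rows)  # []
```

An inner-style join against the second result has nothing to pair the B row with, which is why that row drops out.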

It happens because:
with GROUP BY, the inner query returns no rows when A.fkb = 2
without GROUP BY, it returns a single row containing NULL
So CROSS APPLY gives different results for the two queries. Use OUTER APPLY to keep the row from B:
select B.id, t.max_groupby - B.search diff
from B
outer apply (
select max(A.groupby) max_groupby
from A
where A.fkb = B.id
group by A.fkb
) t
OUTPUT:
id diff
1 0
2 NULL
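SQLite has neither APPLY nor LATERAL, so the sketch below swaps in a different technique: a plain join against a grouped derived table. It is not the query from the question, only an illustration of the same inner-versus-outer distinction, assuming Python's built-in sqlite3 module. The names grouped, inner_rows and outer_rows are mine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table A (fkb int, groupby int);
    create table B (id int, search int);
    insert into A values (1, 1);
    insert into B values (1, 1);
    insert into B values (2, 1);
""")

# Grouped derived table: contains a row only for fkb values that exist in A.
grouped = "select fkb, max(groupby) as max_groupby from A group by fkb"

# Inner join: B.id = 2 has no partner row and is dropped.
inner_rows = conn.execute(f"""
    select B.id, t.max_groupby - B.search as diff
    from B
    join ({grouped}) t on t.fkb = B.id
    order by B.id
""").fetchall()

# Left join: B.id = 2 is kept, with NULL for the missing aggregate.
outer_rows = conn.execute(f"""
    select B.id, t.max_groupby - B.search as diff
    from B
    left join ({grouped}) t on t.fkb = B.id
    order by B.id
""").fetchall()

print(inner_rows)  # [(1, 0)]
print(outer_rows)  # [(1, 0), (2, None)]
```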

Related

Insert missing values in column at all levels of another column in SQL?

I have been working with some big data in SQL/BigQuery and found that it has some holes in it that need to be filled with values in order to complete the dataset. What I'm struggling with is how to insert the missing values properly.
Say that I have multiple levels of a variable (1, 2, 3... no upper bound) and for each of these levels, they should have an A, B, C value. Some of these records will have data, others will not.
Current dataset:
level value data
1 A 1a_data
1 B 1b_data
1 C 1c_data
2 A 2a_data
2 C 2c_data
3 B 3b_data
What I want the dataset to look like:
level value data
1 A 1a_data
1 B 1b_data
1 C 1c_data
2 A 2a_data
2 B NULL
2 C 2c_data
3 A NULL
3 B 3b_data
3 C NULL
What would be the best way to do this?
You need a CROSS join of the distinct levels with the distinct values and a LEFT join to the table:
SELECT l.level, v.value, t.data
FROM (SELECT DISTINCT level FROM tablename) l
CROSS JOIN (SELECT DISTINCT value FROM tablename) v
LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
ORDER BY l.level, v.value;
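To try the cross join plus left join on the sample data from the question, here is a small sketch assuming Python's built-in sqlite3 module (the question itself is about BigQuery, so treat this as an illustration of the query shape, not of that engine). The table name tablename follows the answer above; the variable filled is mine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (level INT, value TEXT, data TEXT);
    INSERT INTO tablename VALUES
        (1, 'A', '1a_data'), (1, 'B', '1b_data'), (1, 'C', '1c_data'),
        (2, 'A', '2a_data'), (2, 'C', '2c_data'),
        (3, 'B', '3b_data');
""")

# Every level paired with every value, then matched back to the real rows.
filled = conn.execute("""
    SELECT l.level, v.value, t.data
    FROM (SELECT DISTINCT level FROM tablename) l
    CROSS JOIN (SELECT DISTINCT value FROM tablename) v
    LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
    ORDER BY l.level, v.value
""").fetchall()

for row in filled:
    print(row)
```

This prints nine rows, with None in the data column for (2, 'B'), (3, 'A') and (3, 'C').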
We can use an INSERT INTO ... SELECT that generates every level/value combination and inserts only the missing ones:
INSERT INTO yourTable (level, value, data)
SELECT t1.level, t2.value, NULL
FROM (SELECT DISTINCT level FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT value FROM yourTable) t2
LEFT JOIN yourTable t3
ON t3.level = t1.level AND
t3.value = t2.value
WHERE t3.level IS NULL;

max function does not work when having case when clause

i have two tables.
one is as below
table a
ID, count
1, 123
2, 123
3, 123
table b
ID, count
table b is empty
when using
SELECT CASE
WHEN isnotnull(max(b.count)) THEN max(a.count) + max(b.count)
ELSE max(a.count)
END
FROM a, b
The only result is always NULL.
I am very confused. Why?
You don't need to use a JOIN; a simple sum of two sub-queries will give you your desired result. Since you only add MAX(b.count) when it is non-NULL, we can just add it all the time but COALESCE it to 0 when it is NULL.
SELECT COALESCE((SELECT MAX(count) FROM b), 0) + (SELECT MAX(count) FROM a)
Another way to make this work is to UNION the count values from each table:
SELECT COALESCE(MAX(bcount), 0) + MAX(acount)
FROM (SELECT count AS acount, NULL AS bcount FROM a
UNION
SELECT NULL AS acount, count AS bcount FROM b) u
Note that if you use a JOIN it must be a FULL JOIN. If you use a LEFT JOIN you risk not seeing all the values from table b. For example, consider the case where table b has one entry: ID=4, count=456. A LEFT JOIN on ID will not include this value in the result table (since table a only has ID values of 1,2 and 3) so you will get the wrong result:
CREATE TABLE a (ID INT, count INT);
INSERT INTO a VALUES (1, 123), (2, 123), (3, 123);
CREATE TABLE b (ID INT, count INT);
INSERT INTO b VALUES (4, 456);
SELECT COALESCE(MAX(b.count), 0) + MAX(a.count)
FROM a
LEFT JOIN b ON a.ID = b.ID
Output
123 (should be 579)
To use a FULL JOIN you would write
SELECT COALESCE(MAX(b.count), 0) + MAX(a.count)
FROM a
FULL JOIN b ON a.ID = b.ID
Since table b is empty, max(b.count) returns NULL, and any arithmetic with NULL yields NULL, so max(a.count) + max(b.count) would be NULL.
There is a second problem here: FROM a, b is a cross join, and a cross join with an empty table has no rows, so max(a.count) is NULL as well. The CASE therefore falls through to its ELSE branch and still returns NULL.
Use COALESCE to supply a default value whenever a NULL appears.
Use the coalesce() function with an explicit join, and avoid the old comma-separated join syntax:
select coalesce(max(a.count)+max(b.count),max(a.count))
from a left join b on a.id=b.id
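To see why the original comma join returns NULL when b is empty, here is a minimal sketch assuming Python's built-in sqlite3 module (the question's isnotnull() suggests a different engine, so this only illustrates the join and aggregate behaviour). The variable names comma_join and subqueries are mine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ID INT, count INT);
    INSERT INTO a VALUES (1, 123), (2, 123), (3, 123);
    CREATE TABLE b (ID INT, count INT);  -- left empty on purpose
""")

# "FROM a, b" is a cross join; with b empty it produces no rows,
# so both aggregates see nothing and return NULL.
comma_join = conn.execute(
    "SELECT MAX(a.count), MAX(b.count) FROM a, b"
).fetchone()

# Independent scalar subqueries are not affected by the other table being empty.
subqueries = conn.execute("""
    SELECT COALESCE((SELECT MAX(count) FROM b), 0) + (SELECT MAX(count) FROM a)
""").fetchone()

print(comma_join)  # (None, None)
print(subqueries)  # (123,)
```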
Use left join
SELECT coalesce(max(a.count) + max(b.count),max(a.count))
FROM a left join b on a.id=b.id

select sql query to merge results

I have a table old_data and a table new_data. I want to write a select statement that gives me
Rows in old_data stay there
New rows in new_data get added to old_data
unique key is id so rows with id in new_data should update existing ones in old_data
I need to write a select statement that would give me old_data updated with new data and new data added to it.
Example:
Table a:
id count
1 2
2 19
3 4
Table b:
id count
2 22
5 7
I need a SELECT statement that gives me
id count
1 2
2 22
3 4
5 7
Based on your desired results:
SELECT
*
FROM
[TableB] AS B
UNION ALL
SELECT
*
FROM
[TableA] AS A
WHERE
A.id NOT IN (SELECT id FROM [TableB])
I think this would work pretty neatly with COALESCE:
SELECT COALESCE(a.id, b.id) AS id, COALESCE(b.count, a.count) AS count
FROM a
FULL OUTER JOIN b
ON a.id = b.id
Note - if your RDBMS does not contain COALESCE, you can write out the function using CASE as follows:
SELECT CASE WHEN a.id IS NULL THEN b.id
ELSE a.id END AS id,
CASE WHEN b.count IS NULL THEN a.count
ELSE b.count END AS count
FROM ...
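If your RDBMS lacks FULL OUTER JOIN, you can emulate it as follows:
SELECT a.id, a.count, b.id, b.count
FROM a
LEFT JOIN b
ON a.id = b.id
UNION ALL
SELECT a.id, a.count, b.id, b.count
FROM b
LEFT JOIN a
ON b.id = a.id
WHERE a.id IS NULL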
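One way to combine the COALESCE idea with an emulated full outer join is a LEFT JOIN from a, plus the rows of b that have no partner in a. The sketch below runs that on the sample data from the question, assuming Python's built-in sqlite3 module; the variable merged is mine, and the exact query shape is an illustration, not the only way to write it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INT, count INT);
    CREATE TABLE b (id INT, count INT);
    INSERT INTO a VALUES (1, 2), (2, 19), (3, 4);
    INSERT INTO b VALUES (2, 22), (5, 7);
""")

merged = conn.execute("""
    -- every row of a, overridden by b where the id also exists in b
    SELECT a.id, COALESCE(b.count, a.count)
    FROM a
    LEFT JOIN b ON a.id = b.id
    UNION ALL
    -- rows that exist only in b
    SELECT b.id, b.count
    FROM b
    LEFT JOIN a ON b.id = a.id
    WHERE a.id IS NULL
    ORDER BY 1
""").fetchall()

print(merged)  # [(1, 2), (2, 22), (3, 4), (5, 7)]
```

The WHERE a.id IS NULL filter in the second branch is what keeps matched ids from appearing twice.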
If you want to persist the merge rather than only select it, you can upsert into old_data (update the existing rows, insert the new ones) and then select all rows from old_data:
UPDATE [old_data]
SET [count] = B.[count]
FROM [old_data] AS A
INNER JOIN [new_Data] AS B
ON A.[id] = B.[id]
INSERT INTO [old_data]
([id]
,[count])
SELECT A.[id]
,A.[count]
FROM [new_Data] AS A
LEFT JOIN [old_data] AS B
ON A.[id] = B.[id]
WHERE B.[id] IS NULL
SELECT *
FROM [old_data]

Select table as json array

I have three tables in PostgreSQL: A, B, C.
I want to get a row from table A with a specific id, plus all records from tables B and C with matching id as aggregated JSON.
For example:
Table A
id  colum1     colum2
1   someValue  someValue
Table B
id  colum1
1   someVal1
1   someVal2
Table C
id  colum1
1   someVal1
1   someVal2
The expected output for id = 1 would be:
a.colum1   a.colum2   ARRAY_JSON_B                                ARRAY_JSON_C
someValue  someValue  [{colum1:'someVal1'}, {colum1:'someVal2'}]  [{colum1:'someVal1'}, {colum1:'someVal2'}]
This requires Postgres 9.3 or later.
Simple case
I suggest using the simpler json_agg(), which is meant for this purpose, in LATERAL joins:
SELECT *
FROM a
LEFT JOIN LATERAL (SELECT json_agg(b) AS array_json_b FROM b WHERE id = a.id) b ON true
LEFT JOIN LATERAL (SELECT json_agg(c) AS array_json_c FROM c WHERE id = a.id) c ON true
WHERE id = 1;
LEFT JOIN LATERAL ... ON true keeps rows from the left side of the join in the result even when the lateral subquery finds no match. Details:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Subtle difference: this query returns NULL where no match is found in b or c, while @stas' query with correlated subqueries returns an empty array instead. This may or may not be important.
Actual answer
Your example in the question excludes the redundant id column in b and c from the result, which makes sense. To achieve this, you can't use @stas' simple correlated subquery. While it would still work for a single column instead of the whole row, it would lose the column name and produce a simple array. Also, it would not work for more than one column.
Use json_agg() with json_build_object() for a single selected column (needs Postgres 9.4 or later; it also lets you choose the key name freely):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_agg(json_build_object('colum1', colum1)) AS array_json_b
FROM b WHERE id = a.id
) b ON true
LEFT JOIN LATERAL (
SELECT json_agg(json_build_object('colum1', colum1)) AS array_json_c
FROM c WHERE id = a.id
) c ON true
WHERE id = 1;
Or use a subselect for any selection (col1 and col2 in this example):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_b
FROM (SELECT col1, col2 FROM b WHERE id = a.id) x
) b ON true
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_c
FROM (SELECT col1, col2 FROM c WHERE id = a.id) x
) c ON true
WHERE id = 1;
Related:
Return multiple columns of the same row as JSON array of objects
How do I return a jsonb array and array of objects from my data?
select
a.*,
to_json(array(select b from b where b.id = a.id)) array_json_b,
to_json(array(select c from c where c.id = a.id)) array_json_c
from a
where
a.id = 1;
I hope your PostgreSQL version is 9.3 or higher. The to_json function can convert almost any value to JSON, so we build an array of all related rows from b and convert it. The same goes for c.

LEFT JOIN - How to join tables and include extra row even if you have right match

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables
SELECT * FROM
(SELECT * FROM A) A
LEFT JOIN
(SELECT * FROM B) B
ON A.ID = B.ProductID
This is easy, I will get all rows from A multiplied by rows matched in B, and NULL fields if there is no match.
But here comes the tricky question, how can I get all rows from A with NULL fields for table B, even if there is a match, so I get an extra line with NULL values plus all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
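Here is a quick way to try the outer-level UNION ALL, assuming Python's built-in sqlite3 module. The sample rows ('Shirt', 'Hat', the sizes) are made up for illustration and are not from the question; the variable rows is mine. The ORDER BY is added only to make the output order predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INT, ProductName TEXT);
    CREATE TABLE B (ID INT, ProductID INT, Size TEXT);
    -- hypothetical sample data
    INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
    INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

rows = conn.execute("""
    -- all real matches
    SELECT A.ID, A.ProductName, B.ID, B.Size
    FROM A
    INNER JOIN B ON B.ProductID = A.ID
    UNION ALL
    -- one extra NULL-padded row per product, matched or not
    SELECT A.ID, A.ProductName, NULL, NULL
    FROM A
    ORDER BY 1, 3
""").fetchall()

for row in rows:
    print(row)
```

Product 1 comes back three times (the NULL row plus its two sizes) and product 2 once, with NULLs only. SQLite sorts NULL before other values, so the padded row is listed first for each product.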
Since every row in A is now guaranteed a match, we can switch to an inner join:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID