SQL Inner join with sum and null value - sql

The table below is an extract of a larger set of data.
In my scenario, Column 2 is NULL for the "parent" record (Column 1 = AB1 and Column 2 is NULL). As you can see, the following two "child" records have AB1 in Column 2, which matches the AB1 in Column 1. I want to sum the values in Column 3 for the rows where Column 2 has the same identifier (AB1); up to this point the sum is 29, and for this part I can do a SUM with GROUP BY AB1. My issue arises when I also need to add the value 10 in Column 3 from the row where Column 2 is NULL and Column 1 is AB1 (the parent identifier). The common identifier is AB1, but for the parent record it is in Column 1 instead of Column 2. I need a SQL query that returns a total sum of 39.
Edit:
Thanks for the prompt responses, and my apologies; I think my question was not clear enough. I am using MS SQL Server Management Studio.
The goal of the query is to sum the amounts in Column 3, grouping the records in Column 2 that have the same identifier (AB1), and then to find that same identifier in Column 1 (AB1) and add that row's value to the total sum as well.
The query below does the GROUP BY on Column 2 correctly: for example, if I have 10 records with the identifier AB1, it returns one row with the sum of their amounts in Column 3. The issue is that I also need to add to that sum the amount of the row where the identifier AB1 is in Column 1.
select t1.Column2, round(sum(t1.Column3), 2) as Total
from [table] t1, [table] t2
where t1.Column2 = t2.Column1
group by t1.Column2
Basically, this table stores transactions. The initial ("parent") transaction has its identifier (AB1) in Column 1, and all other ("child") transactions linked to the parent carry that identifier (AB1) in Column 2. Column 1 is a unique identifier and does not repeat. For the parent transaction, Column 2 is NULL, but the parent's identifier (AB1) can appear many times in Column 2, once for each child transaction linked to it.

Oracle
The WITH clause is here just to generate sample data and, as such, it is not part of the answer.
I don't know what the expected result is, but the totals could be calculated using UNION ALL (without an INNER JOIN):
WITH
tbl AS
(
Select 'AB1' "COL_1", Null "COL_2", 10 "COL_3" From Dual Union All
Select 'CD2' "COL_1", 'AB1' "COL_2", 15 "COL_3" From Dual Union All
Select 'EF3' "COL_1", 'AB1' "COL_2", 14 "COL_3" From Dual
)
SELECT
ID, Sum(TOTAL) "TOTAL"
FROM
(
SELECT COL_1 "ID", Sum(COL_3) "TOTAL" FROM tbl GROUP BY COL_1 UNION ALL
SELECT COL_2 "ID", Sum(COL_3) "TOTAL" FROM tbl GROUP BY COL_2
)
WHERE ID Is Not Null
GROUP BY ID
ORDER BY ID
--
-- R e s u l t
--
ID TOTAL
--- ----------
AB1 39
CD2 15
EF3 14
That is a SUM() with GROUP BY aggregation, but the SUM() analytic function with the DISTINCT keyword gives the same result:
SELECT DISTINCT
ID, Sum(TOTAL) OVER(PARTITION BY ID ORDER BY ID) "TOTAL"
FROM
(
SELECT COL_1 "ID", Sum(COL_3) "TOTAL" FROM tbl GROUP BY COL_1 UNION ALL
SELECT COL_2 "ID", Sum(COL_3) "TOTAL" FROM tbl GROUP BY COL_2
)
WHERE ID Is Not Null
--
-- R e s u l t
--
ID TOTAL
--- ----------
AB1 39
CD2 15
EF3 14
And if you need an INNER JOIN, the answer is below. Note that only the ID that actually has children appears in the result; that is because of the INNER JOIN.
SELECT
t1.COL_1 "ID",
Max(t1.COL_3) + Sum(t2.COL_3) "TOTAL"
FROM
tbl t1
INNER JOIN
tbl t2 ON (t2.COL_2 = t1.COL_1)
GROUP BY t1.COL_1
ORDER BY t1.COL_1
--
-- R e s u l t
--
ID TOTAL
--- ----------
AB1 39
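To check the UNION ALL approach outside Oracle, here is a small sketch using Python's built-in sqlite3 module as a stand-in database (an assumption: SQLite has no DUAL, so a real table replaces the WITH clause). It adds a hypothetical second parent, GH4, with one child, IJ5, to show that every ID gets its own total in a single pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (COL_1 TEXT, COL_2 TEXT, COL_3 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("AB1", None, 10), ("CD2", "AB1", 15), ("EF3", "AB1", 14),
        ("GH4", None, 7), ("IJ5", "GH4", 3),  # hypothetical second family
    ],
)

# Each row is counted once under its own ID (COL_1) and once under its
# parent's ID (COL_2); the NULL parent bucket is filtered out afterwards.
rows = conn.execute(
    """
    SELECT ID, SUM(TOTAL) AS TOTAL
    FROM (
        SELECT COL_1 AS ID, SUM(COL_3) AS TOTAL FROM tbl GROUP BY COL_1
        UNION ALL
        SELECT COL_2 AS ID, SUM(COL_3) AS TOTAL FROM tbl GROUP BY COL_2
    )
    WHERE ID IS NOT NULL
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()
print(rows)
# [('AB1', 39), ('CD2', 15), ('EF3', 14), ('GH4', 10), ('IJ5', 3)]
```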

select sum(Column3)
from TheTable
where 'AB1' in (Column1, Column2);
This sums Column3 for the parent (Column1 = 'AB1') and its children (Column2 = 'AB1').
If the parent-child hierarchy has more than two levels and you want to sum Column3 for grandchildren, great-grandchildren, and so on, you can use a hierarchical query (also known as a recursive query). The exact syntax depends on your database; this is for PostgreSQL (SQL Server uses the same form without the RECURSIVE keyword):
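A minimal sketch of the `'AB1' IN (Column1, Column2)` filter, run through Python's sqlite3 as a stand-in database. The table name TheTable comes from the answer; the child rows CD2 and EF3 reuse the Oracle answer's sample data, and XY9/ZZ1 are a hypothetical unrelated family that should be excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (Column1 TEXT, Column2 TEXT, Column3 INTEGER)")
conn.executemany(
    "INSERT INTO TheTable VALUES (?, ?, ?)",
    [
        ("AB1", None, 10),   # parent
        ("CD2", "AB1", 15),  # child
        ("EF3", "AB1", 14),  # child
        ("XY9", None, 100),  # hypothetical unrelated parent
        ("ZZ1", "XY9", 7),   # hypothetical unrelated child
    ],
)

# The parent matches on Column1, the children match on Column2.
# For unrelated rows the IN test is false or NULL, so they are skipped.
(total,) = conn.execute(
    "SELECT SUM(Column3) FROM TheTable WHERE 'AB1' IN (Column1, Column2)"
).fetchone()
print(total)  # 39
```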
with recursive Hier(Column1, Column2, Column3) as
(
select Column1, Column2, Column3
from TheTable
where Column1 = 'AB1'
union all
select t.Column1, t.Column2, t.Column3
from TheTable t
join Hier h on t.Column2 = h.Column1
)
select sum(Column3)
from Hier;
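To see where the recursive query differs from the simple IN filter, the sketch below adds a hypothetical grandchild GH4 (a child of CD2) and runs both queries in SQLite via Python's sqlite3, which accepts the same WITH RECURSIVE form as PostgreSQL. Only the recursive version picks up the grandchild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (Column1 TEXT, Column2 TEXT, Column3 INTEGER)")
conn.executemany(
    "INSERT INTO TheTable VALUES (?, ?, ?)",
    [
        ("AB1", None, 10),   # parent
        ("CD2", "AB1", 15),  # child
        ("EF3", "AB1", 14),  # child
        ("GH4", "CD2", 5),   # hypothetical grandchild (child of CD2)
    ],
)

# Recursive version: start at AB1, then repeatedly add rows whose
# Column2 points at a row already collected in Hier.
(total,) = conn.execute(
    """
    WITH RECURSIVE Hier(Column1, Column2, Column3) AS (
        SELECT Column1, Column2, Column3 FROM TheTable WHERE Column1 = 'AB1'
        UNION ALL
        SELECT t.Column1, t.Column2, t.Column3
        FROM TheTable t
        JOIN Hier h ON t.Column2 = h.Column1
    )
    SELECT SUM(Column3) FROM Hier
    """
).fetchone()

# Two-level version: only the parent and its direct children.
(direct_only,) = conn.execute(
    "SELECT SUM(Column3) FROM TheTable WHERE 'AB1' IN (Column1, Column2)"
).fetchone()

print(total, direct_only)  # 44 39
```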

You can split the two sets of data and then union them together. From there it is a simple SUM with GROUP BY.
To do this, we are simply saying:
take Column1 as the Parent if Column2 IS NULL
take Column2 as the Parent if Column2 IS NOT NULL
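Select Column1 as Parent, Column3
from TheTable
where Column2 IS null
-- UNION ALL, not UNION: UNION would drop children that share the same (Parent, Column3) values
Union All
Select Column2 as Parent, Column3
from TheTable
where Column2 IS not null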
From there you can use this as a CTE:
WITH data AS
(
Select Column1 as Parent, Column3
from TheTable
where Column2 IS null
Union All
Select Column2 as Parent, Column3
from TheTable
where Column2 IS not null
)
Select Parent, Sum(Column3) as SumColumn3
from data
Group by Parent
The result will be:
Parent SumColumn3
AB1 39
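Whether the two halves are combined with UNION or UNION ALL matters whenever two children carry the same amount. The sketch below (Python's sqlite3 as a stand-in database, with hypothetical data where both children have 15) compares the two, plus a single-pass alternative that groups by COALESCE(Column2, Column1), a different technique from the answer that expresses the same "parent if Column2 is NULL" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (Column1 TEXT, Column2 TEXT, Column3 INTEGER)")
conn.executemany(
    "INSERT INTO TheTable VALUES (?, ?, ?)",
    [("AB1", None, 10), ("CD2", "AB1", 15), ("EF3", "AB1", 15)],
)

def parent_totals(set_operator):
    # Build the (Parent, Column3) set with the given operator, then aggregate.
    sql = f"""
        WITH data AS (
            SELECT Column1 AS Parent, Column3 FROM TheTable WHERE Column2 IS NULL
            {set_operator}
            SELECT Column2 AS Parent, Column3 FROM TheTable WHERE Column2 IS NOT NULL
        )
        SELECT Parent, SUM(Column3) FROM data GROUP BY Parent
    """
    return conn.execute(sql).fetchall()

union_result = parent_totals("UNION")          # collapses the two (AB1, 15) rows
union_all_result = parent_totals("UNION ALL")  # keeps both children

# COALESCE picks Column2 when it is present, otherwise Column1 (the parent itself).
coalesce_result = conn.execute(
    "SELECT COALESCE(Column2, Column1) AS Parent, SUM(Column3) "
    "FROM TheTable GROUP BY COALESCE(Column2, Column1)"
).fetchall()

print(union_result, union_all_result, coalesce_result)
# [('AB1', 25)] [('AB1', 40)] [('AB1', 40)]
```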

Related

Delete oldest entries with two duplicate columns from a table - SQL

SELECT column1, column2, count(*) as duplicate
FROM table
GROUP BY column1, column2 HAVING count(*)> 1 ;
ID column1 column2 timestamp
abc 123 1 2020-02-03 19:36:27
xyz 123 1 2020-02-02 15:36:27
The combination of column1 and column2 should be unique, but it has duplicate entries.
The query above returns the entries that have duplicates. We want to delete the oldest entries, based on another column, timestamp.
One method is:
delete from t
where t.timestamp < (select max(t2.timestamp)
from t t2
where t2.column1 = t.column1 and t2.column2 = t.column2
);
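A quick check of the keep-newest form of the correlated DELETE, which removes every row whose timestamp is below MAX(timestamp) for its (column1, column2) pair. It uses Python's sqlite3 as a stand-in database; the two duplicate rows come from the question, and the row "lmn" is a hypothetical non-duplicate that must survive. Timestamps are stored as ISO strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID TEXT, column1 INTEGER, column2 INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("abc", 123, 1, "2020-02-03 19:36:27"),
        ("xyz", 123, 1, "2020-02-02 15:36:27"),
        ("lmn", 456, 2, "2020-02-01 10:00:00"),  # hypothetical non-duplicate
    ],
)

# Delete a row if a newer row exists for the same (column1, column2) pair.
conn.execute(
    """
    DELETE FROM t
    WHERE t.timestamp < (SELECT MAX(t2.timestamp)
                         FROM t t2
                         WHERE t2.column1 = t.column1 AND t2.column2 = t.column2)
    """
)
remaining = sorted(row[0] for row in conn.execute("SELECT ID FROM t"))
print(remaining)  # ['abc', 'lmn']
```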
DELETE a
FROM table a
JOIN (
SELECT id, row_number() OVER (PARTITION BY column1, column2 ORDER BY timestamp DESC) AS rownum
FROM table ) b
ON a.id = b.id
WHERE b.rownum > 1;
You can use the row_number function to get an ordered ranking of the results. Partitioning by column1 and column2 restarts the row number at each change in those values. Ordering by your timestamp descending starts the count with the newest record, so deleting anything where rownum > 1 keeps only the newest record. If you wanted to keep, say, the newest 3, you would simply change rownum > 1 to rownum > 3.
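SQLite does not support DELETE with a JOIN, so the sketch below (Python's sqlite3 as a stand-in, SQLite 3.25+ assumed for window functions) expresses the same ROW_NUMBER() idea as `DELETE ... WHERE id IN (...)`. The row "old" is a hypothetical third duplicate; KEEP is an illustrative parameter for the "top N" variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, column1 INTEGER, column2 INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("abc", 123, 1, "2020-02-03 19:36:27"),
        ("xyz", 123, 1, "2020-02-02 15:36:27"),
        ("old", 123, 1, "2020-01-01 00:00:00"),  # hypothetical third duplicate
    ],
)

KEEP = 1  # keep this many newest rows per (column1, column2); 3 gives "top 3"
conn.execute(
    """
    DELETE FROM t
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY column1, column2
                                      ORDER BY timestamp DESC) AS rn
            FROM t
        )
        WHERE rn > ?
    )
    """,
    (KEEP,),
)
remaining = [row[0] for row in conn.execute("SELECT id FROM t")]
print(remaining)  # ['abc']
```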

sql - getting sum of same column from multiple tables

I have a few tables in my DB. Let's call them table1, table2, table3.
All of them have a column named value.
I need to create a query that returns a single number: the sum of all the value columns from all the tables together...
I've tried the following way:
SELECT (SELECT SUM(value) FROM table1) + (SELECT SUM(value) FROM table2) + (SELECT SUM(value) FROM table3) as total_sum
But when at least one of the inner SUMs is NULL, the entire total (total_sum here) is NULL, so that's not very trustworthy.
When there is no value in a certain inner SUM query, I need it to return 0, so it doesn't affect the rest of the SUM.
To make it more clear, let's say I have the following 2 tables:
TABLE1:
ID | NAME | VALUE
1 Name1 1000
2 Name2 2000
3 Name3 3000
TABLE2:
ID | NAME | VALUE
1 Name1 1500
2 Name2 2500
3 Name3 3500
In the end, the query I need should return a single value, 13500, which is the total sum of all the values in the VALUE column of all the tables here.
All the other columns are irrelevant to this query, and I don't even care much about performance in this case.
You can achieve it using COALESCE as follows:
SELECT
(SELECT coalesce(SUM(value),0) FROM table1) +
(SELECT coalesce(SUM(value),0) FROM table2) +
(SELECT coalesce(SUM(value),0) FROM table3) as total_sum
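To make the NULL problem concrete, the sketch below builds the question's TABLE1 and TABLE2 in SQLite (via Python's sqlite3, as a stand-in database) and leaves a hypothetical table3 empty. The plain sum of scalar subqueries comes back as NULL (None in Python), while the COALESCE version returns 13500.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2", "table3"):
    conn.execute(f"CREATE TABLE {name} (ID INTEGER, NAME TEXT, VALUE INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, "Name1", 1000), (2, "Name2", 2000), (3, "Name3", 3000)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(1, "Name1", 1500), (2, "Name2", 2500), (3, "Name3", 3500)])
# table3 stays empty, so SUM(value) over it is NULL.

(naive,) = conn.execute(
    "SELECT (SELECT SUM(value) FROM table1) + (SELECT SUM(value) FROM table2)"
    " + (SELECT SUM(value) FROM table3)"
).fetchone()

(fixed,) = conn.execute(
    "SELECT (SELECT COALESCE(SUM(value), 0) FROM table1)"
    " + (SELECT COALESCE(SUM(value), 0) FROM table2)"
    " + (SELECT COALESCE(SUM(value), 0) FROM table3)"
).fetchone()

print(naive, fixed)  # None 13500
```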
Another approach is to use UNION ALL to merge all values into a single derived table:
select coalesce(sum(a.value), 0) as total_sum from
(select value from table1
union all
select value from table2
union all
select value from table3) a;
You can use the ISNULL function (SQL Server) to take care of the NULLs:
SELECT ISNULL((SELECT SUM(value) FROM table1), 0)
     + ISNULL((SELECT SUM(value) FROM table2), 0)
     + ISNULL((SELECT SUM(value) FROM table3), 0) AS total_sum;
You could simply sum all of them:
select sum(total) as Total
from (
select sum(value) as total from Table1
union all
select sum(value) as total from Table2
union all
select sum(value) as total from Table3
) t;
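The sum-of-sums form works because the outer SUM skips a NULL partial sum from an empty table. One edge case remains: if every table is empty, the outer SUM is NULL too. The sketch below (Python's sqlite3 as a stand-in database, with the question's values and an empty Table3) shows both situations and wraps the query in COALESCE for the all-empty case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("Table1", "Table2", "Table3"):
    conn.execute(f"CREATE TABLE {name} (value INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(1000,), (2000,), (3000,)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(1500,), (2500,), (3500,)])
# Table3 is left empty: its inner SUM is NULL, which the outer SUM ignores.

QUERY = """
    SELECT SUM(total) AS Total FROM (
        SELECT SUM(value) AS total FROM Table1
        UNION ALL
        SELECT SUM(value) AS total FROM Table2
        UNION ALL
        SELECT SUM(value) AS total FROM Table3
    ) t
"""
(total,) = conn.execute(QUERY).fetchone()
print(total)  # 13500

# Empty every table: all partial sums are NULL, so the outer SUM is NULL.
for name in ("Table1", "Table2"):
    conn.execute(f"DELETE FROM {name}")
(all_empty,) = conn.execute(QUERY).fetchone()
(all_empty_fixed,) = conn.execute(f"SELECT COALESCE(({QUERY}), 0)").fetchone()
print(all_empty, all_empty_fixed)  # None 0
```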

Recursive Lag Column Calculation in SQL

I am trying to write a procedure that inserts calculated table data into another table.
The problem I have is that I need each row's calculated column to be influenced by the result of the previous row's calculated column. I tried to lag the calculation itself but this does not work!
Such as:
(MyMAX is a function I created that returns the higher of two values)
Id Product Model Column1 Column2
1 A 1 5 =MAX(Column1*2, Lag(Column2))
2 A 2 2 =MAX(Column1*2, Lag(Column2))
3 B 1 3 =MAX(Column1*2, Lag(Column2))
If I try the above in SQL:
SELECT
Column1,
MyMAX(Column1 * 2, LAG(Column2, 1, 0) OVER (PARTITION BY Product ORDER BY Model ASC)) AS Column2
FROM Source
...it says column2 is unknown.
Output I get if I LAG the Column1 * 2 calculation instead:
SELECT Id, Column1, MyMAX(Column1 * 2, LAG(Column1 * 2, 1, 0) OVER (PARTITION BY Product ORDER BY Model ASC)) AS Column2 FROM Source
Id Column1 Column2
1 5 10
2 2 10
3 3 6
Why 6 on row 3? Because 3*2 > 2*2.
Output that I want:
Id Column1 Column2
1 5 10
2 2 10
3 3 10
Why 10 on row 3? Because the previous result, 10, is greater than 3*2.
The problem is I can't lag the result of Column2 - I can only lag other columns or calculations of them!
Is there a technique for achieving this with LAG, or must I use a recursive CTE? I read that LAG came after CTEs, so I assumed it would be possible. If not, what would this CTE look like?
Edit: Alternatively, what else could I do to resolve this calculation?
Edit
In hindsight, this problem is a running partitioned maximum over Column1 * 2. It can be done as simply as:
SELECT Id, Column1, Model, Product,
MAX(Column1 * 2) OVER (Partition BY Model, Product Order BY ID ASC) AS Column2
FROM Table1;
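A sketch of the running-maximum idea on the question's three sample rows, using Python's sqlite3 as a stand-in database (SQLite 3.25+ assumed for window functions). The partition columns are left out here because the asker's expected output (10, 10, 10) treats all three rows as one sequence; the explicit ROWS frame spells out "every row from the start up to the current one".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Product TEXT, Model INTEGER, Column1 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 5), (2, "A", 2, 2), (3, "B", 1, 3)],
)

# Running MAX of Column1 * 2 in Id order.
rows = conn.execute(
    """
    SELECT Id, Column1,
           MAX(Column1 * 2) OVER (ORDER BY Id
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Column2
    FROM Table1
    ORDER BY Id
    """
).fetchall()
print(rows)  # [(1, 5, 10), (2, 2, 10), (3, 3, 10)]
```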
Original Answer
Here's a way to do this with a recursive CTE, without LAG at all, by joining on incrementing row numbers. I haven't assumed that your Id is contiguous, hence the additional ROW_NUMBER(). You haven't mentioned any partitioning, so I haven't applied any. The query simply starts at the first row and then projects the greater of the current Column1 * 2 and the preceding Column2.
WITH IncrementingRowNums AS
(
SELECT Id, Column1, Column1 * 2 AS Column2,
ROW_NUMBER() OVER (Order BY ID ASC) AS RowNum
FROM Table1
),
lagged AS
(
SELECT Id, Column1, Column2, RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.Id, i.Column1,
CASE WHEN (i.Column2 > l.Column2)
THEN i.Column2
ELSE l.Column2
END,
i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
)
SELECT Id, Column1, Column2
FROM lagged;
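The recursive CTE can be tried on the question's sample data with Python's sqlite3 as a stand-in database. It is written with WITH RECURSIVE, which SQLite and PostgreSQL accept (SQL Server omits that keyword), and assumes SQLite 3.25+ because the non-recursive IncrementingRowNums CTE uses ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Product TEXT, Model INTEGER, Column1 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 5), (2, "A", 2, 2), (3, "B", 1, 3)],
)

rows = conn.execute(
    """
    WITH RECURSIVE IncrementingRowNums AS (
        SELECT Id, Column1, Column1 * 2 AS Column2,
               ROW_NUMBER() OVER (ORDER BY Id) AS RowNum
        FROM Table1
    ),
    lagged(Id, Column1, Column2, RowNum) AS (
        -- Anchor: the first row keeps its own Column1 * 2.
        SELECT Id, Column1, Column2, RowNum FROM IncrementingRowNums WHERE RowNum = 1
        UNION ALL
        -- Step: carry forward the larger of this row's value and the previous result.
        SELECT i.Id, i.Column1,
               CASE WHEN i.Column2 > l.Column2 THEN i.Column2 ELSE l.Column2 END,
               i.RowNum
        FROM IncrementingRowNums i
        JOIN lagged l ON i.RowNum = l.RowNum + 1
    )
    SELECT Id, Column1, Column2 FROM lagged ORDER BY Id
    """
).fetchall()
print(rows)  # [(1, 5, 10), (2, 2, 10), (3, 3, 10)]
```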
Edit, Re Partitions
Partitioning works much the same way: carry the Model and Product columns through, partition the row numbering by them (i.e. restart at 1 each time Product or Model changes), and include them in the CTE JOIN condition and in the final ordering.
WITH IncrementingRowNums AS
(
SELECT Id, Column1, Column1 * 2 AS Column2, Model, Product,
ROW_NUMBER() OVER (Partition BY Model, Product Order BY ID ASC) AS RowNum
FROM Table1
),
lagged AS
(
SELECT Id, Column1, Column2, Model, Product, RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.Id, i.Column1,
CASE WHEN (i.Column2 > l.Column2)
THEN i.Column2
ELSE l.Column2
END,
i.Model, i.Product,
i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
AND i.Model = l.Model AND i.Product = l.Product
)
SELECT Id, Column1, Column2, Model, Product
FROM lagged
ORDER BY Model, Product, Id;

GROUP BY without aggregate function

I am trying to understand GROUP BY without an aggregate function (I am new to Oracle DBMS).
How does it operate?
Here is what I have tried.
This is the EMP table on which I will run my SQL.
SELECT ename , sal
FROM emp
GROUP BY ename , sal
SELECT ename , sal
FROM emp
GROUP BY ename;
Result
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 397 Column: 16
SELECT ename , sal
FROM emp
GROUP BY sal;
Result
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 411 Column: 8
SELECT empno , ename , sal
FROM emp
GROUP BY sal , ename;
Result
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 425 Column: 8
SELECT empno , ename , sal
FROM emp
GROUP BY empno , ename , sal;
So basically the number of selected columns has to be equal to the number of columns in the GROUP BY clause, but I still do not understand why, or what is going on.
That's how GROUP BY works. It takes several rows and turns them into one row. Because of this, it has to know what to do with all the combined rows where they have different values for some columns (fields). This is why you have two options for every field you want to SELECT: either include it in the GROUP BY clause, or use it in an aggregate function so the system knows how you want to combine the field.
For example, let's say you have this table:
Name | OrderNumber
------------------
John | 1
John | 2
If you say GROUP BY Name, how will it know which OrderNumber to show in the result? So you either include OrderNumber in the GROUP BY, which results in the two rows above, or you use an aggregate function to say how to handle the OrderNumbers. For example, MAX(OrderNumber) means the result is John | 2, and SUM(OrderNumber) means the result is John | 3.
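The John example can be run as a small sketch with Python's sqlite3 (the table name orders is hypothetical). One caveat: SQLite is more permissive than Oracle and accepts a bare, non-grouped column, silently taking its value from some row of the group, which is exactly the ambiguity ORA-00979 guards against; so the sketch only runs the well-defined forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Name TEXT, OrderNumber INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("John", 1), ("John", 2)])

# Option 1: put OrderNumber in the GROUP BY, so both rows stay distinct.
by_both = conn.execute(
    "SELECT Name, OrderNumber FROM orders GROUP BY Name, OrderNumber ORDER BY OrderNumber"
).fetchall()

# Option 2: tell the database how to combine OrderNumber within each Name group.
with_max = conn.execute("SELECT Name, MAX(OrderNumber) FROM orders GROUP BY Name").fetchall()
with_sum = conn.execute("SELECT Name, SUM(OrderNumber) FROM orders GROUP BY Name").fetchall()

print(by_both, with_max, with_sum)
# [('John', 1), ('John', 2)] [('John', 2)] [('John', 3)]
```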
Given this data:
Col1 Col2 Col3
A X 1
A Y 2
A Y 3
B X 0
B Y 3
B Z 1
This query:
SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2, Col3
Would result in exactly the same table.
However, this query:
SELECT Col1, Col2 FROM data GROUP BY Col1, Col2
Would result in:
Col1 Col2
A X
A Y
B X
B Y
B Z
Now, a query:
SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2
Would create a problem: the line with A, Y is the result of grouping the two lines
A Y 2
A Y 3
So, which value should be in Col3, '2' or '3'?
Normally you would use a GROUP BY to calculate e.g. a sum:
SELECT Col1, Col2, SUM(Col3) FROM data GROUP BY Col1, Col2
So the line we had a problem with now gets (2+3) = 5.
Grouping by all the columns in your SELECT is effectively the same as using DISTINCT, and in this case it is preferable to use the DISTINCT keyword for readability.
So instead of
SELECT Col1, Col2, Col3 FROM data GROUP BY Col1, Col2, Col3
use
SELECT DISTINCT Col1, Col2, Col3 FROM data
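A small check of the two claims above, using the Col1/Col2/Col3 sample data in SQLite via Python's sqlite3: grouping by every selected column yields the same row set as SELECT DISTINCT, and adding SUM(Col3) resolves the ambiguous (A, Y) group to 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Col1 TEXT, Col2 TEXT, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("A", "X", 1), ("A", "Y", 2), ("A", "Y", 3),
     ("B", "X", 0), ("B", "Y", 3), ("B", "Z", 1)],
)

# Compare as sets, since neither query specifies an ORDER BY.
grouped = set(conn.execute("SELECT Col1, Col2 FROM data GROUP BY Col1, Col2"))
distinct = set(conn.execute("SELECT DISTINCT Col1, Col2 FROM data"))

# Aggregating Col3 gives each (Col1, Col2) group a single, well-defined value.
sums = {
    (c1, c2): total
    for c1, c2, total in conn.execute(
        "SELECT Col1, Col2, SUM(Col3) FROM data GROUP BY Col1, Col2"
    )
}
print(grouped == distinct, len(grouped), sums[("A", "Y")])  # True 5 5
```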
You're experiencing a strict requirement of the GROUP BY clause. Every column not in the group-by clause must have a function applied to reduce all records for the matching "group" to a single record (sum, max, min, etc).
If you list all queried (selected) columns in the GROUP BY clause, you are essentially requesting that duplicate records be excluded from the result set. That gives the same effect as SELECT DISTINCT which also eliminates duplicate rows from the result set.
The only real use case for GROUP BY without aggregation is when you GROUP BY more columns than are selected, in which case the selected columns might be repeated. Otherwise you might as well use a DISTINCT.
It's worth noting that other RDBMSs do not require that all non-aggregated columns be included in the GROUP BY. For example, in PostgreSQL, if the primary key columns of a table are included in the GROUP BY, then the other columns of that table need not be, as they are guaranteed to be distinct for every distinct primary key. I've wished in the past that Oracle did the same, as it would have made for more compact SQL in many cases.
Let me give some examples.
Consider this data.
CREATE TABLE DATASET ( VAL1 CHAR ( 1 CHAR ),
VAL2 VARCHAR2 ( 10 CHAR ),
VAL3 NUMBER );
INSERT INTO
DATASET ( VAL1, VAL2, VAL3 )
VALUES
( 'b', 'b-details', 2 );
INSERT INTO
DATASET ( VAL1, VAL2, VAL3 )
VALUES
( 'a', 'a-details', 1 );
INSERT INTO
DATASET ( VAL1, VAL2, VAL3 )
VALUES
( 'c', 'c-details', 3 );
INSERT INTO
DATASET ( VAL1, VAL2, VAL3 )
VALUES
( 'a', 'dup', 4 );
INSERT INTO
DATASET ( VAL1, VAL2, VAL3 )
VALUES
( 'c', 'c-details', 5 );
COMMIT;
What's in the table now:
SELECT * FROM DATASET;
VAL1 VAL2 VAL3
---- ---------- ----------
b b-details 2
a a-details 1
c c-details 3
a dup 4
c c-details 5
5 rows selected.
--aggregate with group by
SELECT
VAL1,
COUNT ( * )
FROM
DATASET A
GROUP BY
VAL1;
VAL1 COUNT(*)
---- ----------
b 1
a 2
c 2
3 rows selected.
--aggregate with group by multiple columns but select partial column
SELECT
VAL1,
COUNT ( * )
FROM
DATASET A
GROUP BY
VAL1,
VAL2;
VAL1 COUNT(*)
---- ----------
b 1
c 2
a 1
a 1
4 rows selected.
--No aggregate with group by multiple columns
SELECT
VAL1,
VAL2
FROM
DATASET A
GROUP BY
VAL1,
VAL2;
VAL1 VAL2
---- ----------
b b-details
c c-details
a dup
a a-details
4 rows selected.
--No aggregate with group by multiple columns
SELECT
VAL1
FROM
DATASET A
GROUP BY
VAL1,
VAL2;
VAL1
----
b
c
a
a
4 rows selected.
If you have N columns in the SELECT (excluding aggregates), then you should have N or N+x columns in the GROUP BY.
Use a subquery, e.g.:
SELECT field1,field2,(SELECT distinct field3 FROM tbl2 WHERE criteria) AS field3
FROM tbl1 GROUP BY field1,field2
OR
SELECT DISTINCT field1,field2,(SELECT distinct field3 FROM tbl2 WHERE criteria) AS field3
FROM tbl1
If you have a column in the SELECT clause, how will it choose a value for it if there are several rows? So yes, every column in the SELECT clause should also be in the GROUP BY clause; you can use aggregate functions in the SELECT...
You can have a column in the GROUP BY clause that is not in the SELECT clause, but not the other way around.
As an addition
basically the number of columns have to be equal to the number of columns in the GROUP BY clause
is not a correct statement.
Any attribute that is not part of the GROUP BY clause cannot be used for selection (except inside an aggregate function).
Any attribute that is part of the GROUP BY clause can be used for selection, but it is not mandatory.
For anyone trying to group data (from foreign tables, for example) like a JSON object with nested arrays of data, you can achieve this in SQL (PostgreSQL) with array_agg; you can also use it in conjunction with json_build_object to create a JSON object with key-value pairs.
As a reference, I found this video on YouTube helpful: https://www.youtube.com/watch?v=A6N1h9mcJf4
-- Edit
If you want a nested array inside a nested array, you can do it by using the ARRAY() constructor.
In the following example, 'variation_images' (subquery 2 - in relation to the variation table) are nested under the 'variation' query (subquery 1 - in relation to product table) which is nested under the product query (main query):
SELECT product.title, product.slug, product.description,
ARRAY(SELECT jsonb_build_object(
'var_id', variation.id, 'var_name', variation.name, 'images',
ARRAY(SELECT json_build_object('img_url', variation_images.images)
FROM variation_images WHERE variation_images.variation_id = variation.id)
)
FROM variation WHERE variation.product_id = product.id)
FROM product
I know you said you want to understand GROUP BY, but if you have data like this:
COL-A COL-B COL-C COL-D
1 Ac C1 D1
2 Bd C2 D2
3 Ba C1 D3
4 Ab C1 D4
5 C C2 D5
And you want to make the data appear like:
COL-A COL-B COL-C COL-D
4 Ab C1 D4
1 Ac C1 D1
3 Ba C1 D3
2 Bd C2 D2
5 C C2 D5
You use:
select * from table_name
order by "COL-C", "COL-B"
Because I think this is what you intend to do.

How to select distinct rows with a specified condition

Suppose there is a table:
_ _
a 1
a 2
b 2
c 3
c 4
c 1
d 2
e 5
e 6
How can I select the distinct minimum value for each group?
So the expected result here is:
_ _
a 1
b 2
c 1
d 2
e 5
EDIT
My actual table contains more columns and I want to select them all. The rows differ only in the last column (the second one in the example). I'm new to SQL, and possibly my question was ill-formed in its initial version.
The actual schema is:
| day | currency ('EUR', 'USD') | diff (integer) | id (foreign key) |
There are duplicate pairs (day, currency) that differ in (diff, id). I want to see a table with unique pairs (day, currency) and the minimum diff from the original table.
Thanks!
In your case it's as simple as this:
select column1, min(column2) as column2
from table
group by column1
For more than two columns, I can suggest this:
select top 1 with ties
t.column1, t.column2, t.column3
from table as t
order by row_number() over (partition by t.column1 order by t.column2)
Take a look at this post: https://stackoverflow.com/a/13652861/1744834
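The two-column GROUP BY with MIN from this answer, checked against the question's sample rows in SQLite via Python's sqlite3 (the table name t is a stand-in, since table is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 2), ("c", 3), ("c", 4), ("c", 1),
     ("d", 2), ("e", 5), ("e", 6)],
)

# One row per column1 value, carrying the smallest column2 in that group.
rows = conn.execute(
    "SELECT column1, MIN(column2) AS column2 FROM t GROUP BY column1 ORDER BY column1"
).fetchall()
print(rows)  # [('a', 1), ('b', 2), ('c', 1), ('d', 2), ('e', 5)]
```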
You can use the ranking function ROW_NUMBER() with a CTE to do this. In particular, if there are more columns than these two, it will give you distinct rows, like so:
;WITH RankedCTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY column1 ORDER BY column2) rownum
FROM Table
)
SELECT column1, column2
FROM RankedCTE
WHERE rownum = 1;
This will give you:
COLUMN1 COLUMN2
a 1
b 2
c 1
d 2
e 5
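The RankedCTE pattern carries over to the asker's real schema (day, currency, diff, id), where the whole row with the smallest diff per (day, currency) is wanted. The sketch uses Python's sqlite3 as a stand-in (SQLite 3.25+ assumed for window functions); the table name rates and its rows are hypothetical. Note that ROW_NUMBER() keeps exactly one row even when two rows tie on diff; RANK() would keep all tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (day TEXT, currency TEXT, diff INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", "EUR", 3, 10),
        ("2020-01-01", "EUR", 1, 11),
        ("2020-01-01", "USD", 2, 12),
        ("2020-01-02", "EUR", 4, 13),
        ("2020-01-02", "EUR", 7, 14),
    ],
)

# Number the rows inside each (day, currency) pair by ascending diff,
# then keep only the first one, with all of its columns.
rows = conn.execute(
    """
    WITH RankedCTE AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY day, currency ORDER BY diff) AS rownum
        FROM rates
    )
    SELECT day, currency, diff, id
    FROM RankedCTE
    WHERE rownum = 1
    ORDER BY day, currency
    """
).fetchall()
print(rows)
# [('2020-01-01', 'EUR', 1, 11), ('2020-01-01', 'USD', 2, 12), ('2020-01-02', 'EUR', 4, 13)]
```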
SELECT ColOne, Min(ColTwo)
FROM Table
GROUP BY ColOne
ORDER BY ColOne
PS: I'm not in front of a machine, but please give the above a try.
select MIN(col2),col1
from dbo.Table_1
group by col1