Summing columns in two tables then joining tables - sql

I have two tables set up in the same way, but with different values. Here are samples from each:
Table1:
Date      Code  Count
1/1/2015  AA    4
1/3/2015  AA    2
1/1/2015  AB    3
Table2:
Date      Code  Count
1/1/2015  AA    1
1/2/2015  AA    0
1/4/2015  AB    2
I would like the result table to contain all unique date-code pairs, with any duplicates between the tables having the counts of the two summed.
Output_Table:
Date      Code  Count
1/1/2015  AA    5      /* Summed because found in Table1 and Table2 */
1/2/2015  AA    0
1/3/2015  AA    2
1/1/2015  AB    3
1/4/2015  AB    2
I have no primary key to connect the two tables, and the joins I have tried have either not kept all distinct date-code pairs or have created duplicates.
For reference, I am doing this inside a SAS proc sql statement.

I'm on the road at the moment so I haven't run this, but:
SELECT date,
       code,
       SUM([count])
FROM ( SELECT *
       FROM table1
       UNION ALL
       SELECT *
       FROM table2
     ) [tables]
GROUP BY date,
         code
ORDER BY date,
         code
Will do the trick. I'll have a crack at the join version and edit this post when I get in front of a proper computer.
EDIT:
Full outer joins and COALESCE will also do it, although it is marginally slower, so it may depend on what else you have going on there!
SELECT COALESCE(#table1.date, #table2.date),
       COALESCE(#table1.code, #table2.code),
       ISNULL(#table1.COUNT, 0) + ISNULL(#table2.COUNT, 0)
FROM #table1
FULL OUTER JOIN #table2 ON #table2.code = #table1.code
                       AND #table2.date = #table1.date
ORDER BY COALESCE(#table1.date, #table2.date),
         COALESCE(#table1.code, #table2.code)
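Since the question mentions SAS PROC SQL, here is a minimal, untested sketch of how the UNION ALL version might be wrapped in a PROC SQL step. It assumes the tables really are called table1 and table2 in the active library and that the columns are named date, code and count:
proc sql;
    /* stack the two tables, then sum count per date/code pair */
    create table output_table as
    select date, code, sum(count) as count
    from (
        select date, code, count from table1
        union all
        select date, code, count from table2
    ) as t
    group by date, code
    order by date, code;
quit;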

Related

Selecting all rows from t1 joined to t2 on id, and selecting the lowest value from t2 (or null if there isn't any)

I have two tables, one holds some categories and the other holds players' records like so:
Categories          Times
id  Name            id  UserId  MapId  CategoryId  Time
1   cat1            1   1       1      1           1500
2   cat2            2   3       1      2           3000
3   cat3            3   13      1      3           2500
4   cat4            4   12      1      4           1500
5   cat5            5   11      1      4           1000
I want to select all the categories (id, name) and the lowest time on each category.
If there's no record on that category it should show NULL or 0.
This would be the expected result:
Result
id  Name  Time
1   cat1  1500
2   cat2  3000
3   cat3  2500
4   cat4  1000
5   cat5  0
I'm using the following query, but it only selects the categories that already have a record in Times. For example, it won't select 'cat5' because 'cat5' doesn't have any record in the Times table.
select t2.id, t2.Name, min(t1.Time) as Time
from Times t1
join Categories t2 on t2.id = t1.CategoryId
where t1.MapId = %MAPID%
group by t2.id
I recommend beginning your query with the categories table in this case, since your focus is on the data from that table, so you could write a LEFT JOIN. Furthermore, I think it's a good idea to replace NULL values with zero; that way the query would, for example, still find negative times as the lowest times, and it returns 0 when the lowest time is a NULL value.
Overall, this could be your goal:
SELECT c.id, c.name, MIN(COALESCE(t.time,0)) AS time
FROM categories c LEFT JOIN times t ON c.id = t.categoryid
GROUP BY c.id, c.name;
Here is a working example according to your sample data: db<>fiddle
There are likely other options to achieve your goal as well that you can try out.
I think you might just need to do a right join (because you want all rows from the 2nd table listed -- Categories). See if you get the desired results by changing line 3 to be:
right join Categories t2 on t2.id = t1.CategoryId
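Applied to the query from the question, that change would look roughly like this untested sketch. Note that the %MAPID% filter is moved into the join condition here; if it stays in the WHERE clause it throws away the categories that have no Times rows and effectively turns the right join back into an inner join (also, this version returns NULL rather than 0 for such categories):
select t2.id, t2.Name, min(t1.Time) as Time
from Times t1
right join Categories t2
  on t2.id = t1.CategoryId
 and t1.MapId = %MAPID%   -- filter moved into the join so unmatched categories are kept
group by t2.id, t2.Name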

How to retrieve historical data based on condition on one row?

I have a table historical_data
ID  Date        column_a  column_b
1   2011-10-01  a         a1
1   2011-11-01  w         w1
1   2011-09-01  a         a1
2   2011-01-12  q         q1
2   2011-02-01  d         d1
3   2011-11-01  s         s1
I need to retrieve the whole history of an ID when the date condition matches on any one row related to that ID. For example, date >= '2011-11-01' should get me:
ID  Date        column_a  column_b
1   2011-10-01  a         a1
1   2011-11-01  w         w1
1   2011-09-01  a         a1
3   2011-11-01  s         s1
I am aware you can get this by using a CTE or a subquery like
with selected_id as (
select id from historical_data where date>='2011-11-01'
)
select hd.* from historical_data hd
inner join selected_id si on hd.id = si.id
or
select * from historical_data
where id in (select id from historical_data where date>='2011-11-01')
In both these methods I have to query/scan the table historical_data twice.
I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.
The table above is a sample table, the table I have is about to touch 1TB in size with upwards of 600M rows.
Is there any way to achieve this by only querying the table once? (I am using Snowflake)
Using QUALIFY:
SELECT *
FROM historical_data
QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;
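QUALIFY filters on the result of the window function after it has been evaluated, so the table is only scanned once. If you ever need the same logic on an engine without QUALIFY, a rough, untested equivalent is to compute the window function in a derived table and filter on it:
SELECT id, date, column_a, column_b
FROM (
    SELECT hd.*,
           MAX(date) OVER (PARTITION BY id) AS max_date  -- latest date per id
    FROM historical_data hd
) t
WHERE max_date >= '2011-11-01'::DATE;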

Getting the most recent data from a dataset based on a date field

This seems easy, but I can't get it. Assume this dataset:
ID  SID  AddDate
1   123  1/1/2014
2   123  2/3/2015
3   123  1/4/2010
4   124
5   124
6   125  2/3/2012
7   126  2/2/2012
8   126  2/2/2011
9   126  2/2/2011
What I need is the most recent AddDate and the associated ID for each SID.
So, my dataset should return IDs 2, 5, 6 and 7
I tried doing a max(AddDate), but it won't give me the proper ID that's associated with it.
My SQL string:
SELECT First(Table1.ID) AS FirstOfID, Table1.SID, Max(Table1.AddDate) AS MaxOfAddDate
FROM Table1
GROUP BY Table1.SID;
You can use a subquery that returns the maximum AddDate for each SID, then join this subquery back to the dataset table:
SELECT MAX(ID)
FROM ds
INNER JOIN (
    SELECT SID, MAX(AddDate) AS MaxAddDate
    FROM ds
    GROUP BY SID
) mx ON ds.SID = mx.SID
    AND (ds.AddDate = mx.MaxAddDate OR mx.MaxAddDate IS NULL)
GROUP BY ds.SID
The join still has to succeed when MaxAddDate is NULL (i.e. there is no AddDate), and in case there are multiple IDs that match, it looks like you want the biggest one.
You can change your query to get the grouping first and then perform a JOIN, like:
SELECT First(Table1.ID) AS FirstOfID,
       Table1.SID, xx.MaxOfAddDate
FROM Table1 INNER JOIN (
     SELECT SID, Max(AddDate) AS MaxOfAddDate
     FROM Table1
     GROUP BY SID) xx
  ON Table1.SID = xx.SID AND Table1.AddDate = xx.MaxOfAddDate
GROUP BY Table1.SID, xx.MaxOfAddDate;
Try
select SID from Table1 as t
where addDate in (select max(addDate) from Table1 where SID = t.SID)

merging two tables and adding additional column

I am using SQL Server. I have two tables (simple snapshot below).
table hlds            table bench
name  country  wgt    name  country  wgt
abc   us       30     abc   us       40
mno   uk       50     ppp   fr       45
xyz   us       20     xyz   us       15
What I would like to do is calculate the differences in the wgt columns and insert the results into another table, let's call it merge_tbl. The other thing I would like is for merge_tbl to have a bit column that is 1 if the company exists in the hlds table.
So I would like the result to look like the below:
merge_tbl
name  country  wgt   inHld
abc   us       -10   1
mno   uk       50    1
xyz   us       5     1
ppp   fr       -45   0
How do I go about doing this?
I think you need a FULL OUTER JOIN to get records from both tables. Then, you can use an INSERT INTO ... SELECT statement to do the insert:
INSERT INTO merge_tbl
SELECT COALESCE(h.name, b.name) AS name,
       COALESCE(h.country, b.country) AS country,
       COALESCE(h.wgt, 0) - COALESCE(b.wgt, 0) AS wgt,
       CASE WHEN h.name IS NOT NULL THEN 1
            ELSE 0
       END AS inHld
FROM hlds AS h
FULL OUTER JOIN bench AS b ON h.name = b.name AND h.country = b.country
The ON clause of the JOIN operation depends on your actual requirements. I have made the assumption that records from the hlds and bench tables match if both the name and country fields are equal.
Demo here

Merge two tables with summing up values

I have two tables,
Table1:
Key, Value
----------
1 10
2 20
3 30
Table2:
Key, Value
----------
3 30
4 40
5 50
How can I merge them and get the following using SQL?
Desired output:
Key, Value
----------
1 10
2 20
3 60
4 40
5 50
In Python terminology, I guess it is called summing up two dictionaries.
PS: I am using Oracle 12c.
Use UNION ALL to combine the two select queries and sum the value with a GROUP BY on the key column:
select key, sum(value) value
from
(
    select key, value from table1
    union all
    select key, value from table2
) a
group by key
Or use a FULL OUTER JOIN:
SELECT COALESCE(a.key, b.key),
       COALESCE(a.value, 0) + COALESCE(b.value, 0)
FROM table1 a
FULL OUTER JOIN table2 b
  ON a.key = b.key
To be more efficient in terms of speed, avoid using UNION/UNION ALL as much as possible. Try my answer below:
SELECT
    NVL(A.Key, B.Key) AS Key,
    NVL(A.Value, 0) + NVL(B.Value, 0) AS Value
FROM Table1 A
FULL JOIN Table2 B ON A.Key = B.Key