SQL inner join with filtering

I have 2 tables as follows:
Table1:
ID Date
1 2022-01-01
2 2022-02-01
3 2022-02-05
Table2
ID Date Amount
1 2021-08-01 15
1 2022-02-10 15
2 2022-02-15 20
2 2021-01-01 15
2 2022-02-20 20
1 2022-03-01 15
I want to select only those rows in Table2 whose Date is past the Date for the same ID in Table1, and then, for each ID, calculate the sum of Amount and max(Date) over that subset.
So the result would look like
ID Date Amount
1 2022-03-01 30
2 2022-02-20 40
SQL newbie here... I tried an inner join, but wasn't able to apply the date filter along with it.
Tried query:
with table1 as (select * from table1)
,table2 as (select * from table2)
select * from table1 a
inner join table2 b on (a.id=b.id)
Thanks!

Much like Paul, I would use a JOIN, but I would put the filter clauses in the ON, so if you later join more tables it is clearer, to both the SQL optimizer and the reader, what the intent is on a per-table/join basis. I would also alias every table and qualify every column with its alias, so there is no room for confusion about where a value comes from; as a habit this makes life easier when composing more complex SQL or pasting fragments into bigger blocks of code.
So, with some CTEs for the data:
WITH table1(id, date) AS (
    SELECT * FROM VALUES
        (1, '2022-01-01'::date),
        (2, '2022-02-01'::date),
        (3, '2022-02-05'::date)
), table2(id, date, amount) AS (
    SELECT * FROM VALUES
        (1, '2021-08-01'::date, 15),
        (1, '2022-02-10'::date, 15),
        (2, '2022-02-15'::date, 20),
        (2, '2021-01-01'::date, 15),
        (2, '2022-02-20'::date, 20),
        (1, '2022-03-01'::date, 15)
)
The following SQL:
SELECT a.id,
       max(b.date) as max_date,
       sum(b.amount) as sum_amount
FROM table1 AS a
JOIN table2 AS b
  ON a.id = b.id AND a.date <= b.date
GROUP BY 1
ORDER BY 1;
ID  MAX_DATE    SUM_AMOUNT
1   2022-03-01  30
2   2022-02-20  40
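The query above is written for Snowflake, but the JOIN-plus-GROUP BY logic is portable; here is a minimal sketch of the same query run against SQLite via Python's sqlite3 module (dates kept as ISO-8601 text, which compares correctly as strings):

```python
# Sanity check of the JOIN + GROUP BY approach, run against SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER, date TEXT);
CREATE TABLE table2 (id INTEGER, date TEXT, amount INTEGER);
INSERT INTO table1 VALUES (1,'2022-01-01'), (2,'2022-02-01'), (3,'2022-02-05');
INSERT INTO table2 VALUES
  (1,'2021-08-01',15), (1,'2022-02-10',15), (2,'2022-02-15',20),
  (2,'2021-01-01',15), (2,'2022-02-20',20), (1,'2022-03-01',15);
""")
rows = con.execute("""
    SELECT a.id, MAX(b.date) AS max_date, SUM(b.amount) AS sum_amount
    FROM table1 AS a
    JOIN table2 AS b
      ON a.id = b.id AND a.date <= b.date
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
# ID 3 has no qualifying rows in table2, so the inner join drops it.
print(rows)  # [(1, '2022-03-01', 30), (2, '2022-02-20', 40)]
```

If you wanted ID 3 to appear with NULLs, a LEFT JOIN in place of the JOIN would keep it.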

Here is how I would do this with Snowflake:
--create the tables and load data
--table1
CREATE TABLE TABLE1 (ID NUMBER, DATE DATE);
INSERT INTO TABLE1 VALUES (1, '2022-01-01');
INSERT INTO TABLE1 VALUES (2 , '2022-02-01');
INSERT INTO TABLE1 VALUES (3 , '2022-02-05');
--table 2
CREATE TABLE TABLE2 (ID NUMBER, DATE DATE, AMOUNT NUMBER);
INSERT INTO TABLE2 VALUES(1, '2021-08-01', 15);
INSERT INTO TABLE2 VALUES(1, '2022-02-10', 15);
INSERT INTO TABLE2 VALUES(2, '2022-02-15', 20);
INSERT INTO TABLE2 VALUES(2, '2021-01-01', 15);
INSERT INTO TABLE2 VALUES(2, '2022-02-20', 20);
INSERT INTO TABLE2 VALUES(1, '2022-03-01', 15);
Now obtain the data using a select:
SELECT TABLE1.ID, MAX(TABLE2.DATE), SUM(AMOUNT)
FROM TABLE1, TABLE2
WHERE TABLE1.ID = TABLE2.ID
AND TABLE1.DATE < TABLE2.DATE
GROUP BY TABLE1.ID
Results
ID  MAX(TABLE2.DATE)  SUM(AMOUNT)
1   2022-03-01        30
2   2022-02-20        40

Not personally familiar with Snowflake but a standard SQL query that should work would be:
select id, Max(date) Date, Sum(Amount) Amount
from Table2 t2
where exists (
select * from Table1 t1
where t1.Id = t2.Id and t1.Date < t2.Date
)
group by Id;
Note that because you only require data from Table2, an EXISTS is preferable to an inner join: it cannot multiply rows, and in almost all cases it will be more performant than a join, at worst the same.
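As a quick check that the EXISTS form produces the same result as the join, here is a sketch against SQLite with the question's data (Table1 has one row per id, so the two forms agree):

```python
# The EXISTS form of the query, checked against SQLite with the same data.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER, date TEXT);
CREATE TABLE table2 (id INTEGER, date TEXT, amount INTEGER);
INSERT INTO table1 VALUES (1,'2022-01-01'), (2,'2022-02-01'), (3,'2022-02-05');
INSERT INTO table2 VALUES
  (1,'2021-08-01',15), (1,'2022-02-10',15), (2,'2022-02-15',20),
  (2,'2021-01-01',15), (2,'2022-02-20',20), (1,'2022-03-01',15);
""")
rows = con.execute("""
    SELECT id, MAX(date) AS date, SUM(amount) AS amount
    FROM table2 t2
    WHERE EXISTS (SELECT 1 FROM table1 t1
                  WHERE t1.id = t2.id AND t1.date < t2.date)
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, '2022-03-01', 30), (2, '2022-02-20', 40)]
```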

Related

SQL Where In clause with multiple fields

I have a table as below.
id date value
1 2011-10-01 xx
1 2011-10-02 xx
...
1000000 2011-10-01 xx
Then I have 1000 ids, each associated with a date. I would like to perform something like the following:
SELECT id, date, value
FROM the table
WHERE (id, date) IN ((id1, <= date1), (id2, <= date2), (id1000, <= date1000))
What's the best way to achieve above query?
You didn't specify your DBMS, so this is standard SQL.
You could do something like this:
with list_of_dates (id, dt) as (
  values
    (1, date '2016-01-01'),
    (2, date '2016-01-02'),
    (3, date '2016-01-03')
)
select t.*
from the_table t
  join list_of_dates ld on t.id = ld.id and t.the_date <= ld.dt;
This assumes that you do not have duplicates in the list of dates.
Update - now that the DBMS has been disclosed.
For SQL Server you need to change that to:
with list_of_dates (id, dt) as (
  select 1, cast('20160101' as datetime) union all
  select 2, cast('20160102' as datetime) union all
  select 3, cast('20160103' as datetime)
)
select t.*
from the_table t
  join list_of_dates ld on t.id = ld.id and t.the_date <= ld.dt;
Since this info is known ahead of time, build a temp table of it and then join to it:
create table #test(id int, myDate date)
insert into #test(id,myDate) values
(1, '10/1/2016'),
(2, '10/2/2016'),
(3, '10/3/2016')
select a.id, a.date, a.value
from the_table as a
  inner join #test as b on a.id = b.id and a.date <= b.myDate
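Both answers boil down to joining a literal list of (id, date) pairs against the big table and filtering with <=. A minimal runnable sketch in SQLite (the table name the_table and its sample rows are invented for illustration):

```python
# Joining a literal (id, date) list against a table with a <= date filter.
# the_table and its rows are made-up sample data.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE the_table (id INTEGER, the_date TEXT, value TEXT);
INSERT INTO the_table VALUES
  (1,'2011-09-30','a'), (1,'2011-10-01','b'), (1,'2011-10-05','c'),
  (2,'2011-10-01','d'), (2,'2011-10-03','e');
""")
rows = con.execute("""
    WITH list_of_dates(id, dt) AS (
        VALUES (1,'2011-10-01'), (2,'2011-10-02')
    )
    SELECT t.id, t.the_date, t.value
    FROM the_table t
    JOIN list_of_dates ld ON t.id = ld.id AND t.the_date <= ld.dt
    ORDER BY t.id, t.the_date
""").fetchall()
print(rows)
# [(1, '2011-09-30', 'a'), (1, '2011-10-01', 'b'), (2, '2011-10-01', 'd')]
```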

Join the best record, if there is one, in Oracle

I have a fairly complex Oracle query, getting data from multiple tables. In one of the joins, I want the best record, if there is one. Therefore, a left outer join. There is a start date field, so for most records, getting the max start date will get me the best record. However, occasionally there are records that have the same start date. In that case, there is also a status field. However, the best status value is not a min or a max. '20' is best, '05' or '40' are ok, and '70' is worst. How can I set up the query to find the best option when multiple records are returned?
So, if I have the following data
Table1           Table2
ID  otherData    ID  date      status  otherData
1   stuffa       1   jan-1-13  20      stuff93
2   stuff3
3   stuff398     3   jan-2-13  20      stuff92
                 3   jan-2-13  70      stuff38
                 3   dec-3-12  20      stuff843
I will be able to query and get the following:
1 stuffa jan-1-13 20 stuff93
2 stuff3
3 stuff398 jan-2-13 20 stuff92
Right now, my query is as follows, which gets a second record 3 with the 70 status:
select *
from table1 t1
left outer join
(select *
from table2 t2a
where t2a.date = (select max(t2b.date)
from table2 t2b
where t2b.id = t2a.id)
) t2
on (t2.id = t1.id)
Is there a way to set an ordered enumeration or something like that within a select statement? Something like
rank() over (partition by status order by ('20','05','40','70')) rank
Add the status to the order by, like this:
select *
from (select t1.id, t1.otherdata otherdatat1, t2.date, t2.status, t2.otherdata otherdatat2,
rank() over (partition by t1.id order by t2.date desc,
case t2.status
when '20' then 1
when '05' then 2
when '40' then 3
when '70' then 4
else 5
end) rnk
from table1 t1
left outer join table2 t2
on t1.id = t2.id)
where rnk = 1;
If the ordered enumeration has only a few elements, you can use this:
........ order by
CASE status WHEN '20' THEN 1
WHEN '05' THEN 2
WHEN '40' THEN 3
WHEN '70' THEN 4
END) rank
You could do something like:
select t1.id, t1.otherdata, t2.dt, t2.status, t2.otherdata
from table1 t1
left outer join (
select t2a.*,
row_number() over (partition by id order by dt desc,
case status
when '20' then 1
when '05' then 2
when '40' then 3
when '70' then 4
else 5 end) as rn
from table2 t2a
) t2 on t2.id = t1.id and t2.rn = 1
order by t1.id;
This assumes you want a single row even if two records tie on both date and status; which of the tied records you get is indeterminate. If you wanted both, you could use rank() instead. Either way, you assign a rank to each record based on the date (descending, since you want the max) and your own ordering of the status values, and then only ever pick the highest-ranked record in the join condition.
With data set up as:
create table table1(id number, otherdata varchar2(10));
create table table2(id number, dt date, status varchar2(2), otherdata varchar2(10));
insert into table1 values(1, 'stuffa');
insert into table1 values(2, 'stuff3');
insert into table1 values(3, 'stuff398');
insert into table2 values(1, date '2013-01-01', '20', 'stuff93');
insert into table2 values(3, date '2013-01-02', '20', 'stuff92');
insert into table2 values(3, date '2013-01-02', '70', 'stuff38');
insert into table2 values(3, date '2012-12-03', '20', 'stuff843');
... this gives:
ID OTHERDATA DT STATUS OTHERDATA
---------- ---------- --------- ------ ----------
1 stuffa 01-JAN-13 20 stuff93
2 stuff3
3 stuff398 02-JAN-13 20 stuff92
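SQLite (3.25+) supports the same window functions, so the row_number() approach can be sketched and checked end to end (dates stored as ISO text rather than Oracle DATEs):

```python
# row_number() best-record pick, checked against SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER, otherdata TEXT);
CREATE TABLE table2 (id INTEGER, dt TEXT, status TEXT, otherdata TEXT);
INSERT INTO table1 VALUES (1,'stuffa'), (2,'stuff3'), (3,'stuff398');
INSERT INTO table2 VALUES
  (1,'2013-01-01','20','stuff93'),
  (3,'2013-01-02','20','stuff92'),
  (3,'2013-01-02','70','stuff38'),
  (3,'2012-12-03','20','stuff843');
""")
rows = con.execute("""
    SELECT t1.id, t1.otherdata, t2.dt, t2.status, t2.otherdata
    FROM table1 t1
    LEFT JOIN (
        SELECT t2a.*,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY dt DESC,
                            CASE status WHEN '20' THEN 1 WHEN '05' THEN 2
                                        WHEN '40' THEN 3 WHEN '70' THEN 4
                                        ELSE 5 END
               ) AS rn
        FROM table2 t2a
    ) t2 ON t2.id = t1.id AND t2.rn = 1
    ORDER BY t1.id
""").fetchall()
for r in rows:
    print(r)
# (1, 'stuffa', '2013-01-01', '20', 'stuff93')
# (2, 'stuff3', None, None, None)
# (3, 'stuff398', '2013-01-02', '20', 'stuff92')
```

Note how ID 2, which has no Table2 rows, survives the outer join with NULLs, matching the expected output in the question.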

How to get the closest dates in Oracle sql

For example, I have 2 time tables:
T1
id time
1 18:12:02
2 18:46:57
3 17:49:44
4 12:19:24
5 11:00:01
6 17:12:45
and T2
id time
1 18:13:02
2 17:46:57
I need to get the times from T1 that are the closest to each time in T2. There is no relationship between these tables.
It should be something like this:
select T1.calldatetime
from T1, T2
where T1.calldatetime between
T2.calldatetime-(
select MIN(ABS(T2.calldatetime-T1.calldatetime))
from T2, T1)
and
T2.calldatetime+(
select MIN(ABS(T2.calldatetime-T1.calldatetime))
from T2, T1)
But I can't get it. Any suggestions?
You only have to use a single Cartesian join to solve your problem, unlike the other solutions, which use multiple. I assume time is stored as a VARCHAR2; if it is stored as a DATE (which I would highly recommend), you can remove the TO_DATE functions, though you will then have to account for the date portions of the values.
I've made it slightly verbose so it's obvious what's going on.
select *
from ( select id, tm
, rank() over ( partition by t2id order by difference asc ) as rnk
from ( select t1.*, t2.id as t2id
, abs( to_date(t1.tm, 'hh24:mi:ss')
- to_date(t2.tm, 'hh24:mi:ss')) as difference
from t1
cross join t2
) a
)
where rnk = 1
Basically, this works out the absolute difference between every time in T1 and T2 then picks the smallest difference by T2 ID; returning the data from T1.
Here it is in SQL Fiddle format.
The less pretty (but shorter) format is:
select *
from ( select t1.*
, rank() over ( partition by t2.id
order by abs(to_date(t1.tm, 'hh24:mi:ss')
- to_date(t2.tm, 'hh24:mi:ss'))
) as rnk
from t1
cross join t2
) a
where rnk = 1
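The cross-join-plus-rank() idea can also be sketched in SQLite, using julianday() in place of Oracle's TO_DATE arithmetic (SQLite interprets a bare 'HH:MM:SS' string as a time of day):

```python
# Cross join + rank() closest-time query, using SQLite's julianday()
# in place of Oracle's TO_DATE arithmetic.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (id INTEGER, tm TEXT);
CREATE TABLE t2 (id INTEGER, tm TEXT);
INSERT INTO t1 VALUES (1,'18:12:02'), (2,'18:46:57'), (3,'17:49:44'),
                      (4,'12:19:24'), (5,'11:00:01'), (6,'17:12:45');
INSERT INTO t2 VALUES (1,'18:13:02'), (2,'17:46:57');
""")
rows = con.execute("""
    SELECT t2id, id, tm
    FROM (SELECT t1.id, t1.tm, t2.id AS t2id,
                 RANK() OVER (
                     PARTITION BY t2.id
                     ORDER BY abs(julianday(t1.tm) - julianday(t2.tm))
                 ) AS rnk
          FROM t1 CROSS JOIN t2)
    WHERE rnk = 1
    ORDER BY t2id
""").fetchall()
# For each T2 row: its id, then the id and time of the closest T1 row.
print(rows)  # [(1, 1, '18:12:02'), (2, 3, '17:49:44')]
```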
I believe this is the query you are looking for:
CREATE TABLE t1(id INTEGER, time DATE);
CREATE TABLE t2(id INTEGER, time DATE);
INSERT INTO t1 VALUES (1, TO_DATE ('18:12:02', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (2, TO_DATE ('18:46:57', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (3, TO_DATE ('17:49:44', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (4, TO_DATE ('12:19:24', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (5, TO_DATE ('11:00:01', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (6, TO_DATE ('17:12:45', 'HH24:MI:SS'));
INSERT INTO t2 VALUES (1, TO_DATE ('18:13:02', 'HH24:MI:SS'));
INSERT INTO t2 VALUES (2, TO_DATE ('17:46:57', 'HH24:MI:SS'));
SELECT t1.*, t2.*
FROM t1, t2,
   ( SELECT t2.id, MIN (ABS (t2.time - t1.time)) diff
       FROM t1, t2
      GROUP BY t2.id) b
WHERE b.id = t2.id
  AND ABS (t2.time - t1.time) = b.diff;
Make sure that the time columns have the same date part, because the t2.time - t1.time part won't work otherwise.
EDIT: Thanks for the accept, but Ben's answer below is better. It uses Oracle analytic functions and will perform much better.
This one here selects that row(s) from T1, which has/have the smallest distance to any in T2:
select T1.id, T1.calldatetime from T1, T2
where ABS(T2.calldatetime-T1.calldatetime)
=( select MIN(ABS(T2.calldatetime-T1.calldatetime))from T1, T2);
(Tested with MySQL; hope you don't get an ORA- error from it.)
Edit: according to the last comment, it should be like that:
drop table t1;
drop table t2;
create table t1(id int, t time);
create table t2(id int, t time);
insert into t1 values (1, '18:12:02');
insert into t1 values (2, '18:46:57');
insert into t1 values (3, '17:49:44');
insert into t1 values (4, '12:19:24');
insert into t1 values (5, '11:00:01');
insert into t1 values (6, '17:12:45');
insert into t2 values (1, '18:13:02');
insert into t2 values (2, '17:46:57');
select ot2.id, ot2.t, ot1.id, ot1.t from t2 ot2, t1 ot1
where ABS(ot2.t-ot1.t)=
(select min(abs(t2.t-t1.t)) from t1, t2 where t2.id=ot2.id)
Produces:
id  t         id  t
1   18:13:02  1   18:12:02
2   17:46:57  3   17:49:44
Another way of using analytic functions. It may look strange :)
select id, time,
case
when to_date(time, 'hh24:mi:ss') - to_date(lag_time, 'hh24:mi:ss') < to_date(lead_time, 'hh24:mi:ss') - to_date(time, 'hh24:mi:ss')
then lag_time
else lead_time
end closest_time
from (
select id, tbl,
LAG(time, 1, null) OVER (ORDER BY time) lag_time,
time,
LEAD(time, 1, null) OVER (ORDER BY time) lead_time
from
(
select id, time, 1 tbl from t1
union all
select id, time, 2 tbl from t2
)
)
where tbl = 2
To SQLFiddle... and beyond!
Try this query. It's a little lengthy; I will try to optimize it:
select * from t1
where id in (
select id1 from
(select id1,id2,
rank() over (partition by id2 order by diff) rnk
from
(select distinct t1.id id1,t2.id id2,
round(min(abs(to_date(t1.time,'HH24:MI:SS') - to_date(t2.time,'HH24:MI:SS'))),2) diff
from
t1,t2
group by t1.id,t2.id) )
where rnk = 1);

get certain record while sum of previous records greater than a percentage in SQL Server

Table: FooData
ID Count
1 54
2 42
3 33
4 25
5 16
6 9
8 5
9 3
10 2
I want to fetch the first record at which the running sum of the Count column exceeds a certain percentage, say 90%, of the total.
For this example, the sum of all the records is 189, and 90% of it is 170.1. The running sum from ID 1 to 6 is 179, which is the first running sum greater than 170.1, so record ID 6 should be returned.
By the way, a temporary table is not allowed because I need to do this in a function.
Another version of a triangular join, using a table variable (permitted inside a function, unlike a temp table).
declare @T table(ID int primary key, [Count] int)
insert into @T values (1, 54), (2, 42), (3, 33), (4, 25), (5, 16), (6, 9), (8, 5), (9, 3), (10, 2)

;with R(ID, [Count], Running) as
(
  select T1.ID,
         T1.[Count],
         cast(T3.[Count] as float)
  from @T as T1
    cross apply (select sum(T2.[Count])
                 from @T as T2
                 where T1.ID >= T2.ID) as T3([Count])
),
T(Total) as
(
  select sum([Count])
  from @T
)
select top 1 R.ID, R.[Count], R.Running
from R
  inner join T
    on R.Running / T.Total > 0.9
order by R.ID
Try this (the question's column is named Count, which needs bracket-quoting in T-SQL):
SELECT TOP 1
       t2.ID,
       SUM(t1.[Count]) AS runningTotal
FROM FooData t1
INNER JOIN FooData t2
  ON t1.ID <= t2.ID
GROUP BY t2.ID
HAVING SUM(t1.[Count]) * 100. /
       (SELECT SUM([Count]) FROM FooData) > 90
ORDER BY SUM(t1.[Count])
But also be aware of the potential performance issue with triangular joins in running totals.
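On any engine with window functions (SQL Server 2012+, SQLite 3.25+, etc.) the triangular join can be avoided entirely with SUM() OVER; a sketch in SQLite:

```python
# Running-total-over-90% using a window SUM instead of a triangular join.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FooData (ID INTEGER, [Count] INTEGER);
INSERT INTO FooData VALUES (1,54), (2,42), (3,33), (4,25), (5,16),
                           (6,9), (8,5), (9,3), (10,2);
""")
row = con.execute("""
    WITH R AS (
        SELECT ID,
               SUM([Count]) OVER (ORDER BY ID) AS running,
               SUM([Count]) OVER ()            AS total
        FROM FooData
    )
    SELECT ID, running
    FROM R
    WHERE running * 100.0 / total > 90
    ORDER BY ID
    LIMIT 1
""").fetchone()
# Total is 189; the running sum first exceeds 90% of it (170.1) at ID 6.
print(row)  # (6, 179)
```

Each row is scanned once, rather than once per preceding row as in the triangular join.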

How do I Populate a 2-Column table with Unrelated Data from 2 Different Sources?

I have 2 tables, each with an identity column. What I want to do is populate a new 2-column table with those identities so that it results in a pairing of the identities.
Now, I am perfectly able to populate one column of my new table with the identities from one of the tables, but I can't get the identities from the other table into the new table. If this isn't the best first step to take, though, please let me know.
Thank you
You may want to try something like the following:
INSERT INTO t3 (id, value_1, value_2)
SELECT t1.id, t1.value, t2.value
FROM t1
JOIN t2 ON (t2.id = t1.id);
Test case (MySQL):
CREATE TABLE t1 (id int, value int);
CREATE TABLE t2 (id int, value int);
CREATE TABLE t3 (id int, value_1 int, value_2 int);
INSERT INTO t1 VALUES (1, 100);
INSERT INTO t1 VALUES (2, 200);
INSERT INTO t1 VALUES (3, 300);
INSERT INTO t2 VALUES (1, 10);
INSERT INTO t2 VALUES (2, 20);
INSERT INTO t2 VALUES (3, 30);
Result:
SELECT * FROM t3;
+------+---------+---------+
| id | value_1 | value_2 |
+------+---------+---------+
| 1 | 100 | 10 |
| 2 | 200 | 20 |
| 3 | 300 | 30 |
+------+---------+---------+
3 rows in set (0.00 sec)
You can populate a table with the INSERT...SELECT syntax, and the SELECT can be the result of a join between two (or more) tables.
INSERT INTO NewTable (col1, col2)
SELECT a.col1, b.col2
FROM a JOIN b ON ...conditions...;
So if you can express the pairing as a SELECT, you can insert it into your table.
If the two tables are unrelated and there's no way to express the pairing, then you're asking how to make a non-relational data store, and there are no relational rules for that.
An option would be to create a counter for each of the columns that would operate as a unique identifier and then join on the counter.
For SQL Server this would work:
SELECT one.column1, two.column2
FROM (SELECT RANK() OVER (ORDER BY column1) AS id,
column1
FROM table1) one
LEFT JOIN (SELECT RANK() OVER (ORDER BY column2) AS id,
column2
FROM table2) two ON one.id = two.id
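A sketch of this counter-join in SQLite, with invented sample data; note it uses ROW_NUMBER() rather than RANK(), since RANK() hands out duplicate ids when the ordering column has ties, which would break the one-to-one pairing:

```python
# Pairing two unrelated tables positionally via ROW_NUMBER().
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (column1 TEXT);
CREATE TABLE table2 (column2 TEXT);
INSERT INTO table1 VALUES ('a'), ('b'), ('c');
INSERT INTO table2 VALUES ('x'), ('y');
""")
rows = con.execute("""
    SELECT one.column1, two.column2
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY column1) AS id, column1
          FROM table1) one
    LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY column2) AS id, column2
               FROM table2) two ON one.id = two.id
    ORDER BY one.id
""").fetchall()
# The LEFT JOIN keeps unmatched rows from the longer table.
print(rows)  # [('a', 'x'), ('b', 'y'), ('c', None)]
```

Since the tables are unrelated, the pairing is arbitrary: it depends entirely on the ORDER BY chosen inside each subquery.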