How to retrieve historical data based on condition on one row? - sql

I have a table historical_data
ID
Date
column_a
column_b
1
2011-10-01
a
a1
1
2011-11-01
w
w1
1
2011-09-01
a
a1
2
2011-01-12
q
q1
2
2011-02-01
d
d1
3
2011-11-01
s
s1
I need to retrieve the whole history of an id based on the date condition on any 1 row related to that ID.
date>='2011-11-01' should get me
ID
Date
column_a
column_b
1
2011-10-01
a
a1
1
2011-11-01
w
w1
1
2011-09-01
a
a1
3
2011-11-01
s
s1
I am aware you can get this by using a CTE or a subquery like
with selected_id as (
select id from historical_data where date>='2011-11-01'
)
select hd.* from historical_data hd
inner join selected_id si on hd.id = si.id
or
select * from historical_data
where id in (select id from historical_data where date>='2011-11-01')
In both these methods I have to query/scan the table ``historical_data``` twice.
I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.
The table above is a sample table, the table I have is about to touch 1TB in size with upwards of 600M rows.
Is there any way to achieve this by only querying the table once? (I am using Snowflake)

Using QUALIFY:
SELECT *
FROM historical_data
QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;

Related

Select count of total records and also distinct records

I have a table such as this:
PalmId | UserId | CreatedDate
1 | 1 | 2018-03-08 14:18:27.077
1 | 2 | 2018-03-08 14:18:27.077
1 | 3 | 2018-03-08 14:18:27.077
1 | 1 | 2018-03-08 14:18:27.077
I wish to know how many dates were created for Palm 1 and I also wish to know how many users have created those dates for Palm 1. So the outcome for first is 4 and outcome for second is 3
I am wondering if I can do that in a single query as oppose to having to do a subquery and a join on itself as in example below.
SELECT MT.[PalmId], COUNT(*) AS TotalDates, T1.[TotalUsers]
FROM [MyTable] MT
LEFT OUTER JOIN (
SELECT MT2.[PalmId], COUNT(*) AS TotalUsers
FROM [MyTable] MT2
GROUP BY MT2.[UserId]
) T1 ON T1.[PalmId] = MT.[PalmId]
GROUP BY MT.[PalmId], T1.[TotalUsers]
According to first table you could do something like this:
select count(distinct uerid) as N_Users,
count(created_date) as created_date, -- if you use count(*) you consider also rows with 'NULL'
palmid
from your_table
group by palmid
If you want "4" and "3", then I think you want:
SELECT MT.PalmId, COUNT(*) AS NumRows, COUNT(DISTINCT mt.UserId) as NumUsers
FROM MyTable MT
GROUP BY MT.PalmId

find all rows after the recent update using oracle

I tried below query to bring all rows after last Action="UNLOCKED", but ORDER BY is not allowed in subquery it seems.
SELECT *
FROM TABLE
WHERE id >= (SELECT MAX(id)
FROM TABLE
WHERE ACTION='UNLOCKED' AND action_id=123
ORDER BY CREATE_DATE DESC);
Sample data
Id action_id Action ... CREATE_DATE
1 123 ADD 03/18/2018
2 123 Unlocked 03/19/2018
3 123 Updated1 03/19/2018
4 123 Updated2 03/19/2018
5 123 Unlocked 03/20/2018
6 123 Updated3 03/20/2018
7 123 Updated4 03/20/2018
Output should be rows with id 5,6,7. What should i use to get this output
you could use an inner join on subselect for max create_date
select * from TABLE
INNER JOIN (
select max(CREATE_DATE) max_date
from TABLE
where Action = 'Unlocked' ) T on t.max_date = TABLE.CREATE_DATE
You need not order the inner query because it will return only one value. You can do it as follows
SELECT * FROM TABLE WHERE id >= (select max(id) from TABLE where ACTION='UNLOCKED' and action_id=123);

How do I join two tables in SQL limiting the second table data depending on the first one?

I have two tables looking something like this:
IMP DATE CAT
A 03/03/2016 1
B 04/04/2016 1
C 09/09/2016 2
D 01/01/2017 1
E 02/02/2017 1
F 03/03/2017 2
G 04/04/2017 2
===================
EXP DATE CAT
H 01/01/2016 1
I 05/05/2016 1
J 07/07/2016 2
K 11/11/2016 2
L 01/01/2017 1
M 03/03/2017 1
N 04/04/2017 2
O 05/05/2017 2
I want to join the first table to the second one but limit the lines joined from the second table by the latest date on the first table (per category).
The result I'm looking for would be every row in both tables except Item "M" (because Cat 1 in Table 1 has a latest date of February) and Item "O" (because Cat 2 in Table 1 has a latest date of April).
I've tried conditionalds within a where clause in the 2nd table but haven't got far.
Is there a simple way to do this? Any help is appreciated. I'm using SQL Server 2008 by the way.
Your description of the problem is specifically about using join. That suggests a query like this:
select . . .
from (select t1.*, max(t1.date) over (partition by t1.cat) as maxdate
from table1 t1
) t1 join
table2 t2
on t1.cat = t2.cat and t2.date <= t1.maxdate;
Desire output and its format is still not clear.
are you looking for this ?
;With CTE as
(
select *, row_number()over(partition by cat order by DATEs desc) rn
from #table1
)
--select * from cte
--where rn=1
select * from cte t1
left join #table2 t2
on t1.CAT=t2.CAT and t2.DATEs<=t1.DATEs
where rn=1

Left join to a table where values do not exist (and are not NULLs)

EDIT (SOLVED): A cross join. One of those joins you never use until you need it. Thanks for the help
Left table: TS , single field with values [1,2,...,365].
Right table: PAYMENT with three fields (ID, TS, AMT)
For each ID, I want to see 365 records from a left join of TS on PAYMENT.
The problem is that "no value" is not the same as a NULL.
If PAYMENT.TS does not exist for a certain value (e.g. PAYMENT.TS=4), then there is no value to join on and the left join does not return a row #4.
I tried using NOT IN / NOT EXISTS as a condition, but this only treats the case where the right table has explicit NULLS and not the case where no value exists.
How can I proceed? Thanks!
(This is a DB2 system)
SELECT * FROM TS LEFT JOIN PAYMENT ON TS = PAYMENT.TS
TS TABLE:
| TS |
----------
1
2
...
365
PAYMENTS TABLE:
| ID | TS | PMT |
-----------------------------
1 1 70
1 2 20
1 5 10
2 3 200
EXPECTED RESULT:
| ID | TS | PMT |
-----------------------------
1 1 70
1 2 20
1 3
1 4
1 5 10
... ...
1 365
2 1
2 2
2 3 200
... ...
2 365
ACTUAL RESULT:
| ID | TS | PMT |
-----------------------------
1 1 70
1 2 20
1 5 10
2 3 200
You need to generate all the rows you want using a cross join and then use left join:
SELECT i.id, ts.ts. p.amt
FROM (SELECT DISTINCT ID FROM PAYMENT) i CROSS JOIN
TS LEFT JOIN
PAYMENT p
ON ts.TS = p.TS AND p.id = i.id;
This will return 365 rows for each id.
You have to join them matching the two common columns in each table. Preferably by the keys(foreign and primary).
Let's say TS table has this one column called 'NUMBERS' and its type is int. The table PAYMENT has the column ID, type of int also. Which means they may have common values. Thus, if you want to join two tables and get the common ones where the PAYMENT.ID exists in TS.NUMBERS then you should do:
SELECT * FROM TS LEFT JOIN PAYMENT ON TS.NUMBERS = PAYMENT.ID
I hope I've been clear.
Note: Also do not forget that if a column or more has the same name in both tables, you have to clarify from which table you want that column for instance if also PAYMENT table had the column named as NUMBERS, then:
SELECT PAYMENT.ID, TS.NUMBERS FROM TS LEFT JOIN PAYMENT ON TS.NUMBERS = PAYMENT.ID

Ranking of a tuple in another table

So I have 2 tables, team A and team B, with their score. I want the rank of the score of every member of team A within team B using SQL or vertica, as shown below
Team A Table
user score
-------------
asa 100
bre 200
cqw 50
duy 50
Team B Table
user score
------------
gfh 20
ewr 80
kil 70
cvb 90
Output:
Team A Table
user score rank in team B
------------------------------
asa 100 1
bre 200 1
cqw 50 4
duy 50 4
Try this - and this only works in Vertica.
INTERPOLATE PREVIOUS VALUE is an outer-join predicate specific to Vertica that joins two tables on non-equal columns, using the 'last known' value in the outer-joined table to make a match succeed.
WITH
-- input, don't use in query itself
table_a (the_user,score) AS (
SELECT 'asa',100
UNION ALL SELECT 'bre',200
UNION ALL SELECT 'cqw',50
UNION ALL SELECT 'duy',50
)
,
table_b(the_user,score) AS (
SELECT 'gfh',20
UNION ALL SELECT 'ewr',80
UNION ALL SELECT 'kil',70
UNION ALL SELECT 'cvb',90
)
-- end of input - start WITH clause here
,
ranked_b AS (
SELECT
RANK() OVER(ORDER BY score DESC) AS the_rank
, *
FROM table_b
)
SELECT
a.the_user AS a_user
, a.score AS a_score
, b.the_rank AS rank_in_team_b
FROM table_a a
LEFT JOIN ranked_b b
ON a.score INTERPOLATE PREVIOUS VALUE b.score
ORDER BY 1
;
a_user|a_score|rank_in_team_b
asa | 100| 1
bre | 200| 1
cqw | 50| 4
duy | 50| 4
Simple correlated query should do:
select
a.*,
(select count(*) + 1 from table_b b where b.score > a.score) rank_in_b
from table_a a;
All you need to do is count the number of people with more score than current user in the table b and add 1 to it to get the rank.