Implementing Hierarchy in SQL - sql

Suppose I have a table which has a "CDATE" representing the date when I retrieved the data, a "SECID" identifying the security I retrieved data for, a "SOURCE" designating where I got the data and the "VALUE" which I got from the source. My data might look as following:
CDATE | SECID | SOURCE | VALUE
--------------------------------
1/1/2012 1 1 23
1/1/2012 1 5 45
1/1/2012 1 3 33
1/4/2012 2 5 55
1/5/2012 1 5 54
1/5/2012 1 3 99
Suppose I have a HIERARCHY table like the following ("SOURCE" with greatest HIERARCHY number takes precedence):
SOURCE | NAME | HIERARCHY
---------------------------
1 ABC 10
3 DEF 5
5 GHI 2
Now let's suppose I want my results to be picked according to the hierarchy above. So applying the hierarch and selecting the source with the greatest HIERARCHY number I would like to end up with the following:
CDATE | SECID | SOURCE | VALUE
---------------------------------
1/1/2012 1 1 23
1/4/2012 2 5 55
1/5/2012 1 3 99

This joins on your hierarchy and selects the top-ranked source for each date and security.
SELECT CDATE, SECID, SOURCE, VALUE
FROM (
SELECT t.CDATE, t.SECID, t.SOURCE, t.VALUE,
ROW_NUMBER() OVER (PARTITION BY t.CDATE, t.SECID
ORDER BY h.HIERARCHY DESC) as nRow
FROM table1 t
INNER JOIN table2 h ON h.SOURCE = t.SOURCE
) A
WHERE nRow = 1

You can get the results you want with the below. It combines your data with your hierarchies and ranks them according to the highest hierarchy. This will only return one result arbitrarily though if you have a source repeated for the same date.
;with rankMyData as (
select
d.CDATE
, d.SECID
, d.SOURCE
, d.VALUE
, row_number() over(partition by d.CDate, d.SECID order by h.HIERARCHY desc) as ranking
from DATA d
inner join HIERARCHY h
on h.source = d.source
)
SELECT
CDATE
, SECID
, SOURCE
, VALUE
FROM rankMyData
where ranking = 1

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID
Identifier
Admission_Date
Release_Date
234
2
5/1/22
5/5/22
234
1
4/25/22
4/30/22
234
2
4/20/22
4/24/22
234
2
4/15/22
4/18/22
789
1
7/15/22
7/19/22
789
2
7/8/22
7/14/22
789
2
7/1/22
7/5/22
321
2
6/1/21
6/3/21
321
2
5/27/21
5/31/21
321
1
5/20/21
5/26/21
321
2
5/15/21
5/19/21
321
2
5/6/21
5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID
Identifier
Admission Date
Release Date
234
2
5/1/22
5/5/22
234
1
4/25/22
4/30/22
234
2
4/20/22
4/24/22
789
1
7/15/22
7/19/22
789
2
7/8/22
7/14/22
321
2
5/27/21
5/31/21
321
1
5/20/21
5/26/21
321
2
5/15/21
5/19/21
I am using DBeaver, which runs PostgreSQL.
I admittedly don't know Postgres well so the following could possibly be optimised, however using a combination of lag and lead to obtain the previous and next dates (assuming Admission_date is the one to order by) you could try
with d as (
select *,
case when identifier = 1 then Lag(admission_date) over(partition by id order by Admission_Date desc) end pd,
case when identifier = 1 then Lead(admission_date) over(partition by id order by Admission_Date desc) end nd
from t
)
select id, Identifier, Admission_Date, Release_Date
from d
where identifier = 1
or exists (
select * from d d2
where d2.id = d.id
and (d.Admission_Date = pd or d.admission_date = nd)
)
order by Id, Admission_Date desc;
One way:
SELECT (x.my_row).* -- decompose fields from row type
FROM (
SELECT identifier
, lag(t) OVER w AS t0 -- take whole row
, t AS t1
, lead(t) OVER w AS t2
FROM tbl t
WINDOW w AS (PARTITION BY id ORDER BY admission_date)
) sub
CROSS JOIN LATERAL (
VALUES (t0), (t1), (t2) -- pivot
) x(my_row)
WHERE sub.identifier = 1
AND (x.my_row).id IS NOT NULL; -- exclude rows with NULL ( = missing row)
db<>fiddle here
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...

How to select rows with conditional values of one column in SQL

Say I have this table:
id
timeline
1
BASELINE
1
MIDTIME
1
ENDTIME
2
BASELINE
2
MIDTIME
3
BASELINE
4
BASELINE
5
BASELINE
5
MIDTIME
5
ENDTIME
6
MIDTIME
6
ENDTIME
7
RISK
7
RISK
So this is what the data looks like except the data has more observations (few thousands)
How do I get the output so that it will look like this:
id
timeline
1
BASELINE
1
MIDTIME
2
BASELINE
2
MIDTIME
5
BASELINE
5
MIDTIME
How do I select the first two terms of each ID which has 2 specific timeline values (BASELINE and MIDTIME)? Notice id 6 has MIDTIME and ENDTIME,and id 7 has two RISK I don't want these two ids.
I used
SELECT *
FROM df
WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)
and got IDs with two timeline values (output below) but don't know how to get rows with only BASELINE and MIDTIME.
id timeline
---|--------|
1 | BASELINE |
1 | MIDTIME |
2 | BASELINE |
2 | MIDTIME |
5 | BASELINE |
5 | MIDTIME |
6 | MIDTIME | ---- dont want this
6 | ENDTIME | ---- dont want this
7 | RISK | ---- dont want this
7 | RISK | ---- dont want this
Many Thanks.
You can try using exists -
DEMO
select * from t t1 where timeline in ('BASELINE','MIDTIME') and
exists
(select 1 from t t2 where t1.id=t2.id and timeline in ('BASELINE','MIDTIME')
group by t2.id having count(distinct timeline)=2)
OUTPUT:
id timeline
1 BASELINE
1 MIDTIME
2 BASELINE
2 MIDTIME
5 BASELINE
5 MIDTIME
I think this query should give you the result you want.
NOTE: As i understand, you don't want the ID where exists a "ENDTIME", and in your sample data, there is an "ENDTIME" for ID 1. I assumed this was an error so i made a query that excludes all id containing "ENDTIME".
WITH CTE AS
(
SELECT
id
FROM
df
WHERE
timeline IN ('ENDTIME', 'RISK')
)
SELECT
id,
timeline
FROM
df
WHERE
id NOT IN (SELECT id FROM CTE);
There's probably a number of ways to do this, here's one way that will pick up BASELINE and MIDTIME rows where only they exist, ensuring there are only 2 rows per returned ID. Without knowing the ordering of timeline, it's not possible to go further I don't think:
SELECT
id
, timeline
FROM (
SELECT
*
, SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS BaselineCount
, SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS MidtimeCount
FROM df
WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
AND subquery.MidtimeCount > 0
GROUP BY
id
, timeline
;

How to join two tables without performing Cartesian product in SQL

I have index_date information for IDs and I want to extract baseline ( information between index_date and Index_date minus 6 months). I want to do this without using Cartesian product.
Total Table
ID index_date detail
1 01Jan2012 xyz
1 01Dec2011 pqr
1 01Nov2010 pqr
2 26Feb2013 abc
3 02Mar2013 abc
3 02Feb2013 ert
3 02Jan2013 tyu
4 07May2015 rts
I have a table A extracted from Total which has the index_dates:
ID index_date index_detail
1 01Jan2012 xyz
2 26Feb2013 abc
3 02Mar2013 abc
4 07May2015 rts
I want to extract baseline periods data for IDs in A from from the Total table
Table want :
ID date index_date detail index_detail
1 01Jan2012 01Jan2012 xyz xyz
1 01Dec2011 01Jan2012 pqr xyz
2 26Feb2013 26Feb2013 abc abc
3 02Mar2013 02Mar2013 abc abc
3 02Feb2013 02Mar2013 ert abc
3 02Jan2013 02Mar2013 tyu abc
4 07May2015 07May2015 rts rts
code used :
create table want as
select a.* , b.date,b.detail
from table_a as a
right join
Total as b
on a.id = b.id where
a.index_date > b.date
AND b.date >= add_months( a.index_date, -6)
;
But this requires Cartesian Product. Is there a way to do this without requiring Cartesian product.
DBMS - Hive
Sorry, I don't know it.
I'll give the solution on pure SQL for MySQL 8+ - maybe you'll find the way to convert it to Hive syntax.
SELECT id,
index_date date,
FIRST_VALUE(index_date) OVER (PARTITION BY ID ORDER BY STR_TO_DATE(index_date, '%d%b%Y') DESC) index_date,
detail,
FIRST_VALUE(detail) OVER (PARTITION BY ID ORDER BY STR_TO_DATE(index_date, '%d%b%Y') DESC) index_detail
FROM test
ORDER BY 1 ASC, 2 DESC
fiddle
I would recommend three steps:
Convert the date to a number.
Find the minimum date in a six month period.
Get the first value in that group.
This looks like:
select t.*, t2.index_date, t2.detail
from (select t.*,
min(index_date) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_date
from (select t.*,
year(index_date) * 12 + month(index_date) as months
from total t
) t
) t left join
total t2
on t2.id = t.id and t2.index_date = t.sixmonth_date;
This is marginally simpler if first_value() accepts range window frames -- but I'm not sure if it does. It is worth trying, though:
select t.*,
min(index_date) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_date,
first_value(detail) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_value
from (select t.*,
year(index_date) * 12 + month(index_date) as months
from total t
) t

Some Case statement issue

I have two tables that has data like
table1
Id id_nm
1 per
2 per
3 org
table2
Id Lst_id l_nm up_dt
1 22 abc 9/10/2015
1 21 abs 10/12/2016
2 21 xzc 10/12/2013
2 23 xyz 10/21/2013
2 23 xnh 01/12/2013
Need to pick the l_nm where lst_id is 22. If that is not present then we need to pick the l_nm with the most recent updated date.
Id lst_id lnm up_dt
1 22 abc 9/10/2015
2 23 xyz 10/21/2013
can any one please help me in implementing it.
Simple way is to use row_number with a window clause to generate a custom sort order:
select id, lst_id, l_nm as lnm, up_dt
from (
select id
,lst_id
,l_nm
,up_dt
,row_number()
over (partition by id
order by case when lst_id = 22 then 1 else 2 end
,up_dt desc) as rn
from table2
) where rn = 1;

Query to find Cumulative while subtracting other counts

Here is my table structure
Id INT
RecId INT
Dated DATETIME
Status INT
and here is my data.
Status table (contains different statuses)
Id Status
1 Created
2 Assigned
Log table (contains logs for the different statuses that a record went through (RecId))
Id RecId Dated Status
1 1 2013-12-09 14:16:31.930 1
2 7 2013-12-09 14:27:26.620 1
3 1 2013-12-09 14:27:26.620 2
3 8 2013-12-10 11:14:13.747 1
3 9 2013-12-10 11:14:13.747 1
3 8 2013-12-10 11:14:13.747 2
What I need to generate a report from this data in the following format.
Dated Created Assigned
2013-12-09 2 1
2013-12-10 3 1
Here the rows data is calculated date wise. The Created is calculated as (previous record (date) Created count - Previous date Assigned count) + Todays Created count.
For example if on date 2013-12-10 three entries were made to log table out of which two have the status Created while one has the status assigned. So in the desired view that I want to build for report, For date 2013-12-10, the view will return Created as 2 + 1 = 3 where 2 is newly inserted records in log table and 1 is the previous day remaining record count (Created - Assigned) 2 - 1.
I hope the scenario is clear. Please ask me if further information is required.
Please help me with the sql to construct the above view.
This matches the expected result for the provided sample, but may require more testing.
with CTE as (
select
*
, row_number() over(order by dt ASC) as rn
from (
select
cast(created.dated as date) as dt
, count(created.status) as Created
, count(Assigned.status) as Assigned
, count(created.status)
- count(Assigned.status) as Delta
from LogTable created
left join LogTable assigned
on created.RecId = assigned.RecId
and created.status = 1
and assigned.Status = 2
and created.Dated <= assigned.Dated
where created.status = 1
group by
cast(created.dated as date)
) x
)
select
dt.dt
, dt.created + coalesce(nxt.delta,0) as created
, dt.assigned
from CTE dt
left join CTE nxt on dt.rn = nxt.rn+1
;
Result:
| DT | CREATED | ASSIGNED |
|------------|---------|----------|
| 2013-12-09 | 2 | 1 |
| 2013-12-10 | 3 | 1 |
See this SQLFiddle demo