Getting first one depending on the current row - sql

There are three tables in an Oracle DB: equip_type, output_history, and time_history. How can I join the three tables to get the result shown below?
(DBMS: Oracle)
EQUIP MODEL DATE1 QUANTITY DATE2 TIME EQUIP_TYPE
---- ---- ---------- ------ -------- ---- ----------
e1 m1 20180103 10 20180101 6 A
e1 m1 20180106 20 20180105 5 A
Notice that at the point of DATE1 '20180103' in output_history, DATE2 '20180101' in time_history is the most recent one.
At the point of DATE1 '20180106' in output_history, DATE2 '20180105' in time_history is the most recent one.
--equip_type table and data
CREATE TABLE equip_type (
EQUIP_TYPE VARCHAR(60),
EQUIP VARCHAR(60)
);
INSERT INTO equip_type VALUES ('A','e1');
-- output_history and data
CREATE TABLE output_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data1 VARCHAR(60),
QUANTITY NUMBER(10)
);
INSERT INTO output_history VALUES ('e1','m1','20180103',10);
INSERT INTO output_history VALUES ('e1','m1','20180106',20);
--time_history table and data
CREATE TABLE time_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data2 VARCHAR(60),
time NUMBER(10)
);
INSERT INTO time_history VALUES ('e1','m1','20180101',6);
INSERT INTO time_history VALUES ('e1','m1','20180105',5);

You can use a correlated subquery with a NOT EXISTS condition to select the closest related record in time_history.
You did not tag the RDBMS you are using, so I tested the query below on MySQL in this db fiddle; it is standard SQL that should work on most RDBMSs.
SELECT
o.equip,
o.model,
o.data1,
o.quantity,
t.data2,
t.time,
e.equip_type
FROM
output_history o
INNER JOIN equip_type e ON e.equip = o.equip
INNER JOIN time_history t ON t.equip = o.equip AND t.data2 <= o.data1
WHERE NOT EXISTS (
SELECT 1
FROM time_history
WHERE
equip = o.equip
AND data2 <= o.data1
AND data2 > t.data2
)
Side note: the query always looks up the most recent time_history record on or before the current output_history record (even if there is a closer record in the future, it will not be selected).
Disclaimer: don't store dates as strings; this is a recipe for disaster. Use the relevant date datatype for your RDBMS. In your use case, it works only because the dates are formatted (YYYYMMDD) in a way that sorts correctly as strings.
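The NOT EXISTS approach above can be tried end to end with a small sketch. It assumes SQLite through Python's sqlite3 module purely to make it runnable (the question is about Oracle), and the inner alias t2 is added here for readability:

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the Oracle schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equip_type (equip_type TEXT, equip TEXT);
CREATE TABLE output_history (equip TEXT, model TEXT, data1 TEXT, quantity INTEGER);
CREATE TABLE time_history (equip TEXT, model TEXT, data2 TEXT, time INTEGER);
INSERT INTO equip_type VALUES ('A', 'e1');
INSERT INTO output_history VALUES ('e1', 'm1', '20180103', 10), ('e1', 'm1', '20180106', 20);
INSERT INTO time_history VALUES ('e1', 'm1', '20180101', 6), ('e1', 'm1', '20180105', 5);
""")

# Keep a time_history row only if no later row exists that is still
# on or before the output_history date.
rows = conn.execute("""
SELECT o.equip, o.model, o.data1, o.quantity, t.data2, t.time, e.equip_type
FROM output_history o
JOIN equip_type e ON e.equip = o.equip
JOIN time_history t ON t.equip = o.equip AND t.data2 <= o.data1
WHERE NOT EXISTS (
    SELECT 1
    FROM time_history t2
    WHERE t2.equip = o.equip
      AND t2.data2 <= o.data1
      AND t2.data2 > t.data2
)
ORDER BY o.data1
""").fetchall()

for row in rows:
    print(row)
```

With the sample data from the question, each output_history row should pair with exactly one time_history row.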

Related

How to insert a column which sets unique id based on values in another column (SQL)?

I will create a table into which I will insert multiple values for different companies. Basically I have all the values that are in the table below, but I want to add a column IndicatorID which is linked to IndicatorName so that every indicator has a unique id. This will obviously not be a primary key.
I will insert the data with multiple selects:
CREATE TABLE abc
INSERT INTO abc
SELECT company_id, 'roe', roevalue, metricdate
FROM TABLE1
INSERT INTO abc
SELECT company_id, 'd/e', devalue, metricdate
FROM TABLE1
So, I don't know how to add the IndicatorID I mentioned above.
EDIT:
Here is how I populate my new table:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT [the ID that I need], 'NI_3y' as 'Indicator', t.Company, avg(t.ni) over (partition by t.Company order by t.reportdate rows between 2 preceding and current row) as 'ni_3y',
t.reportdate
FROM table t
LEFT JOIN IndicatorIDs i
ON i.Indicator = roe3 -- the part that is not working if I have separate indicatorID table
I am going to insert different indicators for the same companies. And I want indicatorID.
Your "indicator" is a proper entity in its own right. Create a table with all indicators:
create table indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then, use the id only in this table. You can look up the value in the reference table.
Your inserts are then a little more complicated:
INSERT INTO indicators (indicator)
SELECT v.indicator
FROM (VALUES ('roe'), ('d/e')) v(indicator)
WHERE NOT EXISTS (SELECT 1 FROM indicators i2 WHERE i2.indicator = v.indicator);
Then:
INSERT INTO ABC (indicatorId, companyid, value, date)
SELECT i.indicator_id, t1.company_id, v.value, t1.metricdate
FROM table1 t1 CROSS APPLY
(VALUES ('roe', t1.roevalue), ('d/e', t1.devalue)
) v(indicator, value) JOIN
indicators i
ON i.indicator = v.indicator;
This process is called normalization and it is the typical way to store data in a database.
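A minimal sketch of the lookup-table idea, assuming SQLite through Python's sqlite3 module so it can be run as is. SQLite has no CROSS APPLY, so a UNION ALL is swapped in for the unpivot step, and the table1 sample values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference table: one row per indicator, the id is generated automatically.
CREATE TABLE indicators (indicator_id INTEGER PRIMARY KEY, indicator TEXT UNIQUE);
-- Hypothetical source table with one column per indicator.
CREATE TABLE table1 (company_id INTEGER, roevalue REAL, devalue REAL, metricdate TEXT);
CREATE TABLE abc (
    indicator_id INTEGER REFERENCES indicators(indicator_id),
    company_id INTEGER,
    value REAL,
    metricdate TEXT
);
INSERT INTO table1 VALUES (1, 0.12, 1.5, '2020-12-31'), (2, 0.08, 0.9, '2020-12-31');
INSERT INTO indicators (indicator) VALUES ('roe'), ('d/e');
""")

# Store only the id in abc; the name is looked up in indicators at insert time.
conn.execute("""
INSERT INTO abc (indicator_id, company_id, value, metricdate)
SELECT i.indicator_id, t1.company_id, t1.roevalue, t1.metricdate
FROM table1 t1 JOIN indicators i ON i.indicator = 'roe'
UNION ALL
SELECT i.indicator_id, t1.company_id, t1.devalue, t1.metricdate
FROM table1 t1 JOIN indicators i ON i.indicator = 'd/e'
""")

# Reading back: join to the reference table to recover the indicator name.
rows = conn.execute("""
SELECT i.indicator, a.company_id, a.value
FROM abc a JOIN indicators i ON i.indicator_id = a.indicator_id
ORDER BY a.company_id, i.indicator_id
""").fetchall()
print(rows)
```

Adding a new indicator then means inserting one row into indicators; abc itself does not change shape.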
DDL and INSERT statement to create an indicators table with a unique constraint on indicator. Because the ind_id is intended to be a foreign key in the abc table it's created as a non-decomposable surrogate integer primary key using the IDENTITY property.
drop table if exists test_indicators;
go
create table test_indicators (
ind_id int identity(1, 1) primary key not null,
indicator varchar(20) unique not null);
go
insert into test_indicators(indicator) values
('NI'),
('ROE'),
('D/E');
The abc table depends on the ind_id column from indicators table as a foreign key reference. To populate the abc table company_id's are associated with ind_id's.
drop table if exists test_abc
go
create table test_abc(
a_id int identity(1, 1) primary key not null,
ind_id int not null references test_indicators(ind_id),
company_id int not null,
val varchar(20) null);
go
insert into test_abc(ind_id, company_id)
select ind_id, 102 from test_indicators where indicator='NI'
union all
select ind_id, 103 from test_indicators where indicator='ROE'
union all
select ind_id, 104 from test_indicators where indicator='D/E'
union all
select ind_id, 103 from test_indicators where indicator='NI'
union all
select ind_id, 105 from test_indicators where indicator='ROE'
union all
select ind_id, 102 from test_indicators where indicator='NI';
Query to get result
select i.ind_id, a.company_id, i.indicator, a.val
from test_abc a
join test_indicators i on a.ind_id=i.ind_id;
Output
ind_id company_id indicator val
1 102 NI NULL
2 103 ROE NULL
3 104 D/E NULL
1 103 NI NULL
2 105 ROE NULL
1 102 NI NULL
I was finally able to find the solution for my problem which seems to me very simple, although it took time and asking different people about it.
First I create my indicators table where I assign primary key for all indicators I have:
CREATE TABLE indicators (
indicator_id int identity(1, 1) primary key,
indicator varchar(255)
);
Then I populate the table easily, without using any JOINs or CROSS APPLY. I don't know if this is optimal, but it seems to be the simplest choice:
INSERT INTO table(IndicatorID, Indicator, Company, Value, Date)
SELECT
(SELECT indicator_id from indicators i where i.indicator = 'NI_3y') as IndicatorID,
'NI_3y' as 'Indicator',
Company,
avg(ni) over (partition by Company order by reportdate rows between 2 preceding and current row) as ni_3y,
reportdate
FROM TABLE1

Joining tables with recent date for each row then weighted averaging

There are three tables in an Oracle DB: equip_type, output_history, and time_history.
Is there a way to join the three tables as shown below at (1) and then get a weighted average as shown below at (2)?
--equip_type table and data
CREATE TABLE equip_type (
EQUIP_TYPE VARCHAR(60),
EQUIP VARCHAR(60)
);
INSERT INTO equip_type VALUES ('A','e1');
-- output_history and data
CREATE TABLE output_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data1 VARCHAR(60),
QUANTITY NUMBER(10)
);
INSERT INTO output_history VALUES ('e1','m1','20180103',10);
INSERT INTO output_history VALUES ('e1','m1','20180106',20);
--time_history table and data
CREATE TABLE time_history (
EQUIP VARCHAR(60),
MODEL VARCHAR(60),
Data2 VARCHAR(60),
time NUMBER(10)
);
INSERT INTO time_history VALUES ('e1','m1','20180101',6);
INSERT INTO time_history VALUES ('e1','m1','20180105',5);
(1) How to get joined table as below?
EQUIP MODEL DATE1 QUANTITY DATE2 TIME TYPE
---- ---- ---------- ------ -------- ---- ----
e1 m1 20180103 10 20180101 6 A
e1 m1 20180106 20 20180105 5 A
For each row in OUTPUT_HISTORY, the most recent row in TIME_HISTORY as of DATE1 is joined.
(2) Then, with the joined table above, how do I get the weighted average of TIME?
sum of (QUANTITY * TIME) / sum of QUANTITY, grouped by TYPE and MODEL
For example, (10×6 + 20×5)÷(10+20) for equip type A and model m1.
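One method uses analytic functions to get the most recent time_history record as of each output_history row, and then simple aggregation:
select equip_type, model, sum(quantity * time) / sum(quantity) as weighted_avg_time
from (select et.equip_type, oh.model, oh.quantity, th.time,
-- number the candidate time_history rows per output row, newest first
row_number() over (partition by oh.equip, oh.model, oh.data1
order by th.data2 desc) as seqnum
from output_history oh
join equip_type et on et.equip = oh.equip
left join time_history th
on th.equip = oh.equip and th.model = oh.model and th.data2 <= oh.data1
) x
where seqnum = 1
group by equip_type, model;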
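The arithmetic from the question, (10×6 + 20×5) ÷ (10+20), can be cross-checked with a short sketch. It assumes SQLite through Python's sqlite3 module rather than Oracle, and it picks the most recent time_history row with a correlated MAX subquery instead of an analytic function, so it is a different technique from the answer above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equip_type (equip_type TEXT, equip TEXT);
CREATE TABLE output_history (equip TEXT, model TEXT, data1 TEXT, quantity INTEGER);
CREATE TABLE time_history (equip TEXT, model TEXT, data2 TEXT, time INTEGER);
INSERT INTO equip_type VALUES ('A', 'e1');
INSERT INTO output_history VALUES ('e1', 'm1', '20180103', 10), ('e1', 'm1', '20180106', 20);
INSERT INTO time_history VALUES ('e1', 'm1', '20180101', 6), ('e1', 'm1', '20180105', 5);
""")

# For each output row, join the time_history row whose date is the latest
# one on or before data1, then aggregate per equip type and model.
result = conn.execute("""
SELECT e.equip_type, o.model,
       SUM(o.quantity * t.time) * 1.0 / SUM(o.quantity) AS weighted_avg_time
FROM output_history o
JOIN equip_type e ON e.equip = o.equip
JOIN time_history t
  ON t.equip = o.equip AND t.model = o.model
 AND t.data2 = (SELECT MAX(t2.data2)
                FROM time_history t2
                WHERE t2.equip = o.equip
                  AND t2.model = o.model
                  AND t2.data2 <= o.data1)
GROUP BY e.equip_type, o.model
""").fetchall()
print(result)
```

The multiplication by 1.0 is there because SQLite would otherwise do integer division; Oracle's NUMBER type does not need it.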

Get the row values as column in SQL

I have the table below, and need to get its row values as columns in the output.
This is part of a view in an Oracle database.
I need to get the output below using SQL. name, address, and region are taken from another table by referring to ID.
Looking for much simple way since full query have more than 15 columns and below also need to be added as columns.
Thanks.
"Looking for much simple way since full query have more than 15 columns"
Sorry, you can have a complex query or no query at all :)
The problem is the structure of the posted table mandates a complex query. That's because it uses a so-called "generic data model", which is actually a data anti-model. The time saved in not modelling the requirement and just smashing values into the table is time you will have to spend writing horrible queries to get those values out again.
I assume you need to drive off the other table you referred to, and the posted table contains attributes supplementary to the core record.
select ano.id
, ano.name
, ano.address
, ano.region
, t1.value as alt_id
, t2.value as birth_date
, t3.value as contact_no
from another_table ano
left outer join ( select id, value
from generic_table
where key = 'alt_id' ) t1
on ano.id = t1.id
left outer join ( select id, value
from generic_table
where key = 'birth_date' ) t2
on ano.id = t2.id
left outer join ( select id, value
from generic_table
where key = 'contact_no' ) t3
on ano.id = t3.id
Note the need to use outer joins: one of the problems with generic data models is the enforcement of integrity constraints. Weak data typing can also be an issue (say if you wanted to convert the birth_date string into an actual date).
The PIVOT concept fits well for these types of problems:
SQL> create table person_info(id int, key varchar2(25), value varchar2(25));
SQL> create table person_info2(id int, name varchar2(25), address varchar2(125), region varchar2(25));
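SQL> insert into person_info values(4150521,'alt_id','12345678V');
SQL> insert into person_info values(4150521,'birth_date',date '1989-04-21');
SQL> insert into person_info values(4150521,'contact_no',772289317);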
SQL> insert into person_info values(4150522,'alt_id','98745612V');
SQL> insert into person_info values(4150522,'birth_date',date '1990-04-21');
SQL> insert into person_info values(4150522,'contact_no',777894561);
SQL> insert into person_info2 values(4150521,'ABC','AAAAAA','ASD');
SQL> insert into person_info2 values(4150522,'XYZ','BBBBB','WER');
SQL> select p1.id, name, address, region, alt_id, birth_date, contact_no
from person_info
pivot
(
max(value) for key in ('alt_id' as alt_id,'birth_date' as birth_date,'contact_no' as contact_no)
) p1 join person_info2 p2 on (p1.id = p2.id);
ID NAME ADDRESS REGION ALT_ID BIRTH_DATE CONTACT_NO
------- ------- ------- ------ --------- ---------- ----------
4150521 ABC AAAAAA ASD 12345678V 21-APR-89 772289317
4150522 XYZ BBBBB WER 98745612V 21-APR-90 777894561
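Where PIVOT is not available, the same reshaping is often done with conditional aggregation. The sketch below assumes SQLite through Python's sqlite3 module (not Oracle), stores the birth dates as ISO strings, and uses sample rows chosen to mirror the output shown above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_info (id INTEGER, key TEXT, value TEXT);
CREATE TABLE person_info2 (id INTEGER, name TEXT, address TEXT, region TEXT);
INSERT INTO person_info VALUES
    (4150521, 'alt_id', '12345678V'),
    (4150521, 'birth_date', '1989-04-21'),
    (4150521, 'contact_no', '772289317'),
    (4150522, 'alt_id', '98745612V'),
    (4150522, 'birth_date', '1990-04-21'),
    (4150522, 'contact_no', '777894561');
INSERT INTO person_info2 VALUES
    (4150521, 'ABC', 'AAAAAA', 'ASD'),
    (4150522, 'XYZ', 'BBBBB', 'WER');
""")

# One MAX(CASE ...) per key turns the key/value rows into columns;
# the LEFT JOIN keeps people who have no rows in person_info at all.
rows = conn.execute("""
SELECT p2.id, p2.name, p2.address, p2.region,
       MAX(CASE WHEN p1.key = 'alt_id' THEN p1.value END) AS alt_id,
       MAX(CASE WHEN p1.key = 'birth_date' THEN p1.value END) AS birth_date,
       MAX(CASE WHEN p1.key = 'contact_no' THEN p1.value END) AS contact_no
FROM person_info2 p2
LEFT JOIN person_info p1 ON p1.id = p2.id
GROUP BY p2.id, p2.name, p2.address, p2.region
ORDER BY p2.id
""").fetchall()

for row in rows:
    print(row)
```

Each additional attribute costs one more MAX(CASE ...) line rather than one more join.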

Highlight multiple records in a date range

Working with SQL Server 2008.
fromdate todate ID name
--------------------------------
1-Aug-16 7-Aug-16 x jack
3-Aug-16 4-Aug-16 x jack
5-Aug-16 6-Aug-16 x tom
1-Aug-16 2-Aug-16 y john
3-Aug-16 4-Aug-16 y harry
5-Aug-16 6-Aug-16 y mac
Is there a way to script this so that I know if there are multiple names tagged to an ID in the same date range?
For example above, I want to flag that ID x has Name Jack and Tom tagged in the same date range.
ID multiple_flag
------------------------------------------------
x yes
y no
If there is a unique index in your table (in my example it is column i but you could also generate one by means of using ROW_NUMBER()) then you can do the following query based on an INNER JOIN to find overlapping date ranges:
CREATE TABLE #tmp (i int identity primary key,fromdate date,todate date,ID int,name varchar(32));
insert into #tmp (fromdate,todate,ID ,name) values
('1-Aug-16','7-Aug-16',3,'jack'),
('3-Aug-16','4-Aug-16',3,'tom'),
('5-Aug-16','6-Aug-16',3,'jack');
select a.*,b.name bname,b.i i2 from #tmp a
INNER join #tmp b on b.id=a.id AND b.i<>a.i
AND ( b.fromdate between a.fromdate and a.todate
OR b.todate between a.fromdate and a.todate)
(My id column is int). This will give you:
i fromdate todate ID name bname i2
- ---------- ---------- - ---- ----- --
1 2016-08-01 2016-08-07 3 jack tom 2
1 2016-08-01 2016-08-07 3 jack jack 3
Implement further filtering or grouping as required. I left a little demo here.
Please check the SQL below, though it might not be the optimal one:
SELECT fromdate,todate,id,tab1.name,
case when tab2.#Of >1 then 'yes' else 'no' end as multiple_flag
FROM tab1
inner join (SELECT Name, COUNT(*) as #Of
FROM tab1
GROUP BY Name) as tab2 on tab1.name=tab2.name
order by tab1.id ;
Add your WHERE condition before the ORDER BY if you need to restrict the query to a date range.
The result looks like
One way to do it is with a CASE expression that wraps EXISTS:
Please note this part of the query:
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
For an explanation of testing for overlapping ranges, read the overlap wiki.
Create and populate sample data (Please save us this step in your future questions)
DECLARE @T as table
(
fromdate date,
todate date,
ID char(1),
name varchar(10)
)
INSERT INTO @T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac')
The query:
SELECT DISTINCT id,
CASE WHEN EXISTS
(
SELECT 1
FROM @T t2
WHERE t1.Id = t2.Id
-- make sure it's not the same record
AND t1.fromdate <> t2.fromdate
AND t1.todate <> t2.todate
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
)
THEN 'Yes'
ELSE 'No'
END As multiple_flag
FROM @T t1
Results:
id multiple_flag
---- -------------
x Yes
y No
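The overlap test (each range starts on or before the other one ends) can be tried in isolation. This is a sketch assuming SQLite through Python's sqlite3 module instead of SQL Server, with ISO date strings, and it compares names rather than dates to tell records apart, which is a variation on the query above and not the answer's exact condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (fromdate TEXT, todate TEXT, id TEXT, name TEXT);
INSERT INTO t VALUES
    ('2016-08-01', '2016-08-07', 'x', 'jack'),
    ('2016-08-03', '2016-08-04', 'x', 'tom'),
    ('2016-08-05', '2016-08-06', 'x', 'jack'),
    ('2016-08-01', '2016-08-02', 'y', 'john'),
    ('2016-08-03', '2016-08-04', 'y', 'harry'),
    ('2016-08-05', '2016-08-06', 'y', 'mac');
""")

# For every distinct id, flag 'yes' when two rows with different names
# have overlapping date ranges.
rows = conn.execute("""
SELECT ids.id,
       CASE WHEN EXISTS (
           SELECT 1
           FROM t t1
           JOIN t t2
             ON t2.id = t1.id
            AND t2.name <> t1.name
            AND t1.fromdate <= t2.todate
            AND t2.fromdate <= t1.todate
           WHERE t1.id = ids.id
       ) THEN 'yes' ELSE 'no' END AS multiple_flag
FROM (SELECT DISTINCT id FROM t) ids
ORDER BY ids.id
""").fetchall()
print(rows)
```

For id x, jack's range from 1 Aug to 7 Aug covers tom's range, so x is flagged; the three ranges for y never touch.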

Last Record of a Join Table (how to optimize)

I have the same "problem" as described in (Last record of Join table): I need to join a "Master Table" with a "History Table", but I only want to join the latest (by date) record of the history table. So whenever I query a record from the master table, I also get the "latest" data from the History Table.
Master Table
ID
FIRSTNAME
LASTNAME
...
History Table
ID
LASTACTION
DATE
This is possible by joining both tables and using a subselect to retrieve the latest history table record as described in the answer given in the link above.
My questions are:
How can I solve the problem that there might, in theory, be two History records with the same date?
Is this kind of join with a subselect really the best solution in terms of performance (and in general)? What do you think (I am NO expert in all this stuff) of adding a further attribute to the History table, named "ISLATESTRECORD", as a boolean flag that I manage manually (and that has a unique constraint)? This attribute would explicitly mark the latest record, so I would not need any subselects and could use it directly in the where clause of the join.
On the other hand, this of course makes inserting a new record a little more complicated: I first have to remove the "ISLATESTRECORD" flag from the latest record, then insert the new History record with "ISLATESTRECORD" set, and then commit the transaction.
What do you think is the recommended solution? I do not have any clue about the performance impact of the subselects: I might have millions of "Mastertable" records in which I have to search for a specific record, also using attributes of the joined History table in the search, like: "Give me the Mastertable record with FIRSTNAME XYZ where the LASTACTION (of the History Table) was changed_name". So this subselect might be called millions of times.
Or is it better to work with a subselect to find the latest record, as subselects are very efficient and it is better to keep everything normalized?
Thank you very much
I solve your problem with a query on your existing tables, and then again with an auto-incrementing identity column added to the history table. Adding an auto-incrementing identity column to your history table gets around the problem of non-unique dates and makes the query easier.
To solve the problem with your tables (with SQL Server example code):
DECLARE @MasterTable table (MasterID int,FirstName varchar(20),LastName varchar(20))
DECLARE @HistoryTable table (MasterID int,LastAction char(1),HistoryDate datetime)
INSERT INTO @MasterTable VALUES (1,'AAA','aaa')
INSERT INTO @MasterTable VALUES (2,'BBB','bbb')
INSERT INTO @MasterTable VALUES (3,'CCC','ccc')
INSERT INTO @HistoryTable VALUES (1,'I','1/1/2009')
INSERT INTO @HistoryTable VALUES (1,'U','2/2/2009')
INSERT INTO @HistoryTable VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTable VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTable VALUES (2,'I','5/5/2009')
INSERT INTO @HistoryTable VALUES (3,'I','7/7/2009')
INSERT INTO @HistoryTable VALUES (3,'U','8/8/2009')
SELECT
MasterID,FirstName,LastName,LastAction,HistoryDate
FROM (SELECT
m.MasterID,m.FirstName,m.LastName,h.LastAction,h.HistoryDate,ROW_NUMBER() OVER(PARTITION BY m.MasterID ORDER BY m.MasterID) AS RankValue
FROM @MasterTable m
INNER JOIN (SELECT
MasterID,MAX(HistoryDate) AS MaxDate
FROM @HistoryTable
GROUP BY MasterID
) dt ON m.MasterID=dt.MasterID
INNER JOIN @HistoryTable h ON dt.MasterID=h.MasterID AND dt.MaxDate=h.HistoryDate
) AllRows
WHERE RankValue=1
OUTPUT:
MasterID FirstName LastName LastAction HistoryDate
----------- --------- -------- ---------- -----------
1 AAA aaa U 2009-03-03
2 BBB bbb I 2009-05-05
3 CCC ccc U 2009-08-08
(3 row(s) affected)
To solve the problem with a better HistoryTable (with SQL Server example code):
It is better because it has an auto-incrementing HistoryID identity column.
DECLARE @MasterTable table (MasterID int,FirstName varchar(20),LastName varchar(20))
DECLARE @HistoryTableNEW table (HistoryID int identity(1,1), MasterID int,LastAction char(1),HistoryDate datetime)
INSERT INTO @MasterTable VALUES (1,'AAA','aaa')
INSERT INTO @MasterTable VALUES (2,'BBB','bbb')
INSERT INTO @MasterTable VALUES (3,'CCC','ccc')
INSERT INTO @HistoryTableNEW VALUES (1,'I','1/1/2009')
INSERT INTO @HistoryTableNEW VALUES (1,'U','2/2/2009')
INSERT INTO @HistoryTableNEW VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTableNEW VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTableNEW VALUES (2,'I','5/5/2009')
INSERT INTO @HistoryTableNEW VALUES (3,'I','7/7/2009')
INSERT INTO @HistoryTableNEW VALUES (3,'U','8/8/2009')
SELECT
m.MasterID,m.FirstName,m.LastName,h.LastAction,h.HistoryDate,h.HistoryID
FROM @MasterTable m
INNER JOIN (SELECT
MasterID,MAX(HistoryID) AS MaxHistoryID
FROM @HistoryTableNEW
GROUP BY MasterID
) dt ON m.MasterID=dt.MasterID
INNER JOIN @HistoryTableNEW h ON dt.MasterID=h.MasterID AND dt.MaxHistoryID=h.HistoryID
OUTPUT:
MasterID FirstName LastName LastAction HistoryDate HistoryID
----------- --------- -------- ---------- ----------------------- ---------
1 AAA aaa U 2009-03-03 00:00:00.000 4
2 BBB bbb I 2009-05-05 00:00:00.000 5
3 CCC ccc U 2009-08-08 00:00:00.000 7
(3 row(s) affected)
If the history table has a Primary Key (and all tables should), you can modify the subselect to extract the record with either the larger (or the smaller) PK value of the multiples that match the date criteria...
Select M.*, H.*
From Master M
Join History H
On H.PK = (Select Max(PK) From History
Where FK = M.PK
And Date = (Select Max(Date) From History
Where FK = M.PK))
As to performance, that can be addressed by adding the appropriate indices to these tables (History.Date, History.FK) but in general, depending on the specific table data distribution patterns, sub queries can adversely affect performance.
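The tie-break on the primary key and the suggested index can be sketched together. The example assumes SQLite through Python's sqlite3 module rather than SQL Server; the table and column names (master, history, history_id, master_id, action_date) and the index name are hypothetical, and the sample rows follow the earlier answer's data with ISO date strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE history (
    history_id INTEGER PRIMARY KEY,          -- auto-assigned, breaks date ties
    master_id INTEGER REFERENCES master(id),
    lastaction TEXT,
    action_date TEXT
);
-- Index on the foreign key and date so the per-master lookup can avoid a full scan.
CREATE INDEX ix_history_master_date ON history (master_id, action_date, history_id);
INSERT INTO master VALUES (1, 'AAA', 'aaa'), (2, 'BBB', 'bbb'), (3, 'CCC', 'ccc');
INSERT INTO history (master_id, lastaction, action_date) VALUES
    (1, 'I', '2009-01-01'), (1, 'U', '2009-02-02'),
    (1, 'U', '2009-03-03'), (1, 'U', '2009-03-03'),  -- duplicate dates
    (2, 'I', '2009-05-05'),
    (3, 'I', '2009-07-07'), (3, 'U', '2009-08-08');
""")

# For each master row, pick the single history row with the latest date,
# using the larger history_id when two rows share that date.
rows = conn.execute("""
SELECT m.id, m.firstname, h.lastaction, h.action_date, h.history_id
FROM master m
JOIN history h
  ON h.history_id = (SELECT h2.history_id
                     FROM history h2
                     WHERE h2.master_id = m.id
                     ORDER BY h2.action_date DESC, h2.history_id DESC
                     LIMIT 1)
ORDER BY m.id
""").fetchall()

for row in rows:
    print(row)
```

Whether this beats a manually maintained "latest" flag depends on the data distribution, as noted above; it does keep the history table free of state that must be updated on every insert.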