How to flatten a table from rows to columns - SQL

I use MariaDB 10.2.21
I have not seen this exact case elsewhere, hence my request for assistance.
I have a History table containing one record per change on any of the fields in a JIRA issue:
+----------+---------------+----------+-----------------+---------------------+
| IssueKey | OriginalValue | NewValue | Field           | ChangeDate          |
+----------+---------------+----------+-----------------+---------------------+
| HRSK-184 | (NULL)        | 2        | Risk Detection  | 2019-10-24 10:57:27 |
| HRSK-184 | (NULL)        | 2        | Risk Occurrence | 2019-10-24 10:57:27 |
| HRSK-184 | (NULL)        | 2        | Risk Severity   | 2019-10-24 10:57:27 |
| HRSK-184 | 2             | 4        | Risk Detection  | 2019-10-25 11:54:07 |
| HRSK-184 | 2             | 6        | Risk Detection  | 2019-10-25 11:54:07 |
| HRSK-184 | 2             | 3        | Risk Severity   | 2019-10-24 11:54:07 |
| HRSK-184 | 6             | 5        | Risk Detection  | 2019-10-26 09:11:01 |
+----------+---------------+----------+-----------------+---------------------+
Every record contains the old and new value, the field type that changed ('Field') and, of course, the corresponding timestamp of that change.
I want to query the point-in-time status, giving me the combination of the most recent values of each of the fields 'Risk Severity', 'Risk Occurrence' and 'Risk Detection'.
The result should be like this:
+----------+----------------+-------------------+------------------+----------------------+
| IssueKey | Risk Severity  | Risk Occurrence   | Risk Detection   | ChangeDate           |
+----------+----------------+-------------------+------------------+----------------------+
| HRSK-184 | 3              | 2                 | 5                | 2019-10-26 09:11:01  |
+----------+----------------+-------------------+------------------+----------------------+
Any ideas? I'm stuck...
Thanks in advance for your effort!

You could use a couple of inline queries:
select
    IssueKey,
    (
        select t1.NewValue
        from mytable t1
        where t1.IssueKey = t.IssueKey and t1.Field = 'Risk Severity'
        order by ChangeDate desc limit 1
    ) `Risk Severity`,
    (
        select t1.NewValue
        from mytable t1
        where t1.IssueKey = t.IssueKey and t1.Field = 'Risk Occurrence'
        order by ChangeDate desc limit 1
    ) `Risk Occurrence`,
    (
        select t1.NewValue
        from mytable t1
        where t1.IssueKey = t.IssueKey and t1.Field = 'Risk Detection'
        order by ChangeDate desc limit 1
    ) `Risk Detection`,
    max(ChangeDate) ChangeDate
from mytable t
group by IssueKey
With an index on (IssueKey, Field, ChangeDate, NewValue), this should be an efficient option.
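For reference, a minimal sketch of that covering index, assuming the history table really is named mytable as in the query above (the index name itself is arbitrary):
create index idx_history_issue_field_date
    on mytable (IssueKey, Field, ChangeDate, NewValue);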
Demo on DB Fiddle:
IssueKey | Risk Severity | Risk Occurrence | Risk Detection | ChangeDate
:------- | ------------: | --------------: | -------------: | :------------------
HRSK-184 |             3 |               2 |              5 | 2019-10-26 09:11:01

MariaDB 10.2 introduced window functions for analytical queries.
One of them is the RANK() OVER (PARTITION BY ... ORDER BY ...) function.
You can apply it first, and then pivot through conditional aggregation:
SELECT IssueKey,
       MAX(CASE WHEN Field = 'Risk Severity' THEN NewValue END) AS RiskSeverity,
       MAX(CASE WHEN Field = 'Risk Occurrence' THEN NewValue END) AS RiskOccurrence,
       MAX(CASE WHEN Field = 'Risk Detection' THEN NewValue END) AS RiskDetection,
       MAX(ChangeDate) AS ChangeDate
FROM
(
    SELECT RANK() OVER (PARTITION BY IssueKey, Field ORDER BY ChangeDate DESC) rnk,
           t.*
    FROM mytable t
) t
WHERE rnk = 1
GROUP BY IssueKey;
IssueKey | RiskSeverity | RiskOccurrence | RiskDetection | ChangeDate
---------+--------------+----------------+---------------+--------------------
HRSK-184 |            3 |              2 |             5 | 2019-10-26 09:11:01
Demo
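One caveat, since the sample data above contains two 'Risk Detection' changes with the same timestamp: RANK() assigns the same rank to tied ChangeDate values, so more than one row per field could come through with rnk = 1. The outer MAX() still collapses them to a single value, but if exactly one row per (IssueKey, Field) is preferred, ROW_NUMBER() could be swapped in for the inner query, for example:
SELECT ROW_NUMBER() OVER (PARTITION BY IssueKey, Field
                          ORDER BY ChangeDate DESC) rnk,
       t.*
FROM mytable t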

Related

SQL - Group two rows by columns that value and null on different columns

Say I have a table with such rows:
id | country | place | last_action | second_to_last_action
----------------------------------------------------------
 1 | US      |     2 | reply       |
 1 | US      |     2 |             | comment
 4 | DE      |     5 | reply       |
 4 |         |       |             | comment
What I want to do is combine these by id, country and place so that last_action and second_to_last_action end up on the same row:
id | country | place | last_action | second_to_last_action
----------------------------------------------------------
 1 | US      |     2 | reply       | comment
 4 | DE      |     5 | reply       | comment
How would I approach this? I guess I would need an aggregate here, but my mind is drawing a complete blank on which one I should use.
It can be expected that there will always be a matching pair.
Background:
Note: this table has been derived from something like this:
id | country | place | action  | time
----------------------------------------------------------
 1 | US      |     2 | reply   | 16:15
 1 | US      |     2 | comment | 15:16
 1 | US      |     2 | view    | 13:16
 4 | DE      |     5 | reply   | 17:15
 4 | DE      |     5 | comment | 16:16
 4 | DE      |     5 | view    | 14:12
Code used to partition was:
row_number() over (partition by id order by time desc) as event_no
And then I got the last and second-to-last actions by taking event_no 1 and 2. So if there's a more efficient way to get the last two actions in two distinct columns, I would be happy to hear that.
You can fix your first data by using aggregation:
select id, country, place, max(last_action), max(second_to_last_action)
from derived
group by id, country, place;
You can do this from the original table using conditional aggregation:
select id, country, place,
       max(case when seqnum = 1 then action end) as last_action,
       max(case when seqnum = 2 then action end) as second_to_last_action
from (select t.*,
             row_number() over (partition by id order by time desc) as seqnum
      from t
     ) t
group by id, country, place;

Transposing SQL rows data to column

In SQL we need to transform a table in the following way:
Table1:
+-----+---------+-----------+
| ID  | insured | DOD       |
+-----+---------+-----------+
| 123 | Pam     | 6/18/2013 |
| 123 | Nam     | 2/12/2010 |
| 123 | Tam     | 2/10/2013 |
| 456 | Jessi   | 4/6/2003  |
| 457 | Ron     | 4/10/2010 |
| 457 | Tom     | 5/5/2008  |
+-----+---------+-----------+
Desired output table:
+-----+---------+-----------+-----------+-----------+
| ID  | insured | DOD1      | DOD2      | DOD3      |
+-----+---------+-----------+-----------+-----------+
| 123 | Pam     | 6/18/2013 | 2/12/2010 | 2/10/2013 |
| 456 | Jessi   | 4/6/2003  | null      | null      |
| 457 | Ron     | 4/10/2010 | 5/5/2008  | null      |
+-----+---------+-----------+-----------+-----------+
I have seen somewhere that we can use pivot and unpivot, but I am not sure how can I use it here.
Your help is much appreciated.
Assuming that you really want -- or can accept -- the dates in descending order, then you can use conditional aggregation for this:
select id,
       max(case when seqnum = 1 then insured end) as insured,
       max(case when seqnum = 1 then dod end) as dod_1,
       max(case when seqnum = 2 then dod end) as dod_2,
       max(case when seqnum = 3 then dod end) as dod_3
from (select t.*,
             row_number() over (partition by id order by dod desc) as seqnum
      from t
     ) t
group by id;
If you want to preserve the original ordering, then your question does not have enough information. If you have a column with the ordering, that can be used for row_number().
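For instance, a minimal sketch assuming a hypothetical entry_id column that reflects the original insert order:
select id,
       max(case when seqnum = 1 then insured end) as insured,
       max(case when seqnum = 1 then dod end) as dod_1,
       max(case when seqnum = 2 then dod end) as dod_2,
       max(case when seqnum = 3 then dod end) as dod_3
from (select t.*,
             -- entry_id is a hypothetical column capturing the original row order
             row_number() over (partition by id order by entry_id) as seqnum
      from t
     ) t
group by id;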
I would run it like this, guessing that your ID is a number and that you want only those records. From vertical to horizontal, pivot is the best option:
select *
from yourtable
pivot (max(DOD) for ID in (123, 456, 457));
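If the numeric column names produced by the IN list are awkward to reference, Oracle's PIVOT also allows aliasing them; a sketch under the same assumptions as above:
select *
from yourtable
pivot (max(DOD) for ID in (123 as id_123, 456 as id_456, 457 as id_457));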

How to de-duplicate SQL table rows by multiple columns with hierarchy?

I have a table with multiple records for each patient.
My end goal is a table that is 1-to-1 between Patient_id and Value.
I would like to de-duplicate (in respect to patient_id) my rows based on "a hierarchical series of aggregate functions" (if someone has a better way to phrase this, I'd appreciate that as well.)
+----+------------+------------+------------+----------+-----------------+-------+
| ID | patient_id | Date       | Date2      | Priority | Source          | Value |
+----+------------+------------+------------+----------+-----------------+-------+
| 1  | 1          | 2017-09-09 | 2018-09-09 | 1        | 'verified'      | 55    |
| 2  | 1          | 2017-09-09 | 2018-11-11 | 2        | 'verified'      | 78    |
| 3  | 1          | 2017-11-11 | 2018-09-09 | 3        | 'verified'      | 23    |
| 4  | 1          | 2017-11-11 | 2018-11-11 | 1        | 'self_reported' | 11    |
| 5  | 1          | 2017-09-09 | 2018-09-09 | 2        | 'self_reported' | 90    |
| 5  | 1          | 2017-09-09 | 2018-09-09 | 3        | 'self_reported' | 34    |
| 6  | 2          | 2017-11-11 | 2018-09-09 | 2        | 'self_reported' | 21    |
+----+------------+------------+------------+----------+-----------------+-------+
For each patient_id, I would like to get the row(s) that has/have the MAX(Date). In the case that there are still duplicated patient_id, I would like to get the row(s) with the MIN(Priority). In the case that there are still duplicated rows I would like to get the row(s) with the MIN(Date2).
The way I've approached this problem is using a series of queries like this to de-duplicate on the columns one at a time.
SELECT *
FROM #table t1
LEFT JOIN
    (SELECT patient_id,
            MIN(priority) AS min_priority
     FROM #table
     GROUP BY patient_id) t2
    ON t2.patient_id = t1.patient_id
WHERE t2.min_priority = t1.priority
Is there a way to do this that allows me to de-dup on multiple columns at once? Is there a more elegant way to do this?
I'm able to get my results, but my solution feels very inefficient, and I keep running into this. Thank you for any input.
You could use row_number(), if your RDBMS supports it:
select ID, patient_id, Date, Date2, Priority, Source, Value
from (
    select t.*,
           row_number() over(partition by patient_id order by Date desc, Priority, Date2) rn
    from mytable t
) t
where rn = 1
Another option is to filter with a correlated subquery that sorts the record according to your criteria, like so:
select t.*
from mytable t
where id = (
    select id
    from mytable t1
    where t1.patient_id = t.patient_id
    order by t1.Date desc, t1.Priority, t1.Date2
    limit 1
)
The actual syntax for limit varies across RDBMS.
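Since the temp tables in the question (#table) suggest SQL Server, where LIMIT is not available, a sketch of the same correlated-subquery filter using TOP might look like this (still assuming the mytable name used above):
select t.*
from mytable t
where t.id = (
    select top 1 t1.id
    from mytable t1
    where t1.patient_id = t.patient_id
    order by t1.Date desc, t1.Priority, t1.Date2
)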

How do I get the latest user-updated column value in a table based on a timestamp entry in a different table in SQL Server?

I have a temp table #StatusInfo with the following data
+---------+--------------+-------+-------------------------+
| OrderNo | GroupLineNum | Type1 | UpdateDate              |
+---------+--------------+-------+-------------------------+
| Order85 | NULL         | 1     | 2019-11-25 05:15:55.000 |
| Order86 | NULL         | 1     | 2019-11-25 05:15:55.000 |
| Order86 | 2            | 2     | 2019-11-25 05:32:23.773 |
| Order87 | NULL         | 1     | 2019-11-25 05:15:55.000 |
| Order87 | 1            | 2     | 2019-11-25 05:43:37.637 |  B
| Order87 | 2            | 2     | 2019-11-25 05:42:32.390 |  A
| Order88 | NULL         | 1     | 2019-11-25 06:35:13.000 |
| Order88 | 1            | 2     | 2019-11-25 06:39:16.170 |
+---------+--------------+-------+-------------------------+
Any update the user does on an order will be pulled into this temp table. A Type1 value of 2 denotes a 'Required Date' field change by the user. The timestamp of when the user made the change is in the last column.
I have another temp table #LineInfo with the following data. This table is created by joining other tables, including a left join with the table above. The 'LineNum' column from the table below will match the 'GroupLineNum' column in the table above for Type1 = 2.
+---------+-----------+---------+------------+-------------------------+-------+
| OrderNo | RowNumber | LineNum | TotalCost  | ReqDate                 | Type1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 1         | 1       | 309.110000 | 2019-10-30 23:59:00.000 | 1     |
| Order85 | 2         | 2       | 265.560000 | 2019-10-30 23:59:00.000 | 1     |
| Order86 | 1         | 1       | 309.110000 | 2019-10-30 23:59:00.000 | 1     |
| Order86 | 2         | 2       | 265.560000 | 2019-12-28 23:59:00.000 | 2     |
| Order87 | 1         | 1       | 309.110000 | 2020-01-31 23:59:00.000 | 2     |
| Order87 | 2         | 2       | 265.560000 | 2020-01-01 23:59:00.000 | 2     |
| Order88 | 1         | 1       | 309.110000 | 2019-11-29 23:59:00.000 | 2     |
| Order88 | 2         | 2       | 265.560000 | 2019-12-31 23:59:00.000 | 2     |
+---------+-----------+---------+------------+-------------------------+-------+
I will be joining #LineInfo with other tables to generate a new table with only one record per OrderNo; it is grouped by OrderNo.
What I need to do is ensure that the new select query will have a column 'ReqDate' which will be the latest ReqDate value for the order.
For example, Order87 has two lines in the order. The user updated Line 2 first at '2019-11-25 05:42:32.390', as seen in the row marked 'A', followed by Line 1, marked 'B', at '2019-11-25 05:43:37.637' in the first table.
The new query should have the data from LineInfo and only the 'ReqDate' value matching the 'LineNum' that has the maximum of 'UpdateDate' column for Type1=2 and group by orderno.
So in our example, the output should have the ReqDate value '2020-01-31 23:59:00.000'.
In short, an order should have the most recently updated required date. An order can have multiple line items where ReqDate is updated. If there is no entry in the #StatusInfo table with Type1 = 2 for an order, then any one of the ReqDate values from the #LineInfo table will suffice, maybe the first line.
I wrote something like this, but it doesn't pull orders without any entry in the #StatusInfo table. Those orders will have a default value even though the user didn't update anything, and I am not sure how to join the result of this with the #LineInfo table to set the latest value:
Select SIT.OrderNo, max_date, GroupLineNum
from #StatusInfo SIT
inner join
    (SELECT OrderNo, MAX(UpdateDate) as max_date
     FROM #StatusInfo SI
     WHERE SI.Type1 = 2
     GROUP BY SI.OrderNo) a
    on a.OrderNo = SIT.OrderNo and a.max_date = SIT.UpdateDate
This is what I did: I created the below CTE to load orders with a required-date change, ordered by UpdateDate, and assigned a row number. The record with row number 1 holds the most recently updated date.
;WITH cteLatestReqDate AS (
    -- We need to pull the latest ReqDate value the user set, so we order the SIT rows
    -- by UpdateDate, assign a row number, and carry the respective line's required date
    SELECT SIT.OrderNo, SIT.UpdateDate, SIT.GroupLineNum, LLI.ReqDate,
           ROW_NUMBER() OVER (PARTITION BY SIT.OrderNo ORDER BY SIT.UpdateDate DESC) AS RowNum
    FROM #StatusInfo SIT
    INNER JOIN #LineLevelInfo LLI
        ON SIT.OrderNo = LLI.OrderNo AND SIT.GroupLineNum = LLI.LineNum
    WHERE SIT.Type1 = 2
)
and then I added the below expression to my select query (the select query below is partial):
SELECT CASE WHEN MAX(LRD.ReqDate) IS NULL
            THEN CAST(FORMAT(MAX(LLI.ReqDate), 'yyMMdd') AS NVARCHAR(10))
            ELSE CAST(FORMAT(MAX(LRD.ReqDate), 'yyMMdd') AS NVARCHAR(10))
       END AS LatestReqDate
FROM #LineLevelInfo LLI
LEFT JOIN (SELECT * FROM cteLatestReqDate WHERE RowNum = 1) LRD
    ON LRD.OrderNo = LLI.OrderNo AND LRD.GroupLineNum = LLI.LineNum
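For reference, a minimal self-contained sketch of the same pattern, assuming the temp table names used above (#StatusInfo, #LineLevelInfo) and that returning the raw date is acceptable; COALESCE supplies the fallback ReqDate for orders with no Type1 = 2 entry:
;WITH cteLatestReqDate AS (
    SELECT SIT.OrderNo, LLI.ReqDate,
           ROW_NUMBER() OVER (PARTITION BY SIT.OrderNo ORDER BY SIT.UpdateDate DESC) AS RowNum
    FROM #StatusInfo SIT
    INNER JOIN #LineLevelInfo LLI
        ON SIT.OrderNo = LLI.OrderNo AND SIT.GroupLineNum = LLI.LineNum
    WHERE SIT.Type1 = 2
)
SELECT LLI.OrderNo,
       -- latest user-updated ReqDate when one exists, otherwise any ReqDate from the order's lines
       COALESCE(MAX(LRD.ReqDate), MAX(LLI.ReqDate)) AS LatestReqDate
FROM #LineLevelInfo LLI
LEFT JOIN (SELECT OrderNo, ReqDate FROM cteLatestReqDate WHERE RowNum = 1) LRD
    ON LRD.OrderNo = LLI.OrderNo
GROUP BY LLI.OrderNo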

Select latest values for group of related records

I have a table that accommodates data that is logically groupable by multiple properties (a foreign key, for example). The data is sequential over a continuous time interval; i.e. it is time series data. What I am trying to achieve is to select only the latest values for each combination of groups.
Here is example data:
+------+-------+------------+-------------+
| code | value | date       | relation_id |
+------+-------+------------+-------------+
| A    | 1     | 01.01.2016 | 1           |
| A    | 2     | 02.01.2016 | 1           |
| A    | 3     | 03.01.2016 | 1           |
| A    | 4     | 01.01.2016 | 2           |
| A    | 5     | 02.01.2016 | 2           |
| A    | 6     | 03.01.2016 | 2           |
| B    | 1     | 01.01.2016 | 1           |
| B    | 2     | 02.01.2016 | 1           |
| B    | 3     | 03.01.2016 | 1           |
| B    | 4     | 01.01.2016 | 2           |
| B    | 5     | 02.01.2016 | 2           |
| B    | 6     | 03.01.2016 | 2           |
+------+-------+------------+-------------+
And here is an example of the desired output:
+------+-------+------------+-------------+
| code | value | date       | relation_id |
+------+-------+------------+-------------+
| A    | 3     | 03.01.2016 | 1           |
| A    | 6     | 03.01.2016 | 2           |
| B    | 3     | 03.01.2016 | 1           |
| B    | 6     | 03.01.2016 | 2           |
+------+-------+------------+-------------+
To put this in perspective: for every related object I want to select each code with the latest date.
Here is the select I came up with. I've used the ROW_NUMBER() OVER (PARTITION BY ...) approach:
SELECT indicators.code, indicators.dimension, indicators.unit,
       x.value, x.date, x.ticker, x.name
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
           t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
           companies.ticker, companies.name
    FROM fundamentals t
    INNER JOIN companies ON companies.id = t.company_id
    WHERE companies.sic_id = 89
) x
INNER JOIN indicators ON indicators.id = x.indicator_id
WHERE x.r <= (SELECT count(*) FROM companies WHERE sic_id = 89)
It works, but the problem is that it is painfully slow; when working with about 5% of production data, which equals roughly 3 million fundamentals records, this select takes about 10 seconds to finish. My guess is that this happens because the subselect pulls a huge number of records first.
Is there any way to speed this query up, or am I digging in the wrong direction by trying to do it the way I do?
Postgres offers the convenient distinct on for this purpose:
select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
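Given the performance concern, an index that matches the DISTINCT ON grouping and ordering may let Postgres avoid a full sort; a sketch, assuming the table really is named t as in the query above (the index name is arbitrary):
create index idx_t_latest_per_group on t (relation_id, code, date desc);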
Your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except the date? Assuming you don't have multiple most-recent dates, something like this should work. Basically, don't use the window function; use a proper group by, and your engine should optimize the query better.
SELECT mytable.code,
       mytable.value,
       mytable.date,
       mytable.relation_id
FROM mytable
JOIN (
    SELECT code,
           max(date) as date,
           relation_id
    FROM mytable
    GROUP BY code, relation_id
) Q1
    ON Q1.code = mytable.code
    AND Q1.date = mytable.date
    AND Q1.relation_id = mytable.relation_id
Another option:
SELECT DISTINCT Code,
       Relation_ID,
       FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Value,
       FIRST_VALUE(Date) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Date
FROM mytable
This will return the top value for whatever you partition by, ordered by whatever you order by.
I believe we can try something like this
SELECT code, relation_id, date, MAX(value) AS value
FROM mytable
GROUP BY code, relation_id, date