propagate hierarchical values from a single column

I have a table with a structure like this, which represents status changes:
+-----------+----------+----------------+-----------------+-------+
| record_id | group_id | attribute type | change date | value |
+-----------+----------+----------------+-----------------+-------+
| 1 | 1 | status | 4/16/2008 18:59 | s1 |
| 2 | 1 | details | 4/16/2008 18:59 | d5 |
| 3 | 1 | details | 8/7/2008 18:31 | d2 |
| 4 | 1 | details | 2/5/2009 22:15 | d1 |
| 5 | 1 | status | 4/3/2009 21:27 | s2 |
| 6 | 1 | details | 4/3/2009 21:27 | d7 |
| 7 | 2 | status | 4/3/2009 21:46 | s1 |
| 8 | 2 | details | 4/3/2009 21:46 | d1 |
+-----------+----------+----------------+-----------------+-------+
I'd like to query status changes and status detail changes as two columns, grouped by timestamp, with each status propagated to its related details. (Any status change also changes the details, so the details timestamp alone could be used for easier grouping.) The result should look like this:
+-----------+-----------------+--------+---------+
| object id | change date | status | details |
+-----------+-----------------+--------+---------+
| 1 | 4/16/2008 18:59 | s1 | d5 |
| 1 | 8/7/2008 18:31 | s1 | d2 |
| 1 | 2/5/2009 22:15 | s1 | d1 |
| 1 | 4/3/2009 21:27 | s2 | d7 |
| 2 | 4/3/2009 21:46 | s1 | d1 |
+-----------+-----------------+--------+---------+
This is what I've started with, but it leaves me with NULLs:
SELECT history.record_id,
       history.group_id,
       history.changedate,
       status_changes.value AS status,
       history.value AS details
FROM history
LEFT JOIN (SELECT history.group_id,
                  history.changedate,
                  history.value
           FROM history
           WHERE history.attribute_type = 'status') status_changes
       ON status_changes.group_id = history.group_id
      AND status_changes.changedate = history.changedate
WHERE history.attribute_type = 'details'
The first thing that came to mind is to fill the NULLs with the previous row's data.
But is there a better approach for querying the result listed above?

This query gives the desired layout:
select
group_id,
change_date,
max(case attr_type when 'status' then value else null end) as status,
max(case attr_type when 'status' then null else value end) as detail
from history
group by 1, 2
order by 1, 2;
group_id | change_date | status | detail
----------+---------------------+--------+--------
1 | 2008-04-16 18:59:00 | s1 | d5
1 | 2008-08-07 18:31:00 | | d2
1 | 2009-02-05 22:15:00 | | d1
1 | 2009-04-03 21:27:00 | s2 | d7
2 | 2009-04-03 21:46:00 | s1 | d1
(5 rows)
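You can fill the NULLs with the previous values like this. The running count(status) only increases on rows that have a status, so it labels every row with the status change it belongs to (part); max(status) over each part then picks that part's single non-NULL status: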
select
group_id,
change_date,
max(status) over (partition by group_id, part) status,
detail
from (
select *, count(status) over (partition by group_id order by change_date) part
from (
select
group_id,
change_date,
max(case attr_type when 'status' then value else null end) as status,
max(case attr_type when 'status' then null else value end) as detail
from history
group by 1, 2
) s
) s
order by 1, 2;
group_id | change_date | status | detail
----------+---------------------+--------+--------
1 | 2008-04-16 18:59:00 | s1 | d5
1 | 2008-08-07 18:31:00 | s1 | d2
1 | 2009-02-05 22:15:00 | s1 | d1
1 | 2009-04-03 21:27:00 | s2 | d7
2 | 2009-04-03 21:46:00 | s1 | d1
(5 rows)
Test it in rextester.
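To see why the running count works, here is a minimal sketch that runs the same idea against the question's sample data and keeps the intermediate part column visible. It uses Python's built-in sqlite3 module rather than PostgreSQL (an assumption for portability; window functions need SQLite 3.25 or later), ISO-formatted dates so that text ordering matches time ordering, and the answer's column names attr_type and change_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (record_id INTEGER, group_id INTEGER, "
    "attr_type TEXT, change_date TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "status", "2008-04-16 18:59", "s1"),
        (2, 1, "details", "2008-04-16 18:59", "d5"),
        (3, 1, "details", "2008-08-07 18:31", "d2"),
        (4, 1, "details", "2009-02-05 22:15", "d1"),
        (5, 1, "status", "2009-04-03 21:27", "s2"),
        (6, 1, "details", "2009-04-03 21:27", "d7"),
        (7, 2, "status", "2009-04-03 21:46", "s1"),
        (8, 2, "details", "2009-04-03 21:46", "d1"),
    ],
)

query = """
WITH pivoted AS (
    -- one row per (group, timestamp); status is NULL when only details changed
    SELECT group_id, change_date,
           MAX(CASE WHEN attr_type = 'status' THEN value END) AS status,
           MAX(CASE WHEN attr_type = 'details' THEN value END) AS detail
    FROM history
    GROUP BY group_id, change_date
),
numbered AS (
    -- running count of non-NULL statuses: grows only when a status row appears
    SELECT *, COUNT(status) OVER (PARTITION BY group_id ORDER BY change_date) AS part
    FROM pivoted
)
SELECT group_id, change_date, part,
       MAX(status) OVER (PARTITION BY group_id, part) AS status,
       detail
FROM numbered
ORDER BY group_id, change_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For group 1 the part column stays at 1 for the first three timestamps and becomes 2 at the s2 change, which is what lets the outer MAX carry s1 forward.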

Related

Aggregation in Join and where

I have this query for inventory balance, and it works well:
Select A.BATCH_ID,
       A.QTY_MOV - IsNull(B.QTY_USED,0) As BALANCE
From P_BATCH_PRODUC A
Left OUTER Join (Select MATERIAL_ID,
                        BATCH_MATERIAL_ID,
                        SUM(QTY_INS) QTY_USED
                 From CONSUMPTION
                 Group By MATERIAL_ID, BATCH_MATERIAL_ID) As B
  On B.MATERIAL_ID = A.PRODUCT_ID
 And A.BATCH_ID = B.BATCH_MATERIAL_ID
Where A.QTY_MOV - IsNull(B.QTY_USED,0) > 0
  AND A.PRODUCT_ID = 1
  and A.BATCH_ID = 1
But now it's possible to have more than one A.QTY_MOV for each A.BATCH_ID, so I need to change A.QTY_MOV to Sum(A.QTY_MOV). What do I need to change for that?
Sample:
Table A
+------------+------------+---------+
| Product_ID | Batch_ID | Qty_Mov |
+------------+------------+---------+
| 1 | 1 | 100 |
| 1 | 1 | 150 |
| 2 | 1 | 80 |
| 1 | 3 | 100 |
| 1 | 4 | 100 |
+------------+------------+---------+
Table B
+------------------+------------+------------+----------+
| BATCH_MATERIAL_ID| Product_ID | Batch_ID | Qty_USED |
+------------------+------------+------------+----------+
| 1 | 1 | 1 | 80 |
| 2 | 1 | 1 | 10 |
| 3 | 1 | 2 | 150 |
| 4 | 1 | 3 | 80 |
+------------------+------------+------------+----------+
This is what I want
Batch_ID BALANCE
---------- ---------------
1 160

Based strictly on the question, it sounds like you want a window function:
Select A.BATCH_ID ,
SUM(A.QTY_MOV) OVER (PARTITION BY A.BATCH_ID) - IsNull(B.QTY_USED,0) As BALANCE
I don't know if this does anything useful. If it does not, you should ask a new question with sample data and an explanation of logic.
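Another way to read the question is to aggregate both sides before joining, so each batch contributes one summed row. The sketch below is only an illustration under assumptions: it runs in Python's built-in sqlite3 (so COALESCE stands in for SQL Server's IsNull), it uses the column names from the question's query (MATERIAL_ID, BATCH_MATERIAL_ID, QTY_INS), and it maps the sample Table B's Product_ID, Batch_ID and Qty_USED onto those columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p_batch_produc (product_id INTEGER, batch_id INTEGER, qty_mov INTEGER)")
conn.execute("CREATE TABLE consumption (material_id INTEGER, batch_material_id INTEGER, qty_ins INTEGER)")
conn.executemany(
    "INSERT INTO p_batch_produc VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 1, 150), (2, 1, 80), (1, 3, 100), (1, 4, 100)],
)
conn.executemany(
    "INSERT INTO consumption VALUES (?, ?, ?)",
    [(1, 1, 80), (1, 1, 10), (1, 2, 150), (1, 3, 80)],
)

query = """
SELECT a.product_id, a.batch_id,
       a.qty_mov - COALESCE(b.qty_used, 0) AS balance
FROM (
    -- total movement per product and batch
    SELECT product_id, batch_id, SUM(qty_mov) AS qty_mov
    FROM p_batch_produc
    GROUP BY product_id, batch_id
) a
LEFT JOIN (
    -- total consumption per material and batch
    SELECT material_id, batch_material_id, SUM(qty_ins) AS qty_used
    FROM consumption
    GROUP BY material_id, batch_material_id
) b
  ON b.material_id = a.product_id
 AND b.batch_material_id = a.batch_id
WHERE a.qty_mov - COALESCE(b.qty_used, 0) > 0
ORDER BY a.product_id, a.batch_id
"""
balances = conn.execute(query).fetchall()
for row in balances:
    print(row)
```

With the sample rows, product 1 / batch 1 comes out as 250 - 90 = 160, matching the desired output. Because both sides are summed first, the join cannot multiply movement rows by consumption rows.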

Create table based on max value

I am working on a PostgreSQL database with data from car tracking which looks similar to this.
+--------+-------+---------+------------+
| car_id | trip | speed | Segment |
+--------+-------+---------+------------+
| 1 | 1 | 82 | s1 |
| 1 | 1 | 81 | |
| 1 | 1 | 85 | s1 |
| 1 | 2 | 82 | s1 |
| 1 | 2 | 76 | s2 |
| 2 | 3 | 80 | s1 |
| 2 | 3 | 84 | s2 |
| 2 | 3 | 83 | s2 |
+--------+-------+---------+------------+
Every car has its own car_id, and a new trip starts whenever the car_id changes or more than 5 seconds pass between data points. Each data point records the speed and the part of the road the track belongs to (segment).
I would like to end up with a table where the maximum speed is shown, for each trip for each segment. If possible the car_id should be shown as well. It should look like this:
+-------+----------+------+------+
| trip | car_id | s1 | s2 |
+-------+----------+------+------+
| 1 | 1 | 85 | |
| 2 | 1 | 82 | 76 |
| 3 | 2 | 80 | 84 |
+-------+----------+------+------+
I have tried to use a group by but I can't make it work. I will be grateful if anyone can help.

I think this is just conditional aggregation:
select trip, car_id,
       max(speed) filter (where segment = 's1') as s1,
       max(speed) filter (where segment = 's2') as s2
from t
group by trip, car_id
order by trip;
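As a quick check of the conditional aggregation, here is a small sketch on the question's sample rows. It runs in Python's built-in sqlite3 instead of PostgreSQL and swaps the FILTER clause for the equivalent CASE expression, purely as an assumption to stay compatible with older SQLite builds; the table name t is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (car_id INTEGER, trip INTEGER, speed INTEGER, segment TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, 82, "s1"),
        (1, 1, 81, None),  # data point with no segment
        (1, 1, 85, "s1"),
        (1, 2, 82, "s1"),
        (1, 2, 76, "s2"),
        (2, 3, 80, "s1"),
        (2, 3, 84, "s2"),
        (2, 3, 83, "s2"),
    ],
)

query = """
SELECT trip, car_id,
       -- CASE yields NULL for other segments, and MAX ignores NULLs
       MAX(CASE WHEN segment = 's1' THEN speed END) AS s1,
       MAX(CASE WHEN segment = 's2' THEN speed END) AS s2
FROM t
GROUP BY trip, car_id
ORDER BY trip
"""
max_speeds = conn.execute(query).fetchall()
for row in max_speeds:
    print(row)
```

Trip 1 has no s2 rows, so its s2 column comes back as NULL (None in Python), as in the desired table.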

PostgreSQL multiple row as columns

I have a table like this:
| id | name | segment | date_created | question | answer |
|----|------|---------|--------------|----------|--------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 |
| 1 | John | 1 | 2018-01-01 | 14 | 37 |
| 1 | John | 1 | 2018-01-01 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 |
| 2 | Jack | 3 | 2018-03-11 | 23 | 16 |
And I want to show this information in a single row, transpose all the questions and answers as columns:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 | 14 | 37 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 | 23 | 16 | | |
The maximum number of questions/answers for the same ID is known: 15.
I've already tried using crosstab, but it only accepts a single value as category, and I can have 2 (question/answer). Any help on how to solve this?

You can use row_number in a subquery to number the rows within each ID, then use conditional aggregation in the main query to turn each row number into its own pair of columns.
SELECT ID,
Name,
segment,
date_created,
max(CASE WHEN rn = 1 THEN question END) question_01 ,
max(CASE WHEN rn = 1 THEN answer END) answer_01 ,
max(CASE WHEN rn = 2 THEN question END) question_02,
max(CASE WHEN rn = 2 THEN answer END) answer_02,
max(CASE WHEN rn = 3 THEN question END) question_03,
max(CASE WHEN rn = 3 THEN answer END) answer_03
FROM (
select *,Row_number() over(partition by ID,Name,segment,date_created order by (select 1)) rn
from T
) t1
GROUP BY ID,Name,segment,date_created
[Results]:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 | 14 | 37 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 | 23 | 16 | (null) | (null) |
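The row_number approach can be tried out with the sample rows in a small sketch like the one below. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and it orders the window by SQLite's implicit rowid so the numbering follows insertion order. That ordering is an assumption for the demo: the original table has no explicit ordering column, and with order by (select 1) the numbering is not guaranteed to be stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (id INTEGER, name TEXT, segment INTEGER, "
    "date_created TEXT, question INTEGER, answer INTEGER)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John", 1, "2018-01-01", 10, 28),
        (1, "John", 1, "2018-01-01", 14, 37),
        (1, "John", 1, "2018-01-01", 9, 83),
        (2, "Jack", 3, "2018-03-11", 22, 13),
        (2, "Jack", 3, "2018-03-11", 23, 16),
    ],
)

query = """
SELECT id, name, segment, date_created,
       MAX(CASE WHEN rn = 1 THEN question END) AS question_01,
       MAX(CASE WHEN rn = 1 THEN answer END)   AS answer_01,
       MAX(CASE WHEN rn = 2 THEN question END) AS question_02,
       MAX(CASE WHEN rn = 2 THEN answer END)   AS answer_02,
       MAX(CASE WHEN rn = 3 THEN question END) AS question_03,
       MAX(CASE WHEN rn = 3 THEN answer END)   AS answer_03
FROM (
    -- number the rows of each person; rowid is SQLite's insertion order
    SELECT T.*,
           ROW_NUMBER() OVER (PARTITION BY id, name, segment, date_created
                              ORDER BY rowid) AS rn
    FROM T
) t1
GROUP BY id, name, segment, date_created
ORDER BY id
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Extending this to the stated maximum of 15 pairs means repeating the two MAX(CASE ...) lines for rn = 4 through 15.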

SQL Server pivot with "ties"

Here is my source data:
+-------+-------+-------+------+
| Categ | Nm | Value | Rnk |
+-------+-------+-------+------+
| A | Tom | 37 | 1 |
| A | Joe | 36 | 2 |
| A | Eddie | 35 | 3 |
| B | Seth | 28 | 1 |
| B | Ed | 25 | 2 |
| B | Billy | 22 | 3 |
| C | Julie | 42 | 1 |
| C | Jenny | 41 | 2 |
| C | April | 40 | 3 |
| C | Mary | 40 | 3 |
| C | Laura | 40 | 3 |
+-------+-------+-------+------+
And here is the output I would like to produce:
+------+--------+--------+-------+
| Rnk | A | B | C |
+------+--------+--------+-------+
| 1 | Tom | Seth | Julie |
| 2 | Joe | Ed | Jenny |
| 3 | Eddie | Billy | April |
| 3 | (null) | (null) | Mary |
| 3 | (null) | (null) | Laura |
+------+--------+--------+-------+
I have used the following approach (which I understand through other posts may be superior to actually using PIVOT)...and this gets me to where I see Julie/Jenny/April, but not Mary/Laura (obviously, since it is pulling the MIN in the event of a 'tie').
SELECT Rnk
, min(CASE WHEN Categ = 'A' THEN Nm END) as A
, min(CASE WHEN Categ = 'B' THEN Nm END) as B
, min(CASE WHEN Categ = 'C' THEN Nm END) as C
FROM Tbl
GROUP BY Rnk
How to get to my desired output?

Well, if you want multiple rows for each rank, you can't aggregate by rank, or at least by rank alone. So, calculate the rank-within-the-rank or as the following query calls it, the sub_rnk:
SELECT Rnk,
min(CASE WHEN Categ = 'A' THEN Nm END) as A,
min(CASE WHEN Categ = 'B' THEN Nm END) as B,
min(CASE WHEN Categ = 'C' THEN Nm END) as C
FROM (select t.*, row_number() over (partition by categ, rnk order by newid()) as sub_rnk
from Tbl t
) t
GROUP BY rnk, sub_rnk
ORDER BY rnk;
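Here is a hedged sketch of the same sub_rnk idea on the question's data, run through Python's built-in sqlite3 rather than SQL Server. Since newid() is SQL Server specific, the tie order is taken from Nm instead (an assumption, chosen so the result is repeatable), which means the tied names in category C come out alphabetically rather than in the order shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (Categ TEXT, Nm TEXT, Value INTEGER, Rnk INTEGER)")
conn.executemany(
    "INSERT INTO Tbl VALUES (?, ?, ?, ?)",
    [
        ("A", "Tom", 37, 1), ("A", "Joe", 36, 2), ("A", "Eddie", 35, 3),
        ("B", "Seth", 28, 1), ("B", "Ed", 25, 2), ("B", "Billy", 22, 3),
        ("C", "Julie", 42, 1), ("C", "Jenny", 41, 2),
        ("C", "April", 40, 3), ("C", "Mary", 40, 3), ("C", "Laura", 40, 3),
    ],
)

query = """
SELECT Rnk,
       MIN(CASE WHEN Categ = 'A' THEN Nm END) AS A,
       MIN(CASE WHEN Categ = 'B' THEN Nm END) AS B,
       MIN(CASE WHEN Categ = 'C' THEN Nm END) AS C
FROM (
    -- sub_rnk numbers the tied rows inside each (category, rank) pair
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY Categ, Rnk ORDER BY Nm) AS sub_rnk
    FROM Tbl t
) t
GROUP BY Rnk, sub_rnk
ORDER BY Rnk, sub_rnk
"""
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

Grouping by (Rnk, sub_rnk) gives rank 3 three output rows, one per tied name in C, while A and B only fill the first of them.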

How to do this GROUP BY with the wanted result?

Basically, I have a table with all the bus stops of a route, with a time_from_start value that helps put them in the right order.
CREATE TABLE `api_routestop` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`route_id` int(11) NOT NULL,
`station_id` varchar(10) NOT NULL,
`time_from_start` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `api_routestop_4fe3422a` (`route_id`),
KEY `api_routestop_15e3331d` (`station_id`)
)
I want to return, for each stop of a line, the time to go to the next stop.
I tried this query:
SELECT r1.station_id, r2.station_id, r1.route_id, COUNT(*), (r2.time_from_start - r1.time_from_start) as time
FROM api_routestop r1
LEFT JOIN api_routestop r2 ON r1.route_id = r2.route_id AND r1.id <> r2.id
GROUP BY r1.station_id
HAVING time >= 0
ORDER BY r1.route_id, r1.time_from_start, r2.time_from_start
But the GROUP BY does not seem to work, and the result looks like this:
+------------+------------+----------+----------+------+
| station_id | station_id | route_id | COUNT(*) | time |
+------------+------------+----------+----------+------+
| Rub01 | Sal01 | 1 | 16 | 1 |
| Lyc02 | Sch02 | 2 | 17 | 2 |
| Paq01 | PoB01 | 3 | 15 | 1 |
| LaT02 | Gco02 | 4 | 16 | 1 |
| Sup01 | Tur01 | 5 | 132 | 1 |
| Oeu02 | CtC02 | 6 | 20 | 2 |
| Ver02 | Elo02 | 7 | 38 | 1 |
| Can01 | Mbo01 | 8 | 70 | 1 |
| Ver01 | Elo01 | 9 | 77 | 1 |
| MCH01 | for02 | 10 | 77 | 1 |
+------------+------------+----------+----------+------+
If I do this:
SELECT r1.station_id, r2.station_id, r1.route_id, COUNT(*), (r2.time_from_start - r1.time_from_start) as time
FROM api_routestop r1
LEFT JOIN api_routestop r2 ON r1.route_id = r2.route_id AND r1.id <> r2.id
GROUP BY r1.station_id, r2.station_id, r1.route_id
HAVING time >= 0
ORDER BY r1.route_id, r1.time_from_start, r2.time_from_start
I get closer:
+------------+------------+----------+----------+------+
| station_id | station_id | route_id | COUNT(*) | time |
+------------+------------+----------+----------+------+
| Rub01 | Sal01 | 1 | 1 | 1 |
| Rub01 | ARM01 | 1 | 1 | 2 |
| Rub01 | MaV01 | 1 | 1 | 4 |
| Rub01 | COl01 | 1 | 1 | 5 |
| Rub01 | Str01 | 1 | 1 | 6 |
| Rub01 | Jau01 | 1 | 1 | 7 |
| Rub01 | Cdp01 | 1 | 1 | 9 |
| Rub01 | Rep01 | 1 | 1 | 11 |
| Rub01 | CoT01 | 1 | 1 | 12 |
| Rub01 | Ctr01 | 1 | 1 | 14 |
| Rub01 | FLy01 | 1 | 1 | 15 |
| Rub01 | Lib01 | 1 | 1 | 17 |
| Rub01 | Bru01 | 1 | 1 | 18 |
| Rub01 | Sch01 | 1 | 1 | 20 |
| Rub01 | Lyc01 | 1 | 1 | 22 |
| Rub01 | Res01 | 1 | 1 | 24 |
| Sal01 | ARM01 | 1 | 1 | 1 |
| Sal01 | MaV01 | 1 | 1 | 3 |
| Sal01 | COl01 | 1 | 1 | 4 |
| Sal01 | Str01 | 1 | 1 | 5 |
| Sal01 | Jau01 | 1 | 1 | 6 |
| Sal01 | Cdp01 | 1 | 1 | 8 |
| Sal01 | Rep01 | 1 | 1 | 10 |
| Sal01 | CoT01 | 1 | 1 | 11 |
| Sal01 | Ctr01 | 1 | 1 | 13 |
| Sal01 | FLy01 | 1 | 1 | 14 |
| Sal01 | Lib01 | 1 | 1 | 16 |
| Sal01 | Bru01 | 1 | 1 | 17 |
| Sal01 | Sch01 | 1 | 1 | 19 |
| Sal01 | Lyc01 | 1 | 1 | 21 |
...
3769 rows in set (0.07 sec)
But what do I have to do to get only the first result for each r1.station_id and r1.route_id?

You're getting a lot of results back because you're getting every stop joined to every other stop on the same route.
So you'll need to identify the "next" stop as the stop that has the same route ID and the smallest time_from_start later than the current one.
Update: added route_id to the next_stop subquery to deal with stations used in multiple routes.
SELECT
r1.station_id,
r2.station_id,
r1.route_id,
r2.time_from_start - r1.time_from_start as time
FROM
api_routestop r1
INNER JOIN (SELECT
r1.station_id , r2.route_id, min(r2.time_from_start) next_time_from_start
FROM
api_routestop r1
LEFT JOIN api_routestop r2 ON r1.route_id = r2.route_id AND r1.id <> r2.id
and r2.time_from_start > r1.time_from_start
GROUP BY r1.Station_id, r2.route_id) next_stop
ON r1.Station_id = next_stop.station_id
and r1.route_id = next_stop.route_id
LEFT JOIN api_routestop r2
ON r2.time_from_start = next_stop.next_time_from_start
and r1.route_id = r2.route_id
AND r2.time_from_start > r1.time_from_start

SELECT station_id, coalesce(
(SELECT time_from_start
FROM api_routestop t2
WHERE t2.time_from_start > t1.time_from_start
AND t2.time_from_start <= (SELECT time_from_start FROM api_routestop t5 WHERE t5.station_id = '4' AND t5.route_id=t1.route_id)
AND t2.route_id = t1.route_id
ORDER BY t2.time_from_start LIMIT 1), time_from_start) - time_from_start AS difference
FROM api_routestop t1
WHERE t1.route_id = 1
AND t1.time_from_start >= (SELECT time_from_start FROM api_routestop t4 WHERE t4.station_id = '2' AND t4.route_id=t1.route_id)
AND t1.time_from_start <= (SELECT time_from_start FROM api_routestop t5 WHERE t5.station_id = '4' AND t5.route_id=t1.route_id)
ORDER BY time_from_start

Are you open to changing the schema? If so, simply adding a column containing a sequential integer for all stops on a route will make this query a lot easier and more efficient.
Failing that, this will do it.
SELECT
station_id,
route_id,
time_from_start,
time_to_next
FROM
(
SELECT
station_id,route_id,time_from_start,
IF( @prev <> route_id, null, @time_from_start-time_from_start ) AS time_to_next,
@time_from_start := time_from_start,
@prev := route_id
FROM api_routestop
JOIN (SELECT @time_from_start := NULL, @prev := 0) AS r
ORDER BY route_id, time_from_start DESC
) t
ORDER BY route_id,time_from_start
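If window functions are available (MySQL 8.0 and later have LEAD), the "next stop" can also be read directly from the following row instead of using user variables or a self-join. The sketch below is only an illustration: it runs in Python's built-in sqlite3 rather than MySQL, and the time_from_start values are made-up sample data, chosen to be consistent with the differences shown in the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_routestop (id INTEGER PRIMARY KEY, route_id INTEGER, "
    "station_id TEXT, time_from_start INTEGER)"
)
conn.executemany(
    "INSERT INTO api_routestop (route_id, station_id, time_from_start) VALUES (?, ?, ?)",
    [
        # hypothetical sample stops, not the real table contents
        (1, "Rub01", 0), (1, "Sal01", 1), (1, "ARM01", 2), (1, "MaV01", 4),
        (2, "Lyc02", 0), (2, "Sch02", 2),
    ],
)

query = """
SELECT route_id,
       station_id,
       -- LEAD looks at the following row within the same route
       LEAD(station_id) OVER (PARTITION BY route_id
                              ORDER BY time_from_start) AS next_station_id,
       LEAD(time_from_start) OVER (PARTITION BY route_id
                                   ORDER BY time_from_start)
           - time_from_start AS time_to_next
FROM api_routestop
ORDER BY route_id, time_from_start
"""
next_stops = conn.execute(query).fetchall()
for row in next_stops:
    print(row)
```

The last stop of each route has no following row, so both next_station_id and time_to_next are NULL for it.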
