SQL select with preference on column values


I am new to SQL and I would like to ask about how to select entries based on preferences and grouping.
+----------+----------+------+
| ENTRY_ID | ROUTE_ID | TYPE |
+----------+----------+------+
| 1 | 15 | 0 |
| 1 | 26 | 1 |
| 1 | 39 | 1 |
| 2 | 22 | 1 |
| 2 | 15 | 1 |
| 3 | 30 | 1 |
| 3 | 35 | 0 |
| 3 | 40 | 1 |
+----------+----------+------+
With the table above, I would like to select one entry for each ENTRY_ID, with the following preference for the returned ROUTE_ID:
If TYPE = 0 is available for any of the entries with the same ENTRY_ID, return the minimum ROUTE_ID among the entries with TYPE = 0.
If only TYPE = 1 is available for that ENTRY_ID, return the minimum ROUTE_ID.
The expected outcome for the query will be the following:
+----------+----------+------+
| ENTRY_ID | ROUTE_ID | TYPE |
+----------+----------+------+
| 1 | 15 | 0 |
| 2 | 15 | 1 |
| 3 | 35 | 0 |
+----------+----------+------+
Thank you for your help!

You can group by both TYPE and ENTRY_ID, and then use the HAVING clause to keep only the groups whose TYPE is the minimum TYPE for that ENTRY_ID.
SELECT ENTRY_ID, MIN(ROUTE_ID), TYPE
FROM MyTable
GROUP BY ENTRY_ID, TYPE
HAVING TYPE = (SELECT MIN(s.TYPE) FROM MyTable s WHERE s.ENTRY_ID = MyTable.ENTRY_ID)
This relies on TYPE only ever being 0 or 1. If more values are possible, the query still returns rows only for the lowest TYPE present for each ENTRY_ID.
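As a rough check of the GROUP BY / HAVING approach, here is a minimal sketch that runs the same query against the sample data using Python's built-in sqlite3 module. The in-memory database, the table alias m, and the helper name preferred_routes are assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (ENTRY_ID, ROUTE_ID, TYPE)
ROWS = [(1, 15, 0), (1, 26, 1), (1, 39, 1), (2, 22, 1),
        (2, 15, 1), (3, 30, 1), (3, 35, 0), (3, 40, 1)]

# Same idea as the answer: one group per (ENTRY_ID, TYPE),
# then keep only the group with the lowest TYPE for that ENTRY_ID.
QUERY = """
SELECT m.ENTRY_ID, MIN(m.ROUTE_ID) AS ROUTE_ID, m.TYPE
FROM MyTable AS m
GROUP BY m.ENTRY_ID, m.TYPE
HAVING m.TYPE = (SELECT MIN(s.TYPE) FROM MyTable AS s
                 WHERE s.ENTRY_ID = m.ENTRY_ID)
ORDER BY m.ENTRY_ID
"""


def preferred_routes(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (ENTRY_ID INTEGER, ROUTE_ID INTEGER, TYPE INTEGER)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = preferred_routes(ROWS)
print(result)
```

With the sample data this should give one row per ENTRY_ID, matching the expected outcome in the question. If TYPE can take values other than 0 and 1, the same query simply prefers whichever TYPE is lowest for each ENTRY_ID.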

If you want complete rows, use a correlated subquery:
select t.*
from t
where t.route_id = (select top 1 t2.route_id
from t as t2
where t2.entry_id = t.entry_id
order by iif(t2.type = 0, 1, 2), -- put type 0 first
t2.route_id asc -- then the first route_id
);
This has the advantage that it can return more than just the three columns you show in the question.
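The query above uses MS Access syntax (TOP 1, IIF). A possible portable variant, sketched here with Python's sqlite3 module, replaces IIF with CASE and TOP 1 with LIMIT 1. The extra label column is hypothetical and is only there to show that complete rows come back.

```python
import sqlite3

# (entry_id, route_id, type, label); label is a made-up extra column
ROWS = [(1, 15, 0, "a"), (1, 26, 1, "b"), (1, 39, 1, "c"), (2, 22, 1, "d"),
        (2, 15, 1, "e"), (3, 30, 1, "f"), (3, 35, 0, "g"), (3, 40, 1, "h")]

QUERY = """
SELECT t.*
FROM t
WHERE t.route_id = (SELECT t2.route_id
                    FROM t AS t2
                    WHERE t2.entry_id = t.entry_id
                    ORDER BY CASE WHEN t2.type = 0 THEN 1 ELSE 2 END,  -- type 0 first
                             t2.route_id ASC                           -- then lowest route_id
                    LIMIT 1)
ORDER BY t.entry_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (entry_id INTEGER, route_id INTEGER, type INTEGER, label TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
complete_rows = conn.execute(QUERY).fetchall()
conn.close()

print(complete_rows)
```

This sketch assumes route_id is unique within an entry_id; if it is not, several rows per entry could match the subquery result.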

Related

Get range (min - max) of values concatenated in a single row

Given the following table:
+-----------------------------+
| id | type | price | item_id |
|-----------------------------|
| 1 | 1 | 20 | 22 |
|-----------------------------|
| 2 | 1 | 22 | 22 |
|-----------------------------|
| 3 | 2 | 19 | 22 |
|-----------------------------|
| 4 | 2 | 11 | 22 |
|-----------------------------|
| 5 | 1 | 08 | 22 |
|-----------------------------|
| 6 | 2 | 25 | 22 |
+-----------------------------+
I am trying to select the data to create a view as follows in a single row
+-------------------------------------+
| type1_range | type2_range | item_id |
|-------------------------------------|
| 08 - 22 | 11 - 25 | 22 |
+-------------------------------------+
type1_range and type2_range are the minimum and maximum price for each type.
I can get the data in a couple of rows using
SELECT type, MAX (price) , MIN (price)
FROM table
where item_id=22 GROUP BY type;
+----------------------------+
| type | max | min | item_id |
|----------------------------|
| 1 | 22 | 08 | 22 |
|----------------------------|
| 2 | 25 | 11 | 22 |
+----------------------------+
But I am trying to concat the rows like this:
+-------------------------------------+
| type1_range | type2_range | item_id |
|-------------------------------------|
| 08 - 22 | 11 - 25 | 22 |
+-------------------------------------+
What SQL would be required for this?

Something like this:
SELECT
CONCAT(
MIN(CASE WHEN type = 1 THEN price END),
' - ',
MAX(CASE WHEN type = 1 THEN price END)
) as type1range,
CONCAT(
MIN(CASE WHEN type = 2 THEN price END),
' - ',
MAX(CASE WHEN type = 2 THEN price END)
) as type2range,
item_id
FROM table
WHERE item_id = 22
GROUP BY item_id
You've tagged two different database systems (please avoid doing this), but I believe both support CONCAT() for string concatenation.
If you want to omit item_id from the select list (you already know it's item 22), you can remove the GROUP BY. Alternatively, if you remove the WHERE and keep the GROUP BY, you'll get a row for each item_id.
To get more of an idea of how it works, remove the CONCAT and the MIN/MAX: you'll see that the CASE WHEN makes the price show up only if the type is 1 (in the type 1 range column); otherwise it's NULL. It's then trivial for the MIN and MAX to work on just type 1 or just type 2 data for each column. It's actually a form of pivot query, if you want to read up on them more.
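The two steps described above (CASE WHEN first, then MIN/MAX on top) can be tried out with a small sketch in Python's sqlite3 module. Several things here are assumptions for illustration: the table is called prices, because table is a reserved word; prices are stored as integers, so the leading zero of 08 is lost; and SQLite's || operator stands in for CONCAT().

```python
import sqlite3

# (id, type, price, item_id) from the question
ROWS = [(1, 1, 20, 22), (2, 1, 22, 22), (3, 2, 19, 22),
        (4, 2, 11, 22), (5, 1, 8, 22), (6, 2, 25, 22)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER, type INTEGER, price INTEGER, item_id INTEGER)"
)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", ROWS)

# Step 1: without MIN/MAX, each price lands in the column for its type
# and the other column is NULL (None in Python).
spread = conn.execute("""
    SELECT id,
           CASE WHEN type = 1 THEN price END AS type1_price,
           CASE WHEN type = 2 THEN price END AS type2_price
    FROM prices
    ORDER BY id
""").fetchall()

# Step 2: MIN/MAX ignore NULLs, so each aggregate only sees one type.
ranges = conn.execute("""
    SELECT MIN(CASE WHEN type = 1 THEN price END) || ' - ' ||
           MAX(CASE WHEN type = 1 THEN price END) AS type1_range,
           MIN(CASE WHEN type = 2 THEN price END) || ' - ' ||
           MAX(CASE WHEN type = 2 THEN price END) AS type2_range,
           item_id
    FROM prices
    WHERE item_id = 22
    GROUP BY item_id
""").fetchall()
conn.close()

print(spread)
print(ranges)
```

The first result shows the intermediate, un-aggregated shape; the second collapses it to a single row per item_id.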

A straightforward approach would be to compute type1_range and type2_range in two subqueries and join them to the distinct item_id values, as shown below:
SELECT t.item_id,type1_range,type2_range
FROM (Select distinct item_id from table) t
LEFT join
(SELECT item_id,type, concat(MIN(price),'-' ,MAX(price) ) as type1_range
FROM table
where type=1
GROUP BY item_id,type)type1 on type1.item_id=t.item_id
LEFT join
(SELECT item_id,type, concat(MIN(price),'-' ,MAX(price) ) as type2_range
FROM table
where type=2
GROUP BY item_id,type)type2 on type2.item_id=t.item_id

Take first, second, third ... last value and selecting rows (Window function with filter and lag)

I would like to perform a window function with a filter clause, for example:
LAG("date", 1) FILTER (WHERE type='A') OVER (PARTITION BY id ORDER BY id ASC) AS "A_lag_1"
However, Postgres doesn't support this operation, and I can't determine how else to do it. Details below.
Challenge
Input tab_A:
+----+------+------+
| id | type | date |
+----+------+------+
| 1 | A | 30 |
| 1 | A | 25 |
| 1 | A | 20 |
| 1 | B | 29 |
| 1 | B | 28 |
| 1 | B | 21 |
| 1 | C | 24 |
| 1 | C | 22 |
+----+------+------+
Desired output:
+----+------+------+---------+---------+---------+---------+---------+---------+
| id | type | date | A_lag_1 | A_lag_2 | B_lag_1 | B_lag_2 | C_lag_1 | C_lag_2 |
+----+------+------+---------+---------+---------+---------+---------+---------+
| 1 | A | 30 | 25 | 20 | 29 | 28 | 24 | 22 |
| 1 | A | 25 | 20 | | | | 24 | 22 |
| 1 | A | 20 | | | | | | |
| 1 | B | 29 | 25 | 20 | 28 | 21 | 24 | 22 |
| 1 | B | 28 | 25 | 20 | 21 | | 24 | 22 |
| 1 | B | 21 | 20 | | | | 24 | 22 |
| 1 | C | 24 | 20 | | 21 | | 22 | |
| 1 | C | 22 | 20 | | 21 | | | |
+----+------+------+---------+---------+---------+---------+---------+---------+
In words:
For each row select all rows which occurred before it (see date column)
Then for each type ('A', 'B', 'C'), put the most recent date in A_lag_1 and the second most recent (by date) in A_lag_2 for type 'A', in B_lag_1 and B_lag_2 for 'B', and so on.
The above example is quite simplified, in my real use case there will be a lot more id values, more lag column iterations A_lag_X and types.
Possible solution
This challenge seems a perfect fit for a window function as I want to keep the same number of rows tab_A and append information which is related to the row but in the past.
So constructing it using a window function (sqlfiddle):
SELECT
id, type, "date",
LAG("date", 1) FILTER (WHERE type='A') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC) AS "A_lag_1",
LAG("date", 2) FILTER (WHERE type='A') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC) AS "A_lag_2",
LAG("date", 1) FILTER (WHERE type='B') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC) AS "B_lag_1",
LAG("date", 2) FILTER (WHERE type='B') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC) AS "B_lag_2",
LAG("date", 1) FILTER (WHERE type='C') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC) AS "C_lag_1",
LAG("date", 2) FILTER (WHERE type='C') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC) AS "C_lag_2"
FROM tab_A
However, I get the following error:
ERROR: FILTER is not implemented for non-aggregate window functions
Position: 30
Although this error is referenced in the documentation, I can't determine another way of doing it.
Any help would be much appreciated.
Other SO questions:
1. This answer relies on using an aggregate function such as max. However, this won't work when trying to retrieve the 2nd last row, 3rd last row etc.

Another possible solution using a lateral join (fiddle):
SELECT
a.id,
a.type,
a."date",
c.nn_row,
c.type,
c."date" as "date_joined"
FROM tab_A AS a
LEFT JOIN LATERAL (
SELECT
type,
"date",
row_number() OVER (PARTITION BY id, type ORDER BY id ASC, "date" DESC) as nn_row
FROM tab_A AS b
WHERE a.id = b.id AND a."date" > b."date"
) AS c on true
WHERE c.nn_row <= 5
This creates a long table like:
+----+------+------+--------+------+-------------+
| id | type | date | nn_row | type | date_joined |
+----+------+------+--------+------+-------------+
| 1 | A | 30 | 1 | A | 25 |
| 1 | A | 30 | 2 | A | 20 |
| 1 | A | 30 | 1 | B | 29 |
| 1 | A | 30 | 2 | B | 28 |
| 1 | A | 30 | 3 | B | 21 |
| 1 | A | 30 | 1 | C | 24 |
| 1 | A | 30 | 2 | C | 22 |
| 1 | A | 25 | 1 | A | 20 |
| 1 | A | 25 | 1 | B | 21 |
| 1 | A | 25 | 1 | C | 24 |
| 1 | A | 25 | 2 | C | 22 |
| 1 | B | 29 | 1 | A | 25 |
| 1 | B | 29 | 2 | A | 20 |
| 1 | B | 29 | 1 | B | 28 |
| 1 | B | 29 | 2 | B | 21 |
| 1 | B | 29 | 1 | C | 24 |
| 1 | B | 29 | 2 | C | 22 |
| 1 | B | 28 | 1 | A | 25 |
| 1 | B | 28 | 2 | A | 20 |
| 1 | B | 28 | 1 | B | 21 |
| 1 | B | 28 | 1 | C | 24 |
| 1 | B | 28 | 2 | C | 22 |
| 1 | B | 21 | 1 | A | 20 |
| 1 | C | 24 | 1 | A | 20 |
| 1 | C | 24 | 1 | B | 21 |
| 1 | C | 24 | 1 | C | 22 |
| 1 | C | 22 | 1 | A | 20 |
| 1 | C | 22 | 1 | B | 21 |
+----+------+------+--------+------+-------------+
After which you can pivot to the desired output.
This worked for me on a small sample, but on the full table Postgres ran out of disk space (even though I have 50GB available):
ERROR: could not write to hash-join temporary file: No space left on device
I have posted this solution here as it might work for others who have smaller tables.

As the FILTER clause does work with aggregate functions, I decided to write my own.
----- N = 1
-- State transition function
-- agg_state: the current state, el: new element
create or replace function lag_agg_sfunc_1(agg_state point, el float)
returns point
immutable
language plpgsql
as $$
declare
i integer;
stored_value float;
begin
i := agg_state[0];
stored_value := agg_state[1];
i := i + 1; -- First row i=1
if i = 1 then
stored_value := el;
end if;
return point(i, stored_value);
end;
$$;
-- Final function
--DROP FUNCTION lag_agg_ffunc_1(point) CASCADE;
create or replace function lag_agg_ffunc_1(agg_state point)
returns float
immutable
strict
language plpgsql
as $$
begin
return agg_state[1];
end;
$$;
-- Aggregate function
drop aggregate if exists lag_agg_1(double precision);
create aggregate lag_agg_1 (float) (
sfunc = lag_agg_sfunc_1,
stype = point,
finalfunc = lag_agg_ffunc_1,
initcond = '(0,0)'
);
----- N = 2
-- State transition function
-- agg_state: the current state, el: new element
create or replace function lag_agg_sfunc_2(agg_state point, el float)
returns point
immutable
language plpgsql
as $$
declare
i integer;
stored_value float;
begin
i := agg_state[0];
stored_value := agg_state[1];
i := i + 1; -- First row i=1
if i = 2 then
stored_value := el;
end if;
return point(i, stored_value);
end;
$$;
-- Final function
--DROP FUNCTION lag_agg_ffunc_2(point) CASCADE;
create or replace function lag_agg_ffunc_2(agg_state point)
returns float
immutable
strict
language plpgsql
as $$
begin
return agg_state[1];
end;
$$;
-- Aggregate function
drop aggregate if exists lag_agg_2(double precision);
create aggregate lag_agg_2 (float) (
sfunc = lag_agg_sfunc_2,
stype = point,
finalfunc = lag_agg_ffunc_2,
initcond = '(0,0)'
);
You can use the above aggregate functions lag_agg_1 and lag_agg_2 with the window expression in the original question:
SELECT
id, type, "date",
NULLIF(lag_agg_1("date") FILTER (WHERE type='A') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS "A_lag_1",
NULLIF(lag_agg_2("date") FILTER (WHERE type='A') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS "A_lag_2",
NULLIF(lag_agg_1("date") FILTER (WHERE type='B') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS "B_lag_1",
NULLIF(lag_agg_2("date") FILTER (WHERE type='B') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS "B_lag_2",
NULLIF(lag_agg_1("date") FILTER (WHERE type='C') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS "C_lag_1",
NULLIF(lag_agg_2("date") FILTER (WHERE type='C') OVER (PARTITION BY id ORDER BY id ASC, "date" DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0) AS "C_lag_2"
FROM tab_A
ORDER BY id ASC, type, "date" DESC
This executes reasonably quickly compared to the other options. Some things that could be improved:
I couldn't determine how to handle NULL values properly, so at the end I fudge the result by converting all 0s to NULL. This will cause problems when 0 is a legitimate value.
I have just copied and pasted the functions for each lag_X, as I couldn't determine how to parameterise this.
Any help with the above would be much appreciated.

You can try something like below.
SELECT
dt.* ,
(SELECT MAX(b.dateVAL) FROM tab_A b WHERE b.type = 'A' AND dt.A_lag_1 > b.dateVAL ) AS "A_lag_2",
(SELECT MAX(b.dateVAL) FROM tab_A b WHERE b.type = 'B' AND dt.B_lag_1 > b.dateVAL ) AS "B_lag_2" ,
(SELECT MAX(b.dateVAL) FROM tab_A b WHERE b.type = 'C' AND dt.C_lag_1 > b.dateVAL ) AS "C_lag_2"
FROM
(
SELECT
a.id, a.type, a.dateVAL,
(SELECT MAX(b.dateVAL) FROM tab_A b WHERE b.type = 'A' AND a.dateVAL > b.dateVAL ) as A_lag_1,
(SELECT MAX(b.dateVAL) FROM tab_A b WHERE b.type = 'B' AND a.dateVAL > b.dateVAL ) as B_lag_1,
(SELECT MAX(b.dateVAL) FROM tab_A b WHERE b.type = 'C' AND a.dateVAL > b.dateVAL ) as C_lag_1
FROM tab_A a
) dt
Here is the Fiddle link. This may not be the most efficient way to do it.
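The correlated-subquery idea can perhaps be generalised to the N-th most recent earlier value by using ORDER BY ... DESC LIMIT 1 OFFSET N-1 instead of chaining MAX() lookups. The following is only a sketch in Python with SQLite, not a tested Postgres solution. The helper lag_column, the TYPES list, and DEPTH are hypothetical names; the type names are interpolated from a fixed list, not from user input; and, unlike the query above, the subqueries are also correlated on id.

```python
import sqlite3

TYPES = ["A", "B", "C"]  # fixed list of types (assumption), never user input
DEPTH = 2                # how many lag columns per type

# (id, type, date) from the question
ROWS = [(1, "A", 30), (1, "A", 25), (1, "A", 20), (1, "B", 29),
        (1, "B", 28), (1, "B", 21), (1, "C", 24), (1, "C", 22)]


def lag_column(type_name, n):
    """Build the scalar subquery for the n-th most recent earlier date of a type."""
    return (
        f"(SELECT b.date FROM tab_A AS b"
        f" WHERE b.id = a.id AND b.type = '{type_name}' AND b.date < a.date"
        f" ORDER BY b.date DESC LIMIT 1 OFFSET {n - 1}) AS {type_name}_lag_{n}"
    )


columns = ",\n  ".join(
    lag_column(t, n) for t in TYPES for n in range(1, DEPTH + 1)
)
QUERY = (
    f"SELECT a.id, a.type, a.date,\n  {columns}\n"
    "FROM tab_A AS a\n"
    "ORDER BY a.id, a.type, a.date DESC"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_A (id INTEGER, type TEXT, date INTEGER)")
conn.executemany("INSERT INTO tab_A VALUES (?, ?, ?)", ROWS)
lag_rows = conn.execute(QUERY).fetchall()
conn.close()

for row in lag_rows:
    print(row)
```

Each lag column costs one scalar subquery per row, so this is unlikely to be fast on large tables without an index on (id, type, date). The sketch follows the "strictly earlier date" rule from the question's description, so a few cells differ from the hand-written desired output table.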

SQL to group time series meeting a criteria, according to start and end time

I am analyzing power systems time series data, and I am trying to find the contiguous data points that meet a certain boolean flag.
I would like to query this table by returning the start and end time corresponding to the inflection points wherein the value changed from 1 to 0, and 0 to 1.
How should I go about implementing the pseudo-SQL code below?
SELECT Time
FROM InputTable
WHERE InputTable.Value = 1
INTO OutputTable??, TimeStart??, TimeEnd??;
Input:
+-------+---------+------+
| Index | Time | Value|
+-------+---------+------+
| 0 | 00:00:01| 1 |
| 1 | 00:00:02| 1 |
| 2 | 00:00:03| 1 |
| 3 | 00:00:04| 0 |
| 4 | 00:00:05| 1 |
| 5 | 00:00:06| 1 |
| 6 | 00:00:07| 0 |
| 7 | 00:00:08| 1 |
+-------+---------+------+
Output:
+-------+-----------+----------+
| Index | TimeStart | TimeEnd |
+-------+-----------+----------+
| 0 | 00:00:01 | 00:00:03 |
| 1 | 00:00:05 | 00:00:06 |
| 2 | 00:00:08 | 00:00:08 |
+-------+-----------+----------+

You need to group the values based on adjacent "1"s. This is tricky in MS Access. One method that can be used in Access is to count the number of "0"s (or non-"1" values) before each row.
select ind, min(time) as TimeStart, max(time) as TimeEnd
from (select t.*,
(select 1 + count(*)
from inputtable as t2
where t2.value = 0 and t2.time < t.time
) as ind
from inputtable as t
) as t
where value = 1
group by ind
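To see the zero-counting trick at work, here is a small sketch that runs an equivalent query on the sample data with Python's sqlite3 module. It assumes the times are stored as HH:MM:SS text, which sorts correctly as strings, and it drops the `1 +` so that the group number starts at 0 as in the desired output. The column Index is left out of the table because it is not needed.

```python
import sqlite3

# (time, value) from the question
SAMPLE = [("00:00:01", 1), ("00:00:02", 1), ("00:00:03", 1), ("00:00:04", 0),
          ("00:00:05", 1), ("00:00:06", 1), ("00:00:07", 0), ("00:00:08", 1)]

# Every row is labelled with the number of 0-rows that precede it;
# consecutive 1-rows share the same label, so they fall into one group.
QUERY = """
SELECT ind, MIN(time) AS TimeStart, MAX(time) AS TimeEnd
FROM (SELECT t.time, t.value,
             (SELECT COUNT(*)
              FROM InputTable AS t2
              WHERE t2.value = 0 AND t2.time < t.time) AS ind
      FROM InputTable AS t) AS numbered
WHERE value = 1
GROUP BY ind
ORDER BY ind
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InputTable (time TEXT, value INTEGER)")
conn.executemany("INSERT INTO InputTable VALUES (?, ?)", SAMPLE)
runs = conn.execute(QUERY).fetchall()
conn.close()

print(runs)
```

Note that the labels are only consecutive when single 0-rows separate the runs; two adjacent 0-rows would leave a gap in the numbering, although the start and end times would still be right.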

No rowid or key need most recent row

I am trying my hardest to get a list of the most recent rows by date in a DB2 file. The file has no unique id, so I am trying to get the entries by matching a set of columns. I need DESCGA most importantly as that changes often. When it does they keep another row for historical reasons.
SELECT B.COGA, B.COMSUBGA, B.ACCTGA, B.PRFXGA, B.DESCGA
FROM mylib.myfile B
WHERE
(
SELECT COUNT(*)
FROM
(
SELECT A.COGA,A.COMSUBGA,A.ACCTGA,A.PRFXGA,MAX(A.DATEGA) AS EDATE
FROM mylib.myfile A
GROUP BY A.COGA, A.COMSUBGA, A.ACCTGA, A.PRFXGA
) T
WHERE
(B.ACCTGA = T.ACCTGA AND
B.COGA = T.COGA AND
B.COMSUBGA = T.COMSUBGA AND
B.PRFXGA = T.PRFXGA AND
B.DATEGA = T.EDATE)
) > 1
This is what I am trying and so far I get 0 results.
If I remove
B.ACCTGA = T.ACCTGA AND
It will return results (of course wrong).
I am using ODBC in VS 2013 to structure this query.
I have a table like the following:
| a | b | descri | date |
-----------------------------
| 1 | 0 | string | 20140102 |
| 2 | 1 | string | 20140103 |
| 1 | 1 | string | 20140101 |
| 1 | 1 | string | 20150101 |
| 1 | 0 | string | 20150102 |
| 2 | 1 | string | 20150103 |
| 1 | 1 | string | 20150103 |
and I need:
| 1 | 0 | string | 20150102 |
| 2 | 1 | string | 20150103 |
| 1 | 1 | string | 20150103 |

You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by a, b order by date desc) as seqnum
from mylib.myfile t
) t
where seqnum = 1;
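A quick way to check the row_number() approach against the simplified sample table is the sketch below, which uses Python's sqlite3 module rather than DB2. It assumes an SQLite build with window-function support (3.25 or later) and stores the dates as integers in YYYYMMDD form; the table name myfile stands in for mylib.myfile.

```python
import sqlite3

# (a, b, descri, date) from the simplified table in the question
ROWS = [(1, 0, "string", 20140102), (2, 1, "string", 20140103),
        (1, 1, "string", 20140101), (1, 1, "string", 20150101),
        (1, 0, "string", 20150102), (2, 1, "string", 20150103),
        (1, 1, "string", 20150103)]

# Number the rows of each (a, b) combination from newest to oldest,
# then keep only the newest one.
QUERY = """
SELECT a, b, descri, date
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY date DESC) AS seqnum
      FROM myfile AS t) AS ranked
WHERE seqnum = 1
ORDER BY a, b
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myfile (a INTEGER, b INTEGER, descri TEXT, date INTEGER)")
conn.executemany("INSERT INTO myfile VALUES (?, ?, ?, ?)", ROWS)
latest = conn.execute(QUERY).fetchall()
conn.close()

print(latest)
```

For the real file, the PARTITION BY list would presumably be the matching columns from the question (COGA, COMSUBGA, ACCTGA, PRFXGA) and the ORDER BY column DATEGA.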

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to build dynamic groups of lines (of products, by TYPE and COLOR in fact).
I don't know if it's possible with just one SELECT query.
I want to create groups of lines (a product is a TYPE and a COLOR) sized according to the NB_PER_GROUP column, and I want the grouping to follow the date order (ORDER BY DATE).
A lone product with an NB_PER_GROUP of 2 is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.

What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required. Note that a trailing incomplete group still receives a group number, so it has to be filtered out separately if it must not appear in the result.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based position within the group that resets to 0 after every nb_per_group rows.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
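The three levels described above can be run on the sample data with the sketch below, written for Python's sqlite3 module (window functions need SQLite 3.25 or later) rather than PostgreSQL. The DATE values are elided in the question, so this sketch assumes they increase with NUM. The outer COMPLETE_ONLY query is an addition that is not in the answer: it counts the members of each group and keeps only groups that reached nb_per_group.

```python
import sqlite3

# (num, type, color, nb_per_group, date); date = num is an assumption
PRODUCTS = [(0, 1, 1, 2, 0), (1, 1, 1, 2, 1), (2, 1, 2, 2, 2), (3, 1, 2, 2, 3),
            (4, 1, 1, 2, 4), (5, 1, 1, 2, 5), (6, 4, 1, 3, 6), (7, 1, 1, 2, 7),
            (8, 4, 1, 3, 8), (9, 4, 1, 3, 9), (10, 5, 1, 2, 10)]

# The answer's query, with nb_per_group carried along for the later filter.
GROUPED = """
SELECT MAX(gn.group_number) AS group_number, ip.num AS num,
       ip.nb_per_group AS nb_per_group
FROM products AS ip
JOIN (
    SELECT sq.date, sq.type, sq.color,
           ROW_NUMBER() OVER (ORDER BY sq.date) - 1 AS group_number
    FROM (
        SELECT op.type, op.color, op.date,
               (ROW_NUMBER() OVER (PARTITION BY op.type, op.color
                                   ORDER BY op.date) - 1)
                   % op.nb_per_group AS group_order
        FROM products AS op
    ) AS sq
    WHERE sq.group_order = 0          -- rows that start a new group
) AS gn
  ON ip.type = gn.type AND ip.color = gn.color AND ip.date >= gn.date
GROUP BY ip.num, ip.nb_per_group
"""

# Added step: drop groups that never filled up.
COMPLETE_ONLY = f"""
SELECT group_number, num
FROM (SELECT g.group_number, g.num, g.nb_per_group,
             COUNT(*) OVER (PARTITION BY g.group_number) AS members
      FROM ({GROUPED}) AS g) AS counted
WHERE members = nb_per_group
ORDER BY group_number, num
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (num INTEGER, type INTEGER, color INTEGER,"
             " nb_per_group INTEGER, date INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", PRODUCTS)
all_groups = [(g, n) for g, n, _ in
              conn.execute(GROUPED + " ORDER BY group_number, num")]
complete_groups = conn.execute(COMPLETE_ONLY).fetchall()
conn.close()

print(all_groups)
print(complete_groups)
```

On this data the unfiltered query also numbers the leftover rows 7 and 10 as groups of their own, while the filtered version should match the result table in the question.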
