Query rows based on previous rows - sql

I have a table with some financial price data in it. This data has a type (one of 'T', 'B', or 'S'), a timestamp, and a price. I need to find the rows of type 'T' whose prices are either below the price of the previous 'B'-type row or above the price of the previous 'S'-type row.
Here is some example data:
+------+------+----------------------------+---------------+
| id   | type | htime                      | price         |
+------+------+----------------------------+---------------+
| 4505 | T    | 2022-04-24 19:41:00.585891 | 5799.30000000 |
| 4506 | B    | 2022-04-24 19:41:00.585891 | 5799.00000000 |
| 4507 | S    | 2022-04-24 19:41:00.586754 | 5801.40000000 |
| 4508 | S    | 2022-04-24 19:41:00.586802 | 5801.10000000 |
| 4509 | B    | 2022-04-24 19:41:00.586818 | 5799.30000000 |
| 4510 | T    | 2022-04-24 19:41:00.586820 | 5799.30000000 |
| 4511 | T    | 2022-04-24 19:41:00.586820 | 5799.00000000 |
| 4512 | B    | 2022-04-24 19:41:00.586820 | 5799.00000000 |
| 4515 | S    | 2022-04-24 19:41:00.587087 | 5801.10000000 |
| 4516 | S    | 2022-04-24 19:41:00.588252 | 5801.10000000 |
| 4591 | S    | 2022-04-24 19:41:00.639867 | 5801.10000000 |
| 4608 | T    | 2022-04-24 19:41:00.657640 | 5798.00000000 |
| 4609 | B    | 2022-04-24 19:41:00.657640 | 5797.20000000 |
+------+------+----------------------------+---------------+
So here I would like to have the query return rows with id 4511 (type = 'T' and price is less than price of the previous row with type = 'B') and 4608 (same reason). I don't want row 4510, because its price is neither less than the previous 'B' row nor above the previous 'S' row. I probably just want to ignore row 4505, but it's not important to me what happens there.
I have tried the following query:
WITH bids AS (SELECT * FROM my_table WHERE type = 'B'),
offers AS (SELECT * FROM my_table WHERE type = 'S')
SELECT *
FROM (SELECT * FROM my_table WHERE type = 'T') trades
WHERE trades.price < (SELECT price FROM bids WHERE bids.htime < trades.htime ORDER BY htime DESC LIMIT 1)
OR trades.price > (SELECT price FROM offers WHERE offers.htime < trades.htime ORDER BY htime DESC LIMIT 1);
but it's extremely slow. I'm hoping there's an easier self-join type solution, but I'm pretty new at this.
There is an index on the table on (type, htime).
I am using MariaDB 10.5.13

Hello and welcome to SO!
Try this query; in theory it shouldn't be slower.
The idea in this solution is to add an extra column in the SELECT clause, generated by a subquery: it is NULL if the criteria are not met, or the id of the previous row that meets them.
Finally, you can wrap this in an outer SELECT to return only the rows that actually meet the criteria.
select results.id, results.previousId
from (select t1.*,
             (select t2.id
              from my_table t2
              where t1.type = 'T'
                and ((t2.type = 'B' and t1.price < t2.price)
                  or (t2.type = 'S' and t1.price > t2.price))
                and t2.htime < t1.htime
              order by t2.htime desc
              limit 1) as previousId
      from my_table as t1) as results
where results.previousId is not null;
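For reference, the underlying rule ("compare each 'T' against the most recent prior 'B' and 'S' prices") can also be checked in a single ordered pass. This is a minimal Python sketch of that logic using the sample rows from the question, not a MariaDB query:

```python
# Sample rows from the question: (id, type, htime, price)
rows = [
    (4505, 'T', '2022-04-24 19:41:00.585891', 5799.3),
    (4506, 'B', '2022-04-24 19:41:00.585891', 5799.0),
    (4507, 'S', '2022-04-24 19:41:00.586754', 5801.4),
    (4508, 'S', '2022-04-24 19:41:00.586802', 5801.1),
    (4509, 'B', '2022-04-24 19:41:00.586818', 5799.3),
    (4510, 'T', '2022-04-24 19:41:00.586820', 5799.3),
    (4511, 'T', '2022-04-24 19:41:00.586820', 5799.0),
    (4512, 'B', '2022-04-24 19:41:00.586820', 5799.0),
    (4515, 'S', '2022-04-24 19:41:00.587087', 5801.1),
    (4516, 'S', '2022-04-24 19:41:00.588252', 5801.1),
    (4591, 'S', '2022-04-24 19:41:00.639867', 5801.1),
    (4608, 'T', '2022-04-24 19:41:00.657640', 5798.0),
    (4609, 'B', '2022-04-24 19:41:00.657640', 5797.2),
]

def flag_trades(rows):
    """Scan rows in time order, remembering the last 'B' and 'S' prices,
    and flag each 'T' whose price breaks either of them."""
    last_b = last_s = None
    flagged = []
    for rid, rtype, htime, price in sorted(rows, key=lambda r: (r[2], r[0])):
        if rtype == 'T':
            # A 'T' before any 'B'/'S' quote (like id 4505) is simply skipped.
            if (last_b is not None and price < last_b) or \
               (last_s is not None and price > last_s):
                flagged.append(rid)
        elif rtype == 'B':
            last_b = price
        else:  # 'S'
            last_s = price
    return flagged

print(flag_trades(rows))  # [4511, 4608]
```

This matches the expected output from the question (ids 4511 and 4608, with 4505 and 4510 excluded).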

Related

Make a query making groups on the same result row

I have two tables. Like this.
select * from extrafieldvalues;
+----+-------+------+--------+
| id | value | type | idItem |
+----+-------+------+--------+
|  1 |   100 |    1 |     10 |
|  2 |   150 |    2 |     10 |
|  3 |   101 |    1 |     11 |
|  4 |    90 |    2 |     11 |
+----+-------+------+--------+
select * from items
+----+------+
| id | name |
+----+------+
| 10 | foo  |
| 11 | bar  |
+----+------+
I need to make a query and get something like this:
+--------+----------+----------+------+
| idItem | valtype1 | valtype2 | name |
+--------+----------+----------+------+
|     10 |      100 |      150 | foo  |
|     11 |      101 |       90 | bar  |
+--------+----------+----------+------+
The quantity of types of extra field values is variable, but every item ALWAYS uses every extra field.
If you have only two fields, then left join is an option for this:
select i.*, efv1.value as value_1, efv2.value as value_2
from items i left join
     extrafieldvalues efv1
     on efv1.iditem = i.id and
        efv1.type = 1 left join
     extrafieldvalues efv2
     on efv2.iditem = i.id and
        efv2.type = 2;
In terms of performance, two joins are probably faster than an aggregation, and they make it easier to bring in more columns from items. On the other hand, conditional aggregation generalizes more easily, and its performance changes little as more columns from extrafieldvalues are added to the select.
Use conditional aggregation:
select a.iditem,
       max(case when a.type = 1 then a.value end) as valtype1,
       max(case when a.type = 2 then a.value end) as valtype2,
       b.name
from extrafieldvalues a
inner join items b on a.iditem = b.id
group by a.iditem, b.name
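To illustrate, here is the conditional aggregation run against the sample data in SQLite (via Python's sqlite3; an illustrative sketch, not tied to any particular database engine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER, name TEXT);
CREATE TABLE extrafieldvalues (id INTEGER, value INTEGER, type INTEGER, iditem INTEGER);
INSERT INTO items VALUES (10, 'foo'), (11, 'bar');
INSERT INTO extrafieldvalues VALUES
    (1, 100, 1, 10), (2, 150, 2, 10), (3, 101, 1, 11), (4, 90, 2, 11);
""")

# Conditional aggregation: one output row per item, one column per type.
rows = con.execute("""
    SELECT a.iditem,
           MAX(CASE WHEN a.type = 1 THEN a.value END) AS valtype1,
           MAX(CASE WHEN a.type = 2 THEN a.value END) AS valtype2,
           b.name
    FROM extrafieldvalues a
    JOIN items b ON a.iditem = b.id
    GROUP BY a.iditem, b.name
    ORDER BY a.iditem
""").fetchall()
print(rows)  # [(10, 100, 150, 'foo'), (11, 101, 90, 'bar')]
```

The output matches the desired result table from the question.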

Conditionally apply date filter based on column - Oracle SQL

I have a table that looks like this:
| Type | DueDate |
|:----:|:---------:|
| A | 1/1/2019 |
| B | 2/3/2019 |
| C | NULL |
| A | 1/3/2019 |
| B | 9/1/2019 |
| C | NULL |
| A | 3/3/2019 |
| B | 4/3/2019 |
| C | NULL |
| B | 1/6/2019 |
| A | 1/19/2019 |
| B | 8/1/2019 |
| C | NULL |
What I need to accomplish is:
Grab all rows that have Type C. For any other type, only grab them if they have a due date AFTER May 1st 2019.
This is dummy data; in actuality, there are 10 or 15 types and about 125M rows.
I have tried SELECT * FROM tblTest WHERE ((Type IN ('A', 'B') AND DueDate > '05-01-2019') OR Type = 'C') but that yields exactly the table above.
Simply changing the filter to WHERE DUEDATE >= '05/01/2019' filters out the NULL rows.
How can I edit my WHERE statement to achieve desired results of below?
| Type | DueDate |
|:----:|:--------:|
| C | NULL |
| B | 9/1/2019 |
| C | NULL |
| C | NULL |
| B | 8/1/2019 |
| C | NULL |
SQL FIDDLE for reference
If your date were stored using the correct type, you would simply do:
select t.*
from tbltest t
where t.duedate > date '2019-05-01' or t.type = 'C';
I would suggest you fix the duedate column to have the correct type. Until that is fixed, you can work around the problem:
select t.*
from tbltest t
where to_date(t.duedate, 'MM/DD/YYYY') > date '2019-05-01' or t.type = 'C';
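The NULL behaviour here is easy to reproduce: a plain date comparison silently drops NULL rows, while the OR type = 'C' branch re-admits them. A small SQLite sketch (ISO date strings and the table contents are assumptions for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbltest (type TEXT, duedate TEXT);  -- ISO strings for the demo
INSERT INTO tbltest VALUES
    ('A', '2019-01-01'), ('B', '2019-02-03'), ('C', NULL),
    ('B', '2019-09-01'), ('B', '2019-08-01'), ('C', NULL);
""")

# A NULL duedate makes `duedate > ...` evaluate to unknown, so the row is
# filtered out; the OR branch keeps every 'C' row regardless of its date.
rows = con.execute("""
    SELECT * FROM tbltest
    WHERE duedate > '2019-05-01' OR type = 'C'
""").fetchall()
# rows keeps the two late 'B' rows and both 'C' rows; early 'A'/'B' are gone
```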
As per the answer by Gordon, you need to use an OR condition.
If you have more conditions in the WHERE clause apart from what is mentioned in the question, you need to group the conditions:
select *
from tbltest
where (duedate > DATE '2019-05-01'
or type = 'C') -- group these condition using brackets
And other_condition;
Your original query combines the OR condition with all the other conditions without any brackets, and that is why it yields all the rows in the result.
Cheers!!

Loop over one table, subselect another table and update values of first table with SQL/VBA

I have a source table that has a few different prices for each product (depending on the order quantity). Those prices are listed vertically, so each product could have more than one row to display its prices.
Example:
ID | Quantity | Price
--------------------------
001 | 5 | 100
001 | 15 | 90
001 | 50 | 80
002 | 10 | 20
002 | 20 | 15
002 | 30 | 10
002 | 40 | 5
The other table I have is the result table, in which there is only one row for each product, but five pairs of quantity/price columns, each of which can hold the quantity and price from one row of the source table.
Example:
ID | Quantity_1 | Price_1 | Quantity_2 | Price_2 | Quantity_3 | Price_3 | Quantity_4 | Price_4 | Quantity_5 | Price_5
---------------------------------------------------------------------------------------------------------------------------
001 | | | | | | | | | |
002 | | | | | | | | | |
Result:
ID | Quantity_1 | Price_1 | Quantity_2 | Price_2 | Quantity_3 | Price_3 | Quantity_4 | Price_4 | Quantity_5 | Price_5
---------------------------------------------------------------------------------------------------------------------------
001 | 5 | 100 | 15 | 90 | 50 | 80 | | | |
002 | 10 | 20 | 20 | 15 | 30 | 10 | 40 | 5 | |
Here is my Python/SQL solution for this (I'm fully aware that this could not work in any way, but this was the only way for me to show you my interpretation of a solution to this problem):
For Each result_ID In result_table.ID:
    Subselect = (SELECT * FROM source_table
                 WHERE source_table.ID = result_ID
                 ORDER BY source_table.Quantity)  # only rows with the same ID
    For n in Range(0, len(Subselect)):  # n (index) from 0 to last row - 1
        price_column_name = 'Price_' & (n + 1)
        quantity_column_name = 'Quantity_' & (n + 1)
        (UPDATE result_table
         SET result_table.price_column_name = Subselect[n].Price,       # price of the n-th row in Subselect
             result_table.quantity_column_name = Subselect[n].Quantity  # quantity of the n-th row in Subselect
         WHERE result_table.ID = Subselect[n].ID)
I honestly have no idea how to do this with only SQL or VBA (those are the only languages I'd be able to use -> MS-Access).
This is a pain in MS Access. If you can enumerate the values, you can pivot them.
If we assume that price is unique (or quantity or both), then you can generate such a column:
select id,
max(iif(seqnum = 1, quantity, null)) as quantity_1,
max(iif(seqnum = 1, price, null)) as price_1,
. . .
from (select st.*,
(select count(*)
from source_table st2
where st2.id = st.id and st2.price >= st.price
) as seqnum
from source_table st
) st
group by id;
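The count-based seqnum trick can be sanity-checked outside Access. A minimal SQLite sketch (Python's sqlite3, with CASE WHEN standing in for Access's IIF, showing just the first two column pairs, and assuming price is unique per id as stated above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE source_table (id TEXT, quantity INTEGER, price INTEGER);
INSERT INTO source_table VALUES
    ('001', 5, 100), ('001', 15, 90), ('001', 50, 80),
    ('002', 10, 20), ('002', 20, 15), ('002', 30, 10), ('002', 40, 5);
""")

# seqnum = how many rows of the same id have price >= this row's price,
# so the highest price gets 1, the next one 2, and so on.
rows = con.execute("""
    SELECT id,
           MAX(CASE WHEN seqnum = 1 THEN quantity END) AS quantity_1,
           MAX(CASE WHEN seqnum = 1 THEN price END)    AS price_1,
           MAX(CASE WHEN seqnum = 2 THEN quantity END) AS quantity_2,
           MAX(CASE WHEN seqnum = 2 THEN price END)    AS price_2
    FROM (SELECT st.*,
                 (SELECT COUNT(*) FROM source_table st2
                  WHERE st2.id = st.id AND st2.price >= st.price) AS seqnum
          FROM source_table st) st
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [('001', 5, 100, 15, 90), ('002', 10, 20, 20, 15)]
```

Extending to quantity_3/price_3 and beyond just repeats the MAX(CASE ...) pair with the next seqnum.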
I should note that another solution would use data frames in Python. If you want to take that route, ask another question and tag it with the appropriate Python tags. This question is clearly a SQL question.

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query to count the number of nulls for certain products, in date order, until there is a yes value. So in the above example the count would be 2, as there are 2 nulls until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
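A quick way to check the correlated-subquery logic is with an in-memory SQLite table (Python's sqlite3; table contents copied from the first example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product INTEGER, valid TEXT, date TEXT);
INSERT INTO products VALUES
    (1, NULL,  '2016-05-10'),
    (1, NULL,  '2016-05-09'),
    (1, 'yes', '2016-05-08');
""")

# Count rows newer than the latest 'yes' for the product; those are
# exactly the leading NULLs when ordering by date descending.
(count,) = con.execute("""
    SELECT COUNT(*)
    FROM products p1
    WHERE product = 1
      AND date > (SELECT MAX(date) FROM products p2
                  WHERE p2.product = p1.product AND p2.valid = 'yes')
""").fetchone()
print(count)  # 2
```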
This should do it:
select count(*)
from products
where product = 1
  and valid is null
  and date > (select max(date) from products where product = 1 and valid = 'yes')
Not sure if your logic provided covers all the possible weird and wonderful extreme scenarios but the following piece of code would do what you are after:
select a.product,
       count(IIF(a.valid is null and a.date > b.maxdate, a.date, null)) as total
from sometable a
inner join (
    select product, max(date) as maxdate
    from sometable
    where valid = 'yes'
    group by product
) b on a.product = b.product
group by a.product

Select latest values for group of related records

I have a table that accommodates data that is logically groupable by multiple properties (a foreign key, for example). The data is sequential over a continuous time interval, i.e. it is time-series data. What I am trying to achieve is to select only the latest values for each combination of groups.
Here is example data:
+------+-------+------------+-------------+
| code | value | date       | relation_id |
+------+-------+------------+-------------+
| A    |     1 | 01.01.2016 |           1 |
| A    |     2 | 02.01.2016 |           1 |
| A    |     3 | 03.01.2016 |           1 |
| A    |     4 | 01.01.2016 |           2 |
| A    |     5 | 02.01.2016 |           2 |
| A    |     6 | 03.01.2016 |           2 |
| B    |     1 | 01.01.2016 |           1 |
| B    |     2 | 02.01.2016 |           1 |
| B    |     3 | 03.01.2016 |           1 |
| B    |     4 | 01.01.2016 |           2 |
| B    |     5 | 02.01.2016 |           2 |
| B    |     6 | 03.01.2016 |           2 |
+------+-------+------------+-------------+
And here is example of desired output:
+------+-------+------------+-------------+
| code | value | date       | relation_id |
+------+-------+------------+-------------+
| A    |     3 | 03.01.2016 |           1 |
| A    |     6 | 03.01.2016 |           2 |
| B    |     3 | 03.01.2016 |           1 |
| B    |     6 | 03.01.2016 |           2 |
+------+-------+------------+-------------+
To put this in perspective: for every related object I want to select each code with the latest date.
Here is the select I came up with; I've used the ROW_NUMBER() OVER (PARTITION BY ...) approach:
SELECT indicators.code, indicators.dimension, indicators.unit, x.value, x.date, x.ticker, x.name
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY indicator_id ORDER BY date DESC) AS r,
t.indicator_id, t.value, t.date, t.company_id, companies.sic_id,
companies.ticker, companies.name
FROM fundamentals t
INNER JOIN companies on companies.id = t.company_id
WHERE companies.sic_id = 89
) x
INNER JOIN indicators on indicators.id = x.indicator_id
WHERE x.r <= (SELECT count(*) FROM companies where sic_id = 89)
It works, but the problem is that it is painfully slow; when working with about 5% of production data, which equals roughly 3 million fundamentals records, this select takes about 10 seconds to finish. My guess is that this happens because the subselect retrieves huge amounts of records first.
Is there any way to speed this query up, or am I digging in the wrong direction by trying to do it the way I do?
Postgres offers the convenient distinct on for this purpose:
select distinct on (relation_id, code) t.*
from t
order by relation_id, code, date desc;
Your query uses different column names than your sample data, so it's hard to tell, but it looks like you just want to group by everything except the date? Assuming you don't have multiple most-recent dates, something like this should work. Basically, don't use the window function; use a proper GROUP BY, and your engine should optimize the query better.
SELECT mytable.code,
mytable.value,
mytable.date,
mytable.relation_id
FROM mytable
JOIN (
SELECT code,
max(date) as date,
relation_id
FROM mytable
GROUP BY code, relation_id
) Q1
ON Q1.code = mytable.code
AND Q1.date = mytable.date
AND Q1.relation_id = mytable.relation_id
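The join-back approach above can be sketched in SQLite (Python's sqlite3; ISO date strings are an assumption for the demo so that MAX(date) compares correctly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mytable (code TEXT, value INTEGER, date TEXT, relation_id INTEGER);
INSERT INTO mytable VALUES
    ('A', 1, '2016-01-01', 1), ('A', 2, '2016-01-02', 1), ('A', 3, '2016-01-03', 1),
    ('A', 4, '2016-01-01', 2), ('A', 5, '2016-01-02', 2), ('A', 6, '2016-01-03', 2),
    ('B', 1, '2016-01-01', 1), ('B', 2, '2016-01-02', 1), ('B', 3, '2016-01-03', 1),
    ('B', 4, '2016-01-01', 2), ('B', 5, '2016-01-02', 2), ('B', 6, '2016-01-03', 2);
""")

# Join each row back to its group's max(date) to keep only the latest rows.
rows = con.execute("""
    SELECT m.code, m.value, m.date, m.relation_id
    FROM mytable m
    JOIN (SELECT code, relation_id, MAX(date) AS date
          FROM mytable GROUP BY code, relation_id) q
      ON q.code = m.code AND q.relation_id = m.relation_id AND q.date = m.date
    ORDER BY m.code, m.relation_id
""").fetchall()
# rows -> one latest row per (code, relation_id), as in the desired output
```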
Other option:
SELECT DISTINCT Code,
Relation_ID,
FIRST_VALUE(Value) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Value,
FIRST_VALUE(Date) OVER (PARTITION BY Code, Relation_ID ORDER BY Date DESC) Date
FROM mytable
This will return the top value for whatever you partition by, ordered by whatever you order by.
I believe we can try something like this (note that MAX(value) only matches the latest row when, as in the sample data, the maximum value happens to fall on the latest date):
SELECT code, relation_id, MAX(date) AS date, MAX(value) AS value
FROM mytable
GROUP BY code, relation_id