Conditionally apply date filter based on column - Oracle SQL

I have a table that looks like this:
| Type | DueDate |
|:----:|:---------:|
| A | 1/1/2019 |
| B | 2/3/2019 |
| C | NULL |
| A | 1/3/2019 |
| B | 9/1/2019 |
| C | NULL |
| A | 3/3/2019 |
| B | 4/3/2019 |
| C | NULL |
| B | 1/6/2019 |
| A | 1/19/2019 |
| B | 8/1/2019 |
| C | NULL |
What I need to accomplish is:
Grab all rows that have Type C. For any other type, only grab them if they have a due date AFTER May 1st 2019.
This is dummy data -- in actuality, there are 10 or 15 types and roughly 125M rows.
I have tried SELECT * FROM tblTest WHERE ((Type IN ('A', 'B') AND DueDate > '05-01-2019') OR Type = 'C') but that yields exactly the table above.
Simply changing it to WHERE DUEDATE >= '05/01/2019' filters out the NULL rows.
How can I edit my WHERE clause to achieve the desired results below?
| Type | DueDate |
|:----:|:--------:|
| C | NULL |
| B | 9/1/2019 |
| C | NULL |
| C | NULL |
| B | 8/1/2019 |
| C | NULL |
SQL FIDDLE for reference

If your date were stored using the correct type, you would simply do:
select t.*
from tbltest t
where duedate > date '2019-05-01' or type = 'C';
I would suggest you fix the duedate column to have the correct type. Until that is fixed, you can work around the problem:
select t.*
from tbltest t
where to_date(duedate, 'MM/DD/YYYY') > date '2019-05-01' or type = 'C';
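For reference, a minimal sketch of one way to do that fix in place, assuming duedate is currently a VARCHAR2 holding MM/DD/YYYY strings (the temporary column name duedate_d is made up for illustration):
ALTER TABLE tbltest ADD (duedate_d DATE);
UPDATE tbltest SET duedate_d = TO_DATE(duedate, 'MM/DD/YYYY');
-- on ~125M rows, consider ALTER TABLE ... SET UNUSED instead of an immediate DROP
ALTER TABLE tbltest DROP COLUMN duedate;
ALTER TABLE tbltest RENAME COLUMN duedate_d TO duedate;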

As per the answer by Gordon, you need to combine the date filter and the type check with an OR condition.
If you have more conditions in the WHERE clause apart from what is mentioned in the question, you need to group the conditions.
select *
from tbltest
where (duedate > DATE '2019-05-01'
       or type = 'C') -- group these conditions using brackets
and other_condition;
Without brackets, your original query combines the OR condition with all the other conditions at the same level, and because AND binds tighter than OR, that yields all the rows in the result.
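To illustrate the precedence issue with a hypothetical other_condition:
-- AND binds tighter than OR, so without brackets this:
where duedate > DATE '2019-05-01' or type = 'C' and other_condition
-- is evaluated as:
where duedate > DATE '2019-05-01' or (type = 'C' and other_condition)
-- i.e. other_condition never restricts the date branch at all.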
Cheers!!

Related

Query rows based on previous rows

I have a table with some financial price data in it. This data has a type (one of 'T', 'B', or 'S'), a timestamp, and a price. I need to find the rows of type 'T' whose prices are either below the price of the previous 'B'-type row or above the price of the previous 'S'-type row.
Here is some example data:
+-------+------+----------------------------------+---------------+
| id | type | htime | price |
+-------+------+----------------------------------+---------------+
| 4505 | T | 2022-04-24 19:41:00.585891 | 5799.30000000 |
| 4506 | B | 2022-04-24 19:41:00.585891 | 5799.00000000 |
| 4507 | S | 2022-04-24 19:41:00.586754 | 5801.40000000 |
| 4508 | S | 2022-04-24 19:41:00.586802 | 5801.10000000 |
| 4509 | B | 2022-04-24 19:41:00.586818 | 5799.30000000 |
| 4510 | T | 2022-04-24 19:41:00.586820 | 5799.30000000 |
| 4511 | T | 2022-04-24 19:41:00.586820 | 5799.00000000 |
| 4512 | B | 2022-04-24 19:41:00.586820 | 5799.00000000 |
| 4515 | S | 2022-04-24 19:41:00.587087 | 5801.10000000 |
| 4516 | S | 2022-04-24 19:41:00.588252 | 5801.10000000 |
| 4591 | S | 2022-04-24 19:41:00.639867 | 5801.10000000 |
| 4608 | T | 2022-04-24 19:41:00.657640 | 5798.00000000 |
| 4609 | B | 2022-04-24 19:41:00.657640 | 5797.20000000 |
+-------+------+----------------------------------+---------------+
So here I would like to have the query return rows with id 4511 (type = 'T' and price is less than price of the previous row with type = 'B') and 4608 (same reason). I don't want row 4510, because its price is neither less than the previous 'B' row nor above the previous 'S' row. I probably just want to ignore row 4505, but it's not important to me what happens there.
I have tried the following query:
WITH bids AS (SELECT * FROM my_table WHERE type = 'B'),
offers AS (SELECT * FROM my_table WHERE type = 'S')
SELECT *
FROM (SELECT * FROM my_table WHERE type = 'T') trades
WHERE trades.price < (SELECT price FROM bids WHERE bids.htime < trades.htime ORDER BY htime DESC LIMIT 1)
OR trades.price > (SELECT price FROM offers WHERE offers.htime < trades.htime ORDER BY htime DESC LIMIT 1);
but it's extremely slow. I'm hoping there's an easier self-join type solution, but I'm pretty new at this.
There is an index on the table on (type, htime).
I am using MariaDB 10.5.13
Hello and welcome to SO!
Try this query; theoretically, it shouldn't be slower.
The idea in this solution is to add an extra column to the select clause, generated by a subquery, whose value is either null if the criteria are not met or the id of the previous row that meets the criteria.
Finally, you can wrap this in an outer select to return only the rows that actually meet the criteria.
select results.id, results.previousId
from (select t1.*,
             (select t2.id
              from test22 t2 -- test22 stands in for the question's my_table
              where t1.type = 'T'
                and ((t2.type = 'B' and t1.price < t2.price)
                  or (t2.type = 'S' and t1.price > t2.price))
                and t2.htime < t1.htime
              order by t2.htime desc -- latest earlier row, not the oldest one
              limit 1) as previousId
      from test22 as t1) as results
where results.previousId is not null;
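If that is still too slow, a window-function rewrite is possible on MariaDB 10.2+. This is only a sketch against the question's my_table, following the question's definition of "previous B/S row" and assuming (htime, id) reflects the intended row order (ties on htime are broken by id):
WITH ordered AS (
    SELECT t.*,
           SUM(type = 'B') OVER (ORDER BY htime, id) AS b_grp, -- running count of B rows seen so far
           SUM(type = 'S') OVER (ORDER BY htime, id) AS s_grp  -- running count of S rows seen so far
    FROM my_table t
),
priced AS (
    SELECT o.*,
           MAX(CASE WHEN type = 'B' THEN price END)
               OVER (PARTITION BY b_grp) AS last_bid,   -- price of the B row that opened this group
           MAX(CASE WHEN type = 'S' THEN price END)
               OVER (PARTITION BY s_grp) AS last_offer  -- price of the S row that opened this group
    FROM ordered o
)
SELECT id, type, htime, price
FROM priced
WHERE type = 'T'
  AND (price < last_bid OR price > last_offer); -- NULL comparisons drop T rows with no prior B/S
On the sample data above this should return ids 4511 and 4608.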

Insert new data BigQuery

I have this table in BigQuery:
+------------+---------+------+
| date | country | sum |
+------------+---------+------+
| 2020-01-01 | UK | 10 |
| 2020-01-01 | Spain | 34 |
| 2020-01-01 | Germany | 78 |
| 2020-01-01 | France | 81 |
+------------+---------+------+
Using the BigQuery UI, I edited the schema and created a new column, AVG. I would like to insert this new info into this table from another table, like this:
INSERT dataset.table_old (AVG)
SELECT AVG(m) FROM table_m
If I do that, I get this:
+------------+---------+------+------+
| date | country | sum | AVG |
+------------+---------+------+------+
| 2020-01-01 | UK | 10 | NULL |
| 2020-01-01 | Spain | 34 | NULL |
| 2020-01-01 | Germany | 78 | NULL |
| 2020-01-01 | France | 81 | NULL |
| NULL | NULL | NULL | 28 |
| NULL | NULL | NULL | 7 |
| NULL | NULL | NULL | 10 |
| NULL | NULL | NULL | 41 |
+------------+---------+------+------+
How could I get the correct table with the correspondent match?
Thanks!
The solution to your problem is to use an UPDATE statement rather than INSERT. If you use INSERT, the rows are treated as new records.
Proposed solution:
Step 1: Identify which field the AVG(m) from table_m is computed over, so that it can be joined with table_old. It could be date, country, or sum, based on your business requirement.
Step 2: Once you have the joining field from the step above, you can write the following UPDATE query:
UPDATE dataset.table_old a
SET AVG = (SELECT AVG(m)
           FROM table_m b
           WHERE a.JOINING_Coll = b.JOINING_Coll) -- scalar subquery; no GROUP BY needed, the correlation scopes it
WHERE 1=1 -- BigQuery requires a WHERE clause on UPDATE
The important part is a.JOINING_Coll=b.JOINING_Coll, where you need to identify which field will be used to join the two tables and perform the UPDATE.
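For instance, a minimal sketch assuming date and country together identify the rows (hypothetical joining fields; substitute your own):
UPDATE dataset.table_old a
SET AVG = (SELECT AVG(b.m)
           FROM dataset.table_m b
           WHERE a.date = b.date AND a.country = b.country)
WHERE true -- BigQuery requires a WHERE clause on UPDATE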
You could use a BQ MERGE statement.
Add a new column avg in table_old with NULL values.
Then do a MERGE operation like the one below.
Assuming you need to calculate the average value in table_m for each date and country:
MERGE INTO table_old AS dest
USING (
    SELECT
        a.date,
        a.country,
        b.avg
    FROM table_old AS a
    LEFT JOIN
    (
        SELECT
            date,
            country,
            AVG(m) AS avg
        FROM table_m
        GROUP BY date, country
    ) AS b
    USING (date, country)
) AS source
ON source.date = dest.date AND source.country = dest.country
WHEN MATCHED THEN UPDATE SET
    avg = source.avg
For more information, refer to the documentation on the BQ MERGE statement.

Measure population on several dates

I want to measure the population of our municipality (which consists of several places). I've got two tables: my first dataset is a calendar table with a row for the first day of every month.
My second table contains all the people that live and have lived in the municipality.
What I want is the population of each place on the first day of every month from my calendar table. I've put some raw data below (just a few records of the Persons table, because it contains 100,000 records).
Calender table:
+----------+
| Date |
+----------+
| 1-1-2018 |
+----------+
| 1-2-2018 |
+----------+
| 1-3-2018 |
+----------+
| 1-4-2018 |
+----------+
Persons table
+-----+-----------+-----------+---------------+-------+
| BSN | Startdate | Enddate | Date of death | Place |
+-----+-----------+-----------+---------------+-------+
| 1 | 12-1-2000 | null | null | A |
+-----+-----------+-----------+---------------+-------+
| 2 | 10-5-2011 | null | 22-1-2018 | B |
+-----+-----------+-----------+---------------+-------+
| 3 | 16-12-2011| 10-2-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
| 4 | 9-11-2012 | null | null | B |
+-----+-----------+-----------+---------------+-------+
| 5 | 8-9-2013 | null | 27-3-2018 | A |
+-----+-----------+-----------+---------------+-------+
| 6 | 7-10-2017 | 28-3-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
My expected result:
+----------+-------+------------+
| Date | Place | Population |
+----------+-------+------------+
| 1-1-2018 | A | 2 |
+----------+-------+------------+
| 1-1-2018 | B | 4 |
+----------+-------+------------+
| 1-2-2018 | A | 2 |
+----------+-------+------------+
| 1-2-2018 | B | 3 |
+----------+-------+------------+
| 1-3-2018 | A | 2 |
+----------+-------+------------+
| 1-3-2018 | B | 2 |
+----------+-------+------------+
| 1-4-2018 | A | 1 |
+----------+-------+------------+
| 1-4-2018 | B | 1 |
+----------+-------+------------+
What I've done so far, which doesn't seem to work:
SELECT a.Place
      ,c.Date
      ,(SELECT COUNT(DISTINCT b.BSN)
        FROM Persons as b
        WHERE b.Startdate < c.Date
        AND (b.Enddate > c.Date OR b.Enddate is null)
        AND (b.[Date of death] > c.Date OR b.[Date of death] is null)
        AND a.Place = b.Place) as Population
FROM Persons as a
JOIN Calender as c
ON a.Startdate <= c.Date
AND a.Enddate >= c.Date
GROUP BY Place, Date
I hope someone can help me find the problem. Thanks in advance!
First cross join Calender and the places to get the date/place pairs. Then left join the persons on the place and the date. Finally group by date and place to get the count of people for that day and place.
SELECT [ca].[Date],
[pl].[Place],
count([pe].[Place]) [Population]
FROM [Calender] [ca]
CROSS JOIN (SELECT DISTINCT
[pe].[Place]
FROM [Persons] [pe]) [pl]
LEFT JOIN [Persons] [pe]
ON [pe].[Place] = [pl].[Place]
AND [pe].[Startdate] <= [ca].[Date]
AND (coalesce([pe].[Enddate],
[pe].[Date of death]) IS NULL
OR coalesce([pe].[Enddate],
[pe].[Date of death]) > [ca].[Date])
GROUP BY [ca].[Date],
[pl].[Place]
ORDER BY [ca].[Date],
[pl].[Place];
Some notes and assumptions:
If you have a table listing the places, use that instead of the subquery aliased [pl]. I just had no other option with the given tables.
I believe the Date of death also implies an Enddate on the same day. You might want to consider a trigger that sets the Enddate automatically to the Date of death when it isn't null. That would make things easier and probably more consistent.
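A minimal sketch of such a trigger in T-SQL, assuming the table and column names shown above (the trigger name is made up):
CREATE TRIGGER trg_Persons_SetEnddate
ON [Persons]
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Copy Date of death into Enddate for the rows just changed
    UPDATE p
    SET [Enddate] = p.[Date of death]
    FROM [Persons] p
    JOIN inserted i ON i.[BSN] = p.[BSN]
    WHERE p.[Date of death] IS NOT NULL
      AND (p.[Enddate] IS NULL OR p.[Enddate] > p.[Date of death]);
END;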

SQL Server: most efficient way to update multiple records depending on each other

I want to update multiple records in table "a" that depend on each other. The values in table "a" look like:
+------------+---------------+-------+
| date | transfervalue | value |
+------------+---------------+-------+
| 01.03.2018 | 0 | 10 |
| 02.03.2018 | 0 | 6 |
| 03.03.2018 | 0 | 13 |
+------------+---------------+-------+
After the update the values of the table "a" should look like:
+------------+---------------+-------+
| date | transfervalue | value |
+------------+---------------+-------+
| 01.03.2018 | 0 | 10 |
| 02.03.2018 | 10 | 6 |
| 03.03.2018 | 16 | 13 |
+------------+---------------+-------+
What is the most efficient way to do this? I've tried three different solutions, but the last solution doesn't work.
Solution 1: do a loop and iterate over each day to do the update statement
Solution 2: do an update statement statement for each day
Solution 3: do the update for the whole timespan in one statement
The output of solution 3 was:
+------------+---------------+-------+
| date | transfervalue | value |
+------------+---------------+-------+
| 01.03.2018 | 0 | 10 |
| 02.03.2018 | 10 | 6 |
| 03.03.2018 | 6 | 13 |
+------------+---------------+-------+
You seem to want a cumulative sum:
with toupdate as (
    select t.*,
           sum(value) over (order by date
                            rows between unbounded preceding and 1 preceding) as running_value
    from a t
)
update toupdate
set transfervalue = coalesce(running_value, 0);
This should work:
select t1.*,
       coalesce((select sum(value)
                 from table1 t2
                 where t2.date < t1.date), 0) as MyNewValue
from table1 t1
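To write those values back rather than just select them, the same correlated sum can drive an UPDATE (a sketch, keeping this answer's table1 naming):
UPDATE t1
SET transfervalue = coalesce((SELECT sum(value)
                              FROM table1 t2
                              WHERE t2.date < t1.date), 0)
FROM table1 t1;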

How can I do a SQL query count based on certain criteria, including row order

I've come across certain logic that I need for my SQL query. Given a table like this:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query that counts the number of nulls for certain products, in date order, until there is a yes value. In the above example the count would be 2, as there are 2 nulls until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
This should do it:
select count(1)
from products
where product = 1
  and valid is null
  and date > (select max(date) -- latest 'yes' date
              from products
              where product = 1 and valid = 'yes')
Not sure if the logic you provided covers all the possible weird and wonderful extreme scenarios, but the following piece of code would do what you are after:
select a.product,
       count(IIF(a.valid is null and a.date > b.maxdate, a.date, null)) as total
from sometable a
inner join (
    select product, max(date) as maxdate
    from sometable
    where valid = 'yes'
    group by product
) b on a.product = b.product
group by a.product
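On engines with window functions (SQL Server 2012+, MySQL 8+, etc.) the self-join can be avoided; a sketch, with the caveat that products having no 'yes' row at all come out as 0 here:
select product,
       count(case when valid is null and date > max_yes then 1 end) as total
from (
    select p.*,
           max(case when valid = 'yes' then date end)
               over (partition by product) as max_yes -- latest 'yes' date per product
    from products p
) t
group by product;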