Confusion about sqlite query planner - sql

I have a view written in SQLite called V_Order_Calculator3, and the orders table holds 1.5 million records.
Consider the code below:
select * from (
SELECT
order_id
FROM V_Order_Calculator3
WHERE /* CHOICE1 order_id = '00002092-03b4-4661-a9f4-afa73984860a'*/
GROUP BY
(CASE
WHEN order_type IN ('webshop_sell','sell','buy') THEN order_id
WHEN order_type IN ('sell_return','buy_return') THEN order_linked_id
ELSE 0
END)
) a /* CHOICE2 where a.order_id = '00002092-03b4-4661-a9f4-afa73984860a'*/
If I uncomment CHOICE1, the query takes 15 milliseconds to run. If I uncomment CHOICE2 instead, it takes 18000 milliseconds.
It seems the SQLite query planner does not apply the CHOICE2 WHERE clause before evaluating the view.
I am confused; I have tried many approaches with no luck.

You think that the time should be the same because you assume that the queries have the same meaning and should give the same results, so SQLite should optimize the second query by pushing the WHERE down into the inner query. That assumption is wrong.
The first one (inside WHERE) will filter the table for a single order_id, then create groups based on the values of order_type, order_id and order_linked_id that it finds in the filtered rows. For every group it will return the order_id of an unspecified row of the group. Since you filtered for a specific order_id, it will always be the same value for every group.
The second one (outside WHERE) will scan the whole table, creating groups based on the values of order_type, order_id and order_linked_id that it finds in all rows. For every group it will return the order_id of an unspecified row of the group, which at this point could be any value. The outside WHERE will then filter the result based on THESE values.
This is an example data where your two queries give different results: https://www.db-fiddle.com/f/xqiFACuv8PzU3VBnLbEoWF/2
It doesn't matter that in your application this set of data would not be possible or meaningful. Since the queries are not semantically equivalent, sqlite cannot transform the second one into the first.
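A minimal sketch of that difference, run through Python's sqlite3 module. The table orders_demo and its two rows are made up for illustration, and MIN(order_id) stands in for the bare order_id column so that the value picked for each group is deterministic rather than unspecified:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the view: a sale 'A' and its return 'B'.
    CREATE TABLE orders_demo (order_id TEXT, order_type TEXT, order_linked_id TEXT);
    INSERT INTO orders_demo VALUES
        ('A', 'sell',        NULL),
        ('B', 'sell_return', 'A');
""")

# Same grouping expression as in the question; both rows map to group key 'A'.
GROUP_KEY = """
    CASE
        WHEN order_type IN ('webshop_sell', 'sell', 'buy') THEN order_id
        WHEN order_type IN ('sell_return', 'buy_return') THEN order_linked_id
        ELSE 0
    END
"""

# CHOICE1: filter first, then group. Only row 'B' reaches the GROUP BY.
inner_filter = conn.execute(f"""
    SELECT MIN(order_id) AS order_id
    FROM orders_demo
    WHERE order_id = 'B'
    GROUP BY {GROUP_KEY}
""").fetchall()

# CHOICE2: group first, then filter. The group holds 'A' and 'B', the
# group's value is 'A', and the outer filter on 'B' removes it.
outer_filter = conn.execute(f"""
    SELECT * FROM (
        SELECT MIN(order_id) AS order_id
        FROM orders_demo
        GROUP BY {GROUP_KEY}
    ) a
    WHERE a.order_id = 'B'
""").fetchall()

print(inner_filter)  # [('B',)]
print(outer_filter)  # []
```

With the bare column from the question instead of MIN(), the second query could return either zero rows or one row here, depending on which row SQLite happens to pick for the group.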

Related

Is WHERE and HAVING clause possible without GROUP BY clause in SQL?

Suppose I'm writing a query with aggregate functions where I want the result filtered by conditions both on a table column and on an aggregate function. Is it possible to use WHERE and HAVING together to get the expected result without a GROUP BY clause?
I wrote the following query for the above condition.
select *
from ORDER_DETAILS
where item_price > 1000
having count(item) >= 5 ;
First of all, HAVING works just like WHERE, but it can be applied to the results of aggregate functions.
You should keep track of the rows and columns of the data after each clause.
Suppose we call row_id a property that identifies a single row of a table. The WHERE clause does not change the row_id.
An aggregate function takes multiple rows as input and produces a single result, which changes the row_id. In fact, without a GROUP BY clause everything goes into one bucket, and the output has only one row.
My best guess is that you want the original data rows whose attributes pass a check on an aggregated value, e.g. find order details where the item price is > 1000 (original filter) and the order has at least 5 items (aggregated filter).
So GROUP BY + aggregate + HAVING gives you the aggregate-filtered dataset. You can join this dataset back to the original table, and the result then has the same row_id as the original ORDER_DETAILS:
select *
from ORDER_DETAILS
where item_price > 1000
and order_id in (
select order_id
from ORDER_DETAILS
group by order_id
having count(item) >= 5
);
Notes:
order_id is an example of the column used for the aggregated filter.
I use an IN subquery for convenience; you can change it into a join.
If you are working with big-data SQL such as Hive/Spark, you can also use window functions to get the aggregate result on each row of the original table.
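To see the row filter and the aggregated filter working together, here is a small sketch using Python's sqlite3 module. The sample rows are invented for illustration; only the table and column names come from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDER_DETAILS (order_id INTEGER, item TEXT, item_price INTEGER);
    -- Order 1 has five items (two above 1000); order 2 has only two items.
    INSERT INTO ORDER_DETAILS VALUES
        (1, 'a', 1500), (1, 'b', 200), (1, 'c', 300), (1, 'd', 2500), (1, 'e', 50),
        (2, 'x', 5000), (2, 'y', 1200);
""")

rows = conn.execute("""
    SELECT order_id, item, item_price
    FROM ORDER_DETAILS
    WHERE item_price > 1000              -- row-level filter
      AND order_id IN (                  -- aggregated filter
          SELECT order_id
          FROM ORDER_DETAILS
          GROUP BY order_id
          HAVING COUNT(item) >= 5
      )
    ORDER BY order_id, item
""").fetchall()

print(rows)  # [(1, 'a', 1500), (1, 'd', 2500)]
```

Order 2 has expensive items but fewer than 5 items in total, so none of its rows survive; the rows that do come back are original ORDER_DETAILS rows, not aggregated ones.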

Oracle SQL - creating "unstored" procedure

I have this SQL query
SELECT ACCBAL_DATE, ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES t
WHERE ACC_KEY = '964570223'
AND ACCBAL_KEY = '16'
ORDER BY ACCBAL_DATE DESC
FETCH FIRST 1 ROWS ONLY;
It returns one row but I need to use this query for many ACC_KEYS (about 600).
So first way to do that is to run this query about 600x with different ACC_KEY parameter.
The second one is creating a procedure I think.
Procedure which will use variable acc_key and move it to WHERE statement.
Issue is that I can't create procedure stored on server because of permissions.
Is there some way to solve it without storing procedure on server?
EDIT: I know about the IN clause, but that is not what I need. I need something that will run the query about 600x, each execution with a different ACC_KEY in the WHERE clause, and the output should be 600 rows.
When I put the keys in an IN clause, the query still returns only one row. I limit the result to one row because without the limit it returns about 100 rows, and I only want the first row, which has the data I need. For each ACC_KEY it should return exactly one row.
You can still do that with an IN() clause listing all 600 key values:
select acc_key,
max(accbal_date) as accbal_date,
max(accbal_amount) keep (dense_rank last order by accbal_date) as accbal_amount
from account_balances t
where acc_key in ('964570223', '964570224', ...) -- up to 1000 allowed
and accbal_key = '16'
group by acc_key
order by acc_key;
This is using aggregate functions and grouping by the key, so you will get one row per key, with the data for the most recent date.
Read more about keep/last.
It would still be better to use a collection or a table - maybe an external table loaded from your Excel sheet, saved as a CSV; not least because you can only supply 1000 entries to a single IN() clause - or any expression list - but also for performance and readability/maintenance reasons.
You can store the keys in a table or use a derived table in the query. I would recommend something more like this:
WITH keys as (
SELECT '964570223' as ACC_KEY FROM DUAL UNION ALL
. . .
)
SELECT k.ACC_KEY, MAX(ab.ACCBAL_DATE) as ACCBAL_DATE,
MAX(ab.ACCBAL_AMOUNT) KEEP (DENSE_RANK FIRST ORDER BY ab.ACCBAL_DATE DESC) as ACCBAL_AMOUNT
FROM keys k LEFT JOIN
ACCOUNT_BALANCES ab
ON ab.ACC_KEY = k.ACC_KEY AND
ab.ACCBAL_KEY = '16'
GROUP BY k.ACC_KEY;
Of course the CTE keys could be replaced with a table that has the accounts of interest.
Note that this replaces your logic with aggregation logic. You just want the most recent date and balance, which Oracle supports using the KEEP keyword.
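Step 1: Create a table with a single column ACC_KEY that stores the full list of ACC_KEY values.
Step 2: Run the query below.
SELECT T.ACC_KEY, T.ACCBAL_DATE, T.ACCBAL_AMOUNT
FROM ACCOUNT_BALANCES T
WHERE EXISTS (SELECT A.ACC_KEY FROM <TABLENAME> A WHERE A.ACC_KEY = T.ACC_KEY)
AND T.ACCBAL_KEY = '16'
-- FETCH FIRST 1 ROWS ONLY would return one row in total, so keep the latest row per ACC_KEY instead
AND T.ACCBAL_DATE = (SELECT MAX(T2.ACCBAL_DATE)
FROM ACCOUNT_BALANCES T2
WHERE T2.ACC_KEY = T.ACC_KEY AND T2.ACCBAL_KEY = '16')
ORDER BY T.ACC_KEY;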
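For anyone who wants to try the one-row-per-key idea without an Oracle instance, here is a rough sketch in SQLite via Python's sqlite3 module. SQLite has no KEEP (DENSE_RANK ...) clause, so this swaps in a correlated MAX() subquery; the wanted_keys table, the dates and the amounts are all made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNT_BALANCES (
        ACC_KEY TEXT, ACCBAL_KEY TEXT, ACCBAL_DATE TEXT, ACCBAL_AMOUNT REAL
    );
    INSERT INTO ACCOUNT_BALANCES VALUES
        ('964570223', '16', '2020-01-01', 100.0),
        ('964570223', '16', '2020-03-01', 250.0),
        ('964570224', '16', '2020-02-01',  75.0),
        ('964570224', '16', '2020-01-15', 900.0),
        ('964570224', '17', '2020-05-01',   1.0);
    -- Hypothetical key list, standing in for the 600 ACC_KEY values.
    CREATE TABLE wanted_keys (ACC_KEY TEXT);
    INSERT INTO wanted_keys VALUES ('964570223'), ('964570224');
""")

rows = conn.execute("""
    SELECT ab.ACC_KEY, ab.ACCBAL_DATE, ab.ACCBAL_AMOUNT
    FROM ACCOUNT_BALANCES ab
    JOIN wanted_keys k ON k.ACC_KEY = ab.ACC_KEY
    WHERE ab.ACCBAL_KEY = '16'
      AND ab.ACCBAL_DATE = (            -- latest date for this key only
          SELECT MAX(x.ACCBAL_DATE)
          FROM ACCOUNT_BALANCES x
          WHERE x.ACC_KEY = ab.ACC_KEY AND x.ACCBAL_KEY = '16'
      )
    ORDER BY ab.ACC_KEY
""").fetchall()

print(rows)
# [('964570223', '2020-03-01', 250.0), ('964570224', '2020-02-01', 75.0)]
```

Note that the second key returns 75.0, the amount on its latest date, not 900.0, its largest amount; that is the same distinction the KEEP clause makes in the Oracle answers. If two rows share the latest date for a key, this version returns both.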

Get 5 most recent records for each Number

I have a table in Access that has a PartNumber, with a PriceYr and Price associated with each PartNumber. There are over 10,000 records; PartNumber values are repeated, each with different PriceYr and Price values. I need a query that finds just the 5 most recent prices and dates for each PartNumber.
I tried using MAX(PriceYr) however, it only returns 1 most recent record for each PartNumber.
I also tried the following query but it doesn't seem to work.
SELECT Catalogs.PartNumber,Catalogs.PriceYr, Catalogs.Price FROM Catalogs
WHERE Catalogs.PriceYr in
(SELECT TOP 5 Catalogs.PriceYr
FROM Catalogs as Temp
WHERE Temp.PartNumber = Catalogs.PartNumber
ORDER By Catalogs.PriceYr DESC)
Any help will be greatly appreciated. Thank you.
Desired result that I am trying to get.
Consider a correlated count subquery to filter by a rank variable. Right now, you pull top 5 overall on matching PartNumber not per PartNumber.
SELECT main.*
FROM
(SELECT c.PartNumber, c.PriceYr, c.Price,
(SELECT Count(*)
FROM Catalogs AS Temp
WHERE Temp.PartNumber = c.PartNumber
AND Temp.PriceYr >= c.PriceYr) As rank
FROM Catalogs c
) As main
WHERE main.rank <= 5
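A quick way to check the correlated-count ranking is to run it on a small sample. This sketch uses SQLite through Python's sqlite3 module rather than Access; the sample rows are invented, and the alias is spelled rnk here only to stay clear of any name clash:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Catalogs (PartNumber TEXT, PriceYr INTEGER, Price REAL)")
conn.executemany(
    "INSERT INTO Catalogs VALUES (?, ?, ?)",
    [("P1", yr, 10.0 + yr - 2010) for yr in range(2010, 2018)]  # 8 years for P1
    + [("P2", yr, 5.0) for yr in range(2015, 2018)],            # 3 years for P2
)

rows = conn.execute("""
    SELECT main.PartNumber, main.PriceYr
    FROM (
        SELECT c.PartNumber, c.PriceYr, c.Price,
               -- rank = number of rows of the same part with an equal or later year
               (SELECT COUNT(*)
                FROM Catalogs AS t
                WHERE t.PartNumber = c.PartNumber
                  AND t.PriceYr >= c.PriceYr) AS rnk
        FROM Catalogs c
    ) AS main
    WHERE main.rnk <= 5
    ORDER BY main.PartNumber, main.PriceYr DESC
""").fetchall()

print(rows)
# [('P1', 2017), ('P1', 2016), ('P1', 2015), ('P1', 2014), ('P1', 2013),
#  ('P2', 2017), ('P2', 2016), ('P2', 2015)]
```

P1 is cut down to its 5 latest years while P2 keeps all 3 of its rows. If a part has several rows for the same PriceYr, they share a rank, so ties can change how many rows come back.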
MAX() is an aggregate function: it collapses the rows it sees and returns the maximum value of the specified column. You need a GROUP BY clause to prevent the query from collapsing the whole dataset into a single row.
On the other hand, your query seems to use a subquery needlessly. The following query should work fine; the WHERE line is only needed if you want the query to work on a specific part number (#partNumber is a placeholder):
SELECT TOP 5 c.PartNumber, c.PriceYr, c.Price
FROM Catalogs c
WHERE c.PartNumber = #partNumber
ORDER BY c.PriceYr DESC
(please post a table creation query to make sure this example works)

SQL Statement: How does Exist return its value?

Select * from FMN_XX.order odr where
exists(
select (1) from FMN_XX.order_expired exp
where odr.order_id = exp.order_id
);
Above is the example query for exists. I have tried looking around and reading about it but I just can't get my head wrapped around it.
When I query individually the query inside the EXISTS bracket, it returns 1 as expected and no order_id from order_expired since I didn't query for column there.
But when I run the whole query, it returns the correct number of rows! My question is, how does it know the order_ID from order_expired table when I don't even query for order_id from the order_expired table? How does it compare to get the right rows?
Extra note: Currently the order table has 19779 rows and the order_expired table has 8506 rows. When I add a count at the outer query layer, the final result is 8506 rows, meaning the EXISTS clause has somehow filtered the rows. If it should just return whether at least one order_id is hit, shouldn't the whole query return all 19779 rows?
how does it know the order_ID from order_expired table when I don't even query for order_id from the order_expired table? How does it compare to get the right rows?
The condition in the WHERE clause of the EXISTS subselect gives this information:
odr.order_id is the column from the main SELECT, whereas
exp.order_id is the column from the EXISTS subselect
where odr.order_id = exp.order_id
If the condition above is TRUE for at least one row of the subselect, the outer record appears in the result set.
https://en.wikipedia.org/wiki/Correlated_subquery
EXISTS is similar to a join: you limit your output based on values in another table (or even the same table with a different condition).
The difference in usability is that EXISTS does not care about duplicate values; it only checks whether any rows exist that match your condition.
In other words, if order_id were unique in your order_expired table, you would get the same result from your query as from this query:
Select odr.* from FMN_XX.order odr
join FMN_XX.order_expired exp on odr.order_id = exp.order_id;
However, if it is not unique, the join would still limit your results but would also repeat an order once for every matching row in order_expired.
One more difference: with EXISTS you cannot use any column values from the table inside the EXISTS subquery in the outer query, whereas with a join you can use any columns from the joined tables.
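The duplicate behaviour is easy to reproduce on a toy dataset. This is a sketch in SQLite via Python's sqlite3 module; the table is named orders here (ORDER is a reserved word), the subquery alias is ex, and the rows are invented so that order 2 appears twice in order_expired:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER);
    CREATE TABLE order_expired (order_id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO order_expired VALUES (2), (2), (3);   -- order 2 listed twice
""")

# EXISTS: each outer row is tested once, so it appears at most once.
exists_rows = conn.execute("""
    SELECT odr.order_id
    FROM orders odr
    WHERE EXISTS (SELECT 1 FROM order_expired ex WHERE odr.order_id = ex.order_id)
    ORDER BY odr.order_id
""").fetchall()

# JOIN: each outer row is repeated once per matching row in order_expired.
join_rows = conn.execute("""
    SELECT odr.order_id
    FROM orders odr
    JOIN order_expired ex ON odr.order_id = ex.order_id
    ORDER BY odr.order_id
""").fetchall()

print(exists_rows)  # [(2,), (3,)]
print(join_rows)    # [(2,), (2,), (3,)]
```

Order 1 has no match, so both queries drop it, which is also why the query in the question returns 8506 rows rather than all 19779.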
You said:
When I query individually the query inside the EXISTS bracket, it returns 1 as expected and no order_id from order_expired since I didn't query for column there.
However, I guess that you haven't really run the EXISTS subquery as written, which would have been:
select (1) from FMN_XX.order_expired exp
where odr.order_id = exp.order_id
and it would return an error because it doesn't know what odr is.
The clause where odr.order_id = exp.order_id is exactly what gives the correlation between the main query and the EXISTS subquery.
So, the query would be roughly translated into natural language as:
select all the orders that exist in the expired orders table, looking them up by the order_id field

Is there any way to calculate total number of rows that return from dynamic query in Common Table Expression(CTE) or Subquery

We are in the process of optimizing our database. Most of our stored procedures use CTEs because they give us good performance with our table structure. Almost all of our queries are dynamic and return different results depending on the conditions. We hold all the data in a CTE and check the condition; that was not a problem, but we also need the total number of rows returned by each query, and calculating it takes a lot of time. A temporary table or table variable is not suitable in our case, as inserting the data into it takes a lot of time. Our structure is as follows:
With t(fields) as
(select field1,field2.......
ROW_NUMBER() OVER (order by some column) as row...
from some table and lots of
inner n left joins
where some condition ),
rowTotal(RowTotal) as
(select max(row) from t)
select * from t,RowTotal
where condition for paging
But max(row) takes a lot of time; if I remove it, the data comes back within 100 ms. I tried Count(*), Count(SomeField) and many others; they work but take a lot of time. How can I get the total number of rows from the CTE within a few ms? No aggregate function works for me. Is there any other way to calculate the row total, like @@rowcount? Thanks in advance for any help.
If you are after the total number of rows from the inner query, you can add it as a column to your select using COUNT(*) with an OVER (PARTITION BY ...) clause.
With t(fields) as
(select COUNT(*) OVER (PARTITION BY 1) AS TotalRows,
field1,field2.......
ROW_NUMBER() OVER (order by some column) as row...
from some table ...
This should give you the total row count of 't' as the first column of t.
I don't know whether this is the fastest way to get the result you want, but it works for me on thousands of returned records and avoids extra select queries to find the count separately.
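Here is a small sketch of the windowed count combined with paging, run in SQLite through Python's sqlite3 module (this assumes SQLite 3.25 or later for window functions; the question is most likely about a different database). The items table and its rows are made up, and COUNT(*) OVER () is used as the shorter equivalent of PARTITION BY 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 8)],  # ids 1..7
)

page = conn.execute("""
    WITH t AS (
        SELECT COUNT(*) OVER () AS TotalRows,            -- same value on every row
               id, name,
               ROW_NUMBER() OVER (ORDER BY id) AS row_num
        FROM items
        WHERE id > 1                                     -- stand-in for "some condition"
    )
    SELECT TotalRows, id, row_num
    FROM t
    WHERE row_num BETWEEN 3 AND 4                        -- paging condition
    ORDER BY row_num
""").fetchall()

print(page)  # [(6, 4, 3), (6, 5, 4)]
```

Every row of the page carries TotalRows = 6, the size of the filtered set before paging, so no second query is needed to get the count. The database still has to produce all rows of t to count them, so this is not guaranteed to be faster than max(row).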