Query result as variable in another using jdbc - sql

To optimize a query, I want to get rid of a join. To do that, I need to declare a variable before the main query, but I can't find a way to use it in a JDBC statement.
Original query:
SELECT
d.orders,
SUM(price * qty) / d.orders
FROM main_table
INNER JOIN (
SELECT SUM(qty) AS orders FROM main_table
WHERE status = 1) d
WHERE status = 1
GROUP BY d.orders
formatted query:
SET @orders = (SELECT SUM(qty) FROM main_table WHERE status = 1);
SELECT
@orders,
SUM(price * qty) / @orders
FROM main_table
WHERE status = 1
I can't find a way to execute this formatted query correctly using JDBC. Because of grouping by a variable, I'm not sure it will run correctly. Also, I don't want to split this into two separate executions, using the result of the first in the second, because that would increase the execution time and the query count.
Part of the JDBC code:
val statement = conn.prepareStatement(query)
val rset = statement.executeQuery()
if (rset.next()) {
// read results
}
This runs every 10 seconds because it feeds a real-time dashboard. The database is Impala on Kudu (I'm thinking of building the queries as stored procedures, but I'm afraid Kudu doesn't support them). The app is written in Scala but uses Java's JDBC to query the database.
I have already removed some functions from the query (decimal casts) to keep it as simple as possible, but I still want to remove some unnecessary joins. This is not the only one; I have other similar queries, so a small improvement can have a large benefit.
Thanks.

I suggest using:
SELECT
SUM(qty),
SUM(price * qty)
FROM main_table
WHERE status = 1
When you get the result, just divide the second value by the first in your Java code.
Or even better:
SELECT
SUM(price * qty) / SUM(qty)
FROM main_table
WHERE status = 1
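As a quick check that the single-pass form returns the same numbers as the join form, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Impala only to keep the example self-contained, and the sample rows are made up for illustration; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Impala here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (price REAL, qty INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?, ?)",
    [(10.0, 2, 1), (20.0, 3, 1), (99.0, 5, 0)],
)

# Original shape: the total quantity comes from a joined subquery.
joined = conn.execute(
    """
    SELECT d.orders, SUM(price * qty) / d.orders
    FROM main_table
    CROSS JOIN (SELECT SUM(qty) AS orders FROM main_table WHERE status = 1) d
    WHERE status = 1
    GROUP BY d.orders
    """
).fetchone()

# Suggested shape: both aggregates computed in one scan, no join, no variable.
single_pass = conn.execute(
    """
    SELECT SUM(qty), SUM(price * qty) / SUM(qty)
    FROM main_table
    WHERE status = 1
    """
).fetchone()

print(joined, single_pass)
```

Both queries yield the same pair of values on this data, so the variable and the second round trip are not needed for this particular calculation.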

Related

Oracle - Inner query takes time

I have two queries:
select * from PRE_DETAIL_REPORT a where item = (select item from apple_skus);
select * from PRE_DETAIL_REPORT a where item IN ('100299122');
the table: APPLE_SKUS
only has one item: 100299122
When I run the first query, it takes 2 minutes to execute
When I run the second query, it takes 3 seconds to execute
What can be the reason?
you can rewrite it in this way:
select a.* from PRE_DETAIL_REPORT a
join apple_skus t on t.item = a.item;
That is how SQL query evaluation works.
Your second query filters on a literal value, but the first uses a subquery, which adds another FROM and SELECT to evaluate. Querying a table takes more time than comparing against a hard-coded value, even when the table holds a single record.
You could try EXISTS, which uses a correlated subquery and may be faster:
select * from PRE_DETAIL_REPORT t1 where exists (select 1 from apple_skus s
where
s.item = t1.item)
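The three formulations discussed in this thread (scalar subquery, join, EXISTS) should return the same rows when apple_skus holds a single item. The sketch below checks only that equivalence, using Python's sqlite3 with made-up rows and a hypothetical id column; it says nothing about Oracle's execution plans, which is where the timing difference would have to be diagnosed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real tables have more columns.
conn.execute("CREATE TABLE PRE_DETAIL_REPORT (id INTEGER, item TEXT)")
conn.execute("CREATE TABLE apple_skus (item TEXT)")
conn.executemany(
    "INSERT INTO PRE_DETAIL_REPORT VALUES (?, ?)",
    [(1, "100299122"), (2, "555"), (3, "100299122")],
)
conn.execute("INSERT INTO apple_skus VALUES ('100299122')")

queries = {
    "scalar_subquery": """
        SELECT a.id FROM PRE_DETAIL_REPORT a
        WHERE a.item = (SELECT item FROM apple_skus)
        ORDER BY a.id""",
    "join": """
        SELECT a.id FROM PRE_DETAIL_REPORT a
        JOIN apple_skus t ON t.item = a.item
        ORDER BY a.id""",
    "exists": """
        SELECT a.id FROM PRE_DETAIL_REPORT a
        WHERE EXISTS (SELECT 1 FROM apple_skus s WHERE s.item = a.item)
        ORDER BY a.id""",
}

# Run each formulation and collect the matching ids.
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(results)
```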
It’s very likely that the difference is due to a different access to PRE_DETAIL_REPORT; and as mentioned earlier by someone, an explain plan (or SQL Monitor report) will tell you the answer.
But until you provide the diagnostic, this is just a guess…

Query to display total row count returned, but with a limited result set

For QA purposes, I need to provide a sample of error records as well as the total number of error records. For background, this sample needs to be limited to ~1000 records since the query results are stored in an Excel recordSet object and output to a text file. This sounds very clunky (and is) but there are reasons for it.
I know I can do:
SELECT TOP 1000
primaryKey
,expectedValue
,actualValue
,totalErrors
FROM errorTable
INNER JOIN (SELECT count(*) as totalErrors FROM errorTable) AS tmp
ON 1 = 1
But I'd like a more efficient way since errorTable is actually a subquery that finds all the error records and can get pretty computationally expensive.
Just use window functions:
SELECT TOP 1000 primaryKey, expectedValue, actualValue,
COUNT(*) OVER () as totalErrors
FROM errorTable;
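To see why the window function reports the full count even though only a sample is returned, here is a small sketch in Python's sqlite3. It assumes SQLite 3.25 or later (for window functions) and uses LIMIT in place of SQL Server's TOP; the 25 rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE errorTable (primaryKey INTEGER, expectedValue TEXT, actualValue TEXT)"
)
# Hypothetical data: 25 error records.
conn.executemany(
    "INSERT INTO errorTable VALUES (?, ?, ?)",
    [(i, "expected", "actual") for i in range(25)],
)

# COUNT(*) OVER () is evaluated over the whole result before LIMIT trims it,
# so every sampled row carries the total number of error records.
sample = conn.execute(
    """
    SELECT primaryKey, expectedValue, actualValue,
           COUNT(*) OVER () AS totalErrors
    FROM errorTable
    ORDER BY primaryKey
    LIMIT 10
    """
).fetchall()

print(len(sample), sample[0][3])
```

The underlying rows are scanned once, which matters when errorTable is itself an expensive subquery.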
You can do a cross join like this:
SELECT TOP 1000
primaryKey,
expectedValue,
actualValue,
tmp.totalErrors
FROM errorTable, (SELECT count(*) as totalErrors FROM errorTable) AS tmp

Oracle view performance

I have a view that runs a fairly complex query on a large amount of data. When I run the view for all data it takes a couple of minutes, which is fine. My problem is that when I reduce the result set, either by joining to another table with fewer rows or by using a where clause with a subquery, it still takes the full 2 minutes to run. What I found is that if I specify the id as a literal in the where clause, the query returns very fast. To illustrate what I mean:
Given the query below returns a single row with the result 1234:
select id from mytable where external_id = 1;
If I query my view for all rows it takes about 2 minutes to execute, which is expected
Select * from my_view ;
If I query my view using the following query it will still take the full 2 minutes to execute:
Select * from my_view v where v.id = (select id from mytable where external_id = 1);
But if I specify the ID right in the query it will return in less than a second
Select * from my_view v where v.id =1234;
Since both queries will always return the same result, is there any way I can instruct Oracle to run the subquery first so that it can filter the view in the same manner as the second query? (Note: joining to mytable yields the same result. I chose to use a subquery in the example because I thought it was clearer.)
I did not fully understand your question, but maybe the NO_MERGE hint will help you.

Assistance with SQL statement

I'm using sql-server 2005 and ASP.NET with C#.
I have Users table with
userId(int),
userGender(tinyint),
userAge(tinyint),
userCity(tinyint)
(simplified version of course)
For a userId that I pass to the query, I always need to select two matching users of the opposite gender, within an age range of -5 to +10 years, and from the same city.
The important point is that there must always be two, so I added a condition: if @@ROWCOUNT < 2, re-select without the age and city filters.
The problem is that I sometimes get two result sets back, because the @@ROWCOUNT check runs after a first SELECT that has already returned its rows.
Would it be a problem to have the DataReader object always read from the second result set? Is there any other way to check how many rows would be selected without performing a SELECT that returns results?
Can you simplify it by using SELECT TOP 2 ?
Update: I would perform both selects all the time, union the results, and then select from them based on an order (using SELECT TOP 2), as the union may have added more than two. It's important that this final select picks rows in order of importance, i.e. it prefers rows from your first select.
Alternatively, have the reader logic read the next result-set if there is one and leave the SQL alone.
To avoid getting two separate result sets you can do your first SELECT into a table variable and then do your @@ROWCOUNT check. If it is >= 2, just select from the table variable on its own; otherwise select the results of the table variable combined by UNION ALL with the results of the second query.
Edit: There is a slight overhead to using table variables, so you'd need to weigh whether this is cheaper than Adam's suggestion of performing the UNION as a matter of routine, by looking at the execution stats for both approaches:
SET STATISTICS IO ON
Would something along the following lines be of use...
SELECT TOP (2) *
FROM (SELECT TOP (2) 1 AS prio, M2.*
FROM my_table M1 JOIN my_table M2
ON M1.userGender <> M2.userGender
WHERE M1.userID = supplied_user_id AND
M2.userAge >= M1.userAge - 5 AND
M2.userAge <= M1.userAge + 10 AND
M1.userCity = M2.userCity
UNION
SELECT TOP (2) 2 AS prio, M2.*
FROM my_table M1 JOIN my_table M2
ON M1.userGender <> M2.userGender
WHERE M1.userID = supplied_user_id) AS candidates
ORDER BY prio;
I haven't tried it as I have no SQL Server and there may be dialect issues.
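The "always run both selects, rank them, keep the best two" idea from the answers above can be tried out in a small sketch. This one uses Python's sqlite3 rather than SQL Server, so LIMIT replaces TOP; the sample users are invented, and the GROUP BY with MIN(prio) is an added assumption so that a strict match is not returned a second time as a relaxed match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (userId INTEGER PRIMARY KEY, userGender INTEGER, "
    "userAge INTEGER, userCity INTEGER)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        (1, 0, 30, 7),  # the user we are matching for
        (2, 1, 28, 7),  # strict match: opposite gender, age in range, same city
        (3, 1, 60, 7),  # opposite gender, age out of range
        (4, 1, 31, 9),  # opposite gender, different city
        (5, 0, 29, 7),  # same gender, never a match
    ],
)

MATCH_SQL = """
SELECT userId, MIN(prio) AS prio
FROM (
    -- Priority 1: all filters applied.
    SELECT m2.userId, 1 AS prio
    FROM Users m1 JOIN Users m2 ON m1.userGender <> m2.userGender
    WHERE m1.userId = :uid
      AND m2.userAge BETWEEN m1.userAge - 5 AND m1.userAge + 10
      AND m2.userCity = m1.userCity
    UNION ALL
    -- Priority 2: only the gender filter, used as a fallback.
    SELECT m2.userId, 2 AS prio
    FROM Users m1 JOIN Users m2 ON m1.userGender <> m2.userGender
    WHERE m1.userId = :uid
)
GROUP BY userId
ORDER BY prio, userId
LIMIT 2
"""

matches = conn.execute(MATCH_SQL, {"uid": 1}).fetchall()
print(matches)
```

Because everything comes back as a single result set, the reader code never has to decide which result set to consume.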

Need a row count after SELECT statement: what's the optimal SQL approach?

I'm trying to select a column from a single table (no joins) and I need the count of the number of rows, ideally before I begin retrieving the rows. I have come to two approaches that provide the information I need.
Approach 1:
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
Then
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
Or Approach 2
SELECT my_table.my_col, ( SELECT COUNT ( my_table.my_col )
FROM my_table
WHERE my_table.foo = 'bar' ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
I am doing this because my SQL driver (SQL Native Client 9.0) does not allow me to use SQLRowCount on a SELECT statement but I need to know the number of rows in my result in order to allocate an array before assigning information to it. The use of a dynamically allocated container is, unfortunately, not an option in this area of my program.
I am concerned that the following scenario might occur:
SELECT for count occurs
Another instruction occurs, adding or removing a row
SELECT for data occurs and suddenly the array is the wrong size.
In the worst case, this will attempt to write data beyond the array's limits and crash my program.
Does Approach 2 prevent this issue?
Also, will one of the two approaches be faster? If so, which?
Finally, is there a better approach that I should consider (perhaps a way to instruct the driver to return the number of rows in a SELECT result using SQLRowCount?)
For those that asked, I am using Native C++ with the aforementioned SQL driver (provided by Microsoft.)
If you're using SQL Server, after your query you can select the @@ROWCOUNT function (or, if your result set might have more than 2 billion rows, use the ROWCOUNT_BIG() function). This returns the number of rows selected by the previous statement, or the number of rows affected by an insert/update/delete statement.
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
SELECT @@ROWCOUNT
Or, if you want the row count included in the result set, similar to Approach 2, you can use the OVER clause.
SELECT my_table.my_col,
count(*) OVER(PARTITION BY my_table.foo) AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
Using the OVER clause will have much better performance than using a subquery to get the row count. Using @@ROWCOUNT will have the best performance, because there won't be any query cost for the SELECT @@ROWCOUNT statement.
Update in response to comment: The example I gave would give the # of rows in the partition, defined in this case by "PARTITION BY my_table.foo". The value of the column in each row is the # of rows with the same value of my_table.foo. Since your example query had the clause "WHERE my_table.foo = 'bar'", all rows in the resultset will have the same value of my_table.foo, and therefore the value in the column will be the same for all rows and equal (in this case) to the # of rows in the query.
Here is a better/simpler example of how to include a column in each row that is the total # of rows in the resultset. Simply remove the optional Partition By clause.
SELECT my_table.my_col, count(*) OVER() AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
There are only two ways to be 100% certain that the COUNT(*) and the actual query will give consistent results:
Combine the COUNT(*) with the query, as in your Approach 2. I recommend the form you show in your example, not the correlated subquery form shown in the comment from kogus.
Use two queries, as in your Approach 1, after starting a transaction in SNAPSHOT or SERIALIZABLE isolation level.
Using one of those isolation levels is important because any other isolation level allows new rows created by other clients to become visible in your current transaction. Read the MSDN documentation on SET TRANSACTION ISOLATION LEVEL for more details.
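The effect of reading the count and the rows inside one snapshot can be reproduced on a small scale. The sketch below uses SQLite in WAL mode through Python's sqlite3 as a stand-in for a snapshot-style isolation level; it is an analogy, not SQL Server behaviour, and the temporary database file and the two connections are assumptions made for the demonstration.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two connections can share it.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connection in autocommit mode, so each
# writer statement commits immediately and BEGIN/COMMIT are explicit.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")  # readers get a stable snapshot
writer.execute("CREATE TABLE my_table (my_col INTEGER, foo TEXT)")
writer.executemany("INSERT INTO my_table VALUES (?, 'bar')", [(i,) for i in range(3)])

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")
row_count = reader.execute(
    "SELECT COUNT(my_col) FROM my_table WHERE foo = 'bar'"
).fetchone()[0]

# Another client adds a row between the count and the data query.
writer.execute("INSERT INTO my_table VALUES (99, 'bar')")

# Still inside the same read transaction: the new row is not visible.
rows = reader.execute("SELECT my_col FROM my_table WHERE foo = 'bar'").fetchall()
reader.execute("COMMIT")

# After the transaction ends, the new row shows up.
rows_after = reader.execute("SELECT my_col FROM my_table WHERE foo = 'bar'").fetchall()

print(row_count, len(rows), len(rows_after))
```

Inside the transaction the count and the data agree, so an array sized from the count is large enough for the rows that follow.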
Approach 2 will always return a count that matches your result set.
I suggest you link the sub-query to your outer query though, to guarantee that the condition on your count matches the condition on the dataset.
SELECT
mt.my_col,
(SELECT COUNT(mt2.my_col) FROM my_table mt2 WHERE mt2.foo = mt.foo) as cnt
FROM my_table mt
WHERE mt.foo = 'bar';
If you're concerned the number of rows that meet the condition may change in the few milliseconds since execution of the query and retrieval of results, you could/should execute the queries inside a transaction:
BEGIN TRAN bogus
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
ROLLBACK TRAN bogus
This returns consistent values as long as the transaction's isolation level keeps rows inserted by other sessions from becoming visible between the two statements.
Furthermore, if you're using SQL Server, you can use @@ROWCOUNT to get the number of rows affected by the last statement, and redirect the output of the real query to a temp table or table variable, so you can return everything together with no need for a transaction:
DECLARE @dummy INT
SELECT my_table.my_col
INTO #temp_table
FROM my_table
WHERE my_table.foo = 'bar'
SET @dummy=@@ROWCOUNT
SELECT @dummy, * FROM #temp_table
Here are some ideas:
Go with Approach #1 and resize the array to hold additional results or use a type that automatically resizes as necessary (you don't mention what language you are using so I can't be more specific).
You could execute both statements in Approach #1 within a transaction to guarantee the counts are the same both times if your database supports this.
I'm not sure what you are doing with the data but if it is possible to process the results without storing all of them first this might be the best method.
If you are really concerned that your row count will change between the select count and the select statement, why not select your rows into a temp table first? That way, you know you will be in sync.
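The temp-table idea can be sketched as follows, using Python's sqlite3 in place of SQL Server (so CREATE TEMP TABLE ... AS replaces SELECT ... INTO #temp_table). The sample rows and the name temp_rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_col INTEGER, foo TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "bar"), (2, "baz"), (3, "bar")],
)

# Freeze the matching rows in a temporary table.
conn.execute(
    "CREATE TEMP TABLE temp_rows AS SELECT my_col FROM my_table WHERE foo = 'bar'"
)

# A later change to the base table no longer affects the frozen copy.
conn.execute("INSERT INTO my_table VALUES (4, 'bar')")

# Count first, allocate a fixed-size buffer, then fill it from the same copy.
row_count = conn.execute("SELECT COUNT(*) FROM temp_rows").fetchone()[0]
buffer = [None] * row_count
for i, (value,) in enumerate(conn.execute("SELECT my_col FROM temp_rows ORDER BY my_col")):
    buffer[i] = value

print(row_count, buffer)
```

The count and the data are read from the same frozen copy, so the pre-sized buffer cannot overflow, at the cost of materializing the rows once.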
Why don't you put your results into a vector? That way you don't have to know the size beforehand.
You might want to think about a better pattern for dealing with data of this type.
No self-respecting SQL driver will tell you how many rows your query will return before returning the rows, because the answer might change (unless you use a transaction, which creates problems of its own).
The number of rows won't change - google for ACID and SQL.
IF (@@ROWCOUNT > 0)
BEGIN
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
END
Just to add this because this is the top result in google for this question.
In sqlite I used this to get the rowcount.
WITH temptable AS
(SELECT one,two
FROM
(SELECT one, two
FROM table3
WHERE dimension=0
UNION ALL SELECT one, two
FROM table2
WHERE dimension=0
UNION ALL SELECT one, two
FROM table1
WHERE dimension=0)
ORDER BY date DESC)
SELECT *
FROM temptable
LEFT JOIN
(SELECT count(*)/7 AS cnt,
0 AS bonus
FROM temptable) counter
WHERE 0 = counter.bonus