Assistance with SQL statement - sql

I'm using sql-server 2005 and ASP.NET with C#.
I have Users table with
userId(int),
userGender(tinyint),
userAge(tinyint),
userCity(tinyint)
(simplified version of course)
I need to select always two fit to userID I pass to query users of opposite gender, in age range of -5 to +10 years and from the same city.
Important fact is it always must be two, so I created condition if ##rowcount<2 re-select without age and city filters.
Now the problem is that I sometimes have two returned result sets because I use first ##rowcount on a table. If I run the query.
Will it be a problem to use the DataReader object to read from always second result set? Is there any other way to check how many results were selected without performing select with results?

Can you simplify it by using SELECT TOP 2 ?
Update: I would perform both selects all the time, union the results, and then select from them based on an order (using SELECT TOP 2) as the union may have added more than two. Its important that this next select selects the rows in order of importance, ie it prefers rows from your first select.
Alternatively, have the reader logic read the next result-set if there is one and leave the SQL alone.

To avoid getting two separate result sets you can do your first SELECT into a table variable and then do your ##ROWCOUNT check. If >= 2 then just select from the table variable on its own otherwise select the results of the table variable UNION ALLed with the results of the second query.
Edit: There is a slight overhead to using table variables so you'd need to balance whether this was cheaper than Adam's suggestion just to perform the 'UNION' as a matter of routine by looking at the execution stats for both approaches
SET STATISTICS IO ON

Would something along the following lines be of use...
SELECT *
FROM (SELECT 1 AS prio, *
FROM my_table M1 JOIN my_table M2
WHERE M1.userID = supplied_user_id AND
M1.userGender <> M2.userGender AND
M1.userAge - 5 >= M2.userAge AND
M1.userAge + 15 <= M2.userAge AND
M1.userCity = M2.userCity
LIMIT TO 2 ROWS
UNION
SELECT 2 AS prio, *
FROM my_table M1 JOIN my_table M2
WHERE M1.userID = supplied_user_id AND
M1.userGender <> M2.userGender
LIMIT TO 2 ROWS)
ORDER BY prio
LIMIT TO 2 ROWS;
I haven't tried it as I have no SQL Server and there may be dialect issues.

Related

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Further more, if you use an order by clause on a column that appears multiple times in your resultset, the order of the result set is still not guaranteed inside groups of values in said column - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what will you get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding row count field to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
As other notified using anyway without using a sorted query will not be safe, But as you know about it and search about it, I can suggest using a query like this (But not recommended as a good way)
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (#pageNumber-1)*#pageSize+1 and #pageNumber*#pageSize
[SQL Fiddle Demo]
I finally found a simple way to do it without any order by on a specific column:
declare #start AS INTEGER = 1, #count AS INTEGER = 5;
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET #start ROWS
FETCH NEXT #count ROWS ONLY
where select * from mytable can be any query

SQL for getting each category data in maria db

I need to fetch 4 random values from each category. What should be the correct sql syntax for maria db. I have attached one image of table structure.
Please click here to check the structure
Should i write some procedure or i can do it with basic sql syntax?
You can do that with a SQL statement if you only have a few rows:
SELECT id, question, ... FROM x1 ORDER BY rand() LIMIT 1
This works fine if you have only a few rows - as soon as you have thousands of rows the overhead for sorting the rows becomes important, you have to sort all rows for getting only one row.
A trickier but better solution would be:
SELECT id, question from x1 JOIN (SELECT CEIL(RAND() * (SELECT(MAX(id)) FROM x1)) AS id) as id using(id);
Running EXPLAIN on both SELECTS will show you the difference...
If you need random value for different categories combine the selects via union and add a where clause
http://mysql.rjweb.org/doc.php/groupwise_max#top_n_in_each_group
But then ORDER BY category, RAND(). (Your category is the blog's province.)
Notice how it uses #variables to do the counting.
If you have MariaDB 10.2, then use one of its Windowing functions.
SELECT column FROM table WHERE category_id = XXX
ORDER BY RAND()
LIMIT 4
do it for all categories

total number of rows of a query

I have a very large query that is supposed to return only the top 10 results:
select top 10 ProductId from .....
The problem is that I also want the total number of results that match the criteria without that 'top 10', but in the same time it's considered unaceptable to return all rows (we are talking of roughly 100 thousand results.
Is there a way to get the total number of rows affected by the previous query, either in it or afterwords without running it again?
PS: please no temp tables of 100 000 rows :))
dump the count in a variable and return that
declare #count int
select #count = count(*) from ..... --same where clause as your query
--now you add that to your query..of course it will be the same for every row..
select top 10 ProductId, #count as TotalCount from .....
Assuming that you're using an ORDER BY clause already (to properly define which the "TOP 10" results are), then you could add a call of ROW_NUMBER also, with the opposite sort order, and pick the highest value returned.
E.g., the following:
select top 10 *,ROW_NUMBER() OVER (order by id desc) from sysobjects order by ID
Has a final column with values 2001, 2000, 1999, etc, descending. And the following:
select COUNT(*) from sysobjects
Confirms that there are 2001 rows in sysobjects.
I suppose you could hack it with a union select
select top 10 ... from ... where ...
union
select count(*) from ... where ...
For you to get away with this type of hack you will need to add fake columns to the count query so it returns the same amount of columns as the main query. For example:
select top 10 id, first_name from people
union
select count(*), '' as first_name from people
I don't recommend using this solution. Using two separate queries is how it should be done
Generally speaking no - reasoning is as follows:
If(!) the query planner can make use of TOP 10 to return only 10 rows then RDBMS will not even know the exact number of rows that satisfy the full criteria, it just gets the TOP 10.
Therefore, when you want to find out count of all rows satisfying the criteria you are not running it the second time, but the first time.
Having said that proper indexes might make both queries execute pretty fast.
Edit
MySQL has SQL_CALC_FOUND_ROWS which returns the number of rows that query would return if there was no LIMIT applied - googling for an equivalent in MS SQL points to analytical SQL and CTE variant, see this forum (even though not sure that either would qualify as running it only once, but feel free to check - and let us know).

MySQL query question

If I have a table with the hypothetical columns foo and bar. bar might have 50-60 distinct values in it. My goal here is to pick say, up to 5 rows for say 6 unique bars. So if the 6 unique bars that get selected out of the 50-60 each happen to has at least 5 rows of data, we'll have 30 rows in total.
What you'd really want to do is:
SELECT *
FROM `sometable`
WHERE `bar` IN (
SELECT DISTINCT `bar`
FROM `sometable`
ORDER BY RAND()
LIMIT 6
)
Unfortunately, you're likely to get this:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Possibly your version will be more cooperative. Otherwise, you'll probably need to do it as two queries.
Its been a while since I've worked with MySQL (I've been working with MSSQL lately), but two things come to mind:
Some sort of self join
A Cursor
Self join might look something like
SELECT DISTINCT bar FROM table AS t1 LIMIT 5
JOIN table AS t2 ON t1.foo = t2.foo
Again, its been a while, so this might not be valid MySQL. Also, you'd get all the foo's back for the 5 bars, so you'd have to figure out how to trim that down.
I think the easiest way is to use a UNION.
SELECT * FROM table WHERE bar = 'a' LIMIT 5 UNION SELECT * FROM table WHERE bar='b' UNION SEL ....... you get the jist, i hope
EDIT: not sure if this is what you need - you don't say whether this query needs also to somehow determine the bars? or if they are passed in?
A simple solution that takes 7 queries:
SELECT distinct bar FROM sometable ORDER BY rand() LIMIT 6
Then, for each of the 6 bar values above, do this, substituting {$bar} for the value, of course:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' ORDER BY rand() LIMIT 5
Be careful about using "ORDER BY rand()" because it might cause MySQL to fetch a LOT of rows from your table, and compute the rand() function for all of them, and then sort them. This can take a long time if you have a big table.
If it does take a long time, then for the first query, you can remove the ORDER BY and the LIMIT clauses, and select 6 random values in your program code after the query is done.
For the second query, you can split it in to two steps:
SELECT count(*) FROM sometable WHERE bar='{$bar}'
Then, in your program code, you know how many items there are so you can randomly choose which of them to look at, and use OFFSET and LIMIT:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' LIMIT 1 OFFSET {$offset}
Is this getting called from some program?
If so perhaps you can just lookup the bars, and randomly send them into a select statement.
This way your select could simply be: select * from table where bar in (?,?), and you can move the randomness problem into code, which is frankly better at dealing with that.

Need a row count after SELECT statement: what's the optimal SQL approach?

I'm trying to select a column from a single table (no joins) and I need the count of the number of rows, ideally before I begin retrieving the rows. I have come to two approaches that provide the information I need.
Approach 1:
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
Then
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
Or Approach 2
SELECT my_table.my_col, ( SELECT COUNT ( my_table.my_col )
FROM my_table
WHERE my_table.foo = 'bar' ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
I am doing this because my SQL driver (SQL Native Client 9.0) does not allow me to use SQLRowCount on a SELECT statement but I need to know the number of rows in my result in order to allocate an array before assigning information to it. The use of a dynamically allocated container is, unfortunately, not an option in this area of my program.
I am concerned that the following scenario might occur:
SELECT for count occurs
Another instruction occurs, adding or removing a row
SELECT for data occurs and suddenly the array is the wrong size.
-In the worse case, this will attempt to write data beyond the arrays limits and crash my program.
Does Approach 2 prohibit this issue?
Also, Will one of the two approaches be faster? If so, which?
Finally, is there a better approach that I should consider (perhaps a way to instruct the driver to return the number of rows in a SELECT result using SQLRowCount?)
For those that asked, I am using Native C++ with the aforementioned SQL driver (provided by Microsoft.)
If you're using SQL Server, after your query you can select the ##RowCount function (or if your result set might have more than 2 billion rows use the RowCount_Big() function). This will return the number of rows selected by the previous statement or number of rows affected by an insert/update/delete statement.
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
SELECT ##Rowcount
Or if you want to row count included in the result sent similar to Approach #2, you can use the the OVER clause.
SELECT my_table.my_col,
count(*) OVER(PARTITION BY my_table.foo) AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
Using the OVER clause will have much better performance than using a subquery to get the row count. Using the ##RowCount will have the best performance because the there won't be any query cost for the select ##RowCount statement
Update in response to comment: The example I gave would give the # of rows in partition - defined in this case by "PARTITION BY my_table.foo". The value of the column in each row is the # of rows with the same value of my_table.foo. Since your example query had the clause "WHERE my_table.foo = 'bar'", all rows in the resultset will have the same value of my_table.foo and therefore the value in the column will be the same for all rows and equal (in this case) this the # of rows in the query.
Here is a better/simpler example of how to include a column in each row that is the total # of rows in the resultset. Simply remove the optional Partition By clause.
SELECT my_table.my_col, count(*) OVER() AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
There are only two ways to be 100% certain that the COUNT(*) and the actual query will give consistent results:
Combined the COUNT(*) with the query, as in your Approach 2. I recommend the form you show in your example, not the correlated subquery form shown in the comment from kogus.
Use two queries, as in your Approach 1, after starting a transaction in SNAPSHOT or SERIALIZABLE isolation level.
Using one of those isolation levels is important because any other isolation level allows new rows created by other clients to become visible in your current transaction. Read the MSDN documentation on SET TRANSACTION ISOLATION for more details.
Approach 2 will always return a count that matches your result set.
I suggest you link the sub-query to your outer query though, to guarantee that the condition on your count matches the condition on the dataset.
SELECT
mt.my_row,
(SELECT COUNT(mt2.my_row) FROM my_table mt2 WHERE mt2.foo = mt.foo) as cnt
FROM my_table mt
WHERE mt.foo = 'bar';
If you're concerned the number of rows that meet the condition may change in the few milliseconds since execution of the query and retrieval of results, you could/should execute the queries inside a transaction:
BEGIN TRAN bogus
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
ROLLBACK TRAN bogus
This would return the correct values, always.
Furthermore, if you're using SQL Server, you can use ##ROWCOUNT to get the number of rows affected by last statement, and redirect the output of real query to a temp table or table variable, so you can return everything altogether, and no need of a transaction:
DECLARE #dummy INT
SELECT my_table.my_col
INTO #temp_table
FROM my_table
WHERE my_table.foo = 'bar'
SET #dummy=##ROWCOUNT
SELECT #dummy, * FROM #temp_table
Here are some ideas:
Go with Approach #1 and resize the array to hold additional results or use a type that automatically resizes as neccessary (you don't mention what language you are using so I can't be more specific).
You could execute both statements in Approach #1 within a transaction to guarantee the counts are the same both times if your database supports this.
I'm not sure what you are doing with the data but if it is possible to process the results without storing all of them first this might be the best method.
If you are really concerned that your row count will change between the select count and the select statement, why not select your rows into a temp table first? That way, you know you will be in sync.
Why don't you put your results into a vector? That way you don't have to know the size before hand.
You might want to think about a better pattern for dealing with data of this type.
No self-prespecting SQL driver will tell you how many rows your query will return before returning the rows, because the answer might change (unless you use a Transaction, which creates problems of its own.)
The number of rows won't change - google for ACID and SQL.
IF (##ROWCOUNT > 0)
BEGIN
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
END
Just to add this because this is the top result in google for this question.
In sqlite I used this to get the rowcount.
WITH temptable AS
(SELECT one,two
FROM
(SELECT one, two
FROM table3
WHERE dimension=0
UNION ALL SELECT one, two
FROM table2
WHERE dimension=0
UNION ALL SELECT one, two
FROM table1
WHERE dimension=0)
ORDER BY date DESC)
SELECT *
FROM temptable
LEFT JOIN
(SELECT count(*)/7 AS cnt,
0 AS bonus
FROM temptable) counter
WHERE 0 = counter.bonus