Suppose I have a table with the hypothetical columns foo and bar, where bar might have 50-60 distinct values. My goal is to pick, say, up to 5 rows for each of, say, 6 unique bars. So if the 6 unique bars selected out of the 50-60 each happen to have at least 5 rows of data, we'll have 30 rows in total.
What you'd really want to do is:
SELECT *
FROM `sometable`
WHERE `bar` IN (
SELECT DISTINCT `bar`
FROM `sometable`
ORDER BY RAND()
LIMIT 6
)
Unfortunately, you're likely to get this:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Possibly your version will be more cooperative. Otherwise, you'll probably need to do it as two queries.
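One commonly used workaround, sketched here under assumptions rather than as a definitive fix, is to move the LIMIT into a derived table in the FROM clause and join against it, since the restriction in the error message concerns IN/ALL/ANY/SOME subqueries. The sketch uses Python's sqlite3 module only so that it can be run anywhere; SQLite spells the random function RANDOM() where MySQL uses RAND(), and the helper names and demo data are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a hypothetical in-memory table: 10 distinct bars, 10 rows each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (foo INTEGER, bar TEXT)")
    rows = [(i, f"bar{i % 10}") for i in range(100)]
    conn.executemany("INSERT INTO sometable VALUES (?, ?)", rows)
    return conn


def rows_for_random_bars(conn, n_bars=6):
    """Return every row whose bar is among n_bars randomly chosen bars."""
    # The LIMIT sits in a derived table (FROM clause), not in an IN subquery.
    sql = """
        SELECT t.foo, t.bar
        FROM sometable AS t
        JOIN (
            SELECT DISTINCT bar
            FROM sometable
            ORDER BY RANDOM()   -- RAND() in MySQL
            LIMIT ?
        ) AS picked ON picked.bar = t.bar
    """
    return conn.execute(sql, (n_bars,)).fetchall()
```

Note that this returns all rows for the chosen bars; trimming each bar down to 5 rows is a separate step.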
It's been a while since I've worked with MySQL (I've been working with MSSQL lately), but two things come to mind:
Some sort of self join
A Cursor
Self join might look something like
SELECT t2.*
FROM (SELECT DISTINCT bar FROM `table` LIMIT 5) AS t1
JOIN `table` AS t2 ON t1.bar = t2.bar
Again, it's been a while, so this might not be valid MySQL. Also, you'd get all the foos back for the 5 bars, so you'd have to figure out how to trim that down.
I think the easiest way is to use a UNION.
(SELECT * FROM table WHERE bar = 'a' LIMIT 5) UNION (SELECT * FROM table WHERE bar = 'b' LIMIT 5) UNION ... you get the gist, I hope
EDIT: not sure if this is what you need - you don't say whether this query needs also to somehow determine the bars? or if they are passed in?
A simple solution that takes 7 queries:
SELECT distinct bar FROM sometable ORDER BY rand() LIMIT 6
Then, for each of the 6 bar values above, do this, substituting {$bar} for the value, of course:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' ORDER BY rand() LIMIT 5
Be careful about using "ORDER BY rand()" because it might cause MySQL to fetch a LOT of rows from your table, and compute the rand() function for all of them, and then sort them. This can take a long time if you have a big table.
If it does take a long time, then for the first query, you can remove the ORDER BY and the LIMIT clauses, and select 6 random values in your program code after the query is done.
For the second query, you can split it into two steps:
SELECT count(*) FROM sometable WHERE bar='{$bar}'
Then, in your program code, you know how many items there are so you can randomly choose which of them to look at, and use OFFSET and LIMIT:
SELECT foo,bar FROM sometable WHERE bar='{$bar}' LIMIT 1 OFFSET {$offset}
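The two-step approach above could look roughly like this in program code. This is a minimal sketch using Python's sqlite3 module for portability; the function names and demo data are hypothetical. The ORDER BY foo is an added assumption so that an offset always refers to the same row between queries; with a real table you would sort on an indexed key instead. Bound parameters replace the '{$bar}' string substitution.

```python
import random
import sqlite3


def make_demo_db():
    """Hypothetical demo data: 10 distinct bars, 10 rows each, unique foo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (foo INTEGER, bar TEXT)")
    rows = [(i, f"bar{i % 10}") for i in range(100)]
    conn.executemany("INSERT INTO sometable VALUES (?, ?)", rows)
    return conn


def sample_rows(conn, n_bars=6, per_bar=5):
    """Pick up to per_bar random rows for each of up to n_bars random bars."""
    # First query without ORDER BY/LIMIT; the random choice happens in code.
    bars = [r[0] for r in conn.execute("SELECT DISTINCT bar FROM sometable")]
    chosen = random.sample(bars, min(n_bars, len(bars)))

    result = []
    for bar in chosen:
        # Count the rows for this bar, then draw distinct random offsets.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM sometable WHERE bar = ?", (bar,)
        ).fetchone()
        for offset in random.sample(range(count), min(per_bar, count)):
            row = conn.execute(
                "SELECT foo, bar FROM sometable WHERE bar = ? "
                "ORDER BY foo LIMIT 1 OFFSET ?",
                (bar, offset),
            ).fetchone()
            result.append(row)
    return result
```

This issues more queries than the ORDER BY rand() version, but none of them has to sort the whole table by a random value.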
Is this getting called from some program?
If so, perhaps you can just look up the bars and randomly send them into a select statement.
This way your select could simply be: select * from table where bar in (?,?), and you can move the randomness problem into code, which is frankly better at dealing with that.
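As a rough illustration of moving the randomness into code, the sketch below builds the IN (?,?) placeholder list to match however many bars were chosen. It uses Python's sqlite3 module; the function names and demo data are hypothetical.

```python
import random
import sqlite3


def make_demo_db():
    """Hypothetical demo data: 10 distinct bars, 10 rows each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (foo INTEGER, bar TEXT)")
    rows = [(i, f"bar{i % 10}") for i in range(100)]
    conn.executemany("INSERT INTO sometable VALUES (?, ?)", rows)
    return conn


def choose_bars(conn, n_bars=6):
    """Look up the distinct bars and pick some at random in program code."""
    bars = [r[0] for r in conn.execute("SELECT DISTINCT bar FROM sometable")]
    return random.sample(bars, min(n_bars, len(bars)))


def rows_for_bars(conn, bars):
    """Run a plain IN query with one bound placeholder per chosen bar."""
    placeholders = ", ".join("?" for _ in bars)
    sql = f"SELECT foo, bar FROM sometable WHERE bar IN ({placeholders})"
    return conn.execute(sql, bars).fetchall()
```

Only the placeholder string is interpolated into the SQL; the bar values themselves are always passed as bound parameters.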
I need to fetch 4 random values from each category. What is the correct SQL syntax for MariaDB? I have attached an image of the table structure.
Should I write a procedure, or can I do it with basic SQL syntax?
You can do that with a SQL statement if you only have a few rows:
SELECT id, question, ... FROM x1 ORDER BY rand() LIMIT 1
This works fine if you have only a few rows. As soon as you have thousands of rows, the overhead of sorting becomes significant: you have to sort all rows just to get one row.
A trickier but better solution would be:
SELECT id, question from x1 JOIN (SELECT CEIL(RAND() * (SELECT(MAX(id)) FROM x1)) AS id) as id using(id);
Running EXPLAIN on both SELECTS will show you the difference...
If you need random values for different categories, combine the selects via UNION and add a WHERE clause to each.
http://mysql.rjweb.org/doc.php/groupwise_max#top_n_in_each_group
But then ORDER BY category, RAND(). (Your category is the blog's province.)
Notice how it uses @variables to do the counting.
If you have MariaDB 10.2, then use one of its Windowing functions.
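A minimal sketch of the window-function route, assuming a hypothetical table named questions with id and category_id columns. It is written against Python's sqlite3 module, which needs SQLite 3.25 or later for window functions; in MariaDB the random function is RAND() rather than RANDOM().

```python
import sqlite3

# ROW_NUMBER() restarts at 1 within each category, in random order,
# so keeping rn <= n yields up to n random rows per category.
TOP_N_PER_CATEGORY = """
    SELECT id, category_id
    FROM (
        SELECT id, category_id,
               ROW_NUMBER() OVER (
                   PARTITION BY category_id
                   ORDER BY RANDOM()      -- RAND() in MariaDB
               ) AS rn
        FROM questions
    ) AS ranked
    WHERE rn <= ?
"""


def make_demo_db():
    """Hypothetical data: categories 1 and 2 have 10 rows, category 3 has 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, category_id INTEGER)")
    rows = [(None, 1)] * 10 + [(None, 2)] * 10 + [(None, 3)] * 2
    conn.executemany("INSERT INTO questions VALUES (?, ?)", rows)
    return conn


def random_per_category(conn, n=4):
    return conn.execute(TOP_N_PER_CATEGORY, (n,)).fetchall()
```

A category with fewer than n rows simply contributes all of its rows.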
SELECT column FROM table WHERE category_id = XXX
ORDER BY RAND()
LIMIT 4
Do this for each category.
I have two views: current_campaign and last_campaign:
current_campaign always has one row or none.
last_campaign always has one row.
I have another view that needs to get information from one of these: contributors.
If current_campaign has one row, I need to give it the preference.
If current_campaign is empty, then I can get information from last_campaign.
Is that possible?
Since at most 1 row per view seems to be given, there is a very simple and cheap solution:
-- CREATE VIEW contributors AS
TABLE current_campaign
UNION ALL
TABLE last_campaign -- assuming matching row type
LIMIT 1; -- applies to the whole query
If that was an over-simplification:
-- CREATE VIEW contributors AS
SELECT * FROM current_campaign
WHERE ...
UNION ALL
SELECT * FROM last_campaign
WHERE ...
LIMIT 1;
It would be a waste of time to count rows in current_campaign or run an EXISTS semi-join, since LIMIT 1 does everything you need automatically. Postgres stops executing as soon as enough rows are found to satisfy the LIMIT (1 in this case). You'll see "(never executed)" in the output of EXPLAIN ANALYZE for any later SELECT in the list. See links below for more.
This is an implementation detail that only works for UNION ALL (not UNION) and without an outer ORDER BY or other clauses that would force Postgres to consider all rows. I would expect other RDBMS to behave the same, but I only know about Postgres. It's guaranteed to work in all versions up to the current 9.5.
About the short syntax TABLE current_campaign:
Is there a shortcut for SELECT * FROM in psql?
Related, with more explanation, the same a bit more verbose:
Way to try multiple SELECTs till a result is available?
Sum results of a few queries and then find top 5 in SQL
I assume you mean view or table. Vision isn't really a SQL vocabulary term. Here is one way using union all and not exists:
select cc.*
from current_campaign cc
union all
select lc.*
from last_campaign lc
where not exists (select 1
from current_campaign cc
where cc.campaignId = lc.campaignId
);
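The query above only suppresses the last_campaign row when its campaignId also appears in current_campaign. If the intent is to fall back whenever current_campaign is empty, one possible variant drops the correlation. The sketch below is an assumption-laden illustration using Python's sqlite3 module, with plain tables standing in for the views and a hypothetical name column.

```python
import sqlite3

# Rows from last_campaign are returned only if current_campaign has no rows.
CONTRIBUTORS_SQL = """
    SELECT campaignId, name FROM current_campaign
    UNION ALL
    SELECT campaignId, name FROM last_campaign
    WHERE NOT EXISTS (SELECT 1 FROM current_campaign)
"""


def make_demo_db(with_current):
    """Tables stand in for the two views; the name column is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE current_campaign (campaignId INTEGER, name TEXT)")
    conn.execute("CREATE TABLE last_campaign (campaignId INTEGER, name TEXT)")
    conn.execute("INSERT INTO last_campaign VALUES (1, 'winter')")
    if with_current:
        conn.execute("INSERT INTO current_campaign VALUES (2, 'spring')")
    return conn


def contributors_source(conn):
    return conn.execute(CONTRIBUTORS_SQL).fetchall()
```

Unlike the LIMIT 1 approach, this does not depend on the order in which the branches of the UNION ALL are evaluated.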
How can I get distinct values from multiple fields within one table with just one request.
Option 1
SELECT WM_CONCAT(DISTINCT(FIELD1)) FIELD1S,WM_CONCAT(DISTINCT(FIELD2)) FIELD2S,..FIELD10S
FROM TABLE;
WM_CONCAT is LIMITED
Option 2
select DISTINCT(FIELD1) FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT(FIELD2) FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... FIELD 10
is just too slow
if you were scanning a small range in the data (not full scanning the whole table) you could use WITH to optimise your query
e.g:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had
WL1 ... WL2 ... correlation
A B 0.8
B A 0.8
A C 0.9
C A 0.9
how to eliminate the symmetry from this table?
select WL1, WL2,correlation from
table
where least(WL1,WL2)||greatest(WL1,WL2) = WL1||WL2
order by WL1
this gives
WL1 ... WL2 ... correlation
A B 0.8
A C 0.9
:)
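Assuming the intent is to keep the row of each symmetric pair where WL1 sorts first, the same filter can be written as a direct comparison, which avoids any ambiguity from concatenating strings of different lengths. A small sketch using Python's sqlite3 module; the table name pairs is hypothetical.

```python
import sqlite3


def make_demo_db():
    """The four symmetric rows from the example above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pairs (WL1 TEXT, WL2 TEXT, correlation REAL)")
    conn.executemany(
        "INSERT INTO pairs VALUES (?, ?, ?)",
        [("A", "B", 0.8), ("B", "A", 0.8), ("A", "C", 0.9), ("C", "A", 0.9)],
    )
    return conn


def unique_pairs(conn):
    """Keep only the orientation of each pair where WL1 <= WL2."""
    sql = """
        SELECT WL1, WL2, correlation
        FROM pairs
        WHERE WL1 <= WL2
        ORDER BY WL1, WL2
    """
    return conn.execute(sql).fetchall()
```

As with the original filter, a pair that is stored in only one orientation, with the larger value first, would be dropped.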
The best option in the SQL is the UNION, though you may be able to save some performance by taking out the distinct keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION provides the unique set from two tables, so distinct is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It's got to search every row of the table twice and sort for uniqueness, which is exactly what UNION will do.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
Incidentally, your second query doesn't appear to do what you want it to. Distinct always applies to all of the columns in the select section, so your constants with the field names will cause the query to always return separate rows for the two columns.
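To make both points concrete, here is a small sketch using Python's sqlite3 module with a hypothetical table t. The plain UNION collapses a value that occurs in both columns, while the version with field-name constants keeps it once per column.

```python
import sqlite3


def make_demo_db():
    """Hypothetical data: the value 10 occurs in both field1 and field2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (field1 INTEGER, field2 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(10, 50), (30, 10), (10, 50)])
    return conn


def distinct_values(conn):
    # UNION removes duplicates on its own, so DISTINCT is unnecessary.
    sql = "SELECT field1 FROM t UNION SELECT field2 FROM t"
    return [row[0] for row in conn.execute(sql)]


def distinct_values_tagged(conn):
    # The constant is part of each row, so 10 survives once per column.
    sql = """
        SELECT field1 AS fieldvalue, 'FIELD1' AS fieldname FROM t
        UNION
        SELECT field2, 'FIELD2' FROM t
    """
    return conn.execute(sql).fetchall()
```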
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the union as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
FROM DUAL
CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
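CONNECT BY LEVEL is Oracle-specific. For trying the cross-join idea elsewhere, the two-row generator can be written as a literal UNION ALL. The sketch below uses Python's sqlite3 module with a hypothetical table t; it illustrates the shape of the query and says nothing about its performance.

```python
import sqlite3


def make_demo_db():
    """Hypothetical two-column table with overlapping values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (field1 INTEGER, field2 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(10, 50), (30, 10), (10, 50)])
    return conn


def distinct_via_cross_join(conn):
    # Each table row is paired with lvl = 1 and lvl = 2; the CASE picks
    # field1 for the first copy and field2 for the second.
    sql = """
        SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END AS fieldvalue
        FROM t
        CROSS JOIN (SELECT 1 AS lvl UNION ALL SELECT 2) AS levels
    """
    return [row[0] for row in conn.execute(sql)]
```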
When you get distinct values from multiple columns, the result won't be a rectangular data table. Consider the following data:
Column A Column B
10 50
30 50
10 50
When you take the distinct values, you get 2 rows from the first column and 1 row from the second column. It simply won't work.
And something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...
I have a very large query that is supposed to return only the top 10 results:
select top 10 ProductId from .....
The problem is that I also want the total number of results that match the criteria without that 'top 10', but at the same time it's considered unacceptable to return all rows (we are talking about roughly 100 thousand results).
Is there a way to get the total number of rows matched by the previous query, either within it or afterwards, without running it again?
PS: please no temp tables of 100 000 rows :))
Dump the count into a variable and return that:
declare @count int
select @count = count(*) from ..... --same where clause as your query
--now you add that to your query..of course it will be the same for every row..
select top 10 ProductId, @count as TotalCount from .....
Assuming that you're using an ORDER BY clause already (to properly define which the "TOP 10" results are), then you could add a call of ROW_NUMBER also, with the opposite sort order, and pick the highest value returned.
E.g., the following:
select top 10 *,ROW_NUMBER() OVER (order by id desc) from sysobjects order by ID
Has a final column with values 2001, 2000, 1999, etc, descending. And the following:
select COUNT(*) from sysobjects
Confirms that there are 2001 rows in sysobjects.
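The reverse ROW_NUMBER trick can be sketched as follows, using Python's sqlite3 module (SQLite 3.25 or later for window functions, with LIMIT standing in for TOP). The products table and its 2001 rows are hypothetical demo data, and the approach assumes the sort key is unique.

```python
import sqlite3


def make_demo_db(n_rows=2001):
    """Hypothetical products table with ids 1..n_rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO products VALUES (?)", [(i,) for i in range(1, n_rows + 1)]
    )
    return conn


def top_with_total(conn, n=10):
    """Return the first n ids plus the total number of matching rows."""
    # Numbering in the opposite order means the first row returned
    # carries the highest number, which equals the total row count.
    sql = """
        SELECT id, ROW_NUMBER() OVER (ORDER BY id DESC) AS remaining
        FROM products
        ORDER BY id
        LIMIT ?
    """
    rows = conn.execute(sql, (n,)).fetchall()
    total = rows[0][1] if rows else 0
    return [r[0] for r in rows], total
```

The engine still has to number every matching row, so this saves a round trip rather than the work of counting.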
I suppose you could hack it with a union select
select top 10 ... from ... where ...
union
select count(*) from ... where ...
For you to get away with this type of hack you will need to add fake columns to the count query so it returns the same amount of columns as the main query. For example:
select top 10 id, first_name from people
union
select count(*), '' as first_name from people
I don't recommend using this solution. Using two separate queries is how it should be done.
Generally speaking no - reasoning is as follows:
If(!) the query planner can make use of TOP 10 to return only 10 rows then RDBMS will not even know the exact number of rows that satisfy the full criteria, it just gets the TOP 10.
Therefore, when you want to find out count of all rows satisfying the criteria you are not running it the second time, but the first time.
Having said that proper indexes might make both queries execute pretty fast.
Edit
MySQL has SQL_CALC_FOUND_ROWS which returns the number of rows that query would return if there was no LIMIT applied - googling for an equivalent in MS SQL points to analytical SQL and CTE variant, see this forum (even though not sure that either would qualify as running it only once, but feel free to check - and let us know).
I'm using sql-server 2005 and ASP.NET with C#.
I have Users table with
userId(int),
userGender(tinyint),
userAge(tinyint),
userCity(tinyint)
(simplified version of course)
For a userId that I pass to the query, I always need to select two matching users of the opposite gender, within an age range of -5 to +10 years, and from the same city.
Importantly, it must always be two, so I added a condition: if @@rowcount < 2, re-select without the age and city filters.
The problem is that I sometimes get two result sets back, because I first have to run the select against the table before I can check @@rowcount.
Will it be a problem to use the DataReader object to always read from the second result set? Is there any other way to check how many rows would be selected without performing a select that returns results?
Can you simplify it by using SELECT TOP 2 ?
Update: I would perform both selects all the time, union the results, and then select from them based on an order (using SELECT TOP 2), as the union may have added more than two. It's important that this final select picks the rows in order of importance, i.e. it prefers rows from your first select.
Alternatively, have the reader logic read the next result-set if there is one and leave the SQL alone.
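The union-then-TOP 2 idea could be sketched like this. It is an illustration under assumptions, run through Python's sqlite3 module, so LIMIT 2 stands in for TOP 2; the prio column, the aliases, and the demo rows are hypothetical. Grouping by userId keeps a user who satisfies both selects from appearing twice.

```python
import sqlite3

MATCH_SQL = """
    SELECT userId
    FROM (
        -- Preferred matches: opposite gender, same city, age -5 to +10.
        SELECT m.userId AS userId, 1 AS prio
        FROM Users AS u
        JOIN Users AS m
          ON m.userGender <> u.userGender
         AND m.userCity = u.userCity
         AND m.userAge BETWEEN u.userAge - 5 AND u.userAge + 10
        WHERE u.userId = :uid
        UNION ALL
        -- Fallback matches: opposite gender only.
        SELECT m.userId AS userId, 2 AS prio
        FROM Users AS u
        JOIN Users AS m ON m.userGender <> u.userGender
        WHERE u.userId = :uid
    ) AS candidates
    GROUP BY userId
    ORDER BY MIN(prio), userId
    LIMIT 2
"""


def make_demo_db(rows):
    """rows: (userId, userGender, userAge, userCity) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users (userId INTEGER PRIMARY KEY, userGender INTEGER, "
        "userAge INTEGER, userCity INTEGER)"
    )
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", rows)
    return conn


def two_matches(conn, uid):
    return [r[0] for r in conn.execute(MATCH_SQL, {"uid": uid})]
```

This always returns a single result set, at the cost of running the fallback select even when the preferred one already yields two rows.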
To avoid getting two separate result sets you can do your first SELECT into a table variable and then do your @@ROWCOUNT check. If >= 2 then just select from the table variable on its own, otherwise select the results of the table variable UNION ALLed with the results of the second query.
Edit: There is a slight overhead to using table variables so you'd need to balance whether this was cheaper than Adam's suggestion just to perform the 'UNION' as a matter of routine by looking at the execution stats for both approaches
SET STATISTICS IO ON
Would something along the following lines be of use...
SELECT *
FROM (SELECT 1 AS prio, *
FROM my_table M1 JOIN my_table M2
WHERE M1.userID = supplied_user_id AND
M1.userGender <> M2.userGender AND
M2.userAge >= M1.userAge - 5 AND
M2.userAge <= M1.userAge + 10 AND
M1.userCity = M2.userCity
LIMIT TO 2 ROWS
UNION
SELECT 2 AS prio, *
FROM my_table M1 JOIN my_table M2
WHERE M1.userID = supplied_user_id AND
M1.userGender <> M2.userGender
LIMIT TO 2 ROWS)
ORDER BY prio
LIMIT TO 2 ROWS;
I haven't tried it as I have no SQL Server and there may be dialect issues.