Select random values from each group, SQL

I have a project in which I'm creating a game powered by a database.
The data in the database looks like this, as (ID, Name) pairs:
(1, PhotoID), (1, PhotoID), (1, PhotoID), (2, PhotoID), (2, PhotoID), and so on. There are thousands of entries.
This is my current SQL statement:
$sql = "SELECT TOP 8 * FROM Image WHERE Hidden = '0' ORDER BY NEWID()";
But this can also return rows with matching IDs, whereas I need each result to have a unique ID (that is, I need one result from each group).
How can I change my query to grab one result from each group?
Thanks!

Since ORDER BY NEWID() results in a table scan anyway, you might use row_number() to isolate the first row in each group:
; with randomizer as (
select id,
name,
row_number() over (partition by id
order by newid()) rn
from Image
where hidden = 0
)
select top 8
id,
name
from randomizer
where rn = 1
-- Added at mellamokb's suggestion so that the groups themselves are also randomized
order by newid()
SQL Fiddle playground courtesy of mellamokb.
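To check the logic without a SQL Server instance, here is a minimal sketch of the same ROW_NUMBER() idea using Python's built-in sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); SQLite has no NEWID() or TOP, so random() and LIMIT stand in for them. The sample rows and the helper names make_sample_db and pick_one_per_group are hypothetical.

```python
import sqlite3


def make_sample_db():
    """Hypothetical data: IDs 1-10, three photos each; photo 0 of every ID is hidden."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Image (id INTEGER, name TEXT, hidden INTEGER)")
    rows = [(i, f"photo_{i}_{j}", 1 if j == 0 else 0)
            for i in range(1, 11) for j in range(3)]
    conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", rows)
    return conn


def pick_one_per_group(conn, limit=8):
    """Return up to `limit` (id, name) rows, at most one random visible row per id."""
    sql = """
        WITH randomizer AS (
            SELECT id, name,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY random()) AS rn
            FROM Image
            WHERE hidden = 0
        )
        SELECT id, name
        FROM randomizer
        WHERE rn = 1        -- one random row per id
        ORDER BY random()   -- also shuffle which ids make the cut
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


if __name__ == "__main__":
    for row in pick_one_per_group(make_sample_db()):
        print(row)
```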

Looks like this may work, but I can't vouch for performance:
SELECT TOP 8 ID,
(select top 1 name from image i2
where i2.id = i1.id and i2.hidden = '0' -- skip hidden photos here too
order by newid()) AS Name
FROM Image i1
WHERE hidden = '0'
group by ID
ORDER BY NEWID();
Demo: http://www.sqlfiddle.com/#!3/657ad/6
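A sketch of this GROUP BY plus correlated-subquery variant, run on SQLite through Python's sqlite3 module (random() and LIMIT replace NEWID() and TOP; the table contents and helper names are hypothetical). The inner query also filters on hidden, since otherwise it could return a hidden photo for an ID that has visible ones.

```python
import sqlite3


def make_sample_db():
    """Hypothetical data: IDs 1-10, three photos each; photo 0 of every ID is hidden."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Image (id INTEGER, name TEXT, hidden INTEGER)")
    rows = [(i, f"photo_{i}_{j}", 1 if j == 0 else 0)
            for i in range(1, 11) for j in range(3)]
    conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", rows)
    return conn


def pick_via_subquery(conn, limit=8):
    """GROUP BY id, then pick one random visible name per id with a correlated subquery."""
    sql = """
        SELECT i1.id,
               (SELECT i2.name
                FROM Image i2
                WHERE i2.id = i1.id
                  AND i2.hidden = 0   -- keep hidden photos out of the inner pick
                ORDER BY random()
                LIMIT 1) AS name
        FROM Image i1
        WHERE i1.hidden = 0
        GROUP BY i1.id
        ORDER BY random()
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()
```

As the answer says, the correlated subquery runs once per group, so it may cost more than the single ROW_NUMBER() pass on a large table.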

If you have an index on the ID column and want to take advantage of the index and avoid a full table scan, do your randomization on the key values first:
WITH IDs AS
(
SELECT DISTINCT ID
FROM Image
WHERE Hidden = '0'
),
SequencedIDs AS
(
SELECT ID, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Seq
FROM IDs
),
ImageGroups AS
(
SELECT i.*, ROW_NUMBER() OVER (PARTITION BY i.ID ORDER BY NEWID()) Seq
FROM SequencedIDs s
INNER JOIN Image i
ON i.ID = s.ID
WHERE s.Seq <= 8 -- <= keeps 8 IDs; < 8 would keep only 7
AND i.Hidden = '0'
)
SELECT *
FROM ImageGroups
WHERE Seq = 1
This should drastically reduce the cost compared with the table-scan approach, although I don't have a schema big enough to test it with, so check the statistics in SSMS (for example with SET STATISTICS IO ON and the actual execution plan) and make sure ID is actually indexed for this to be effective.
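The key-first idea can be sketched on SQLite with Python's sqlite3 module (3.25+ for window functions; random() replaces NEWID(); the data and helper names are hypothetical). This only exercises the query logic: whether the ID index is actually used has to be checked in SQL Server's own execution plan. The sketch keeps IDs with seq <= limit so that exactly `limit` groups come back.

```python
import sqlite3


def make_sample_db():
    """Hypothetical data: IDs 1-10, three photos each; photo 0 of every ID is hidden."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Image (id INTEGER, name TEXT, hidden INTEGER)")
    rows = [(i, f"photo_{i}_{j}", 1 if j == 0 else 0)
            for i in range(1, 11) for j in range(3)]
    conn.executemany("INSERT INTO Image VALUES (?, ?, ?)", rows)
    return conn


def pick_keys_first(conn, limit=8):
    """Shuffle the distinct visible ids, keep `limit` of them, then pick one row per kept id."""
    sql = """
        WITH IDs AS (
            SELECT DISTINCT id FROM Image WHERE hidden = 0
        ),
        SequencedIDs AS (
            SELECT id, ROW_NUMBER() OVER (ORDER BY random()) AS seq
            FROM IDs
        ),
        ImageGroups AS (
            SELECT i.id, i.name,
                   ROW_NUMBER() OVER (PARTITION BY i.id ORDER BY random()) AS seq
            FROM SequencedIDs s
            JOIN Image i ON i.id = s.id
            WHERE s.seq <= ?     -- <= keeps exactly `limit` ids
              AND i.hidden = 0
        )
        SELECT id, name FROM ImageGroups WHERE seq = 1
    """
    return conn.execute(sql, (limit,)).fetchall()
```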

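select * from (select * from photos order by rand()) as _SUB group by _SUB.id;
Note that this is MySQL syntax (RAND()) and will not run on SQL Server. It also relies on MySQL's non-standard GROUP BY behavior: it fails when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7), and even without it, MySQL does not guarantee that the ORDER BY in the derived table decides which row of each group is returned.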

select top 8 ID, Name from (select ID, Name, row_number() over
(partition by ID order by newID()) as ranker from Image where Hidden = 0 ) Z where ranker = 1
order by newID()

Related

How to group and pick only certain values based on a field using select query SQL

I have a table as follows:
ID  ORDERNO
1   123
1   123
2   456
2   456
Each SELECT query the application runs through JDBC should pick only one group of records, grouped by ORDERNO.
For example, the first SELECT should return only the rows for ID = 1. We cannot specify the ID in the WHERE clause, because we do not know how many IDs there will be in future. So the query should return only one set of records; the application deletes those records after picking them, so the next SELECT picks the next set. How can I achieve this?
You can use TOP WITH TIES for this
SELECT TOP (1) WITH TIES
t.ID,
t.ORDERNO
FROM YourTable t
ORDER BY
t.ID;
If you want to select and delete at the same time, you can delete through the CTE with an OUTPUT clause:
WITH cte AS (
SELECT TOP (1) WITH TIES
t.ID,
t.ORDERNO
FROM YourTable t
ORDER BY
t.ID
)
DELETE cte
OUTPUT deleted.*;
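SQLite has no OUTPUT clause, so the following Python sqlite3 sketch approximates the "pick one group, then delete it" workflow with the MIN(ID) lookup, wrapped in one explicit transaction. BEGIN IMMEDIATE takes SQLite's write lock before reading, so if every worker uses this function, only one of them can be inside the select-and-delete at a time; on SQL Server, DELETE ... OUTPUT does this in a single statement. The table contents and helper names are hypothetical.

```python
import sqlite3


def make_orders_db():
    """Hypothetical copy of the question's table, opened in autocommit mode."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE YourTable (ID INTEGER, ORDERNO INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)",
                     [(1, 123), (1, 123), (2, 456), (2, 456)])
    return conn


def take_next_group(conn):
    """Return and delete every row sharing the lowest ID; [] once the table is empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT ID, ORDERNO FROM YourTable "
            "WHERE ID = (SELECT MIN(ID) FROM YourTable)"
        ).fetchall()
        if rows:
            conn.execute("DELETE FROM YourTable WHERE ID = ?", (rows[0][0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows


if __name__ == "__main__":
    conn = make_orders_db()
    while batch := take_next_group(conn):
        print(batch)
```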
As one option, you could select on MIN(ID), like this:
SELECT *
FROM yourtable
WHERE ID = (SELECT MIN(ID) FROM yourtable);
You could also use window functions to do this:
SELECT ID, ORDERNO
FROM
(
SELECT ID, ORDERNO,
DENSE_RANK() OVER (ORDER BY ID ASC) AS dr
FROM yourtable
)dt
WHERE dr = 1;
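The DENSE_RANK() version can be checked with Python's sqlite3 module (SQLite 3.25+; the sample rows are hypothetical and deliberately inserted out of order). Every row sharing the lowest ID gets dr = 1, so the whole first group comes back however many rows it has.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (ID INTEGER, ORDERNO INTEGER)")
    # Hypothetical rows, inserted out of order on purpose.
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)",
                     [(2, 456), (1, 123), (3, 789), (1, 123), (2, 456)])
    return conn


def first_group(conn):
    """Return every row whose ID is the lowest ID in the table."""
    sql = """
        SELECT ID, ORDERNO
        FROM (
            SELECT ID, ORDERNO,
                   DENSE_RANK() OVER (ORDER BY ID ASC) AS dr
            FROM yourtable
        ) AS dt
        WHERE dr = 1
    """
    return conn.execute(sql).fetchall()
```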
Order your rows and select the top n rows that you want:
select top (1) with ties ID, ORDERNO
from tablename
order by ID asc

SQL - delete record where sum = 0

I have a table with the following values:
If the values with the same ID sum to 0, I want to delete those rows from the table. The result should look like this:
The code I have:
DELETE FROM tmp_table
WHERE ID in
(SELECT ID
FROM tmp_table WITH(NOLOCK)
GROUP BY ID
HAVING SUM(value) = 0)
This only deletes the rows with ID = 2.
Update: here is an additional example:
The rows in yellow need to be deleted.
Your query is working correctly: the only group that totals zero is id 2. The other ids have sub-groups that total zero (such as the first two rows with id 1), but the total for all those records is -3.
What you want is a much more complex "bin packing" style algorithm that removes the sub-groups summing to zero.
You can do this with window functions, by enumerating the values within each id so that the k-th occurrence of a value is matched with the k-th occurrence of its negation. Taking your approach, using a subquery:
with t as (
select t.*,
row_number() over (partition by id, value order by id) as seqnum
from tmp_table t
)
delete from t
where exists (select 1
from t t2
where t2.id = t.id and t2.value = - t.value and t2.seqnum = t.seqnum
);
You can also do this with a second layer of window functions:
with t as (
select t.*,
row_number() over (partition by id, value order by id) as seqnum
from tmp_table t
),
tt as (
select t.*, count(*) over (partition by id, abs(value), seqnum) as cnt
from t
)
delete from tt
where cnt = 2;
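SQLite cannot DELETE through a CTE, so this Python sqlite3 sketch applies the second (COUNT(*) OVER ...) approach by computing the matched rows in a subquery and deleting them by key. It assumes SQLite 3.25+ and a surrogate key column pk that the question's table may not have; the sample values are hypothetical, chosen so that id 1 keeps a leftover of -3.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp_table (pk INTEGER PRIMARY KEY, id INTEGER, value INTEGER)")
    # Hypothetical values: id 1 totals -3, id 2 totals 0, id 3 has an unmatched 2.
    conn.executemany("INSERT INTO tmp_table (id, value) VALUES (?, ?)",
                     [(1, 5), (1, -5), (1, -3), (2, 4), (2, -4),
                      (3, 2), (3, -2), (3, 2)])
    return conn


def delete_zero_pairs(conn):
    """Delete each x / -x pair within an id, matching the k-th x with the k-th -x."""
    conn.execute("""
        DELETE FROM tmp_table
        WHERE pk IN (
            SELECT pk
            FROM (
                SELECT pk,
                       COUNT(*) OVER (PARTITION BY id, abs(value), seqnum) AS cnt
                FROM (
                    SELECT pk, id, value,
                           ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY pk) AS seqnum
                    FROM tmp_table
                ) AS t
            ) AS tt
            WHERE cnt = 2   -- a value and its negation share the same seqnum
        )
    """)
    conn.commit()
```

Ordering ROW_NUMBER() by pk only makes the pairing repeatable; any order works for deciding which rows cancel out.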

Is there an optimized way in SQL Server to optimize this code? I am trying to find the 2nd duplicate

WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id,AN_KEY ORDER BY [ENTITYID]) AS [rn]
FROM [data].[dbo].[TRANSFER]
)
select *
INTO dbo.#UpSingle
from CTE
where RN=2
UPDATE:
As GurV pointed out, this query doesn't solve the problem: it only returns the (id, AN_KEY) groups that have exactly two rows, not the row where the second duplicate lies.
I am just going to leave this here for reference purposes.
Original Answer
Why not try something like this from another SO post: Finding duplicate values in a SQL table
SELECT
id, AN_KEY, COUNT(*)
FROM
[data].[dbo].[TRANSFER]
GROUP BY
id, AN_KEY
HAVING
COUNT(*) = 2
I gather from your original SQL that the columns you want to group by are:
Id
AN_KEY
Here is another way to get the second duplicate row (in order of increasing ENTITYID, of course):
select *
from [data].[dbo].[TRANSFER] a
where [ENTITYID] = (
select min([ENTITYID])
from [data].[dbo].[TRANSFER] b
where [ENTITYID] > (
select min([ENTITYID])
from [data].[dbo].[TRANSFER] c
where b.id = c.id
and b.an_key = c.an_key
)
and a.id = b.id
and a.an_key = b.an_key
)
Provided there is an index on the id, an_key and ENTITYID columns, the performance of both your query and this one should be acceptable.
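The nested-MIN query uses no SQL Server-specific syntax apart from the [data].[dbo] prefix, so it can be tried on SQLite through Python's sqlite3 module with that prefix dropped. The table contents are hypothetical: group (1, 'x') has three rows, (2, 'x') has two, and (1, 'y') has only one, so it has no second duplicate.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TRANSFER (id INTEGER, an_key TEXT, ENTITYID INTEGER)")
    conn.executemany("INSERT INTO TRANSFER VALUES (?, ?, ?)",
                     [(1, "x", 10), (1, "x", 20), (1, "x", 30),
                      (1, "y", 11), (2, "x", 12), (2, "x", 15)])
    # Index on (id, an_key, ENTITYID), as the answer suggests.
    conn.execute("CREATE INDEX ix_transfer ON TRANSFER (id, an_key, ENTITYID)")
    return conn


def second_duplicates(conn):
    """For each (id, an_key), return the row with the second-smallest ENTITYID."""
    sql = """
        SELECT a.id, a.an_key, a.ENTITYID
        FROM TRANSFER a
        WHERE a.ENTITYID = (
            SELECT MIN(b.ENTITYID)
            FROM TRANSFER b
            WHERE b.ENTITYID > (
                SELECT MIN(c.ENTITYID)
                FROM TRANSFER c
                WHERE c.id = b.id AND c.an_key = b.an_key
            )
              AND b.id = a.id
              AND b.an_key = a.an_key
        )
        ORDER BY a.ENTITYID
    """
    return conn.execute(sql).fetchall()
```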
Let me assume that this query does what you want:
WITH CTE AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id, AN_KEY
ORDER BY [ENTITYID]) AS [rn]
FROM [data].[dbo].[TRANSFER] t
)
SELECT *
INTO dbo.#UpSingle
FROM CTE
WHERE RN = 2;
For performance, you want a composite index on [data].[dbo].[TRANSFER](id, AN_KEY, ENTITYID).
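Here is the ROW_NUMBER() version together with the recommended composite index, sketched on SQLite via Python's sqlite3 module (3.25+ for window functions). SQLite has no SELECT ... INTO, so CREATE TEMP TABLE ... AS stands in for INTO #UpSingle; the TRANSFER rows are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TRANSFER (id INTEGER, an_key TEXT, ENTITYID INTEGER)")
    conn.executemany("INSERT INTO TRANSFER VALUES (?, ?, ?)",
                     [(1, "x", 10), (1, "x", 20), (1, "x", 30),
                      (1, "y", 11), (2, "x", 12), (2, "x", 15)])
    # Composite index on (id, AN_KEY, ENTITYID), as the answer recommends.
    conn.execute("CREATE INDEX ix_transfer ON TRANSFER (id, an_key, ENTITYID)")
    return conn


def build_up_single(conn):
    """Materialize the second row of each (id, an_key) group, like INTO #UpSingle."""
    conn.execute("""
        CREATE TEMP TABLE UpSingle AS
        SELECT id, an_key, ENTITYID
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY id, an_key ORDER BY ENTITYID) AS rn
            FROM TRANSFER t
        ) AS cte
        WHERE rn = 2
    """)
    return conn.execute("SELECT * FROM UpSingle ORDER BY ENTITYID").fetchall()
```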

SQL Server Group By with Max on Date field

I hope I can explain the issue I'm having, and hopefully someone can point me in the right direction.
I'm trying to do a GROUP BY on email address over a subset of data, using MAX() on a date field, but because the other fields contain different values, it brings back more rows than required.
I would just like to return the latest record per email address, along with the other fields from that same row.
How can I write this query?
This is a task for ROW_NUMBER:
select *
from
(
select t.*,
-- assign sequential number starting with 1 for the maximum date
row_number() over (partition by email_address order by datecol desc) as rn
from tab
) as dt
where rn = 1 -- only return the latest row
You can write this query using row_number():
select t.*
from (select t.*,
row_number() over (partition by emailaddress order by date desc) as seqnum
from t
) t
where seqnum = 1;
How about something like this?
select a.*
from baseTable as a
inner join
(select Email,
Max(EmailDate) as EmailDate
from baseTable
group by Email) as b
on a.Email = b.Email
and a.EmailDate = b.EmailDate
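The two styles differ when an email address has several rows tied on its latest date: ROW_NUMBER() keeps exactly one of them (chosen arbitrarily), while the MAX() join returns all of them. A small Python sqlite3 sketch (SQLite 3.25+) shows the difference; the columns email, email_date and subject and the rows are hypothetical, and dates are stored as ISO-8601 text so they sort correctly as strings.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baseTable (email TEXT, email_date TEXT, subject TEXT)")
    # Hypothetical rows; b@x.com has two rows tied on its latest date.
    conn.executemany("INSERT INTO baseTable VALUES (?, ?, ?)", [
        ("a@x.com", "2024-01-01", "old"),
        ("a@x.com", "2024-03-01", "new"),
        ("b@x.com", "2024-02-01", "b1"),
        ("b@x.com", "2024-02-01", "b2"),
    ])
    return conn


def latest_per_email_rownumber(conn):
    """Exactly one row per email: the latest, ties broken arbitrarily."""
    sql = """
        SELECT email, email_date, subject
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY email_date DESC) AS rn
            FROM baseTable t
        ) AS dt
        WHERE rn = 1
        ORDER BY email
    """
    return conn.execute(sql).fetchall()


def latest_per_email_join(conn):
    """Every row whose date equals its email's MAX(email_date), ties included."""
    sql = """
        SELECT a.email, a.email_date, a.subject
        FROM baseTable AS a
        INNER JOIN (SELECT email, MAX(email_date) AS email_date
                    FROM baseTable
                    GROUP BY email) AS b
          ON a.email = b.email
         AND a.email_date = b.email_date
        ORDER BY a.email, a.subject
    """
    return conn.execute(sql).fetchall()
```

If the pick among ties must be deterministic, add a second column (for example a unique key) to the ORDER BY inside OVER().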

Find n largest values in a column

I am trying to find the n largest numbers in a particular column in SQL Server.
We can find the largest value in a column and the 2nd largest value easily.
But how do I find, say, the 5 largest values in a column?
You tagged this both for MySQL and SQL Server. In SQL Server you can use TOP:
SELECT TOP 5 yourColumn
FROM yourTable
ORDER BY yourColumn DESC;
TOP limits the number of rows returned. To get the data with the largest/smallest values you will want to include an ORDER BY.
In MySQL you would use LIMIT.
Another way to do this in SQL Server is using row_number():
select id
from
(
select id, row_number() over(order by id desc) rn
from yourtable
) x
where rn <= 5
See SQL Fiddle With Demo
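Both the ORDER BY ... LIMIT form (the MySQL/SQLite counterpart of TOP) and the ROW_NUMBER() form can be checked with Python's sqlite3 module (SQLite 3.25+). The table name, column name and values are hypothetical.

```python
import sqlite3


def make_db(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (val INTEGER)")
    conn.executemany("INSERT INTO yourtable VALUES (?)", [(v,) for v in values])
    return conn


def top_n_limit(conn, n=5):
    """n largest values via ORDER BY ... LIMIT."""
    rows = conn.execute("SELECT val FROM yourtable ORDER BY val DESC LIMIT ?", (n,))
    return [r[0] for r in rows]


def top_n_row_number(conn, n=5):
    """n largest values via ROW_NUMBER(), as in the SQL Server answer."""
    sql = """
        SELECT val
        FROM (
            SELECT val, ROW_NUMBER() OVER (ORDER BY val DESC) AS rn
            FROM yourtable
        ) AS x
        WHERE rn <= ?
        ORDER BY rn
    """
    return [r[0] for r in conn.execute(sql, (n,))]
```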
In MySQL you can use LIMIT [offset,] row_count to do this, like so:
...
ORDER BY SomeField DESC
LIMIT #n;
For SQL Server you can use TOP(n) to get the top n rows:
SELECT TOP(#n) SomeFieldName
FROM TABLE
ORDER BY SomeField DESC
For example:
SELECT TOP 5 items_sold
FROM tbl_PRODUCT
ORDER BY items_sold DESC
Update: If you have another table, families, and the Products table has a foreign key family_ID referencing it, and you want to find all products in the top n family IDs, you can do this:
SELECT *
FROM Products WHERE family_ID IN
(
SELECT TOP 5 family_ID
FROM families
ORDER BY family_ID DESC
)
Update 2: The topmost product in each family:
;WITH cte
AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY family_ID ORDER BY items_sold DESC) row_num
FROM #Products
)
SELECT * FROM cte
where row_num = 1
Order by family_ID
Here is a live demo.
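The per-family query can be exercised on SQLite through Python's sqlite3 module (3.25+ for ROW_NUMBER()). The product_id column and the sample rows are hypothetical; ties on items_sold within a family would be broken arbitrarily.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products "
                 "(product_id INTEGER, family_ID INTEGER, items_sold INTEGER)")
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                     [(1, 1, 10), (2, 1, 30), (3, 2, 5), (4, 2, 50), (5, 3, 7)])
    return conn


def top_product_per_family(conn):
    """(family_ID, product_id, items_sold) of the best-selling product in each family."""
    sql = """
        WITH cte AS (
            SELECT product_id, family_ID, items_sold,
                   ROW_NUMBER() OVER (PARTITION BY family_ID
                                      ORDER BY items_sold DESC) AS row_num
            FROM Products
        )
        SELECT family_ID, product_id, items_sold
        FROM cte
        WHERE row_num = 1
        ORDER BY family_ID
    """
    return conn.execute(sql).fetchall()
```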
SQL Server (both queries return the smallest of the top 5 values, i.e. the 5th largest):
select min(val)
from your_table
where val in (select top 5 val from your_table order by val desc)
MySQL:
-- MySQL does not support LIMIT inside an IN (...) subquery, so use a derived table
select min(val)
from (select val from your_table order by val desc limit 5) as top5
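The derived-table form of the MySQL query also runs unchanged on SQLite, so it can be checked with Python's sqlite3 module (the values and the distinct flag are hypothetical additions). Because it takes the minimum of the top five rows, duplicate values count separately; selecting DISTINCT val inside the derived table gives the fifth largest distinct value instead.

```python
import sqlite3


def make_db(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (val INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?)", [(v,) for v in values])
    return conn


def nth_largest(conn, n=5, distinct=False):
    """Smallest value among the top n rows (or among the top n distinct values)."""
    inner = "SELECT DISTINCT val" if distinct else "SELECT val"
    sql = f"""
        SELECT MIN(val)
        FROM ({inner} FROM your_table ORDER BY val DESC LIMIT ?) AS top_n
    """
    return conn.execute(sql, (n,)).fetchone()[0]
```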