Random Sample in Groups - sql

I have a query that needs to pull 400 random employees from a list of over 60,000. There are 8 different job groupings, and I need a specific number of employees from each one, so the 400 random samples must consist of specific counts from each of the 8 groups. This is the code so far:
SELECT TOP (400) Business_Unit, GEMSID, First_Name, Last_Name, Region, District, Job_Function, Email_Address, Job_Group_Code
FROM dbo.v_TMS_employee_HR
ORDER BY NEWID()
For example, of the 400 random records returned, Group 1 needs 45, Group 2 needs 50, Group 3 needs 35, Group 4 needs 25, Group 5 needs 100, Group 6 needs 5, Group 7 needs 70 and Group 8 needs 70.
Each group is made up of 1-4 different job codes.

If it's just 8 groups, you can write 8 separate queries (one per group), each with its own TOP number, and then UNION them all together.
Something like this (you will need to set the correct number of records and the correct group codes for each group):
SELECT * FROM
(SELECT TOP (100) Business_Unit, GEMSID, First_Name, Last_Name, Region, District, Job_Function, Email_Address, Job_Group_Code
FROM dbo.v_TMS_employee_HR
WHERE Job_Group_Code=1
ORDER BY NEWID()) AS g1 -- SQL Server requires an alias on a derived table
UNION
...................
UNION
...................
...................
UNION
SELECT * FROM (
SELECT TOP (10) Business_Unit, GEMSID, First_Name, Last_Name, Region, District, Job_Function, Email_Address, Job_Group_Code
FROM dbo.v_TMS_employee_HR
WHERE Job_Group_Code=8
ORDER BY NEWID()) AS g8 -- SQL Server requires an alias on a derived table
Since you clarified that a group contains several Job_Group_Code values, you will need to use WHERE Job_Group_Code IN (1,2,3) instead.

If you have just 8 groups and it's a one-time thing, please try what @PM 77-1 suggested. However, I would use UNION ALL instead of UNION.
If you have more groups, or the number of records selected from each group changes, you may try the following approach:
DECLARE @GroupSelect TABLE (Job_Group_Code INT, NumberOfRecord INT)
INSERT INTO @GroupSelect VALUES (1 ,45), (2 ,50) , .... -- List all your groups and the number of records you want to select from each
;WITH tbl AS (
SELECT Business_Unit, GEMSID, First_Name, Last_Name, Region, District, Job_Function, Email_Address, Job_Group_Code
, ROW_NUMBER() OVER (PARTITION BY Job_Group_Code ORDER BY NEWID()) as RowNo
FROM dbo.v_TMS_employee_HR
)
, numbers AS (
-- If you don't have a numbers table, you may use this.
select number + 1 as number from master..spt_values WHERE type = 'P'
)
SELECT t.*
from tbl t
INNER JOIN @GroupSelect sg
ON sg.Job_Group_Code = t.Job_Group_Code
INNER JOIN numbers n
ON sg.NumberOfRecord >= n.number
WHERE n.number = t.RowNo
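Since each group is made up of several job codes, one possible variation is to map job codes to groups first and then apply the per-group quota directly to the row number, which avoids the numbers table. This is only a minimal sketch: the table variables @Employee, @GroupCode and @Quota, the column GroupNo, and all sample values are hypothetical stand-ins, not the real view or real codes.

```sql
-- Hypothetical sample data standing in for dbo.v_TMS_employee_HR.
DECLARE @Employee TABLE (GEMSID INT PRIMARY KEY, Job_Group_Code INT);
INSERT INTO @Employee (GEMSID, Job_Group_Code)
VALUES (1, 10), (2, 10), (3, 11), (4, 11), (5, 20), (6, 20), (7, 21), (8, 21);

-- Hypothetical mapping: which job codes belong to which group.
DECLARE @GroupCode TABLE (GroupNo INT, Job_Group_Code INT);
INSERT INTO @GroupCode (GroupNo, Job_Group_Code)
VALUES (1, 10), (1, 11), (2, 20), (2, 21);

-- Hypothetical quotas: how many random rows to return per group.
DECLARE @Quota TABLE (GroupNo INT PRIMARY KEY, NumberOfRecord INT);
INSERT INTO @Quota (GroupNo, NumberOfRecord)
VALUES (1, 3), (2, 1);

WITH ranked AS (
    -- Number the employees of each group in random order.
    SELECT e.GEMSID, e.Job_Group_Code, gc.GroupNo,
           ROW_NUMBER() OVER (PARTITION BY gc.GroupNo ORDER BY NEWID()) AS RowNo
    FROM @Employee AS e
    INNER JOIN @GroupCode AS gc
        ON gc.Job_Group_Code = e.Job_Group_Code
)
-- Keep only as many rows per group as that group's quota allows.
SELECT r.GroupNo, r.GEMSID, r.Job_Group_Code
FROM ranked AS r
INNER JOIN @Quota AS q
    ON q.GroupNo = r.GroupNo
WHERE r.RowNo <= q.NumberOfRecord
ORDER BY r.GroupNo, r.GEMSID;
```

With this sample data the query returns three random rows from group 1 and one from group 2; which rows come back changes on every run because of NEWID(). If a group has fewer rows than its quota, you simply get all of them.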

Related

SUM UP two columns and then find the MAX value in SQL Server

I am working with Microsoft SQL Server and want to find E_ID and E_Name where T1+T2 has the MAX value.
I have two steps to reach the necessary result:
Find the sum of two columns AS "total" in a table
Find the row that contains the maximum value from total
The table named "table1" looks like the following (T2 may contain NULL values):
E_ID  E_Name   T1  T2
1     Alice    55  50
2     Morgan   60  40
3     John     65  NULL
4     Monica   30  10
5     Jessica  25  NULL
6     Smith    20  5
Here is what I've tried:
SELECT
E_ID, E_Name, MAX(total) AS max_t
FROM
(SELECT
E_ID, E_Name, ISNULL(T1, 0) + ISNULL(T2, 0) AS total
FROM
table1) AS Q1;
I get this error:
'Q1.E_ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I get the result only when I keep MAX(total) AS max_t in the SELECT part but I also want to have the columns E_ID and E_Name.
Try this - just sort by the Total column in a descending fashion, and take the first row in the result:
SELECT TOP (1)
Q1.E_ID, Q1.E_Name, Q1.Total
FROM
(SELECT
E_ID, E_Name, ISNULL(T1, 0) + ISNULL(T2, 0) AS Total
FROM
table1) AS Q1
ORDER BY
Q1.Total DESC;
You can use this query (no GROUP BY is needed, and ISNULL keeps a NULL T2 from turning the sum into NULL):
SELECT top 1 E_ID, E_Name, ISNULL(T1, 0) + ISNULL(T2, 0) as Total
FROM Table1
ORDER BY Total desc
SELECT
TOP (1) E_ID, E_Name, ISNULL(T1, 0) + ISNULL(T2,0) AS Total
FROM
table1
ORDER BY
Total DESC;
If you want to see all the records, you just need to apply the GROUP BY clause:
SELECT
E_ID
, E_Name
, MAX(total) AS max_t
FROM
(SELECT
E_ID
, E_Name
, ISNULL(T1, 0) + ISNULL(T2, 0) AS total
FROM
table1
) AS Q1
GROUP BY
E_ID
, E_Name;
If you want to see only the MAX value in the dataset, apply TOP 1 (for one record in the result), sum T1 and T2 as total, and then apply ORDER BY ... DESC:
SELECT TOP 1
E_ID
, E_Name
, (ISNULL(T1, 0) + ISNULL(T2, 0)) AS total -- ISNULL so a NULL T2 does not make the sum NULL
FROM
table1
ORDER BY
total DESC
You can do it without subqueries by ordering your data appropriately:
SELECT TOP 1
E_Name
FROM tab
ORDER BY COALESCE(T1,0)+COALESCE(T2,0) DESC
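One thing the TOP 1 answers do not cover is ties: if two employees share the highest total, only one of them is returned. A minimal sketch using TOP (1) WITH TIES is below. The table variable @table1 is a hypothetical stand-in for table1, loaded with the sample rows from the question, and it assumes the blank T2 cells are NULL.

```sql
-- Hypothetical stand-in for table1, using the question's sample rows.
DECLARE @table1 TABLE (E_ID INT, E_Name VARCHAR(20), T1 INT, T2 INT);
INSERT INTO @table1 (E_ID, E_Name, T1, T2)
VALUES (1, 'Alice',   55, 50),
       (2, 'Morgan',  60, 40),
       (3, 'John',    65, NULL),  -- assumed NULL (blank in the question)
       (4, 'Monica',  30, 10),
       (5, 'Jessica', 25, NULL),  -- assumed NULL (blank in the question)
       (6, 'Smith',   20, 5);

-- WITH TIES also returns any further rows whose Total equals the top row's Total.
SELECT TOP (1) WITH TIES
       E_ID, E_Name, ISNULL(T1, 0) + ISNULL(T2, 0) AS Total
FROM @table1
ORDER BY Total DESC;
-- Returns a single row here: 1, Alice, 105
```

With this data there is no tie, so the result is the same as plain TOP (1); the difference only shows up when several rows share the maximum.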

Selecting some row numbers from SQL

I'm using SQL Server Management Studio 2008 and use TOP to select some data from the DB.
SELECT
TOP 3 Name, Company, ta_Content, Email, Writedate
FROM dbo.ta_CONTACT
WHERE Name in ('David', 'Filo', 'Rain', 'Cone', 'Source', 'Tailor', 'Fier', 'Venesse')
ORDER BY Writedate;
So by using TOP 3, I can collect the top 3 rows from the 8 given rows. But what I want to do is select the 5th through 7th rows from those 8.
I could use ROW_NUMBER(), but I want to use TOP logic together with NOT IN. I'm just not sure where to put the NOT IN logic to show only rows 5-7.
Try this query:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY Writedate ASC) AS rownumber,
Name, Company, ta_Content, Email, Writedate
FROM dbo.ta_CONTACT
WHERE Name in ('David', 'Filo', 'Rain', 'Cone', 'Source', 'Tailor', 'Fier', 'Venesse')
) AS t
WHERE rownumber >= 5 AND rownumber <= 7
Try this query. It is not the ideal way, since it assumes exactly 8 rows match: the inner query takes the last 4 rows (5th-8th) and the outer query keeps the first 3 of those.
select TOP 3 Name, Company, ta_Content, Email, Writedate from (
SELECT TOP 4 Name, Company, ta_Content, Email, Writedate FROM dbo.ta_CONTACT
WHERE Name in ('David', 'Filo', 'Rain', 'Cone', 'Source', 'Tailor', 'Fier', 'Venesse')
ORDER BY Writedate desc) a ORDER BY Writedate
Use TOP (3) and filter out rows 1-4; then you get records 5-7:
SELECT TOP (3)
*
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY Writedate) Seq,
Name, Company, ta_Content, Email, Writedate
FROM dbo.ta_CONTACT
WHERE
-- The search criteria must be inside the inner select statement
Name in
('David', 'Filo', 'Rain', 'Cone', 'Source', 'Tailor', 'Fier', 'Venesse')
) filtered
WHERE
Seq >= 5
-- If you do not need the TOP statement, just use the condition below
-- Seq BETWEEN 5 AND 7
ORDER BY
Seq -- without an ORDER BY, TOP does not guarantee which rows are returned
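The question asked specifically for a TOP plus NOT IN version. A minimal sketch of that idea is below: skip the first 4 rows with NOT IN, then take the next 3. It assumes Name is unique and never NULL among the matching rows (NOT IN returns nothing if the subquery yields a NULL) and that Writedate has no ties. The table variable @Contact and its dates are hypothetical stand-ins for dbo.ta_CONTACT.

```sql
-- Hypothetical stand-in for dbo.ta_CONTACT with 8 rows in Writedate order.
DECLARE @Contact TABLE (Name VARCHAR(20) NOT NULL PRIMARY KEY, Writedate DATETIME NOT NULL);
INSERT INTO @Contact (Name, Writedate)
VALUES ('David',   '20080101'), ('Filo',   '20080102'),
       ('Rain',    '20080103'), ('Cone',   '20080104'),
       ('Source',  '20080105'), ('Tailor', '20080106'),
       ('Fier',    '20080107'), ('Venesse','20080108');

-- Against the real table, the WHERE Name IN (...) filter from the question
-- would have to appear in BOTH the outer query and the subquery.
SELECT TOP (3) Name, Writedate
FROM @Contact
WHERE Name NOT IN (
          -- The first 4 rows by Writedate are the ones to skip.
          SELECT TOP (4) Name
          FROM @Contact
          ORDER BY Writedate
      )
ORDER BY Writedate;
-- Returns rows 5-7: Source, Tailor, Fier
```

The ROW_NUMBER() versions above are usually easier to adapt, because the NOT IN form needs a unique column to compare on.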

Aggregation and Total

SELECT Region ,
flag ,
Name,
COUNT(ID) AS 'CountWithFlag'
FROM Table
GROUP BY flag
This query gives me the following results. I am grouping by flag, and I am able to get the counts for English/non-English based on the flag. I also want to display the total counts of English and non-English next to those counts.
OUTPUT:
Region Flag Name CountWithFlag
a 0 English 100
b 1 Non-English 200
c 0 English 100
d 1 Non-English 200
DESIRED OUTPUT:
Region Flag Name CountWithFlag Total
a 0 English 100 200
b 1 Non-English 200 400
c 0 English 100 200
d 1 Non-English 200 400
How can I do that? I want to apply group by for specific counts with flag. But I also want to get total counts in same query!
Any inputs on how I can do that?
Another way would be something like this:
;
WITH agg1
AS (
SELECT region,
flag,
name,
COUNT(id) AS 'CountWithFlag'
FROM [dbo].[t2]
GROUP BY region,
flag,
name
),
agg2
AS (
SELECT [name],
COUNT(id) AS CountByName
FROM [dbo].[t2]
GROUP BY [name]
)
SELECT [agg1].[region],
[agg1].[flag],
[agg1].[name],
[agg1].[CountWithFlag],
[agg2].[CountByName]
FROM [agg1]
INNER JOIN [agg2]
ON [agg2].[name] = [agg1].[name]
Try this:
;
WITH cte
AS ( SELECT DISTINCT
Region ,
flag ,
Name ,
COUNT(ID) OVER ( PARTITION BY flag, Region, Name ) AS [CountWithFlag]
FROM [Table]
)
SELECT Region ,
flag ,
Name ,
[CountWithFlag] ,
SUM([CountWithFlag]) OVER ( PARTITION BY Name ) AS Total
FROM cte
If you want to avoid using window functions, you can do this:
SELECT
Region,
flag,
Name,
COUNT(ID) AS CountWithFlag,
(select count(ID) from [Table] as tbl1 where tbl1.Name=tbl.Name) AS Total
from [Table] as tbl -- TABLE is a reserved word, so the placeholder name needs brackets
group by Region, flag, Name
In my opinion, though, window aggregation should be much faster.
If you want to use window aggregation, then do this:
select
Region,
flag,
Name,
CountWithFlag,
sum(CountWithFlag) over(partition by Name) as Total
from (
SELECT
Region,
flag,
Name,
COUNT(ID) AS CountWithFlag
from [Table] as tbl -- TABLE is a reserved word, so the placeholder name needs brackets
group by Region, flag, Name
) as tbl
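The derived table in the last query can also be folded away, because a window function may wrap an aggregate in a grouped query. The sketch below shows that form on a small, made-up data set; the table variable @t and its rows are hypothetical, with much smaller counts than in the question.

```sql
-- Hypothetical sample rows; ID is generated so COUNT(ID) counts every row.
DECLARE @t TABLE (ID INT IDENTITY(1,1), Region CHAR(1), flag BIT, Name VARCHAR(20));
INSERT INTO @t (Region, flag, Name)
VALUES ('a', 0, 'English'),     ('a', 0, 'English'),
       ('c', 0, 'English'),
       ('b', 1, 'Non-English'),
       ('d', 1, 'Non-English'), ('d', 1, 'Non-English'), ('d', 1, 'Non-English');

SELECT Region,
       flag,
       Name,
       COUNT(ID) AS CountWithFlag,
       -- The window SUM runs over the grouped rows, adding up the per-group counts.
       SUM(COUNT(ID)) OVER (PARTITION BY flag, Name) AS Total
FROM @t
GROUP BY Region, flag, Name
ORDER BY Region;
-- Region flag Name        CountWithFlag Total
-- a      0    English     2             3
-- b      1    Non-English 1             4
-- c      0    English     1             3
-- d      1    Non-English 3             4
```

It is equivalent to the nested version; whether it reads better is a matter of taste.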

SQL return n rows per row value

Greetings SQL people of all nations.
Simple question, hopefully a simple answer.
I have an Oracle database table with persons' information. Columns are:
FirstName, LastName, BirthDate, BirthCountry
Let's say in this table I have 1500 persons born in Aruba (BirthCountry = "Aruba"), 678 Botswanans (BirthCountry = "Botswana"), 13338 Canadians (BirthCountry = "Canadia").
What query would I need to write to extract a sample batch of 10 records from each country? It doesn't matter which 10, just as long as there are 10.
This one query would output 30 rows, 10 rows from each BirthCountry.
This will select the 10 youngest people from each country:
SELECT *
FROM (
SELECT p.*,
ROW_NUMBER() OVER (PARTITION BY birthCountry ORDER BY birthDate DESC) rn
FROM persons p
)
WHERE rn <= 10
This would pick ten random persons, different ones each time you run the query:
select *
from (
select row_number() over (partition by BirthCountry
order by dbms_random.value) as rn
, FirstName
, LastName
, BirthDate
, BirthCountry
from YourTable
)
where rn <= 10

SQL - One result per user record

I have a web site that collects high scores for a game. The sidebar shows the latest 10 scores (not necessarily the highest, just the latest 10). However, since a user can play multiple games quickly, they can dominate the latest 10 list. How can I write an SQL query to show the last 10 scores but limit it to one per user?
SELECT username, max(score)
FROM Sometable
GROUP BY username
ORDER BY Max(score) DESC
From that, select the top X rows using whatever your DB platform provides, e.g. SELECT TOP (10) in MS SQL 2005+.
Edit: sorry, I see that you want things ordered by date.
Here's a working query with ms-sql 2005.
;
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY username ORDER BY dateadded DESC) AS 'RowNo',
username, score, dateadded FROM SomeTable
)
SELECT TOP (10) username, score, dateadded FROM CTE
WHERE RowNo = 1
ORDER BY dateadded DESC -- latest 10 scores, one per user
Group by user... and either select the Max(Score), Max([Submission Date]) or whatever.
In SQL Server, you could use the RANK() OVER() with appropriate PARTITION and GROUP BY, but what platform are you using?
In the interest of providing another point of view, you could just add a "max score" field to your user table and then use a simple query with an ORDER BY to get the top 10.
Your update will need to check whether the new score is higher than the current max score.
It does have the advantage of querying a table that will most probably have fewer rows than your score table.
Anyway, just another option to consider.
SELECT s2.*
FROM
(SELECT user_id, MAX(action_time) AS max_time
FROM scores s1 GROUP BY user_id
ORDER BY MAX(action_time) DESC LIMIT 10)s1
INNER JOIN scores s2 ON (s2.user_id = s1.user_id AND s2.action_time = s1.max_time)
This is MySQL syntax; for SQL Server you need to use SELECT TOP 10 ... instead of LIMIT 10.
Here is a working example that I built on SQL Server 2008
WITH MyTable AS
(
SELECT 1 as UserId, 10 as Score UNION ALL
SELECT 1 as UserId, 11 as Score UNION ALL
SELECT 1 as UserId, 12 as Score UNION ALL
SELECT 2 as UserId, 13 as Score UNION ALL
SELECT 2 as UserId, 14 as Score UNION ALL
SELECT 3 as UserId, 15 as Score UNION ALL
SELECT 3 as UserId, 16 as Score UNION ALL
SELECT 3 as UserId, 17 as Score UNION ALL
SELECT 4 as UserId, 18 as Score UNION ALL
SELECT 4 as UserId, 19 as Score UNION ALL
SELECT 5 as UserId, 20 as Score UNION ALL
SELECT 6 as UserId, 21 as Score UNION ALL
SELECT 7 as UserId, 22 as Score UNION ALL
SELECT 7 as UserId, 23 as Score UNION ALL
SELECT 7 as UserId, 24 as Score UNION ALL
SELECT 8 as UserId, 25 as Score UNION ALL
SELECT 8 as UserId, 26 as Score UNION ALL
SELECT 9 as UserId, 26 as Score UNION ALL
SELECT 10 as UserId, 20 as Score
),
MyTableNew AS
(
SELECT Row_Number() OVER (Order By UserId) Sequence, *
FROM MyTable
),
RankedUsers AS
(
SELECT *, Row_Number() OVER (Partition By UserId ORDER BY Sequence DESC) Ranks
FROM MyTableNew
)
SELECT *
FROM MyTableNew
WHERE Sequence IN
(
SELECT TOP 5 Sequence
FROM RankedUsers
WHERE Ranks = 1
ORDER BY Sequence DESC
)