Taking Sample in SQL Query - sql

I'm working on a problem like this:
I have a table with many columns, but the important ones are DepartmentId and EmployeeId.
Employee Ids Department Ids
------------------------------
A 1
B 1
C 1
D 1
AA 2
BB 2
CC 2
A1 3
B1 3
C1 3
D1 3
I want to write a SQL query such that I take out 2 sample EmployeeIds for each DepartmentID.
like
Employee Id Dept Ids
B 1
C 1
AA 2
CC 2
D1 3
A1 3
Currently I am writing the query,
select
EmployeeId, DeptIds, count(*)
from
table_name
group by 1,2
sample 2
but it gives me only two rows in total.
Any help?

If the number of departments is known and small, you could do stratified sampling:
select *
from table_name
sample
when DeptIds = 1 then 2
when DeptIds = 2 then 2
when DeptIds = 3 then 2
end
Otherwise a combination of RANDOM and ROW_NUMBER:
select *
from
(
sel EmployeeId, DeptIds, random(1,10000000) as rand
from table_name
) as dt
qualify
row_number()
over (partition by DeptIds
order by rand) <= 2
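The RANDOM plus ROW_NUMBER idea is not specific to one database. Below is a minimal sketch in Python using the standard sqlite3 module, assuming SQLite 3.25 or later (needed for window functions). The rows mirror the question; the function name sample_per_department is made up for illustration. SQLite has no QUALIFY clause, so the row-number filter moves to an outer WHERE.

```python
import sqlite3

# Sample rows copied from the question: (EmployeeId, DeptIds)
ROWS = [
    ("A", 1), ("B", 1), ("C", 1), ("D", 1),
    ("AA", 2), ("BB", 2), ("CC", 2),
    ("A1", 3), ("B1", 3), ("C1", 3), ("D1", 3),
]

def sample_per_department(n=2):
    """Return up to n randomly chosen (EmployeeId, DeptIds) rows per department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (EmployeeId TEXT, DeptIds INTEGER)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", ROWS)
    query = """
        SELECT EmployeeId, DeptIds
        FROM (
            SELECT EmployeeId, DeptIds,
                   -- number the rows of each department in random order
                   ROW_NUMBER() OVER (PARTITION BY DeptIds ORDER BY random()) AS rn
            FROM table_name
        ) AS dt
        WHERE rn <= ?
        ORDER BY DeptIds
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

# Two rows per department; which employees appear varies from run to run.
print(sample_per_department(2))
```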

Related

select rows from main table based on highest date in child table between a date range

Sorry for the confusing title.
I've this table:
ApplicantID Applicant Name
-------------------------------
1 Sandeep
2 Thomas
3 Philip
4 Jerin
Along with this child table, which is connected to the table above:
DetailsID ApplicantID CourseName Dt
---------------------------------------------------------------------
1 1 C1 10/5/2014
2 1 C2 10/18/2014
3 1 c3 7/3/2014
4 2 C1 3/2/2014
5 2 C2 10/18/2014
6 2 c3 1/1/2014
7 3 C1 1/5/2014
8 3 C2 4/18/2014
9 3 c3 2/23/2014
10 4 C1 3/15/2014
11 4 C2 2/20/2014
12 4 C2 2/20/2014
I want to get the ApplicantIDs. For example, when I specify a date range from
3/5/2014 to 4/20/2014 I should get:
ApplicantID Applicant Name
-------------------------------
3 Philip
4 Jerin
That means the applicants from the main table that must be in the second table and also the highest date of the second table must fall in the specified date range. Hope the scenario is clear.
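You can use the ROW_NUMBER window function to find each applicant's latest row, then keep that row only if its date falls in the given range. The date filter has to be applied after the numbering; filtering inside the sub-query would match applicants with any date in the range, not just those whose latest date is in it.
select T1.[ApplicantID], [Applicant Name]
from Table1 T1
join ( select [ApplicantID], Dt,
ROW_NUMBER() over ( partition by [ApplicantID] order by Dt desc) as rn
from Table2
) T
on T1.[ApplicantID] = T.[ApplicantID]
and T.rn = 1
where T.Dt BETWEEN '3/5/2014' AND '4/20/2014'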
You will need to pull the MAX per ApplicantId with a GROUP BY in a sub-query, then JOIN to that result. This should work for you:
Select A.ApplicantId, A.[Applicant Name]
From ApplicantTableName A
Join
(
Select D.ApplicantId, Max(D.Dt) DT
From DetailsTableName D
Group By D.ApplicantId
) B On A.ApplicantId = B.ApplicantId
Where B.DT Between '03/05/2014' And '04/20/2014'
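One way to check the "MAX per applicant, then filter" logic against the sample data is the sketch below, in Python with the standard sqlite3 module. It assumes the dates are stored as ISO yyyy-mm-dd text (so string comparison follows date order) and uses made-up names (Applicant, Details, ApplicantName, applicants_with_latest_date_between). It expresses the same idea with HAVING instead of a joined sub-query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Applicant (ApplicantID INTEGER PRIMARY KEY, ApplicantName TEXT);
    CREATE TABLE Details (DetailsID INTEGER PRIMARY KEY, ApplicantID INTEGER,
                          CourseName TEXT, Dt TEXT);
""")
conn.executemany("INSERT INTO Applicant VALUES (?, ?)",
                 [(1, "Sandeep"), (2, "Thomas"), (3, "Philip"), (4, "Jerin")])
# Dates from the question, rewritten as ISO text (an assumption of this sketch).
conn.executemany("INSERT INTO Details VALUES (?, ?, ?, ?)", [
    (1, 1, "C1", "2014-10-05"), (2, 1, "C2", "2014-10-18"), (3, 1, "c3", "2014-07-03"),
    (4, 2, "C1", "2014-03-02"), (5, 2, "C2", "2014-10-18"), (6, 2, "c3", "2014-01-01"),
    (7, 3, "C1", "2014-01-05"), (8, 3, "C2", "2014-04-18"), (9, 3, "c3", "2014-02-23"),
    (10, 4, "C1", "2014-03-15"), (11, 4, "C2", "2014-02-20"), (12, 4, "C2", "2014-02-20"),
])

def applicants_with_latest_date_between(start, end):
    """Applicants whose most recent Details.Dt lies in [start, end]."""
    query = """
        SELECT A.ApplicantID, A.ApplicantName
        FROM Applicant AS A
        JOIN Details AS D ON D.ApplicantID = A.ApplicantID
        GROUP BY A.ApplicantID, A.ApplicantName
        HAVING MAX(D.Dt) BETWEEN ? AND ?   -- filter on the latest date only
        ORDER BY A.ApplicantID
    """
    return conn.execute(query, (start, end)).fetchall()

print(applicants_with_latest_date_between("2014-03-05", "2014-04-20"))
# [(3, 'Philip'), (4, 'Jerin')]
```

Applicant 1 has a row dated 2014-07-03, but a July range still returns nothing for them, because their latest date is in October.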

SQL. How to select multiple rows by using the MIN and GROUP BY

ID UserId Name Amount RewardId
----------------------------
1 1 James 10.00 1
2 1 James 10.00 2
3 1 James 10.00 3
4 2 Dave 20.00 1
5 2 Dave 20.00 3
6 3 Lim 15.00 2
I'm trying to insert to another table, and this is the result that i'm struggling with:
Tbl1ID RewardId
------------------
1 1
1 2
1 3
4 1
4 3
6 2
I'm trying to get the MIN(ID) of each person and select all the RewardId that belong to that person.
You could do a simple self join to get the minimum id value per userid/rewardid combination;
SELECT MIN(a.id) Tbl1ID, b.RewardId
FROM mytable a
JOIN mytable b
ON a.name = b.name
GROUP BY b.userid, b.rewardid
ORDER BY tbl1id, rewardid;
If you are running SQL Server 2008+, you can simplify it by using a window function.
INSERT INTO AnotherTable (Tbl1ID, RewardID)
SELECT MIN(ID) OVER (PARTITION BY Name),
RewardID
FROM SourceTable
Try this
SELECT tbl1id,RewardID From
table1 S JOIN
(
SELECT MIN(ID) as tbl1id,Name FROM table1 GROUP BY Name
) T ON T.Name = S.Name
ORDER BY tbl1id
Output:
Tbl1ID RewardId
----------------
1 1
1 2
1 3
4 1
4 3
6 2
If you want to insert the result into a new table, try this:
Insert into Newtable (tbl1id,RewardID)
SELECT tbl1id,RewardID from
table1 S JOIN
(
SELECT MIN(ID) as tbl1id,Name
FROM table1
GROUP BY Name
) T ON T.Name = S.Name
ORDER BY tbl1id;
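The join-to-MIN approach can be tried end to end with the sketch below, written in Python with the standard sqlite3 module. It reuses the sample rows from the question; the target table name Newtable follows the answer above. One deliberate difference, flagged as an assumption: it joins on UserId rather than Name, since two different users could share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE table1 (ID INTEGER, UserId INTEGER, Name TEXT,
                                     Amount REAL, RewardId INTEGER)""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "James", 10.0, 1), (2, 1, "James", 10.0, 2), (3, 1, "James", 10.0, 3),
    (4, 2, "Dave", 20.0, 1), (5, 2, "Dave", 20.0, 3), (6, 3, "Lim", 15.0, 2),
])
conn.execute("CREATE TABLE Newtable (Tbl1ID INTEGER, RewardId INTEGER)")

# T holds the smallest ID per user; joining back to S attaches that ID
# to every reward row belonging to the same user.
conn.execute("""
    INSERT INTO Newtable (Tbl1ID, RewardId)
    SELECT T.Tbl1ID, S.RewardId
    FROM table1 AS S
    JOIN (SELECT UserId, MIN(ID) AS Tbl1ID FROM table1 GROUP BY UserId) AS T
      ON T.UserId = S.UserId
""")

inserted = conn.execute(
    "SELECT Tbl1ID, RewardId FROM Newtable ORDER BY Tbl1ID, RewardId").fetchall()
print(inserted)
# [(1, 1), (1, 2), (1, 3), (4, 1), (4, 3), (6, 2)]
```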

SQL Server matching all rows from Table1 with all rows from Table2

someone please help me with this query,
i have 2 tables
Employee
EmployeeID LanguageID
1 1
1 2
1 3
2 1
2 3
3 1
3 2
4 1
4 2
4 3
Task
TaskID LanguageID LangaugeRequired
1 1 1
1 2 0
2 1 1
2 2 1
2 3 1
3 2 0
3 3 1
LanguageID refers to the Language table (this table is shown for explanation only)
LanguageID LanguageName
1 English
2 French
3 Italian
Is there a possible way to write a query that gets, for each task, the employees who can speak all the languages required for that task?
for example:
Task ID 1 requires only LanguageID = 1, so the result should be EmployeeID 1,2,3,4
Task ID 2 requires all 3 languages, so the result should be EmployeeID 1,4
Task ID 3 requires only LanguageID = 3, so the result should be EmployeeID 1,2,4
here is another variant to do this:
select t1.taskid, t2.employeeid from
(
select a.taskid, count(distinct a.languageid) as lang_cnt
from
task as a
where a.LangaugeRequired=1
group by a.taskid
) as t1
left outer join
(
select a.taskid, b.employeeid, count(distinct b.languageid) as lang_cnt
from
task as a
inner join
employee as b
on (a.LangaugeRequired=1 and a.languageid=b.languageid)
group by a.taskid, b.employeeid
) as t2
on (t1.taskid=t2.taskid and t1.lang_cnt=t2.lang_cnt)
-- an optional WHERE clause can be inserted here, for example:
--   where t1.taskid=1 and t2.employeeid=1
-- if that query returns a row, the employee can work on the task; if it returns no rows, they cannot
order by t1.taskid, t2.employeeid
As you can see, this query builds two derived tables (sub-queries) and then joins them.
The first (t1) counts how many languages are required for each task.
The second (t2) finds all employees who speak at least one language required for a task, grouped by task/employee to count how many of the required languages each employee covers.
The main query uses a LEFT JOIN because there can be tasks that no employee can perform.
here is the output:
task employee
1 1
1 2
1 3
1 4
2 1
2 4
3 1
3 2
3 4
Update: a simpler but less complete variant, because it will not return tasks that no employee can perform
select a.taskid, b.employeeid, count(distinct b.languageid) as lang_cnt
from
task as a
inner join
employee as b
on (a.LangaugeRequired=1 and a.languageid=b.languageid)
group by a.taskid, b.employeeid
having count(distinct b.languageid) = (select count(distinct c.languageid) from task as c where c.LangaugeRequired=1 and c.taskid=a.taskid)
Another version using NOT EXISTS
Retrieve all task-employee combinations where a missing language does not exist
SELECT t1.EmployeeId, t2.TaskId
FROM (
SELECT DISTINCT EmployeeID
FROM Employee
) t1 , (
SELECT DISTINCT TaskID
FROM Task
) t2
WHERE NOT EXISTS (
SELECT 1 FROM Task t
LEFT JOIN Employee e
ON e.EmployeeID = t1.EmployeeID
AND e.LanguageID = t.LanguageID
WHERE t.TaskID = t2.TaskID
AND LanguageRequired = 1
AND e.EmployeeID IS NULL
)
http://www.sqlfiddle.com/#!6/e3c78/1
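The NOT EXISTS formulation ("there is no required language that the employee lacks") can be checked against the question's data with the sketch below, in Python with the standard sqlite3 module. It is a variation on the query above, not a copy: the LEFT JOIN ... IS NULL test is written as a nested NOT EXISTS, and the flag column is spelled LanguageRequired here, which is an assumption given that the question's table header spells it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeID INTEGER, LanguageID INTEGER);
    CREATE TABLE Task (TaskID INTEGER, LanguageID INTEGER, LanguageRequired INTEGER);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 3),
    (3, 1), (3, 2), (4, 1), (4, 2), (4, 3),
])
conn.executemany("INSERT INTO Task VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 2, 0), (2, 1, 1), (2, 2, 1), (2, 3, 1), (3, 2, 0), (3, 3, 1),
])

QUERY = """
    SELECT t2.TaskID, t1.EmployeeID
    FROM (SELECT DISTINCT EmployeeID FROM Employee) AS t1
    CROSS JOIN (SELECT DISTINCT TaskID FROM Task) AS t2
    WHERE NOT EXISTS (
        -- a required language of this task ...
        SELECT 1
        FROM Task AS t
        WHERE t.TaskID = t2.TaskID
          AND t.LanguageRequired = 1
          -- ... that this employee does not speak
          AND NOT EXISTS (
              SELECT 1
              FROM Employee AS e
              WHERE e.EmployeeID = t1.EmployeeID
                AND e.LanguageID = t.LanguageID)
    )
    ORDER BY t2.TaskID, t1.EmployeeID
"""

# Collect the qualifying employees per task.
qualified = {}
for task_id, employee_id in conn.execute(QUERY):
    qualified.setdefault(task_id, []).append(employee_id)
print(qualified)
# {1: [1, 2, 3, 4], 2: [1, 4], 3: [1, 2, 4]}
```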
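You could start from a join, something like the query below. Note that it only finds employees who speak at least one of a task's required languages; it does not check that they speak all of them.
SELECT DISTINCT b.TaskID, a.EmployeeID FROM Employee a JOIN Task b ON b.LanguageID = a.LanguageID WHERE b.LanguageRequired = 1;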

Show COUNT of each possible grade for an employee, showing zero when there are no grade entries

I have only one table available. I want to show the grade and the count of the number of times an employee has that grade recorded, but it must show a 0 for the grade if there are no records for that employee. I know how to do this using left join when two tables are present, but I only have 1 table.
How is this possible?
For example:
TABLE
empID | dept | grade
1 | 11 | a
2 | 11 | a
3 | 11 | b
1 | 22 | c
2 | 22 | f
3 | 22 | d
1 | 33 | a
2 | 33 | a
3 | 33 | a
If I run SELECT grade, count(grade) from table where empID = 1 Group by grade;, for example, it ends up printing out only grades the employee got and the count. Now I want to also print out the 0s for grades the employee did not have.
i think you're asking for this?
SQL> select e.grade, count(e2.empid)
from (select distinct grade from e) e
left outer join e e2
on e2.grade = e.grade
and e2.empid = 1
group by e.grade
order by grade;
G COUNT(E2.EMPID)
- ---------------
a 2
b 0
c 1
d 0
f 0
Or, since you have no rows with an "e" grade, you could use a lookup table called grade, if you have one:
SQL> select * from grade;
G
-
a
b
c
d
e
f
SQL> select g.grade, count(e2.empid)
from grade g
left outer join e e2
on e2.grade = g.grade
and e2.empid = 1
group by g.grade
order by g.grade;
G COUNT(E2.EMPID)
- ---------------
a 2
b 0
c 1
d 0
e 0
f 0
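The single-table version (distinct grades left-joined back to the same table) can be reproduced with the sketch below, in Python with the standard sqlite3 module. The table name e follows the answer above; the function name grade_counts is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE e (empID INTEGER, dept INTEGER, grade TEXT)")
conn.executemany("INSERT INTO e VALUES (?, ?, ?)", [
    (1, 11, "a"), (2, 11, "a"), (3, 11, "b"),
    (1, 22, "c"), (2, 22, "f"), (3, 22, "d"),
    (1, 33, "a"), (2, 33, "a"), (3, 33, "a"),
])

def grade_counts(emp_id):
    """Count of each grade for one employee, including grades with zero rows."""
    query = """
        SELECT g.grade, COUNT(e2.empID)
        FROM (SELECT DISTINCT grade FROM e) AS g      -- every grade in the table
        LEFT JOIN e AS e2
          ON e2.grade = g.grade AND e2.empID = ?      -- employee filter stays in ON
        GROUP BY g.grade
        ORDER BY g.grade
    """
    return conn.execute(query, (emp_id,)).fetchall()

print(grade_counts(1))
# [('a', 2), ('b', 0), ('c', 1), ('d', 0), ('f', 0)]
```

The employee filter sits in the ON clause on purpose: in a WHERE clause it would discard the unmatched grades and the zero rows would disappear. COUNT(e2.empID) counts only non-NULL values, which is what turns an unmatched grade into 0.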
Let's say your query to select a value is:
select value from tbl;
You can ensure a 0 is returned if there are no rows in tbl t:
select nvl(t.value, 0) value
from dual d
left join tbl t on 1=1;
Sounds like you want the NVL function. With NVL, you can conditionally return an alternate value if the value is null. See the documentation.
So, if you had the following...
SELECT fooName, fooNumber FROM foo
and these were your results
fooName, fooNumber
Blah, 1
Asdf, null
Qwer, 3
poiu, null
you could rewrite the query like this...
SELECT fooName, NVL(fooNumber, 0) FROM foo
and your results would now be...
fooName, fooNumber
Blah, 1
Asdf, 0
Qwer, 3
poiu, 0
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions105.htm

SQL: Getting the full record with the highest count

I'm trying to write sql that produces the desired result from the data below.
data:
IDNum Opt1 Opt2 Opt3 Count
1 A A E 1
1 A B J 4
2 A A E 9
3 B A F 1
3 B C K 14
4 A A M 3
5 B D G 5
6 C C E 13
6 C C M 1
desired result:
IDNum Opt1 Opt2 Opt3 Count
1 A B J 4
2 A A E 9
3 B C K 14
4 A A M 3
5 B D G 5
6 C C E 13
Essentially I want, for each ID Num, the full record with the highest count. I tried doing a group by, but if I group by Opt1, Opt2, Opt3, this doesn't work because it returns the highest count for each (ID Num, Opt1, Opt2, Opt3) combination, which is not what I want. If I only group by ID Num, I can get the max for each ID Num but I lose the information as to which (Opt1, Opt2, Opt3) combination gives this count.
I feel like I've done this before, but I don't often work with sql and I can't remember how. Is there an easy way to do this?
Edit
Before the OP clarified that the question is about Access, this would have worked. I am not familiar enough with Access to know whether this query is supported there.
I think this will work on SQL Server.
select data.*
from data
inner join (select IDNum, max([Count]) as MaxCount from data
group by IDNum) sub
on sub.IDNum = data.IDNum and sub.MaxCount = data.[Count]
Of course, if an IDNum has two rows tied for the highest count, it would return both rows...
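The join-on-maximum pattern can be checked against the question's data with the sketch below, in Python with the standard sqlite3 module. The column Count is quoted because it collides with the COUNT function name; the alias MaxCount is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE data (IDNum INTEGER, Opt1 TEXT, Opt2 TEXT,
                                   Opt3 TEXT, "Count" INTEGER)""")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "A", "E", 1), (1, "A", "B", "J", 4), (2, "A", "A", "E", 9),
    (3, "B", "A", "F", 1), (3, "B", "C", "K", 14), (4, "A", "A", "M", 3),
    (5, "B", "D", "G", 5), (6, "C", "C", "E", 13), (6, "C", "C", "M", 1),
])

# sub holds the highest count per IDNum; joining back on both columns
# recovers the full row (or rows, on a tie) that carries that count.
top_rows = conn.execute("""
    SELECT d.IDNum, d.Opt1, d.Opt2, d.Opt3, d."Count"
    FROM data AS d
    JOIN (SELECT IDNum, MAX("Count") AS MaxCount
          FROM data
          GROUP BY IDNum) AS sub
      ON sub.IDNum = d.IDNum AND sub.MaxCount = d."Count"
    ORDER BY d.IDNum
""").fetchall()

for row in top_rows:
    print(row)
# (1, 'A', 'B', 'J', 4)
# (2, 'A', 'A', 'E', 9)
# (3, 'B', 'C', 'K', 14)
# (4, 'A', 'A', 'M', 3)
# (5, 'B', 'D', 'G', 5)
# (6, 'C', 'C', 'E', 13)
```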
Something like this:
SELECT t1.* FROM table AS t1
JOIN ( SELECT id, max(count) AS max_count FROM table GROUP BY id ) AS t2
ON t1.id = t2.id AND t1.count = t2.max_count
This assumes that no idnum has two rows tied for the max count; otherwise you'll get both rows for that idnum.
Try this query:
SELECT * FROM my_table AS t
WHERE t.[Count] = (SELECT MAX(t2.[Count]) FROM my_table AS t2 WHERE t2.IDNum = t.IDNum)
It should work on Access, but I didn't test it.