Selecting objects that are associated with similar datasets


I'm trying to select every row from a [Company] table for a company that matches at least one other company in two ways: it has the same number of employees (from an [Employee] table that has a CompanyId column), and its employees have the same set of LocationIds (a column in the [Employee] table) in the same proportions.
So, for instance, two companies with three employees each that have the locationIds 1,2, and 2, would be selected by this query.
[Employee]
EmployeeId | CompanyId | LocationId |
========================================
1 | 1 | 1
2 | 1 | 2
3 | 1 | 2
4 | 2 | 1
5 | 2 | 2
6 | 2 | 2
7 | 3 | 3
[Company]
CompanyId |
============
1 |
2 |
3 |
Returns the CompanyIds:
======================
1
2
CompanyIds 1 and 2 are selected because they share in common with at least one other company: 1. the number of employees (3 employees); and 2. the number/proportion of LocationIds associated with those employees (1 employee has LocationId 1 and 2 employees have LocationId 2).
So far I think I want to use a HAVING COUNT(?) > 1 statement, but I'm having trouble working out the details. Does anyone have any suggestions?

This is ugly, but the only way I can think of to do it:
;with CTE as (
select c.Id,
(
select e.Location, count(e.Id) [EmployeeCount]
from Employee e
where e.IdCompany=c.Id
group by e.Location
order by e.Location
for xml auto
) LocationEmployeeData
from Company c
)
select c.Id
from Company c
join (
select x.LocationEmployeeData, count(x.Id) [CompanyCount]
from CTE x
group by x.LocationEmployeeData
having count(x.Id) >= 2
) y on y.LocationEmployeeData = (select LocationEmployeeData from CTE where Id = c.Id)
See fiddle: http://www.sqlfiddle.com/#!6/6bc16/5
It works by encoding the Employee count per Location data (multiple rows) into an xml string for each Company.
The CTE code on its own:
select c.Id,
(
select e.Location, count(e.Id) [EmployeeCount]
from Employee e
where e.IdCompany=c.Id
group by e.Location
order by e.Location
for xml auto
) LocationEmployeeData
from Company c
Produces data like:
Id LocationEmployeeData
1 <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
2 <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
3 <e Location="3" EmployeeCount="1"/>
Then it compares companies based on this string (rather than trying to ascertain whether multiple rows match, etc).
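The same idea can be sketched outside the database. This is a minimal Python illustration, not the answer's T-SQL: the function name `companies_with_matching_profiles` is hypothetical, and a sorted tuple of (location, count) pairs stands in for the FOR XML string. The data is the sample from the question.

```python
from collections import Counter, defaultdict

def companies_with_matching_profiles(employees):
    """employees: iterable of (employee_id, company_id, location_id) tuples."""
    # Count employees per (company, location), like GROUP BY Location.
    per_company = defaultdict(Counter)
    for _employee_id, company_id, location_id in employees:
        per_company[company_id][location_id] += 1

    # Encode each company's counts as one sorted, hashable signature.
    # This plays the role of the FOR XML string in the query.
    by_signature = defaultdict(list)
    for company_id, counts in per_company.items():
        signature = tuple(sorted(counts.items()))
        by_signature[signature].append(company_id)

    # Keep companies whose signature is shared by at least two companies.
    return sorted(
        company_id
        for ids in by_signature.values() if len(ids) >= 2
        for company_id in ids
    )

employees = [
    (1, 1, 1), (2, 1, 2), (3, 1, 2),
    (4, 2, 1), (5, 2, 2), (6, 2, 2),
    (7, 3, 3),
]
print(companies_with_matching_profiles(employees))  # [1, 2]
```

Because the signature contains the per-location counts, two equal signatures imply both the same total number of employees and the same proportions.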

An alternative solution could look like this. It should be performance-tested first, though (I'm not fully confident about the <> join).
with List as
(
select
IdCompany,
Location,
row_number() over (partition by IdCompany order by Location) as RowId,
count(1) over (partition by IdCompany) as LocCount
from
Employee
)
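select distinct
A.IdCompany
from List as A
inner join List as B on A.IdCompany <> B.IdCompany
and A.RowID = B.RowID
and A.LocCount = B.LocCount
-- group per pair of companies, so partial matches against several other companies are not summed together
group by
A.IdCompany, B.IdCompany, A.LocCount
having
sum(case when A.Location = B.Location then 1 else 0 end) = A.LocCount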
Related fiddle: http://sqlfiddle.com/#!6/d9f2e/1
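A way to check how the grouping affects this row-by-row comparison is to run it on a small data set. The sketch below uses Python's built-in sqlite3 module instead of SQL Server (an assumption made only to keep it self-contained; it needs SQLite 3.25 or later for window functions). Companies 4, 5 and 6 are hypothetical additions to the question's data: company 4 matches company 5 in one row and company 6 in another, but neither completely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Employee (Id integer primary key, IdCompany int, Location int)"
)
rows = [
    (1, 1), (1, 2), (1, 2),   # company 1
    (2, 1), (2, 2), (2, 2),   # company 2: same profile as company 1
    (3, 3),                   # company 3
    (4, 5), (4, 6),           # company 4 (hypothetical)
    (5, 5), (5, 7),           # company 5: matches company 4 in row 1 only
    (6, 4), (6, 6),           # company 6: matches company 4 in row 2 only
]
conn.executemany("insert into Employee (IdCompany, Location) values (?, ?)", rows)

QUERY = """
with List as (
    select IdCompany, Location,
           row_number() over (partition by IdCompany order by Location) as RowId,
           count(1) over (partition by IdCompany) as LocCount
    from Employee
)
select distinct A.IdCompany
from List as A
inner join List as B
    on A.IdCompany <> B.IdCompany
   and A.RowId = B.RowId
   and A.LocCount = B.LocCount
group by {group_cols}
having sum(case when A.Location = B.Location then 1 else 0 end) = A.LocCount
order by A.IdCompany
"""

def run(group_cols):
    """Run the comparison with the given GROUP BY column list."""
    return [r[0] for r in conn.execute(QUERY.format(group_cols=group_cols))]

# Grouping by A only adds up partial matches against different companies.
per_company = run("A.IdCompany, A.LocCount")
# Grouping by the pair (A, B) requires one single company to match every row.
per_pair = run("A.IdCompany, B.IdCompany, A.LocCount")

print(per_company)  # [1, 2, 4]
print(per_pair)     # [1, 2]
```

On this data, grouping by the pair of companies is what keeps company 4 out of the result.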

Related

Full recursive employee-boss relation in SQL Server

I need to get the names of all the employees who depend on a person directly or indirectly. Using the query in this example (from https://rextester.com/WGVRGJ67798),
create table employee(
id int not null,
employee varchar(10) not null,
boss int null
)
insert into employee values
(1,'Anna',null),
(2,'Bob',1),
(3,'Louis',1),
(4,'Sara',2),
(5,'Sophie',2),
(6,'John',4);
with boss as (
select id, employee, boss, cast(null as varchar(10)) as name
from employee
where boss is null
union all
select e.id, e.employee, b.id, b.employee
from employee e
join boss b on b.id = e.boss
)
select * from boss
I can get this result:
However, I need to see this:
It would be like showing all the possible relations between a person and all of the employees "below" him or her.

You can reverse the logic: instead of starting from the boss (the root) and going towards the employees (the leaves), you can start from the leaves and walk toward the root. This lets you generate the intermediate relations as you go:
with cte as (
select e.id, e.employee, e.boss, b.employee name, b.boss new_boss
from employee e
left join employee b on b.id = e.boss
union all
select c.id, c.employee, c.new_boss, e.employee, e.boss
from cte c
join employee e on e.id = c.new_boss
)
select id, employee, boss, name
from cte
order by id, boss
Demo on DB Fiddle:
id | employee | boss | name
-: | :------- | ---: | :---
1 | Anna | null | null
2 | Bob | 1 | Anna
3 | Louis | 1 | Anna
4 | Sara | 1 | Anna
4 | Sara | 2 | Bob
5 | Sophie | 1 | Anna
5 | Sophie | 2 | Bob
6 | John | 1 | Anna
6 | John | 2 | Bob
6 | John | 4 | Sara
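The leaf-to-root walk can also be sketched in plain Python to make the iteration explicit. This is only an illustration of the logic, not a replacement for the query: the dictionary layout and the function name `all_bosses` are assumptions, and the sketch assumes the hierarchy has no cycles.

```python
employees = {
    # id: (name, boss_id)
    1: ("Anna", None),
    2: ("Bob", 1),
    3: ("Louis", 1),
    4: ("Sara", 2),
    5: ("Sophie", 2),
    6: ("John", 4),
}

def all_bosses(employees):
    """Return (id, employee, boss_id, boss_name) for every direct or indirect boss."""
    rows = []
    for emp_id, (name, boss_id) in employees.items():
        if boss_id is None:
            # The root has no boss; keep it with empty boss columns,
            # as the LEFT JOIN in the anchor member does.
            rows.append((emp_id, name, None, None))
        # Walk up the chain: each step corresponds to one recursive iteration.
        while boss_id is not None:
            boss_name, next_boss = employees[boss_id]
            rows.append((emp_id, name, boss_id, boss_name))
            boss_id = next_boss
    # Sort by employee id, then boss id (None first), like ORDER BY id, boss.
    return sorted(rows, key=lambda r: (r[0], r[2] or 0))

for row in all_bosses(employees):
    print(row)
```

For John (id 6) this yields three rows, one each for Anna, Bob and Sara, matching the last three lines of the result above.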

I like hierarchyid for this sort of thing.
use tempdb;
drop table if exists employee;
drop table if exists #e;
create table employee(
id int not null,
employee varchar(10) not null,
boss int null
)
insert into employee values
(1,'Anna',null),
(2,'Bob',1),
(3,'Louis',1),
(4,'Sara',2),
(5,'Sophie',2),
(6,'John',4);
with boss as (
select id, employee, boss,
cast(concat('/', id, '/') as hierarchyid) as h
from employee
where boss is null
union all
select e.id, e.employee, b.id,
cast(concat(b.h.ToString(), e.id, '/') as hierarchyid)
from employee e
join boss b on b.id = e.boss
)
select *
into #e
from boss
select e.id, e.employee, b.id, b.employee, b.h.ToString()
from #e as e
left join #e as b
on e.h.IsDescendantOf(b.h) = 1
and e.id <> b.id;
I took your code mostly as is and changed the following things:
Rather than keeping track of the boss in the recursive CTE, I'm building a hierarchyid path that leads all the way back to the root of the hierarchy.
Shoved the results of the cte into a temp table
Selected from the temp table, using a self-join where the join criteria are "where the inner table's notion of employee is anywhere in the management chain for the outer table".
Note, for the join, I'm excluding the case where the employee reports to themselves; you cannot be your own boss in this situation (even though the IsDescendantOf method would suggest otherwise!).
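To illustrate why the path-based join works, here is a rough Python analogue that uses plain strings such as '/1/2/4/' in place of hierarchyid values, and a prefix test in place of IsDescendantOf. The helper names `build_paths` and `management_pairs` are hypothetical, and unlike the LEFT JOIN above the sketch does not emit a row for the root employee, who has no boss.

```python
employees = {
    # id: (name, boss_id)
    1: ("Anna", None),
    2: ("Bob", 1),
    3: ("Louis", 1),
    4: ("Sara", 2),
    5: ("Sophie", 2),
    6: ("John", 4),
}

def build_paths(employees):
    """Map each id to a path such as '/1/2/4/' from the root down to that id."""
    paths = {}

    def path_of(emp_id):
        if emp_id not in paths:
            _name, boss_id = employees[emp_id]
            parent = "/" if boss_id is None else path_of(boss_id)
            paths[emp_id] = f"{parent}{emp_id}/"
        return paths[emp_id]

    for emp_id in employees:
        path_of(emp_id)
    return paths

def management_pairs(employees):
    """Return (employee_id, boss_id) for every direct or indirect boss."""
    paths = build_paths(employees)
    pairs = []
    for emp_id in employees:
        for boss_id in employees:
            # A path is a prefix of itself, so skip the self-match explicitly.
            if emp_id != boss_id and paths[emp_id].startswith(paths[boss_id]):
                pairs.append((emp_id, boss_id))
    return sorted(pairs)

print(build_paths(employees)[6])    # /1/2/4/6/
print(management_pairs(employees))
```

The trailing slash on every path matters: it prevents '/1/' from being treated as a prefix of a path that starts with '/10/'.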

Something like this. There are two recursions. The first computes h_level, the depth of each employee in the boss-->employee hierarchy. The second treats each row from the first as the starting point of a new recursion to find the direct and indirect hierarchical relationships.
Data
drop table if exists Employee;
go
create table employee(
id int not null,
employee varchar(10) not null,
boss int null)
insert into employee values
(1,'Anna',null),
(2,'Bob',1),
(3,'Louis',1),
(4,'Sara',2),
(5,'Sophie',2),
(6,'John',4);
Query
;with
boss(id, employee, boss, h_level) as (
select id, employee, boss, 0
from employee
where boss is null
union all
select e.id, e.employee, b.id, b.h_level+1
from employee e
join boss b on b.id = e.boss),
downlines(id, employee, boss, h_level, d_level) as (
select id, employee, boss, h_level, 0
from boss
union all
select b.id, b.employee, d.id, d.h_level, d.d_level+1
from boss b
join downlines d on d.id = b.boss)
select *
from downlines
order by h_level, d_level;
Output
id employee boss h_level d_level
1 Anna NULL 0 0
2 Bob 1 0 1
3 Louis 1 0 1
4 Sara 2 0 2
5 Sophie 2 0 2
6 John 4 0 3
2 Bob 1 1 0
3 Louis 1 1 0
4 Sara 2 1 1
5 Sophie 2 1 1
6 John 4 1 2
4 Sara 2 2 0
5 Sophie 2 2 0
6 John 4 2 1
6 John 4 3 0

show only categories that have products in them

excuse the bad title but I couldn't find a good way to express what I want in abstract terms.
Anyway I have 3 tables
tbl_product:
PID | productname
1 | product 1
2 | product 2
3 | product 3
4 | product 4
..
tbl_categories, motherCategory allows me to nest categories:
CID | categoriename | motherCategory
1 | electronics | NULL
2 | clothing | NULL
3 | Arduino | 1
4 | Casings, extra's | 3
..
tbl_productInCategory PID and CID are foreign keys to PID and CID in tbl_product and tbl_categories respectively. A product can have multiple categories assigned to it so PID can occur more than once in this table.
PID | CID
1 | 1
2 | 1
3 | 3
4 | 4
Now I have a query that returns all categories if I give the mothercategory.
What I want to do is show ONLY the categories that have products in them recursively.
For instance, if I show all top-level categories (motherCategory is null) on the example data above, I want it to return only electronics, since there are no products in category 2, clothing.
However the problem I am having is that I also want this to work recursively. Consider this tbl_productInCategory:
PID | CID
1 | 2
2 | 2
3 | 2
4 | 4
Now it should return both clothing and electronics even though there are no products directly in electronics, because there are products in the nested category arduino->Casings, extra's. If I show all categories whose motherCategory is electronics, it should also return arduino.
I can't figure out how to do this and any help or pointers are appreciated.

First you should select all categories where products exist. On the next steps select mother categories.
WITH CTE AS
(
SELECT tbl_categories.*
FROM
tbl_categories
JOIN tbl_productInCategory on tbl_productInCategory.CID = tbl_categories.CID
UNION ALL
SELECT tbl_categories.*
FROM tbl_categories
JOIN CTE on tbl_categories.CID = CTE.motherCategory
)
SELECT DISTINCT * FROM CTE
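As a rough check of this bottom-up approach, the sketch below runs an equivalent recursive CTE on SQLite through Python's sqlite3 module (an assumption made for a self-contained demo; the question targets SQL Server). It uses the second tbl_productInCategory example from the question. The function `categories_with_products` and the filter on motherCategory are illustrative additions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tbl_categories (CID int primary key, categoriename text, motherCategory int);
create table tbl_productInCategory (PID int, CID int);
insert into tbl_categories values
    (1, 'electronics', null),
    (2, 'clothing', null),
    (3, 'Arduino', 1),
    (4, 'Casings, extra''s', 3);
insert into tbl_productInCategory values (1, 2), (2, 2), (3, 2), (4, 4);
""")

QUERY = """
with recursive CTE as (
    -- anchor: categories that directly contain a product
    select c.CID, c.categoriename, c.motherCategory
    from tbl_categories c
    join tbl_productInCategory p on p.CID = c.CID
    union all
    -- recursive step: the mother category of anything found so far
    select c.CID, c.categoriename, c.motherCategory
    from tbl_categories c
    join CTE on c.CID = CTE.motherCategory
)
select distinct CID, categoriename
from CTE
where motherCategory is ?   -- SQLite's IS also matches NULL against NULL
order by CID
"""

def categories_with_products(mother_category):
    """Names of non-empty categories directly under mother_category (None = top level)."""
    return [name for _cid, name in conn.execute(QUERY, (mother_category,))]

print(categories_with_products(None))  # ['electronics', 'clothing']
print(categories_with_products(1))     # ['Arduino']
```

Electronics is returned even though it holds no product itself, because the recursion climbs from 'Casings, extra's' through Arduino up to it.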

Use a recursive CTE to get a derived table of your category tree, and then INNER JOIN it to your ProductCategory table.

It's not something I've done before, but some googling indicates it is possible.
https://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
The semantics of the recursive execution is as follows:
1. Split the CTE expression into anchor and recursive members.
2. Run the anchor member(s) creating the first invocation or base result set (T0).
3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
4. Repeat step 3 until an empty set is returned.
5. Return the result set. This is a UNION ALL of T0 to Tn.
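The quoted steps can be sketched as a small loop. This is a simplified Python model of the semantics, not how SQL Server implements it; the function `run_recursive_cte`, the `direct_reports` member and the employee data are all hypothetical.

```python
def run_recursive_cte(anchor_rows, recursive_member):
    """Evaluate an anchor plus a recursive member the way the quoted steps describe."""
    result = list(anchor_rows)       # T0: output of the anchor member
    current = list(anchor_rows)
    while current:                   # stop once an invocation returns no rows
        current = recursive_member(current)   # Ti in, Ti+1 out
        result.extend(current)       # UNION ALL keeps every row
    return result

# Hypothetical data: (employee_id, manager_id)
employees = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]

# Anchor member: employees without a manager, at level 0.
anchor = [(emp, mgr, 0) for emp, mgr in employees if mgr is None]

def direct_reports(previous):
    """Recursive member: employees whose manager is in the previous invocation."""
    levels = {emp: level for emp, _mgr, level in previous}
    return [(emp, mgr, levels[mgr] + 1) for emp, mgr in employees if mgr in levels]

rows = run_recursive_cte(anchor, direct_reports)
print(rows)
# [(1, None, 0), (2, 1, 1), (3, 1, 1), (4, 2, 2), (5, 4, 3)]
```

Each pass only sees the rows produced by the previous pass, which is why the level counter grows by one per invocation.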
USE AdventureWorks2008R2;
GO
WITH DirectReports (ManagerID, EmployeeID, Title, DeptID, Level)
AS
(
-- Anchor member definition
SELECT e.ManagerID, e.EmployeeID, e.Title, edh.DepartmentID,
0 AS Level
FROM dbo.MyEmployees AS e
INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
ON e.EmployeeID = edh.BusinessEntityID AND edh.EndDate IS NULL
WHERE ManagerID IS NULL
UNION ALL
-- Recursive member definition
SELECT e.ManagerID, e.EmployeeID, e.Title, edh.DepartmentID,
Level + 1
FROM dbo.MyEmployees AS e
INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
ON e.EmployeeID = edh.BusinessEntityID AND edh.EndDate IS NULL
INNER JOIN DirectReports AS d
ON e.ManagerID = d.EmployeeID
)
-- Statement that executes the CTE
SELECT ManagerID, EmployeeID, Title, DeptID, Level
FROM DirectReports
INNER JOIN HumanResources.Department AS dp
ON DirectReports.DeptID = dp.DepartmentID
WHERE dp.GroupName = N'Sales and Marketing' OR Level = 0;
GO

City names as column title

I have two tables.
Food Table
--------------------------
ID CityID FoodName
--------------------------
1 1 FoodA
2 1 FoodB
3 1 FoodC
4 2 FoodW
5 2 FoodX
6 2 FoodY
7 2 FoodZ
City Table
--------------------------
ID CityName
--------------------------
1 Memphis
2 Nashville
3 Chattanooga
So how can I use the CityName values as column titles and list the food in each city?
--------------------------------------
Memphis   Nashville   Chattanooga
--------------------------------------
FoodA     FoodW
FoodB     FoodX
FoodC     FoodY
          FoodZ
I'm pretty sure that I have to use PIVOT, but I couldn't find a good solution yet.
This is what I've achieved so far.
SELECT *
FROM (
SELECT *
FROM Food F
INNER JOIN City C ON C.ID = F.CityID
) DataTable D
PIVOT(F.FoodName FOR C.CityName IN (
[Memphis]
,[Nashville]
,[Chattanooga]
)) PivotTable

You can use this query to get your output. You made some mistakes when setting up the pivot query.
select Memphis,Nashville,Chattanooga
from
(
select f.ID,c.CityName,f.FoodName
from Food f
inner join City c
on f.CityID=c.id
)result
pivot
(
max(FoodName)
for CityName in(Memphis,Nashville,Chattanooga)
) as pvt

The PIVOT operator uses the columns of the data table that are not in the PIVOT definition as the grouping key.
That means two values end up in the same row of the pivoted table when they have the same values in the columns of the data table that are neither the aggregated column nor the pivoted one.
The OP's data doesn't have such a column, so a new id, numbered within each city, is generated.
SELECT Memphis, Nashville, Chattanooga
FROM (SELECT c.CityName, f.FoodName
, FoodID = Row_Number() OVER (PARTITION BY c.ID ORDER BY FoodName)
FROM Food f
INNER JOIN City c ON f.CityID = c.id) d
PIVOT
(MAX(FoodName) FOR CityName IN (Memphis,Nashville,Chattanooga)) pvt
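To see the effect of the generated row number, the sketch below reproduces the idea on SQLite via Python's sqlite3 module. SQLite has no PIVOT operator, so conditional aggregation (MAX over CASE) is swapped in as an equivalent; that substitution and the use of SQLite (3.25 or later for window functions) are assumptions made only for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table City (ID int primary key, CityName text);
create table Food (ID int primary key, CityID int, FoodName text);
insert into City values (1, 'Memphis'), (2, 'Nashville'), (3, 'Chattanooga');
insert into Food values
    (1, 1, 'FoodA'), (2, 1, 'FoodB'), (3, 1, 'FoodC'),
    (4, 2, 'FoodW'), (5, 2, 'FoodX'), (6, 2, 'FoodY'), (7, 2, 'FoodZ');
""")

rows = conn.execute("""
select max(case when CityName = 'Memphis'     then FoodName end) as Memphis,
       max(case when CityName = 'Nashville'   then FoodName end) as Nashville,
       max(case when CityName = 'Chattanooga' then FoodName end) as Chattanooga
from (
    -- FoodID numbers the foods within each city: 1, 2, 3, ...
    select c.CityName, f.FoodName,
           row_number() over (partition by c.ID order by f.FoodName) as FoodID
    from Food f
    join City c on f.CityID = c.ID
) d
group by FoodID     -- the n-th food of every city lands in the same output row
order by FoodID
""").fetchall()

for row in rows:
    print(row)
# ('FoodA', 'FoodW', None)
# ('FoodB', 'FoodX', None)
# ('FoodC', 'FoodY', None)
# (None, 'FoodZ', None)
```

Grouping by the per-city row number is what lines up FoodA with FoodW; grouping by the original Food.ID would instead give one row per food.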

SQL Server matching all rows from Table1 with all rows from Table2

Could someone please help me with this query?
I have 2 tables:
Employee
EmployeeID LanguageID
1 1
1 2
1 3
2 1
2 3
3 1
3 2
4 1
4 2
4 3
Task
TaskID LanguageID LangaugeRequired
1 1 1
1 2 0
2 1 1
2 2 1
2 3 1
3 2 0
3 3 1
LanguageID refers to the Language table (this table is shown for explanation only)
LanguageID LanguageName
1 English
2 French
3 Italian
Is there a possible way to write a query that gets the employees who can speak all the languages required for each task?
for example:
Task ID 1 requires only LanguageID = 1, so the result should be EmployeeID 1,2,3,4
Task ID 2 requires all 3 languages, so the result should be EmployeeID 1,4
Task ID 3 requires only LanguageID = 3, so the result should be EmployeeID 1,2,4

here is another variant to do this:
select t1.taskid, t2.employeeid from
(
select a.taskid, count(distinct a.languageid) as lang_cnt
from
task as a
where a.LangaugeRequired=1
group by a.taskid
) as t1
left outer join
(
select a.taskid, b.employeeid, count(distinct b.languageid) as lang_cnt
from
task as a
inner join
employee as b
on (a.LangaugeRequired=1 and a.languageid=b.languageid)
group by a.taskid, b.employeeid
) as t2
on (t1.taskid=t2.taskid and t1.lang_cnt=t2.lang_cnt)
-- Here you can insert a WHERE clause, for example:
-- where t1.taskid=1 and t2.employeeid=1
-- If such a query returns a row, this employee can work on this task; if it returns no rows, they cannot.
order by t1.taskid, t2.employeeid
As you can see, this query builds two derived tables and then joins them.
The first table (t1) calculates how many languages are required for each task.
The second table (t2) finds all employees who speak at least one language required for a task, grouped by task/employee to count how many of the required languages each employee speaks.
The main query performs a LEFT JOIN, because there can be tasks that no employee can perform.
here is the output:
task employee
1 1
1 2
1 3
1 4
2 1
2 4
3 1
3 2
3 4
Update: a simpler but less complete variant, because it will not return tasks that no employee can perform:
select a.taskid, b.employeeid, count(distinct b.languageid) as lang_cnt
from
task as a
inner join
employee as b
on (a.LangaugeRequired=1 and a.languageid=b.languageid)
group by a.taskid, b.employeeid
having count(distinct b.languageid) = (select count(distinct c.languageid) from task as c where c.LangaugeRequired=1 and c.taskid=a.taskid)
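The count-matching idea can be tried on the question's data with the sketch below, which runs the HAVING variant on SQLite through Python's sqlite3 module (an assumption for a self-contained demo; only the select list is trimmed and an ORDER BY is added). The column name LangaugeRequired is spelled as in the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Employee (EmployeeID int, LanguageID int);
create table Task (TaskID int, LanguageID int, LangaugeRequired int);
insert into Employee values
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3),
    (3, 1), (3, 2),
    (4, 1), (4, 2), (4, 3);
insert into Task values
    (1, 1, 1), (1, 2, 0),
    (2, 1, 1), (2, 2, 1), (2, 3, 1),
    (3, 2, 0), (3, 3, 1);
""")

pairs = conn.execute("""
select a.taskid, b.employeeid
from task as a
inner join employee as b
    on (a.LangaugeRequired = 1 and a.languageid = b.languageid)
group by a.taskid, b.employeeid
-- keep the pair only if the employee covers every required language of the task
having count(distinct b.languageid) = (
    select count(distinct c.languageid)
    from task as c
    where c.LangaugeRequired = 1 and c.taskid = a.taskid
)
order by a.taskid, b.employeeid
""").fetchall()

print(pairs)
# [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 4), (3, 1), (3, 2), (3, 4)]
```

The result agrees with the expected employees listed in the question for tasks 1, 2 and 3.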

Another version using NOT EXISTS
Retrieve all task-employee combinations where a missing language does not exist
SELECT t1.EmployeeId, t2.TaskId
FROM (
SELECT DISTINCT EmployeeID
FROM Employee
) t1 , (
SELECT DISTINCT TaskID
FROM Task
) t2
WHERE NOT EXISTS (
SELECT 1 FROM Task t
LEFT JOIN Employee e
ON e.EmployeeID = t1.EmployeeID
AND e.LanguageID = t.LanguageID
WHERE t.TaskID = t2.TaskID
AND LanguageRequired = 1
AND e.EmployeeID IS NULL
)
http://www.sqlfiddle.com/#!6/e3c78/1
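The "no missing language exists" condition is a set-containment test, which the following Python sketch makes explicit. It is only an illustration of the logic on the question's data; the function name `qualified_pairs` and the tuple layouts are assumptions.

```python
from collections import defaultdict

employee_languages = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3),
    (3, 1), (3, 2),
    (4, 1), (4, 2), (4, 3),
]
# (task_id, language_id, required_flag)
task_languages = [
    (1, 1, 1), (1, 2, 0),
    (2, 1, 1), (2, 2, 1), (2, 3, 1),
    (3, 2, 0), (3, 3, 1),
]

def qualified_pairs(employee_languages, task_languages):
    """Return (employee_id, task_id) pairs where no required language is missing."""
    spoken = defaultdict(set)
    for employee_id, language_id in employee_languages:
        spoken[employee_id].add(language_id)

    required = defaultdict(set)
    for task_id, language_id, is_required in task_languages:
        required[task_id]  # touch the key so tasks with no required language still appear
        if is_required:
            required[task_id].add(language_id)

    # NOT EXISTS (missing language) is the same as: required - spoken is empty.
    return sorted(
        (employee_id, task_id)
        for employee_id in spoken
        for task_id in required
        if not (required[task_id] - spoken[employee_id])
    )

print(qualified_pairs(employee_languages, task_languages))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (4, 1), (4, 2), (4, 3)]
```

Like the cross join in the query, the sketch considers every employee/task combination and discards those with a non-empty set of missing languages.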

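You could use join logic as a starting point. Note that this only finds employees who speak at least one required language of a task, so it still needs a count check like the ones in the other answers:
SELECT DISTINCT b.TaskID, a.EmployeeID FROM Employee a JOIN Task b ON b.LanguageID = a.LanguageID WHERE b.LanguageRequired = 1;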

Get the max value of a column from set of rows

I have a table like this
Table A:
Id Count
1 4
1 16
1 8
2 10
2 15
3 18
etc
Table B:
1 sample1.file
2 sample2.file
3 sample3.file
TABLE C:
Count fileNumber
16 1234
4 2345
15 3456
18 4567
and so on...
What I want is this
1 sample1.file 1234
2 sample2.file 3456
3 sample3.file 4567
To get the max value from table A I used
Select MAX (Count) from A where Id='1'
This works well but my problem is when combining data with another table.
When I join Table B and Table A, I need to get the MAX for all Ids, and in my query I don't know what the Id is.
This is my query
SELECT B.*,C.*
JOIN A on A.Id = B.ID
JOIN C on A.id = B.ID
WHERE (SELECT MAX(COUNT)
FROM A
WHERE Id = <what goes here????>)
To summarise, what I want is the values from Table B plus the fileNumber from Table C (where the count is the max for that Id in Table A).
UPDATE: Corrected Table C above. It looks like I need Table A.

I think this is the query you're looking for:
select b.*, c.filenumber from b
join (
select id, max(count) as count from a
group by id
) as NewA on b.id = NewA.id
join c on NewA.count = c.count
However, you should take into account that I don't get why for id=1 in tableA you choose the 16 to match against table C (which is the max) and for id=2 in tableA you choose the 10 to match against table C (which is the min). I assumed you meant the max in both cases.
Edit:
I see you've updated tableA data. The query results in this, given the previous data:
+----+---------------+------------+
| ID | FILENAME | FILENUMBER |
+----+---------------+------------+
| 1 | sample1.file | 1234 |
| 2 | sample2.file | 3456 |
| 3 | sample3.file | 4567 |
+----+---------------+------------+
Here is a working example

Using Mosty’s working example (renaming the keyword count to cnt for a column name), this is another approach:
with abc as (
select
a.id,
a.cnt,
rank() over (
partition by a.id
order by cnt desc
) as rk,
b.filename
from a join b on a.id = b.id
)
select
abc.id, abc.filename, c.filenumber
from abc join c
on c.cnt = abc.cnt
where rk = 1;
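The rank-based query can be checked against the question's data with the sketch below, run on SQLite through Python's sqlite3 module (an assumption for a self-contained demo; window functions need SQLite 3.25 or later). The lower-case table names a, b, c and the column names cnt, filename and filenumber follow this answer; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table a (id int, cnt int);
create table b (id int, filename text);
create table c (cnt int, filenumber int);
insert into a values (1, 4), (1, 16), (1, 8), (2, 10), (2, 15), (3, 18);
insert into b values (1, 'sample1.file'), (2, 'sample2.file'), (3, 'sample3.file');
insert into c values (16, 1234), (4, 2345), (15, 3456), (18, 4567);
""")

rows = conn.execute("""
with abc as (
    select a.id,
           a.cnt,
           -- rank 1 marks the highest cnt within each id
           rank() over (partition by a.id order by a.cnt desc) as rk,
           b.filename
    from a join b on a.id = b.id
)
select abc.id, abc.filename, c.filenumber
from abc join c on c.cnt = abc.cnt
where rk = 1
order by abc.id
""").fetchall()

print(rows)
# [(1, 'sample1.file', 1234), (2, 'sample2.file', 3456), (3, 'sample3.file', 4567)]
```

Note that rank() returns every tied row, so an id whose maximum cnt occurs twice would appear twice in the result.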

select
PreMax.ID,
B.FileName,
C2.FileNumber
from
( select C.id, max( C.count ) maxPerID
from TableC C
group by C.ID
order by C.ID ) PreMax
JOIN TableC C2
on PreMax.ID = C2.ID
AND PreMax.maxPerID = C2.Count
JOIN TableB B
on PreMax.ID = B.ID
