Zip/repeat join? - sql

Let's say I have a simple table of documents with a type column:
Documents
Id Type
1 A
2 A
3 B
4 C
5 C
6 A
7 A
8 A
9 B
10 C
Users have permissions to access different types of documents:
Permissions
Type User
A John
A Jane
B Sarah
C Peter
C John
C Mark
And I need to distribute those documents among the users as tasks:
Tasks
Id T DocId UserId
1 A 1 John
2 A 2 Jane
3 B 3 Sarah
4 C 4 Peter
5 C 5 John
6 A 6 John
7 A 7 Jane
8 A 8 John
9 B 9 Sarah
10 C 10 Mark
How do I do that? How do I get the Tasks?

You can enumerate the rows and then use modulo arithmetic for the matching:
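with d as (
-- number the documents within each type, in random order
select d.*,
row_number() over (partition by type order by newid()) as seqnum,
count(*) over (partition by type) as cnt
from documents d
),
u as (
-- number the permitted users within each type, and count them
select u.*,
row_number() over (partition by type order by newid()) as seqnum,
count(*) over (partition by type) as cnt
from permissions u
)
-- document k of a type goes to user (k % cnt) + 1 of that type
select d.id, d.type, u.[user]
from d join
u
on d.type = u.type and
u.seqnum = (d.seqnum % u.cnt) + 1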
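To see the modulo matching at work, here is a minimal self-contained sketch. It assumes the question's sample data inlined as CTEs, a column named username in place of the question's User column (my renaming, to avoid the reserved word), and a deterministic ordering instead of newid() so the result is repeatable.

```sql
-- Sample data from the question, inlined so the query runs on its own.
with documents as (
    select * from (values
        (1, 'A'), (2, 'A'), (3, 'B'), (4, 'C'), (5, 'C'),
        (6, 'A'), (7, 'A'), (8, 'A'), (9, 'B'), (10, 'C')
    ) as v(id, type)
),
permissions as (
    -- username is an assumed column name standing in for the question's User column
    select * from (values
        ('A', 'John'), ('A', 'Jane'), ('B', 'Sarah'),
        ('C', 'Peter'), ('C', 'John'), ('C', 'Mark')
    ) as v(type, username)
),
d as (
    -- position of each document within its type
    select id, type,
           row_number() over (partition by type order by id) as seqnum
    from documents
),
u as (
    -- position of each user within the type, plus the number of users for that type
    select type, username,
           row_number() over (partition by type order by username) as seqnum,
           count(*) over (partition by type) as cnt
    from permissions
)
select d.id as docid, d.type, u.username
from d
join u
  on u.type = d.type
 and u.seqnum = (d.seqnum % u.cnt) + 1   -- wraps around once the users run out
order by d.id;
```

Each document matches exactly one user, and within a type the users' workloads differ by at most one document. Swapping the order by expressions back to newid() makes the assignment random again.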

Great question.
This solution returns all possible distributions, ordered by a priority that is determined by metrics such as the number of users involved, the minimum number of documents per user, and the standard deviation of tasks per user.
I'm not counting on document.id being a sequence of numbers starting at 1, hence the use of dense_rank.
The core of the solution is the iterative CTE, which generates the record sets of all possible distributions.
Execution time on my laptop is around 20 seconds (the iterative part takes 5 seconds).
with doc_user as
(
select d."id" as docid
,p."user" as userid
,dense_rank () over (order by d."id") as doc_seq
from documents d
left join permissions p
on p.type = d.type
)
,it_cte as
(
select docid
,userid
,doc_seq
,cast (coalesce(userid,'') as varchar(max)) as path
,'A' as cte_part
from doc_user
where doc_seq = 1
union all
select r.docid
,r.userid
,du.doc_seq
,r.path + ',' + coalesce (du.userid,'')
,'B'
from it_cte as r
cross join doc_user as du
where du.doc_seq = r.doc_seq + 1
union all
select du.docid
,du.userid
,du.doc_seq
,r.path + ',' + coalesce (du.userid,'')
,'C'
from it_cte as r
cross join doc_user as du
where du.doc_seq = r.doc_seq + 1
and r.cte_part in ('A','C')
)
,result_sets as
(
select dense_rank () over (order by path) as set_id
,docid
,userid
from it_cte
where doc_seq = (select count(*) from documents)
)
,result_sets_stat as
(
select set_id
,count (distinct userid) as users_involved
from result_sets
group by set_id
)
,result_sets_users_stat as
(
select set_id
,min (doc) min_doc_per_user
,stdevp (doc) stdevp_doc_per_user
from (select set_id
,userid
,count (*) as doc
from result_sets
group by set_id
,userid
) t
group by set_id
)
select s.set_priority
,r.docid
,r.userid
,s.users_involved
,s.min_doc_per_user
,s.stdevp_doc_per_user
from (select s.set_id
,s.users_involved
,u.min_doc_per_user
,u.stdevp_doc_per_user
,row_number () over
(
order by s.users_involved desc
,u.min_doc_per_user desc
,u.stdevp_doc_per_user
,s.set_id
) as set_priority
from result_sets_stat as s
join result_sets_users_stat as u
on u.set_id =
s.set_id
) s
join result_sets as r
on r.set_id =
s.set_id
order by s.set_priority
,r.docid
option (merge join)
;
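As a small illustration of the dense_rank remark above: a minimal sketch, assuming a made-up documents table whose ids have gaps (10, 25 and 40 are invented values, not from the question).

```sql
-- Invented ids with gaps, to show why a gap-free sequence number is needed.
with documents as (
    select * from (values (10), (25), (40)) as v(id)
)
select id,
       dense_rank() over (order by id) as doc_seq   -- 1, 2, 3 regardless of the gaps
from documents
order by id;
```

The ids 10, 25 and 40 get doc_seq 1, 2 and 3, so a recursive step that looks for doc_seq + 1 always finds the next document.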

Related

How to select varying count of items per column value?

I work with PostgreSQL.
I have this SQL query:
SELECT lp."RegionId", COUNT(w."Id") FROM public.workplace w
GROUP BY lp."RegionId"
which returns:
RegionId | Count
1 | 3
2 | 12
3 | 5
I have a table 'person'. Each person has a RegionId.
So for region 1 I want to select the first 3 persons, for region 2 the first 12 persons, and for region 3 the first 5 persons.
So how can I use it as a subquery against the table 'person'?
WITH (SELECT lp."RegionId", COUNT(w."Id") FROM public.workplace w
GROUP BY lp."RegionId") AS pc
SELECT * FROM public.person p
???????
limit pc."Count"
???
Something like:
SELECT p.*
FROM (SELECT *, row_number() OVER (PARTITION BY RegionId ORDER BY PersonId) AS rn
FROM person) AS p
JOIN (SELECT RegionId, count(*) AS cnt
FROM workplace
GROUP BY RegionId) AS r ON p.RegionId = r.RegionId
WHERE p.rn <= r.cnt
ORDER BY p.RegionId, p.PersonId;
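A minimal runnable sketch of the same idea, assuming tiny invented tables inlined as CTEs (the rows, and the lowercase column names personid, regionid and id, are my assumptions, not the asker's schema):

```sql
-- Invented sample rows: 4 persons in region 1, 3 persons in region 2.
with person as (
    select * from (values
        (1, 1), (2, 1), (3, 1), (4, 1),
        (5, 2), (6, 2), (7, 2)
    ) as v(personid, regionid)
),
-- Invented workplaces: 2 in region 1, 1 in region 2.
workplace as (
    select * from (values (10, 1), (11, 1), (12, 2)) as v(id, regionid)
),
ranked as (
    -- number the persons inside each region
    select personid, regionid,
           row_number() over (partition by regionid order by personid) as rn
    from person
),
quota as (
    -- how many persons to keep per region
    select regionid, count(*) as cnt
    from workplace
    group by regionid
)
select r.personid, r.regionid
from ranked r
join quota q on q.regionid = r.regionid
where r.rn <= q.cnt
order by r.regionid, r.personid;
```

With this data, region 1 keeps persons 1 and 2 (quota 2) and region 2 keeps person 5 (quota 1). The row number plays the role that a per-region LIMIT would play if such a thing existed.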

Materializing the path of Nested Set hierarchy in T-SQL

I have a table containing details on my company's chart of accounts - this data is essentially stored in nested sets (on SQL Server 2014), with each record having a left and right anchor - there are no Parent IDs.
Sample Data:
ID LeftAnchor RightAnchor Name
1 0 25 Root
2 1 16 Group 1
3 2 9 Group 1.1
4 3 4 Account 1
5 5 6 Account 2
6 7 8 Account 3
7 10 15 Group 1.2
8 11 12 Account 4
9 13 14 Account 5
10 17 24 Group 2
11 18 23 Group 2.1
12 19 20 Account 1
13 21 22 Account 1
I need to materialize the path for each record, so that my output looks like this:
ID LeftAnchor RightAnchor Name MaterializedPath
1 0 25 Root Root
2 1 16 Group 1 Root > Group 1
3 2 9 Group 1.1 Root > Group 1 > Group 1.1
4 3 4 Account 1 Root > Group 1 > Group 1.1 > Account 1
5 5 6 Account 2 Root > Group 1 > Group 1.1 > Account 2
6 7 8 Account 3 Root > Group 1 > Group 1.1 > Account 3
7 10 15 Group 1.2 Root > Group 1 > Group 1.2
8 11 12 Account 4 Root > Group 1 > Group 1.2 > Account 4
9 13 14 Account 5 Root > Group 1 > Group 1.2 > Account 5
10 17 24 Group 2 Root > Group 2
11 18 23 Group 2.1 Root > Group 2 > Group 2.1
12 19 20 Account 1 Root > Group 2 > Group 2.1 > Account 10
13 21 22 Account 1 Root > Group 2 > Group 2.1 > Account 11
Whilst I've managed to achieve this using CTEs, the query is deathly slow. It takes just shy of two minutes to run with around 1200 records in the output.
Here's a simplified version of my code:
;with accounts as
(
-- Chart of Accounts
select AccountId, LeftAnchor, RightAnchor, Name
from ChartOfAccounts
-- dirty great where clause snipped
)
, parents as
(
-- Work out the Parent Nodes
select c.AccountId, p.AccountId [ParentId]
from accounts c
left join accounts p on (p.LeftAnchor = (
select max(i.LeftAnchor)
from accounts i
where i.LeftAnchor<c.LeftAnchor
and i.RightAnchor>c.RightAnchor
))
)
, path as
(
-- Calculate the Account path for each node
-- Root Node
select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
from accounts c
where c.LeftAnchor = (select min(LeftAnchor) from accounts)
union all
-- Children
select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.MaterializedPath + ' > ' + n.name
from accounts n
inner join parents x on (n.AccountId=x.AccountId)
inner join path p on (x.ParentId=p.AccountId)
)
select * from path order by LeftAnchor
Ideally this query should only take a couple of seconds (max) to run. I can't make any changes to the database itself (read-only connection), so can anyone come up with a better way to write this query?
After your comments, I realized there is no need for the CTE: you already have the range keys.
Example
Select A.*
,Path = Replace(Path,'&gt;','>')
From YourTable A
Cross Apply (
Select Path = Stuff((Select ' > ' +Name
From (
Select LeftAnchor,Name
From YourTable
Where A.LeftAnchor between LeftAnchor and RightAnchor
) B1
Order By LeftAnchor
For XML Path (''))
,1,6,'')
) B
Order By LeftAnchor
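The containment test used in that query (A.LeftAnchor between LeftAnchor and RightAnchor) can be checked on its own. A minimal sketch, assuming a five-row subset of the sample data inlined as a CTE named YourTable: joining each row to every row whose interval contains its LeftAnchor yields the node itself plus all of its ancestors, so counting the matches gives the depth.

```sql
-- Five rows taken from the question's sample data.
with YourTable as (
    select * from (values
        (1, 0, 25, 'Root'),
        (2, 1, 16, 'Group 1'),
        (3, 2, 9, 'Group 1.1'),
        (4, 3, 4, 'Account 1'),
        (10, 17, 24, 'Group 2')
    ) as v(ID, LeftAnchor, RightAnchor, Name)
)
select a.ID, a.Name,
       count(*) - 1 as Depth        -- matches = self + ancestors, so subtract the node itself
from YourTable a
join YourTable p
  on a.LeftAnchor between p.LeftAnchor and p.RightAnchor
group by a.ID, a.Name, a.LeftAnchor
order by a.LeftAnchor;
```

Root comes out at depth 0, Group 1 and Group 2 at depth 1, Group 1.1 at depth 2 and Account 1 at depth 3. Concatenating the names of those same matching rows in LeftAnchor order is what builds the path.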
First, you can try rearranging the preparatory CTEs (accounts and parents) so that each CTE carries all the data from the previous one. Then only the last one is needed in the path CTE, and there is no need for multiple joins:
;with accounts as
(
-- Chart of Accounts
select AccountId, LeftAnchor, RightAnchor, Name
from ChartOfAccounts
-- dirty great where clause snipped
)
, parents as
(
-- Work out the Parent Nodes
select c.*, p.AccountId [ParentId]
from accounts c
left join accounts p on (p.LeftAnchor = (
select max(i.LeftAnchor)
from accounts i
where i.LeftAnchor<c.LeftAnchor
and i.RightAnchor>c.RightAnchor
))
)
, path as
(
-- Calculate the Account path for each node
-- Root Node
select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
from parents c
where c.ParentID IS NULL
union all
-- Children
select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.[MaterializedPath] + ' > ' + n.name
from parents n
inner join path p on (n.ParentId=p.AccountId)
)
select * from path order by LeftAnchor
This should give some improvement (50% in my test). To do better still, you can split the first half, which prepares the data, into a #temp table, put a clustered index on the ParentID column of that table, and use it in the second part:
if (Object_ID('tempdb..#tmp') IS NOT NULL) DROP TABLE #tmp;
with accounts as
(
-- Chart of Accounts
select AccountId, LeftAnchor, RightAnchor, Name
from ChartOfAccounts
-- dirty great where clause snipped
)
, parents as
(
-- Work out the Parent Nodes
select c.*, p.AccountId [ParentId]
from accounts c
left join accounts p on (p.LeftAnchor = (
select max(i.LeftAnchor)
from accounts i
where i.LeftAnchor<c.LeftAnchor
and i.RightAnchor>c.RightAnchor
))
)
select * into #tmp
from parents;
CREATE CLUSTERED INDEX IX_tmp1 ON #tmp (ParentID);
With path as
(
-- Calculate the Account path for each node
-- Root Node
select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
from #tmp c
where c.ParentID IS NULL
union all
-- Children
select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.[MaterializedPath] + ' > ' + n.name
from #tmp n
inner join path p on (n.ParentId=p.AccountId)
)
select * from path order by LeftAnchor
It is hard to tell on small sample data, but it should be an improvement. Please let me know if you try it.
Seems odd to me that you don't have a Parent ID, but with the aid of an initial OUTER APPLY, we can generate a Parent ID and then run a standard recursive CTE.
Example
Declare @Top int = null --<< Sets top of Hier Try 12 (Just for Fun)
;with cte0 as (
Select A.*
,B.*
From YourTable A
Outer Apply (
Select Top 1 Pt=ID
From YourTable
Where A.LeftAnchor between LeftAnchor and RightAnchor and LeftAnchor<A.LeftAnchor
Order By LeftAnchor Desc
) B
)
,cteP as (
Select ID
,Pt
,LeftAnchor
,RightAnchor
,Lvl=1
,Name
,Path = cast(Name as varchar(max))
From cte0
Where IsNull(@Top,-1) = case when @Top is null then isnull(Pt ,-1) else ID end
Union All
Select r.ID
,r.Pt
,r.LeftAnchor
,r.RightAnchor
,p.Lvl+1
,r.Name
,cast(p.path + ' > '+r.Name as varchar(max))
From cte0 r
Join cteP p on r.Pt = p.ID
)
Select *
From cteP
Order By LeftAnchor
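The parent lookup done by the OUTER APPLY can be tried in isolation. A minimal sketch, assuming a five-row subset of the sample data inlined as a CTE named YourTable (T-SQL, since OUTER APPLY and TOP are SQL Server syntax):

```sql
-- Five rows taken from the question's sample data.
with YourTable as (
    select * from (values
        (1, 0, 25, 'Root'),
        (2, 1, 16, 'Group 1'),
        (3, 2, 9, 'Group 1.1'),
        (4, 3, 4, 'Account 1'),
        (10, 17, 24, 'Group 2')
    ) as v(ID, LeftAnchor, RightAnchor, Name)
)
select a.ID, a.Name, b.Pt as ParentID
from YourTable a
outer apply (
    -- the closest enclosing interval is the ancestor with the largest LeftAnchor
    select top 1 ID as Pt
    from YourTable
    where a.LeftAnchor between LeftAnchor and RightAnchor
      and LeftAnchor < a.LeftAnchor
    order by LeftAnchor desc
) b
order by a.LeftAnchor;
```

Root gets a NULL parent, Group 1 and Group 2 get parent 1, Group 1.1 gets parent 2, and Account 1 gets parent 3. With that derived ParentID in place, the recursive part is an ordinary parent/child walk.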

SQL select top if columns are same

If I have a table like this:
Id StateId Name
1 1 a
2 2 b
3 1 c
4 1 d
5 3 e
6 2 f
I want to select like below:
Id StateId Name
4 1 d
5 3 e
6 2 f
For example, Ids 1,3,4 have stateid 1. So select row with max Id, i.e, 4.
; WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY STATEID ORDER BY ID DESC) AS RN
FROM YourTable -- placeholder table name; the FROM clause was missing
)
SELECT ID, STATEID, NAME FROM CTE WHERE RN = 1
You can use ROW_NUMBER() + TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES
Id,
StateId,
[Name]
FROM YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY StateId ORDER BY Id DESC)
Output:
Id StateId Name
4 1 d
6 2 f
5 3 e
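Why the WITH TIES trick works is easier to see if you display the row numbers instead of sorting by them. A minimal sketch, assuming the question's six rows inlined as a CTE named YourTable:

```sql
-- The six rows from the question.
with YourTable as (
    select * from (values
        (1, 1, 'a'), (2, 2, 'b'), (3, 1, 'c'),
        (4, 1, 'd'), (5, 3, 'e'), (6, 2, 'f')
    ) as v(Id, StateId, Name)
)
select Id, StateId, Name,
       row_number() over (partition by StateId order by Id desc) as rn
from YourTable
order by rn, StateId;
```

Ids 4, 6 and 5 all get rn = 1, one per StateId. TOP 1 WITH TIES keeps the first row of the sort plus every row tied with it on the ORDER BY value, which is exactly that set of rn = 1 rows.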
Disclaimer: I gave this answer before the OP had specified an actual database, and hence avoided using window functions. For a possibly more appropriate answer, see the reply by @Tanjim above.
Here is an option using joins which should work across most RDBMS.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT StateId, MAX(Id) AS Id
FROM yourTable
GROUP BY StateId
) t2
ON t1.StateId = t2.StateId AND
t1.Id = t2.Id
The following uses a correlated subquery to find the maximum Id for each state. The WHERE clause then keeps only the rows whose Id is returned by that subquery.
SELECT
[Id], [StateID], [Name]
FROM
TABLENAME S1
WHERE
Id IN (SELECT MAX(Id) FROM TABLENAME S2 WHERE S2.StateID = S1.StateID)

SQL Random N rows for each distinct value in column

I have the following table:
Name Field
A 1
B 1
C 1
D 1
E 1
F 1
G 1
H 2
I 2
J 2
K 3
L 3
M 3
N 3
O 3
P 3
Q 3
R 3
S 3
T 3
I need a SQL query which will generate me a set with 5 random rows for each distinct value on column Field.
For example, results expected:
Name Field
A 1
B 1
D 1
E 1
G 1
J 2
I 2
H 2
M 3
Q 3
T 3
S 3
P 3
Is there an easy way to do this? Or should I split that table into more tables, generate a random sample for each table, and then union them?
You can do this with a CTE using a ROW_NUMBER() whilst PARTITIONing on the Field:
;With Cte As
(
Select Name, Field,
Row_Number() Over (Partition By Field Order By NewId()) RN
From YourTable
)
Select Name, Field
From Cte
Where RN <= 5
You can readily do this with row_number():
select name, field
from (select t.*,
row_number() over (partition by field order by newid()) as seqnum
from t
) t
where seqnum <= 5;
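Since the picked rows change on every run, a quick sanity check is to count them per group. A minimal sketch, assuming the question's 20 rows inlined as a CTE named t (T-SQL, because of newid()):

```sql
-- The question's sample rows: 7 with field 1, 3 with field 2, 10 with field 3.
with t as (
    select * from (values
        ('A', 1), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1),
        ('H', 2), ('I', 2), ('J', 2),
        ('K', 3), ('L', 3), ('M', 3), ('N', 3), ('O', 3),
        ('P', 3), ('Q', 3), ('R', 3), ('S', 3), ('T', 3)
    ) as v(name, field)
),
ranked as (
    -- random position of each row inside its field group
    select name, field,
           row_number() over (partition by field order by newid()) as seqnum
    from t
)
select field, count(*) as picked
from ranked
where seqnum <= 5
group by field
order by field;
```

The counts are 5, 3 and 5 for fields 1, 2 and 3 whichever names were drawn: a group with fewer than five rows is simply returned in full.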
Here is an enhancement to Gordon Linoff's code that really helped me. Use it if you need filter criteria in your query:
select *
from (select t.*,
row_number() over (partition by region order by newid()) as seqnum
from MyTable t
WHERE t.program = 'ACME'
) t
where seqnum <= 1500;

Make Two Queries into 1 result set with 2 columns

Say I have a table that looks like this:
Person Table
ID AccountID Name
1 6 Billy
2 6 Joe
3 6 Tom
4 8 Jamie
5 8 Jake
6 8 Sam
I have two queries that I know work by themselves:
Select Name Group1 from person where accountid = 6
Select Name Group2 from person where accountid = 8
But I want a single Result Set to look like this:
Group1 Group2
Billy Jamie
Joe Jake
Tom Sam
You can use row_number() to assign a distinct value to each row, and then use a FULL OUTER JOIN to join the two subqueries:
select t1.group1,
t2.group2
from
(
select name group1,
row_number() over(order by id) rn
from yourtable
where accountid = 6
) t1
full outer join
(
select name group2,
row_number() over(order by id) rn
from yourtable
where accountid = 8
) t2
on t1.rn = t2.rn;
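The FULL OUTER JOIN matters when the two groups have different sizes. A minimal sketch, assuming the question's data with Sam left out (so account 8 has only two people), inlined as a CTE named person:

```sql
-- The question's rows minus Sam, so the groups are uneven (3 vs 2).
with person as (
    select * from (values
        (1, 6, 'Billy'), (2, 6, 'Joe'), (3, 6, 'Tom'),
        (4, 8, 'Jamie'), (5, 8, 'Jake')
    ) as v(id, accountid, name)
),
g1 as (
    select name, row_number() over (order by id) as rn
    from person
    where accountid = 6
),
g2 as (
    select name, row_number() over (order by id) as rn
    from person
    where accountid = 8
)
select g1.name as group1, g2.name as group2
from g1
full outer join g2 on g1.rn = g2.rn
order by coalesce(g1.rn, g2.rn);
```

This returns Billy/Jamie, Joe/Jake and Tom/NULL. An inner join would drop Tom, because there is no third row on the other side to pair him with.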
I agree you should do this client side. But it can be done in T-SQL:
select G1.Name as Group1
, G2.Name as Group2
from (
select row_number() over (order by ID) as rn
, *
from Person -- the question's table; GROUP is a reserved word and cannot be used unquoted
where AccountID = 6
) as G1
full outer join
(
select row_number() over (order by ID) as rn
, *
from Person
where AccountID = 8
) as G2
on G1.rn = G2.rn
order by
coalesce(G1.rn, G2.rn)