Create edges list with weights from flat list using SQL? - sql

In SQL Server I have an input list (like below), where the groups will be the nodes of a future graph:
group 1, user 1
group 1, user 2
group 1, user 3
group 2, user 1
group 2, user 3
group 3, user 1
group 4, user 4
How can I create a list of edges, where the weights are the number of common participants between groups, using SQL? Expected output:
group 1, group 2, 2
group 1, group 3, 1
group 2, group 3, 1

You can use a self join and aggregation:
select g1.groupid, g2.groupid, count(*)
from graph g1 join
graph g2
on g1.userid = g2.userid and g1.groupid < g2.groupid
group by g1.groupid, g2.groupid;
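
As a quick check of the self join above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. The table name graph and the columns groupid and userid follow the answer's query; running it in SQLite instead of SQL Server is an assumption made only so the example is self-contained.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the answer's query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (groupid INTEGER, userid INTEGER)")
conn.executemany(
    "INSERT INTO graph VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (4, 4)],
)

edges = conn.execute(
    """
    SELECT g1.groupid, g2.groupid, COUNT(*) AS weight
    FROM graph g1
    JOIN graph g2
      ON g1.userid = g2.userid      -- same user appears in both groups
     AND g1.groupid < g2.groupid    -- each unordered pair once, no self-pairs
    GROUP BY g1.groupid, g2.groupid
    ORDER BY g1.groupid, g2.groupid
    """
).fetchall()

for edge in edges:
    print(edge)
```

Group 4 shares no users with any other group, so it produces no edge.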

Related

Count Childs and partition by GrandParent, that resides in an unpivoted hierarchy

I want to show the COUNT on every row how many BuildingID's there are within the MainGroundID's, while having to deal with a parent-child hierarchy that is unpivoted.
Unfortunately there is no logic within the way the GroundID and MainGroundID's are written (although it looks that way in my example, since I made an example dataset).
PMEBuilding
BuildingID, GroundID
1, 100
2, 100
3, 101
4, 201
5, 201
6, 201
7, 202
In reality the above table has 34K rows and 80+ fields.
The GroundID from the table above is N:1 to the table below via GroundID.
Within the PMEGround table some GroundIDs refer to a certain MainGroundID, which in turn also refers to parents higher up in the hierarchy. The 'GrandParents' are those that have a NULL value as MainGroundID.
PMEGround
GroundID, MainGroundID
1, NULL --GrandParent
10, 1
100, 10
101, 10
2, NULL --GrandParent
20, 2
201, 20
202, 20
In reality the above table has 2K rows, of which around 500 are 'GrandParents'.
I want this to be the end result:
MainGroundID MainGroundBuildingCount
1, 3
2, 7
The following code is what I used so far, but it doesn't work entirely yet:
;WITH UNPIVOT_HIERARCHY AS (
SELECT GROUNDID
,MAINGROUNDID
,PathID = CAST(GROUNDID AS VARCHAR(MAX))
FROM PMEGROUND
WHERE NULLIF(MainGroundID, '') IS NULL
UNION All
SELECT GROUNDID = r.GROUNDID
,MAINGROUNDID = r.MAINGROUNDID
,PathID = p.PathID+CONCAT(',',CAST(r.GROUNDID AS VARCHAR(MAX)))
FROM PMEGROUND r
JOIN UNPIVOT_HIERARCHY p ON r.MAINGROUNDID = p.GROUNDID
)
SELECT
B.Lvl3 AS 'MainGroundID' --This is the GrandParent, which works fine
,COUNT(PMEBUILDING.GROUNDID) OVER (PARTITION BY B.Lvl3) AS 'MainGroundCountBuildings'
FROM PMEGROUND
LEFT JOIN UNPIVOT_HIERARCHY
ON UNPIVOT_HIERARCHY.GROUNDID = PMEGROUND.GROUNDID
LEFT JOIN PMEBUILDING
ON PMEBUILDING.GROUNDID = PMEGROUND.GROUNDID
CROSS Apply (
SELECT Lvl1 = xDim.value('/x[3]','varchar(50)')
,Lvl2 = xDim.value('/x[2]','varchar(50)')
,Lvl3 = xDim.value('/x[1]','varchar(50)')
,Lvl4 = xDim.value('/x[4]','varchar(50)')
FROM ( VALUES (CAST('<x>' + REPLACE(PathID,',','</x><x>')+'</x>' AS xml))) B(xDim)
) B
GROUP BY B.Lvl3, PMEBUILDING.GROUNDID
Without the GROUP BY it gives duplicate MainGroundIDs, but the correct count.
With the GROUP BY it still gives duplicate MainGroundIDs, though fewer of them, and the count is now wrong.
I want this to be the end result:
MainGroundID MainGroundBuildingCount
1, 3
2, 7
Don't you mean the end result should be?
MainGroundID MainGroundBuildingCount
1, 3
2, 4
Assuming, based on the given data, that there are 3 levels of hierarchy and PMEBuilding.GroundID contains only grandchildren I would use the following to achieve the end result:
select
gp.GroundID, count(distinct b.BuildingID)
from PMEGround gp
join PMEGround p on p.MainGroundID = gp.GroundID
join PMEGround c on c.MainGroundID = p.GroundID
join PMEBuilding b on b.GroundID = c.GroundID
where gp.MainGroundID is null
group by gp.GroundID
order by 1
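
The three-join query above is tied to exactly three levels. If the depth varies, one alternative (not the answerer's method) is a recursive CTE that carries the top-level GroundID down to every descendant. The sketch below runs it in SQLite through Python's sqlite3 module on the sample data; the CTE name tree and the column RootID are made up for illustration, and in SQL Server the RECURSIVE keyword would be omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PMEGround (GroundID INTEGER, MainGroundID INTEGER)")
conn.execute("CREATE TABLE PMEBuilding (BuildingID INTEGER, GroundID INTEGER)")
conn.executemany(
    "INSERT INTO PMEGround VALUES (?, ?)",
    [(1, None), (10, 1), (100, 10), (101, 10),
     (2, None), (20, 2), (201, 20), (202, 20)],
)
conn.executemany(
    "INSERT INTO PMEBuilding VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 101), (4, 201), (5, 201), (6, 201), (7, 202)],
)

counts = conn.execute(
    """
    WITH RECURSIVE tree(GroundID, RootID) AS (
        -- anchor: top-level grounds are their own root
        SELECT GroundID, GroundID FROM PMEGround WHERE MainGroundID IS NULL
        UNION ALL
        -- step: children inherit the root of their parent
        SELECT g.GroundID, t.RootID
        FROM PMEGround g
        JOIN tree t ON g.MainGroundID = t.GroundID
    )
    SELECT t.RootID, COUNT(b.BuildingID)
    FROM tree t
    JOIN PMEBuilding b ON b.GroundID = t.GroundID
    GROUP BY t.RootID
    ORDER BY t.RootID
    """
).fetchall()

print(counts)
```

On the sample data this agrees with the answer's corrected expectation of 3 buildings under ground 1 and 4 under ground 2.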

PostgreSQL - Only different values in fields

I have a table called questions containing a single column filled with Exam question IDs. Let's say:
Q1
Q2
Q3
...
Qn
Now I'd like to pick all the combinations of three questions, something like:
1, 2, 3
...
2, 5, 6
4, 7, 1
...
9, 6, 8
And select a subset of them made of rows that have globally unique values only. In the previous case:
1, 2, 3
9, 6, 8
Because the other two records contain 2 and 1 which are both contained in the (1, 2, 3) record.
How can this be achieved in SQL? The purpose is to create, let's say, 8 exams made of questions that are all different from each other.
Let's say the questions table contains:
Table : questions
Qid
1
2
3
.
n
Now, if you want to select only 8 distinct randomized subsets of three questions each:
SELECT FO, STRING_AGG(QID::TEXT, ',')
FROM (SELECT Qid, (Qid / 3)::INT AS FO FROM QUESTIONS
ORDER BY RANDOM()
LIMIT 8*3) AS T
GROUP BY FO
A trivial approach is like this (the CROSS joins will be slow, e.g. for 100 questions it may take half a minute):
SELECT Q1.ID AS Q1, Q2.ID AS Q2, Q3.ID AS Q3
FROM Questions AS Q1, Questions AS Q2, Questions AS Q3
WHERE Q1.ID <> Q2.ID AND Q1.ID <> Q3.ID AND Q2.ID <> Q3.ID
ORDER BY RANDOM() LIMIT 8
However, there are more clever ways to do it - this answer is for MS SQL Server but can be adapted to PostgreSQL
Give your questions numbers: 1, 2, 3, 4, 5 ... n. Then subtract 1 and divide by 3, discarding the remainder: 0, 0, 0, 1, 1, ... to get groups of three. It's up to you how to number the questions, e.g. by ID (the lowest ID is record #1, the next ID is record #2, ...) or randomly. Here is an example with random numbering:
select *, (row_number() over (order by random()) - 1 ) / 3 as grp
from questions
order by grp;
Keep the result as is or pivot it to get one row per grp with three columns instead, e.g.
select
max(case when rn % 3 = 0 then q end) as q1,
max(case when rn % 3 = 1 then q end) as q2,
max(case when rn % 3 = 2 then q end) as q3
from
(
select *, row_number() over (order by random()) - 1 as rn
from questions
) numbered
group by rn / 3
order by rn / 3;
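
To see the numbering-and-pivot idea end to end, here is a minimal sketch run against SQLite through Python's sqlite3 module. It assumes a questions table with a single column q holding 24 made-up question ids, and an SQLite build with window function support (3.25 or later); the query itself mirrors the pivot above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 24 question ids, enough for 8 exams of 3 questions.
conn.execute("CREATE TABLE questions (q INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO questions VALUES (?)", [(i,) for i in range(1, 25)])

exams = conn.execute(
    """
    SELECT
      MAX(CASE WHEN rn % 3 = 0 THEN q END) AS q1,
      MAX(CASE WHEN rn % 3 = 1 THEN q END) AS q2,
      MAX(CASE WHEN rn % 3 = 2 THEN q END) AS q3
    FROM (
      -- random numbering 0..n-1; integer division by 3 forms the groups
      SELECT q, ROW_NUMBER() OVER (ORDER BY random()) - 1 AS rn
      FROM questions
    ) AS numbered
    GROUP BY rn / 3
    ORDER BY rn / 3
    """
).fetchall()

for exam in exams:
    print(exam)
```

Because every question gets exactly one row number, no question can appear in two exams, which is the "globally unique" property the question asks for.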

Is it possible to have some kind of a group query that groups on the number in a column?

I have a very simple table:
CREATE TABLE "Score"(
"Id" varchar primary key not null ,
"EnglishCount" int ,
"RomajiCount" int )
Is there a type of query that I could run that would show me:
how many rows have EnglishCount = 0,
how many rows have EnglishCount = 1,
how many rows have EnglishCount = 2,
how many rows have EnglishCount = 3,
how many rows have EnglishCount = 4,
etc ...
Here's the kind of output I am hoping to get:
Count Instances
0 1
1 2
3 1
4 5
5 2
You can use a group by clause to separate the result per distinct value of EnglishCount and then apply count(*) to each group:
SELECT EnglishCount, COUNT(*)
FROM Score
GROUP BY EnglishCount
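
A small sketch of that query in action, using SQLite through Python's sqlite3 module. The rows are invented so that the result reproduces the output shown in the question; the aliases and the ORDER BY are additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Score" ('
    '"Id" varchar primary key not null, "EnglishCount" int, "RomajiCount" int)'
)
# Hypothetical rows chosen to match the desired output in the question.
english_counts = [0, 1, 1, 3, 4, 4, 4, 4, 4, 5, 5]
conn.executemany(
    'INSERT INTO "Score" VALUES (?, ?, 0)',
    [(f"id{i}", count) for i, count in enumerate(english_counts)],
)

rows = conn.execute(
    """
    SELECT "EnglishCount" AS "Count", COUNT(*) AS "Instances"
    FROM "Score"
    GROUP BY "EnglishCount"
    ORDER BY "EnglishCount"
    """
).fetchall()

for count, instances in rows:
    print(count, instances)
```

Values with no rows (EnglishCount = 2 here) simply do not appear, as in the desired output.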

SQL tables join [closed]

I am looking for a way to fetch records in a single SQL query with complex joins. I have three tables, let's say users, user_projects and appointments. A user can be assigned to multiple projects and can have multiple appointments on different dates. How do I select all the users who are assigned to project1 and project2 and have appointments on date1 and date2? I am using Postgres 9.5.
Sample Data:
users table:
id, name
1, Steve
2, Bill
3, Emma
user_projects table:
id, user_id, project_id
1, 1, 1
2, 2, 1
3, 3, 1
4, 1, 2
appointments table:
id, user_id, date
1, 1, 2016-10-07
2, 2, 2016-10-07
3, 3, 2016-10-07
4, 1, 2016-11-15
5, 2, 2016-11-15
For this specific case, let's say I want to find all the users that belong to the projects with id 1 and 2 and have appointments on 2016-10-07 and 2016-11-15. The expected output should include only the user with id 1, i.e. Steve in this case.
Perhaps I'm missing something in the question. Looks like a simple join to me:
select distinct u.id
from users u
join user_projects up1 on u.id=up1.user_id
join user_projects up2 on u.id=up2.user_id
join appointments a1 on u.id=a1.user_id
join appointments a2 on u.id=a2.user_id
where up1.project_id = 1
and up2.project_id = 2
and a1.date = '2016-10-07'
and a2.date = '2016-11-15'
Here is a fiddle: http://sqlfiddle.com/#!15/415fd/1/0
SELECT U.ID
FROM USERS U
JOIN (
-- Users in two projects
SELECT USER_ID
FROM USER_PROJECTS
WHERE PROJECT_ID = 1
INTERSECT
SELECT USER_ID
FROM USER_PROJECTS
WHERE PROJECT_ID = 2
) UP ON U.ID = UP.USER_ID
JOIN (
-- user ids that have appointments on two dates:
SELECT USER_ID
FROM APPOINTMENTS
WHERE DATE = '2016-10-07'
INTERSECT
SELECT USER_ID
FROM APPOINTMENTS
WHERE DATE = '2016-11-15'
) A ON U.ID = A.USER_ID
Another way to do it that should have the same performance (maybe this seems better because there are fewer lines?):
SELECT U.ID
FROM USERS U
JOIN (
-- Users in two projects
SELECT USER_ID
FROM USER_PROJECTS
WHERE PROJECT_ID IN (1,2)
GROUP BY USER_ID
HAVING COUNT(DISTINCT PROJECT_ID) = 2
) UP ON U.ID = UP.USER_ID
JOIN (
-- user ids that have appointments on two dates:
SELECT USER_ID
FROM APPOINTMENTS
WHERE DATE IN ('2016-10-07','2016-11-15')
GROUP BY USER_ID
HAVING COUNT(DISTINCT DATE) = 2
) A ON U.ID = A.USER_ID
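
The GROUP BY / HAVING variant can be checked against the question's sample data. The sketch below uses SQLite through Python's sqlite3 module rather than Postgres 9.5, which is an assumption made only to keep it self-contained; the table and column names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_projects (id INTEGER PRIMARY KEY, user_id INTEGER, project_id INTEGER);
    CREATE TABLE appointments (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    INSERT INTO users VALUES (1, 'Steve'), (2, 'Bill'), (3, 'Emma');
    INSERT INTO user_projects VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2);
    INSERT INTO appointments VALUES
        (1, 1, '2016-10-07'), (2, 2, '2016-10-07'), (3, 3, '2016-10-07'),
        (4, 1, '2016-11-15'), (5, 2, '2016-11-15');
    """
)

matches = conn.execute(
    """
    SELECT u.id, u.name
    FROM users u
    JOIN (
        -- users assigned to both projects
        SELECT user_id
        FROM user_projects
        WHERE project_id IN (1, 2)
        GROUP BY user_id
        HAVING COUNT(DISTINCT project_id) = 2
    ) up ON u.id = up.user_id
    JOIN (
        -- users with appointments on both dates
        SELECT user_id
        FROM appointments
        WHERE date IN ('2016-10-07', '2016-11-15')
        GROUP BY user_id
        HAVING COUNT(DISTINCT date) = 2
    ) a ON u.id = a.user_id
    ORDER BY u.id
    """
).fetchall()

print(matches)
```

Bill has both appointments but only one project, and Emma has neither pair, so only Steve remains.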

How can a group by be converted to a self-join

For a table such as:
employeeID | groupCode
1 red111
2 red111
3 blu123
4 blu456
5 red553
6 blu423
7 blu341
How can I count the number of employeeIDs that are in parent groups (such as red or blu, but there are many more groups in the real table) that have more than 2 group members, excluding the employee itself (so all those with blu in this particular example)?
To expand: groupCode consists of a parent group (three letters), followed by some numbers for the subgroup.
I would like to do this using a self-join, or at least without using a GROUP BY statement.
So far I have:
SELECT T1.employeeID
FROM TABLE T1, TABLE T2
WHERE T1.groupCode <> T2.groupCode
AND SUBSTR(T1.groupCode, 1, 3) = SUBSTR(T2.groupCode, 1, 3);
but that doesn't do much for me...
Add an index on the first 3 characters of EMPLOYEE.groupCode.
Then try this one:
SELECT ed.e3
, COUNT(*)
FROM EMPLOYEE e
JOIN
( SELECT DISTINCT
SUBSTR(groupCode, 1, 3) AS e3
FROM EMPLOYEE
) ed
ON e.groupCode LIKE CONCAT(ed.e3, '%')
GROUP BY ed.e3
HAVING COUNT(*) >= 3 --- or whatever is wanted
What about:
SELECT DISTINCT substring(empshirtno, 1, 3),
(SELECT Count(*) FROM MyTable AS myTable2
WHERE substring(MyTable.empshirtno, 1, 3) = substring(myTable2.empshirtno, 1, 3))
FROM MyTable
Maybe counting from a subquery is speedier with an index.
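
For the original requirement (no GROUP BY, each employee excluded from its own count), one possible reading is a correlated subquery that counts the other members of the same parent group. The sketch below tries it on the question's sample rows in SQLite through Python's sqlite3 module; the table name employee and the aliases e and o are assumptions, and substr stands in for the dialect's substring function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeID INTEGER PRIMARY KEY, groupCode TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "red111"), (2, "red111"), (3, "blu123"), (4, "blu456"),
     (5, "red553"), (6, "blu423"), (7, "blu341")],
)

# Employees whose parent group (first three letters) has more than
# 2 members other than the employee itself; no GROUP BY involved.
qualifying = conn.execute(
    """
    SELECT e.employeeID
    FROM employee e
    WHERE (
        SELECT COUNT(*)
        FROM employee o
        WHERE substr(o.groupCode, 1, 3) = substr(e.groupCode, 1, 3)
          AND o.employeeID <> e.employeeID
    ) > 2
    ORDER BY e.employeeID
    """
).fetchall()

employee_ids = [row[0] for row in qualifying]
print(employee_ids, len(employee_ids))
```

Each red employee has only two other red colleagues, so only the four blu employees qualify, matching the example in the question.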