SQL Server : Using recursive CTE to resolve group membership - sql

I have one table (users_groups) :
+-----------+------------+---------+
| groupGUID | memberGUID | isGroup |
+-----------+------------+---------+
| 32AB160C | 5B277276 | 0 |
| 32AB160C | 0A023D1D | 0 |
| 5C952B2E | 32AB160C | 1 |
| 4444FTG5 | 5C952B2E | 1 |
+-----------+------------+---------+
isGroup column indicates whether memberGUID is a group or not.
I want to obtain a new table (new_users_groups) with all group memberships resolved :
+-----------+------------+
| groupGUID | memberGUID |
+-----------+------------+
| 32AB160C | 5B277276 |
| 32AB160C | 0A023D1D |
| 5C952B2E | 5B277276 |
| 5C952B2E | 0A023D1D |
| 4444FTG5 | 5B277276 |
| 4444FTG5 | 0A023D1D |
+-----------+------------+
For now, I'm doing everything manually :
Looking for all group's memberGUID
SELECT * FROM users_groups WHERE isGroup = 1;
For all groups returned by previous step, find its members
SELECT * FROM users_groups WHERE groupGUID = '5C952B2E'
If members are not groups, insert them into a new table
INSERT INTO new_users_groups (groupGUID, memberGUID) VALUES ('5C952B2E', '5B277276');
INSERT INTO new_users_groups (groupGUID, memberGUID) VALUES ('5C952B2E', '0A023D1D');
If members are groups, go to step 2.
How can I automate this? Maybe with a Recursive CTE ?

You can do this with a recursive CTE:
with cte as (
select ug.groupGUID, ug.groupGUID as grp, ug.memberGUID
from user_groups ug
where isGroup = 0
union all
select ug.groupGUID, ug.groupGUID as grp, cte.memberGUID
from user_groups ug join
cte
on cte.grp = ug.memberGUID
)
select groupGUID, memberGUID
from cte;
Here is a Rextester.

Try CROSS JOIN
SELECT t1.groupGUID, t2.memberGUID
FROM temp t1, temp t2 WHERE t2.isGroup = 0
GROUP BY t1.groupGUID, t2.memberGUID

Related

How do I select rows with maximum value?

Given this table I want to retrieve for each different url the row with the maximum count. For this table the output should be: 'dell.html' 3, 'lenovo.html' 4, 'toshiba.html' 5
+----------------+-------+
| url | count |
+----------------+-------+
| 'dell.html' | 1 |
| 'dell.html' | 2 |
| 'dell.html' | 3 |
| 'lenovo.html' | 1 |
| 'lenovo.html' | 2 |
| 'lenovo.html' | 3 |
| 'lenovo.html' | 4 |
| 'toshiba.html' | 1 |
| 'toshiba.html' | 2 |
| 'toshiba.html' | 3 |
| 'toshiba.html' | 4 |
| 'toshiba.html' | 5 |
+----------------+-------+
What SQL query do I need to write to do this?
Try to use this query:
select url, max(count) as count
from table_name
group by url;
use aggregate function
select max(count) ,url from table_name group by url
From your comments it seems you need corelated subquery
select t1.* from table_name t1
where t1.count = (select max(count) from table_name t2 where t2.url=t1.url
)
If row_number support on yours sqllite version
then you can write query like below
select * from
(
select *,row_number() over(partition by url order by count desc) rn
from table_name
) a where a.rn=1

Impala query - optimize a query to get the uniques for given key

I'm looking for ways to count unique users that have a specific pkey and also the count of unique users who didn't have that pkey.
Here is a sample table:
userid | pkey | pvalue
------------------------------
U1 | x | vx
U1 | y | vy
U1 | z | vz
U2 | y | vy
U3 | z | vz
U4 | null | null
I get the expected results to get the unique users who has the pkey='y' and those who didn't using this query but turns out to be expensive:
WITH all_rows AS
( SELECT userid,
IF( pkey='y', pval, 'none' ) AS val,
SUM( IF(pkey='y',1,0) ) AS has_key
FROM some_table
GROUP BY userid, val)
SELECT val,
count(distinct(userid)) uniqs
FROM all_rows
WHERE has_key=1
GROUP BY val
UNION ALL
SELECT 'no_key_set' val,
count(distinct(userid)) uniqs
FROM all_rows a1 LEFT ANTI JOIN
all_rows a2 on (a1.userid = a2.userid and a2.has_key=1)
GROUP BY val;
Results:
val | uniqs
--------------------
vy | 2
no_key_set | 2
I'm looking to avoid using any temp tables, so any better ways this can be achieved?
Thanks!
By using EXPLAIN, you can observe that most of the cost is spent on doing excessive GROUP BY aggregations rather than on using subqueries in your original query.
Here is a straightforward implementation
WITH t1 AS (
SELECT pkey, COUNT(*) AS cnt
FROM table
WHERE pkey IS NOT NULL
GROUP BY pkey
), t2 AS (
SELECT COUNT(DISTINCT userid) AS total_cnt
FROM table
)
SELECT
CONCAT('no_', pkey) AS pkey,
(total_cnt - cnt) AS cnt
FROM t1, t2
UNION ALL
SELECT * FROM t1
t1 gets a table of unique user count per pkey
+------+-----+
| pkey | cnt |
+------+-----+
| x | 1 |
| z | 2 |
| y | 2 |
+------+-----+
t2 gets the number of total unique users
+-----------+
| total_cnt |
+-----------+
| 4 |
+-----------+
we can use the result from t2 to get the complement table of t1
+------+-----+
| pkey | cnt |
+------+-----+
| no_x | 3 |
| no_z | 2 |
| no_y | 2 |
+------+-----+
a final union of the two tables gives a result of
+------+-----+
| pkey | cnt |
+------+-----+
| no_x | 3 |
| no_z | 2 |
| no_y | 2 |
| x | 1 |
| z | 2 |
| y | 2 |
+------+-----+

Access Queries comparing two tables

I have two tables in Access, Table A and Table B:
Table MasterLockInsNew:
+----+-------+----------+
| ID | Value | Date |
+----+-------+----------+
| 1 | 123 | 12/02/13 |
| 2 | 1231 | 11/02/13 |
| 4 | 1265 | 16/02/13 |
+----+-------+----------+
Table InitialPolData:
+----+-------+----------+---+
| ID | Value | Date |Type
+----+-------+----------+---+
| 1 | 123 | 12/02/13 | x |
| 2 | 1231 | 11/02/13 | x |
| 3 | 1238 | 10/02/13 | y |
| 4 | 1265 | 16/02/13 | a |
| 7 | 7649 | 18/02/13 | z |
+----+-------+----------+---+
All I want are the rows from table B for IDs not contained in A. My current code looks like this:
SELECT Distinct InitialPolData.*
FROM InitialPolData
WHERE InitialPolData.ID NOT IN (SELECT Distinct InitialPolData.ID
from InitialPolData INNER JOIN
MasterLockInsNew
ON InitialPolData.ID=MasterLockInsNew.ID);
But whenever I run this in Access it crashes!! The tables are fairly large but I don't think this is the reason.
Can anyone help?
Thanks
or try a left outer join:
SELECT b.*
FROM InitialPolData b left outer join
MasterLockInsNew a on
b.id = a.id
where
a.id is null
Simple subquery will do.
select * from InitialPolData
where id not in (
select id from MasterLockInsNew
);
Try using NOT EXISTS:
SELECT Distinct i.*
FROM InitialPolData AS i
WHERE NOT EXISTS (SELECT 1
FROM MasterLockInsNew AS m
WHERE m.ID = i.ID)

Do I need a recursive CTE to update a table that relies on itself?

I need to apologize for the title. I put a lot of thought into it but didn't get too far.
I have a table that looks like this:
+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+--------+
| accountid | pricexxxxxid | accountid | pricelevelid | counts |
+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+--------+
| 36B077D4-E765-4C70-BE18-2ECA871420D3 | 00000000-0000-0000-0000-000000000000 | 36B077D4-E765-4C70-BE18-2ECA871420D3 | F43C47CE-28C6-42E2-8399-92C58ED4BA9D | 1 |
| EBC18CBC-2D2E-44CB-B36A-0ADE9E2BDE9F | 00000000-0000-0000-0000-000000000000 | EBC18CBC-2D2E-44CB-B36A-0ADE9E2BDE9F | 3BEEA9D3-F26B-47E4-88FA-A2AA366980ED | 1 |
| 8DC8D0FC-3138-425A-A922-2F0CAC57E887 | 00000000-0000-0000-0000-000000000000 | 8DC8D0FC-3138-425A-A922-2F0CAC57E887 | F1B8AD5D-B008-4C3F-94A0-AD3F90C777D7 | 1 |
| 8F908A92-1327-4655-BAE4-C890D971A554 | 00000000-0000-0000-0000-000000000000 | 8F908A92-1327-4655-BAE4-C890D971A554 | 2E0EC67E-5F8F-4305-932E-BBF8DF83DBEC | 1 |
| 37221AAC-B885-4002-B7D9-591F8C14D019 | 00000000-0000-0000-0000-000000000000 | 37221AAC-B885-4002-B7D9-591F8C14D019 | F4A2A0CA-FDFF-4C21-AE92-D4583DC18DED | 1 |
| 66F406B4-0D9B-40B8-9A23-119EE74B00B7 | 00000000-0000-0000-0000-000000000000 | 66F406B4-0D9B-40B8-9A23-119EE74B00B7 | 204B8570-CEBA-4C72-9B72-8B9B14AF625E | 2 |
| D0168CE3-479E-439E-967C-4FF0D701291A | 00000000-0000-0000-0000-000000000000 | D0168CE3-479E-439E-967C-4FF0D701291A | 204B8570-CEBA-4C72-9B72-8B9B14AF625E | 2 |
| 57E5F6E5-0A8A-4E54-B793-2F6493DC1EA3 | 00000000-0000-0000-0000-000000000000 | 57E5F6E5-0A8A-4E54-B793-2F6493DC1EA3 | 893F9FD2-43C9-4355-AEFC-08A62BF2B066 | 3 |
+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+--------+
It is sorted by ascending counts.
I would like to update the pricexxxxids that are all 00000000-0000-0000-0000-000000000000 with their corresponding pricelevelid.
For example for accountid = 36B077D4-E765-4C70-BE18-2ECA871420D3 I would like the pricexxxxid to be F43C47CE-28C6-42E2-8399-92C58ED4BA9D.
After that is done, I would like all the records FOLLOWING this one where accountid = 36B077D4-E765-4C70-BE18-2ECA871420D3 to be deleted.
Another words in result I will end up with a distinct list of accountids with pricexxxxid to be assigned with the corresponding value from pricelevelid.
Thank you so much for your guidance.
for your first case do !
update table
set pricexxxxids=pricelevelid.
if i understand your second case correctly :(delete duplicates/select distinct)?
delete from
(
select *,rn=row_number()over(partition by accountid order by accountid) from table
)x
where rn>1
--select distinct * from table
edited
select * from
(
select *,rn=row_number()over(partition by accountid order by accountid) from table
)x
where x.rn=1
updated
SELECT accountid,pricelevelid FROM
(
(SELECT *,
Row_number() OVER ( partition BY accountid ORDER BY counts, pricelevelid ) AS Recency
FROM table
)x
WHERE x.Recency = 1

How to apply a SUM operation without grouping the results in SQL?

I have a table like this one:
+----+---------+----------+
| id | group | value |
+----+---------+----------+
| 1 | GROUP A | 0.641028 |
| 2 | GROUP B | 0.946927 |
| 3 | GROUP A | 0.811552 |
| 4 | GROUP C | 0.216978 |
| 5 | GROUP A | 0.650232 |
+----+---------+----------+
If I perform the following query:
SELECT `id`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`;
I, obviously, get:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 4 | 0.216977506875992 |
+----+-------------------+
But I need a table like this one:
+----+-------------------+
| id | sum |
+----+-------------------+
| 1 | 2.10281205177307 |
| 2 | 0.946927309036255 |
| 3 | 2.10281205177307 |
| 4 | 0.216977506875992 |
| 5 | 2.10281205177307 |
+----+-------------------+
Where summed rows are explicitly repeated.
Is there a way to obtain this result without using multiple (nested) queries?
IT would depend on your SQL server, in Postgres/Oracle I'd use Window Functions. In MySQL... not possible afaik.
Perhaps you can fake it like this:
SELECT a.id, SUM(b.value) AS `sum`
FROM test AS a
JOIN test AS b ON a.`group` = b.`group`
GROUP BY a.id, b.`group`;
No there isn't AFAIK. You will have to use a join like
SELECT t.`id`, tsum.sum AS `sum`
FROM `test` as t GROUP BY `group`
JOIN (SELECT `id`, SUM(`value`) AS `sum` FROM `test` GROUP BY `group`) AS tsum
ON tsum.id = t.id