How to find all possible connections between different rows? - google-bigquery

Our company asked users to enter various pieces of information, including address and license plate, in order to get car insurance quotes.
We store this information in BigQuery.
Some users have entered more than one license plate (they might own more than one car) and more than one address at different times.
Basically, the structure could look like this:
row info_1 info_2
----- -------- --------
1 a y
2 a x
3 b y
4 b z
5 c z
6 a z
We want to use all the links between these two pieces of information so that all the information from one user ends up in one row.
All rows in the table above are connected and should therefore end up in a single row.
Is this possible and what is best practice?
We have experimented with both STRING_AGG and ARRAY_AGG but have not found the solution yet.
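In graph terms, what is being asked for is the connected components of a graph whose nodes are the info values and whose edges are the rows. The sketch below is plain Python, not BigQuery SQL. It uses a union-find structure, a technique chosen here only to make the expected grouping of the sample rows concrete, and the names `find` and `union` are illustrative.

```python
# Rows from the sample table: (row, info_1, info_2)
rows = [(1, "a", "y"), (2, "a", "x"), (3, "b", "y"),
        (4, "b", "z"), (5, "c", "z"), (6, "a", "z")]

parent = {}

def find(node):
    # Follow parent links to the root, halving the path along the way.
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def union(x, y):
    # Merge the components containing x and y.
    parent[find(x)] = find(y)

# Each row links its info_1 value to its info_2 value. Values are tagged
# with their column so an info_1 "a" never collides with an info_2 "a".
for _, v1, v2 in rows:
    union(("info_1", v1), ("info_2", v2))

# Group row numbers by the root of the component their values belong to.
groups = {}
for row, v1, _ in rows:
    groups.setdefault(find(("info_1", v1)), []).append(row)

components = sorted(groups.values())
print(components)  # [[1, 2, 3, 4, 5, 6]]
```

Doing the same in BigQuery SQL would typically need an iterative approach, such as repeatedly propagating a component label across shared values until nothing changes. This sketch only shows the target result: all six sample rows form one component.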

Assuming that you have an extra column, user_id, so that the input table becomes:
row user_id info_1 info_2
----- -------- -------- --------
1 u1 a y
2 u2 a x
3 u1 b y
4 u1 b z
5 u2 c z
6 u2 a z
The following query gives you two lists of distinct elements, for info_1 and info_2 respectively:
select
user_id,
array_agg(distinct info_1) as element_in_info_1,
array_agg(distinct info_2) as element_in_info_2
from table
group by 1
Test example with input data
with table as (
select 'u1' as user_id, 'a' as info_1, 'y' as info_2 union all
select 'u2' as user_id, 'a' as info_1, 'x' as info_2 union all
select 'u1' as user_id, 'b' as info_1, 'y' as info_2 union all
select 'u1' as user_id, 'b' as info_1, 'z' as info_2 union all
select 'u2' as user_id, 'c' as info_1, 'z' as info_2 union all
select 'u2' as user_id, 'a' as info_1, 'z' as info_2
)
select
user_id,
array_agg(distinct info_1) as element_in_info_1,
array_agg(distinct info_2) as element_in_info_2
from table
group by 1
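To try the same grouping locally, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no ARRAY_AGG, so GROUP_CONCAT(DISTINCT ...) stands in for it. This substitution is for illustration only and is not BigQuery syntax. The table name `t` is also hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (user_id TEXT, info_1 TEXT, info_2 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("u1", "a", "y"), ("u2", "a", "x"), ("u1", "b", "y"),
    ("u1", "b", "z"), ("u2", "c", "z"), ("u2", "a", "z"),
])

# GROUP_CONCAT(DISTINCT ...) plays the role of BigQuery's ARRAY_AGG(DISTINCT ...).
query = """
    SELECT user_id,
           GROUP_CONCAT(DISTINCT info_1) AS element_in_info_1,
           GROUP_CONCAT(DISTINCT info_2) AS element_in_info_2
    FROM t
    GROUP BY user_id
    ORDER BY user_id
"""

# SQLite does not guarantee the order inside GROUP_CONCAT, so sort in Python.
result = {
    user: (sorted(i1.split(",")), sorted(i2.split(",")))
    for user, i1, i2 in con.execute(query)
}
print(result)
```

For the sample data, u1 gets info_1 values a, b and info_2 values y, z. u2 gets info_1 values a, c and info_2 values x, z.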

Related

Oracle perform a count for each row on a table

I have a question about a query in Oracle SQL.
I have a groups table, that I can query as:
SELECT *
FROM groups g
WHERE g.owner = 123;
This would give something like:
groupId  owner
-------  -----
1        123
2        123
3        123
4        123
I also have another table of administrators, that I can query as:
SELECT *
FROM admins a
ORDER BY groupId;
This would return administrators such as:
adminId  userName  groupId
-------  --------  -------
1        myadmin1  1
2        myAdmin2  1
3        myAdmin3  1
4        myAdmin4  2
5        myAdmin5  3
6        myAdmin6  3
That basically means that a group can have multiple administrators. I would like to count the number of administrators for each group, with a result such as:
groupId  owner  adminCount
-------  -----  ----------
1        123    3
2        123    1
3        123    2
4        123    0
However, I cannot count the administrators of every group in the table and then join, because the table has a lot of rows.
I would like to perform the count query
SELECT count(*)
FROM admins a
WHERE groupId = 1;
for each row of the groups query, such that I get the desired result without performing a count of each administrator in the table, just the ones that belong to the groups from a specific owner.
Does anyone know how I can get these counts without counting all the rows in the administrators table?
Thanks
The easiest and most readable variant is to use outer apply (or lateral(+)):
select *
from groups g
outer apply (
select count(*) as adminCount
from admins a
where a.groupId=g.groupId
)
where g.owner = 123;
Or you can get the same result with a subquery (in fact, the Oracle optimizer may transform the outer-apply/lateral variant into this one, since it has a "lateral view decorrelation" transformation):
select g.groupId,g.owner, nvl(a.adminCount,0) as adminCount
from groups g
left join (
select x.groupId, count(*) as adminCount
from admins x
group by x.groupId
) a
on a.groupId=g.groupId
where g.owner = 123;
Or even a plain GROUP BY with a join:
select g.groupId, g.owner, count(a.groupId) as adminCount
from groups g
left join admins a
on g.groupId = a.groupId
where g.owner = 123
group by g.groupId, g.owner;
https://dbfiddle.uk/a-Q_abg8
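To check the expected counts locally, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. SQLite has no OUTER APPLY, so a correlated scalar subquery plays the per-row-count role. It is compared against the LEFT JOIN / GROUP BY variant, and both queries apply the `owner = 123` filter from the question. The table name `groups` is double-quoted only as a precaution.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "groups" (groupId INTEGER, owner INTEGER);
    CREATE TABLE admins (adminId INTEGER, userName TEXT, groupId INTEGER);
    INSERT INTO "groups" VALUES (1, 123), (2, 123), (3, 123), (4, 123);
    INSERT INTO admins VALUES
        (1, 'myadmin1', 1), (2, 'myAdmin2', 1), (3, 'myAdmin3', 1),
        (4, 'myAdmin4', 2), (5, 'myAdmin5', 3), (6, 'myAdmin6', 3);
""")

# Per-row count via a correlated scalar subquery (stand-in for OUTER APPLY).
per_row = con.execute("""
    SELECT g.groupId, g.owner,
           (SELECT COUNT(*) FROM admins a WHERE a.groupId = g.groupId) AS adminCount
    FROM "groups" g
    WHERE g.owner = ?
    ORDER BY g.groupId
""", (123,)).fetchall()

# LEFT JOIN + GROUP BY: COUNT(a.groupId) skips the NULLs of groups without admins.
joined = con.execute("""
    SELECT g.groupId, g.owner, COUNT(a.groupId) AS adminCount
    FROM "groups" g
    LEFT JOIN admins a ON a.groupId = g.groupId
    WHERE g.owner = ?
    GROUP BY g.groupId, g.owner
    ORDER BY g.groupId
""", (123,)).fetchall()

print(per_row)  # [(1, 123, 3), (2, 123, 1), (3, 123, 2), (4, 123, 0)]
print(joined)   # same rows
```

Counting `a.groupId` rather than `*` in the join variant is what makes group 4 come out as 0 instead of 1.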
You could use the analytic function COUNT() OVER() ...
Select Distinct
g.GROUP_ID,
g.OWNER,
Count(a.ADMIN_ID) OVER(Partition By g.GROUP_ID) "COUNT_ADMINS"
From groups g
Left Join admins a ON(a.GROUP_ID = g.GROUP_ID)
Where g.OWNER = 123
Order By g.GROUP_ID
... this requires the DISTINCT keyword, which could be costly on big datasets. I don't expect the groups and admins tables to be that big, though.
With your sample data:
WITH
groups (GROUP_ID, OWNER) AS
(
Select 1, 123 From Dual Union ALL
Select 2, 123 From Dual Union ALL
Select 3, 123 From Dual Union ALL
Select 4, 123 From Dual
),
admins (ADMIN_ID, ADMIN_USER_NAME, GROUP_ID) AS
(
Select 1, 'myadmin1', 1 From Dual Union All
Select 2, 'myadmin2', 1 From Dual Union All
Select 3, 'myadmin3', 1 From Dual Union All
Select 4, 'myadmin4', 2 From Dual Union All
Select 5, 'myadmin5', 3 From Dual Union All
Select 6, 'myadmin6', 3 From Dual
)
... the result is
GROUP_ID  OWNER  COUNT_ADMINS
--------  -----  ------------
1         123    3
2         123    1
3         123    2
4         123    0

Query to find combinations of accounts sql

I am looking for a query that finds ordering accounts that interact with the same beneficiary accounts 3 or more times, as described below.
Examples:
Account A sends to accounts 1, 2 and 3.
Account B sends to accounts 1, 2 and 3.
Account C sends to accounts 1, 2 and 3.
This is the table, called TBL_ACCOUNTS:
ordering account  beneficiary account
----------------  -------------------
A                 1
B                 1
C                 1
A                 2
B                 2
C                 2
A                 3
B                 3
C                 3
H                 1
K                 23
Z                 329
W                 3
I want to find all the accounts that meet this condition, i.e. ordering accounts that interact with the same beneficiary accounts 3 or more times. The result I would expect is:
ordering account  beneficiary account
----------------  -------------------
A                 1
A                 2
A                 3
B                 1
B                 2
B                 3
C                 1
C                 2
C                 3
I hope you can guide me which way to go, because I'm a bit lost.
You can create a collection data type:
CREATE TYPE int_list IS TABLE OF INT;
and then you can use:
WITH accounts (ordering_account, beneficiary_account, accounts) AS (
SELECT t.*,
CAST(
COLLECT(beneficiary_account) OVER (PARTITION BY ordering_account)
AS int_list
)
FROM TBL_ACCOUNTS t
)
SELECT ordering_account,
beneficiary_account
FROM accounts a
WHERE EXISTS(
SELECT 1
FROM accounts x
WHERE a.ordering_account <> x.ordering_account
AND CARDINALITY(a.accounts MULTISET INTERSECT x.accounts) >= 3
-- Remove the next line if you want to return all accounts and not just the matched accounts
AND a.beneficiary_account = x.beneficiary_account
);
Which, for the sample data:
CREATE TABLE TBL_ACCOUNTS (ordering_account, beneficiary_account) AS
SELECT 'A', 1 FROM DUAL UNION ALL
SELECT 'B', 1 FROM DUAL UNION ALL
SELECT 'C', 1 FROM DUAL UNION ALL
SELECT 'A', 2 FROM DUAL UNION ALL
SELECT 'B', 2 FROM DUAL UNION ALL
SELECT 'C', 2 FROM DUAL UNION ALL
SELECT 'A', 3 FROM DUAL UNION ALL
SELECT 'B', 3 FROM DUAL UNION ALL
SELECT 'C', 3 FROM DUAL UNION ALL
SELECT 'C', 4 FROM DUAL UNION ALL
SELECT 'H', 1 FROM DUAL UNION ALL
SELECT 'K', 23 FROM DUAL UNION ALL
SELECT 'Z', 329 FROM DUAL UNION ALL
SELECT 'W', 3 FROM DUAL;
Outputs:
ORDERING_ACCOUNT  BENEFICIARY_ACCOUNT
----------------  -------------------
A                 1
A                 3
A                 2
B                 1
B                 3
B                 2
C                 1
C                 2
C                 3
If you want to do it without a collection then:
SELECT ordering_account,
beneficiary_account
FROM TBL_ACCOUNTS a
WHERE EXISTS(
SELECT 1
FROM TBL_ACCOUNTS x
WHERE a.ordering_account <> x.ordering_account
AND a.beneficiary_account = x.beneficiary_account
AND EXISTS(
SELECT 1
FROM TBL_ACCOUNTS l
INNER JOIN TBL_ACCOUNTS r
ON (l.beneficiary_account = r.beneficiary_account)
WHERE l.ordering_account = a.ordering_account
AND r.ordering_account = x.ordering_account
HAVING COUNT(*) >= 3
)
);
or:
SELECT ordering_account,
beneficiary_account
FROM TBL_ACCOUNTS a
WHERE EXISTS(
SELECT 1
FROM TBL_ACCOUNTS l
INNER JOIN TBL_ACCOUNTS r
ON ( l.beneficiary_account = r.beneficiary_account
AND l.ordering_account <> r.ordering_account )
WHERE l.ordering_account = a.ordering_account
GROUP BY r.ordering_account
HAVING COUNT(*) >= 3
AND COUNT(
CASE WHEN r.beneficiary_account = a.beneficiary_account THEN 1 END
) > 0
);
Maybe something like this:
select ordering_account, beneficiary_account
from TBL_ACCOUNTS
group by ordering_account, beneficiary_account
having count(*) >= 3
order by ordering_account, beneficiary_account
SELECT T.ordering_account,T.beneficiary_account
FROM TBL_ACCOUNTS T
JOIN
(
SELECT Z.ordering_account
FROM TBL_ACCOUNTS Z
GROUP BY Z.ordering_account
HAVING COUNT(*)>2
)X ON T.ordering_account=X.ordering_account
ORDER BY T.ordering_account,T.beneficiary_account
or
SELECT X.ordering_account,X.beneficiary_account FROM
(
SELECT T.ordering_account,T.beneficiary_account,
COUNT(*)OVER(PARTITION BY T.ordering_account)XCOL
FROM TBL_ACCOUNTS T
)X WHERE X.XCOL=3
ORDER BY X.ordering_account,X.beneficiary_account
Self-join the table on the beneficiary account. This gives you each pair of ordering accounts once for every beneficiary account they share, so you can then group by these pairs and count.
The following query lists all entries of all ordering accounts for which exists another ordering account sharing at least three beneficiary accounts.
with share3 as
(
select a1.ordering_account as acc1, a2.ordering_account as acc2
from tbl_accounts a1
join tbl_accounts a2 on a2.beneficiary_account = a1.beneficiary_account
and a2.ordering_account > a1.ordering_account
group by a1.ordering_account, a2.ordering_account
having count(*) >= 3
)
select *
from tbl_accounts
where exists
(
select null
from share3
where share3.acc1 = tbl_accounts.ordering_account
or share3.acc2 = tbl_accounts.ordering_account
)
order by ordering_account, beneficiary_account;
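To see the self-join query produce the expected result, here is a minimal sketch that runs it against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for Oracle, on the assumption that the query, which uses only standard SQL, behaves the same there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_accounts (ordering_account TEXT, beneficiary_account INTEGER)")
con.executemany("INSERT INTO tbl_accounts VALUES (?, ?)", [
    ("A", 1), ("B", 1), ("C", 1), ("A", 2), ("B", 2), ("C", 2),
    ("A", 3), ("B", 3), ("C", 3), ("H", 1), ("K", 23), ("Z", 329), ("W", 3),
])

query = """
WITH share3 AS (
    -- one row per pair of ordering accounts per shared beneficiary,
    -- then keep only the pairs sharing at least 3 beneficiaries
    SELECT a1.ordering_account AS acc1, a2.ordering_account AS acc2
    FROM tbl_accounts a1
    JOIN tbl_accounts a2
      ON a2.beneficiary_account = a1.beneficiary_account
     AND a2.ordering_account > a1.ordering_account
    GROUP BY a1.ordering_account, a2.ordering_account
    HAVING COUNT(*) >= 3
)
SELECT *
FROM tbl_accounts
WHERE EXISTS (
    SELECT NULL
    FROM share3
    WHERE share3.acc1 = tbl_accounts.ordering_account
       OR share3.acc2 = tbl_accounts.ordering_account
)
ORDER BY ordering_account, beneficiary_account
"""
result = con.execute(query).fetchall()
print(result)
```

H and W each share only one beneficiary with A, B or C, so their pair counts stay at 1, and only the rows of A, B and C are returned.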
I'm not sure I follow what you're asking, but it sounds like you simply need to include an ORDER BY clause.
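At the end of your query, just include:
ORDER BY "ordering account", "beneficiary account"
Column names that contain spaces must be quoted as identifiers, not with single quotes: single quotes create string literals, so ORDER BY 'ordering account' would sort by a constant and have no effect. Identifier quoting depends on the SQL dialect: double quotes in standard SQL and Oracle, [] in SQL Server, or backticks in MySQL.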

Find rows with all of a group's values from another table

Table1:
id  role  Group_ID
--  ----  --------
1   A     1
2   B     1
3   A     2
4   D     2
5   A     3
6   B     3
7   C     3
8   C     4
...
Table2:
User_id  role
-------  ----
user1    A
user1    B
user2    C
user2    D
user3    A
user3    D
user4    C
user5    A
user5    B
user5    C
user5    D
...
I have two tables, Table1 and Table2, as shown above.
My requirement is to get the User_IDs from Table2 that have all the roles of a group. Additionally, only groups that have at least 2 roles need to be checked; a Group_ID with only 1 role should not be considered.
For example, this is what the result should look like for the two tables above:
user1 has both roles of group 1 (A, B), so it is in the results.
user3 has both roles of group 2 (A, D), so it is in the results.
user5 has all the roles of groups 1 (A, B), 2 (A, D) and 3 (A, B, C), so it is in the results.
user2 has roles C and D, which do not make up a group, so it is not shown in the result.
user4 has role C, which is a group (Group_ID = 4), but a group must have at least 2 roles, so it is not shown in the result.
User_id  Group_ID
-------  --------
user1    1
user3    2
user5    1
user5    2
user5    3
....
Select Table2.USER_ID,Table1.GROUP_ID
from Table2,
Table1
Where Table2.ROLE = Table1.ROLE
group by Table1.GROUP_ID,Table2.USER_ID
With the above query, I get the users that have any of a group's roles. However, I want the User_IDs from Table2 that have all the roles of a group.
Any help is much appreciated. I will make sure to accept the answer that leads me to the solution.
A direct translation of the requirement into SQL (Oracle style):
with g as (
select group_id, count(*) role_cnt from table1 group by group_id having count(*) > 1),
u as (
select u.user_id, g.group_id, count(*) usergroup_role_cnt from table1 g, table2 u
where u.role=g.role
group by u.user_id, g.group_id)
select u.user_id, g.group_id from g, u
where u.group_id=g.group_id
and u.usergroup_role_cnt=g.role_cnt;
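Here is a minimal sketch that checks this count-based approach on the question's full sample data, including the one-role group 4. It uses Python's built-in sqlite3 module as a stand-in for Oracle. The comma joins are rewritten as explicit JOINs, and the inner aliases are renamed to t1/t2 for readability. Both changes are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER, role TEXT, group_id INTEGER)")
con.execute("CREATE TABLE table2 (user_id TEXT, role TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, "A", 1), (2, "B", 1), (3, "A", 2), (4, "D", 2),
    (5, "A", 3), (6, "B", 3), (7, "C", 3), (8, "C", 4),
])
con.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("user1", "A"), ("user1", "B"), ("user2", "C"), ("user2", "D"),
    ("user3", "A"), ("user3", "D"), ("user4", "C"),
    ("user5", "A"), ("user5", "B"), ("user5", "C"), ("user5", "D"),
])

query = """
WITH g AS (   -- groups with at least 2 roles, and how many roles they have
    SELECT group_id, COUNT(*) AS role_cnt
    FROM table1
    GROUP BY group_id
    HAVING COUNT(*) > 1
),
u AS (        -- how many of each group's roles every user holds
    SELECT t2.user_id, t1.group_id, COUNT(*) AS usergroup_role_cnt
    FROM table1 t1
    JOIN table2 t2 ON t2.role = t1.role
    GROUP BY t2.user_id, t1.group_id
)
SELECT u.user_id, g.group_id
FROM g
JOIN u ON u.group_id = g.group_id
WHERE u.usergroup_role_cnt = g.role_cnt
ORDER BY u.user_id, g.group_id
"""
result = con.execute(query).fetchall()
print(result)
```

The counts only line up if each (group_id, role) pair in table1 and each (user_id, role) pair in table2 appears once. Duplicate rows would inflate them, so a DISTINCT may be needed on real data.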
Edit:
There has been some concern about the performance of GROUP BY when the tables are "very large". I don't know how "very large" is defined. Anyway, I ran a test on an Oracle Cloud Free Tier ATP 19c database, using tables with hundreds of thousands to over a million rows. Here are the results.
select count(*) from table1;
COUNT(*)
--------
289327
Elapsed: 00:00:00.007
1 rows selected.
select count(*) from table2;
COUNT(*)
--------
1445328
Elapsed: 00:00:00.024
1 rows selected.
with g as (
select group_id, count(*) role_cnt from table1 group by group_id having count(*) > 1),
u as (
select u.user_id, g.group_id, count(*) usergroup_role_cnt from table1 g, table2 u
where u.role=g.role
group by u.user_id, g.group_id)
select u.user_id, g.group_id from g, u
where u.group_id=g.group_id
and u.usergroup_role_cnt=g.role_cnt;
USER_ID GROUP_ID
------- --------
user99 994
user97 462
user97 199
user87 35
user99 462
user87 179
user99 199
user98 199
user96 199
user87 813
user87 941
user96 994
user97 994
user96 462
user98 462
user98 994
Elapsed: 00:00:04.770
16 rows selected.
select a2.user_id, a1.group_id
from (
select user_id, cast(collect(role) as role_list) as user_roles
from table2
group by user_id
) a2
inner join
(
select group_id, cast(collect(role) as role_list) as group_roles
from table1
group by group_id
) a1
on a1.group_roles submultiset of a2.user_roles
order by user_id, group_id
;
USER_ID GROUP_ID
------- --------
user87 35
user87 179
user87 813
user87 941
user96 199
user96 462
user96 994
user97 199
user97 462
user97 994
user98 199
user98 462
user98 994
user99 199
user99 462
user99 994
Elapsed: 00:01:35.395
16 rows selected.
select q2.user_id, q1.group_id
from (select distinct user_id from table2) q2
cross join
(select distinct group_id from table1) q1
where not exists
(
select role
from table1
where group_id = q1.group_id
and role not in
(
select role
from table2
where user_id = q2.user_id
)
)
order by user_id, group_id
;
(Manually cancelled after 10 min)
The data was generated fairly randomly and is somewhat skewed, but in reality skewed data is unavoidable, and statistics should minimize the impact. (Both tables were analyzed and have no indexes.)
Here is a solution that will work in Oracle Database; adapt it for SQL Server (if that is possible; I don't know that dialect).
Test data (others may use this too):
create table table1(id number, role varchar2(10), group_id number);
insert into table1 (id, role, group_id)
select 1, 'A', 1 from dual union all
select 2, 'B', 1 from dual union all
select 3, 'A', 2 from dual union all
select 4, 'D', 2 from dual union all
select 5, 'A', 3 from dual union all
select 6, 'B', 3 from dual union all
select 7, 'C', 3 from dual
;
create table table2 (user_id varchar2(20), role varchar2(10));
insert into table2 (user_id, role)
select 'user1', 'A' from dual union all
select 'user1', 'B' from dual union all
select 'user2', 'C' from dual union all
select 'user2', 'D' from dual union all
select 'user3', 'A' from dual union all
select 'user3', 'D' from dual union all
select 'user4', 'C' from dual union all
select 'user5', 'A' from dual union all
select 'user5', 'B' from dual union all
select 'user5', 'C' from dual union all
select 'user5', 'D' from dual
;
commit;
Create a user-defined data type (a collection of strings representing roles):
create or replace type role_list as table of varchar2(10);
/
Query and output:
select a2.user_id, a1.group_id
from (
select user_id, cast(collect(role) as role_list) as user_roles
from table2
group by user_id
) a2
inner join
(
select group_id, cast(collect(role) as role_list) as group_roles
from table1
group by group_id
) a1
on a1.group_roles submultiset of a2.user_roles
order by user_id, group_id
;
USER_ID GROUP_ID
------------ ----------
user1 1
user3 2
user5 1
user5 2
user5 3
The strategy is pretty obvious, and should be easy to read directly from the code. Group the roles by group_id in the first table and by user_id in the second. Identify all the pairs (user, group) where all roles for the group are found in the role list of the user - that is exactly what the submultiset comparison operator does.
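The submultiset comparison can also be mimicked outside the database. This minimal Python sketch assumes `collections.Counter` as the multiset type. Counter subtraction drops non-positive counts, so an empty difference means that every role of the group, with multiplicity, appears in the user's roles. It uses the test data above, and the helper name `is_submultiset` is hypothetical.

```python
from collections import Counter

table1 = [(1, "A", 1), (2, "B", 1), (3, "A", 2), (4, "D", 2),
          (5, "A", 3), (6, "B", 3), (7, "C", 3)]
table2 = [("user1", "A"), ("user1", "B"), ("user2", "C"), ("user2", "D"),
          ("user3", "A"), ("user3", "D"), ("user4", "C"),
          ("user5", "A"), ("user5", "B"), ("user5", "C"), ("user5", "D")]

# Counterpart of CAST(COLLECT(role) AS role_list) ... GROUP BY group_id / user_id
group_roles, user_roles = {}, {}
for _, role, group_id in table1:
    group_roles.setdefault(group_id, Counter())[role] += 1
for user_id, role in table2:
    user_roles.setdefault(user_id, Counter())[role] += 1

def is_submultiset(small, big):
    # Counter subtraction keeps only positive counts: empty means small <= big.
    return not (small - big)

pairs = sorted(
    (user_id, group_id)
    for user_id, u_roles in user_roles.items()
    for group_id, g_roles in group_roles.items()
    if is_submultiset(g_roles, u_roles)
)
print(pairs)
```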
A more rudimentary query might look like the one below. It is harder to follow and maintain, and likely slower, but it may be helpful because it should work with few or no changes in pretty much any SQL dialect. It assumes that role can't be null in table2, which makes the query slightly simpler:
select q2.user_id, q1.group_id
from (select distinct user_id from table2) q2
cross join
(select distinct group_id from table1) q1
where not exists
(
select role
from table1
where group_id = q1.group_id
and role not in
(
select role
from table2
where user_id = q2.user_id
)
)
order by user_id, group_id
;
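Since this query is meant to be portable, one way to check it is with Python's built-in sqlite3 module. SQLite as the test dialect is an assumption here. The sketch below runs the query unchanged against the test data above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER, role TEXT, group_id INTEGER);
    CREATE TABLE table2 (user_id TEXT, role TEXT);
    INSERT INTO table1 VALUES
        (1, 'A', 1), (2, 'B', 1), (3, 'A', 2), (4, 'D', 2),
        (5, 'A', 3), (6, 'B', 3), (7, 'C', 3);
    INSERT INTO table2 VALUES
        ('user1', 'A'), ('user1', 'B'), ('user2', 'C'), ('user2', 'D'),
        ('user3', 'A'), ('user3', 'D'), ('user4', 'C'),
        ('user5', 'A'), ('user5', 'B'), ('user5', 'C'), ('user5', 'D');
""")

# Keep a (user, group) pair unless the group has a role the user lacks.
query = """
SELECT q2.user_id, q1.group_id
FROM (SELECT DISTINCT user_id FROM table2) q2
CROSS JOIN (SELECT DISTINCT group_id FROM table1) q1
WHERE NOT EXISTS (
    SELECT role
    FROM table1
    WHERE group_id = q1.group_id
      AND role NOT IN (SELECT role FROM table2 WHERE user_id = q2.user_id)
)
ORDER BY user_id, group_id
"""
result = con.execute(query).fetchall()
print(result)
```

This is the classic "double negation" form of relational division: a pair survives when no role of the group is missing from the user's roles.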

SQL find nearest number

Say I have a table like the following (I'm on Oracle 10g, by the way):
NAME VALUE
------ ------
BOB 1
BOB 2
BOB 4
SUZY 1
SUZY 2
SUZY 3
How can I select all rows where value is closest to, but not greater than, a given number? For example, if I want to find all the rows where value is closest to 3, I would get:
NAME VALUE
------ ------
BOB 2
SUZY 3
This seems like it should be simple... but I'm having no luck.
Thanks!
SELECT name, max(value)
FROM tbl
WHERE value <= 3
GROUP BY name
This works:
SELECT name, max(value)
FROM mytable
WHERE value <= 3
GROUP BY name
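To try this query outside Oracle, here is a minimal sketch using Python's built-in sqlite3 module. The table name `mytable` comes from the answer above. The helper `nearest_not_greater` and its bind parameter are hypothetical, added so that the target number is not hard-coded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (name TEXT, value INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?)", [
    ("BOB", 1), ("BOB", 2), ("BOB", 4),
    ("SUZY", 1), ("SUZY", 2), ("SUZY", 3),
])

def nearest_not_greater(target):
    # Drop values above the target first, then keep the largest survivor per name.
    return con.execute(
        "SELECT name, MAX(value) FROM mytable WHERE value <= ? "
        "GROUP BY name ORDER BY name",
        (target,),
    ).fetchall()

print(nearest_not_greater(3))  # [('BOB', 2), ('SUZY', 3)]
```

A name with no value at or below the target simply drops out of the result, because the WHERE clause removes all of its rows before grouping.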
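Based on hagensoft's answer:
SELECT *
FROM (
  SELECT name, max(value)
  FROM tbl
  WHERE value <= 3
  GROUP BY name
  ORDER BY name
)
WHERE ROWNUM <= 2
With ROWNUM you can limit the number of output rows, for example to 2. Because ROWNUM is assigned before GROUP BY and ORDER BY are applied, the ROWNUM filter has to go in an outer query, as above; putting it in the inner WHERE clause would limit the input rows instead.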
WITH v AS (
SELECT 'BOB' NAME, 1 value FROM dual
UNION ALL
SELECT 'BOB', 2 FROM dual
UNION ALL
SELECT 'BOB', 4 FROM dual
UNION ALL
SELECT 'SUZY', 1 FROM dual
UNION ALL
SELECT 'SUZY', 2 FROM dual
UNION ALL
SELECT 'SUZY', 3 FROM dual
)
SELECT *
FROM v
WHERE (name, value) IN (SELECT name, MAX(value)
FROM v
WHERE value <= :num
GROUP BY name)
;

I want to convert columns to rows in SQL Server 2000 without using PIVOT

I want to convert columns to rows in SQL Server 2000, without using PIVOT.
For example:
A B C
---- ---- -------
78 68 3
I want the output as:
Projects Count
--------- -------
A 78
B 68
C 3
SELECT
pivot.field,
CASE pivot.field
WHEN 'A' THEN A
WHEN 'B' THEN B
WHEN 'C' THEN C
END as value
FROM
my_table
CROSS JOIN
(SELECT 'A' AS field UNION ALL SELECT 'B' UNION ALL SELECT 'C') AS pivot
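Here is a minimal sketch of the same CROSS JOIN + CASE unpivot, run with Python's built-in sqlite3 module as a stand-in for SQL Server 2000. The derived table is aliased `f` rather than `pivot`, since PIVOT became a reserved keyword in later SQL Server versions. An ORDER BY is added because neither UNION ALL nor the join guarantees row order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (A INTEGER, B INTEGER, C INTEGER)")
con.execute("INSERT INTO my_table VALUES (78, 68, 3)")

# One output row per (source row, field name); CASE picks the matching column.
query = """
SELECT f.field AS Projects,
       CASE f.field WHEN 'A' THEN A WHEN 'B' THEN B WHEN 'C' THEN C END AS my_count
FROM my_table
CROSS JOIN (SELECT 'A' AS field UNION ALL SELECT 'B' UNION ALL SELECT 'C') AS f
ORDER BY f.field
"""
result = con.execute(query).fetchall()
print(result)  # [('A', 78), ('B', 68), ('C', 3)]
```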
If I understand you correctly, you want an unpivot query:
select 'A' as Projects, A as my_count from mytab
union all
select 'B' as Projects, B as my_count from mytab
union all
select 'C' as Projects, C as my_count from mytab
(I've replaced count with my_count, since COUNT is a reserved word in SQL).