Select from cross-reference based on inclusion (column values being superset) - sql

Given a cross-reference table t relating table a with b:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a superset of a given set?
For example, for the set (2,3), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
Since a_id 1 and 3 are the only set of b_id that is a superset of (2,3).
The best solution I've found so far (thanks to this answer):
select id
from a
where 2 = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
But I'd prefer to avoid calculating stuff like cardinality before running the query.

You can simply adapt the query as:
select id
from a cross join
(select count(*) as cnt
from t
where . . .
) x
where x.cnt = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround by harnessing distinct that gave desired result but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on(id), app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count.
Your question is really complicated with a very complicated SQL query. However, the above is what looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.

Count the number of appearances of char given a ID

I have to perform a query where I can count the number of distinct codes per Id.
|Id | Code
------------
| 1 | C
| 1 | I
| 2 | I
| 2 | C
| 2 | D
| 2 | D
| 3 | C
| 3 | I
| 3 | D
| 4 | I
| 4 | C
| 4 | C
The output should be something like:
|Id | Count | #Code C | #Code I | #Code D
-------------------------------------------
| 1 | 2 | 1 | 1 | 0
| 2 | 3 | 1 | 0 | 2
| 3 | 3 | 1 | 1 | 1
| 4 | 2 | 2 | 1 | 0
Can you give me some advise on this?
This answers the original version of the question.
You are looking for count(distinct):
select id, count(distinct code)
from t
group by id;
If the codes are only to the provided ones, the following query can provide the desired result.
select
pvt.Id,
codes.total As [Count],
COALESCE(C, 0) AS [#Code C],
COALESCE(I, 0) AS [#Code I],
COALESCE(D, 0) AS [#Code D]
from
( select Id, Code, Count(code) cnt
from t
Group by Id, Code) s
PIVOT(MAX(cnt) FOR Code IN ([C], [I], [D])) pvt
join (select Id, count(distinct Code) total from t group by Id) codes on pvt.Id = codes.Id ;
Note: as I can see from sample input data, code 'I' is found in all of Ids. Its count is zero for Id = 3 in the expected output (in the question).
Here is the correct output:
DB Fiddle

how to "deepcopy" rows

My question is similar to this one but more involved. Suppose I have a table A with id idA, and another table B with idB and foreign key idA. I would like to duplicate all entries of A, including corresponding entries in B. For example, if I have the following tables at the start:
A
|---|
|idA|
|---|
| 1 |
| 2 |
| 3 |
|---|
B
|---|---|
|idB|idA|
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
|---|---|
Then the result should be:
A
|---|
|idA|
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
|---|
B
|---|---|
|idB|idA|
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 4 |
| 5 | 4 |
| 6 | 5 |
|---|---|
This is quite tricky. You need to insert the ids into the a -- but then be able to match them back to the existing ids to insert the right values into b.
A generic solution looks like this:
with i as (
insert into a
select . . . -- the other columns you want
from a
order by idA
returning *
),
a_mapping (
select a.idA, i.idA as new_idA
from (select a.*, row_number() over (order by idA) as seqnum
from a
) a join
(select i.*, row_number() over (order by idA) as seqnum
from i
) i
on a.seqnum = i.seqnum
)
insert into b (idA) (
select am.new_idA
from b join
a_mapping am
on b.idA = am.idA;
Note: If you have another unique column or columns in the row, then the mapping is a little easier to generate. Of course, if you are copying all the columns, then nothing else is unique, so you do need the row_number().
Of course, for your very simple example, you don't need a mapping table. You can just use:
with i as (
insert into a
select . . . -- the other columns you want
from a
order by idA
returning *
)
insert into b (idA) (
select i.idA
from i
I have an approach which may be equivalent to what Gordon Linoff suggests, I would be grateful if you could point out any flaws!
Let's set up the tables:
CREATE TABLE A(
idA SERIAL PRIMARY KEY,
txt varchar);
INSERT INTO A(txt)
VALUES ('A1'), ('A2'),('A3');
CREATE TABLE B(
idB SERIAL PRIMARY KEY,
idA int REFERENCES A(idA),
txt varchar);
INSERT INTO B(idA, txt)
VALUES (1, 'A1.B1'), (1, 'A1.B2'), (2, 'A2.B1');
so the initial data looks as follows:
SELECT * FROM (A LEFT JOIN B ON A.idA=B.idA) ORDER BY A.idA, B.idB;
ida | txt | idb | ida | txt
-----+-----+-----+-----+-------
1 | A1 | 1 | 1 | A1.B1
1 | A1 | 2 | 1 | A1.B2
2 | A2 | 3 | 2 | A2.B1
3 | A3 | | |
(4 rows)
Now, we can use the NEXTVAL function to generate the mappings directly:
CREATE TEMP TABLE tmp_A_new AS (
SELECT *, NEXTVAL('A_idA_seq') as newidA
FROM A ORDER BY idA -- order probably not needed
);
INSERT INTO A(idA, txt) (SELECT newidA, txt FROM tmp_A_new);
CREATE TEMP TABLE tmp_B_new AS (
SELECT B.idB, newidA, B.txt, NEXTVAL('B_idB_seq') as newidB
FROM B, tmp_A_new WHERE B.idA=tmp_A_new.idA ORDER BY idB
);
INSERT INTO B(idB, idA, txt) (SELECT newidB, newidA, txt FROM tmp_B_new);
The results look correct:
SELECT * FROM (A LEFT JOIN B ON A.idA=B.idA) ORDER BY A.idA, B.idB;
ida | txt | idb | ida | txt
-----+-----+-----+-----+-------
1 | A1 | 1 | 1 | A1.B1
1 | A1 | 2 | 1 | A1.B2
2 | A2 | 3 | 2 | A2.B1
3 | A3 | | |
4 | A1 | 4 | 4 | A1.B1
4 | A1 | 5 | 4 | A1.B2
5 | A2 | 6 | 5 | A2.B1
6 | A3 | | |
(8 rows)
Note that this could be continued further down to C, D, etc.
I would be glad for any comments :)

Select from cross-reference based on inclusion (column values being subset)

Suppose I have a cross-reference table t with the following data:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a subset of a given set?
For example, for some set (1,2,3,4,5), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
Since a_id 1 and 3 are the only set of b_id that is a subset of (1,2,3,4,5).
Hmmm . . . One way uses aggregation:
select a_id
from t
group by a_id
having sum(case when b_id not in (1, 2, 3, 4, 5) then 1 else 0 end) = 0;
However, assuming you have an a table, then I prefer this method:
select a_id
from a
where not exists (select 1
from t
where t.a_id = a.a_id and t.b_id not in (1, 2, 3, 4, 5)
);
This saves the expense of aggregation and the lookup can take advantage of an appropriate index (on t(a_id, b_id)) so this should have better performance.

How to generate an order index value (as in the order of a list) in a SELECT statement

Suppose I have these two tables :
TABLEA TABLEB
----------- -----------
ID | NAME ID | TABLEA_ID | NAME
1 | ... 1 | 1 | ...
2 | 2 | 2 | ...
3 | 3 | 2 |
4 | 4 | 2 |
5 | 3 |
6 | 3 |
7 | 4 |
8 | 2 |
I want an SQL SELECT statement that can generate such result when TABLEA.ID = TABLEB.TABLEA_ID, you can note here I don't care about grouping or ordering, I just want to generate a incremented value for each line of the same TABLEB.TABLEA_ID.
ID | TABLEA_ID | ORDER_INDEX | NAME
1 | 1 | 0 | ...
2 | 2 | 0 | ...
3 | 2 | 1 |
4 | 2 | 2 |
5 | 3 | 0 |
6 | 3 | 1 |
7 | 4 | 0 |
8 | 2 | 3 |
I tried without success to use rownum in several combination of sub-selects to generate the ORDER_INDEX depending on the value in TABLEA_ID.
Do you have hint to do that in plain SQL, is it even possible with plain SQL.
Is it possible via a PL/SQL ? And how if possible ?
Thank you very much in advance.
I believe that this is what you want:
SELECT B.ID, B.TABLEA_ID,
ROW_NUMBER() OVER(PARTITION BY B.TABLEA_ID ORDER BY B.ID) - 1 ORDER_INDEX,
B.NAME -- OR A.NAME, its not clear on your question
FROM TABLEB B
LEFT JOIN TABLEA A
ON B.TABLEA_ID = A.ID
Something like this:
SELECT
TableB.ID,
TableB.TableA_ID,
ROW_NUMBER() OVER(PARTITION BY TableB.TableA_ID ORDER BY TableB.TableA_ID) AS ORDER_INDEX,
TableB.Name
FROM
TableA
JOIN TableB
ON TableA.ID=TableB.TableA_ID
ORDER BY TableB.ID
How about
ROW_NUMBER() OVER (PARTITION BY TABLEA_ID ORDER BY ID ASC) AS ORDER_INDEX
as the definition of ORDER_INDEX