SQL: counting rows with the same user_id

I have two tables. Table 1 contains content_ids that meet certain criteria. Table 2 contains the content_id, content, and related user_id. They share a content_id field. I would like to produce a list of who has the most entries in Table 1.
Example
Table 1
content_id {1, 2, 3, 4, 5, 6}
Table 2
content_id|user_id { 1|2 , 2|3 , 3|2 , 4|1 , 5|3, 6|2 }
Desired results
user 2 has 3 entries
user 3 has 2 entries
user 1 has 1 entry
I imagine I need to INNER JOIN the two tables by content_id and then somehow use COUNT or similar?
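Roughly the shape I have in mind, though I haven't tested it (table1/table2 are just placeholder names for the two tables):
select t2.user_id, count(*) as entries
from table1 t1
inner join table2 t2 on t2.content_id = t1.content_id
group by t2.user_id
order by count(*) desc;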

Assuming you want to "filter" by the content of t1, one possibility is:
create table t1 (content_id int);
insert into t1 values (2), (3), (4), (5), (6);
-- note I omitted 1 so not all values are present
create table t2 (content_id int, user_id int);
insert into t2 values (1,2), (2,3), (3,2), (4,1), (5,3), (6,2);
select user_id, count(*)
from t2
where exists (select 1 from t1 where t1.content_id = t2.content_id)
group by user_id;
-- output:
 user_id | count
---------+-------
       3 |     2
       2 |     2
       1 |     1
(3 rows)
-- OR
select user_id, count(*) from (select * from t2 except select 1 as user_id, content_id from t1) AS filtered group by user_id;
-- output:
 user_id | count
---------+-------
       3 |     2
       2 |     2
       1 |     1
(3 rows)
But other answers are already doing this too.

Here is the code snippet that will produce your desired result:
select tb2.user_id, count(tb2.content_id)
from table1 tb1
inner join table2 tb2 on tb1.content_id = tb2.content_id
group by tb2.user_id;

Are you looking for a simple group by?
select user_id, count(*)
from table2
group by user_id;
EDIT:
If you want to restrict the content ids, I recommend exists:
select t2.user_id, count(*)
from table2 t2
where exists (select 1
from table1 t1
where t1.content_id = t2.content_id
)
group by user_id;

I hope this is what you are looking for and that it helps you.
select user_id, count(`content_id`) from table2 t2
where exists (select t1.content_id from table1 t1
where t1.content_id=t2.content_id)
group by user_id order BY count(`content_id`) DESC;
+---------+---------------------+
| user_id | count(`content_id`) |
+---------+---------------------+
|       2 |                   3 |
|       3 |                   2 |
|       1 |                   1 |
+---------+---------------------+

Related

SQL - How to pick the best available value for each column for each ID from multiple tables?

I have two tables with the same variables referring to attributes of a person.
How can I combine data from two such tables, picking the best available value for each column for each person?
Requirements:
For each field, I would like to fill it with a value from either one of the tables, giving a preference to table 1.
Values can be NULL in either table
In the combined table, the value for column 1 could come from table 2 (in case table 1 is missing a value for that person) and the value for column 2 could come from table 1 (because both tables had a value, but the value from table 1 is preferred).
In my real example, I have many columns, so an elegant solution with less code duplication would be preferred.
Some users may exist in only one of the tables.
Example:
Table 1:
user_id | age | income
1 | NULL| 58000
2 | 22 | 60000
4 | 19 | 35000
Table 2:
user_id | age | income
1 | 55 | 55000
2 | 19 | NULL
3 | 22 | 33200
Desired output:
user_id | age | income
1 | 55 | 58000
2 | 22 | 60000
3 | 22 | 33200
4 | 19 | 35000
I think that's a full join and prioritization logic with coalesce():
select user_id,
       coalesce(t1.age, t2.age) as age,
       coalesce(t1.income, t2.income) as income
from table1 t1
full join table2 t2 using (user_id);
Use full outer join if user_id in each table is unique.
SELECT
COALESCE(t1.user_id, t2.user_id) AS user_id,
GREATEST(t1.age, t2.age) AS age,
GREATEST(t1.income, t2.income) AS income
FROM t1
FULL OUTER JOIN t2 ON t1.user_id = t2.user_id
try like below, using coalesce() and a case expression
select t1.user_id, coalesce(t1.age, t2.age) as age,
       case when t1.income > t2.income then t1.income else t2.income end as income
from table1 t1
join table2 t2 on t1.user_id = t2.user_id
You can use below code:
With TableA(Id,age,income) as
( --Select Common Data
select table_1.id,
--Select MAX AGE
case
when table_1.age> table_2.age or table_2.age is null then table_1.age else table_2.age
end,
--Select MAX Income
case
when table_1.income>table_2.income or table_2.income is null then table_1.income else table_2.income
end
from table_1 inner join table_2 on table_2.id=table_1.id
union all
-- Select Specific Data of Table 2
select table_2.id,table_2.age,table_2.income
from table_2
where table_2.id not in (select table_1.id from table_1)
union all
-- Select Specific Data of Table 1
select table_1.id,table_1.age,table_1.income
from table_1
where table_1.id not in (select table_2.id from table_2)
)select * from TableA

How to select from a column with a list of ids in postgresql

I've got mytable1 with row_number integer and list_of_ids int[] columns,
and mytable2 with id integer and company text columns.
Example entry for mytable1
1 | {633681,1278392,2320888,2200426}
2 | {2443842,2959599,3703823,3330376,915750,941736}
Example entry for mytable2
633681 | apple
1278392 | charmander
2320888 | apple
2200426 | null
2443842 | batman
I need to feed back values from mytable2 into mytable1. This way the expected output would be
1 | {633681,1278392,2320888,2200426} | 2 apple, 1 charmander, 1 null
2 | {2443842,2959599,3703823,3330376,915750,941736} | 1 batman etc...
You need to unnest the lists of ids, join mytable2 using unnested ids and finally aggregate back the data to get a single row for a row_number.
select
  row_number,
  list_of_ids,
  string_agg(format('%s %s', count, company), ', ' order by count desc, company)
from (
  select
    row_number,
    list_of_ids,
    coalesce(company, '<null>') as company,
    count(*)
  from (
    select row_number, list_of_ids, unnest(list_of_ids) as id
    from mytable1
  ) t1
  join mytable2 t2 using(id)
  group by 1, 2, 3
) s
group by 1, 2
Db<>fiddle.

Select data in tables by conditions between 2 tables which are not linked

I have a large select with a lot of inner joins.
In the select I have an array_agg function for one set of data.
This array contains only one column of a table, but now I want to append data from another table at the end of the array. The data I need to add is not directly linked to the table from which I take that column.
Query example:
select
origin_table.x,
origin_table.y,
array_agg(table1.data) ...
from
origin_table
inner join ... inner join ... full join table1 on
table.origin_table_id = origin_table.id ...
group by
...
Result array:
ID 1: example_data, {baba, bobo}
ID 2: example_data, {bibi, bubu}
Example of my tables:
table 1:
id | data | origin_table_id
----+---------+----------
1 | baba | 1
2 | bobo | 1
3 | bibi | 2
4 | bubu | 2
table 2:
id | data_bis
---+---------
1 | byby
2 | bebe
origin table:
id | table2_id
---+----------
1 | 2
2 | 1
Expected result with the 3 tables:
ID 1: example_data, {baba, bobo, bebe}
ID 2: example_data, {bibi, bubu, byby}
But got :
ID 1: example_data, {baba, bobo, bebe, byby}
ID 2: example_data, {bibi, bubu, bebe, byby}
What I need is:
How to keep all the data of table 1 which respects the condition, and append to it the single matching table 2 value, but not all elements of that table.
Try the query below:
create table tab1 (id integer,data character varying,origin_table_id integer);
insert into tab1
select 1,'baba',1
union all
select 2,'bobo',1
union all
select 3,'bibi',2
union all
select 4,'bubu',2
create table tab2 (id integer,data_bis character varying);
insert into tab2
select 1,'byby'
union all
select 2,'bebe'
create table OriginalTable (id integer,table2_id integer);
insert into OriginalTable
select 1,2
union all
select 2,1
select * from OriginalTable
select origin_table_id, data
from tab1
union all
select OriginalTable.id, data_bis
from OriginalTable
join tab2 on tab2.id = OriginalTable.table2_id
order by origin_table_id
Result:
1;"baba"
1;"bobo"
1;"bebe"
2;"bibi"
2;"bubu"
2;"byby"
I can give you some idea and sample code to achieve your requirement.
First you can UNION ALL table 'table 1' and 'table 2' using the relation table 'origin table'.
WITH CTE
AS
(
SELECT AA.id, AA.data AS data_bis, AA.origin_table_id
FROM table_1 AA
UNION ALL
SELECT NULL id, A.data_bis,B.id origin_table_id
FROM table_2 A
INNER JOIN origin_table B
ON A.id = B.table2_id
)
SELECT * FROM CTE
After applying UNION ALL, the data will look like below-
data_bis origin_table_id
baba 1
bobo 1
bibi 2
bubu 2
byby 2
bebe 1
Now you can apply 'string_agg' on your data as below-
SELECT origin_table_id,string_agg(DISTINCT data_bis,',')
FROM CTE
GROUP BY origin_table_id
And the output will be-
origin_table_id string_agg
1 baba,bebe,bobo
2 bibi,bubu,byby
Now, you can apply further JOINING to this data as per your requirement. You can check DEMO HERE
Please keep in mind that this is not an exact solution to your issue, just an idea...
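For example, a rough sketch of that further joining, reusing the CTE above, aggregating with array_agg as in your original query, and assuming origin_table also has the x and y columns you select there (untested):
WITH CTE AS
(
    SELECT AA.id, AA.data AS data_bis, AA.origin_table_id
    FROM table_1 AA
    UNION ALL
    SELECT NULL id, A.data_bis, B.id origin_table_id
    FROM table_2 A
    INNER JOIN origin_table B ON A.id = B.table2_id
)
SELECT O.x, O.y, array_agg(C.data_bis ORDER BY C.data_bis) AS all_data
FROM origin_table O
INNER JOIN CTE C ON C.origin_table_id = O.id
GROUP BY O.id, O.x, O.y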

How to do selection in PostgreSQL with join when more than one row satisfies requirements?

How do I do a selection to get a JSON array in one cell when doing an INNER JOIN and there is more than one value to join?
ex Tables:
T1:
id | name
1 Tom
2 Dom
T2:
user_id | product
1 Milk
2 Cookies
2 Banana
Naturally I do SELECT * FROM T1 INNER JOIN T2 ON T1.id = T2.user_id.
But then I get:
id | Name | product
1 Tom Milk
2 Dom Cookies
2 Dom Banana
But I want to get:
id | Name | product
1 Tom [{"product":"Milk"}]
2 Dom [{"product":"Cookies"}, {"product":"Banana"}]
If I do something with agg functions, then I need to put everything else in GROUP BY, where I have at least 10 arguments. And the whole query takes more than 5 minutes.
My T1 is around 4000 rows and T2 around 300 000 rows, each associated with some row in T1.
Is there a better way?
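For reference, here is roughly what I mean by the aggregate/GROUP BY version, simplified to the example tables (untested):
SELECT T1.id, T1.name,
       json_agg(json_build_object('product', T2.product)) AS product
FROM T1
INNER JOIN T2 ON T1.id = T2.user_id
GROUP BY T1.id, T1.name;
-- in the real query every other selected column has to be repeated in GROUP BY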
Using LATERAL you can solve it as in the example below:
-- The query
SELECT *
FROM table1 t1,
LATERAL ( SELECT jsonb_agg(
jsonb_build_object( 'product', product )
)
FROM table2
WHERE user_id = t1.id
) t2( product );
-- Result
id | name | product
----+------+-------------------------------------------------
1 | Tom | [{"product": "Milk"}]
2 | Dom | [{"product": "Cookies"}, {"product": "Banana"}]
(2 rows)
-- Test data
CREATE TABLE IF NOT EXISTS table1 (
id int,
"name" text
);
INSERT INTO table1
VALUES ( 1, 'Tom' ),
( 2, 'Dom' );
CREATE TABLE IF NOT EXISTS table2 (
user_id int,
product text
);
INSERT INTO table2
VALUES ( 1, 'Milk' ),
( 2, 'Cookies' ),
( 2, 'Banana' );

SQL right join, force return only one value from right hand side

table 1
---
id , name
table2
---
id , activity, datefield
table1 'right join' table2 will return more than one result from the right table (table2). How do I make it return only one row from table2, the one with the highest date?
You give little information about your problem, but I'll try to make an example to help you.
You have a table "A" and a table "B", and you need to fetch the "top" date of table "B" that is related to table "A".
Example tables:
Table A:
AID| NAME
----|-----
1 | Foo
2 | Bar
Table B:
BID | AID | DateField
----| ----| ----
1 | 1 | 2000-01-01
2 | 1 | 2000-01-02
3 | 2 | 2000-01-01
If you do this sql:
SELECT * FROM A RIGHT JOIN B ON B.AID = A.AID
You get all information of A and B that is related by AID (which in this theoretical case is the field common to both tables that links the relation)
A.AID | A.NAME | B.BID | B.AID | B.DateField
------|--------|-------|-------|--------------
1 | Foo | 1 | 1 | 2000-01-01
1 | Foo | 2 | 1 | 2000-01-02
2 | Bar | 3 | 2 | 2000-01-01
But you require only the last date for each element of Table A (the top date of B).
So to get only the top date, you need to group your query by B.AID and fetch only the top date:
SELECT
B.AID, First(A.NAME), MAX(B.DateField)
FROM
A RIGHT JOIN B ON B.AID = A.AID
GROUP BY
B.AID
And The result of this operation is:
B.AID | A.NAME | B.DateField
------|--------|--------------
1 | Foo | 2000-01-02
2 | Bar | 2000-01-01
In this result I removed some fields that are duplicated (like A.AID and B.AID that is the relationship between the two tables) or are not required.
Tip: this also works if you have more tables in the SQL. The SQL runs the query and then applies a grouping, using B to limit the repetitions to B's top date.
My idea: right join on table1.id to a subquery that selects id, max(datefield) from table2 grouped by id.
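Spelled out, that idea might look something like the sketch below (the alias m and the extra join back to table2 for the activity column are my assumptions, untested):
select table1.id, table1.name, table2.activity, table2.datefield
from table1
right join (select id, max(datefield) as maxdate
            from table2
            group by id) m on m.id = table1.id
join table2 on table2.id = m.id and table2.datefield = m.maxdate;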
Analytics!
Test data:
create table t1
(id number primary key,
name varchar2(20) not null
);
create table t2
(id number not null,
activity varchar2(20) not null,
datefield date not null
);
insert into t1 values (1, 'foo');
insert into t1 values (2, 'bar');
insert into t1 values (3, 'baz');
insert into t2 values (1, 'foo activity 1', date '2009-01-01');
insert into t2 values (2, 'bar activity 1', date '2009-01-01');
insert into t2 values (2, 'bar activity 2', date '2010-01-01');
Query:
select id, name, activity, datefield
from (select t1.id, t1.name, t2.id as t2_id, t2.activity, t2.datefield,
max(datefield) over (partition by t1.id) as max_datefield
from t1
left join t2
on t1.id = t2.id
)
where ( (t2_id is null) or (datefield = max_datefield) )
The outer where clause will filter out all but the maximum date from t2 tuples, but leave in the null row where there was no matching row in t2.
Results:
ID NAME ACTIVITY DATEFIELD
---------- -------- ------------------- -------------------
1 foo foo activity 1 2009-01-01 00:00:00
2 bar bar activity 2 2010-01-01 00:00:00
3 baz
To retrieve the Top N records from a query, you can use the following syntax:
SELECT *
FROM (your ordered by datefield desc query with join) alias_name
WHERE rownum <= 1
ORDER BY rownum;
PS: I am not familiar with PL/SQL so maybe I'm wrong
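For instance, a sketch of that pattern against the t1/t2 test tables above (Oracle); note it returns just the single newest joined row overall, not one per id:
SELECT *
FROM (SELECT t1.id, t1.name, t2.activity, t2.datefield
      FROM t1
      RIGHT JOIN t2 ON t1.id = t2.id
      ORDER BY t2.datefield DESC) q
WHERE ROWNUM <= 1;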
My solution is
select *
from table1
right join table2 on (table1.id = table2.id
  and table2.datefield = (select max(t2b.datefield) from table2 t2b where t2b.id = table1.id))