How to perform "Select Count" with complicated "Where" statement to compute co-occurrences? - sql

Here is an example to illustrate my problem:
Suppose we have a table (Tags) with two columns, like this:
UserID -------------------------------- Tag
1 -------------------------------------- SQL
1 -------------------------------------- Select
1 -------------------------------------- DB
2 -------------------------------------- SQL
2 -------------------------------------- Programming
2 -------------------------------------- Code
2 -------------------------------------- Software
3 -------------------------------------- Code
4 -------------------------------------- SQL
4 -------------------------------------- Code
For each tag, I need to count the DISTINCT other tags that co-occur with it under the same UserID.
So, the output should be like this (with Order by Co-occurrences desc):
Tag -------------------------------- Co-occurrences
---------------------------------------------
SQL --------------------------------------- 5
Programming ------------------------------- 3
Code -------------------------------------- 3
Software ---------------------------------- 3
Select ------------------------------------ 2
DB ---------------------------------------- 2
This is just an example..
How can I make a Select statement that can do this?
I came up with one way but for only ONE specific tag:
SELECT count (distinct (Tag)) - 1 as Co_occurrences
FROM Tags
WHERE Tag is NOT NULL and UserID in
( SELECT UserID
FROM Tags
where tag = 'SQL')
Is it possible to change the above statement to make it general for all tags in the table?

SELECT t2.tag, count (distinct (t1.Tag)) - 1 as Co_occurrences
FROM Tags t1 inner join
Tags t2 on t1.UserId = t2.UserId
GROUP BY t2.tag
ORDER BY count (distinct (t1.Tag)) desc

A GROUP BY is what you are looking for:
SELECT
UserID,
Tag,
COUNT(DISTINCT Tag) - 1 AS Co_occurrences
FROM Tags
GROUP BY UserID, Tag
ORDER BY UserID, Tag
Edit: As mentioned in the comments, the above does not answer the question. I adapted @OSA-E's answer slightly to make explicit what the -1 after the count is doing: it excludes the tag itself, which the WHERE clause below does directly.
SELECT
[t1].[Tag],
COUNT(DISTINCT [t2].[Tag]) AS [Co_occurrences]
FROM [Tags] [t1]
INNER JOIN [Tags] [t2] ON [t1].[UserID] = [t2].[UserID]
WHERE [t1].[Tag] <> [t2].[Tag]
GROUP BY [t1].[Tag]
ORDER BY [Co_occurrences] DESC
Here is the Fiddle.
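To check that the two variants above agree on the sample data, here is a minimal sketch that loads the question's rows into an in-memory SQLite database from Python and runs both queries. SQLite and the tie-breaking sort on the tag name are my assumptions for the sake of a reproducible run; the SQL itself is taken from the answers.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tags (UserID INTEGER, Tag TEXT)")
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?)",
    [(1, "SQL"), (1, "Select"), (1, "DB"),
     (2, "SQL"), (2, "Programming"), (2, "Code"), (2, "Software"),
     (3, "Code"), (4, "SQL"), (4, "Code")],
)

# Variant 1: count every tag sharing a user, then subtract the tag itself.
minus_one = conn.execute("""
    SELECT t2.Tag, COUNT(DISTINCT t1.Tag) - 1 AS Co_occurrences
    FROM Tags t1
    INNER JOIN Tags t2 ON t1.UserID = t2.UserID
    GROUP BY t2.Tag
    ORDER BY Co_occurrences DESC, t2.Tag
""").fetchall()

# Variant 2: drop the self-pairs up front, so no subtraction is needed.
filtered = conn.execute("""
    SELECT t1.Tag, COUNT(DISTINCT t2.Tag) AS Co_occurrences
    FROM Tags t1
    INNER JOIN Tags t2 ON t1.UserID = t2.UserID
    WHERE t1.Tag <> t2.Tag
    GROUP BY t1.Tag
    ORDER BY Co_occurrences DESC, t1.Tag
""").fetchall()

for tag, n in filtered:
    print(tag, n)
```

One difference worth keeping in mind: a tag that never shares a user with any other tag shows up with 0 in the first variant, but disappears from the second, because the WHERE clause removes all of its rows before grouping.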

Related

Postgres, groupBy and count for table and relations at the same time

I have a table called 'users' that has the following structure:
id (PK) | campaign_id | createdAt
1 | 123 | 2022-07-14T10:30:01.967Z
2 | 1234 | 2022-07-14T10:30:01.967Z
3 | 123 | 2022-07-14T10:30:01.967Z
4 | 123 | 2022-07-14T10:30:01.967Z
At the same time I have a table that tracks clicks per user:
id (PK) | user_id (FK) | createdAt
1 | 1 | 2022-07-14T10:30:01.967Z
2 | 2 | 2022-07-14T10:30:01.967Z
3 | 2 | 2022-07-14T10:30:01.967Z
4 | 2 | 2022-07-14T10:30:01.967Z
Both of these tables hold up to millions of records, so I need the most efficient query to group the data per campaign_id.
The result I am looking for would look like this:
campaign_id | total_users | total_clicks
123 | 3 | 1
1234 | 1 | 3
Unfortunately, I have no idea how to achieve this with good performance. Most importantly, I need to use WHERE or HAVING to limit the query to a certain time range by createdAt.
Note: neither PostgreSQL nor SQL in general is my forte, but I'm learning by spending some time on your question. Try an INNER JOIN of two separate SELECT statements:
SELECT * FROM
(
SELECT campaign_id, COUNT (t1."id(PK)") total_users FROM t1 GROUP BY campaign_id
) tbl1
INNER JOIN
(
SELECT campaign_id, COUNT (t2."user_id(FK)") total_clicks FROM t2 INNER JOIN t1 ON t1."id(PK)" = t2."user_id(FK)" GROUP BY campaign_id
) tbl2
USING(campaign_id)
See an online fiddle. I believe this is now also ready for a WHERE clause in both SELECT statements to filter by "createdAt". I'm pretty sure someone else will come up with something better.
Good luck.
Hope this will help you.
select u.campaign_id,
count(distinct u.id) users_count,
count(c.user_id) clicks_count
from
users u left join clicks c on u.id=c.user_id
group by 1;
See here query output
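Since the question also asks for a time range on createdAt, here is a rough sketch of how the LEFT JOIN query above could be restricted. It runs against an in-memory SQLite database from Python purely so it can be tried out; the range bounds and the decision to filter both users and clicks by the same range are my assumptions, since the question does not say which table the range applies to. The click filter goes in the ON clause rather than in WHERE, so users without matching clicks are still counted.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, campaign_id INTEGER, createdAt TEXT);
    CREATE TABLE clicks (id INTEGER PRIMARY KEY, user_id INTEGER, createdAt TEXT);
    INSERT INTO users VALUES
        (1, 123,  '2022-07-14T10:30:01.967Z'),
        (2, 1234, '2022-07-14T10:30:01.967Z'),
        (3, 123,  '2022-07-14T10:30:01.967Z'),
        (4, 123,  '2022-07-14T10:30:01.967Z');
    INSERT INTO clicks VALUES
        (1, 1, '2022-07-14T10:30:01.967Z'),
        (2, 2, '2022-07-14T10:30:01.967Z'),
        (3, 2, '2022-07-14T10:30:01.967Z'),
        (4, 2, '2022-07-14T10:30:01.967Z');
""")

# Hypothetical half-open time range [start, end).
params = {"start": "2022-07-14T00:00:00.000Z", "end": "2022-07-15T00:00:00.000Z"}

rows = conn.execute("""
    SELECT u.campaign_id,
           COUNT(DISTINCT u.id) AS total_users,
           COUNT(c.id)          AS total_clicks
    FROM users u
    LEFT JOIN clicks c
           ON c.user_id = u.id
          AND c.createdAt >= :start AND c.createdAt < :end  -- keeps click-less users
    WHERE u.createdAt >= :start AND u.createdAt < :end
    GROUP BY u.campaign_id
    ORDER BY u.campaign_id
""", params).fetchall()

for row in rows:
    print(row)
```

In Postgres the mixed-case column would have to be written as "createdAt" in double quotes, and the timestamps would normally be a real timestamp type rather than text. Whether this is fast enough on millions of rows depends on indexes, which this sketch does not address.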

Sql Query: How to Base on the row name to display

I have the table data as listed on below:
name | score
andy | 1
leon | 2
aaron | 3
I want the output below: even though there is no data for jacky, his name should be listed with the score set to 0.
aaron 3
andy 1
jacky 0
leon 2
You didn't specify your DBMS, but the following is 100% standard ANSI SQL:
select v.name, coalesce(t.score, 0) as score
from (
values ('andy'),('leon'),('aaron'),('jacky')
) as v(name)
left join your_table t on t.name = v.name;
The values clause builds up a "virtual table" that contains the names you are interested in. This is then used in a left join, so that all names from the virtual table are returned plus the existing scores from your (unnamed) table. For non-existing scores, NULL is returned, which is turned into 0 using coalesce().
If you only want to specify the missing names, you can use a UNION in the virtual table:
select v.name, coalesce(t.score, 0) as score
from (
select t1.name
from your_table t1
union
select *
from ( values ('jacky')) as x(name)
) as v(name)
left join your_table t on t.name = v.name;
I fixed the query and it now lists the data, but jacky is still missing; it only returns the rows shown below. The DBMS is SQL Server 2008.
data
name score scoredate
andy 1 2021-08-10 01:23:16
leon 2 2021-08-10 03:25:16
aaron 3 2021-08-10 06:25:16
andy 4 2021-08-10 11:25:16
leon 5 2021-08-10 13:25:16
result set
name | score
aaron | 3
andy | 5
leon | 7
select v.name as Name,
coalesce(sum(t.score),0) as Score
from (
values ('aaron'), ('andy'), ('jacky'), ('leon')
) as v(name)
left join Score t on t.name=v.name
where scoredate>='2021-08-10 00:00:00'
and scoredate<='2021-08-10 23:59:59'
group by v.name
order by v.name asc
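A likely reason jacky disappears in the query above: the WHERE clause is applied after the left join, and jacky's row has a NULL scoredate, so it fails both comparisons and is filtered out. A minimal sketch of one way around this is to move the date conditions into the ON clause. It is run here against an in-memory SQLite database from Python only so it can be tested; the UNION ALL name list replaces the VALUES constructor for portability, and the half-open date range is my own choice.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server 2008 (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Score (name TEXT, score INTEGER, scoredate TEXT)")
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?)",
    [("andy", 1, "2021-08-10 01:23:16"),
     ("leon", 2, "2021-08-10 03:25:16"),
     ("aaron", 3, "2021-08-10 06:25:16"),
     ("andy", 4, "2021-08-10 11:25:16"),
     ("leon", 5, "2021-08-10 13:25:16")],
)

rows = conn.execute("""
    SELECT v.name, COALESCE(SUM(t.score), 0) AS score
    FROM (
        SELECT 'aaron' AS name UNION ALL
        SELECT 'andy'          UNION ALL
        SELECT 'jacky'         UNION ALL
        SELECT 'leon'
    ) AS v
    LEFT JOIN Score t
           ON t.name = v.name
          -- date filter lives in ON, so unmatched names survive with NULLs
          AND t.scoredate >= '2021-08-10 00:00:00'
          AND t.scoredate <  '2021-08-11 00:00:00'
    GROUP BY v.name
    ORDER BY v.name
""").fetchall()

for name, score in rows:
    print(name, score)
```

With the conditions in ON, a name without any score in the range still produces one row with NULLs, which COALESCE then turns into 0.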
Your question lacks a bunch of information, such as where the name "Jacky" comes from. If you have a list of names that you know are not in the table, just use union all:
select name, score
from t
union all
select 'Jacky', 0;

SQL query (Postgres) how to answer that?

I have a table with company IDs (not unique) and an attribute (let's call it status ID) that can be between 1 and 18. The relationship is many-to-many; only the row ID is unique.
Now I need the companies whose rows have only status 1 or 18. If a company has any other status as well (let's say 3), it should not be returned.
The data is stored as row id, some meta data, company id and one status id, the example below is AFTER I ran a group by query.
So as an example if I do group by and string agg, I am getting these values:
Company ID Status
1 1,9,12,18
2 12,13,18
3 1
4 8
5 18
So in this case I need to return only 3 and 5.
You should fix your data model. Here are some reasons:
Storing numbers in strings is BAD.
Storing multiple values in a string is BAD.
SQL has poor string processing capabilities.
Postgres offers many ways to store multiple values -- a junction table, arrays, and JSON come to mind.
For your particular problem, how about an explicit comparison?
where status in ('1', '18', '1,18', '18,1')
You can group by companyid and set 2 conditions in the having clause:
select companyid
from tablename
group by companyid
having
sum((status in (1, 18))::int) > 0
and
sum((status not in (1, 18))::int) = 0
Or with EXCEPT:
select companyid from tablename
except
select companyid from tablename
where status not in (1, 18)
See the demo.
Results:
> | companyid |
> | --------: |
> | 3 |
> | 5 |
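For anyone who wants to try the HAVING and EXCEPT approaches without a Postgres instance, here is a small sketch using an in-memory SQLite database from Python. The raw rows are reconstructed from the grouped example in the question, and CASE expressions replace the Postgres-specific ::int cast; both are my assumptions, not part of the answer.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (companyid INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, 1), (1, 9), (1, 12), (1, 18),
     (2, 12), (2, 13), (2, 18),
     (3, 1), (4, 8), (5, 18)],
)

# Conditional aggregation: at least one row in (1, 18), none outside it.
having_rows = conn.execute("""
    SELECT companyid
    FROM tablename
    GROUP BY companyid
    HAVING SUM(CASE WHEN status IN (1, 18)     THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN status NOT IN (1, 18) THEN 1 ELSE 0 END) = 0
    ORDER BY companyid
""").fetchall()

# Set difference: all companies minus those with any other status.
except_rows = conn.execute("""
    SELECT companyid FROM tablename
    EXCEPT
    SELECT companyid FROM tablename WHERE status NOT IN (1, 18)
    ORDER BY companyid
""").fetchall()

print(having_rows, except_rows)
```

Both queries read the raw rows directly, so there is no need to build the comma-separated status string first.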
You can utilize group by and having. ie:
select *
from myTable
where statusId in (1,18)
and companyId in (select companyId
from myTable
group by companyId
having count(distinct statusId) = 1);
EDIT: If you meant to include those who have 1,18 and 18,1 too, then you could use array_agg instead:
select *
from t t1
inner join
(select companyId, array_agg(statusId) as statuses
from t
group by companyId
) t2 on t1.companyid = t2.companyid
where array[1,18] @> t2.statuses;
EDIT: If you meant to get back only companyIds without the rest of columns and data:
select companyId
from t
group by companyId
having array[1,18] @> array_agg(statusId);
DbFiddle Demo

SQL getting data from 2 tables

I've got a tricky (at least for me it's tricky) question, I want to arrange data by comment count. My first table is called all_comments which has these columns (more but not essential):
comment, target_id
My second table is called our_videos which has these columns (more but not essential):
id, title
I want to get the count of all comments that have target_id same as id on 2nd table and arrange that data by comment count. Here is example of what I want:
TABLE #1:
id target_id
----------------
1 3
2 5
3 5
4 3
5 3
TABLE #2:
id title
-----------
1 "test"
2 "another-test"
3 "testing"
5 "......"
This is basically saying that the row in the 2nd table with id 3 has 3 comments and the row with id 5 has 2 comments. I want to arrange the data by this comment count and get a result like this:
RESULT:
id title
----------------
3 "testing"
5 "......."
1 "test"
2 "another-test"
If I missed any important info needed for this question just ask, thanks for help, peace :)
It is a very simple query, and you should definitely work through an SQL tutorial.
A naive variant would be:
select videos.id, videos.title, count(comments.target_id) as comment_count
from videos
left outer join
comments
on (videos.id = comments.target_id)
group by videos.id, videos.title
order by comment_count desc
This version can be slow, because it has to group by title as well as id. To speed it up, the usual approach is to aggregate the comments first and then join:
select videos.id, videos.title, q.cnt as comment_count
from videos
left outer join
(
select target_id, count(*) as cnt
from comments
group by target_id
) as q
on videos.id = q.target_id
order by q.cnt DESC
select videos.id, videos.title, isnull(cnt, 0) as cnt
from videos
left outer join
(select target_id, count(*) as cnt
from comments
group by target_id) as cnts
on videos.id = cnts.target_Id
order by isnull(cnt, 0) desc, videos.title
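As a quick check of the pre-aggregated join on the question's sample data, here is a sketch run against an in-memory SQLite database from Python. It uses the table names from the question (our_videos, all_comments), COALESCE in place of the SQL Server specific isnull, and a tie-break on id so the output order is deterministic; those choices are mine.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE our_videos   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE all_comments (id INTEGER PRIMARY KEY, target_id INTEGER);
    INSERT INTO our_videos VALUES
        (1, 'test'), (2, 'another-test'), (3, 'testing'), (5, '......');
    INSERT INTO all_comments VALUES
        (1, 3), (2, 5), (3, 5), (4, 3), (5, 3);
""")

rows = conn.execute("""
    SELECT v.id, v.title, COALESCE(q.cnt, 0) AS comment_count
    FROM our_videos v
    LEFT JOIN (
        -- one row per commented video, counted before the join
        SELECT target_id, COUNT(*) AS cnt
        FROM all_comments
        GROUP BY target_id
    ) AS q ON q.target_id = v.id
    ORDER BY comment_count DESC, v.id
""").fetchall()

for row in rows:
    print(row)
```

Videos without comments get NULL from the left join, so the COALESCE is what makes them sort as 0 instead of depending on how the DBMS orders NULLs.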
Some systems will let you write this even though sorting is not strictly supposed to happen on a column not included in the output. I don't necessarily recommend it, but I might argue it's the most straightforward.
select id, title from videos
order by (select count(*) from comments where target_id = videos.id) desc, title
If you don't mind having it in the output it's a quick change:
select id, title,
(select count(*) from comments where target_id = videos.id) as comment_count
from videos
order by comment_count desc, title
SQL generally has a lot of options.

Unique results from database?

I am selecting all badge numbers from a database where category is equal to 1.
category | badge number
0 | 1
1 | 1
2 | 5
1 | 1
Sometimes the category is duplicated, is there a way to only get unique badge numbers from the database?
So above there are two rows with category 1, each with badge number 1. How can I make sure the result only gives '1' rather than '1,1'?
Use DISTINCT key word in the SELECT statement.
SELECT DISTINCT badge_number FROM Your_Table WHERE category = 1
Use the distinct keyword in your select.
select distinct badge_number from table_name where category = 1
Have you tried Select Distinct :
SELECT DISTINCT [badge number] from table
where Category=1
http://www.w3schools.com/sql/sql_distinct.asp
Select Distinct Badgenumber from table where Category = 1
SELECT DISTINCT BadgeNumber FROM dbo.TableName
Where Category = 1
Edited:
Ohh, there are so many posts already .... !!