Microsoft SQL - Remove duplicate data from query results


I am new to SQL Server and need help with one of my SQL queries.
I have 2 tables (Rating and LikeDislike).
I am trying to get data from both of these tables using a LEFT JOIN like this:
SELECT distinct LD.TopicID, R.ID, R.Topic, R.CountLikes, R.CountDisLikes, LD.UserName, LD.Clikes
FROM Rating As R
LEFT JOIN LikeDislike AS LD on LD.TopicID = R.ID
The above SELECT statement returns the right data but also includes duplicates. I want to remove the duplicates when the data is displayed. I tried using DISTINCT and GROUP BY with no luck, maybe because I am not using them correctly.
To be more clear and less confusing let me tell you what exactly each table does and what I am trying to achieve.
The Rating table has the following columns (ID, Topic, CountLikes, CountDisLikes, Extra, CreatedByUser). It stores topic information, the number of likes and dislikes for each topic, and the UserID of the user who created that topic.
Rating table with sample data
ID  Topic                  CountLikes  CountDisLikes  Extra            CreatedByUser
1   Do You Like This       211         58             YesId            2
2   Or This                17          25             This also        3
79  Testing at home        1           0              Testing at home  2
80  Testing at home again  1           0              Testing          2
82  testing dislikes       0           1              Testing          2
76  Testing part 3         7           5              Testing 3        4
77  Testing part 4         16          6              Testing 4        5
The LikeDislike table has the following columns (ID, TopicID, UserName, Clikes). TopicID is a FK to the ID column in the Rating table.
LikeDislike table with sample data
ID TopicID UserName Clikes
213 77 2 TRUE
214 76 2 FALSE
215 77 5 TRUE
194 77 3 TRUE
195 76 3 FALSE
196 2 3 TRUE
197 1 3 FALSE
Now what I am trying to do is get information from both of these tables without duplicate rows. I need all the columns from the Rating table plus the UserName and Clikes columns from the LikeDislike table, without any duplicate rows.
Below are the results with duplicates
TopicID  ID  Topic             CountLikes  CountDislikes  UserName  Clikes
NULL     79  Testing at home   1           0              NULL      NULL
NULL     80  Testing at home2  1           0              NULL      NULL
NULL     82  testing dislikes  0           1              NULL      NULL
1        1   Do You Like This  211         58             3         FALSE
2        2   Or This           17          25             3         TRUE
76       76  Testing part 3    7           5              2         FALSE
76       76  Testing part 3    7           5              3         FALSE
77       77  Testing part 4    16          6              2         TRUE
77       77  Testing part 4    16          6              3         TRUE
77       77  Testing part 4    16          6              5         TRUE

Just like in yesterday's post, I don't think you understand what DISTINCT is supposed to return. DISTINCT removes a row only when every selected column matches another row. Because the UserName and Clikes values from your LikeDislike table differ, each of these rows is already distinct.
Let's take TopicID 77, for instance. It returns 3 distinct rows because you have 3 matching records in your LikeDislike table. If your desired output is a single row where the UserName and Clikes values are comma-delimited, that is possible: look into using FOR XML and perhaps STUFF (here is a recent answer on the subject). Or if you want to return only the first row that matches each TopicID, that is possible as well: look into using a subquery with ROW_NUMBER.
Please let us know your desired output and we can help provide a solution.
Good luck.
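To make the two options in this answer concrete, here is a minimal sketch that rebuilds the question's sample tables in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server: GROUP_CONCAT plays the role that FOR XML with STUFF would play there, and picking the row with the lowest LikeDislike.ID as "first" is an assumption, since the question does not define an order. ROW_NUMBER needs SQLite 3.25 or later.

```python
import sqlite3

# SQLite stands in for SQL Server; only the columns needed here are recreated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rating (ID INTEGER PRIMARY KEY, Topic TEXT, CountLikes INT, CountDisLikes INT);
CREATE TABLE LikeDislike (ID INTEGER PRIMARY KEY, TopicID INT, UserName INT, Clikes TEXT);
INSERT INTO Rating VALUES
  (1, 'Do You Like This', 211, 58), (2, 'Or This', 17, 25),
  (79, 'Testing at home', 1, 0), (80, 'Testing at home again', 1, 0),
  (82, 'testing dislikes', 0, 1), (76, 'Testing part 3', 7, 5),
  (77, 'Testing part 4', 16, 6);
INSERT INTO LikeDislike VALUES
  (213, 77, 2, 'TRUE'), (214, 76, 2, 'FALSE'), (215, 77, 5, 'TRUE'),
  (194, 77, 3, 'TRUE'), (195, 76, 3, 'FALSE'), (196, 2, 3, 'TRUE'),
  (197, 1, 3, 'FALSE');
""")

# Option 1: one row per topic, user names collapsed into a comma-delimited list.
combined = conn.execute("""
SELECT R.ID, R.Topic, GROUP_CONCAT(LD.UserName) AS UserNames
FROM Rating AS R
LEFT JOIN LikeDislike AS LD ON LD.TopicID = R.ID
GROUP BY R.ID, R.Topic
ORDER BY R.ID
""").fetchall()

# Option 2: one row per topic, keeping only the match with the lowest LikeDislike.ID
# (the choice of "first" is an assumption made for this sketch).
first_match = conn.execute("""
SELECT ID, Topic, UserName, Clikes
FROM (
  SELECT R.ID, R.Topic, LD.UserName, LD.Clikes,
         ROW_NUMBER() OVER (PARTITION BY R.ID ORDER BY LD.ID) AS rn
  FROM Rating AS R
  LEFT JOIN LikeDislike AS LD ON LD.TopicID = R.ID
)
WHERE rn = 1
ORDER BY ID
""").fetchall()

for row in combined:
    print(row)
for row in first_match:
    print(row)
```

Both queries return exactly one row per topic (7 rows each); topics without any LikeDislike record keep NULL in the joined columns.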

Related

How to: For each unique id, for each unique version, grab the best score and organize it into a table

Just wanted to preface this by saying that while I do have a basic understanding, I am still fairly new to using BigQuery tables and SQL statements in general.
I am trying to make a new view out of a query that grabs all of the best test scores for each version by each employee:
select emp_id,version,max(score) as score from `project.dataset.table` where type = 'assessment_test' group by version,emp_id order by emp_id
I'd like to take the results of that query and make a new table with one row per employee id and one column per version holding the best score for that emp_id. I know that I can manually make a table for each version by including "where version = a", "where version = b", etc. and then joining all of the tables at the end, but that doesn't seem like the most elegant solution, and there are about 20 different versions in total.
Is there a way to programmatically create a column for each unique version, or at the very least use my initial query as a subquery and just reference it, something like this:
with a as (
select id,version,max(score) as score
from `project.dataset.table`
where type = 'assessment_test' and version is not null and score is not null and id is not null
group by version,id
order by id),
version_a as (select score from a where version = 'version_a')
version_b as (select score from a where version = 'version_b')
version_c as (select score from a where version = 'version_c')
select
a.id as id,
version_a.score as version_a,
version_b.score as version_b,
version_c.score as version_c
from
a,
version_a,
version_b,
version_c
Example Picture: left table is example data, right table is expected output
Example Data:
id  version  score
1   a        88
1   b        93
1   c        92
2   a        89
2   b        99
2   c        78
3   a        95
3   b        83
3   c        89
4   a        90
4   b        90
4   c        86
5   a        82
5   b        78
5   c        98
1   a        79
1   b        97
1   c        77
2   a        100
2   b        96
2   c        85
3   a        83
3   b        87
3   c        96
4   a        84
4   b        80
4   c        77
5   a        95
5   b        77
Expected Output:
id  a score  b score  c score
1   88       97       92
2   100      99       85
3   95       87       96
4   90       90       86
5   95       78       98
Thanks in advance and feel free to ask any clarifying questions

Use the approach below:
select * from your_table
pivot (max(score) score for version in ('a', 'b', 'c'))
If applied to the sample data in your question, the output has one row per id with the columns score_a, score_b, and score_c.
If the versions are not known in advance, use the following:
execute immediate (select '''
select * from your_table
pivot (max(score) score for version in (''' || string_agg(distinct "'" || version || "'") || "))"
from your_table
)
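For readers without a BigQuery project at hand, the same pivot can be sketched with conditional aggregation, which is roughly what PIVOT does here. The example below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and, like the dynamic variant above, builds one column per distinct version at run time. SQLite and the column names score_a, score_b, score_c are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (id, version, score).
rows = [
    (1, "a", 88), (1, "b", 93), (1, "c", 92), (2, "a", 89), (2, "b", 99), (2, "c", 78),
    (3, "a", 95), (3, "b", 83), (3, "c", 89), (4, "a", 90), (4, "b", 90), (4, "c", 86),
    (5, "a", 82), (5, "b", 78), (5, "c", 98), (1, "a", 79), (1, "b", 97), (1, "c", 77),
    (2, "a", 100), (2, "b", 96), (2, "c", 85), (3, "a", 83), (3, "b", 87), (3, "c", 96),
    (4, "a", 84), (4, "b", 80), (4, "c", 77), (5, "a", 95), (5, "b", 77),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INT, version TEXT, score INT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

# Discover the versions at run time instead of hard-coding 'a', 'b', 'c'.
versions = [v for (v,) in conn.execute(
    "SELECT DISTINCT version FROM your_table ORDER BY version")]

# One MAX(CASE ...) column per version. The values are pasted into the SQL text,
# which is acceptable only because they come from our own trusted table.
columns = ", ".join(
    "MAX(CASE WHEN version = '{0}' THEN score END) AS score_{0}".format(v)
    for v in versions
)
pivoted = conn.execute(
    f"SELECT id, {columns} FROM your_table GROUP BY id ORDER BY id"
).fetchall()

for row in pivoted:
    print(row)
```

On the sample data this reproduces the expected output from the question, for example (1, 88, 97, 92) for id 1.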

Repetition of columns when using a join between two tables

Using a SELECT query in Postgres that joins 8 or 9 tables, I get the following output:
1. A 2 34
2. A 2 56
3. B 3 34
4. B 3 56
whereas I need the output in one of two forms, either
1. A 2 34
2. A 2 34
3. B 3 56
4. B 3 56
or
A 2 34
B 3 56
What can I do?

Using distinct?
select distinct * from table

SQL - Referencing 3 tables

This is in relation to my survey application for our team. I have 3 tables in my database related to this problem.
I apologize if the database is not fully normalized.
TBL_CHURCH columns:
1 FAM_CHURCH_SACRMNT_NUM (Primary Key) Int(15)
2 RSPONDNT_NUM
3 SURVYR_NUM
4 QN_NUMBER
5 CHRCHFAMLY_NAME
6 CHRCHFAMLY_ISBAPTIZED
Sample row based on order of columns above:
1 2 3 4 5 6
6422164 76826499 5712 362 Serio Tecson Jr. Yes
TBL_INTRVW columns:
1 QN_NUMBR (Primary Key)
2 SURVYR_NUM
3 ZONE_NUM
4 RSPONDNT_NUM
Sample row based on order of columns above:
1 2 3 4
362 5712 11 76826499
TBL_AREA columns:
1 BRGY_ZONE_NUM (Primary Key)
2 BRGY_CODE
Sample row based on order of columns above:
1 2
11 2A
21 2A
31 2A
The field CHRCHFAMLY_ISBAPTIZED has only two values, "Yes" or "No". Each row has a QN_NUMBER value that references QN_NUMBR in TBL_INTRVW, each QN_NUMBR in TBL_INTRVW has a single ZONE_NUM that references TBL_AREA, and that ZONE_NUM has a corresponding BRGY_CODE. Each BRGY_CODE has at least 2 ZONE_NUM values.
My problem is that I want to count the number of people baptized in a given area.
The output more or less should look like this:
(The output is collected from the 3 different ZONE_NUM)
Zone Name Num of People Baptized
2A 20
I'm having trouble working out what to use in my SQL statement. Should I use a WHERE with an INNER JOIN? And how do I write the SELECT?

SELECT c.BRGY_ZONE_NUM, COUNT(a.CHRCHFAMLY_ISBAPTIZED) AS [Num of People Baptized]
FROM TBL_CHURCH a
LEFT JOIN TBL_INTRVW b
ON a.QN_NUMBER = b.QN_NUMBR
LEFT JOIN TBL_AREA c
ON b.ZONE_NUM = c.BRGY_ZONE_NUM
WHERE a.CHRCHFAMLY_ISBAPTIZED = 'Yes'
GROUP BY c.BRGY_ZONE_NUM
I don't see a Zone Name column in any of the three tables, so I used BRGY_ZONE_NUM.
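If the "Zone Name" in the desired output is really the BRGY_CODE (the question's sample shows 2A), the same joins can group by that column instead. The sketch below tries this on an in-memory SQLite database through Python's sqlite3 module. The rows are made up for illustration (only the column names come from the question), and TBL_CHURCH is trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBL_AREA (BRGY_ZONE_NUM INT PRIMARY KEY, BRGY_CODE TEXT);
CREATE TABLE TBL_INTRVW (QN_NUMBR INT PRIMARY KEY, SURVYR_NUM INT,
                         ZONE_NUM INT, RSPONDNT_NUM INT);
CREATE TABLE TBL_CHURCH (FAM_CHURCH_SACRMNT_NUM INT PRIMARY KEY, QN_NUMBER INT,
                         CHRCHFAMLY_NAME TEXT, CHRCHFAMLY_ISBAPTIZED TEXT);

-- Hypothetical sample rows: three zones in area 2A, one zone in area 3B.
INSERT INTO TBL_AREA VALUES (11, '2A'), (21, '2A'), (31, '2A'), (41, '3B');
INSERT INTO TBL_INTRVW VALUES
  (362, 5712, 11, 76826499), (363, 5712, 21, 76826500), (364, 5713, 41, 76826501);
INSERT INTO TBL_CHURCH VALUES
  (1, 362, 'Person A', 'Yes'), (2, 362, 'Person B', 'No'),
  (3, 363, 'Person C', 'Yes'), (4, 364, 'Person D', 'Yes');
""")

# Count baptized people per area code rather than per individual zone.
baptized_per_area = conn.execute("""
SELECT ar.BRGY_CODE, COUNT(*) AS baptized
FROM TBL_CHURCH AS ch
JOIN TBL_INTRVW AS iv ON iv.QN_NUMBR = ch.QN_NUMBER
JOIN TBL_AREA AS ar ON ar.BRGY_ZONE_NUM = iv.ZONE_NUM
WHERE ch.CHRCHFAMLY_ISBAPTIZED = 'Yes'
GROUP BY ar.BRGY_CODE
ORDER BY ar.BRGY_CODE
""").fetchall()

print(baptized_per_area)
```

An inner join is used here because rows without a matching interview or area cannot be assigned to a zone anyway. With LEFT JOIN those rows would instead be counted under a NULL group.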

SQL comparing two tables with a common id, where the id in table 2 could be in two different columns

Given the following SQL tables:
Administrators:
id Name rating
1 Jeff 48
2 Albert 55
3 Ken 35
4 France 56
5 Samantha 52
6 Jeff 50
Meetings:
id originatorid Assistantid
1 3 5
2 6 3
3 1 2
4 6 4
I would like to generate a table from Ken's point of view (id=3). His id could be present in either of two columns in the Meetings table. (The IN operator does not seem to work here, since two different columns are involved.)
Thus the output would be:
id originatorid Assistantid
1 3 5
2 6 3

If you really just need to see which column Ken's id is in, you only need an OR. The following will produce your example output exactly.
SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3;
If you need to take the complex route and list names along with meetings, an OR in your join's ON clause should work here:
SELECT
Administrators.name,
Administrators.id,
Meetings.originatorid,
Meetings.Assistantid
FROM Administrators
JOIN Meetings
ON Administrators.id = Meetings.originatorid
OR Administrators.id = Meetings.Assistantid
Where Administrators.name = 'Ken'
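A quick way to check the first query is to run it against the question's sample data. The sketch below does that with an in-memory SQLite database through Python's sqlite3 module (SQLite is an assumption here, since the question does not name a database). It also shows that IN can be used after all if the operands are swapped, so that the constant is tested against the two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Administrators (id INTEGER PRIMARY KEY, Name TEXT, rating INT);
CREATE TABLE Meetings (id INTEGER PRIMARY KEY, originatorid INT, Assistantid INT);
INSERT INTO Administrators VALUES
  (1, 'Jeff', 48), (2, 'Albert', 55), (3, 'Ken', 35),
  (4, 'France', 56), (5, 'Samantha', 52), (6, 'Jeff', 50);
INSERT INTO Meetings VALUES (1, 3, 5), (2, 6, 3), (3, 1, 2), (4, 6, 4);
""")

# The OR form from the answer.
with_or = conn.execute("""
SELECT * FROM Meetings
WHERE originatorid = 3 OR Assistantid = 3
ORDER BY id
""").fetchall()

# Equivalent form: test the constant against a list of columns.
with_in = conn.execute("""
SELECT * FROM Meetings
WHERE 3 IN (originatorid, Assistantid)
ORDER BY id
""").fetchall()

print(with_or)
print(with_in)
```

Both queries return meetings 1 and 2, matching the output requested in the question.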

Using HAVING correctly with GROUP BY and COUNT

I am running this query:
SELECT u.id as id,
COUNT(DISTINCT YEAR(TIMESTAMP), WEEK(TIMESTAMP)) cc,
GROUP_CONCAT(DISTINCT YEAR(TIMESTAMP),'-',WEEK(TIMESTAMP)) a
FROM users u
JOIN checkins c
ON c.userid = u.id
GROUP BY userid
HAVING COUNT(cc) = 3
And this produces the following results:
id cc a
05 3 2010-43,2010-47,2010-45
06 2 2010-44,2010-45
13 3 2010-43,2010-45,2010-48
20 3 2010-45,2010-43,2010-47
21 3 2010-43,2010-47,2010-45
22 2 2010-47,2010-48
25 3 2010-48,2010-43,2010-46
27 2 2010-42,2010-47
30 2 2010-48,2010-45
41 3 2010-44,2010-45,2010-47
44 2 2010-42,2010-44
50 2 2010-44,2010-47
52 2 2010-48,2010-47
57 2 2010-43,2010-44
71 3 2010-43,2010-48,2010-47
72 2 2010-43,2010-44
78 3 2010-47,2010-42,2010-43
79 2 2010-45,2010-46
80 2 2010-46,2010-44
87 1 2010-46
97 1 2010-48
108 3 2010-43,2010-47,2010-45
As you can see, the cc column has values of 2, 3, or even 1.
How can that be, when I have told HAVING that it should be 3?

MySQL does allow aliases in the HAVING clause. You would need to use:
HAVING cc = 3
not
HAVING COUNT(cc) = 3
to filter the results down to rows that have a cc value of 3. I'm actually not sure why HAVING COUNT(cc) = 3 would return any results at all.

I'd just like to expand on what was previously said about aliases and the HAVING clause.
You have already created the cc alias, which holds the counts you'd like to filter on, so you just need to reference the aliased column in HAVING, like:
HAVING cc = 3
What you tried (HAVING COUNT(cc) = 3) would only make sense if you were to group by the cc column itself (if that were possible); it would then filter out every cc value that didn't appear exactly 3 times.
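The alias-based filter can be tried out with a small sketch. It uses an in-memory SQLite database through Python's sqlite3 module rather than MySQL, so two things differ from the question: the year/week pair is built with strftime('%Y-%W', ...) instead of YEAR() and WEEK(), and the column is called ts. The check-in rows are invented for illustration. SQLite, like MySQL, accepts a SELECT alias in HAVING, while repeating the aggregate expression is the more portable spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkins (userid INT, ts TEXT);
-- Hypothetical data: user 5 checks in during 3 different weeks
-- (two check-ins fall in the same week), user 6 in 2 weeks, user 87 in 1.
INSERT INTO checkins VALUES
  (5, '2010-10-25'), (5, '2010-11-08'), (5, '2010-11-22'), (5, '2010-11-23'),
  (6, '2010-11-01'), (6, '2010-11-08'),
  (87, '2010-11-15');
""")

# Filter on the alias: keeps only users with exactly 3 distinct year-week values.
by_alias = conn.execute("""
SELECT userid, COUNT(DISTINCT strftime('%Y-%W', ts)) AS cc
FROM checkins
GROUP BY userid
HAVING cc = 3
""").fetchall()

# Portable spelling: repeat the aggregate expression instead of using the alias.
by_expression = conn.execute("""
SELECT userid, COUNT(DISTINCT strftime('%Y-%W', ts)) AS cc
FROM checkins
GROUP BY userid
HAVING COUNT(DISTINCT strftime('%Y-%W', ts)) = 3
""").fetchall()

print(by_alias)
print(by_expression)
```

Only user 5 survives the filter in both variants, even though that user has four check-in rows, because the count is taken over distinct weeks.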
