Selecting ONLY Duplicates from a joined tables query


I have the following query, which joins two tables on their ID so I can find the duplicated values in "c.code". I've tried many queries, but nothing works. I have 500k rows in my database, and this query returns only 5k, which is not right; I'm positive it's at least 200k. I also tried Excel, but it's too much for it to handle.
Any ideas?
Thanks in advance, everyone.
SELECT c.code, c.name as SCT_Name, t.name as SYNONYM_Name, count(c.code)
FROM database.Terms as t
join database.dbo.Concepts as c on c.ConceptId = t.ConceptId
where t.TermTypeCode = 'SYNONYM' and t.ConceptTypeCode = 'NAME_Code' and c.retired = '0'
Group by c.code, c.name, t.name
HAVING COUNT(c.code) > = 1
Order by c.code

with data as (
select c.code, c.name as SCT_Name, t.name as SYNONYM_Name
from database.Terms as t inner join database.dbo.Concepts as c
on c.ConceptId = t.ConceptId
where
t.TermTypeCode = 'SYNONYM'
and t.ConceptTypeCode = 'NAME_Code'
and c.retired = '0'
)
select *
--, (select count(*) from data as d2 where d2.code = data.code) as code_count
--, count(*) over (partition by code) as code_count
from data
where code in (select code from data group by code having count(*) > 1)
order by code

If you want just duplicates of c.code, your Group By is wrong (and so is your Having clause). Try this:
SELECT c.code
FROM database.Terms as t
join database.dbo.Concepts as c on c.ConceptId = t.ConceptId
where t.TermTypeCode = 'SYNONYM' and t.ConceptTypeCode = 'NAME_Code' and c.retired = '0'
Group by c.code
HAVING COUNT(c.code) > 1
This will return one row for each c.code value that appears more than once in the joined result.
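As a rough illustration of the difference between listing the duplicated codes and listing every row that carries one, here is a minimal sketch using Python's built-in sqlite3 module. The tables are cut down to a few columns and the sample rows are invented for the example; the original database is SQL Server, so treat this as an approximation of the idea rather than the exact production query.

```python
import sqlite3

# Hypothetical, simplified versions of the Terms and Concepts tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Concepts (ConceptId INTEGER, code TEXT, name TEXT);
    CREATE TABLE Terms (ConceptId INTEGER, name TEXT);
    INSERT INTO Concepts VALUES (1, 'A1', 'alpha'), (2, 'B2', 'beta'), (3, 'A1', 'alpha two');
    INSERT INTO Terms VALUES (1, 'syn a'), (2, 'syn b'), (3, 'syn a2');
""")

# One row per code that occurs more than once in the joined result.
dup_codes = conn.execute("""
    SELECT c.code, COUNT(*) AS n
    FROM Terms AS t
    JOIN Concepts AS c ON c.ConceptId = t.ConceptId
    GROUP BY c.code
    HAVING COUNT(*) > 1
""").fetchall()

# Every joined row whose code belongs to that duplicate set.
dup_rows = conn.execute("""
    SELECT c.code, t.name
    FROM Terms AS t
    JOIN Concepts AS c ON c.ConceptId = t.ConceptId
    WHERE c.code IN (
        SELECT c2.code
        FROM Terms AS t2
        JOIN Concepts AS c2 ON c2.ConceptId = t2.ConceptId
        GROUP BY c2.code
        HAVING COUNT(*) > 1
    )
    ORDER BY t.name
""").fetchall()

print(dup_codes)
print(dup_rows)
```

Grouping by code alone gives the list of duplicated codes; filtering the join by that list gives the full rows, which is probably closer to the 200k figure the question expects.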

You could use INTERSECT instead of JOIN. Basically, you perform the select on the first table and then intersect it with the select on the second table. The result is the set of values that appear in both.
Only select the id column, though: INTERSECT compares entire rows, so extra columns will stop rows from matching.

Related

SQL split repeating rows caused by UNION

I am writing a query to get two separate averages based on different WHERE conditions.
I tried two SELECT statements but ended up with lots of duplicates.
Now I have a UNION which works pretty well, although my two fields come back in alternating rows instead of separate columns.
Can anyone suggest a fix? Sorry for the dodgy code!
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `cohortPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
teacherGroup = '9JS2/Cp'
AND tblTestScores.testDetailsID = 1
GROUP BY
skillName
UNION ALL
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `groupPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
tblTestScores.testDetailsID = 1
GROUP BY
skillName
ORDER BY
skillUID ASC

Get Distinct values from one table in a join query containing column data type like ntext

I have two tables, Review and ProjectsReview. I want to change the ORDER BY columns without affecting the result. The initial ORDER BY was on the createdDate column from the Review table.
The initial query is below.
SELECT
*
FROM Review r
WHERE (status IS NULL
OR fstatus = '')
AND (crBy = '100'
OR crByPr = '')
ORDER BY createdDate
The query returns 8 rows.
The user wants to order by program name instead, which is in another table. The query for that is below.
SELECT
r.*
FROM Review r
INNER JOIN ProjectsReview rp
ON rp.rID = r.rID
WHERE (status IS NULL
OR fstatus = '')
AND (crBy = '100'
OR crByPr = '')
ORDER BY prNo, prName
This returns 10 rows, but the required result is only 8 rows, with only the columns of the Review table.
I cannot apply GROUP BY to all the columns of the Review table because some of them have the image and ntext data types.
Is there a way to achieve this without inserting the data into a temp table and selecting distinct values?

Try this
with cte
as
(
select
rn = row_number() over(partition by rID order by prNo,prName),
rID,
prNo,
prName
from ProjectsReview
)
SELECT r.*
FROM Review r
inner join cte rp on rp.rID =r.rID
WHERE (status IS NULL OR fstatus = '') AND (crBy = '100' OR crByPr = '')
and rp.rn = 1
ORDER BY prNo,prName
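To see why keeping only the first-ranked ProjectsReview row per rID restores the original row count, here is a small sketch run through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions); the columns are trimmed down and the sample data and the CTE name `ranked` are made up for the illustration.

```python
import sqlite3

# Hypothetical sample data: review 1 has two project rows, review 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Review (rID INTEGER, title TEXT);
    CREATE TABLE ProjectsReview (rID INTEGER, prNo INTEGER, prName TEXT);
    INSERT INTO Review VALUES (1, 'first'), (2, 'second');
    INSERT INTO ProjectsReview VALUES (1, 20, 'beta'), (1, 10, 'alpha'), (2, 15, 'gamma');
""")

# A plain join repeats review 1 once per matching project row.
plain = conn.execute("""
    SELECT r.rID
    FROM Review r
    JOIN ProjectsReview rp ON rp.rID = r.rID
""").fetchall()

# Rank the project rows per rID and join only the first one.
deduped = conn.execute("""
    WITH ranked AS (
        SELECT rID, prNo, prName,
               ROW_NUMBER() OVER (PARTITION BY rID ORDER BY prNo, prName) AS rn
        FROM ProjectsReview
    )
    SELECT r.rID, r.title, rp.prNo
    FROM Review r
    JOIN ranked rp ON rp.rID = r.rID
    WHERE rp.rn = 1
    ORDER BY rp.prNo, rp.prName
""").fetchall()

print(len(plain), deduped)
```

Because the de-duplication happens on ProjectsReview before the join, no GROUP BY or DISTINCT is needed on the Review columns, which sidesteps the image and ntext restriction.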

SQL Select Where or Having

I'm attempting to get some records from a table based on certain factors.
One of the factors involves only fields on the same table; the other involves a join to another table, where I want to compare the number of records in the joined table to a field on the first table. Below is some sample code.
select * from tDestinations D
left join tLiveCalls LC on LC.DestinationID = D.ID
where D.ConfigurationID = 1486
AND (D.Active = 1 AND D.AlternateFail > GETDATE())
-- Having COUNT(LC.ID) = D.Lines
Now, in the code above I can't use the COUNT function in the WHERE clause, and I can't have a field in the HAVING clause unless it is inside a function.
I'm probably missing something very simple here, but I can't figure it out.
Any help is appreciated.
EDIT: I do apologise, I should have explained the structure of the tables: each destination is a single record, while the LiveCalls table can hold multiple records per destination ID (foreign key).
Thank you very much for everyone's help. My final code:
select D.ID, D.Description, D.Lines, D.Active, D.AlternateFail, D.ConfigurationID, COUNT(LC.ID) AS LiveCalls from tDestinations D
left join tLiveCalls LC on LC.DestinationID = D.ID
where D.ConfigurationID = @ConfigurationID
AND (D.Active = 1 AND D.AlternateFail > GETDATE())
GROUP BY D.ID, D.Description, D.Lines, D.Active, D.AlternateFail, D.ConfigurationID
HAVING COUNT(LC.ID) <= D.Lines

The simple thing you're missing is the GROUP BY clause.
As JNK mentioned in the comments below, you cannot combine an aggregate function (such as COUNT, AVG, SUM, MIN) with plain column references unless those columns appear in a GROUP BY clause; without GROUP BY, the SELECT list may contain only aggregates and literal values.
Your code should probably be something like:
SELECT <someFields>
FROM tDestinations D
LEFT JOIN tLiveCalls LC on LC.DestinationID = D.ID
WHERE D.ConfigurationID = 1486
AND (D.Active = 1 AND D.AlternateFail > GETDATE())
GROUP BY <someFields>
HAVING COUNT(LC.ID) = D.Lines
Note that you have to specify the selected fields explicitly, in both the SELECT and GROUP BY clauses (no * allowed).
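A minimal sketch of the pattern, run through Python's sqlite3 module with invented sample rows and only the columns needed for the comparison. Because D.Lines is listed in GROUP BY, it can be compared directly with the aggregate in HAVING.

```python
import sqlite3

# Hypothetical data: destination 1 allows 2 lines, destination 2 allows 1, destination 3 allows 0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tDestinations (ID INTEGER, Lines INTEGER);
    CREATE TABLE tLiveCalls (ID INTEGER, DestinationID INTEGER);
    INSERT INTO tDestinations VALUES (1, 2), (2, 1), (3, 0);
    INSERT INTO tLiveCalls VALUES (10, 1), (11, 1), (12, 2), (13, 2);
""")

# COUNT(LC.ID) ignores the NULLs produced by the LEFT JOIN,
# so a destination with no live calls gets a count of 0.
rows = conn.execute("""
    SELECT D.ID, D.Lines, COUNT(LC.ID) AS LiveCalls
    FROM tDestinations D
    LEFT JOIN tLiveCalls LC ON LC.DestinationID = D.ID
    GROUP BY D.ID, D.Lines
    HAVING COUNT(LC.ID) = D.Lines
    ORDER BY D.ID
""").fetchall()

print(rows)
```

Counting LC.ID rather than * matters here: COUNT(*) would report 1 for a destination with no calls, because the LEFT JOIN still produces one row for it.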

You can only use HAVING with aggregations. HAVING is effectively the WHERE clause for aggregated values, but you can still use WHERE on the columns that you are not aggregating.
For example:
SELECT TABLE_TYPE, COUNT(*)
FROM INFORMATION_SCHEMA.TABLES
where TABLE_TYPE='VIEW'
group by TABLE_TYPE
having COUNT(*)>1
In your case you need to put the COUNT comparison in the HAVING clause,
so I think your query would be something like this:
select YOUR_COLUMN
from tDestinations D
left join tLiveCalls LC on LC.DestinationID = D.ID
where D.ConfigurationID = 1486 AND (D.Active = 1 AND D.AlternateFail > GETDATE())
group by YOUR_COLUMN
Having COUNT(LC.ID) = value

Query not working as expected

Goal
Select distinct ids from blog_news where
active = 1
title is not empty
has at least one picture unless picture is logo, or at least one video
The statement so far
select distinct n.id from blog_news n
left join blog_pics p ON n.id = p.blogid and active = '1' and trim(n.title) != ''
left join blog_vdos v ON n.id = v.blogid
where (p.islogo = '0' and p.id is not null) OR (v.id is not null)
order by `newsdate` desc, `createdate` desc
The issue
selects blog_news ids that have pictures, unless they're logos [correct]
selects blog_news ids that have both videos and pictures [correct]
does not select blog_news ids that have only videos [wrong]

How about this:
SELECT DISTINCT n.id
FROM blog_news n
WHERE n.active = '1'
AND trim(n.title) != ''
AND (EXISTS (SELECT 1
FROM blog_pics p
WHERE p.blogid = n.id
AND p.islogo = 0)
OR EXISTS (SELECT 1
FROM blog_vdos v
WHERE v.blogid = n.id)
)
ORDER BY n.newsdate desc, n.createdate desc
Where you are just interested in the existence (or not) of child rows then it is often clearer and easier to use EXISTS.
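Here is a small sketch of the EXISTS approach against made-up data, using Python's sqlite3 module. The table layouts are reduced to the columns the query touches, and the sample rows are invented to cover each case from the goal list: pictures only, video only, logo only, inactive, and blank title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_news (id INTEGER, active INTEGER, title TEXT);
    CREATE TABLE blog_pics (id INTEGER, blogid INTEGER, islogo INTEGER);
    CREATE TABLE blog_vdos (id INTEGER, blogid INTEGER);
    INSERT INTO blog_news VALUES (1, 1, 'pics only'), (2, 1, 'video only'),
                                 (3, 1, 'logo only'), (4, 0, 'inactive'), (5, 1, '  ');
    INSERT INTO blog_pics VALUES (1, 1, 0), (2, 3, 1), (3, 4, 0), (4, 5, 0);
    INSERT INTO blog_vdos VALUES (1, 2), (2, 4);
""")

# Filters on blog_news sit in WHERE; each child table is only probed for existence.
ids = [row[0] for row in conn.execute("""
    SELECT n.id
    FROM blog_news n
    WHERE n.active = 1
      AND TRIM(n.title) != ''
      AND (EXISTS (SELECT 1 FROM blog_pics p WHERE p.blogid = n.id AND p.islogo = 0)
           OR EXISTS (SELECT 1 FROM blog_vdos v WHERE v.blogid = n.id))
    ORDER BY n.id
""")]

print(ids)
```

Since EXISTS never multiplies rows the way a join does, DISTINCT is not needed in this form.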

I can't see any problem in your query.
I expect active is a column in the blog_news table, so you should write it as n.active. If this column is in the blog_pics table, then that is the problem.
I would move the conditions on n.active and n.title to the WHERE clause, since they are not related to the left join on blog_pics. In the ON clause they only limit which pictures are joined, so rows that match through a video are not filtered by them.
You can write the query using sub selects as well:
SELECT n.id FROM blog_news n
WHERE n.active = 1 AND TRIM(n.title) != '' AND n.id IN (
SELECT DISTINCT p.blogid FROM blog_pics p WHERE p.islogo = 0 UNION
SELECT DISTINCT v.blogid FROM blog_vdos v
);

MS-Access -> SELECT AS + ORDER BY = error

I'm trying to write a query to retrieve the region with the most sales of sweet products. 'grupo_produto' is the product type, and 'regiao' is the region. So I have this query:
SELECT TOP 1 r.nm_regiao, (SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND
cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao ORDER BY total DESC
Then, when I run the query, MS-Access asks for the "total" parameter. Why doesn't it consider the newly created 'column' I made in the select clause?
Thanks in advance!

Old question, I know, but it may help someone to know that while you can't order by aliases, you can order by column index. For example, this will work without error:
SELECT
firstColumn,
IIF(secondColumn = '', thirdColumn, secondColumn) As yourAlias
FROM
yourTable
ORDER BY
2 ASC
The results would then be ordered by the values in the second column, which is the alias "yourAlias".
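The ordinal form can be tried outside Access as well. The sketch below uses Python's sqlite3 module, with CASE standing in for Access's IIF and invented sample rows. SQLite also accepts the alias in ORDER BY, so this only shows that `ORDER BY 2` sorts by the computed second column; it does not reproduce the Access parameter prompt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (firstColumn TEXT, secondColumn TEXT, thirdColumn TEXT);
    INSERT INTO yourTable VALUES ('r1', '', 'zeta'), ('r2', 'beta', 'x'), ('r3', '', 'alpha');
""")

# ORDER BY 2 refers to the second item in the SELECT list, i.e. the computed column.
rows = conn.execute("""
    SELECT firstColumn,
           CASE WHEN secondColumn = '' THEN thirdColumn ELSE secondColumn END AS yourAlias
    FROM yourTable
    ORDER BY 2 ASC
""").fetchall()

print(rows)
```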

Aliases are only usable in the query output. You can't use them in other parts of the query. Unfortunately, you'll have to copy and paste the entire subquery to make it work.

You can do it like this
select * from(
select a + b as c, * from table)
order by c
Access has some differences compared to Sql Server.

"Why doesn't it consider the newly created 'column' I made in the select clause?"
Because Access (ACE/Jet) is not compliant with the SQL-92 Standard.
Consider this example, which is valid SQL-92:
SELECT a AS x, c - b AS y
FROM MyTable
ORDER
BY x, y;
In fact, x and y are the only valid elements in the ORDER BY clause because all others are out of scope (ordinal numbers of columns in the SELECT clause are valid, though their use is deprecated).
However, Access chokes on the above syntax. The equivalent Access syntax is this:
SELECT a AS x, c - b AS y
FROM MyTable
ORDER
BY a, c - b;
However, I understand from @Remou's comments that a subquery in the ORDER BY clause is invalid in Access.

Try using a subquery and order the results in an outer query.
SELECT TOP 1 * FROM
(
SELECT
r.nm_regiao,
(SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
) T1
ORDER BY total DESC
(Not tested.)

How about:
SELECT TOP 1 r.nm_regiao
FROM (SELECT Dw_Empresa.cod_regiao,
Count(Dw_Empresa.cod_regiao) AS CountOfcod_regiao
FROM Dw_Empresa
WHERE Dw_Empresa.[grupo_produto]='1'
GROUP BY Dw_Empresa.cod_regiao
ORDER BY Count(Dw_Empresa.cod_regiao) DESC) d
INNER JOIN tb_regiao AS r
ON d.cod_regiao = r.cod_regiao

I suggest using an intermediate query.
SELECT r.nm_regiao, d.grupo_produto, COUNT(*) AS total
FROM Dw_Empresa d INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
GROUP BY r.nm_regiao, d.grupo_produto;
If you call that GroupTotalsByRegion, you can then do:
SELECT TOP 1 nm_regiao, total FROM GroupTotalsByRegion
WHERE grupo_produto = '1' ORDER BY total DESC
You may think it's extra work to create the intermediate query (and, in a sense, it is), but you will also find that many of your other queries will be based off of GroupTotalsByRegion. You want to avoid repeating that logic in many other queries. By keeping it in one view, you provide a simplified route to answering many other questions.
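The same two-step idea can be sketched with a view in SQLite via Python's sqlite3 module. A saved query in Access plays the role of the view here; `LIMIT 1` stands in for `TOP 1`, and the region names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_regiao (cod_regiao INTEGER, nm_regiao TEXT);
    CREATE TABLE Dw_Empresa (cod_regiao INTEGER, grupo_produto TEXT);
    INSERT INTO tb_regiao VALUES (1, 'Norte'), (2, 'Sul');
    INSERT INTO Dw_Empresa VALUES (1, '1'), (2, '1'), (2, '1'), (1, '2'), (1, '2'), (1, '2');

    -- The intermediate query, stored once and reused by later queries.
    CREATE VIEW GroupTotalsByRegion AS
    SELECT r.nm_regiao, d.grupo_produto, COUNT(*) AS total
    FROM Dw_Empresa d INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
    GROUP BY r.nm_regiao, d.grupo_produto;
""")

# 'total' is now a real column of the view, so ordering by it is unambiguous.
top = conn.execute("""
    SELECT nm_regiao, total
    FROM GroupTotalsByRegion
    WHERE grupo_produto = '1'
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

print(top)
```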

How about using:
WITH xx AS
(
SELECT r.nm_regiao, (SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND
cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
) SELECT TOP 1 * FROM xx ORDER BY total DESC
