Returning rows with the same ID but exclude some on second column - sql

I've seen similar questions, but none quite hits the nail on the head for what I need. Let's say I have a table.
+-----+-------+
| ID | Value |
+-----+-------+
| 123 | 1 |
| 123 | 2 |
| 123 | 3 |
| 456 | 1 |
| 456 | 2 |
| 456 | 4 |
| 789 | 1 |
| 789 | 2 |
+-----+-------+
I want to return DISTINCT IDs but exclude those that have a certain value. For example, let's say I don't want any IDs that have a 3 as a value. My results should look like this.
+-----+
| ID |
+-----+
| 456 |
| 789 |
+-----+
I hope this makes sense. If more information is needed please ask and if this has been answered before please point me in the right direction. Thanks.

You can use group by and having:
select id
from t
group by id
having sum(case when value = 3 then 1 else 0 end) = 0;
The having clause counts the number of "3"s for each id. The = 0 returns only groups where the count is 0 (i.e. there are no "3"s).
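To check the having approach against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the variable name ids_without_3 are assumptions for illustration; the query itself is the one above with an ORDER BY added so the output is stable.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t (id, value) VALUES (?, ?)",
    [(123, 1), (123, 2), (123, 3), (456, 1), (456, 2), (456, 4), (789, 1), (789, 2)],
)

# Per id, count the rows whose value is 3 and keep the ids where that count is 0.
rows = conn.execute(
    """
    SELECT id
    FROM t
    GROUP BY id
    HAVING SUM(CASE WHEN value = 3 THEN 1 ELSE 0 END) = 0
    ORDER BY id
    """
).fetchall()

ids_without_3 = [row[0] for row in rows]
print(ids_without_3)  # [456, 789]
```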

You can use not exists :
select distinct t.id
from table t
where not exists (select 1 from table t1 where t1.id = t.id and t1.value = 3);

Try this:
select id from tablename
group by id
having sum(case when value=3 then 1 else 0 end)=0

You can also use EXCEPT to compare the following two data sets, which gives the desired result set:
select distinct Id from ValuesTbl
except
select Id from ValuesTbl where Value = 3
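The not exists and except variants should return the same ids. A small sketch, assuming SQLite via Python's sqlite3 module and a hypothetical table name values_tbl, runs both on the sample data and compares them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# values_tbl is a hypothetical name standing in for the table in the question.
conn.execute("CREATE TABLE values_tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO values_tbl (id, value) VALUES (?, ?)",
    [(123, 1), (123, 2), (123, 3), (456, 1), (456, 2), (456, 4), (789, 1), (789, 2)],
)

# Variant 1: keep an id only if no row with that id has value 3.
not_exists_ids = sorted(
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT t.id
        FROM values_tbl AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM values_tbl AS t1
            WHERE t1.id = t.id AND t1.value = 3
        )
        """
    )
)

# Variant 2: all ids minus the ids that have a 3. EXCEPT also removes duplicates.
except_ids = sorted(
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM values_tbl
        EXCEPT
        SELECT id FROM values_tbl WHERE value = 3
        """
    )
)

print(not_exists_ids, except_ids)  # [456, 789] [456, 789]
```

Note that EXCEPT is not available under that name everywhere (Oracle historically used MINUS), so the not exists form is the more portable of the two.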

Related

Update Missing Values

I have a dataset that looks like the following
|---------------------|------------------|------------------|
| RowID | UserID | Code |
|---------------------|------------------|------------------|
| 1 | 123 | 0 |
|---------------------|------------------|------------------|
| 2 | 123 | 0 |
|---------------------|------------------|------------------|
| 3 | 123 | 50 |
|---------------------|------------------|------------------|
| 4 | 456 | 0 |
|---------------------|------------------|------------------|
| 5 | 456 | 100 |
|---------------------|------------------|------------------|
I would like to update the 0s to the non 0 Code for each UserID. Can someone provide assistance with this?
One option uses an updateable common table expression and window functions:
with cte as (
select code, max(code) over(partition by userID) max_code
from mytable
)
update cte set code = max_code where code = 0
You can use a simple sub-query for this:
update MyTable set
/* If it's possible that a user might have multiple non-zero Codes, adjust the sub-query to return the correct one */
Code = (select top 1 Code from MyTable T2 where Code != 0 and T2.UserID = MyTable.UserId)
where Code = 0
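For a quick check of the sub-query idea outside SQL Server, here is a sketch using SQLite through Python's sqlite3 module. SQLite has neither TOP nor updatable CTEs, so this is an adaptation rather than the answers' exact code: it picks MAX(code) among the user's non-zero codes (an assumption about which code should win) and adds an EXISTS guard so users with no non-zero code are left untouched. The column name row_id is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (row_id INTEGER PRIMARY KEY, userid INTEGER, code INTEGER)")
conn.executemany(
    "INSERT INTO mytable (row_id, userid, code) VALUES (?, ?, ?)",
    [(1, 123, 0), (2, 123, 0), (3, 123, 50), (4, 456, 0), (5, 456, 100)],
)

# Replace each 0 with the largest non-zero code of the same user.
# The EXISTS guard skips users that have no non-zero code at all,
# which would otherwise be set to NULL.
conn.execute(
    """
    UPDATE mytable
    SET code = (
        SELECT MAX(t2.code)
        FROM mytable AS t2
        WHERE t2.userid = mytable.userid AND t2.code <> 0
    )
    WHERE code = 0
      AND EXISTS (
        SELECT 1 FROM mytable AS t3
        WHERE t3.userid = mytable.userid AND t3.code <> 0
      )
    """
)

updated = conn.execute(
    "SELECT row_id, userid, code FROM mytable ORDER BY row_id"
).fetchall()
print(updated)
# [(1, 123, 50), (2, 123, 50), (3, 123, 50), (4, 456, 100), (5, 456, 100)]
```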

Find number of rows identical on some, but different on another column

Say I have the following table:
CREATE TABLE data (
PROJECT_ID VARCHAR,
TASK_ID VARCHAR,
REF_ID VARCHAR,
REF_VALUE VARCHAR
);
I want to identify rows where
PROJECT_ID, REF_ID, REF_VALUE are the same
but TASK_ID are different.
The desired output is a list of TASK_ID_1, TASK_ID_2 and COUNT(*) of such conflicts. So, for example,
DATA
+------------+---------+--------+-----------+
| PROJECT_ID | TASK_ID | REF_ID | REF_VALUE |
+------------+---------+--------+-----------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
+------------+---------+--------+-----------+
OUTPUT
+-----------+-----------+----------+
| TASK_ID_1 | TASK_ID_2 | COUNT(*) |
+-----------+-----------+----------+
| 1 | 2 | 2 |
| 2 | 1 | 2 |
+-----------+-----------+----------+
would mean that there are two entries with TASK_ID == 1 and two entries with TASK_ID == 2 that share the same values for the other three columns. The inherent symmetry in the output is fine.
How would I go about finding this information? I've tried joining the table onto itself and grouping, but this turned up more results for a single task than the table had rows altogether, so it's clearly wrong.
The database used is PostgreSQL, though a solution that applies to most common SQL systems would be preferable.
You want a self join and aggregation:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
on d1.project_id = d2.project_id and
d1.ref_id = d2.ref_id and
d1.ref_value = d2.ref_value and
d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
Notes:
Add the condition d1.task_id < d2.task_id if you want each pair to occur only once in the result set.
This does not handle NULL values, although that is easy enough to handle. Use is not distinct from instead of =.
You can also simplify this a bit with the using clause:
select d1.task_id as task_id_1, d2.task_id as task_id_2, count(*)
from data d1 join
data d2
using (project_id, ref_id, ref_value)
where d1.task_id <> d2.task_id
group by d1.task_id, d2.task_id;
You can get an idea of how many rows might be returned by using:
select d.project_id, d.ref_id, d.ref_value, count(distinct d.task_id), count(*)
from data d
group by d.project_id, d.ref_id, d.ref_value;
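To see the self join from this answer on the question's sample rows, here is a minimal sketch run in SQLite via Python's sqlite3 module (the engine choice is an assumption; the question uses PostgreSQL, but the query is plain SQL). It also shows the d1.task_id < d2.task_id variant from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (project_id TEXT, task_id TEXT, ref_id TEXT, ref_value TEXT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [("1", "1", "1", "1"), ("1", "1", "1", "2"), ("1", "2", "1", "1"), ("1", "2", "1", "2")],
)

# Pair every row with rows that match on the three columns but belong to another task.
symmetric = conn.execute(
    """
    SELECT d1.task_id, d2.task_id, COUNT(*)
    FROM data AS d1
    JOIN data AS d2
      ON d1.project_id = d2.project_id
     AND d1.ref_id = d2.ref_id
     AND d1.ref_value = d2.ref_value
     AND d1.task_id <> d2.task_id
    GROUP BY d1.task_id, d2.task_id
    ORDER BY d1.task_id, d2.task_id
    """
).fetchall()

# Same join, but each pair is reported once by requiring d1.task_id < d2.task_id.
one_per_pair = conn.execute(
    """
    SELECT d1.task_id, d2.task_id, COUNT(*)
    FROM data AS d1
    JOIN data AS d2
      ON d1.project_id = d2.project_id
     AND d1.ref_id = d2.ref_id
     AND d1.ref_value = d2.ref_value
     AND d1.task_id < d2.task_id
    GROUP BY d1.task_id, d2.task_id
    """
).fetchall()

print(symmetric)     # [('1', '2', 2), ('2', '1', 2)]
print(one_per_pair)  # [('1', '2', 2)]
```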
This is how I understand your question. This assumes there are only two tasks for the same combination.
SELECT "PROJECT_ID", "REF_ID", "REF_VALUE",
MIN("TASK_ID") as TASK_ID_1,
MAX("TASK_ID") as TASK_ID_2,
COUNT(*) as cnt
FROM Table1
GROUP BY "PROJECT_ID", "REF_ID", "REF_VALUE"
HAVING MIN("TASK_ID") != MAX("TASK_ID")
-- COUNT(*) > 1 also should work
OUTPUT
I added more columns to make clear which elements are the same:
| PROJECT_ID | REF_ID | REF_VALUE | task_id_1 | task_id_2 | cnt |
|------------|--------|-----------|-----------|-----------|-----|
| 1 | 1 | 2 | 1 | 2 | 2 |
| 1 | 1 | 1 | 1 | 2 | 2 |

SQL : Getting duplicate rows along with other variables

I am working on Teradata SQL. I would like to get the duplicate fields with their count and other variables as well. I can only find ways to get the count, but not exactly the variables as well.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
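As a runnable check of the join-on-counts idea, here is a sketch using SQLite through Python's sqlite3 module rather than Teradata (an assumption made so the example is self-contained). The table name table_name follows the question; the column name event_date is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        (1, "abc", "21.03.2015"),
        (1, "def", "22.04.2015"),
        (2, "ajk", "22.03.2015"),
        (3, "ghi", "23.03.2015"),
        (3, "ghi", "23.03.2015"),
    ],
)

# Count rows per id in a derived table, then join the count back onto every row.
# The HAVING clause keeps only ids that occur more than once.
duplicates = conn.execute(
    """
    SELECT tn.id, tn.name, tn.event_date, cnt.count_dups
    FROM table_name AS tn
    INNER JOIN (
        SELECT id, COUNT(*) AS count_dups
        FROM table_name
        GROUP BY id
        HAVING COUNT(*) > 1
    ) AS cnt ON cnt.id = tn.id
    ORDER BY tn.id, tn.name
    """
).fetchall()

for row in duplicates:
    print(row)
```

Dropping the HAVING clause would also return id 2 with a count of 1, which matches the expected output shown in the question.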
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1

how to get a Distinct Count of users from two related but different tables

Apologies for this, but SQL is not a strong point for me, and whilst this appears similar to lots of other queries, I cannot translate those to this situation successfully.
I have two tables that will be related by a common value (id and Issue) if a row in table 2 exists.
I need to get a distinct count of users raising particular issues. I have users in both tables, with the table 2 user taking precedence if it exists.
There is always a REPORTER in Table 1, but there may not be a Stringvalue of Name (fieldtype = 1) in table 2. If there is a Stringvalue then that is the "User" and the Reporter can be ignored.
Table 1
| id | Reporter| Type |
| 1 | 111111 | 1 |
| 2 | 111111 | 2 |
| 3 | 222222 | 2 |
| 4 | 333333 | 1 |
| 5 | 111111 | 1 |
| 6 | 666666 | 1 |
Table 2
|issue | Stringvalue | fieldType|
| 1 | Fred | 1 |
| 1 | bananas | 2 |
| 2 | Jack | 1 |
| 5 | Steve | 1 |
I have a total of 4 issues of the right type (1,4,5,6), three reporters (111111,333333,666666) and two Stringvalues(Fred, Steve).
My total count of Distinct Users = 4 (Fred, 333333, Steve, 666666)
Result Table
| id| T1.Reporter | T2.Name |
| 1| Null | Fred |
| 4| 333333 | Null |
| 5| Null | Steve |
| 6| 666666 | Null |
How do I get this result in SQL?
Closest try so far:
SELECT
table1.REPORTER,
TO_CHAR(NULL) "NAME"
FROM table1
Where table1.TYPE =1
AND table1.REPORTER <> '111111'
Union
SELECT
TO_CHAR(NULL) "REPORTER",
table2.STRINGVALUE "NAME"
FROM table2,
table1
WHERE table2.ISSUE = table1.ID
AND table2.fieldtype= 1
and table1.issuetype = 1
Without explicitly excluding the default table 1 Reporter, this gets returned in my results even when there is a name value in table 2.
I have tried exists and in but cannot get the syntax right or the correct results. As soon as I try any join that links the ID and Issue values, the results always end up constrained to the matching rows or include all values. Adding additional conditions to the ON clause does not return correct results either.
I have tried too many permutations to list. Logically it sounds like I should be able to do this with a union with where exists, or a left outer join, but my skills are lacking to make this work.
You need to use a LEFT JOIN and that is where you specify the fieldtype = 1 clause:
SELECT
table1.id,
CASE
WHEN table2.Stringvalue IS NOT NULL THEN table2.Stringvalue
ELSE table1.Reporter
END AS TheUser
FROM table1
LEFT JOIN table2 ON table1.id = table2.issue AND table2.fieldType = 1
WHERE table1.Type = 1
Result:
+------+---------+
| id | TheUser |
+------+---------+
| 1 | Fred |
| 4 | 333333 |
| 5 | Steve |
| 6 | 666666 |
+------+---------+
If I understand correctly, you want a left join and count(distinct). Here is what I think you are looking for:
select count(distinct coalesce(stringvalue, reporter) )
from table1 t1 left join
table2 t2
on t1.id = t2.issue and t2.fieldtype = 1
where t1.id in (1, 4, 5, 6);
You need to learn how to use explicit JOIN syntax. As a simple rule: Never use commas in the FROM clause. Always use explicit JOIN syntax. For one thing, it is more powerful, making it easy to express outer joins.
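Both answers can be checked on the sample data with a short sketch in SQLite via Python's sqlite3 module. This is an adaptation under a few assumptions: the question appears to use Oracle, Reporter is stored as text so it can share a column with Stringvalue, and the count filters on type = 1 instead of listing the ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, reporter TEXT, type INTEGER)")
conn.execute("CREATE TABLE table2 (issue INTEGER, stringvalue TEXT, fieldtype INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "111111", 1), (2, "111111", 2), (3, "222222", 2),
     (4, "333333", 1), (5, "111111", 1), (6, "666666", 1)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(1, "Fred", 1), (1, "bananas", 2), (2, "Jack", 1), (5, "Steve", 1)],
)

# The fieldtype filter sits in the ON clause, so issues without a name row
# survive the LEFT JOIN and fall back to the reporter.
per_issue = conn.execute(
    """
    SELECT t1.id, COALESCE(t2.stringvalue, t1.reporter) AS the_user
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
      ON t1.id = t2.issue AND t2.fieldtype = 1
    WHERE t1.type = 1
    ORDER BY t1.id
    """
).fetchall()

# Same join, reduced to the number of distinct users.
distinct_users = conn.execute(
    """
    SELECT COUNT(DISTINCT COALESCE(t2.stringvalue, t1.reporter))
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
      ON t1.id = t2.issue AND t2.fieldtype = 1
    WHERE t1.type = 1
    """
).fetchone()[0]

print(per_issue)       # [(1, 'Fred'), (4, '333333'), (5, 'Steve'), (6, '666666')]
print(distinct_users)  # 4
```

Moving the fieldtype condition from ON into WHERE would turn the left join back into an inner join, which is the "constrained to the matching rows" behaviour described in the question.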

Selecting column from one table and count from another

t1
id | name | include
-------------------
1 | foo | true
2 | bar | true
3 | bum | false
t2
id | some | table_1_id
-------------------------
1 | 42 | 1
2 | 43 | 1
3 | 42 | 2
4 | 44 | 1
5 | 44 | 3
Desired output:
name | count(some)
------------------
foo | 3
bar | 1
What I have currently from looking through other solutions here:
SELECT a.name,
COUNT(r.some)
FROM t1 a
JOIN t2 r on a.id=r.table_1_id
WHERE a.include = 'true'
GROUP BY a.id,
r.some;
but that seems to get me
name | count(r.some)
--------------------
foo | 1
foo | 1
bar | 1
foo | 1
I'm no sql expert (I can do simple queries) so I'm googling around as well but finding most of the solutions I find give me this result. I'm probably missing something really easy.
Just remove the second column from the group by clause
SELECT a.name,
COUNT(r.some)
FROM t1 a
JOIN t2 r on a.id=r.table_1_id
WHERE a.include = 'true'
GROUP BY a.name
Columns that you pass to an aggregate function like sum() or count() should be left out of the group by clause. Only list the columns whose distinct values should each produce one output row.
This is because grouping by multiple columns puts rows in the same group only when all of those column values are the same.
See this link for more info: Using group by on multiple columns
In your case, whenever some is equal, table_1_id is not equal (and vice versa), so no rows are grouped together and all are displayed individually.
If the entries are like,
id | some | table_1_id
-------------------------
1 | 42 | 1
2 | 43 | 1
3 | 42 | 2
4 | 42 | 1
Then the output would have been:
name | count
------------------
foo | 2 (for 42)
foo | 1 (for 43)
bar | 1 (for 42)
As Juergen said, if you want to group on one column only, remove r.some from the group by clause.
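The difference between the two groupings can be reproduced with a small sketch in SQLite via Python's sqlite3 module (the engine and the storage of include as the text 'true'/'false' are assumptions). The column names some and include are quoted in case an engine treats them as keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t1 (id INTEGER, name TEXT, "include" TEXT)')
conn.execute('CREATE TABLE t2 (id INTEGER, "some" INTEGER, table_1_id INTEGER)')
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "foo", "true"), (2, "bar", "true"), (3, "bum", "false")],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(1, 42, 1), (2, 43, 1), (3, 42, 2), (4, 44, 1), (5, 44, 3)],
)

# Original attempt: grouping by (a.id, r.some) makes one group per distinct pair,
# so every count is 1 for this data.
per_pair = conn.execute(
    """
    SELECT a.name, COUNT(r."some")
    FROM t1 AS a
    JOIN t2 AS r ON a.id = r.table_1_id
    WHERE a."include" = 'true'
    GROUP BY a.id, r."some"
    """
).fetchall()

# Fixed query: grouping by a.name alone gives one row per name.
per_name = conn.execute(
    """
    SELECT a.name, COUNT(r."some")
    FROM t1 AS a
    JOIN t2 AS r ON a.id = r.table_1_id
    WHERE a."include" = 'true'
    GROUP BY a.name
    ORDER BY a.name
    """
).fetchall()

print(sorted(per_pair))  # [('bar', 1), ('foo', 1), ('foo', 1), ('foo', 1)]
print(per_name)          # [('bar', 1), ('foo', 3)]
```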