Selecting only duplicate rows, kinda - sql

I'm really not sure on the best way to explain this, but we have a database which has a unique ID for each employee, a description, and a flag for current.
SELECT COUNT("Current_Flag"),
"Employee_Number"
FROM "Employment_History"
WHERE "Current_Flag" = 'Y'
GROUP BY "Employee_Number" ;
I'm trying to return the unique ID for every case where they have two current flags set, but I don't even know where to begin. I've tried a subselect, which didn't work. I'm sure the answer is quite simple, but I've just been told I only have 7 minutes to do it, so I panicked and thought I'd ask here. Sorry all :/

Add a HAVING clause to your current query - like so:
select count("Current_Flag"),
"Employee_Number"
from "Employment_History"
where "Current_Flag" = 'Y'
group by "Employee_Number"
having count("Current_Flag") >= 2
(Change the condition to =2 if you only want exactly 2 matches.)
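A minimal sketch of the HAVING approach, run against an in-memory SQLite database so it can be tried without touching the real system. The "Description" column name and the sample rows are assumptions made up for illustration; only "Employee_Number", "Current_Flag" and "Employment_History" come from the question.

```python
import sqlite3

# Hypothetical sample rows: (Employee_Number, Description, Current_Flag).
SAMPLE_ROWS = [
    (1, "Clerk", "Y"),
    (1, "Manager", "Y"),
    (2, "Clerk", "N"),
    (2, "Analyst", "Y"),
    (3, "Intern", "Y"),
    (3, "Engineer", "Y"),
    (3, "Lead", "N"),
]


def employees_with_multiple_current(rows, minimum=2):
    """Return employee numbers that have at least `minimum` rows flagged 'Y'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "Employment_History" ('
        '"Employee_Number" INTEGER, "Description" TEXT, "Current_Flag" TEXT)'
    )
    conn.executemany('INSERT INTO "Employment_History" VALUES (?, ?, ?)', rows)
    query = """
        SELECT "Employee_Number"
        FROM "Employment_History"
        WHERE "Current_Flag" = 'Y'          -- filter rows before grouping
        GROUP BY "Employee_Number"
        HAVING COUNT(*) >= ?                -- filter groups after counting
        ORDER BY "Employee_Number"
    """
    found = [row[0] for row in conn.execute(query, (minimum,))]
    conn.close()
    return found


duplicates = employees_with_multiple_current(SAMPLE_ROWS)
print(duplicates)
```

WHERE runs before the grouping and HAVING after it, which is why the count condition has to live in HAVING.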

Try something like this (untested!). You might have to play around with HAVING/COUNT to get it to work
SELECT unique_id FROM table WHERE flag = 'Y' GROUP BY unique_id HAVING COUNT(*) > 1

select employee_name, count(employee_name) as cnt
from your_table
group by employee_name
having count(employee_name) > 1
I guess this will do the trick. Adapt the table and column names to your DB.

Self-join the table and keep only the rows whose two flags match, excluding pairs that match on the unique ID itself. This returns every ID that has another row with the same flags.
Select A.ID
from table A
INNER JOIN table B
on A.Flag1=B.Flag1 and A.Flag2=B.Flag2
Where A.ID <> B.ID

(UNTESTED STATEMENT)
SELECT Id FROM TABLE1 T1 WHERE T1.Id NOT IN(
SELECT Id FROM TABLE2 T2 WHERE (T1.Flag1 = T2.Flag1) AND (T1.Id <> T2.Id))

Related

How to only select rows where one column is unique and the other is a certain value

I am trying to figure out which ids only have one type of transaction.
I have tried joining and selecting distinct but I am doing it wrong
select transactions.type, id
from datasource
group by id, transactions.type
This gives me a table with two transaction types: either dep or withdraw. Most IDs have two rows, one being dep and the other withdraw. I want to select only the ids that have only the withdraw transaction type
Use a WHERE condition with NOT EXISTS:
select t1.type, t1.id from datasource t1
where t1.type='withdraw'
and not exists( select 1 from datasource t2 where t1.id=t2.id
and t2.type='dep')
I think aggregation does what you want. However, I am confused what your data structure is. A join is needed somewhere:
select ds.id, min(t.type)
from datasource ds join
transactions t
on ds.? = t.? -- what is the join key?
group by ds.id
having min(t.type) = max(t.type);
If transactions.type is actually a column in datasource, then:
select ds.id, min("transactions.type")
from datasource ds
group by ds.id
having min("transactions.type") = max("transactions.type");
You can use GROUP BY with a HAVING COUNT(DISTINCT ...) = 1 clause, since you are trying to figure out which ids have only one type of transaction:
select distinct id
from datasource
where id in
(
select id
from datasource
group by id
having count(distinct transactions_type)=1 )
If you are specifically looking for transactions_type = 'withdraw', then add and transactions_type = 'withdraw' to the end of the above SELECT statement.
P.S. I suppose transactions.type is a typo and should be transactions_type, right?
Similar to other answers, but simpler:
select id, count(distinct transactions.type)
from datasource
group by id
having count(distinct transactions.type) = 1
It is unclear, however, what transactions.type is. It doesn't appear to be valid as a column name. Or did you mean to write transactions_type?
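A small sketch of the aggregation idea, using an in-memory SQLite table. It assumes the column is really called transactions_type (as several answers suspect) and uses made-up rows; the extra MIN(...) = 'withdraw' condition is one possible way to narrow the result to withdraw-only ids.

```python
import sqlite3

# Hypothetical rows: (id, transactions_type).
ROWS = [
    (1, "dep"),
    (1, "withdraw"),
    (2, "withdraw"),
    (2, "withdraw"),
    (3, "dep"),
    (4, "withdraw"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasource (id INTEGER, transactions_type TEXT)")
conn.executemany("INSERT INTO datasource VALUES (?, ?)", ROWS)

# ids that have exactly one distinct transaction type, whatever it is.
single_type_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM datasource
        GROUP BY id
        HAVING COUNT(DISTINCT transactions_type) = 1
        ORDER BY id
        """
    )
]

# Same grouping, but the single type must be 'withdraw'.
# With one distinct value per group, MIN() is simply that value.
withdraw_only_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM datasource
        GROUP BY id
        HAVING COUNT(DISTINCT transactions_type) = 1
           AND MIN(transactions_type) = 'withdraw'
        ORDER BY id
        """
    )
]
conn.close()

print(single_type_ids, withdraw_only_ids)
```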

Comparing two sum function in where clause

I want to check that an amount of likes the users received in all their personal pictures is at least twice as large as the number of likes received in the group pictures in which they are tagged.
In case the user is not tagged in any group photo but is tagged in a personal picture that has received at least one like, it will be returned.
My Question is:
How can I make a comparison between 2 sum functions
Where one result of the sum is returned in the nested query and compared with the external query.
Can I set an auxiliary variable to enter the sum value in it and compare it?
Thanks for the helpers:)
Select distinct UIP.userID
From tblUserInPersonalPic UIP
where sum(UIP.numOfLikes) over (Partition by UIP.userID)*0.5 >
(Select distinct U.userID, sum(P.numOfLikes) over (Partition by U.userID)
From tblgroupPictures P left outer join
tblUserInGroupPic U On P.picNum=U.picNum
group by U.userID,P.numOfLikes,P.picNum)
It's kinda hard to know for sure, and of course I can't test my answer,
but I think you can do it with a couple of left joins, group by and having:
SELECT Personal.UserId
FROM tblUserInPersonalPic Personal
LEFT JOIN tblUserInGroupPic UserInGroup ON Personal.userID = UserInGroup.userID
LEFT JOIN tblgroupPictures GroupPictures ON UserInGroup.picNum = GroupPictures.picNum
GROUP BY Personal.userID
HAVING COALESCE(SUM(GroupPictures.numOfLikes), 0) * 2 < SUM(Personal.numOfLikes)
Please note: When posting SQL questions it's always best to provide sample data as DDL + DML (CREATE TABLE + INSERT INTO statements) and desired results, so that whoever answers you can test the answer before posting it.
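One thing to watch for when joining the personal and group tables directly is that a user with several personal pictures and several group pictures produces one joined row per combination, which inflates both sums. A possible way around this is to aggregate each side first and compare the two totals afterwards. The sketch below does that in SQLite; the column layout of the three tables and all sample rows are assumptions, and the "at least one like" rule for users without group pictures is my reading of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblUserInPersonalPic (userID INTEGER, picNum INTEGER, numOfLikes INTEGER);
    CREATE TABLE tblgroupPictures (picNum INTEGER, numOfLikes INTEGER);
    CREATE TABLE tblUserInGroupPic (userID INTEGER, picNum INTEGER);
    """
)
# Hypothetical data: user 1 has 10 personal likes and 5 group likes,
# user 2 has 3 vs 5, user 3 has 1 like and no group pictures,
# user 4 has 0 likes and no group pictures.
conn.executemany(
    "INSERT INTO tblUserInPersonalPic VALUES (?, ?, ?)",
    [(1, 10, 6), (1, 11, 4), (2, 12, 3), (3, 13, 1), (4, 14, 0)],
)
conn.executemany(
    "INSERT INTO tblgroupPictures VALUES (?, ?)", [(100, 2), (101, 3), (102, 5)]
)
conn.executemany(
    "INSERT INTO tblUserInGroupPic VALUES (?, ?)", [(1, 100), (1, 101), (2, 102)]
)

QUERY = """
    SELECT p.userID
    FROM (
        SELECT userID, SUM(numOfLikes) AS personal_likes
        FROM tblUserInPersonalPic
        GROUP BY userID
    ) AS p
    LEFT JOIN (
        SELECT u.userID AS userID, SUM(g.numOfLikes) AS group_likes
        FROM tblUserInGroupPic AS u
        JOIN tblgroupPictures AS g ON g.picNum = u.picNum
        GROUP BY u.userID
    ) AS gl ON gl.userID = p.userID
    -- Each side is already one row per user, so the join cannot multiply rows.
    WHERE p.personal_likes >= 2 * COALESCE(gl.group_likes, 0)
      AND (gl.userID IS NOT NULL OR p.personal_likes >= 1)
    ORDER BY p.userID
"""
qualifying_users = [row[0] for row in conn.execute(QUERY)]
conn.close()

print(qualifying_users)
```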
Try using two CTEs (pseudo code). Also note that DISTINCT in your second query will not work as intended, since that subquery returns two columns; I changed it below so that you can still get that column as well.
;with cte1
as
(
select userid,sum(col1) as sum1
from
tbl1
group by userid
)
,cte2
as
(
select userid,sum(Anothersumcol) as sum2
from tbl2
group by userid
)
select t1.userid,t1.sum1,t2.sum2
from
cte1 t1
join
cte2 t2
on t1.userid=t2.userid
where t1.sum1>t2.sum2
You can't use window functions in a where clause. Define it in a subquery:
select *
from (
select sum(...) over (...) as Sum1
, OtherColumn
from YourTable
) sub
where Sum1 < (...your subquery...)

can I use a variable for the integer expression in a left sql function

I have the following query:
SELECT top 2500 *
FROM table a
LEFT JOIN table b
ON a.employee_id = b.employee_id
WHERE left(a.employee_rc,6) IN
(
SELECT employeeID, access
FROM accesslist
WHERE employeeID = '#client.id#'
)
The sub select in the where clause can return one or several access values, ex:
js1234 BLKHSA
js1234 HDF48R7
js1234 BLN6
In the primary where clause I need to be able to change the integer expression from 6 to 5 or 4 or 7 depending on the length of the values returned by the sub select. I am not sure this is the right way to go about it. I have tried using OR conditions, but that really slows down the query.
Try using exists instead:
SELECT top 2500 *
FROM table a LEFT JOIN
table b
ON a.employee_id = b.employee_id
WHERE EXISTS (Select 1
FROM accesslist
WHERE employeeID = '#client.id#' and
a.employee_rc like concat(employeeID, '%')
) ;
I don't see how your original query worked. The subquery is returning two columns and that normally isn't allowed in SQL for an in.
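The underlying idea is that a prefix match with LIKE does not need to know the prefix length, so LEFT(..., n) with a varying n can be avoided. A rough sketch in SQLite follows. It assumes the value to match is the access column (the answers here are unsure whether access or employeeID is intended), uses a hypothetical employees table in place of "table", and uses SQLite's || operator where other engines would use concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (employee_id INTEGER, employee_rc TEXT);
    CREATE TABLE accesslist (employeeID TEXT, access TEXT);
    """
)
# Hypothetical employees; the access values for js1234 come from the question.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        (1, "BLKHSA001"),
        (2, "HDF48R7XYZ"),
        (3, "BLN6-22"),
        (4, "BLN5-22"),
        (5, "ZZZ999"),
    ],
)
conn.executemany(
    "INSERT INTO accesslist VALUES (?, ?)",
    [
        ("js1234", "BLKHSA"),
        ("js1234", "HDF48R7"),
        ("js1234", "BLN6"),
        ("ab0001", "ZZZ"),
    ],
)


def visible_employee_ids(client_id):
    """Return employees whose employee_rc starts with any access value of client_id."""
    query = """
        SELECT e.employee_id
        FROM employees AS e
        WHERE EXISTS (
            SELECT 1
            FROM accesslist AS al
            WHERE al.employeeID = ?
              AND e.employee_rc LIKE al.access || '%'  -- prefix of any length
        )
        ORDER BY e.employee_id
    """
    return [row[0] for row in conn.execute(query, (client_id,))]


js_ids = visible_employee_ids("js1234")
ab_ids = visible_employee_ids("ab0001")
print(js_ids, ab_ids)
```

Note that % and _ inside an access value would act as wildcards in the LIKE pattern, so this only behaves as a plain prefix test when the access values contain neither.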
Move the subquery to a JOIN:
SELECT TOP 2500 *
FROM table a
LEFT JOIN table b ON a.employee_id = b.employee_id
LEFT JOIN accesslist al ON al.access LIKE concat('%', a.employee_id)
WHERE al.employeeID = '#client.id#'
Like Gordon, I don't quite see how your query worked, so I'm not quite sure if it should be access or employeeID which is matched.
This construct will enable you to do what you said you want to do: have an integer value depend on something from a subquery. It's the general idea only; the details are up to you.
select field1, field2
, case when subqueryField1 = 'fred' then 1
when subqueryField1 = 'barney' then 2
else 3 end integerValue
from table1 t1 join (
select idField subqueryField1, etc
from wherever ) t2 on t1.idField = t2.idField
where whatever
Also, a couple of things in your query are questionable. First, a top n query without an order by clause doesn't tell the database what records to return. Second, 2500 rows is a lot of data to return to ColdFusion. Are you sure you need it all? Third, selecting * instead of just the fields you need slows down performance. If you think you need every field, think again. Since the employee ids will always match, you don't need both of them.

Equality of "select ... where in" and joins

Suppose I have a table1 like this:
id | itemcode
-------------
1 | c1
2 | c2
...
And a table2 like this:
item | name
-----------
c1 | acme
c2 | foo
...
Would the following two queries return the same result set under every condition?
SELECT id, itemcode
FROM table1
WHERE itemcode IN (SELECT DISTINCT item
FROM table2
WHERE name [some arbitrary test])
SELECT id, itemcode
FROM table1
JOIN (SELECT DISTINCT item
FROM table2
WHERE name [some arbitrary test]) items
ON table1.itemcode = items.item
Unless I'm really missing something stupid, I'd say yes. But I've done two queries which boil down to this form and I am getting different results. There are some nested queries using WHERE IN, but for the last step I've noticed a JOIN is much faster. The nested queries are all entirely isolated so I don't believe they are the problem, so I just want to eliminate the possibility that I've got a misconception regarding the above.
Thanks for any insights.
EDIT
The two original queries:
SELECT imitm, imlitm, imglpt
FROM jdedata.F4101
WHERE imitm IN
(SELECT DISTINCT ivitm AS itemno
FROM jdedata.F4104
WHERE ivcitm IN
(SELECT DISTINCT ivcitm AS legacycode
FROM jdedata.F4104
WHERE ivitm IN
(SELECT DISTINCT tritm
FROM trigdata.F4101_TRIG)
)
)
SELECT orig.imitm, orig.imlitm, orig.imglpt
FROM jdedata.F4101 orig
JOIN
(SELECT DISTINCT ivitm AS itemno
FROM jdedata.F4104
WHERE ivcitm IN
(SELECT DISTINCT ivcitm AS legacycode
FROM jdedata.F4104
WHERE ivitm IN
(SELECT DISTINCT tritm
FROM trigdata.F4101_TRIG))) itemns
ON orig.imitm = itemns.itemno
EDIT 2
Although I still don't understand why the queries returned different results, it would seem our logic was flawed from the beginning since we were using the wrong columns in some parts. Mind that I'm not saying I made a mistake interpreting the queries as written above or had some typo, we just needed to select on some different stuff.
Normally I don't rest until I get to the bottom of things like these, but I'm very tired and am entering my first vacation since January that spans more than one day, so I can't really be bothered searching further right now. I'm sure the tips given here will come in handy later. Upvotes have been distributed for all the help and I've accepted Ypercube's answer, mostly because his comments have led me the furthest. But thanks all round! If I do find out more later, I'll try to remember pinging back in.
Since table2.item is not nullable, the 2 versions are equivalent. You can remove the distinct from the IN version, it's not needed. You can check these 3 versions and their execution plans:
SELECT id, itemcode FROM table1 WHERE itemcode IN
( SELECT item FROM table2 WHERE name [some arbitrary test] )
SELECT id, itemcode FROM table1 JOIN
( SELECT DISTINCT item FROM table2 WHERE name [some arbitrary test] )
items ON table1.itemcode = items.item
SELECT id, itemcode FROM table1 WHERE EXISTS
( SELECT * FROM table2 WHERE table1.itemcode = table2.item
AND (name [some arbitrary test]) )
Ideally I would want to see the differences between the result sets.
- Are you getting duplication of records?
- Is one set always a sub-set of the other?
- Does one set have both 'additional' and 'missing' records in comparison to the other?
That said, the logic should be equivalent. My best guess would be that you have some empty string entries in there, because Oracle treats an empty CHAR/VARCHAR string as NULL. This can give very funky results if you're not prepared for it.
Both queries perform a semijoin i.e. no attributes from table2 appear in the topmost SELECT (the resultset).
To my eye, your first query is easiest to identify as a semijoin, EXISTS even more so. On the other hand, an optimizer would no doubt see it differently ;)
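To make the semijoin point concrete, here is a small SQLite sketch comparing the IN, JOIN-on-DISTINCT and EXISTS forms on made-up data, with name LIKE 'acme%' standing in for the arbitrary test. It also runs a plain join without DISTINCT to show where duplicates can creep in when item is not unique. The rows are hypothetical, and the result says nothing about how Oracle or any other engine handles CHAR padding or empty strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, itemcode TEXT);
    CREATE TABLE table2 (item TEXT NOT NULL, name TEXT);
    """
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "c1"), (2, "c2"), (3, "c3"), (4, None)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("c1", "acme"), ("c1", "acme ltd"), ("c2", "foo"), ("c4", "acme")],
)


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


in_ids = ids(
    """
    SELECT id FROM table1
    WHERE itemcode IN (SELECT item FROM table2 WHERE name LIKE 'acme%')
    ORDER BY id
    """
)
join_distinct_ids = ids(
    """
    SELECT id FROM table1
    JOIN (SELECT DISTINCT item FROM table2 WHERE name LIKE 'acme%') AS items
      ON table1.itemcode = items.item
    ORDER BY id
    """
)
exists_ids = ids(
    """
    SELECT id FROM table1
    WHERE EXISTS (SELECT 1 FROM table2
                  WHERE table1.itemcode = table2.item AND name LIKE 'acme%')
    ORDER BY id
    """
)
# Without DISTINCT the join is no longer a semijoin: 'c1' matches two rows.
plain_join_ids = ids(
    """
    SELECT id FROM table1
    JOIN table2 ON table1.itemcode = table2.item
    WHERE name LIKE 'acme%'
    ORDER BY id
    """
)

print(in_ids, join_distinct_ids, exists_ids, plain_join_ids)
```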
You can also try to do a direct join to the second table
SELECT DISTINCT id, itemcode
FROM table1
INNER JOIN table2 ON table1.itemcode = table2.item
WHERE name [some arbitrary test]
You don't need distinct if item is primary key or unique
EXISTS and INNER JOIN often perform similarly, while IN can be more expensive on some engines; compare the execution plans to be sure.
I'd look for some data type conversion in there.
create table t_vc (val varchar2(6));
create table t_c (val char(6));
insert into t_vc values ('12345');
insert into t_vc values ('12345 ');
insert into t_c values ('12345');
insert into t_c values ('12345');
select t_c.val||':'
from t_c
where val in (select distinct val from t_vc);
select c.val||':'
from t_vc v join (select distinct val from t_c) c on v.val=c.val;

SQL WHEREing on a different table's COUNT

So, I want to apply a WHERE condition to a field assigned by a COUNT() AS clause. My query currently looks like this:
SELECT new_tags.tag_id
, new_tags.tag_name
, new_tags.tag_description
, COUNT(DISTINCT new_tags_entries.entry_id) AS entry_count
FROM (new_tags)
JOIN new_tags_entries ON new_tags_entries.tag_id = new_tags.tag_id
WHERE `new_tags`.`tag_name` LIKE '%w'
AND `entry_count` < '1'
GROUP BY new_tags.tag_id ORDER BY tag_name ASC
The bit that's failing is the entry_count in the WHERE clause - it doesn't know what the entry_count column is. My table looks like this:
new_tags {
tag_id INT
tag_name VARCHAR
}
new_tags_entries {
tag_id INT
entry_id INT
}
I want to filter the results by the number of distinct entry_ids in new_tags_entries that pertain to the tag ID.
Make sense?
Thanks in advance.
To filter on aggregated values use the HAVING clause...
SELECT
new_tags.tag_id, new_tags.tag_name,
new_tags.tag_description,
COUNT(DISTINCT new_tags_entries.entry_id) AS entry_count
FROM (new_tags)
JOIN new_tags_entries ON new_tags_entries.tag_id = new_tags.tag_id
WHERE `new_tags`.`tag_name` LIKE '%w'
GROUP BY new_tags.tag_id
HAVING COUNT(DISTINCT new_tags_entries.entry_id) < '1'
ORDER BY tag_name ASC
An inner join will never have a count of less than 1. Perhaps a left join and IS NULL would help. That, or using SUM() instead.
Although APC's answer will be syntactically correct, if the problem you are trying to solve is indeed "Find me all new_tags that do not have any new_tags_entries", then the query with INNER JOIN and GROUP BY and HAVING will not yield the correct result. In fact, it will always yield the empty set.
As Ignacio Vazquez-Abrams pointed out, a LEFT JOIN will work. And you don't even need the GROUP BY / HAVING:
SELECT new_tags.*
FROM new_tags
LEFT JOIN new_tags_entries
ON new_tags.tag_id = new_tags_entries.tag_id
WHERE new_tags_entries.tag_id IS NULL
(Of course, you can still add GROUP BY and HAVING if you are interested in how many entries there are, and not just in finding new_tags with zero new_tags_entries. But the LEFT JOIN from new_tags to new_tags_entries needs to be there, or else you'll lose the new_tags that have no corresponding rows in new_tags_entries.)
Another, more explicit way to solve "get me all x for which there is no y" is a correlated NOT EXISTS solution:
SELECT new_tags.*
FROM new_tags
WHERE NOT EXISTS (
SELECT NULL
FROM new_tags_entries
WHERE new_tags_entries.tag_id = new_tags.tag_id
)
Although nice and explicit, this solution is typically shunned in MySQL because of the rather bad subquery performance.
SELECT
new_tags.tag_id, new_tags.tag_name,
new_tags.tag_description,
COUNT(DISTINCT new_tags_entries.entry_id) AS entry_count
FROM (new_tags)
LEFT JOIN new_tags_entries ON new_tags_entries.tag_id = new_tags.tag_id
WHERE `new_tags`.`tag_name` LIKE '%w'
GROUP BY new_tags.tag_id
HAVING `entry_count` < 1
ORDER BY tag_name ASC
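The difference between the inner-join and left-join variants discussed in this thread can be checked with a short SQLite sketch. The tag names and entries are invented; the table and column names follow the question. The HAVING clauses repeat the aggregate instead of using the alias, which is the more portable form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE new_tags (tag_id INTEGER, tag_name TEXT, tag_description TEXT);
    CREATE TABLE new_tags_entries (tag_id INTEGER, entry_id INTEGER);
    """
)
# Hypothetical data: 'window' ends in 'w' and has no entries;
# 'red' has no entries either but does not match LIKE '%w'.
conn.executemany(
    "INSERT INTO new_tags VALUES (?, ?, ?)",
    [(1, "yellow", ""), (2, "window", ""), (3, "arrow", ""), (4, "red", "")],
)
conn.executemany(
    "INSERT INTO new_tags_entries VALUES (?, ?)", [(1, 10), (1, 11), (3, 12)]
)


def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


COUNT_TEMPLATE = """
    SELECT t.tag_name
    FROM new_tags AS t
    {join} new_tags_entries AS e ON e.tag_id = t.tag_id
    WHERE t.tag_name LIKE '%w'
    GROUP BY t.tag_id, t.tag_name
    HAVING COUNT(DISTINCT e.entry_id) < 1
    ORDER BY t.tag_name
"""

# An inner join only keeps tags that have at least one entry, so no group
# can ever have a count below 1.
inner_result = names(COUNT_TEMPLATE.format(join="JOIN"))

# A left join keeps entry-less tags with NULL entry_id, which COUNT ignores.
left_result = names(COUNT_TEMPLATE.format(join="LEFT JOIN"))

# Anti-join written with NOT EXISTS, for comparison.
not_exists_result = names(
    """
    SELECT t.tag_name
    FROM new_tags AS t
    WHERE t.tag_name LIKE '%w'
      AND NOT EXISTS (
          SELECT 1 FROM new_tags_entries AS e WHERE e.tag_id = t.tag_id
      )
    ORDER BY t.tag_name
    """
)
conn.close()

print(inner_result, left_result, not_exists_result)
```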