Finding duplicates on one column using select where in SQL Server 2008


I am trying to select rows from a table that have duplicates in one column but also restrict the rows based on another column. It does not seem to be working correctly.
select Id,Terms from QueryData
where Track = 'Y' and Active = 'Y'
group by Id,Terms
having count(Terms) > 1
If I remove the where it works fine but I need to restrict it to these rows only.
ID Terms Track Active
100 paper Y Y
200 paper Y Y
100 juice Y Y
400 orange N N
1000 apple Y N
Ideally the query should return the first 2 rows.

SELECT Id, Terms, Track, Active
FROM QueryData
WHERE Terms IN (
SELECT Terms
FROM QueryData
WHERE Track = 'Y' and Active = 'Y'
GROUP BY Terms
HAVING COUNT(*) > 1
)
Demo on SQLFiddle
Data:
ID Terms Track Active
100 paper Y Y
200 paper Y Y
100 juice Y Y
400 orange N N
1000 apple Y N
Results:
Id Terms Track Active
100 paper Y Y
200 paper Y Y

I don't quite follow what you're doing. You use count(Terms) in the HAVING clause, but Terms is also in your GROUP BY. With your sample data that means each (Id, Terms) group holds a single row, so count(Terms) is 1 for every group and nothing passes the filter. You probably need to drop Terms from the select list and the GROUP BY.
Honestly, I reproduced your table and query, and it doesn't work.
This is probably what you're looking for(?):
select Id, count(Terms) from QueryData
where Track = 'Y' and Active = 'Y'
group by Id
having count(Terms) > 1
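To see why the original query comes back empty, here is a minimal sketch that runs the grouping on the sample rows from the question. It uses Python's built-in sqlite3 module rather than SQL Server, which is an assumption made only so the example is runnable; the helper name group_counts is hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (Id, Terms, Track, Active).
ROWS = [
    (100, "paper", "Y", "Y"),
    (200, "paper", "Y", "Y"),
    (100, "juice", "Y", "Y"),
    (400, "orange", "N", "N"),
    (1000, "apple", "Y", "N"),
]


def group_counts(group_cols):
    """Count tracked, active rows per group for the given GROUP BY columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE QueryData (Id INTEGER, Terms TEXT, Track TEXT, Active TEXT)"
    )
    conn.executemany("INSERT INTO QueryData VALUES (?, ?, ?, ?)", ROWS)
    sql = (
        f"SELECT {group_cols}, COUNT(*) FROM QueryData "
        f"WHERE Track = 'Y' AND Active = 'Y' "
        f"GROUP BY {group_cols} ORDER BY {group_cols}"
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Grouping by (Id, Terms): every group has exactly one row.
print(group_counts("Id, Terms"))
# Grouping by Terms alone: 'paper' has two rows.
print(group_counts("Terms"))
```

Grouping by Terms alone is what makes the duplicate visible, because every (Id, Terms) pair in the sample is unique.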

This will return all duplicated terms meeting the criteria:
select Terms
from QueryData
where Track = 'Y' and Active = 'Y'
group by Terms
having count(*) > 1
http://sqlfiddle.com/#!3/18a57/2
If you want all the details for these terms, you can join to this result.
;with dups as (
select Terms
from QueryData
where Track = 'Y' and Active = 'Y'
group by Terms
having count(*) > 1
)
select
qd.ID, qd.Terms, qd.Track, qd.Active
from
QueryData qd join
dups d on qd.terms = d.terms
http://sqlfiddle.com/#!3/18a57/5
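A minimal sketch of the CTE-and-join approach, run through Python's sqlite3 module instead of SQL Server (an assumption, purely to make it executable). The row with Id 300 is hypothetical and not part of the question's data; it is there to show that the outer query returns every row sharing a duplicated term unless the Track/Active filter is repeated outside the CTE.

```python
import sqlite3

# Question's sample rows plus one HYPOTHETICAL inactive 'paper' row (Id 300).
ROWS = [
    (100, "paper", "Y", "Y"),
    (200, "paper", "Y", "Y"),
    (100, "juice", "Y", "Y"),
    (400, "orange", "N", "N"),
    (1000, "apple", "Y", "N"),
    (300, "paper", "N", "N"),  # hypothetical, not in the original data
]


def rows_with_duplicate_terms(filter_outer):
    """Join rows to the duplicated terms; optionally re-apply the filter outside."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE QueryData (Id INTEGER, Terms TEXT, Track TEXT, Active TEXT)"
    )
    conn.executemany("INSERT INTO QueryData VALUES (?, ?, ?, ?)", ROWS)
    outer_filter = "WHERE qd.Track = 'Y' AND qd.Active = 'Y'" if filter_outer else ""
    sql = f"""
        WITH dups AS (
            SELECT Terms
            FROM QueryData
            WHERE Track = 'Y' AND Active = 'Y'
            GROUP BY Terms
            HAVING COUNT(*) > 1
        )
        SELECT qd.Id, qd.Terms, qd.Track, qd.Active
        FROM QueryData qd
        JOIN dups d ON qd.Terms = d.Terms
        {outer_filter}
        ORDER BY qd.Id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(rows_with_duplicate_terms(filter_outer=False))  # includes the inactive row
print(rows_with_duplicate_terms(filter_outer=True))   # tracked, active rows only
```

Whether the inactive row should appear depends on what "all the details for these terms" is meant to cover, so treat the extra WHERE as optional.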

Related

JOIN on aggregate function

I have a table showing production steps (PosID) for a production order (OrderID) and which machine (MachID) they will be run on; I’m trying to reduce the table to show one record for each order – the lowest position (field “PosID”) that is still open (field “Open” = Y); i.e. the next production step for the order.
Example data I have:
OrderID PosID MachID Open
1 1 A N
1 2 B Y
1 3 C Y
2 4 C Y
2 5 D Y
2 6 E Y
Example result I want:
OrderID PosID MachID
1 2 B
2 4 C
I’ve tried two approaches, but I can’t seem to get either to work:
I don’t want to put “MachID” in the GROUP BY because that gives me all the records that are open, but I also don’t think there is an appropriate aggregate function for the “MachID” field to make this work.
SELECT "OrderID", MIN("PosID"), "MachID"
FROM Table T0
WHERE "Open" = 'Y'
GROUP BY "OrderID"
With this approach, I keep getting error messages that T1."PosID" (in the JOIN clause) is an invalid column. I've also tried T1.MIN("PosID") and MIN(T1."PosID").
SELECT T0."OrderID", T0."PosID", T0."MachID"
FROM Table T0
JOIN
(SELECT "OrderID", MIN("PosID")
FROM Table
WHERE "Open" = 'Y'
GROUP BY "OrderID") T1
ON T0."OrderID" = T1."OrderID"
AND T0."PosID" = T1."PosID"

Try this:
SELECT "OrderID","PosID","MachID" FROM (
SELECT
T0."OrderID",
T0."PosID",
T0."MachID",
ROW_NUMBER() OVER (PARTITION BY "OrderID" ORDER BY "PosID") RNK
FROM Table T0
WHERE "Open" = 'Y'
) AS A
WHERE RNK = 1
I've kept the quoted column names as you wrote them in the question, but in general the quotes aren't needed.
The inner query first filters to the open rows and then numbers the rows of each OrderID from 1 to X, ordered by PosID:
OrderID PosID MachID Open RNK
1 2 B Y 1
1 3 C Y 2
2 4 C Y 1
2 5 D Y 2
2 6 E Y 3
The outer query then keeps only the rows where RNK = 1, i.e. the lowest open PosID per OrderID. ROW_NUMBER() is a window function, and there are many more which are quite useful.
P.S. The above solution should work for MSSQL.
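As a rough check, the sketch below runs the ROW_NUMBER() query on the sample data, next to a variant of the asker's JOIN attempt in which the aggregate is given an alias. Both are executed with Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or newer). The table name ProductionSteps and the alias MinPosID are hypothetical, chosen because Table is a reserved word.

```python
import sqlite3

# Sample rows from the question: (OrderID, PosID, MachID, Open).
STEPS = [
    (1, 1, "A", "N"),
    (1, 2, "B", "Y"),
    (1, 3, "C", "Y"),
    (2, 4, "C", "Y"),
    (2, 5, "D", "Y"),
    (2, 6, "E", "Y"),
]

# Window-function version: number open steps per order, keep the first.
WINDOW_SQL = """
    SELECT OrderID, PosID, MachID
    FROM (
        SELECT OrderID, PosID, MachID,
               ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY PosID) AS rnk
        FROM ProductionSteps
        WHERE "Open" = 'Y'
    ) AS ranked
    WHERE rnk = 1
    ORDER BY OrderID
"""

# Join version: the aggregate needs an alias so the ON clause can refer to it.
JOIN_SQL = """
    SELECT t0.OrderID, t0.PosID, t0.MachID
    FROM ProductionSteps AS t0
    JOIN (
        SELECT OrderID, MIN(PosID) AS MinPosID
        FROM ProductionSteps
        WHERE "Open" = 'Y'
        GROUP BY OrderID
    ) AS t1
      ON t0.OrderID = t1.OrderID AND t0.PosID = t1.MinPosID
    ORDER BY t0.OrderID
"""


def next_steps(sql):
    """Load the sample data into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE ProductionSteps '
        '(OrderID INTEGER, PosID INTEGER, MachID TEXT, "Open" TEXT)'
    )
    conn.executemany("INSERT INTO ProductionSteps VALUES (?, ?, ?, ?)", STEPS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(next_steps(WINDOW_SQL))
print(next_steps(JOIN_SQL))
```

The unaliased MIN("PosID") in the derived table is the likely reason T1."PosID" was reported as an invalid column; once the aggregate has a name, the join form returns the same two rows as the window form on this data.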

How to identify subsequent user actions based on prior visits

I want to identify the users who visited section a and then subsequently visited section b, given the following data structure. The table contains 300,000 rows and is updated daily with approx. 8,000 rows:
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 1 b 0
2 1 b 0
1 3 b 1
Ideally I want a new column that flags the visit to section b. For example, on the third visit User 1 visited section b for the first time. I was attempting to do this using a CASE WHEN statement, but after many failed attempts I am not sure it is even possible with CASE WHEN and feel that I should take a different approach. I am just not sure what that approach should be. I do also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!

Correlated sub-queries should be avoided at all cost when working with Redshift. Keep in mind there are no indexes for Redshift so you'd have to rescan and restitch the column data back together for each value in the parent resulting in an O(n^2) operation (in this particular case going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured but in the simplest case, you could use something like
select case
when section != lag(section) over (partition by userid order by visitid)
then 1
else 0
end
from ...
This assumes that your data for userid 2 increments the visitid as shown below. If not, you could order by your timestamp column instead.
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 *2* b 0
2 *3* b 0
1 3 b 1
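A minimal sketch of the lag() approach on the adjusted sample data, run with Python's sqlite3 module instead of Redshift (an assumption; window functions need SQLite 3.25 or newer). The table name visits is hypothetical.

```python
import sqlite3

# Adjusted sample rows from the answer: (userid, visitid, section).
VISITS = [
    (1, 1, "a"),
    (1, 2, "a"),
    (2, 1, "b"),
    (2, 2, "b"),
    (2, 3, "b"),
    (1, 3, "b"),
]


def flag_section_changes():
    """Flag each visit whose section differs from the user's previous visit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (userid INTEGER, visitid INTEGER, section TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", VISITS)
    sql = """
        SELECT userid, visitid, section,
               CASE
                   WHEN section != LAG(section)
                        OVER (PARTITION BY userid ORDER BY visitid)
                   THEN 1
                   ELSE 0
               END AS conversion
        FROM visits
        ORDER BY userid, visitid
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


for row in flag_section_changes():
    print(row)
```

On a user's first visit lag() yields NULL, so the comparison is not true and the row falls through to 0. Note that this form flags any change of section, so a move from b back to a would also get a 1; an extra condition on section = 'b' would be needed if that matters.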

select t.*, case when v.ts is null then 0 else 1 end as conversion
from tbl t
left join (select *
from tbl x
where section = 'b'
and exists (select 1
from tbl y
where y.userid = x.userid
and y.section = 'a'
and y.ts < x.ts)) v
on t.userid = v.userid
and t.visitid = v.visitid
and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
case
when v.first_b_after_a is not null
then 1
else 0
end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
from (select t.*,
sum( case when t.section = 'a' then 1 end)
over( partition by userid
order by ts ) as a_sum
from tbl t) x
where section = 'b'
and a_sum is not null
group by userid) v
on t.userid = v.userid
and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0
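The running-sum version can be sketched the same way with Python's sqlite3 module (an assumption made for runnability; window functions need SQLite 3.25 or newer). The integer ts values and the fourth visit of user 1 are hypothetical, added only to show that a later b is not flagged again.

```python
import sqlite3

# (userid, visitid, section, ts). The ts values are HYPOTHETICAL integers,
# and user 1's fourth visit is an invented extra row.
VISITS = [
    (1, 1, "a", 1),
    (1, 2, "a", 2),
    (2, 1, "b", 3),
    (2, 2, "b", 4),
    (2, 3, "b", 5),
    (1, 3, "b", 6),
    (1, 4, "b", 7),  # hypothetical later visit to b
]


def flag_first_b_after_a():
    """Flag only the first visit to b that follows at least one visit to a."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (userid INTEGER, visitid INTEGER, section TEXT, ts INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", VISITS)
    sql = """
        SELECT t.userid, t.visitid, t.section,
               CASE WHEN v.first_b_after_a IS NOT NULL THEN 1 ELSE 0 END AS conversion
        FROM tbl t
        LEFT JOIN (
            SELECT userid, MIN(ts) AS first_b_after_a
            FROM (
                -- a_sum stays NULL until the user has visited section a
                SELECT userid, section, ts,
                       SUM(CASE WHEN section = 'a' THEN 1 END)
                           OVER (PARTITION BY userid ORDER BY ts) AS a_sum
                FROM tbl
            ) x
            WHERE section = 'b' AND a_sum IS NOT NULL
            GROUP BY userid
        ) v ON t.userid = v.userid AND t.ts = v.first_b_after_a
        ORDER BY t.userid, t.ts
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


for row in flag_first_b_after_a():
    print(row)
```

User 2 never visits a, so its running sum stays NULL and none of its rows are flagged.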

SUM only distinct values given certain criteria SQL

I have a table whose data looks like this:
LayerID Company id Company name Layer Name Price
1 1 x x1 20
2 1 x x2 10
3 2 y y1 50
4 2 y y2 50
5 2 y y3 50
6 3 z z1 15
What I want is to have the following table after SQL query is applied
Company id Company name Price
1 x 30
2 y 50
3 z 15
i.e. the following rules apply:
if the price for the different layers for the company are different then sum them up
example: for company x it would be 20+10 = 30
if the price for the different layers for the company are the same then take that number
example: for company y it would be 50, for z it would be 15
I'm not sure how I would do this in SQL (for Access/VBA), and have been trying to figure this out to no avail.
Thanks for your help in advance
Claudy

The SQL query that would produce the result you are looking for:
SELECT m.Company_id, m.Company_name, SUM(m.Price)
FROM
(
SELECT DISTINCT Company_id, Company_name, Price
FROM MyTable
) AS m
GROUP BY m.Company_id, m.Company_name

You can do this as:
SELECT m.Company_id, m.Company_name, SUM(distinct m.Price)
FROM table m
GROUP BY m.Company_id, m.Company_name;
As a warning: I never use sum(distinct). It generally indicates an error in the underlying data structure or subquery generating the data.
EDIT:
Why is it bad to do this? Generally, what you really want is:
SUM(m.Price) where <some id> is distinct
But you can't phrase that in SQL without a subquery. If the above is what you want, then you have a problem when two "id"s have the same price. The sum() produces the wrong value.
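For comparison, here is a minimal sketch that runs both formulations (the DISTINCT subquery and SUM(DISTINCT ...)) on the sample data, using Python's sqlite3 module rather than Access, which is an assumption. The table name Layers and the underscore-free column names are hypothetical. Whether Access SQL accepts SUM(DISTINCT ...) is not checked here.

```python
import sqlite3

# Sample rows from the question:
# (LayerID, CompanyID, CompanyName, LayerName, Price).
LAYERS = [
    (1, 1, "x", "x1", 20),
    (2, 1, "x", "x2", 10),
    (3, 2, "y", "y1", 50),
    (4, 2, "y", "y2", 50),
    (5, 2, "y", "y3", 50),
    (6, 3, "z", "z1", 15),
]

# Deduplicate (company, price) pairs first, then sum.
SUBQUERY_SQL = """
    SELECT m.CompanyID, m.CompanyName, SUM(m.Price)
    FROM (
        SELECT DISTINCT CompanyID, CompanyName, Price
        FROM Layers
    ) AS m
    GROUP BY m.CompanyID, m.CompanyName
    ORDER BY m.CompanyID
"""

# Let the aggregate drop repeated prices within each company.
SUM_DISTINCT_SQL = """
    SELECT CompanyID, CompanyName, SUM(DISTINCT Price)
    FROM Layers
    GROUP BY CompanyID, CompanyName
    ORDER BY CompanyID
"""


def company_totals(sql):
    """Load the sample layers into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Layers (LayerID INTEGER, CompanyID INTEGER, "
        "CompanyName TEXT, LayerName TEXT, Price INTEGER)"
    )
    conn.executemany("INSERT INTO Layers VALUES (?, ?, ?, ?, ?)", LAYERS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(company_totals(SUBQUERY_SQL))
print(company_totals(SUM_DISTINCT_SQL))
```

Both queries collapse equal prices within a company, so a company with layers priced 20, 20 and 10 would total 30 under either one. That is the situation the warning above is about.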

Select column by different column function in Access query

I have the following table in Access 2010.
EQID Breaker Circuit Rating
1 A One 1000
2 A Two 1500
3 A Three 500
4 A Four 1000
5 B One 1500
6 B Two 2000
I want to create a query to Group by Breaker, and show the Minimum Rating and the associated Circuit for that rating. I understand how to do this without showing the Circuit for the Minimum rating.
My desired query result would be:
EQID Breaker Circuit Rating
1 A Three 500
2 B One 1500

Try this:
SELECT a.*
FROM table AS a
INNER JOIN (
SELECT Breaker, MIN(Rating) AS min_rating
FROM table
GROUP BY Breaker
) AS b
ON a.Breaker = b.Breaker AND
a.Rating = b.min_rating;
SQLFiddle: http://www.sqlfiddle.com/#!2/ea4fb/2

You can try below:
SELECT t.EQID, t.Breaker, t.Circuit, t.Rating
FROM test t
INNER JOIN
(
SELECT a.Breaker, MIN(a.Rating) AS Rating
FROM test a
GROUP BY Breaker
) AS tmp
ON tmp.Breaker = t.Breaker AND tmp.Rating = t.Rating;
Sql fiddle demo: http://sqlfiddle.com/#!2/fe796/19
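A minimal sketch of the join-to-minimum approach on the sample data, run with Python's sqlite3 module rather than Access (an assumption made so it can be executed). The table name Equipment is hypothetical.

```python
import sqlite3

# Sample rows from the question: (EQID, Breaker, Circuit, Rating).
EQUIPMENT = [
    (1, "A", "One", 1000),
    (2, "A", "Two", 1500),
    (3, "A", "Three", 500),
    (4, "A", "Four", 1000),
    (5, "B", "One", 1500),
    (6, "B", "Two", 2000),
]


def min_rating_per_breaker():
    """Return the full row(s) holding the minimum Rating for each Breaker."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Equipment (EQID INTEGER, Breaker TEXT, Circuit TEXT, Rating INTEGER)"
    )
    conn.executemany("INSERT INTO Equipment VALUES (?, ?, ?, ?)", EQUIPMENT)
    sql = """
        SELECT t.EQID, t.Breaker, t.Circuit, t.Rating
        FROM Equipment AS t
        INNER JOIN (
            SELECT Breaker, MIN(Rating) AS MinRating
            FROM Equipment
            GROUP BY Breaker
        ) AS tmp
          ON tmp.Breaker = t.Breaker AND tmp.MinRating = t.Rating
        ORDER BY t.Breaker, t.EQID
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(min_rating_per_breaker())
```

On this data the query returns the original EQID values (3 and 5) rather than the 1 and 2 shown in the desired result. If two circuits on the same breaker shared the minimum rating, both rows would be returned.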

DB2 SQL filter query result by evaluating an ID which has two types of entries

After many attempts I have failed at this and am hoping someone can help. The query returns every entry a user makes when items are made in the factory against an order number. For example:
Order Number Entry type Quantity
3000 1 1000
3000 1 500
3000 2 300
3000 2 100
4000 2 1000
5000 1 1000
What I want the query to do is filter the results like this:
If the order number has entries of both type 1 and type 2, return only the type 1 rows;
otherwise, just return the rows for that order number whatever the type is.
So the above would end up:
Order Number Entry type Quantity
3000 1 1000
3000 1 500
4000 2 1000
5000 1 1000
Currently my query (DB2, in very basic terms) looks like this, and it was correct until a change request came through!
Select * from bookings where type=1 or type=2
thanks!

select bookings.*
from bookings
left outer join (
select order_number,
max(case when type=1 then 1 else 0 end) +
max(case when type=2 then 1 else 0 end) as type_1_and_2
from bookings
group by order_number
) has_1_and_2 on
has_1_and_2.type_1_and_2 = 2 and
has_1_and_2.order_number = bookings.order_number
where
bookings.type = 1 or
has_1_and_2.order_number is null
Find all the orders that have both type 1 and type 2, and then join to that result.
If the row matched the join, only return it if it is type 1.
If the row did not match the join (has_1_and_2.order_number is null), return it no matter what the type is.

A "common table expression" [CTE] can often simplify your logic. You can think of it as a way to break a complex problem into conceptual steps. In the example below, you can think of g as the name of the result set of the CTE, which will then be joined to the bookings table.
WITH g as
( SELECT order_number, min(type) as low_type
FROM bookings
GROUP BY order_number
)
SELECT b.*
FROM g
JOIN bookings b ON g.order_number = b.order_number
AND g.low_type = b.type
The JOIN ON conditions will work so that if both types are present then low_type will be 1, and only that type of record will be chosen. If there is only one type it will be identical to low_type.
This should work fine as long as 1 and 2 are the only types allowed in the bookings table. If not then you can simply add a WHERE clause in the CTE and in the outer SELECT.
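A minimal sketch of the CTE approach on the sample bookings, run with Python's sqlite3 module rather than DB2 (an assumption made for runnability). The column names order_number, type and quantity follow the answers above, not a known schema.

```python
import sqlite3

# Sample rows from the question: (order_number, type, quantity).
BOOKINGS = [
    (3000, 1, 1000),
    (3000, 1, 500),
    (3000, 2, 300),
    (3000, 2, 100),
    (4000, 2, 1000),
    (5000, 1, 1000),
]


def filtered_bookings():
    """Keep only the rows whose type is the lowest type present for the order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings (order_number INTEGER, type INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO bookings VALUES (?, ?, ?)", BOOKINGS)
    sql = """
        WITH g AS (
            SELECT order_number, MIN(type) AS low_type
            FROM bookings
            GROUP BY order_number
        )
        SELECT b.order_number, b.type, b.quantity
        FROM g
        JOIN bookings b
          ON g.order_number = b.order_number
         AND g.low_type = b.type
        ORDER BY b.order_number, b.quantity DESC
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


for row in filtered_bookings():
    print(row)
```

Order 3000 has both types, so only its type 1 rows survive; orders 4000 and 5000 have a single type each and pass through unchanged.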
