Looking for duplicates based on a few other columns

Looking for duplicates based on a few other columns - sql

I am trying to find the rows where PilotID has used the shimpmentNumber more than once.
I have this so far.
select f_Shipment_ID
,f_date
,f_Pilot_ID
,f_Shipname
,f_SailedFrom
,f_SailedTo
,f_d_m
,f_Shipmentnumber
,f_NumberOfPilots
from t_shipment
where f_Pilot_ID < 10000
and f_NumberOfPilots=1
and f_Shipmentnumber in(select f_Shipmentnumber
from t_shipment
group by f_Shipmentnumber
Having count(*) >1)

Try something like this:
-- The CTE determines the f_Pilot_ID/f_Shipmentnumber combinations that appear more than once.
with DuplicateShipmentNumberCTE as
(
select
f_Pilot_ID,
f_Shipmentnumber
from
t_shipment
where
f_Pilot_ID < 10000 and
f_NumberOfPilots = 1
group by
f_Pilot_ID,
f_Shipmentnumber
having
count(1) > 1
)
select
Shipment.f_Shipment_ID,
Shipment.f_date,
Shipment.f_Pilot_ID,
Shipment.f_Shipname,
Shipment.f_SailedFrom,
Shipment.f_SailedTo,
Shipment.f_d_m,
Shipment.f_Shipmentnumber,
Shipment.f_NumberOfPilots
from
-- The join is used to restrict the result set to the shipments identified by the CTE.
t_shipment Shipment
inner join DuplicateShipmentNumberCTE CTE on
Shipment.f_Pilot_ID = CTE.f_Pilot_ID and
Shipment.f_Shipmentnumber = CTE.f_Shipmentnumber
where
f_NumberOfPilots = 1;
You can also do this with a subquery if you want to—or if you're using an old version of SQL Server that doesn't support CTEs—but I find the CTE syntax to be more natural, if only because it enables you to read and understand the query from the top down, rather than from the inside out.

In your sub select use:
select f_Shipmentnumber
from t_shipment
group by f_pilot_id, f_Shipmentnumber
Having count(*) >1

How about this
select f_Shipment_ID
,f_date
,f_Pilot_ID
,f_Shipname
,f_SailedFrom
,f_SailedTo
,f_d_m
,f_Shipmentnumber
,f_NumberOfPilots
from t_shipment
where f_Pilot_ID < 10000
and f_NumberOfPilots=1
and f_Pilot_ID IN (select f_Pilot_ID
from t_shipment
group by f_Pilot_ID, f_Shipmentnumber
Having count(*) >1)

Related

Use of MAX function in SQL query to filter data

The code below joins two tables and I need to extract only the latest date per account, though it holds multiple accounts and history records. I wanted to use the MAX function, but not sure how to incorporate it for this case. I am using My SQL server.
Appreciate any help !
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC

Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes than any given account would not have two records on the same EffDate. If that's the case, and there is no other objective means to determine the latest account, you could also use row_numer to pick a somewhat arbitrary record in the case of a tie.

Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate <= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.

Not tested
with temp1 as
(
select foo
from bar
whre xy = MAX(xy)
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
on Actuarial.dbo.PROP.FileID = t.dbo.PROP_DATA.FileID
ORDER BY PROP.EffDate DESC

SQL query top 2 columns of joined table?

I am having no luck attempting to get the top (x number) of rows from a joined table. I want the top 2 resources (ordered by name) which in this case should be Katie and Simon and regardless of what I've tried, I can't seem to get it right. You can see below what I've commented out - and what looks like it should work (but doesn't). I cannot use a union. Any ideas?
select distinct
RTRESOURCE.RNAME as Resource,
RTTASK.TASK as taskname, SUM(distinct SOTRAN.QTY2BILL) AS quantitytobill from SOTRAN AS SOTRAN INNER JOIN RTTASK AS RTTASK ON sotran.taskid = rttask.taskid
left outer JOIN RTRESOURCE AS RTRESOURCE ON rtresource.keyno=sotran.resid
WHERE sotran.phantom<>'y' and sotran.pgroup = 'L' and sotran.timesheet = 'y' and sotran.taskid >0 AND RTRESOURCE.KEYNO in ('193','159','200') AND ( SOTRAN.ADDDATE>='8/15/2015 12:00:00 AM' AND SOTRAN.ADDDATE<'9/3/2015 11:59:59 PM' )
//and RTRESOURCE.RNAME in ( select distinct top 2 RTRESOURCE.RNAME from RTRESOURCE order by RTRESOURCE.RNAME)
//and ( select count(*) from RTRESOURCE RTRESOURCE2 where RTRESOURCE2.RNAME = RTRESOURCE.RNAME ) <= 2
GROUP BY RTRESOURCE.rname,RTTASK.task,RTTASK.taskid,RTTASK.mdsstring ORDER BY Resource,taskname

You should provide a schema.
But lets assume your query work. You create a CTE.
WITH youQuery as (
SELECT *
FROM < you big join query>
), maxBill as (
SELECT Resource, Max(quantitytobill) as Bill
FROM yourQuery
)
SELECT top 2 *
FROM maxBill
ORDER BY Bill
IF you want top 2 alphabetical
WITH youQuery as (
SELECT *
FROM < you big join query>
), Names as (
SELECT distinct Resource
FROM yourQuery
Order by Resource
)
SELECT top 2 *
FROM Names

SQL: Using COUNT(*) Instead of EXISTS

Is it possible to use COUNT in place of EXISTS?
I have following query:
SELECT *
FROM Goals G
WHERE EXISTS (SELECT NULL FROM tfv_home_last6(G.Date, G.Home) WHERE GameNumber <= 6 AND
HomeGoals >= 3)
Instead of returning the row if at least one row exists in the subquery, I'd like to specify a number of rows that need to be returned in the subquery, something like
SELECT *
FROM Goals G
WHERE ROWCOUNT(*) >= 2 (SELECT NULL FROM tfv_home_last6(G.Date, G.Home) WHERE GameNumber <= 6 AND
HomeGoals >= 3)
I'm not sure how to go about it?
I'm using SQL Server 2012.

You can do the subquery pretty much just like you describe:
SELECT *
FROM Goals G
WHERE (SELECT count(*)
FROM tfv_home_last6(G.Date, G.Home)
WHERE GameNumber <= 6 AND HomeGoals >= 3
) > 0;
However, this requires calculating the entire count. The exists form is more efficient, because it stops at the first matching record.
In SQL Server 2012, you could also use `cross apply:
SELECT *
FROM Goals G cross apply
(select count(*) as cnt
FROM tfv_home_last6(G.Date, G.Home)
WHERE GameNumber <= 6 AND HomeGoals >= 3
) a
WHERE a.cnt > 0;
I do not know which would have better performance, the correlated subquery in the where clause or the
cross apply version.

SQL group by issue when i try to get some info

I can not figure it out:
I have a table called ImportantaRecords with fields like market, zip5, MHI, MHV, TheTable and I want to group all the records by zip5 that have TheTable = 'mg'… I tried this :
select a.Market,a.zip5,count(a.zip5),a.MHI,a.MHV,a.TheTable from
(select * from ImportantaRecords where TheTable = 'mg') a
group by a.Zip5
but it gives me the classic error with not an aggrefate function
and then I tried this:
select Market,zip5,count(zip5),MHI,MHV,TheTable from ImportantaRecords where TheTable = 'mg'
group by Zip5
and the same thing…
any help ?

You did not state what database you are using but if you are getting an error about columns not being in an aggregate function, then you might need to add the columns not in an aggregate function to the GROUP BY:
select Market,
zip5,
count(zip5),
MHI,
MHV,
TheTable
from ImportantaRecords
where TheTable = 'mg'
group by Market, Zip5, MHI, MHV, TheTable;
If grouping by the additional columns alters the result that you are expecting, then you could use a subquery to get the result:
select i1.Market,
i1.zip5,
i2.Total,
i1.MHI,
i1.MHV,
i1.TheTable
from ImportantaRecords i1
inner join
(
select zip5, count(*) Total
from ImportantaRecords
where TheTable = 'mg'
group by zip5
) i2
on i1.zip5 = i2.zip5
where i1.TheTable = 'mg'

SQL ROW_NUMBER with INNER JOIN

I need to use ROW_NUMBER() in the following Query to return rows 5 to 10 of the result. Can anyone please show me what I need to do? I've been trying to no avail. If anyone can help I'd really appreciate it.
SELECT *
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity

You need to stick it in a table expression to filter on ROW_NUMBER. You won't be able to use * as it will complain about the column name starRating appearing more than once so will need to list out the required columns explicitly. This is better practice anyway.
WITH CTE AS
(
SELECT /*TODO: List column names*/
ROW_NUMBER()
OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity) AS RN
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT /*TODO: List column names*/
FROM CTE
WHERE RN BETWEEN 5 AND 10
ORDER BY RN

You can use a with clause. Please try the following
WITH t AS
(
SELECT villa_data.starRating,
villa_data.capacity,
villa_data.bedrooms,
villa_prices.period,
villa_prices.price,
ROW_NUMBER() OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity ) AS 'RowNumber'
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT *
FROM t
WHERE RowNumber BETWEEN 5 AND 10;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Looking for duplicates based on a few other columns - sql

In your sub select use: select f_Shipmentnumber from t_shipment group by f_pilot_id, f_Shipmentnumber Having count(*) >1

Related

Use of MAX function in SQL query to filter data

SQL query top 2 columns of joined table?

SQL: Using COUNT(*) Instead of EXISTS

SQL group by issue when i try to get some info

SQL ROW_NUMBER with INNER JOIN

Categories

Resources