Find duplicates in SQL Server database where one of the columns must differ - sql

I'm trying to write a SQL query to find duplicates. What I can't manage to do is to make my query only select duplicates where one of the columns value must differ. So, I want to find all the duplicates where all the columns are the same, but one of the values must differ.
What I've got at the moment:
SELECT
a.1, underlag.1, f.1, f.2, f.3, f.4, f.5, f.6, f.7, f.8,
COUNT(*) TotalCount
FROM
f
JOIN
a ON a.Id = f.Id
JOIN
underlag ON underlag.Id = f.Id
GROUP BY
a.1, underlag.1, f.1, f.2, f.3, f.4, f.5, f.6, f.7, f.8
HAVING
COUNT(*) > 1
ORDER BY
underlag.1
The column that I want to differ is f.9 but I've no clue on how to do this. Any help or pointers in the right direction would be great!

SELECT *
FROM (
SELECT
a1 = a.[1]
, underlag1 = underlag.[1]
, f.[1], f.[2], f.[3], f.[4], f.[5], f.[6], f.[7], f.[8], f.[9]
, val = SUM(1) OVER (PARTITION BY CHECKSUM(f.[1], f.[2], f.[3], f.[4], f.[5], f.[6], f.[7], f.[8]))
FROM f
JOIN a on a.Id = f.Id
JOIN underlag on underlag.Id = f.Id
) t
WHERE t.val > 1
ORDER BY underlag1

Related

SQL Combining Two Totally seperate tables to one

I am VERY new to SQL and self taught. I have two SQL that I stuggled through but got working. Now I need to combine them into one and I'm lost.
SELECT
s.lastfirst,
s.student_number,
SUM(tr.howmany)
FROM
students s
JOIN
truancies tr ON s.id = tr.studentid
WHERE
s.enroll_status = 0 AND
s.schoolid = ~(curschoolid)
GROUP BY
s.lastfirst, s.student_number
HAVING
SUM(tr.howmany) > 0
ORDER BY
s.lastfirst
And this table:
SELECT
S.DCID as DCID,
S.LASTFIRST as LASTFIRST,
S.STUDENT_NUMBER as STUDENT_NUMBER,
S2.FC_SRVC_HRS_DUE as FC_SRVC_HRS_DUE,
CASE
WHEN S2.FC_SRVC_HRS_COMPLETED IS NULL
THEN '0'
ELSE S2.FC_SRVC_HRS_COMPLETED
END AS FC_SRVC_HRS_COMPLETED,
S2.FC_SRVC_HRS_BUYOUT as FC_SRVC_HRS_BUYOUT,
CASE
WHEN S2.FC_SRVC_HRS_COMPLETED IS NULL
THEN S2.FC_SRVC_HRS_DUE * S2.FC_SRVC_HRS_BUYOUT
ELSE ((S2.FC_SRVC_HRS_DUE - S2.FC_SRVC_HRS_COMPLETED) * S2.FC_SRVC_HRS_BUYOUT)
END as Balance_Due
FROM
STUDENTS S, U_STUDENTSUSERFIELDS S2
WHERE
S.DCID = S2.STUDENTSDCID AND
s.enroll_status = 0 AND
s.schoolid = ~(curschoolid) AND
(((S2.FC_SRVC_HRS_DUE - S2.FC_SRVC_HRS_COMPLETED) * S2.FC_SRVC_HRS_BUYOUT) > 0 OR
((S2.FC_SRVC_HRS_DUE - S2.FC_SRVC_HRS_COMPLETED) * S2.FC_SRVC_HRS_BUYOUT) IS NULL) AND
S2.FC_SRVC_HRS_DUE >.1
ORDER BY
s.lastfirst
What I am really looking for are the totals of both of these. I want the SUM(tr.howmany) from the first table and the balance due of the second BUT I need the filters that are in there. This would be sorted by student. I hope I am making sense. Any assistance would be appreciated.
You can join together 2 separate SQL select statements:
Eg:
Select A.id, A.value, B.value
From (select id, count(*) as value from TableA ...) AS A
join (select id, sum(field) as value from TableB ...) AS B
on A.id = B.id
order by A.id
As long as you have a common field to join on this would work. In your case the student_number looks like a good candidate. You will have to do the ordering outside of your subqueries.

select only distinct value from one column

I am displaying three fields from my query, and I want to display a distinct banking_no. I still want to display the other fields even they are not distinct. Please help.
SELECT C.RECEIPT_OFFICE_PREFIX, B.BANKING_NO, B.STATUS_CD
FROM TControl B, TComponent C
WHERE C.DEPOSIT_BANK_ACCT = 'xxx-xxxxxx-xxxxx'
AND B.BANKING_NO = C.BANKING_NO
AND B.COMPANY_ID = C.COMPANY_ID
AND B.RECEIPT_OFFICE_PREFIX = C.RECEIPT_OFFICE_PREFIX
AND B.STATUS_CD != 'C'
ORDER BY B.BANKING_NO
I am using Sybase ASE 12.5
More recent versions of Sybase support row_number(). You would use it in a subquery like this:
SELECT RECEIPT_OFFICE_PREFIX, BANKING_NO, STATUS_CD
FROM (SELECT C.RECEIPT_OFFICE_PREFIX, B.BANKING_NO, B.STATUS_CD,
ROW_NUMBER() OVER (PARTITION BY B.BANKING_NO ORDER BY B.BANKING_NO) as seqnum
FROM TControl B JOIN
TComponent C
ON B.BANKING_NO = C.BANKING_NO AND B.COMPANY_ID = C.COMPANY_ID
WHERE C.DEPOSIT_BANK_ACCT = 'xxx-xxxxxx-xxxxx' AND
B.RECEIPT_OFFICE_PREFIX = C.RECEIPT_OFFICE_PREFIX AND
B.STATUS_CD != 'C'
) BC
WHERE seqnum = 1
ORDER BY BANKING_NO;
What you want may not be called distint. If the rest of data is the same just using a DISTINCT will do. However if data may be different you need to tell what result should the database choose. the first row? the last row?
If you do not mind just use;
SELECT DISTINCT MAX(C.RECEIPT_OFFICE_PREFIX), B.BANKING_NO, MAX(B.STATUS_CD)
FROM TControl B, TComponent C
WHERE C.DEPOSIT_BANK_ACCT = 'xxx-xxxxxx-xxxxx'
AND B.BANKING_NO = C.BANKING_NO
AND B.COMPANY_ID = C.COMPANY_ID
AND B.RECEIPT_OFFICE_PREFIX = C.RECEIPT_OFFICE_PREFIX
AND B.STATUS_CD != 'C'
GROUP BY B.BANKING_NO
In you need to choose one row use Gordon solution.

Column is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause

Ok here's my View (vw_LiftEquip)
SELECT dbo.tbl_equip_swl_unit.unit_id,
dbo.tbl_equip_swl_unit.unit_name,
dbo.tbl_equip_swl_unit.archived,
dbo.tbl_categories.category_id,
dbo.tbl_categories.categoryName,
dbo.tbl_categories.parentCategory,
dbo.tbl_categories.sub_category,
dbo.tbl_categories.desc_category,
dbo.tbl_categories.description,
dbo.tbl_categories.miscellaneous,
dbo.tbl_categories.category_archived,
dbo.tbl_equip_swl_unit.unit_name AS Expr1,
dbo.tbl_categories.categoryName AS Expr2,
dbo.tbl_categories.description AS Expr3,
dbo.tbl_equip_depts.dept_name,
dbo.tbl_equip_man.man_name,
dbo.tbl_Lifting_Gear.e_defects AS Expr7,
dbo.tbl_Lifting_Gear.e_defects_desc AS Expr8,
dbo.tbl_Lifting_Gear.e_defects_date AS Expr9,
dbo.tbl_equipment.equipment_id,
dbo.tbl_equipment.e_contract_no,
dbo.tbl_equipment.slID,
dbo.tbl_equipment.e_entered_by,
dbo.tbl_equipment.e_serial,
dbo.tbl_equipment.e_model,
dbo.tbl_equipment.e_description,
dbo.tbl_equipment.e_location_id,
dbo.tbl_equipment.e_owner_id,
dbo.tbl_equipment.e_department_id,
dbo.tbl_equipment.e_manafacture_id,
dbo.tbl_equipment.e_manDate1,
dbo.tbl_equipment.e_manDate2,
dbo.tbl_equipment.e_manDate3,
dbo.tbl_equipment.e_dimensions,
dbo.tbl_equipment.e_test_no,
dbo.tbl_equipment.e_firstDate1,
dbo.tbl_equipment.e_firstDate2,
dbo.tbl_equipment.e_firstDate3,
dbo.tbl_equipment.e_prevDate1,
dbo.tbl_equipment.e_prevDate2,
dbo.tbl_equipment.e_prevDate3,
dbo.tbl_equipment.e_insp_frequency,
dbo.tbl_equipment.e_swl,
dbo.tbl_equipment.e_swl_unit_id,
dbo.tbl_equipment.e_swl_notes,
dbo.tbl_equipment.e_cat_id,
dbo.tbl_equipment.e_sub_id,
dbo.tbl_equipment.e_parent_id,
dbo.tbl_equipment.e_last_inspector,
dbo.tbl_equipment.e_last_company,
dbo.tbl_equipment.e_deleted AS Expr11,
dbo.tbl_equipment.e_deleted_desc AS Expr12,
dbo.tbl_equipment.e_deleted_date AS Expr13,
dbo.tbl_equipment.e_deleted_insp AS Expr14,
dbo.tbl_Lifting_Gear.e_defects_action AS Expr15,
dbo.tbl_equipment.e_rig_location,
dbo.tbl_Lifting_Gear.e_add_type AS Expr17,
dbo.tbl_Lifting_Gear.con_id,
dbo.tbl_Lifting_Gear.lifting_date,
dbo.tbl_Lifting_Gear.lifting_ref_no,
dbo.tbl_Lifting_Gear.e_id,
dbo.tbl_Lifting_Gear.inspector_id,
dbo.tbl_Lifting_Gear.lift_testCert,
dbo.tbl_Lifting_Gear.lift_rig_location,
dbo.tbl_Lifting_Gear.inspected,
dbo.tbl_Lifting_Gear.lifting_through,
dbo.tbl_Lifting_Gear.liftingNDT,
dbo.tbl_Lifting_Gear.liftingTest,
dbo.tbl_Lifting_Gear.e_defects,
dbo.tbl_Lifting_Gear.e_defects_desc,
dbo.tbl_Lifting_Gear.e_defects_date,
dbo.tbl_Lifting_Gear.e_defects_action,
dbo.tbl_Lifting_Gear.lift_department_id,
dbo.tbl_Lifting_Gear.lifting_loc
FROM dbo.tbl_equipment
INNER JOIN dbo.tbl_equip_swl_unit
ON dbo.tbl_equipment.e_swl_unit_id = dbo.tbl_equip_swl_unit.unit_id
INNER JOIN dbo.tbl_categories
ON dbo.tbl_equipment.e_cat_id = dbo.tbl_categories.category_id
INNER JOIN dbo.tbl_equip_depts
ON dbo.tbl_equipment.e_department_id = dbo.tbl_equip_depts.dept_id
INNER JOIN dbo.tbl_equip_man
ON dbo.tbl_equipment.e_manafacture_id = dbo.tbl_equip_man.man_id
INNER JOIN dbo.vwSubCategory
ON dbo.tbl_equipment.e_sub_id = dbo.vwSubCategory.category_id
INNER JOIN dbo.vwDescCategory
ON dbo.tbl_equipment.e_cat_id = dbo.vwDescCategory.category_id
INNER JOIN dbo.tbl_Lifting_Gear
ON dbo.tbl_equipment.equipment_id = dbo.tbl_Lifting_Gear.e_id
And here's the select statement with subquery that I am using:
SELECT *
FROM vw_LiftEquip
WHERE lifting_loc = ? AND
con_id = ? AND
EXPR11 =
'N'(
SELECT MAX(lifting_date) AS maxLift
FROM vw_LiftEquip
WHERE e_id = equipment_id
)
ORDER BY lifting_ref_no,
category_id,
e_swl,
e_serial
I get the error :
Column "vw_LiftEquip.category_id" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.
Can't see why its returning that error, this is admittedly the first time I've ran a subquery on such a complex view, and I am a bit lost, thanks in advance for any help. I have looked through the similar posts and can find no answers to this one, sorry if I am just being dumb.
You are missing AND between EXPR11 = 'N' and (SELECT MAX(...
Otherwise, it looks OK. MAX without GROUP BY is allowed if you have no other columns in the SELECT
Update: #hvd also noted that you have nothing to compare to MAX(lifting_date). See comment
Update 2,
SELECT *
FROM vw_LiftEquip v1
CROSS JOIN
(
SELECT MAX(lifting_date) AS maxLift
FROM vw_LiftEquip
WHERE e_id = equipment_id
) v2
WHERE v1.lifting_loc = ? AND
v1.con_id = ? AND
v1.EXPR11 = 'N'
ORDER BY v1.lifting_ref_no,
v1.category_id,
v1.e_swl,
v1.e_serial

How to replace a NULL when a COUNT(*) returns NULL in DB2

I have a query:
SELECT A.AHSHMT AS SHIPMENT, A.AHVNAM AS VENDOR_NAME, D.UNITS_SHIPPED, D.ADPON AS PO, B.NUMBER_OF_CASES_ON_TRANSIT, C.NUMBER_OF_CASES_RECEIVED FROM AHASNF00 A
INNER JOIN (SELECT IDSHMT, COUNT(*) AS NUMBER_OF_CASES_ON_TRANSIT FROM IDCASE00 WHERE IDSTAT = '01' GROUP BY IDSHMT) B
ON (A.AHSHMT = B.IDSHMT)
LEFT JOIN (SELECT IDSHMT, (COUNT(*) AS NUMBER_OF_CASES_RECEIVED FROM IDCASE00 WHERE IDSTAT = '10' GROUP BY IDSHMT) C
ON (A.AHSHMT = C.IDSHMT)
INNER JOIN (SELECT ADSHMT, ADPON, SUM(ADUNSH) AS UNITS_SHIPPED FROM ADASNF00 GROUP BY ADSHMT, ADPON) D
ON (A.AHSHMT = D.ADSHMT)
WHERE A.AHSHMT = '540041134';
On the first JOIN statement I have a COUNT(*), on this count sometimes I will get NULL. I need to replace this with a "0-zero", I know think I know how to do it in SQL
ISNULL(COUNT(*), 0)
But this doesn't work for DB2, how can I accomplish this? All your help is really appreciate it.
Wrap a COALESCE around each of the nullable totals in your SELECT list:
SELECT A.AHSHMT AS SHIPMENT,
A.AHVNAM AS VENDOR_NAME,
COALESCE( D.UNITS_SHIPPED, 0 ) AS UNITS_SHIPPED,
D.ADPON AS PO,
COALESCE( B.NUMBER_OF_CASES_ON_TRANSIT, 0 ) AS NUMBER_OF_CASES_ON_TRANSIT,
COALESCE( C.NUMBER_OF_CASES_RECEIVED, 0 ) AS NUMBER_OF_CASES_RECEIVED
FROM ...
The inner joins you're using for expressions B and D mean that you will only receive rows from A that have one or more cases in transit (expression B) and have one or more POs in expression D. Is that the way you want your query to work?
Instead of using ISNULL(COUNT(*), 0),
try using COALESCE(COUNT(*),0)
use IFNULL(COUNT(*), 0) for DB2

SQL - Derived tables issue

I have the following SQL query:
SELECT VehicleRegistrations.ID, VehicleRegistrations.VehicleReg,
VehicleRegistrations.Phone, VehicleType.VehicleTypeDescription,
dt.ID AS 'CostID', dt.IVehHire, dt.FixedCostPerYear, dt.VehicleParts,
dt.MaintenancePerMile, dt.DateEffective
FROM VehicleRegistrations
INNER JOIN VehicleType ON VehicleRegistrations.VehicleType = VehicleType.ID
LEFT OUTER JOIN (SELECT TOP (1) ID, VehicleRegID, DateEffective, IVehHire,
FixedCostPerYear, VehicleParts, MaintenancePerMile
FROM VehicleFixedCosts
WHERE (DateEffective <= GETDATE())
ORDER BY DateEffective DESC) AS dt
ON dt.VehicleRegID = VehicleRegistrations.ID
What I basically want to do is always select the top 1 record from the 'VehicleFixedCosts' table, where the VehicleRegID matches the one in the main query. What is happening here is that it's selecting the top row before the join, so if the vehicle registration of the top row doesn't match the one we're joining to it returns nothing.
Any ideas? I really don't want to have use subselects for each of the columns I need to return
Try this:
SELECT vr.ID, vr.VehicleReg,
vr.Phone, VehicleType.VehicleTypeDescription,
dt.ID AS 'CostID', dt.IVehHire, dt.FixedCostPerYear, dt.VehicleParts,
dt.MaintenancePerMile, dt.DateEffective
FROM VehicleRegistrations vr
INNER JOIN VehicleType ON vr.VehicleType = VehicleType.ID
LEFT OUTER JOIN (
SELECT ID, VehicleRegID, DateEffective, IVehHire, FixedCostPerYear, VehicleParts, MaintenancePerMile
FROM VehicleFixedCosts vfc
JOIN (
select VehicleRegID, max(DateEffective) as DateEffective
from VehicleFixedCosts
where DateEffective <= getdate()
group by VehicleRegID
) t ON vfc.VehicleRegID = t.VehicleRegID and vfc.DateEffective = t.DateEffective
) AS dt
ON dt.VehicleRegID = vr.ID
Subquery underneath dt might need some grouping but without schema (and maybe sample data) it's hard to say which column should be involved in that.