Many Duplicates, caused by a phone number column. Need to cut down duplicates


The query below returns approximately 38K rows. When the 'phone' join and column are removed, it drops to the correct 15.5K rows.
SELECT
tc.customer_no
,fdn.display_name_short 'name'
,tc.cont_amt
,tc.ref_no
,tc.cont_dt
,tc.cont_type
,tca.fyear
,(ISNULL(street1, 'none') + ' ' + ISNULL(city, 'none') + ' ' + ISNULL(state, 'none')
+ ', ' + ISNULL(postal_code, 'none')) 'address'
,ISNULL(tp.phone, 'none')
,ISNULL(te.address, 'none')
FROM T_CONTRIBUTION tc
JOIN FT_CONSTITUENT_DISPLAY_NAME() fdn
ON tc.customer_no = fdn.customer_no
JOIN T_CAMPAIGN tca
ON tc.campaign_no = tca.campaign_no
LEFT JOIN T_ADDRESS ta
ON tc.customer_no = ta.customer_no AND ta.primary_ind = 'y'
LEFT JOIN T_EADDRESS te
ON tc.customer_no = te.customer_no AND te.primary_ind = 'y'
LEFT JOIN T_PHONE tp
ON tc.customer_no = tp.customer_no
WHERE tca.fyear BETWEEN 2018 AND 2022
AND tc.cont_amt > 0
AND te.inactive = 'N'
AND ta.inactive = 'N'
Any advice as to how I can include the phone number column while eliminating as many duplicates as possible? I don't have to be highly precise with this query, but I need to get the row count down as much as possible. The phone table has about 50 different phone types (e.g. 1, 2, or 22), and the PK is the phone number. The DB has since moved to using only phone type 1 or 2, but I am searching 4 years back, which is before they switched to using only two phone types.

Followed suggestions in comments, ended up with:
CTE to create numbered and grouped rows
WITH cte AS (
SELECT customer_no, phone
, row_number() OVER(PARTITION BY customer_no ORDER BY phone) AS rn
FROM T_PHONE
)
Then referenced said cte in the main query's select.
Finally added
WHERE cte.rn = 1
This keeps one phone number per customer: the first one when that customer's numbers are sorted by phone, which is an arbitrary but repeatable pick rather than a truly random one.
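A minimal sketch of how the pieces above might fit together, using table variables with made-up rows in place of the real T_CONTRIBUTION and T_PHONE tables (the sample data and the reduced column list are assumptions, not the actual schema). One deliberate difference from the description above: rn = 1 sits in the join condition rather than in the WHERE clause, because a WHERE filter on a LEFT JOINed table would drop customers who have no phone row at all.

```sql
-- Hypothetical stand-ins for the real tables, reduced to the columns needed here.
DECLARE @T_CONTRIBUTION TABLE (customer_no int, ref_no int, cont_amt money);
DECLARE @T_PHONE TABLE (customer_no int, phone varchar(20));

INSERT INTO @T_CONTRIBUTION VALUES (1, 100, 50.00), (2, 101, 25.00), (3, 102, 10.00);
INSERT INTO @T_PHONE VALUES (1, '555-0101'), (1, '555-0199'), (2, '555-0142');

-- Number each customer's phones; rn = 1 is the first one in phone order.
WITH cte AS (
    SELECT customer_no, phone,
           ROW_NUMBER() OVER (PARTITION BY customer_no ORDER BY phone) AS rn
    FROM @T_PHONE
)
SELECT tc.customer_no, tc.ref_no, tc.cont_amt,
       ISNULL(cte.phone, 'none') AS phone
FROM @T_CONTRIBUTION tc
LEFT JOIN cte
    ON tc.customer_no = cte.customer_no
   AND cte.rn = 1          -- filter in the ON clause so phoneless customers survive
ORDER BY tc.customer_no;
```

With this sample data each contribution appears exactly once, and the customer without any phone row shows 'none' instead of disappearing.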

Related

How to Grab Specific Row info?

Below is an example of the output when you run the query: select A.DispatchNote, A.MStockCode, A.NComment
from MdnMaster A
MdnMaster.DispatchNote | MdnMaster.MStockCode or MdnMaster.NComment (only one of the two is populated per row)
12345/001 | CAL2-01234-010-50L
12345/001 | FREIGHT
12345/001 | 1 Parcel
12345/001 | Trk# 1Z8R9V80013141323 - 5 lb
12345/001 | Trk#: 1Z8R9V900381868191 -- 18 lb
12345/001 | SHP 21401
12345/002 | CAL3-0121-020-50L
12345/002 | FREIGHT
12345/002 | 2 Parcels
12345/002 | Trk# 1Z8R9V80013141323 - 5 lb
12345/002 | Trk#: 1Z8R9V900381868191 -- 18 lb
12345/002 | SHP 2140
I'm trying to write a query that grabs just the first tracking number in the list and ignores the second (or sometimes the third, when there is one).
The database has blank NComment values on rows that have an MStockCode, and blank MStockCode values on every row that has an NComment, so I don't know how to approach this.
What I have so far:
SELECT
m.DispatchNote,
MAX(d.MStockCode) as StockCode,
MAX(case when d.NComment like 'Trk%' then d.NComment end) as NComment,
MAX(m.CustomerPoNumber) as CustomerPO
FROM MdnMaster AS m
LEFT OUTER JOIN MdnDetail AS d on m.DispatchNote = d.DispatchNote
AND (d.NComment LIKE 'Trk%' OR d.MStockCode is not null)
and m.Customer = 'LAWSON'
and d.MLineShipDate =
case
when datepart(weekday, getdate() -1) = '7'
then DATEADD(hh,0,dateadd(DAY, datediff(day, 0, getdate()),-2)) -- if yesterday was Saturday, set to Friday
when datepart(weekday, getdate() -1) = '1'
then DATEADD(hh,0,dateadd(DAY, datediff(day, 0, getdate()),-3)) -- if yesterday was Sunday, set to Friday
else DATEADD(hh,0,dateadd(DAY, datediff(day, 0, getdate()),-1))
end
GROUP BY m.DispatchNote
My issue is that it gives me nothing since I only know how to ask it explicitly that I want the lines that aren't blank. How do I fix it?
EDIT: I should mention that all of the information comes from the MdnMaster Table (which is A) and MLineShipDate will come from B (MdnDetail). I omitted that information because I didn't think it was pertinent to the question at hand.
An example of what I want to see from the above:
MdnMaster.DispatchNote | MdnMaster.MStockCode | MdnMaster.NComment
12345/001 | CAL2-01234-010-50L | Trk# 1Z8R9V80013141323 - 5 lb

Here's a quick way to get some results. Hopefully, it will set you on the right path.
I'm assuming you can specify a second column to determine the order of the comments. Replace all instances of Line below with the actual column name.
Select
m1.DispatchNote,
m3.MStockCode,
m1.NComment
From
MdnMaster m1
Inner Join (
Select DispatchNote, Min(Line) as Line
From MdnMaster
Where NComment like 'Trk%'
Group by DispatchNote ) m2
on m1.DispatchNote = m2.DispatchNote and m1.Line = m2.Line
Inner Join (
Select DispatchNote, Max(MStockCode) as MStockCode
From MdnMaster
Group by DispatchNote ) m3
on m1.DispatchNote = m3.DispatchNote

One approach is to use a cross apply together with select top 1 to retrieve the tracking number.
select M.DispatchNote, M.MStockCode, TRK.NComment
from MdnMaster M
cross apply (
select top 1 M2.NComment
from MdnMaster M2
where M2.DispatchNote = M.DispatchNote
and M2.NComment LIKE 'Trk# %'
-- order by ?
) TRK
where M.MStockCode <> ''
Another approach is to join to a subselect that selects all tracking numbers and assigns row numbers within each group. The final select then limits itself to the tracking numbers where the row number is 1.
select M.DispatchNote, M.MStockCode, TRK.NComment
from MdnMaster M
join (
select M2.DispatchNote, M2.NComment,
row_number() OVER(PARTITION BY M2.DispatchNote order by (select null)) as RN
from MdnMaster M2
where M2.NComment LIKE 'Trk# %'
) TRK ON TRK.DispatchNote = M.DispatchNote
where M.MStockCode <> ''
and TRK.RN = 1
See this db<>fiddle for examples of both.
If there is a chance that there is no tracking number, but you still want to include the other results, change cross apply to outer apply in the first query, or the join to a left join in the second. A cross apply is like an inner join to a subselect, while an outer apply is like a left join.
If you have criteria that prefers one tracking number over another, include it in the order by clause of the subselect in the first query, or replace the order by (select null) placeholder clause in the second. Otherwise, an arbitrary tracking number will be selected.
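To make the last two points concrete, here is a hedged sketch of the first query with outer apply and an explicit order by. The Line column is an assumption (the question never names an ordering column), dispatch note 12345/003 is an invented row with no tracking comment, and the table variable only imitates MdnMaster. The pattern 'Trk%' is used instead of 'Trk# %' so that the "Trk#:" variant from the sample data also matches.

```sql
-- Hypothetical copy of MdnMaster with an assumed Line column for ordering.
DECLARE @MdnMaster TABLE (
    DispatchNote varchar(20),
    Line         int,
    MStockCode   varchar(30),
    NComment     varchar(100)
);

INSERT INTO @MdnMaster VALUES
    ('12345/001', 1, 'CAL2-01234-010-50L', ''),
    ('12345/001', 2, '', 'Trk# 1Z8R9V80013141323 - 5 lb'),
    ('12345/001', 3, '', 'Trk#: 1Z8R9V900381868191 -- 18 lb'),
    ('12345/003', 1, 'CAL3-0121-020-50L', '');   -- no tracking comment at all

SELECT M.DispatchNote, M.MStockCode, TRK.NComment
FROM @MdnMaster M
OUTER APPLY (                        -- OUTER keeps dispatch notes without tracking
    SELECT TOP 1 M2.NComment
    FROM @MdnMaster M2
    WHERE M2.DispatchNote = M.DispatchNote
      AND M2.NComment LIKE 'Trk%'
    ORDER BY M2.Line                 -- lowest Line wins, so the pick is repeatable
) TRK
WHERE M.MStockCode <> '';
```

The dispatch note without a tracking comment still comes back, with NULL in NComment; switching back to cross apply would drop that row.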

TSQL - Multiple Values in INSERT (because of joins)

I'm trying to insert data from one database into another. This is what I have so far on the select side:
USE [db2]
SELECT
sP.pers_FirstName
,sp.pers_LastName
,sPH.Phon_Number
,CASE WHEN LEFT(sPH.Phon_Number, 2) = '04' THEN sPH.Phon_number ELSE NULL END
,CASE WHEN sp.pers_gender = 1 THEN 'M' WHEN sp.pers_gender = 2 THEN 'F' ELSE 'U' END
,CASE
WHEN sP.pers_salutation = '10' THEN 8
WHEN sp.pers_salutation = '6' THEN 2
WHEN sp.pers_salutation = '7' THEN 1
WHEN sp.pers_salutation = '8' THEN 4
WHEN sp.pers_salutation = '9' THEN 5
WHEN sp.pers_salutation = 'APROF' THEN 6
WHEN sp.pers_salutation = 'Ms.' THEN 4
WHEN sp.pers_salutation = 'PROF' THEN 6
END
,sp.pers_dob
,sp.pers_CreatedDate
,sp.pers_UpdatedDate
,'Candidate'
,1
,e.Emai_EmailAddress
,sP.pers_personID
FROM [db1].dbo.person sP
LEFT JOIN [db1].dbo.PhoneLink sPL ON sp.pers_personID = sPL.PLink_recordID
LEFT JOIN [db1].dbo.Phone sPH ON sPL.PLink_PhoneId = sPH.Phon_PhoneID
LEFT JOIN [db1].dbo.EmailLink eL ON sP.pers_personID = eL.ELink_RecordID
LEFT JOIN [db1].dbo.Email e ON eL.Elink_EmailID = e.Emai_EmailID
WHERE
(
sP.pers_employedby NOT IN (
'Aspen'
,'ACH'
)
)
OR
(
sP.pers_employedby IN (
'Aspen'
,'ACH'
)
AND sP.pers_personID NOT IN (
SELECT c.oppo_PrimaryPersonID FROM [SageCRM].dbo.Opportunity c
WHERE (c.oppo_contractcompleted <= '2016-01-01' OR c.oppo_contractterminated <= '2016-01-01') and c.Oppo_Deleted is null)
AND
sp.pers_isanemployee != 'ECHO'
AND sP.pers_personID IN (
SELECT c.oppo_PrimaryPersonID FROM [SageCRM].dbo.Opportunity c
WHERE c.oppo_Status != 'In Progress' OR c.oppo_Status = 'Completed')
AND sP.pers_dod IS NULL
AND sP.pers_FirstName NOT LIKE '%test%'
AND sP.pers_LastName NOT LIKE '%test%'
AND sp.pers_isanemployee != 'SalesContact'
)
Because each person record can have multiple phone numbers linked to it, I end up with multiple rows for each person, which obviously won't work, as I will end up with duplicates when I actually insert the data.
The problem is that I need to have all of the phone numbers for each record, just displayed in different fields (home phone, work phone, mobile phone).
Any ideas, other than doing this in a separate insert statement for each phone / email link?
-------- EDIT: -----------------------------------------------------------------
Ok so, my bad for not giving you enough information. Both of your answers were good answers, so thanks for that (@Horaciux, @John Wu).
However, there is no phoneType column, just a phone number. That being said, since every mobile starts with 04 and every home phone with anything else, I can pretty easily distinguish between the two phone types.
There are duplicates in the phone table though, so I will have to delete these, most likely via a CTE; that shouldn't be too hard.
So, I will end up with something like this for the two phone numbers:
(SELECT phon_number FROM phone p INNER JOIN PhoneLink p1 on p1.PhoneLinkID = p.PhoneLink WHERE LEFT(p.Phon_Number, 2) = '04')
(SELECT phon_number FROM phone p INNER JOIN PhoneLink p1 on p1.PhoneLinkID = p.PhoneLink WHERE LEFT(p.Phon_Number, 2) != '04')
My duplicate removal will be something like this:
WITH CTE AS
(
SELECT phon_linkID, phon_phonNumber, ROW_NUMBER() OVER (PARTITION BY phon_phonNumber ORDER BY phon_linkID) AS RN
FROM phone
)
DELETE FROM CTE WHERE RN<>1

Two easy steps:
1. Get rid of the joins to the phone number table.
2. Look up the phone numbers per record by using a subquery in the select clause, one for each type of phone. Example:
SELECT sP.pers_FirstName,
sP.pers_LastName,
(SELECT Phon_Number FROM Phone p JOIN PhoneLink pl ON pl.PhoneLinkID = p.PhoneLinkID WHERE pl.Person_ID = sP.pers_personID AND pl.Type = 'WORK') WorkPhone,
(SELECT Phon_Number FROM Phone p JOIN PhoneLink pl ON pl.PhoneLinkID = p.PhoneLinkID WHERE pl.Person_ID = sP.pers_personID AND pl.Type = 'HOME') HomePhone
FROM person sP

Without knowing your table structure, here is an example:
select person.id,
max(case when phone.type='home' then phone.value else 0 end) 'home',
max(case when phone.type='work' then phone.value else 0 end) 'work'
from person,phone where...
group by person.id
Then use this query to join all the other tables needed.
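As a rough sketch of the same pivot idea adapted to the asker's edit (no phone type column, mobiles start with 04), the conditional aggregation could key off the number prefix instead. The table variables below borrow the column names from the question's joins, but the rows are invented and the real tables surely have more columns. The CASE expressions have no ELSE branch, so unmatched rows yield NULL and MAX simply ignores them.

```sql
-- Hypothetical miniature versions of person, PhoneLink and Phone.
DECLARE @person    TABLE (pers_personID int, pers_FirstName varchar(50));
DECLARE @PhoneLink TABLE (PLink_recordID int, PLink_PhoneId int);
DECLARE @Phone     TABLE (Phon_PhoneID int, Phon_Number varchar(20));

INSERT INTO @person    VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO @PhoneLink VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO @Phone     VALUES (10, '0400111222'), (11, '0299998888'), (12, '0355556666');

-- One row per person; each phone lands in the column its prefix selects.
SELECT sP.pers_personID,
       sP.pers_FirstName,
       MAX(CASE WHEN LEFT(sPH.Phon_Number, 2) = '04'  THEN sPH.Phon_Number END) AS MobilePhone,
       MAX(CASE WHEN LEFT(sPH.Phon_Number, 2) <> '04' THEN sPH.Phon_Number END) AS HomePhone
FROM @person sP
LEFT JOIN @PhoneLink sPL ON sP.pers_personID = sPL.PLink_recordID
LEFT JOIN @Phone sPH     ON sPL.PLink_PhoneId = sPH.Phon_PhoneID
GROUP BY sP.pers_personID, sP.pers_FirstName
ORDER BY sP.pers_personID;
```

If a person has several numbers of the same kind, MAX keeps only the highest one, so duplicates in the phone table do not multiply the rows, although which number survives is then arbitrary.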

Return First 4 Rows, then Repeat for Grouping

I am trying to return data that will ultimately populate a label.
Each label is going onto a box, and the box can only have 4 items in it.
If a delivery has more than 4 items, then I need one label per 4.
Each row of data returned will populate one label, so if the delivery contains 9 items, then I need 3 rows of data returned.
Below is my current query, which is returning all items into a comma separated value using Stuff.
I want it so the first 4 rows for the delivery return in the first row, then the next 4 in the second and so on.
My Field LineOrd returns correctly if there are more than 4 lines on the dispatch.
select Distinct
delivery_header.dh_datetime,
delivery_header.dh_number,
order_header.oh_order_number as 'Order No',
order_header_detail.ohd_delivery_name,
order_header_detail.ohd_delivery_address1,
order_header_detail.ohd_delivery_address2,
order_header_detail.ohd_delivery_address3,
order_header_detail.ohd_delivery_town,
order_header_detail.ohd_delivery_county,
order_header_detail.ohd_delivery_postcode,
order_header_detail.ohd_delivery_country,
STUFF((Select ', '+convert(varchar(50),convert(decimal(8,0),DL.dli_qty))+'x '+OLI.oli_description
from delivery_header DH join delivery_line_item DL on DL.dli_dh_id = DH.dh_id join order_line_item OLI on OLI.oli_id = DL.dli_oli_id
Outer APPLY
(select
case when DelCurLine.CurLine <= 4
then '1'
Else
Case when DelCurLine.CurLine <= 8
then '2'
Else '3'
End
End +'-'+order_header.oh_order_number as LineOrd) as StuffLineOrder
Where DH.dh_id = delivery_header.dh_id And StuffLineOrder.LineOrd = LineOrder.LineOrd
FOR XML PATH('')),1,1,'') as Items,
LineOrder.LineOrd
from delivery_header
join delivery_line_item on delivery_line_item.dli_dh_id = delivery_header.dh_id
join order_line_item on order_line_item.oli_id = delivery_line_item.dli_oli_id
join order_header on order_header.oh_id = order_line_item.oli_oh_id
join order_header_detail on order_header_detail.ohd_oh_id = order_header.oh_id
join variant_detail on variant_detail.vad_id = order_line_item.oli_vad_id
join stock_location on stock_location.sl_id = order_line_item.oli_sl_id
Outer APPLY
(select count(DLI.dli_id) CurLine from delivery_line_item DLI where DLI.dli_dh_id = delivery_header.dh_id and DLI.dli_id <= delivery_line_item.dli_id)
as DelCurLine
Outer APPLY
(select
case when DelCurLine.CurLine <= 4
then '1'
Else
Case when DelCurLine.CurLine <= 8
then '2'
Else '3'
End
End +'-'+order_header.oh_order_number as LineOrd) as LineOrder
Outer APPLY
(select convert(varchar(50),convert(decimal(8,0),delivery_line_item.dli_qty))+'x '+order_line_item.oli_description as LineName) as LineName
where
delivery_header.dh_datetime between @DateFrom and @DateTo
and stock_location.sl_id = @StockLoc
and (order_header.oh_order_number = @OrderNo or @AllOrder = 1)
order by
delivery_header.dh_datetime,
delivery_header.dh_number,
order_header.oh_order_number,
order_header_detail.ohd_delivery_name,
order_header_detail.ohd_delivery_address1,
order_header_detail.ohd_delivery_address2,
order_header_detail.ohd_delivery_address3,
order_header_detail.ohd_delivery_town,
order_header_detail.ohd_delivery_county,
order_header_detail.ohd_delivery_postcode,
order_header_detail.ohd_delivery_country

You can use ROW_NUMBER() with a division by 4. The division truncates the fractional part because both operands are integers, which gives you a group number with a maximum of four rows in each group. You can then adjust your query to use this group number in a GROUP BY clause to collapse each group's rows into a single row.
Example:
SELECT RawData.BoxGroup,
MIN(dh_datetime),
MIN(dh_number),
MIN(oh_order_number) as 'Order No'
--And so on
FROM
(SELECT BoxGroup = (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) - 1) / 4,
*
FROM [TableNameOrQuery]) AS RawData
GROUP BY RawData.BoxGroup
Hope this helps.
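A small self-contained sketch of that idea, under some assumptions: the table variable and its rows are invented, the row number is partitioned by delivery so each delivery starts its own count, and the items are concatenated with the same STUFF / FOR XML PATH pattern the question already uses. Row numbers 1 to 9 become box groups 0, 0, 0, 0, 1, 1, 1, 1, 2 after subtracting 1 and dividing by 4.

```sql
-- Hypothetical flattened delivery lines: 9 items on delivery 1, 2 on delivery 2.
DECLARE @items TABLE (dh_id int, dli_id int, item varchar(50));

INSERT INTO @items VALUES
    (1, 1, '1x A'), (1, 2, '2x B'), (1, 3, '1x C'), (1, 4, '3x D'),
    (1, 5, '1x E'), (1, 6, '1x F'), (1, 7, '2x G'), (1, 8, '1x H'),
    (1, 9, '4x I'),
    (2, 10, '1x J'), (2, 11, '2x K');

WITH numbered AS (
    SELECT dh_id, dli_id, item,
           -- integer division: rows 1-4 -> 0, rows 5-8 -> 1, rows 9-12 -> 2, ...
           (ROW_NUMBER() OVER (PARTITION BY dh_id ORDER BY dli_id) - 1) / 4 AS BoxGroup
    FROM @items
)
SELECT n.dh_id,
       n.BoxGroup,
       STUFF((SELECT ', ' + n2.item
              FROM numbered n2
              WHERE n2.dh_id = n.dh_id
                AND n2.BoxGroup = n.BoxGroup
              ORDER BY n2.dli_id
              FOR XML PATH('')), 1, 2, '') AS Items   -- strip the leading ', '
FROM numbered n
GROUP BY n.dh_id, n.BoxGroup
ORDER BY n.dh_id, n.BoxGroup;
```

With this data, delivery 1 produces three label rows (four, four, and one item) and delivery 2 produces one, which matches the "one label per 4 items" requirement.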

Check special character in sql view

Hi, I am in a fix with a particular scenario. I have a view that is created by joining multiple tables, and the requirement is to find out whether the columns in that view can return special characters.
SELECT SI.ShipmentId,
CASE
WHEN SA.AddressType = 1 THEN 'SH'
ELSE 'CN'
END AS AddressType,
SI.Pieces,
SI.PalletCount,
SI.Weight,
SI.UserDescription,
SI.Class,
SA.CompanyName,
SA.Street,
SA.City,
SA.State,
SA.ZipCode,
CASE
WHEN SA.Country = 1 THEN 'USA'
WHEN SA.Country = 2 THEN 'CANADA'
END AS Country,
SA.ContactPerson,
Cast(Replace(Replace(Replace(Replace(SA.Phone, ')', ''), '(', ''), '-', ''), ' ', '') AS VARCHAR(25)) AS Phone,
S.PoNo,
S.EstimatedDueDate,
Cast(S.ShipmentReadyTime AS VARCHAR(10)) AS ShipmentReadyTime,
Cast(S.ShipmentCloseTime AS VARCHAR(10)) AS ShipmentCloseTime,
B.BOLNumber,
S.HazMatEmergencyNo
FROM CarrierRate.Shipment AS S
INNER JOIN CarrierRate.BOL AS B
ON B.ShipmentId = S.ID
INNER JOIN CarrierRate.ShipmentItems AS SI
ON SI.ShipmentId = S.ID
INNER JOIN CarrierRate.ShipmentAddresses AS SA
ON SA.ShipmentId = S.ID
INNER JOIN CarrierRate.Carriers AS C
ON C.ID = S.CarrierId
WHERE ( SI.AccessorialId = 1 )
AND ( SA.AddressType IN ( 1, 2 ) )
This is the view. I just want to know which columns can have special characters in their data.
For example, SA.CompanyName is one of the columns; I have to check whether that column can be filled with special characters.
Please let me know the probable solutions; I am clueless.

You need to look at the column types of the tables/columns/expressions which contribute to the columns of the view you are interested in. Assuming they are defined as some form of text or varchar then they CAN contain special characters, unless there exists some form of constraint on those columns/tables.
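One possible way to act on this, sketched under several assumptions: the view name ShipmentView and its schema are hypothetical (the question never names the view), and "special character" is taken to mean anything other than ASCII letters, digits, and spaces. The first query lists the declared types of the view's columns; the second shows a LIKE pattern that finds rows actually containing such characters, run here against invented sample data.

```sql
-- 1) Declared column types of the view ('ShipmentView' is a hypothetical name).
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'CarrierRate'
  AND TABLE_NAME   = 'ShipmentView'
ORDER BY ORDINAL_POSITION;

-- 2) Rows whose text contains a character outside letters, digits and spaces.
DECLARE @sample TABLE (CompanyName varchar(100));
INSERT INTO @sample VALUES ('Acme Freight'), ('O''Brien & Sons'), ('Box #42');

SELECT CompanyName
FROM @sample
WHERE CompanyName LIKE '%[^a-zA-Z0-9 ]%';   -- [^...] matches any character not listed
```

The character ranges in the pattern are interpreted according to the column's collation, so accented letters may or may not count as "special" depending on the database settings. Running the second query against each text column of the real view would show which columns currently hold such data.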

Records repeat for SSRS 2005 report

When I run a report for a purchase order, the report duplicates records for product codes.
For example, for purchase order P000976, the report displays product code 45-5540 twice when it should appear only once.
P000976 09-17-2012 15,040.00 15,040.00 0.00
45-5540 "Lordotic Cervical Spacer 10mm
Lordotic Cervical Spacer 10mm" 20 20 0
45-5540 "Lordotic Cervical Spacer 10mm
Lordotic Cervical Spacer 10mm" 20 20 0
When I run the report's SQL in SQL Server and narrow down which part causes the additional product code row, it is this line:
join all_product_codes_VW p on q.distpartno = p.distpartno
select q.specialrequirement
, q.distpartno
, q.toproduce
, q.prodbegindate
, q.distributor
, rc.report_category_name
, s.productperpo
, r.ebi_released
, w.ebi_in_WIP
, p.distproductname
, tp.typeprefixdetail
, tp.cost
, '1' as ReportTotals
from all_required_vw q
left join all_shipped_grafts_new_VW s on (q.distpartno = s.distpartno and q.specialrequirement = s.ponumber)
left join all_released_Grafts_VW r on q.distpartno = r.distpartno
left join all_in_WIP_VW w on q.distpartno = w.distpartno
join all_product_codes_VW p on q.distpartno = p.distpartno
join setup_tissue_prefix tp on q.typenumber = tp.typeprefix
join setup_report_category_1 rc on q.distributor = rc.report_category_id
where q.prodbegindate < @enddate
and q.completed = '0'
and rc.report_category_name like '%' + isnull(@tcustomer, '') + '%'
order by q.prodbegindate, p.distproductname
This is the SQL for the view for which the join creates the duplicate.
SELECT COUNT_BIG(*) AS BIG, DistPartNo, DistProductName, Distributor, UMTBProductCode
FROM dbo.Setup_Distributor_Product_info
WHERE (Distributor <> '7') OR (Distributor IS NULL)
GROUP BY DistPartNo, DistProductName, Distributor, USSAProductCode

If you comment out these lines
--, p.distproductname
--join all_product_codes_VW p on q.distpartno = p.distpartno
Does the query return single rows for each distpartno? If yes, then you're right that the all_product_codes_VW view is causing the multiple rows.
Run these two queries and look at how many rows are in each and it will give you a clue as to why this is:
Select * from all_required_vw where distpartno = '45-5540'
Select * from all_product_codes_VW where distpartno = '45-5540'
My guess is that joining on only the distpartno is not enough to give you unique results. There might be more than one distributor for the same part, or multiple product names for the same part number, e.g. different distributors using the same part number with different products.
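A quick way to check that guess across all parts at once, rather than one part number at a time, might look like the sketch below. The table variable only imitates all_product_codes_VW with invented rows (including the second distributor and the '45-5541' part); against the real database the same GROUP BY / HAVING query would be pointed at the view itself.

```sql
-- Hypothetical stand-in for all_product_codes_VW.
DECLARE @all_product_codes_VW TABLE (
    DistPartNo      varchar(20),
    DistProductName varchar(100),
    Distributor     varchar(10)
);

INSERT INTO @all_product_codes_VW VALUES
    ('45-5540', 'Lordotic Cervical Spacer 10mm', '1'),
    ('45-5540', 'Lordotic Cervical Spacer 10mm', '2'),   -- same part, second distributor
    ('45-5541', 'Hypothetical Other Product',    '1');

-- Any part number listed here will multiply rows when joined on DistPartNo alone.
SELECT DistPartNo, COUNT(*) AS RowsPerPart
FROM @all_product_codes_VW
GROUP BY DistPartNo
HAVING COUNT(*) > 1;
```

If the real view returns rows from this check, the join in the report needs an extra condition (for example on distributor) or the view needs to return one row per part number.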

Possibly this GROUP BY clause
GROUP BY DistPartNo, DistProductName, Distributor, USSAProductCode
needs to be replaced with this one, so that it matches the UMTBProductCode column in the SELECT list:
GROUP BY DistPartNo, DistProductName, Distributor, UMTBProductCode
