SQL Query - Distinct doesn't seem to filter - sql

I'm utilizing four separate tables and i can't seem to figure out why my DISTINCT isn't filtering the results. I'm trying to get a single result for each acct.Name in this query. Regardless if i use DISTINCT or not, i get the exact same results.
Select DISTINCT
acct.Name,
inv.InvoiceNumber,
acct.AccountNumber,
addr.Line1,
addr.Line2,
addr.Line3,
addr.City,
addr.StateOrProvince,
addr.postalcode
FROM InvoiceBase inv, AccountBase acct
JOIN AccountExtensionBase base
ON base.AccountId = acct.AccountId
JOIN CustomerAddressBase addr
ON addr.ParentId = acct.AccountId
WHERE
inv.AccountId=acct.AccountId And
base.New_cocat_master = 1 And
base.New_CompanyId = 1 And
inv.StateCode = 0 And
inv.Name = '2013 ' + acct.AccountNumber
ORDER by acct.Name
The first result i get now has the first three values (acct.Name, inv.InvoiceNumnber, acct.AccountNumber) and the rest of the columns are blank. The second row has all of the columns with the information. I'm just trying to make the acct.Name to be DISTINCT

The rows are DISTINCT
This may be confusing when you are selecting multiple strings, since there might be hidden characters/spaces. Select the length of each one of those fields and compare the so called duplicate rows.

Turns out all i had to do was add in a simple clause in the WHERE, since the address is required for a valid invoice (where to send it):
WHERE
base.New_cocat_master = 1 And
base.New_CompanyId = 1 And
inv.StateCode = 0 And
inv.Name = '2013 ' + acct.AccountNumber And
addr.Line1 IS NOT NULL

Related

I have info on cycle counts in a warehouse that count the same locations multiple times. I want to get the latest NET_VAR for a specific location

I have used MAX(Date) and that will get me what i need until i put the qty in the mix, since they get different results after they fix things it has multiple answers and makes me group by the qty which in the end gives me multiple results. i just want the last count numbers.
SELECT (CH.MOD_DATE_TIME),LH.LOCN_BRCD ,DSP_SKU, (CH.ACTL_INVN_QTY-CH.EXPTD_QTY) "NET VAR" FROM CYCLE_COUNT_HIST CH , LOCN_HDR LH, ITEM_MASTER IM WHERE CH.WHSE = 'SH1' AND CH.LOCN_ID = LH.LOCN_ID AND CH.SKU_ID = IM.SKU_ID AND IM.CD_MASTER_ID = '147001' and DSP_SKU LIKE 'JBLBAR31BLKAM' AND LH.LOCN_BRCD = 'HAHK42A01' AND trunc(CH.CREATE_DATE_TIME) > SYSDATE-120
It returns 3 rows of results and I want the most recent line only. I plan to modify this to (select dsp_sku, sum(NET_VAR) in the end to run a summary of the sku.
I think you can use subquery.
You just need to put the following condition in where clause:
where .....
AND
CH.MOD_DATE_TIME = (select MAX( MOD_DATE_TIME)
from cycle_count_hist)

Is hiding all rows of duplicated data possible?

I am building a report using a SQL Stored Parameter that I created. It pulls data that may have 1 line per unique number, or 2 lines per unique number. I want to only show rows where there is only 1 line.
I have tried changing visibility, but the closest I have come to is
=IIF(Fields!uniquenumber.Value = previous(Fields!uniquenumber.value,True,False)
but this hides 1 of the rows, and not both. I have also tried using CASE WHEN in the query to identify if there are more than 1 line, but I still haven't been able to hide when there is more than 1 line. My query is below (redacted quite a few lines of extra criteria that aren't relevant to my question here. There are quite a few instances where the first part of my WHERE statement will have data that also meets the criteria of the 2nd part.
SELECT
Reorders.LastRxNo,
Reorders_1.LastRxNo AS Reorders1LastRxNo,
Rxs_2.RxBatch AS Rxs2RxBatch,
Reorders.FacID,
KeyIdentifiers.GPI,
Reorders.LastFillDt,
Rxs_1.RxBatch as Rxs1RxBatch,
CASE
WHEN (Reorders.LastRxNo<>Reorders_1.LastRxNo) THEN 1
ELSE 0
END AS Duplicate
FROM Reorders
LEFT OUTER JOIN Rxs AS Rxs_1
ON Reorders.LastRXNo = Rxs_1.RxNo
RIGHT OUTER JOIN KeyIdentifiers
ON Reorders.NDC = KeyIdentifiers.NDC
INNER JOIN Patients
ON Reorders.FacID = Patients.FacID
AND Reorders.PatID = Patients.PatID
LEFT OUTER JOIN Reorders AS Reorders_1
ON Reorders.FacID = Reorders_1.FacID
AND Reorders.PatID = Reorders_1.PatID
AND KeyIdentifiers.NDC = Reorders_1.NDC
LEFT OUTER JOIN Rxs AS Rxs_2
ON Reorders_1.LastRxNo = Rxs_2.RxNo
WHERE
Reorders.ProfileOnly = 1
AND Rxs_1.RxBatch IS NULL
AND Reorders.CutOffDt IS NULL
AND Reorders.PackType LIKE 'PHDEF%'
AND Reorders.Auto = 1
AND Reorders.PhRxStatus IS NULL
AND Reorders.LastFillDt > '01/01/2019'
AND (Reorders.LastRxNo = Reorders_1.LastRxNo
OR Reorders.LastRxNo <> Reorders_1.LastRxNo AND Rxs_2.RxBatch IN ('CF','GONE'))
ORDER BY Reorders.FacID, Patients.PatLName, Patients.PatFName
Yields Results Similar to:
LastRxNo Reorders1LastRxNo Rxs2RxBatch Duplicate
111 111 null 0
222 222 null 0
222 444 CF 1
In the above example, I only need to see the lines like line 1. Since "LastRxNo" 222 has 2 lines, I do not want to see it on my final report. (However, it is queried, because when LastRxNo =222 AND Reorders1LastRxNo = 222 and Rxs2RxBatch IS NULL would pull if I didn't account for it in my WHERE statement, I think. I am happy doing this any way possible, whether it be in the query or in SSRS report.
Replace your CASE ... as duplicate with
COUNT(*) OVER(PARTITION BY LastRxNo) AS LastRxNoCount
then
wrap the whole query in another select like this...
SELECT * FROM
(
your original query here including the change stated above
) q
WHERE q.LastRxNoCount =1
Just filter out all rows where the number of rows is greater than 1:
SELECT LastRxNo, RxBatch
FROM Reorders
WHERE RxBatch IN ('CF','GONE')
AND Reorders.LastRxNo IN
(SELECT LastRxNo FROM Reorders GROUP BY LastRxNo HAVING COUNT(LastRxNo) = 1)
You should just be able to add the last two lines into your WHERE clause.
Alternatively, you could add it to your JOIN conditions:
SELECT
Reorders.LastRxNo
FROM Reorders
INNER JOIN (
SELECT LastRxNo FROM Reorders
GROUP BY LastRxNo
HAVING COUNT(LastRxNo) = 1
) AS UniqueRxNo ON UniqueRxNo.LastRxNo = Reorders.LastRxNo
Yes, you can do it one of 2 ways:
is in your data source, do a distinct select.
Set the "Hide Duplicates" property on the detail line in the report properties

Identify Duplicate Invoice number with prefix or suffixe

I'm using access 2013 and trying to identify duplicate payments made to vendors. I use the SQL query below to identify different type of duplicates but it is not giving desired results as sometimes two criteria are different like invoice number and invoice date.
SELECT
Base.ID AS SerialNumber,
Base.CoCd AS CoCode,
Base.DocumentNo AS DocID,
Base.ClrngdocNo AS ClearingDoc,
Base.DocumentType AS DocType,
Base.Account AS VendorName,
Base.Reference AS InvoiceNumber,
Base.DocumentDate AS InvoiceDate,
Base.GrossInvoiceAmount AS InvAmount
FROM RawData2017TillDate AS Base
INNER JOIN RawData2017TillDate AS duplicate
ON (Base.ID <> duplicate.ID)
AND (Base.Account = duplicate.Account)
AND (Base.Reference <> duplicate.Reference)
AND (Base.DocumentDate = duplicate.DocumentDate)
AND (Base.GrossInvoiceAmount = duplicate.GrossInvoiceAmount)
ORDER BY Base.GrossInvoiceAmount DESC , Base.reference DESC;
I just want single query to identify duplicate with one or more characters added at the begining or at the end of invoice number like examples below
2713565
2713565R,
01456
1456,
I-0001118588
1118588
Also, if I could get a better query to identify duplicates based on other criteria will be appreciated. I am looking for a single query for all criteria.
Thanks in advance!
Please try like below:
I grouped invoice numbers based on their first 2 letters.
SELECT
Mid(Base.reference, 1, 2) , count(1)
FROM RawData2017TillDate AS Base
INNER JOIN RawData2017TillDate AS duplicate
ON (Base.ID <> duplicate.ID)
AND (Base.Account = duplicate.Account)
AND (Base.Reference <> duplicate.Reference)
AND (Base.DocumentDate = duplicate.DocumentDate)
AND (Base.GrossInvoiceAmount = duplicate.GrossInvoiceAmount)
group by Mid(Base.reference, 1, 2) ;

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Actually, I'm doing 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1"
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to perform a single SQL that will give me the same results the 3 I'm actually doing ?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join resceptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
a small note about names, typically in sql the table is named for a single item not the group. thus "result" not "results" and "receptor" not "receptors" etc. As you work with sql this will make sense and names like you have will seem strange. Also, one less character to type!

Count of how many times id occurs in table SQL regexp

Hi I have a redshift table of articles that has a field on it that can contain many accounts. So there is a one to many relationship between articles to accounts.
However I want to create a new view where it lists the partner id's in one column and in another column a count of how many times the partner id appears in the articles table.
I've attempted to do this using regex and created a new redshift view, but am getting weird results where it doesn't always build properly. So one day it will say a partner appears 15 times, then the next 17, then the next 15, when the partner id count hasn't actually changed.
Any help would be greatly appreciated.
SELECT partner_id,
COUNT(DISTINCT id)
FROM (SELECT id,
partner_ids,
SPLIT_PART(partner_ids,',',i) partner_id
FROM positron_articles a
LEFT JOIN util.seq_0_to_500 s
ON s.i < regexp_count (partner_ids,',') + 2
OR s.i = 1
WHERE i > 0
AND regexp_count (partner_ids,',') = 0
ORDER BY id)
GROUP BY 1;
Let's start with some of the more obvious things and see if we can start to glean other information.
Next GROUP BY 1 on your outer query needs to be GROUP BY partner_id.
Next you don't need an order by in your INNER query and the database engine will probably do a better job optimizing performance without it so remove ORDER BY id.
If you want your final results to be ordered then add an ORDER BY partner_id or similar clause after your group by of your OUTER query.
It looks like there are also problems with how you are splitting a partnerid from partnerids but I am not positive about that because I need to understand your view and the data it provides to know how that affects your record count for partnerid.
Next your LEFT JOIN statement on the util.seq_0_to_500 I am pretty sure you can drop off the s.i = 1 as the first condition will satisfy that as well because 2 is greater than 1. However your left join really acts more like an inner join because you then exclude any non matches from positron_articles that don't have a s.i > 0.
Oddly then your entire join and inner query gets kind of discarded because you only want articles that have no commas in their partnerids: regexp_count (partner_ids,',') = 0
I would suggest posting the code for your util.seq_0_to_500 and if you have a partner table let use know about that as well because you can probably get your answer a lot easier with that additional table depending on how regexp_count works. I suspect regex_count(partnerids,partnerid) exampleregex_count('12345,678',1234) will return greater than 0 at which point you have no choice but to split the delimited strings into another table before counting or building a new matching function.
If regex_count only matches exact between commas and you have a partner table your query could be as easy as this:
SELECT
p.partner_id
,COUNT(a.id) AS ArticlesAppearedIn
FROM
positron_articles a
LEFT JOIN PARTNERTABLE p
ON regexp_count(a.partnerids,p.partnerid) > 0
GROUP BY
p.partner_id
I will actually correct myself as I just thought of a way to join a partner table without regexp_count. So if you have a partner table this might work for you. If not you will need to split strings. It basically tests to see if the partnerid is the entire partnerids, at the beginning, in the middle, or at the end of partnerids. If one of those is met then the records is returned.
SELECT
p.partner_id
,COUNT(a.id) AS ArticlesAppearedIn
FROM
PARTNERTABLE p
INNER JOIN positron_articles a
ON
(
CASE
WHEN a.partnerids = CAST(p.partnerid AS VARCHAR(100)) THEN 1
WHEN a.partnerids LIKE p.partnerid + ',%' THEN 1
WHEN a.partnerids LIKE '%,' + p.partnerid + ',%' THEN 1
WHEN a.partnerids LIKE '%,' + p.partnerid THEN 1
ELSE 0
END
) = 1
GROUP BY
p.partner_id