Deduplicate table while keeping the version where the label column is not null


I have a table with 2 columns: an id and an associated label.
Example:
id1,label1
id1,null
id2,label2
id3,null
I would like to deduplicate on the 1st column and keep the version where the label column is not null. But if the id appears only once, I want to keep the line no matter what, even if the label is null.
The output I would want with the example is:
id1,label1
id2,label2
id3,null
How can I do this?

Consider the query below (BigQuery):
select *
from your_table
qualify 1 = row_number() over(partition by id order by label desc)
If applied to the sample data in your question, the output is:
id1,label1
id2,label2
id3,null
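QUALIFY is BigQuery-specific. As a rough, runnable sketch of the same ROW_NUMBER() idea, here is the query adapted for SQLite (3.25 or later, for window functions) through Python's built-in sqlite3 module, with the QUALIFY filter moved into an outer query. The table name your_table and the sample rows come from the question; the helper name dedupe_keep_labelled is made up for the demo.

```python
import sqlite3


def dedupe_keep_labelled(rows):
    """Keep one row per id, preferring a non-null label (sketch, SQLite 3.25+)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id TEXT, label TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    # SQLite has no QUALIFY, so the window function is filtered in an outer query.
    # NULLs sort first in ascending order in SQLite, hence last with DESC,
    # so a non-null label gets row number 1 whenever the id has one.
    result = conn.execute(
        """
        SELECT id, label
        FROM (
            SELECT id, label,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY label DESC) AS rn
            FROM your_table
        )
        WHERE rn = 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result


sample = [("id1", "label1"), ("id1", None), ("id2", "label2"), ("id3", None)]
print(dedupe_keep_labelled(sample))
# [('id1', 'label1'), ('id2', 'label2'), ('id3', None)]
```

Ordering by label DESC is what makes this work: in both BigQuery and SQLite, NULLs come last in descending order, so a non-null label wins whenever the id has one. Unlike the EXISTS/DELETE approach, this also collapses exact duplicates to a single row.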

Use EXISTS:
delete
from <table1> d1
where d1.label is null
and exists ( select null
from <table1> d2
where d2.id = d1.id
and d2.label is not null
) ;
EXISTS returns True if the subselect would have returned at least 1 row, and False otherwise. In this case it returns True only when there is a matching row where label is not null.
Note: this does not remove duplicates when an id has multiple rows that all have a non-null label, or multiple rows that all have a null label. So it would not deduplicate the tuples
('a',null) with ('a',null), nor the tuples
('a','l1') with ('a','l1')
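A minimal sketch of the DELETE above, run through Python's built-in sqlite3 module and assuming the placeholder <table1> is a table named table1 (the d1 alias is dropped here and the outer table is referenced by name). The helper delete_null_duplicates is hypothetical. The last two calls reproduce the limitation from the note: exact duplicates survive.

```python
import sqlite3


def delete_null_duplicates(rows):
    """Apply the EXISTS-based DELETE to (id, label) rows and return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id TEXT, label TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    # Delete a null-label row only when the same id also has a non-null label.
    conn.execute(
        """
        DELETE FROM table1
        WHERE label IS NULL
          AND EXISTS (SELECT NULL
                      FROM table1 d2
                      WHERE d2.id = table1.id
                        AND d2.label IS NOT NULL)
        """
    )
    remaining = conn.execute(
        "SELECT id, label FROM table1 ORDER BY id, label"
    ).fetchall()
    conn.close()
    return remaining


print(delete_null_duplicates(
    [("id1", "label1"), ("id1", None), ("id2", "label2"), ("id3", None)]
))
# Exact duplicates are left untouched, as the note says:
print(delete_null_duplicates([("a", None), ("a", None)]))
print(delete_null_duplicates([("a", "l1"), ("a", "l1")]))
```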

Related

SQL Query: should return Single Record if Search Condition met, otherwise return Multiple Records

I have a table with billions of records. The table structure is like:
ID NUMBER PRIMARY KEY,
MY_SEARCH_COLUMN NUMBER,
MY_SEARCH_COLUMN will hold numeric values up to 15 digits long.
What I want is: if a specific record is matched, I should get only that matched value,
i.e. if I enter WHERE MY_SEARCH_COLUMN = 123454321 and the table has the value 123454321, then only this should be returned.
But if the exact value is not matched, I should get the next 10 values from the table,
i.e. if I enter WHERE MY_SEARCH_COLUMN = 123454321 and the column does not have the value 123454321, then it should return the 10 values from the table that are greater than 123454321.
Both cases should be covered in a single SQL query, and I have to keep the performance of the query in mind. I have already created an index on the MY_SEARCH_COLUMN column, so other suggestions to improve performance are welcome.

This could be tricky to do without using a proc or maybe some dynamic SQL, but we can try using ROW_NUMBER here:
WITH cte AS (
SELECT ID, MY_SEARCH_COLUMN,
ROW_NUMBER() OVER (ORDER BY MY_SEARCH_COLUMN) rn
FROM yourTable
WHERE MY_SEARCH_COLUMN >= 123454321
)
SELECT *
FROM cte
WHERE rn <= CASE WHEN EXISTS (SELECT 1 FROM yourTable WHERE MY_SEARCH_COLUMN = 123454321)
THEN 1
ELSE 10 END;
The basic idea of the above query is that we assign a row number to all records matching the target or greater. Then, we query using either a row number of 1, in case of an exact match, or all row numbers up to 10 in case of no match.
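To check both branches, here is a sketch of the same CTE run through Python's sqlite3 module (SQLite 3.25 or later). The target is passed as a named parameter, an ORDER BY rn is added so rows come back in a stable order, and the sample values, the helper search and the index name idx_search are made up for the demo; it says nothing about performance at billions of rows.

```python
import sqlite3


def search(values, target):
    """Return the exact match, or else the next 10 values >= target (sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (ID INTEGER PRIMARY KEY, MY_SEARCH_COLUMN INTEGER)"
    )
    conn.executemany(
        "INSERT INTO yourTable (MY_SEARCH_COLUMN) VALUES (?)", [(v,) for v in values]
    )
    conn.execute("CREATE INDEX idx_search ON yourTable (MY_SEARCH_COLUMN)")
    sql = """
        WITH cte AS (
            SELECT ID, MY_SEARCH_COLUMN,
                   ROW_NUMBER() OVER (ORDER BY MY_SEARCH_COLUMN) AS rn
            FROM yourTable
            WHERE MY_SEARCH_COLUMN >= :target
        )
        SELECT MY_SEARCH_COLUMN
        FROM cte
        -- 1 row on an exact match, otherwise up to 10 rows above the target
        WHERE rn <= CASE WHEN EXISTS (SELECT 1 FROM yourTable
                                      WHERE MY_SEARCH_COLUMN = :target)
                         THEN 1
                         ELSE 10 END
        ORDER BY rn
    """
    found = [row[0] for row in conn.execute(sql, {"target": target})]
    conn.close()
    return found


values = list(range(100, 300, 10))  # 100, 110, ..., 290
print(search(values, 150))  # [150]
print(search(values, 155))  # [160, 170, 180, 190, 200, 210, 220, 230, 240, 250]
```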

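SELECT TOP (10) *
FROM your_table AS src
WHERE src.MY_SEARCH_COLUMN >= 123454321
AND src.MY_SEARCH_COLUMN = CASE WHEN EXISTS (SELECT 1 FROM your_table AS src2 WITH(NOLOCK) WHERE src2.MY_SEARCH_COLUMN = 123454321)
THEN 123454321
-- no exact match: keep every value >= the target; TOP (10) + ORDER BY limit it to the next 10
ELSE src.MY_SEARCH_COLUMN
END
ORDER BY src.MY_SEARCH_COLUMN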

merge condition is not working in sql server

I have two tables, ProviderLoc and TmpProviderLoc. I need to take a combination of three columns from TmpProviderLoc and check whether the records exist in ProviderLoc.
Case 1: If the record exists in ProviderLoc, I need to update another column (NPI) in ProviderLoc based on the NPI column in TmpProviderLoc.
Case 2: If it does not exist, I need to insert the values into ProviderLoc.
For that I have written the query below:
MERGE INTO [dbo].[ProviderLoc] AS PL
USING
(
select *
from (
select *,
row_number() over (partition by [Location_ID],[PProviderTaxID]
,[POBOXZIP] order by [Location_ID],[PProviderTaxID],[POBOXZIP]) as row_number
from [dbo].[TmpProviderLoc]
) as rows
where row_number = 1
) AS TPL
ON TPL.[Location_ID] = PL.[ecProviderID]
AND TPL.[PProviderTaxID] = PL.[TaxID]
AND TPL.[NPI] = PL.[NPI]
AND TPL.[POBOXZIP] = PL.[POBOXZIP]
WHEN MATCHED THEN
UPDATE SET PL.[NPI] = CASE
WHEN TPL.[NPI] = NULL THEN PL.[NPI]
ELSE TPL.[NPI]
END
WHEN NOT MATCHED THEN
INSERT (EcProviderID,TaxID,NPI,POBOXZIP,ProviderLocationStatusID,CreatedON)
VALUES (TPL.[Location_ID],TPL.[PProviderTaxID],TPL.[NPI]
,TPL.[POBOXZIP],1,GETDATE());
But the NPI value is not being updated: if the NPI value is new in TmpProviderLoc, it is not updated in ProviderLoc.
Could anyone please look into this issue,
or suggest another way to do this kind of check?

The equality comparison here is incorrect: WHEN TPL.[NPI] = NULL THEN PL.[NPI]
Use IS NULL instead:
WHEN TPL.[NPI] IS NULL THEN PL.[NPI]
NULLs are special. They are "indeterminate", so they cannot be equal or unequal to anything: they simply have no value that could be compared. NULL is the absence of a value, and equal/unequal does not apply. With the default ANSI_NULLS ON setting, a comparison such as TPL.[NPI] = NULL evaluates to UNKNOWN rather than TRUE, so that WHEN branch never fires.
To test whether a value is NULL, use IS NULL; use IS NOT NULL to test for a non-null value.
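A small sketch that evaluates both versions of the CASE for an incoming NPI, using Python's sqlite3 module as a stand-in for SQL Server (the NULL comparison behaves the same as SQL Server under its default ANSI_NULLS ON). The helper case_result and the string labels are illustrative only. With = NULL the existing PL.NPI is never kept; with IS NULL it is kept whenever the incoming NPI is NULL.

```python
import sqlite3

# In-memory SQLite database used only to evaluate expressions.
conn = sqlite3.connect(":memory:")


def case_result(npi):
    """Evaluate the original (= NULL) and fixed (IS NULL) CASE for one incoming NPI.

    Python None is passed to SQLite as SQL NULL.
    """
    return conn.execute(
        """
        SELECT CASE WHEN :npi = NULL  THEN 'keep PL.NPI' ELSE 'use TPL.NPI' END,
               CASE WHEN :npi IS NULL THEN 'keep PL.NPI' ELSE 'use TPL.NPI' END
        """,
        {"npi": npi},
    ).fetchone()


print(case_result(None))          # ('use TPL.NPI', 'keep PL.NPI')
print(case_result("1234567890"))  # ('use TPL.NPI', 'use TPL.NPI')
```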

PL/SQL Increase value of new row, with value of previous

I need to set the NEWLOSAL value of each row to be one more than the NEWHISAL value of the previous row,
just like the HISAL and LOSAL columns.
NEWLOSAL needs to be the previous NEWHISAL + 1.

Not sure if this is what you want:
update table1 t1
set t1.Newlosal=case when t1.grade=1 then (t1.Newhisal+1) else (select t2.Newhisal+1 from table1 t2 where t2.grade = (t1.grade-1)) end
WHERE EXISTS (
SELECT 1
FROM table1 t2
WHERE t2.grade=(t1.grade-1))

This can efficiently be done using the merge statement and a window function:
merge into table1 tg
using
(
select id, -- I assume this is the PK column
lag(newhisal) over (order by grade) + 1 as new_losal
from table1
) nv on (nv.id = tg.id)
when matched then update
set tg.newlosal = nv.new_losal;
In SQL, rows in a table (or a result) are not ordered, so the concept of a "previous" row only makes sense if you define a sort order. That's what the over (order by grade) does in the window function. From the screenshot I cannot tell by which column this should be sorted.
The screenshot also doesn't reveal the primary key column of your table. I assumed it's named ID; you have to change that to your real PK column name.
I also didn't include a partition by clause in the window function, assuming the formula should be applied to all rows in the same way. If that is not the case, you need to be more specific with your sample data.
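SQLite has no MERGE, so this sketch (Python's sqlite3, SQLite 3.25 or later) swaps in a CTE plus a correlated UPDATE to apply the same LAG(newhisal) + 1 idea. The id and grade columns follow the answer's assumptions; the helper fill_newlosal and the sample values are made up. The lowest grade has no previous row, so LAG returns NULL and its newlosal becomes NULL, just as with the MERGE.

```python
import sqlite3


def fill_newlosal(rows):
    """Set newlosal = previous row's newhisal + 1, ordered by grade (sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, grade INTEGER,"
        " newlosal INTEGER, newhisal INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    # The CTE computes the new value per id; the correlated subquery copies it back.
    conn.execute(
        """
        WITH nv AS (
            SELECT id, LAG(newhisal) OVER (ORDER BY grade) + 1 AS new_losal
            FROM table1
        )
        UPDATE table1
        SET newlosal = (SELECT nv.new_losal FROM nv WHERE nv.id = table1.id)
        """
    )
    result = conn.execute(
        "SELECT grade, newlosal, newhisal FROM table1 ORDER BY grade"
    ).fetchall()
    conn.close()
    return result


sample_grades = [
    (1, 1, None, 1200),
    (2, 2, None, 1400),
    (3, 3, None, 2000),
    (4, 4, None, 3000),
    (5, 5, None, 9999),
]
print(fill_newlosal(sample_grades))
# [(1, None, 1200), (2, 1201, 1400), (3, 1401, 2000), (4, 2001, 3000), (5, 3001, 9999)]
```

If the first row should keep its existing value, wrapping the expression in COALESCE(..., newlosal) is one option.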

How to update a table if values of the attributes are contained within another table?

I've got a database like this one:
I'm trying to create a query that would enable me to update the value of the status attribute inside the incident table whenever the values of all of these three attributes: tabor_vatrogasci, tabor_policija, and tabor_hitna are contained inside the izvještaj_tabora table as a value of the oznaka_tabora attribute. If, for example, the values of the tabor_vatrogasci, tabor_policija, and tabor_hitna attributes are 3, 4 and 5 respectively, the incident table should be updated if (and only if) 3, 4, and 5 are contained inside the izvještaj_tabora table.
This is what I tried, but it didn't work:
UPDATE incident SET status='Otvoren' FROM tabor,izvjestaj_tabora
WHERE (incident.tabor_policija=tabor.oznaka
OR incident.tabor_vatrogasci=tabor.oznaka
OR incident.tabor_hitna=tabor.oznaka)
AND izvjestaj_tabora.oznaka_tabora=tabor.oznaka
AND rezultat_izvjestaja='Riješen' AND
((SELECT EXISTS(SELECT DISTINCT oznaka_tabora FROM izvjestaj_tabora)
WHERE oznaka_tabora=incident.tabor_policija) OR tabor_policija=NULL) AND
((SELECT EXISTS(SELECT DISTINCT oznaka_tabora FROM izvjestaj_tabora)
WHERE oznaka_tabora=incident.tabor_vatrogasci) OR tabor_vatrogasci=NULL) AND
((SELECT EXISTS(SELECT DISTINCT oznaka_tabora FROM izvjestaj_tabora)
WHERE oznaka_tabora=incident.tabor_hitna) OR tabor_hitna=NULL);
Does anyone have any idea on how to accomplish this?

Assuming INCIDENT.OZNAKA is the key and you need all 3 to be related for the event to open (I am Slovenian, that's why I understand ;) ):
UPDATE incident
SET status='Otvoren'
WHERE oznaka in (
SELECT DISTINCT i.oznaka
FROM incident i
INNER JOIN izvještaj_tabora t1 ON i.tabor_vatrogasci = t1.oznaka_tabora
INNER JOIN izvještaj_tabora t2 ON i.tabor_policija = t2.oznaka_tabora
INNER JOIN izvještaj_tabora t3 ON i.tabor_hitna = t3.oznaka_tabora
WHERE t1.rezultat_izvjestaja='Riješen' AND t2.rezultat_izvjestaja='Riješen' AND t3.rezultat_izvjestaja='Riješen'
)

According to your description the query should look something like this:
UPDATE incident i
SET status = 'Otvoren'
WHERE (tabor_policija IS NULL OR
EXISTS (
SELECT 1 FROM izvjestaj_tabora t
WHERE t.oznaka_tabora = i.tabor_policija
)
)
AND (tabor_vatrogasci IS NULL OR
EXISTS (
SELECT 1 FROM izvjestaj_tabora t
WHERE t.oznaka_tabora = i.tabor_vatrogasci
)
)
AND (tabor_hitna IS NULL OR
EXISTS (
SELECT 1 FROM izvjestaj_tabora t
WHERE t.oznaka_tabora = i.tabor_hitna
)
)
I wonder though, why the connecting table tabor is irrelevant to the operation.
Among other things you fell victim to two widespread misconceptions:
1)
tabor_policija=NULL
This expression always results in NULL. Since NULL is considered "unknown", if you compare it to anything, the outcome is "unknown" as well. I quote the manual on Comparison Operators:
Do not write expression = NULL because NULL is not "equal to" NULL.
(The null value represents an unknown value, and it is not known
whether two unknown values are equal.)
2)
EXISTS(SELECT DISTINCT oznaka_tabora FROM ...)
In an EXISTS semi-join SELECT items are completely irrelevant. (I use SELECT 1 instead). As the term implies, only existence is checked. The expression returns TRUE or FALSE, SELECT items are ignored. It is particularly pointless to add a DISTINCT clause there.
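A runnable sketch of the EXISTS version, using Python's sqlite3 module instead of PostgreSQL and writing incident.<column> in place of the i alias. The oznaka primary key borrows the assumption from the other answer; the helper open_incidents and the three incidents are made up. The first matches the 3/4/5 example from the question, the second has a NULL tabor_vatrogasci, and the third references a missing report (9), so only the first two should be opened.

```python
import sqlite3


def open_incidents(incidents, report_ids):
    """incidents: (oznaka, tabor_policija, tabor_vatrogasci, tabor_hitna) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incident (oznaka INTEGER PRIMARY KEY, status TEXT,"
        " tabor_policija INTEGER, tabor_vatrogasci INTEGER, tabor_hitna INTEGER)"
    )
    conn.execute("CREATE TABLE izvjestaj_tabora (oznaka_tabora INTEGER)")
    conn.executemany(
        "INSERT INTO incident (oznaka, tabor_policija, tabor_vatrogasci, tabor_hitna)"
        " VALUES (?, ?, ?, ?)",
        incidents,
    )
    conn.executemany(
        "INSERT INTO izvjestaj_tabora VALUES (?)", [(r,) for r in report_ids]
    )
    # Each column must be NULL (tested with IS NULL, not = NULL) or have a report row.
    conn.execute(
        """
        UPDATE incident
        SET status = 'Otvoren'
        WHERE (tabor_policija IS NULL OR EXISTS (
                   SELECT 1 FROM izvjestaj_tabora t
                   WHERE t.oznaka_tabora = incident.tabor_policija))
          AND (tabor_vatrogasci IS NULL OR EXISTS (
                   SELECT 1 FROM izvjestaj_tabora t
                   WHERE t.oznaka_tabora = incident.tabor_vatrogasci))
          AND (tabor_hitna IS NULL OR EXISTS (
                   SELECT 1 FROM izvjestaj_tabora t
                   WHERE t.oznaka_tabora = incident.tabor_hitna))
        """
    )
    result = conn.execute(
        "SELECT oznaka, status FROM incident ORDER BY oznaka"
    ).fetchall()
    conn.close()
    return result


sample_incidents = [(1, 4, 3, 5), (2, 4, None, 5), (3, 4, 3, 9)]
print(open_incidents(sample_incidents, [3, 4, 5]))
# [(1, 'Otvoren'), (2, 'Otvoren'), (3, None)]
```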

Use of CASE statement values in THEN expression

I am attempting to use a case statement but keep getting errors. Here's the statement:
select TABLE1.acct,
CASE
WHEN TABLE1.acct_id in (select acct_id
from TABLE2
group by acct_id
having count(*) = 1 ) THEN
(select name
from TABLE3
where TABLE1.acct_id = TABLE3.acct_id)
ELSE 'All Others'
END as Name
from TABLE1
When I replace the TABLE1.acct_id in the THEN expression with a literal value, the query works. When I try to use TABLE1.acct_id from the WHEN part of the query, I get an error saying the result is more than one row. It seems like the THEN expression is ignoring the single value that the WHEN statement was using. No idea; maybe this isn't even a valid use of the CASE statement.
I am trying to see names for accounts that have one entry in TABLE2.
Any ideas would be appreciated, I'm kind of new at SQL.

First, you are missing a comma after TABLE1.acct. Second, you have aliased TABLE1 as acct, so you should use that.
Select acct.acct
, Case
When acct.acct_id in ( Select acct_id
From TABLE2
Group By acct_id
Having Count(*) = 1 )
Then ( Select name
From TABLE3
Where acct.acct_id = TABLE3.acct_id
Fetch First 1 Rows Only)
Else 'All Others'
End as Name
From TABLE1 As acct
As others have said, you should adjust your THEN clause to ensure that only one value is returned. You can do that by adding Fetch First 1 Rows Only to your subquery.

Then ( Select name
From TABLE3
Where acct.acct_id = TABLE3.acct_id
Fetch First 1 Rows Only)
FETCH is not accepted in the CASE statement: "Keyword FETCH not expected. Valid tokens: ) UNION EXCEPT."

select name from TABLE3 where TABLE1.acct_id = TABLE3.acct_id
will give you all the names in TABLE3 that have an accompanying row in TABLE1. The row selected from TABLE2 in the previous line doesn't enter into it.

Must be getting more than one value.
You can replace the body with...
(select count(name) from TABLE3 where TABLE1.acct_id = TABLE3.acct_id)
... to narrow down which rows are returning multiples.
It may be the case that you just need a DISTINCT or a TOP 1 to reduce your result set.
Good luck!

I think what is happening here is that your CASE must return a single value, because it will be the value of the "name" column. The subquery (select acct_id from TABLE2 group by acct_id having count(*) = 1 ) is OK because it is used with IN, which accepts any number of values. (select name from TABLE3 where TABLE1.acct_id= TABLE3.acct_id), however, could return multiple values depending on your data. The problem is that you are trying to shove multiple values into a single field of a single row.
The next thing to do would be to find out what data causes multiple rows to be returned by (select name from TABLE3 where TABLE1.acct_id= TABLE3.acct_id), and see if you can further limit this query to only return one row. If need be, you could even try something like ...AND ROWNUM = 1 (for Oracle - other DBs have similar ways of limiting rows returned).
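A sketch of both suggestions (count first, then force a single value), using Python's sqlite3 module and made-up table contents in which account B has two names in TABLE3. SQLite does not raise the "more than one row" error; it silently uses the first row of a scalar subquery, so the COUNT step is the reliable way to spot the problem here. Instead of FETCH FIRST or ROWNUM, the second query swaps in MIN(name), an aggregate that always returns exactly one row; whether MIN is the right name to pick depends on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TABLE1 (acct TEXT, acct_id INTEGER);
    CREATE TABLE TABLE2 (acct_id INTEGER);
    CREATE TABLE TABLE3 (acct_id INTEGER, name TEXT);
    INSERT INTO TABLE1 VALUES ('A', 1), ('B', 2), ('C', 3);
    INSERT INTO TABLE2 VALUES (1), (2), (3), (3);
    INSERT INTO TABLE3 VALUES (1, 'Alice'), (2, 'Bob'), (2, 'Robert'), (3, 'Carol');
    """
)

# Step 1: count names per account to find where the scalar subquery has multiple rows.
counts = conn.execute(
    """
    SELECT TABLE1.acct,
           (SELECT COUNT(name) FROM TABLE3 WHERE TABLE1.acct_id = TABLE3.acct_id)
    FROM TABLE1
    ORDER BY TABLE1.acct
    """
).fetchall()

# Step 2: an aggregate in the THEN branch always yields exactly one value.
names = conn.execute(
    """
    SELECT TABLE1.acct,
           CASE
             WHEN TABLE1.acct_id IN (SELECT acct_id FROM TABLE2
                                     GROUP BY acct_id HAVING COUNT(*) = 1)
             THEN (SELECT MIN(name) FROM TABLE3 WHERE TABLE1.acct_id = TABLE3.acct_id)
             ELSE 'All Others'
           END AS Name
    FROM TABLE1
    ORDER BY TABLE1.acct
    """
).fetchall()

print(counts)  # [('A', 1), ('B', 2), ('C', 1)]
print(names)   # [('A', 'Alice'), ('B', 'Bob'), ('C', 'All Others')]
```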

We Keep Coding
