SQLite set records with matching checksums to same value if value is not set

I have a table that contains images: imageArchive
Images in the table are not unique
There is a checksum field that lets you know which images are identical
The record also contains a userID, but in many cases this field is NULL
How can I, if one record with a given checkSum has a userID, set all records with that checksum to the known userId?
I've gotten as far as:
select imageChksum from imageArchive
where userId != "NULL"
group by imagechksum
which gives me the set of known checksums that have userIds
and:
select * from imageArchive
where imagechksum in
(select imageChksum from imageArchive
where userId != "NULL"
group by imagechksum)
which gives me a list of targets to set. However I'm clueless as to how to set them all... probably simple? I'm pretty much self-taught in SQL and generally would do something like this in code but have a sense that I might be close

You can use window functions:
select imageChksum, max(userid) over(partition by imageChksum) as userid
from imageArchive
Note that if there are two different users assigned to the same checksum, the greatest id will be chosen.
If you wanted an update statement, I would recommend correlated subqueries:
update imageArchive
set userid = (select max(ia1.userid) from imageArchive ia1 where ia1.imageChksum = imageArchive.imageChksum)
where
userid is null
and exists (select 1 from imageArchive ia1 where ia1.imageChksum = imageArchive.imageChksum and ia1.userid is not null)
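To try the update approach end to end, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the helper name fill_missing_user_ids are made up for illustration; the EXISTS subquery filters on userId IS NOT NULL so that checksums with no known user are left alone.

```python
import sqlite3


def fill_missing_user_ids(conn):
    """Copy a known userId onto NULL rows that share the same checksum."""
    conn.execute(
        """
        UPDATE imageArchive
        SET userId = (SELECT MAX(ia1.userId)
                      FROM imageArchive ia1
                      WHERE ia1.imageChksum = imageArchive.imageChksum)
        WHERE userId IS NULL
          AND EXISTS (SELECT 1
                      FROM imageArchive ia1
                      WHERE ia1.imageChksum = imageArchive.imageChksum
                        AND ia1.userId IS NOT NULL)
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imageArchive ("
    "id INTEGER PRIMARY KEY, imageChksum TEXT, userId INTEGER)"
)
# Hypothetical data: 'bbb' has no known user, 'ccc' has two different users.
conn.executemany(
    "INSERT INTO imageArchive (imageChksum, userId) VALUES (?, ?)",
    [("aaa", 7), ("aaa", None), ("bbb", None),
     ("ccc", 3), ("ccc", 9), ("ccc", None)],
)

fill_missing_user_ids(conn)
rows = conn.execute(
    "SELECT imageChksum, userId FROM imageArchive ORDER BY id"
).fetchall()
print(rows)
```

The 'bbb' row stays NULL because nothing is known about it, and the NULL 'ccc' row receives 9, the greatest of the two competing ids.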

Related

Duplicate ID in the database

I noticed in my database, some users have the same ID number (it seems to be a bug that didn't check if the id number was already taken for a deleted user).
There are hundreds of pairs of users with the same ID number.
Through SQL I would like to update (adding a 0) to all those users who have a duplicate ID and are deleted.
I'm very familiar with the SQL language.
I found all the duplicate ID users using this query, but I am not sure how I should proceed.
SELECT ID, COUNT(*) As Num
FROM Users
GROUP BY ID
HAVING COUNT(ID) >= 2
If I understand correctly, you have some sort of "isdeleted" flag. Although I'm not sure that "adding a zero" is the best solution to your problem, the standard SQL for this would, based on your description, look something like this:
update t
set id = id || '0'
where isdeleted = 1 and
exists (select 1 from t t2 where t2.id = t.id and t2.isdeleted = 0);
This assumes that isdeleted is a number, with 0 for false and 1 for true. || is the standard SQL operator for string concatenation. Some databases have other mechanisms for string concatenation.
This query is for Oracle; I'm not sure which database you are using:
update users set id = id||0 where rowid not in
(select max(rowid ) from users group by id)
--and flag = 'Deleted Flag' -- uncomment the delete flag condition if your table has one. If not, just use the same query as it is
;
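As a rough check of the EXISTS-based update, the sketch below runs it on SQLite through Python's sqlite3 module. The column names (name, isdeleted), the TEXT type for id, and the sample users are assumptions, since the question does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: id is stored as text and isdeleted is 0/1.
conn.execute("CREATE TABLE Users (id TEXT, name TEXT, isdeleted INTEGER)")
conn.executemany(
    "INSERT INTO Users (id, name, isdeleted) VALUES (?, ?, ?)",
    [("5", "ann", 0), ("5", "bob", 1),   # duplicate id, bob is deleted
     ("6", "cat", 0),
     ("7", "dan", 1),                    # deleted, but id is not duplicated
     ("8", "eve", 1), ("8", "fay", 0)],  # duplicate id, eve is deleted
)

# Append '0' only to deleted users whose id is also held by an active user.
conn.execute(
    """
    UPDATE Users
    SET id = id || '0'
    WHERE isdeleted = 1
      AND EXISTS (SELECT 1 FROM Users u2
                  WHERE u2.id = Users.id AND u2.isdeleted = 0)
    """
)

result = conn.execute("SELECT name, id FROM Users ORDER BY name").fetchall()
print(result)
```

Only bob and eve are renumbered; dan keeps his id because no active user shares it. Whether the new ids ('50', '80') collide with existing ones is a separate question the sketch does not address.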

Update columns in DB2 using randomly chosen static values provided at runtime

I would like to update rows with values chosen randomly from a set of possible values.
Ideally I would be able to provide these values at runtime, using JdbcTemplate from a Java application.
Example:
In a table, column "name" can contain any name. The goal is to run through the table and change all names to either "Bob" or "Alice".
I know that this can be done by creating a sql function. I tested it and it was fine but I wonder if it is possible to just use simple query?
This will not work, seems that the value is computed once, and applied to all rows:
UPDATE test.table
SET first_name =
(SELECT a.name
FROM
(SELECT a.name, RAND() idx
FROM (VALUES('Alice'), ('Bob')) AS a(name) ORDER BY idx FETCH FIRST 1 ROW ONLY) as a)
;
I tried using MERGE INTO, but it won't even run (possible_names is not found in SET query). I am yet to figure out why:
MERGE INTO test.table
USING
(SELECT
names.fname
FROM
(VALUES('Alice'), ('Bob'), ('Rob')) AS names(fname)) AS possible_names
ON ( test.table.first_name IS NOT NULL )
WHEN MATCHED THEN
UPDATE SET
-- select random name
first_name = (SELECT fname FROM possible_names ORDER BY idx FETCH FIRST 1 ROW ONLY)
;
EDIT: If possible, I would like to only focus on fields being updated and not depend on knowing primary keys and such.
Db2 seems to be optimizing away the subselect that returns your supposedly random name, materializing it only once, hence all rows in the target table receive the same value.
To force subselect execution for each row you need to somehow correlate it to the table being updated, for example:
UPDATE test.table
SET first_name =
(SELECT a.name
FROM (VALUES('Alice'), ('Bob')) AS a(name)
ORDER BY RAND(ASCII(SUBSTR(first_name, 1, 1)))
FETCH FIRST 1 ROW ONLY)
or maybe even
UPDATE test.table
SET first_name =
(SELECT a.name
FROM (VALUES('Alice'), ('Bob')) AS a(name)
ORDER BY first_name, RAND()
FETCH FIRST 1 ROW ONLY)
Now that the result of subselect seems to depend on the value of the corresponding row in the target table, there's no choice but to execute it for each row.
If your table has a primary key, this would work. I've assumed the PK is column id.
UPDATE test.table t
SET first_name =
( SELECT name from
( SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY R) AS RN FROM
( SELECT *, RAND() R
FROM test.table, TABLE(VALUES('Alice'), ('Bob')) AS d(name)
)
)
AS u
WHERE t.id = u.id and rn = 1
)
;
There might be a nicer/more efficient solution, but I'll leave that to others.
FYI I used the following DDL and data to test the above.
create table test.table(id int not null primary key, first_name varchar(32));
insert into test.table values (1,'Flo'),(2,'Fred'),(3,'Sue'),(4,'John'),(5,'Jim');

Alternative to NOT IN()

I have a table with 14,028 rows from November 2012. I also have a table with 13,959 rows from March 2013. I am using a simple NOT IN() clause to see who has left:
select * from nov_2012 where id not in(select id from mar_2013)
This returned 396 rows and I never thought anything of it, until I went to analyze who left. When I pulled all the ids for the lost members and put them in a temp table (##lost), 32 of them were actually still in the mar_2013 table. I can pull them up when I search for their ids using the following:
select * from mar_2013 where id in(select id from ##lost)
I can't figure out what is going on. I will mention that the id field I created is an IDENTITY column. Could that have any effect on the matching using NOT IN? Is there a better way to check for missing rows between tables? I have also tried:
select a.* from nov_2012 a left join mar_2013 b on b.id = a.id where b.id is NULL
And received the same results.
This is how I created the identity field;
create table id_lookup( dateofcusttable date ,sin int ,sex varchar(12) ,scid int identity(777000,1))
insert into id_lookup (sin, sex) select distinct sin, sex from [Client Raw].dbo.cust20130331 where sin <> 0 order by sin, sex
This is how I added the scid into the march table:
select scid, rowno as custrowno
into scid_20130331
from [Client Raw].dbo.cust20130331 cust
left join id_lookup scid
on scid.sin = cust.sin
and scid.sex = cust.sex
update scid_20130331
set scid = custrowno where scid is NULL --for members who don't have more than one id or sin information is not available
drop table Account_Part2_Current
select a.*, scid
into Account_Part2_Current
from Account_Part1_Current a
left join scid_20130331 b
on b.custrowno = a.rowno_custdmd_cust
I then group all the information by the scid
I would prefer this form (and here's why):
SELECT a.id --, other columns
FROM dbo.nov_2012 AS a
WHERE NOT EXISTS (SELECT 1 FROM dbo.mar_2013 WHERE id = a.id);
However this should still give the same results as what you've tried, so I suspect there is something about the data model that you're not telling us - for example, is mar_2013.id nullable?
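Why nullability matters can be seen in a small sketch, run here on SQLite through Python's sqlite3 module with made-up rows. It only shows one way NOT IN and NOT EXISTS can diverge; it is not a diagnosis of the data in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nov_2012 (id INTEGER)")
conn.execute("CREATE TABLE mar_2013 (id INTEGER)")  # nullable on purpose
conn.executemany("INSERT INTO nov_2012 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO mar_2013 VALUES (?)", [(1,), (None,)])

# NOT IN: comparing against a NULL yields "unknown", so no row qualifies.
not_in = conn.execute(
    "SELECT id FROM nov_2012 "
    "WHERE id NOT IN (SELECT id FROM mar_2013) ORDER BY id"
).fetchall()

# NOT EXISTS: the NULL row simply never matches, so 2 and 3 are returned.
not_exists = conn.execute(
    "SELECT a.id FROM nov_2012 AS a "
    "WHERE NOT EXISTS (SELECT 1 FROM mar_2013 WHERE id = a.id) ORDER BY a.id"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```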
This is logically equivalent to NOT IN and is faster than NOT IN (MINUS is Oracle syntax; SQL Server spells it EXCEPT):
where yourfield in
(select afield
from somewhere
minus
select
thesamefield
where you want to exclude the record
)
It probably isn't as fast as using where not exists, as per Aaron's answer so you should only use it if not exists does not provide the results you want.

How to update a table if values of the attributes are contained within another table?

I've got a database like this one:
I'm trying to create a query that would enable me to update the value of the status attribute inside the incident table whenever the values of all of these three attributes: tabor_vatrogasci, tabor_policija, and tabor_hitna are contained inside the izvještaj_tabora table as a value of the oznaka_tabora attribute. If, for example, the values of the tabor_vatrogasci, tabor_policija, and tabor_hitna attributes are 3, 4 and 5 respectively, the incident table should be updated if (and only if) 3, 4, and 5 are contained inside the izvještaj_tabora table.
This is what I tried, but it didn't work:
UPDATE incident SET status='Otvoren' FROM tabor,izvjestaj_tabora
WHERE (incident.tabor_policija=tabor.oznaka
OR incident.tabor_vatrogasci=tabor.oznaka
OR incident.tabor_hitna=tabor.oznaka)
AND izvjestaj_tabora.oznaka_tabora=tabor.oznaka
AND rezultat_izvjestaja='Riješen' AND
((SELECT EXISTS(SELECT DISTINCT oznaka_tabora FROM izvjestaj_tabora)
WHERE oznaka_tabora=incident.tabor_policija) OR tabor_policija=NULL) AND
((SELECT EXISTS(SELECT DISTINCT oznaka_tabora FROM izvjestaj_tabora)
WHERE oznaka_tabora=incident.tabor_vatrogasci) OR tabor_vatrogasci=NULL) AND
((SELECT EXISTS(SELECT DISTINCT oznaka_tabora FROM izvjestaj_tabora)
WHERE oznaka_tabora=incident.tabor_hitna) OR tabor_hitna=NULL);
Does anyone have any idea on how to accomplish this?
Assuming INCIDENT.OZNAKA is the key and you need all 3 to be related for the event to open (I am Slovenian, that's why I understand ;) )
UPDATE incident
SET status='Otvoren'
WHERE oznaka in (
SELECT DISTINCT i.oznaka
FROM incident i
INNER JOIN izvještaj_tabora t1 ON i.tabor_vatrogasci = t1.oznaka_tabora
INNER JOIN izvještaj_tabora t2 ON i.tabor_policija = t2.oznaka_tabora
INNER JOIN izvještaj_tabora t3 ON i.tabor_hitna = t3.oznaka_tabora
WHERE t1.rezultat_izvjestaja='Riješen' AND t2.rezultat_izvjestaja='Riješen' AND t3.rezultat_izvjestaja='Riješen'
)
According to your description the query should look something like this:
UPDATE incident i
SET status = 'Otvoren'
WHERE (tabor_policija IS NULL OR
EXISTS (
SELECT 1 FROM izvjestaj_tabora t
WHERE t.oznaka_tabora = i.tabor_policija
)
)
AND (tabor_vatrogasci IS NULL OR
EXISTS (
SELECT 1 FROM izvjestaj_tabora t
WHERE t.oznaka_tabora = i.tabor_vatrogasci
)
)
AND (tabor_hitna IS NULL OR
EXISTS (
SELECT 1 FROM izvjestaj_tabora t
WHERE t.oznaka_tabora = i.tabor_hitna
)
)
I wonder though, why the connecting table tabor is irrelevant to the operation.
Among other things you fell victim to two widespread misconceptions:
1)
tabor_policija=NULL
This expression always results in NULL. Since NULL is considered "unknown", if you compare it to anything, the outcome is "unknown" as well. I quote the manual on Comparison Operators:
Do not write expression = NULL because NULL is not "equal to" NULL.
(The null value represents an unknown value, and it is not known
whether two unknown values are equal.)
2)
EXISTS(SELECT DISTINCT oznaka_tabora FROM ...)
In an EXISTS semi-join SELECT items are completely irrelevant. (I use SELECT 1 instead). As the term implies, only existence is checked. The expression returns TRUE or FALSE, SELECT items are ignored. It is particularly pointless to add a DISTINCT clause there.
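Both points can be checked quickly. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (the NULL comparison semantics shown are the same in both), with a cut-down, hypothetical version of the incident table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incident (oznaka INTEGER, tabor_policija INTEGER)")
conn.executemany("INSERT INTO incident VALUES (?, ?)", [(1, None), (2, 4)])

# 1) "= NULL" evaluates to NULL for every row, so nothing is returned.
eq_null = conn.execute(
    "SELECT oznaka FROM incident WHERE tabor_policija = NULL"
).fetchall()

# IS NULL is the test that actually finds the row without a value.
is_null = conn.execute(
    "SELECT oznaka FROM incident WHERE tabor_policija IS NULL"
).fetchall()

# Even NULL = NULL is NULL (None in Python), not true.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# 2) EXISTS only cares whether a row comes back, not what is selected.
exists_distinct = conn.execute(
    "SELECT EXISTS (SELECT DISTINCT tabor_policija FROM incident)"
).fetchone()[0]
exists_one = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM incident)"
).fetchone()[0]

print(eq_null, is_null, null_eq_null)  # [] [(1,)] None
print(exists_distinct == exists_one)   # True
```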

Fastest check if row exists in PostgreSQL

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.
So it's not a primary key check, but that shouldn't matter too much. I would like to check only a single row, so count(*) probably isn't good; it's something like exists, I guess.
But since I'm fairly new to PostgreSQL I'd rather ask people who know.
My batch contains rows with following structure:
userid | rightid | remaining_count
So if table contains any rows with provided userid it means they all are present there.
Use the EXISTS key word for TRUE / FALSE return:
select exists(select 1 from contact where id=12)
How about simply:
select 1 from tbl where userid = 123 limit 1;
where 123 is the userid of the batch that you're about to insert.
The above query will return either an empty set or a single row, depending on whether there are records with the given userid.
If this turns out to be too slow, you could look into creating an index on tbl.userid.
if even a single row from batch exists in table, in that case I
don't have to insert my rows because I know for sure they all were
inserted.
For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).
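The existence probe and the single-transaction batch can be sketched together. This uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; the helper names, the index name, and the NOT NULL constraints are assumptions made for the demo, and the placeholder syntax depends on the driver you actually use.

```python
import sqlite3


def batch_exists(conn, userid):
    """Return True if at least one row for this userid is present."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM tbl WHERE userid = ?)", (userid,)
    ).fetchone()
    return bool(row[0])


def insert_batch(conn, rows):
    """Insert the whole batch in one transaction: all rows or none."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO tbl (userid, rightid, remaining_count) "
            "VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (userid INTEGER NOT NULL, rightid INTEGER NOT NULL, "
    "remaining_count INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX tbl_userid_idx ON tbl (userid)")

before = batch_exists(conn, 123)

try:
    # The second row violates NOT NULL, so the whole batch is rolled back.
    insert_batch(conn, [(123, 1, 5), (123, 2, None)])
except sqlite3.IntegrityError:
    pass
after_failed = batch_exists(conn, 123)

insert_batch(conn, [(123, 1, 5), (123, 2, 3)])
after_ok = batch_exists(conn, 123)

print(before, after_failed, after_ok)  # False False True
```

Because the failed batch leaves no rows behind, a single-row probe remains a reliable signal that the full batch is present.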
INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count
FROM batch
WHERE NOT EXISTS (
SELECT * FROM target t2, batch b2
WHERE t2.userid = b2.userid
-- ... other keyfields ...
)
;
BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)
INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count
FROM batch
;
will do exactly what you want: either it succeeds, or it fails.
If you are concerned about performance, maybe you can use "PERFORM" in a function, like this:
PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
IF FOUND THEN
RAISE NOTICE ' found record id=%', i;
ELSE
RAISE NOTICE ' not found record id=%', i;
END IF;
As @MikeM pointed out:
select exists(select 1 from contact where id=12)
With an index on contact.id, this can usually reduce the time cost to about 1 ms.
CREATE INDEX index_contact on contact(id);
SELECT 1 FROM user_right where userid = ? LIMIT 1
If your resultset contains a row then you do not have to insert. Otherwise insert your records.
select true from tablename where condition limit 1;
I believe that this is the query that postgres uses for checking foreign keys.
In your case, you could do this in one go too:
insert into yourtable select $userid, $rightid, $count where not exists (select 1 from yourtable where userid = $userid);
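One thing to watch with a guard like this: a scalar subquery that finds no row yields NULL, and NOT NULL is still NULL, so a guard written as WHERE NOT (SELECT ... LIMIT 1) skips the insert even when the table is empty. NOT EXISTS avoids that. The sketch below shows both behaviours on SQLite through Python's sqlite3 module, as a stand-in for PostgreSQL; the column list and the helper name insert_if_absent are assumptions.

```python
import sqlite3

GUARDED_INSERT = """
INSERT INTO yourtable (userid, rightid, remaining_count)
SELECT ?, ?, ?
WHERE NOT EXISTS (SELECT 1 FROM yourtable WHERE userid = ?)
"""


def insert_if_absent(conn, userid, rightid, count):
    """Insert the row only if no row for this userid exists yet."""
    with conn:
        conn.execute(GUARDED_INSERT, (userid, rightid, count, userid))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(userid INTEGER, rightid INTEGER, remaining_count INTEGER)"
)

# On an empty table the scalar-subquery guard is NULL, so no row passes it.
scalar_guard = conn.execute(
    "SELECT 1 WHERE NOT (SELECT 1 FROM yourtable WHERE userid = ? LIMIT 1)",
    (42,),
).fetchall()

insert_if_absent(conn, 42, 1, 10)  # inserted: no row for 42 yet
insert_if_absent(conn, 42, 2, 20)  # skipped: userid 42 already present

stored = conn.execute(
    "SELECT userid, rightid, remaining_count FROM yourtable"
).fetchall()
print(scalar_guard)  # []
print(stored)        # [(42, 1, 10)]
```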