Fastest check if row exists in PostgreSQL - sql

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.
So its not a primary key check, but shouldn't matter too much. I would like to only check single row so count(*) probably isn't good, so its something like exists I guess.
But since I'm fairly new to PostgreSQL I'd rather ask people who know.
My batch contains rows with following structure:
userid | rightid | remaining_count
So if table contains any rows with provided userid it means they all are present there.

Use the EXISTS key word for TRUE / FALSE return:
select exists(select 1 from contact where id=12)

How about simply:
select 1 from tbl where userid = 123 limit 1;
where 123 is the userid of the batch that you're about to insert.
The above query will return either an empty set or a single row, depending on whether there are records with the given userid.
If this turns out to be too slow, you could look into creating an index on tbl.userid.
if even a single row from batch exists in table, in that case I
don't have to insert my rows because I know for sure they all were
inserted.
For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).

INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count
FROM batch
WHERE NOT EXISTS (
SELECT * FROM target t2, batch b2
WHERE t2.userid = b2.userid
-- ... other keyfields ...
)
;
BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)
INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count
FROM batch
;
will do exactly what you want: either it succeeds, or it fails.

If you think about the performace ,may be you can use "PERFORM" in a function just like this:
PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
IF FOUND THEN
RAISE NOTICE ' found record id=%', i;
ELSE
RAISE NOTICE ' not found record id=%', i;
END IF;

as #MikeM pointed out.
select exists(select 1 from contact where id=12)
with index on contact, it can usually reduce time cost to 1 ms.
CREATE INDEX index_contact on contact(id);

SELECT 1 FROM user_right where userid = ? LIMIT 1
If your resultset contains a row then you do not have to insert. Otherwise insert your records.

select true from tablename where condition limit 1;
I believe that this is the query that postgres uses for checking foreign keys.
In your case, you could do this in one go too:
insert into yourtable select $userid, $rightid, $count where not (select true from yourtable where userid = $userid limit 1);

Related

T-SQL Stored Procedure: Performance of select count(*) vs. select count([uniqueId])

So, I'm looking at a stored procedure here, which has more than one line like the following pseudocode:
if(select count(*) > 0)
...
on tables having a unique id (or identifier, for making it more general).
Now, in terms of performance, is it more performant to change this clause
to
if(select count([uniqueId]) > 0)
...
where uniqueId is, e.g., an Idx containing double values?
An example:
Consider a table like Idx (double) | Name (String) | Address (String)
Now the 'Idx' is a foreign key which I want to join in a stored procedure.
So, in terms of performance: what is better here?
if(select count(*) > 0)
...
or
if(select count(Idx) > 0)
...
? Or does the SQL Engine Change select count(*) to select count(Idx) internally, so we do not have to bother about this? Because at first sight, I'd say that select count(Idx) would be more performant.
The two are slightly different. count(*) counts rows. count([uniqueid]) counts the number of non-NULL values for uniqueid. Because a unique constraint allows a NULL value, SQL Server actually needs to read the column. This could add microseconds of time to a query, particularly if the page with the id is not already in memory. This also gives SQL Server more opportunities to optimize count(*).
As #lad2025 writes in a comment, the performant solution is to use if (exists . . ..
SELECT t1.*
FROM Table1 t1
JOIN Table2 t2 ON t2.idx = t1.idx
will give you only the rows in t1 that match an idx value in Table2. I'm not sure there is a good reason to do an if(select count...).
If you are really interested in the performance of something like this, just create a temp table with a million rows and give it a go:
CREATE TABLE #TempTable (id int identity, txt varchar(50))
GO
INSERT #TempTable (txt) VALUES (##IDENTITY)
GO 1000000

Efficiently checking if more than one row exists in a recordset

I'm using SQL Server 2008 R2, and I'm trying to find an efficient way to test if more than 1 row exists in a table matching a condition.
The naive way to do it is a COUNT:
IF ( SELECT COUNT(*)
FROM Table
WHERE Column = <something>
) > 1 BEGIN
...
END
But this requires actually computing a COUNT, which is wasteful. I just want to test for more than 1.
The only thing I've come up with is a COUNT on a TOP 2:
IF ( SELECT COUNT(*)
FROM ( SELECT TOP 2 0 x
FROM Table
WHERE Column = <something>
) x
) > 1 BEGIN
...
END
This is clunky and requires commenting to document. Is there a more terse way?
If you have a PK in the table that you're checking for >1 row, you could nest another EXISTS clause. Not sure if this is faster, but it achieves your record result. For example, assuming a Station table with a PK named ID that can have zero-to-many Location table records with a PK named ID, Location has FK StationID, and you want to find the Stations with at least two Locations:
SELECT s.ID
FROM Station s
WHERE EXISTS (
SELECT 1
FROM Location L
WHERE L.StationID = s.ID
AND EXISTS (
SELECT 1
FROM Location L2
WHERE L2.StationID = L.StationID
AND L2.ID <> L.ID
)
)
I have the following solution to share, which may be lighter performance-wise.
I suppose that you are trying to fetch the first record and process it once you make sure that your SQL selection returns a single record. Therefore, go on and fetch it, but once you do that, try to fetch the next record right away, and if successful, you know that more than one record exists, and you can start your exception processing logic. Otherwise, you can still process your single record.

T-SQL cursor and update

I use a cursor to iterate through quite a big table. For each row I check if value from one column exists in other.
If the value exists, I would like to increase value column in that other table.
If not, I would like to insert there new row with value set to 1.
I check "if exists" by:
IF (SELECT COUNT(*) FROM otherTabe WHERE... > 1)
BEGIN
...
END
ELSE
BEGIN
...
END
I don't know how to get that row which was found and update value. I don't want to make another select.
How can I do this efficiently?
I assume that the method of checking described above isn't good for this case.
Depending on the size of your data and the actual condition, you have two basic approaches:
1) use MERGE
MERGE TOP (...) INTO table1
USING table2 ON table1.column = table2.column
WHEN MATCHED
THEN UPDATE SET table1.counter += 1
WHEN NOT MATCHED SOURCE
THEN INSERT (...) VALUES (...);
the TOP is needed because when you're doing a huge update like this (you mention the table is 'big', big is relative, but lets assume truly big, +100MM rows) you have to batch the updates, otherwise you'll overwhelm the transaction log with one single gigantic transaction.
2) use a cursor, as you are trying. Your original question can be easily solved, simply always update and then check the count of rows updated:
UPDATE table
SET column += 1
WHERE ...;
IF ##ROW_COUNT = 0
BEGIN
-- no match, insert new value
INSERT INTO (...) VALUES (...);
END
Note that this approach is dangerous though because of race conditions: there is nothing to prevent another thread from inserting the value concurrently, so you may end up with either duplicates or a constraint violation error (preferably the latter...).
This is just psuedo code because I have no idea of your table structure but I think you will understand... basically Update the columns you want then Insert the columns you need. A Cursor operation sounds unnecessary.
Update OtherTable
Set ColumnToIncrease = ColumnToIncrease + 1
FROM CurrentTable Where ColumnToCheckValue is not null
Insert Into OtherTable (ColumnToIncrease, Field1, Field2,...)
SELECT
1,
?
?
FROM CurrentTable Where ColumnToCheckValue is not null
Without a sample, I think this is the best I can do. Bottom line: you don't need a cursor. UPDATE where a match exists (INNER JOIN) and INSERT where one does not.
UPDATE otherTable
SET IncrementingColumn = IncrementingColumn + 1
FROM thisTable INNER JOIN otherTable ON thisTable.ID = otherTable.ID
INSERT INTO otherTable
(
ID
, IncrementingColumn
)
SELECT ID, 1
FROM thisTable
WHERE NOT EXISTS (SELECT *
FROM otherTable
WHERE thisTable.ID = otherTable.ID)
I think you'd be better off using a view for this -- then it's always up to date, no risk of mistakenly double/triple/etc counting:
CREATE VIEW vw_value_count AS
SELECT st.value,
COUNT(*) AS numValue
FROM SOME_TABLE st
GROUP BY st.value
But if you still want to use the INSERT/UPDATE approach:
IF EXISTS(SELECT NULL
FROM SOMETABLE WHERE ... > 1)
BEGIN
UPDATE TABLE
SET count = count + 1
WHERE value = #value
END
ELSE
BEGIN
INSERT INTO TABLE
(value, count)
VALUES
(#value, 1)
END
What about Update statement with inner join to perform +1, and Insert selected rows that do not exist in the first table.
Provide the tables schema and the columns you want to check and update so I can help.
Regards.

I want to leave always one record if table record count = 1 with SQL

I can delete records with this SQL clause,
DELETE FROM TABLE WHERE ID = 2
I need to always leave one record if table count = 1 even if "ID=2". How can I do this?
Add a WHERE clause to ensure there's more than one row:
DELETE FROM TABLE
WHERE ID = 2
AND (SELECT COUNT(*) FROM TABLE) > 1
Untested, but something in the lines of this might work?
DELETE FROM TABLE WHERE ID = 2 LIMIT (SELECT COUNT(*)-1 FROM TABLE WHERE ID=2);
Maybe add in an if-statement to ensure count is above 1.
Simple way is to disallow any delete that empties the table
CREATE TRIGGER TRG_MyTable_D FOR DELETE
AS
IF NOT EXISTS (SELECT * FROM MyTable)
ROLLBACK TRAN
GO
More complex, what if you do this multirow delete that empties the table?
DELETE FROM TABLE WHERE ID BETWEEN 2 AND 5
so, randomly repopulate from what you just deleted
CREATE TRIGGER TRG_MyTable_D FOR DELETE
AS
IF NOT EXISTS (SELECT * FROM MyTable)
INSERT mytable (col2, col2, ..., coln)
SELECT TOP 1 col2, col2, ..., coln FROM INSERTED --ORDER BY ??
GO
However, the requirement is a bit dangerous and vague. In English, OK, "always have at least one row in the table", but in practice "which row?"

Check whether a table contains rows or not sql server 2005

How to Check whether a table contains rows or not sql server 2005?
For what purpose?
Quickest for an IF would be IF EXISTS (SELECT * FROM Table)...
For a result set, SELECT TOP 1 1 FROM Table returns either zero or one rows
For exactly one row with a count (0 or non-zero), SELECT COUNT(*) FROM Table
Also, you can use exists
select case when exists (select 1 from table)
then 'contains rows'
else 'doesnt contain rows'
end
or to check if there are child rows for a particular record :
select * from Table t1
where exists(
select 1 from ChildTable t2
where t1.id = t2.parentid)
or in a procedure
if exists(select 1 from table)
begin
-- do stuff
end
Like Other said you can use something like that:
IF NOT EXISTS (SELECT 1 FROM Table)
BEGIN
--Do Something
END
ELSE
BEGIN
--Do Another Thing
END
FOR the best performance, use specific column name instead of * - for example:
SELECT TOP 1 <columnName>
FROM <tableName>
This is optimal because, instead of returning the whole list of columns, it is returning just one. That can save some time.
Also, returning just first row if there are any values, makes it even faster. Actually you got just one value as the result - if there are any rows, or no value if there is no rows.
If you use the table in distributed manner, which is most probably the case, than transporting just one value from the server to the client is much faster.
You also should choose wisely among all the columns to get data from a column which can take as less resource as possible.
Can't you just count the rows using select count(*) from table (or an indexed column instead of * if speed is important)?
If not then maybe this article can point you in the right direction.
Fast:
SELECT TOP (1) CASE
WHEN **NOT_NULL_COLUMN** IS NULL
THEN 'empty table'
ELSE 'not empty table'
END AS info
FROM **TABLE_NAME**