How a SELECT query works in SQL

I have a SELECT query that reads data as it streams in. Suppose I run the query when the row count is 100, and while the data is being retrieved a few more rows are inserted, say 10 more. My question is: will the SELECT return 100 rows or 110?

This gets into isolation in RDBMS environments. For example, in SQL Server, if I run a query that selects all COMMITTED data from a table that has 100 rows at that moment, it will return 100 rows. If the table is currently being inserted into and the new rows are not yet committed, the query will still return 100 rows (assuming the table is not locked). The result set will not just magically get bigger; you have to issue the SELECT again each time you want fresh data.
Now, if I am selecting UNCOMMITTED data using something like the NOLOCK hint, each run of my SELECT will also return records that have not been committed yet. That means that while the table is receiving new records, each execution will include them in the returned data set. This is helpful for seeing the newest records as they come in, but it can lead to dirty reads if the inserting transaction fails or gets rolled back.
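For illustration, a minimal sketch of the two behaviors (the table name dbo.Orders is a placeholder, not from the question):

-- Default READ COMMITTED isolation: returns only rows committed at read time.
SELECT COUNT(*) FROM dbo.Orders;

-- WITH (NOLOCK) also reads uncommitted rows, so in-flight inserts are counted;
-- if the inserting transaction later rolls back, this was a dirty read.
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);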

Related

Updating a column in a very large SQL table

I have a task on my dev databases to update some customer PII. Some of these tables have many rows and are quite large (on the order of 51 million+ rows and 25 GB in size). Every row in the table needs to be updated, so right now I'm just running a simple UPDATE statement without a WHERE clause on a single column, and this query has, at the time of this post, been running for 35+ minutes. Is there a faster way to update large tables, or a better way to mask PII data?
The current query is just:
update mytable
set mycolumn = 'some text'
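Not part of the original post, but one common alternative is to update in fixed-size batches so each transaction stays small; a hedged sketch, assuming SQL Server (the batch size is arbitrary, table and column names are the ones from the question):

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    -- Update one batch at a time; the WHERE clause skips rows already masked,
    -- so the loop makes progress and stops when nothing is left to update.
    UPDATE TOP (50000) mytable
    SET mycolumn = 'some text'
    WHERE mycolumn <> 'some text' OR mycolumn IS NULL;
    SET @rows = @@ROWCOUNT;
END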

PostgreSQL concurrency in transactions

I want to understand how transactions work in SQL, specifically in PostgreSQL.
Imagine I have a very large table (first_table), the script below takes about 2 seconds, and I execute it via psql:
sudo -u postgres psql -f database/query.sql
This is the query:
TRUNCATE TABLE second_table;
INSERT INTO second_table (foo1, foo2)
SELECT foo1, foo2
FROM first_table;
What can happen if I execute another query that selects from second_table at the same time the previous query is executing? Note the TRUNCATE TABLE at the start of the previous query.
example:
SELECT * FROM second_table;
EDIT: I mean, would I get zero or non-zero records from the second query?
Under reasonable transaction isolation levels, the database does not allow dirty reads, meaning no transaction can see changes from other transactions that have not yet been committed. (In PostgreSQL, it is not even an option to turn that off, a very sensible choice in my book.)
That means that the second query will either see the contents of the table from before the TRUNCATE, or it will see the new records added after it. But it will not see something in between: it will not get an empty table (assuming there were records in the table before the TRUNCATE), and it will not see an incomplete half of the new records (or even a weird mix of old and new).
If the second query returns before the first query has committed, then it will have seen the state of the table before any changes from the first query were applied.
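A sketch of the two-session timeline. Note this reasoning assumes the TRUNCATE and INSERT run in a single transaction, e.g. by wrapping them in BEGIN/COMMIT as below or by running psql with its --single-transaction flag:

-- Session 1:
BEGIN;
TRUNCATE TABLE second_table;
INSERT INTO second_table (foo1, foo2)
SELECT foo1, foo2 FROM first_table;
COMMIT;

-- Session 2, run while session 1 is in flight:
SELECT COUNT(*) FROM second_table;
-- This returns either the pre-TRUNCATE contents, or, if it has to wait on the
-- exclusive lock TRUNCATE takes, the fully committed new rows after session 1
-- commits; never a partial state.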

In a SQL table with many rows, how can I quickly determine if a query might return more than 1000 rows

NOTE: This is a re-posting of a question from a Stack Overflow Teams site to attract a wider audience
I have a transaction log table with many millions of records. Many of the data items linked to these logs can have more than 100K log rows each.
I have a requirement to display a warning if a user tries to delete an item when more than 1000 rows for it exist in the log table.
We have determined that 1000 logs means the item is in use.
If I simply query the table to look up the total number of log rows, the query takes too long to execute:
SELECT COUNT(1)
FROM History
WHERE SensorID IN (SELECT Id FROM Sensor WHERE DeviceId = 96)
Is there a faster way to determine if the entity has more than 1000 log records?
NOTE: the History table has an index on the SensorId column.
You are right to use COUNT instead of returning all the rows and checking the record count, but this still asks the database engine to read every matching row.
If the requirement is not to return the exact total, but just to determine whether there are more than X rows, then the first improvement I would make is to count only the first X rows from the table.
So if X is 1000, your application logic does not need to change; you will still be able to tell the difference between an item with 999 logs and one with 1000+.
We simply change the existing query to select the TOP(X) rows instead of the count, and then return the count of that result set. Select only the primary key or a uniquely indexed column, so that we inspect just the index and not the underlying table store.
SELECT COUNT(Id)
FROM (
    SELECT TOP (1000)   -- limit the seek that the DB engine does
           Id           -- further constrain the read to just the indexed column
    FROM History
    WHERE SensorId IN (  -- the same filter condition as before, just re-formatted
        SELECT Id
        FROM Sensor
        WHERE DeviceId = 96)
) AS trunk
Changing this query to TOP 10,000 still gives a sub-second response; with X = 100,000, however, the query took almost as long as the original.
There is another, seemingly 'silver bullet' approach to this type of issue if the table in question has a high transaction rate and the main reason for the long execution time is waiting caused by lock contention.
If you suspect that locks are the issue, and you can accept a count that includes uncommitted rows, then you can use the WITH (NOLOCK) table hint to let the query run effectively at the READ UNCOMMITTED transaction isolation level.
There is a good discussion about the effect of the NOLOCK table hint on select queries here
SELECT COUNT(1) FROM History WITH (NOLOCK)
WHERE SensorId IN (SELECT Id FROM Sensor WHERE DeviceId = 96)
Although NOLOCK is strongly discouraged in general, this is a good example of a scenario where it can easily be permitted; it even makes sense, as your count-before-delete will then take into account another user or operation that is actively adding to the log count.
After many trials, when querying for 1000 or 10K rows, the TOP-with-count solution is still faster than using the NOLOCK table hint. NOLOCK, however, offers a way to execute the same query with minimal change while still returning in a timely manner.
The execution time of a SELECT with NOLOCK will still grow as the number of rows in the underlying result set increases, whereas the performance of the TOP query (with no ORDER BY clause) should remain constant once the limit has been exceeded.
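The two ideas can also be combined; a hedged sketch (not from the original answer), keeping the schema from the question:

SELECT COUNT(Id)
FROM (
    SELECT TOP (1000) Id          -- cap the work at the warning threshold
    FROM History WITH (NOLOCK)    -- and count uncommitted rows too
    WHERE SensorId IN (SELECT Id FROM Sensor WHERE DeviceId = 96)
) AS capped;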

Different result size between SELECT * and SELECT COUNT(*) on Oracle

I have a strange behavior on an Oracle database. We ran a huge insert of around 3.1 million records. Everything was fine so far.
Shortly after the insert finished (around 1 to 10 minutes later), I executed two statements:
SELECT COUNT(*) FROM TABLE
SELECT * FROM TABLE
The result of the first statement is fine: it gives me the exact number of rows that were inserted.
The result of the second statement is the problem. Depending on the timing, the number of rows returned is, for example, around 500K lower than the result of the first statement. The difference between the two results decreases with time,
so I have to wait 15 to 30 minutes before both statements return the same number of rows.
I have already talked with the Oracle DBA about this issue, but he has no idea how it could happen.
Any ideas, questions or suggestions?
Update
When I select only an indexed column, I get the correct row count.
When I instead select a non-indexed column, I again get the wrong row count.
That doesn't sound like a bug to me; if I understood you correctly, it just takes time for Oracle to fetch the entire table. After all, 3 million rows is not a small amount.
A COUNT, by contrast, returns a single record with the total number of rows.
If, after some waiting, the number of records output equals the number that the COUNT query returns, then everything is fine.
Have you already verified these things?
1. Count a single column instead of * to compare both results.
2. Verify both queries' results by adding a WHERE clause and then gradually selecting more rows by removing conditions, so you can pin down where the two start returning different values.
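On point 1, note that the two forms are not quite equivalent; a small sketch (table and column names are placeholders):

SELECT COUNT(*)  FROM my_table;   -- counts every row
SELECT COUNT(id) FROM my_table;   -- counts only rows where id is NOT NULL

So for a fair comparison, count an indexed column that is declared NOT NULL.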
I think you should check the execution plan to identify missing indexes and improve performance.
Add the missing indexes and check the result.
Why missing indexes are important:
To count rows, the Oracle engine does not need to go through the paging operation, but fetching all the details from a table does require it.
And the paging process depends on the indexes created on the table to fetch the data effectively and quickly.
So, to decrease the time for your second statement, you should find the missing indexes and create them.
How to find missing indexes:
You can start with DBA_HIST_ACTIVE_SESS_HISTORY and look at all statements that contain that type of hint.
From there, you can pull the index name coming from the hint and then do a lookup on dba_indexes to see if the index exists, is valid, etc.
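The dba_indexes lookup mentioned above might look like this (the table name is a placeholder):

SELECT index_name, table_name, status
FROM   dba_indexes
WHERE  table_name = 'MY_TABLE';
-- STATUS shows e.g. VALID or UNUSABLE, confirming the index exists and is usable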

How to check the progress of long-running insertions in Oracle

I am trying to insert 1 million records after performing some calculation on each row in Oracle. 15 hours have gone by and it is still working. When I write a SELECT query on this table it shows nothing, and I don't know where my inserted data goes on each insert.
So my question is: is there any way to check how many rows have been inserted so far while a long-running insertion into an Oracle table is in progress? Thanks.
It depends on whether you are doing the insertion in SQL or PL/SQL. When using PL/SQL you have your own ways to track the number of rows already processed; you can, of course, instrument your own program.
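For example (a sketch only, not from the original answer; the loop bound and reporting interval are assumptions), a PL/SQL loop can publish its own progress through DBMS_APPLICATION_INFO so it shows up in V$SESSION_LONGOPS:

DECLARE
  l_rindex BINARY_INTEGER := DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS_NOHINT;
  l_slno   BINARY_INTEGER;
  l_total  PLS_INTEGER := 1000000;   -- total rows to insert (assumed)
BEGIN
  FOR i IN 1 .. l_total LOOP
    -- ... per-row calculation and INSERT go here ...
    IF MOD(i, 10000) = 0 THEN        -- publish progress every 10K rows
      DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS(
        rindex    => l_rindex,
        slno      => l_slno,
        op_name   => 'bulk insert',
        sofar     => i,
        totalwork => l_total,
        units     => 'rows');
    END IF;
  END LOOP;
  COMMIT;
END;
/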
Coming to SQL, I can think of two ways :
V$SESSION_LONGOPS
V$TRANSACTION
Most of the GUI-based tools have a nice graphical representation of the long operations view. You can query:
SELECT ROUND(SOFAR * 100 / TOTALWORK) AS pct_done  -- percent complete
FROM V$SESSION_LONGOPS
WHERE USERNAME = '<username>'
AND TIME_REMAINING > 0
The V$TRANSACTION view can tell you whether any transaction is still pending: if your INSERT has completed and a COMMIT has been issued, the transaction will no longer appear there. You can join it with V$SESSION. You can query:
SELECT ...
FROM v$transaction t
INNER JOIN v$session s
        ON t.addr = s.taddr;
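One possible column list for that join, as a sketch (USED_UREC is the number of undo records the transaction has generated, which grows as rows are inserted, so it can serve as a rough progress indicator):

SELECT s.sid,
       s.username,
       t.start_time,
       t.used_urec    -- undo records so far: a rough progress indicator
FROM v$transaction t
INNER JOIN v$session s
        ON t.addr = s.taddr;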