I wrote a SQL query to update records in a table. The table has about 15 million records. The update statement looks like this:
UPDATE temp_conafe
set apoyo = trim(apoyo)
where cve_status like '%APOYO%';
I keep checking v$transaction.used_urec to see whether the query is rolling forward or rolling back, but when the number of undo records reaches more than 15 million the query starts rolling back.
How do I get the update to complete successfully?
I'm not the DBA, just a programmer, but I can't keep developing until that thing finishes updating my records.
It looks as if your transaction is too big. Try adding another limiting clause to the WHERE. If you have an id field you can add something like this:
where cve_status like '%APOYO%'
AND id > 1 AND id < 100000
You need to run it multiple times and change the range accordingly. If this is not an option, you have to talk to your DBA and ask them to give you more resources (undo space, in this case).
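If you can script it, the batching can also be done in one anonymous PL/SQL block. This is only a rough sketch, assuming an indexed numeric id column (the column name is hypothetical) and that committing in chunks is acceptable for your data:
DECLARE
  v_min   NUMBER;
  v_max   NUMBER;
  c_chunk CONSTANT NUMBER := 100000;
BEGIN
  SELECT MIN(id), MAX(id) INTO v_min, v_max FROM temp_conafe;
  FOR i IN 0 .. CEIL((v_max - v_min + 1) / c_chunk) - 1 LOOP
    UPDATE temp_conafe
       SET apoyo = TRIM(apoyo)
     WHERE cve_status LIKE '%APOYO%'
       AND id BETWEEN v_min + i * c_chunk
                  AND v_min + (i + 1) * c_chunk - 1;
    COMMIT;  -- keep each transaction, and its undo usage, small
  END LOOP;
END;
/
Each chunk commits on its own, so a failure only rolls back the current chunk instead of the whole 15 million rows.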
I have a single database table on a relational database. Data will be loaded into it. I then want to have multiple servers processing that data concurrently (I don't want to have only one server running at a time). E.g. each server will:
Query for a fixed number of rows
Do some work for each row retrieved
Update each row to show it has been processed
How do I ensure that each row is only processed once? Note I don't want to pre-assign a row of data to a server; I'm designing for high availability, so the solution should keep running if one or more servers go down.
The solution I've gone for so far is as follows:
The table has three columns: LOCKED_BY (VARCHAR), LOCKED_AT (TIMESTAMP) and PROCESSED (CHAR)
Each server starts by attempting to "pseudo-lock" some rows by doing:
UPDATE THE_TABLE
SET LOCKED_BY= $servername,
LOCKED_AT = CURRENT_TIMESTAMP
WHERE (LOCKED_BY IS NULL OR (CURRENT_TIMESTAMP - LOCKED_AT > $timeout))
AND PROCESSED = 'N'
i.e. try to "pseudo-lock" rows that aren't locked already or where the pseudo-lock has expired. Only do this for unprocessed rows.
More than one server may have attempted this at the same time. The current server needs to query to find out if it was successful in the "pseudo-lock":
SELECT * FROM THE_TABLE
WHERE LOCKED_BY = $server_name
AND PROCESSED = 'N'
If any rows are returned the server can process them.
Once the processing has been done the row is updated
UPDATE THE_TABLE SET PROCESSED = 'Y' WHERE PRIMARYKEYCOL = $pk
Note: the update statement should ideally limit the number of rows updated.
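For example, on SQL Server the row limit can be put directly into the claiming UPDATE. This is only a sketch, reusing the same $ placeholders as above; other databases need different row-limiting syntax (e.g. LIMIT on MySQL or ROWNUM on Oracle):
-- Claim at most 10 unprocessed rows per round ($servername and $timeout_seconds are placeholders)
UPDATE TOP (10) THE_TABLE
SET LOCKED_BY = $servername,
    LOCKED_AT = CURRENT_TIMESTAMP
WHERE (LOCKED_BY IS NULL
       OR DATEDIFF(SECOND, LOCKED_AT, CURRENT_TIMESTAMP) > $timeout_seconds)
  AND PROCESSED = 'N';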
If you are open to changing platform then I would suggest moving to a modern, cloud-based solution like Snowflake. This will do what you want but in the background and by default - so you don't need to know what it's doing or how it's doing it (unless you want to).
This may come across as patronising, which is not my intention, but what you are attempting (in the way you are attempting it) is very complex; if you don't already know how to do it, someone telling you how to do it is not going to give you the skills and experience you need to implement it successfully.
Thanks in advance for putting up with me.
Pulling a 33,000-record recordset from the database took LESS execution time than using Count() in the SQL and just grabbing 20 rows.
How is that possible?
A bit more detail:
Before, we were grabbing the entire recordset yet only displaying 20 rows of it on a page at a time for pagination. That was cringeworthy and wasteful, so I redesigned the page to only grab 20 rows at a time and to simply use an index variable to grab the next page, and so on.
All well and good, but that lacked a record count, which our people needed.
So after the record query, I added (what I thought would be) a quick query just on the index of the table using the Count(index) function in Structured Query Language.
A side by side comparison of the original page and my new page indicates my new page takes roughly 10% longer to execute than the original! I was flabbergasted. I thought for sure it would be lightning fast, way faster than the original.
Any thoughts on why and what I might do to remedy that?
Is it because the script has to run two queries, regardless of the data retrieved?
Update:
Here is the SQL.
(Table names and field names are fictionalized in this post for security, but the structure is the same as the real page).
The main recordset select query contains:
SELECT
top 21 roster_id, roster_pplid, roster_pplemailid, roster_emailid, roster_firstname,
roster_lastname, roster_since, roster_pplsubscrid, roster_firstppldone, roster_pmtcurrent,
roster_emailverified, roster_active, roster_selfcanceled, roster_deactreason
FROM roster
WHERE
roster_siteid = 22
AND roster_isdeleted = false
order by roster_id desc
The record count query contains:
SELECT
COUNT(roster_id)
FROM
roster
WHERE
roster_siteid = 22
AND roster_isdeleted = false
The first query runs, then the second. The second always dynamically has the same matching WHERE filter.
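If the backend supports window functions (SQL Server 2005 or later, for example), the count can be folded into the first query so only one query runs per page. A rough sketch, not tested against the real schema; on SQL Server the boolean comparison is written against 0 rather than false:
SELECT TOP 21
    roster_id, roster_pplid, roster_pplemailid, roster_emailid, roster_firstname,
    roster_lastname, roster_since, roster_pplsubscrid, roster_firstppldone, roster_pmtcurrent,
    roster_emailverified, roster_active, roster_selfcanceled, roster_deactreason,
    COUNT(*) OVER () AS total_matches   -- total rows matching the WHERE filter
FROM roster
WHERE roster_siteid = 22
  AND roster_isdeleted = 0
ORDER BY roster_id DESC;
Every returned row carries the same total_matches value, so the page and the record count come back in a single round trip.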
I think I know why it is slower: I'm using GetRows to grab the recordset in the new page, which I was not using in the old page. That seems to be the slowdown. But I have to use it; otherwise I can't stop at the 21st record.
Nick.McDermaid: The SQL shown is selecting the TOP 21 rows; that is how it is grabbing just 20 rows (number 21 is just there to populate the index for the "Next" page link).
I'm coding an application that deals with files, so I have a table that contains information about all the files registered in the application.
My files table looks like this: ID, Path and LastScanTime.
The algorithm that I use in my application is simple:
Take the oldest row (LastScanTime is the oldest)
Extract the file path
Do some magic on this file (takes exactly 5 minutes)
Update the LastScanTime to the current time (now)
Go to step "1"
So far the task is pretty simple. For this, I'm going to use this SQL statement to get the oldest item:
SELECT TOP 1 * FROM files ORDER BY [LastScanTime] ASC
and at the end of the item processing (to prevent the item from being selected again immediately):
UPDATE Files SET [LastScanTime]=GETDATE() WHERE Id=#ItemID
Now I'm going to add some complexity to the algorithm:
Take the 3 oldest rows (by LastScanTime)
For each row, do:
A. Extract the file path
B. Do some magic on this file (takes exactly 5 minutes)
C. Update the LastScanTime to the current time (now)
D. Go to step "1"
The problem I'm facing now is that the whole thing will be processed in parallel (no more serial processing), so changing my SQL statement to the following is not enough:
SELECT TOP 3 * FROM files ORDER BY [LastScanTime] ASC
Why isn't this SQL statement enough?
Let's say I run my code and start executing the first 3 items. A minute later I want to execute another 3 items. This SQL statement will retrieve exactly the same "oldest" items that I already started to process.
Possible solution
Implementing a combined SELECT & UPDATE that gets the 3 oldest items and immediately updates their last scan time. Since there is no SELECT & UPDATE in the same statement, what happens if another SELECT comes in between the execution of the first SELECT and its UPDATE? Both statements will get the same results. This is a problem... Another problem is that we mark the item as "scanned recently" before the scan has really finished. What happens if the scan is terminated by an error?
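As a rough illustration of what such a combined claim could look like on SQL Server (ScanStartedAt is a hypothetical nullable column added just for the claim; the locking hints are what keep two workers from grabbing the same rows):
-- Atomically claim the 3 oldest unclaimed files and return them to the caller.
WITH oldest AS (
    SELECT TOP 3 Id, [Path], LastScanTime, ScanStartedAt
    FROM Files WITH (UPDLOCK, READPAST, ROWLOCK)   -- skip rows already claimed by another worker
    WHERE ScanStartedAt IS NULL
    ORDER BY LastScanTime ASC
)
UPDATE oldest
SET ScanStartedAt = GETDATE()
OUTPUT inserted.Id, inserted.[Path];

-- When the scan finishes, stamp the real scan time and release the claim:
UPDATE Files SET LastScanTime = GETDATE(), ScanStartedAt = NULL WHERE Id = #ItemID;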
I'm looking for tips and tricks to solve this problem. The solutions can add columns as needed.
I'll appreciate your help.
Well, I usually have a habit of having two different fields in the database: one is AddedDate and the other is ModifiedDate.
So the algorithm, in your terms, will be:
Take the oldest row (AddedDate is the oldest)
Extract the file path
Do some process on this file
Update the ModifiedDate to the current time (now)
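A minimal sketch of that idea in the same SQL Server style as the question (AddedDate/ModifiedDate as above, #ItemID as a placeholder):
-- Pick the oldest row by when it was added:
SELECT TOP 1 * FROM Files ORDER BY [AddedDate] ASC;

-- ...do the processing...

-- Record when it was processed:
UPDATE Files SET [ModifiedDate] = GETDATE() WHERE Id = #ItemID;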
It seems that you are about to reinvent an event queue in SQL. Standard approaches like RabbitMQ or ActiveMQ may solve your problem.
I am writing a query that basically updates the EstimatedDepartureDate date field to a specified date for a set of records. I am using a view I created to pull a list of accounts to update and running this against my tnpEmployee table, which has all employee records. When running this query, however, it always produces output similar to:
(1 row(s) affected)
(55 row(s) affected)
As in, it affects one row first and then the rest. And when I run the SQL in the view I created to see the list of persons of interest, it shows one record fewer with every run of the query, and the number at the bottom drops by one as well, so the next run would show (54 row(s) affected).
Here is the query I'm running:
UPDATE tnpEmployee
SET tnpEmployee.EstimatedDepartureDate='2014-04-01 13:37:43.000'
FROM tnpEmployee
INNER JOIN vnpGetActiveAccountsAgainstToBeDisabled
ON tnpEmployee.EmployeeID=vnpGetActiveAccountsAgainstToBeDisabled.EmployeeID
WHERE tnpEmployee.Email=vnpGetActiveAccountsAgainstToBeDisabled.Email
Any help would be greatly appreciated; it's killing me! All that needs to happen is that the rows get that date field set, and that's it; they should all still show up in the very basic view I created, not one fewer on every run. Also, when I run that view and get one record fewer, ALL the records have been updated with the new date...
You could try to rewrite your SQL statement:
UPDATE tnpEmployee
SET tnpEmployee.EstimatedDepartureDate='2014-04-01 13:37:43.000'
WHERE tnpEmployee.EmployeeID IN (
SELECT vnpGetActiveAccountsAgainstToBeDisabled.EmployeeID
FROM vnpGetActiveAccountsAgainstToBeDisabled
WHERE tnpEmployee.Email=vnpGetActiveAccountsAgainstToBeDisabled.Email
);
As the experts already suggested, your result looks like your query invokes a trigger.
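If you want to confirm that, a quick way to list any triggers defined on the table (a sketch; adjust the table name if yours differs):
-- List triggers attached to tnpEmployee (SQL Server)
SELECT name, is_disabled
FROM sys.triggers
WHERE parent_id = OBJECT_ID('tnpEmployee');
A trigger that writes to an audit table, for example, would account for the extra "(1 row(s) affected)" line in the output.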
Hello, I need a SQL query that gets me rows 'start' to 'finish'.
For example:
A website with many items where page 1 selects only items 1-10, page 2 has 11-20 and so on.
I know how to do this with Microsoft SQL Server and MySQL but I need an implementation that is platform independent. :/
I have an auto-increment column for IDs, but deleting rows in between will mess up the result when I select via
WHERE ID > number AND ID < othernumber
of course
Is this possible without fetching the whole database to a ResultSet?
I think your safest bet would be to use the BETWEEN operator. I believe it works across Oracle/MySQL/MSSQL.
WHERE ID BETWEEN number AND othernumber
Concerning your comment "I was just thinking of the case when the first 100 IDs are gone; I'll have to check further until there is something to fetch": you might want to consider NOT actually deleting stuff from your database, but instead adding a flag like "active" to your tables, so you can avoid situations like the one you're now trying to avoid. The alternative is where you are now: having to find the max and min rows in a filter.
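A minimal sketch of that suggestion (the items table and the active column are hypothetical names):
-- Instead of deleting a row, flag it as inactive:
UPDATE items SET active = 0 WHERE ID = 42;

-- Pages are then selected by ID range, excluding inactive rows:
SELECT *
FROM items
WHERE active = 1
  AND ID BETWEEN 11 AND 20;   -- "page 2" under the ID-range scheme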