I am writing a query that sets the EstimatedDepartureDate field to a specified date. I am using a view I created to pull the list of accounts to update, and running this against my tnpEmployee table, which holds all employee records. When running this query, however, the output is always similar to:
(1 row(s) affected)
(55 row(s) affected)
As in, it's affecting one row first and then the rest. And when I go to run the SQL in that view I created to see the list of persons of interest, it shows one record fewer with every run of the query, and the count in that bottom line drops by one as well, so the next run would show (54 row(s) affected).
Here is the query I'm running:
UPDATE tnpEmployee
SET tnpEmployee.EstimatedDepartureDate='2014-04-01 13:37:43.000'
FROM tnpEmployee
INNER JOIN vnpGetActiveAccountsAgainstToBeDisabled
ON tnpEmployee.EmployeeID=vnpGetActiveAccountsAgainstToBeDisabled.EmployeeID
WHERE tnpEmployee.Email=vnpGetActiveAccountsAgainstToBeDisabled.Email
Any help would be greatly appreciated; it's killing me! All that needs to happen is that the rows get that date field set, and that's it: they should all keep showing up in the very basic view I created, not one fewer on every run. Also, when I run that view and get one record fewer, ALL the records have been updated with the new date.
You could try rewriting your SQL statement:
UPDATE tnpEmployee
SET tnpEmployee.EstimatedDepartureDate='2014-04-01 13:37:43.000'
WHERE tnpEmployee.EmployeeID IN (
SELECT vnpGetActiveAccountsAgainstToBeDisabled.EmployeeID
FROM vnpGetActiveAccountsAgainstToBeDisabled
WHERE tnpEmployee.Email=vnpGetActiveAccountsAgainstToBeDisabled.Email
);
As the experts already suggested, your result looks like your query invokes a trigger: a statement inside a trigger reports its own row count, which is typically where that extra (1 row(s) affected) line comes from.
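If you want to confirm that, a quick check (assuming SQL Server, which the UPDATE ... FROM syntax suggests, and the default dbo schema) is to ask the catalog which triggers exist on the table:
-- List any triggers defined on tnpEmployee
SELECT t.name, t.is_disabled, t.is_instead_of_trigger
FROM sys.triggers AS t
WHERE t.parent_id = OBJECT_ID('dbo.tnpEmployee');
If one shows up, its body is the place to look for why one record drops out of your view on every run.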
Thanks in advance for putting up with me.
Pulling a 33,000-record recordset from the database took LESS execution time than using Count() in the SQL and just grabbing 20 rows.
How is that possible?
A bit more detail:
Before, we were grabbing the entire recordset yet only displaying 20 rows of it on a page at a time for pagination. That was cringeworthy and wasteful, so I redesigned the page to only grab 20 rows at a time and to simply use an index variable to grab the next page, and so on.
All well and good, but that lacked a record count, which our people needed.
So after the record query, I added (what I thought would be) a quick query just on the indexed column of the table, using SQL's Count() function.
A side-by-side comparison of the original page and my new page indicates my new page takes roughly 10% longer to execute than the original! I was flabbergasted. I thought for sure it would be lightning fast, way faster than the original.
Any thoughts on why and what I might do to remedy that?
Is it because the script has to run two queries, regardless of the data retrieved?
Update:
Here is the SQL.
(Table names and field names are fictionalized in this post for security, but the structure is the same as the real page).
The main recordset select query contains:
SELECT
top 21 roster_id, roster_pplid, roster_pplemailid, roster_emailid, roster_firstname,
roster_lastname, roster_since, roster_pplsubscrid, roster_firstppldone, roster_pmtcurrent,
roster_emailverified, roster_active, roster_selfcanceled, roster_deactreason
FROM roster
WHERE
roster_siteid = 22
AND roster_isdeleted = false
order by roster_id desc
The record count query contains:
SELECT
COUNT(roster_id)
FROM
roster
WHERE
roster_siteid = 22
AND roster_isdeleted = false
The first query runs, then the second. The second is always built dynamically with the same matching WHERE filter.
I think I know why it is slower: I'm using GetRows to grab the recordset in the new page, which I was not using in the old page. That seems to be the slowdown. But I have to use it; I cannot stop at the 21st record otherwise.
Nick.McDermaid: The SQL shown is selecting the TOP 21 rows; that is how it grabs just 20 rows (row 21 is just there to populate the index for the "Next" page link).
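One way to avoid the second round trip entirely, if your database supports window functions (SQL Server 2005 and later do; classic Access does not), is to fold the count into the page query itself. A sketch using the fictionalized names above, assuming roster_isdeleted is a 0/1 flag:
SELECT TOP 21
    roster_id, roster_firstname, roster_lastname,   -- plus the other columns as before
    COUNT(*) OVER () AS total_rows                  -- total matching rows, repeated on every row
FROM roster
WHERE
    roster_siteid = 22
    AND roster_isdeleted = 0
ORDER BY roster_id DESC
The count then rides along with the 21 rows that GetRows fetches, so no separate COUNT query is needed.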
I run the following query:
SELECT * FROM [fabrika21.master] OMIT RECORD IF NOT SOME (contact.phone = "9037777417")
with the following options:
Destination table: some existing table
Write Preference: Overwrite table
Results size: Allow large results
Results Schema: Flatten results
I have the following results:
As you can see, the returned record does not match the query.
When I replace the star in the query with an explicit field, I get empty results:
Strange, there is no "Query returned zero records." message.
And when I remove the destination table option, I get correct results:
I think it is a bug. If somebody from the BigQuery team would like to help me, the jobIds are:
bquijob_691c1514_1577669d359 (query with star and destination table)
bquijob_14e10ce2_157766b1a1b (query with explicit field and destination table)
bquijob_60d53244_157766c4d8e (query with explicit field and no destination table)
Thanks!
You're correct, this is a bug in our display of the query results! While the actual query execution produced correct results, the web UI is caching previous results for the same destination table. In your case, an earlier query (perhaps bquijob_2aa85566_15775c5cce4) produced the results you later saw.
We'll address this immediately, but you can work around the problem by using the bq CLI program or refreshing your browser window between queries.
Thank you for the detailed post, it was a great help to diagnose the problem.
I am coding an application that deals with files. So, I have a table that contains information about all the files registered in the application.
My "files" table looks like this: ID, Path and LastScanTime.
The algorithm that I use in my application is simple:
Take the oldest row (LastScanTime is the oldest)
Extract the file path
Do some magic on this file (takes exactly 5 minutes)
Update the LastScanTime to the current time (now)
Go to step "1"
Until now, the task is pretty simple. For this, I am going to use this SQL statement to get the oldest item:
SELECT TOP 1 * FROM files ORDER BY [LastScanTime] ASC
and at the end of the item's processing (to prevent the item from being selected again immediately):
UPDATE Files SET [LastScanTime]=GETDATE() WHERE Id=#ItemID
Now, I am going to add some complexity to the algorithm:
Take the 3 oldest rows (LastScanTime is the oldest)
For each row, do:
A. Extract the file path
B. Do some magic on this file (takes exactly 5 minutes)
C. Update the LastScanTime to the current time (now)
D. Go to step "1"
The problem that I am now facing is that the whole process is going to run in parallel (no more serial processing). So, changing my SQL statement to the next statement is not enough!
SELECT TOP 3 * FROM files ORDER BY [LastScanTime] ASC
Why isn't this SQL statement enough?
Let's say that I run my code and start to execute the first 3 items. Now, after a minute, I want to execute another 3 items. This SQL statement will retrieve exactly the same "oldest" items that we have already started to process.
Possible solution
Implement a combined SELECT & UPDATE that gets the 3 oldest items and immediately updates their last scan time. But since there is no SELECT & UPDATE in the same statement, what happens if another SELECT comes in while the first SELECT is executing? Both statements will get the same results. This is a problem... Another problem is that we mark the item as "scanned recently" before the scan is really finished. What happens if the scan is terminated by an error?
I'm looking for tips and tricks to solve this problem. Solutions may add columns to the table as needed.
I'll appreciate your help.
Well, I usually have the habit of keeping two differently named fields in the database: one is AddedDate and another is ModifiedDate.
So the algorithm, in your terms, will be:
Take the oldest row (AddedDate is the oldest)
Extract the file path
Do some process on this file
Update the ModifiedDate to the current time (now)
It seems that you are about to reinvent an event queue with your SQL. Standard approaches like RabbitMQ or ActiveMQ may solve your problem.
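If you do stay in SQL, here is a minimal sketch of the combined SELECT & UPDATE the question asks about, assuming SQL Server (the GETDATE() in the question suggests it). An UPDATE with an OUTPUT clause claims and returns the rows in one atomic statement, and the READPAST hint lets concurrent workers skip rows that another worker has already locked:
WITH next3 AS (
    -- the 3 oldest rows that are not currently locked by another worker
    SELECT TOP 3 *
    FROM files WITH (ROWLOCK, UPDLOCK, READPAST)
    ORDER BY LastScanTime ASC
)
UPDATE next3
SET LastScanTime = GETDATE()
OUTPUT inserted.ID, inserted.[Path];   -- hand the claimed rows back to the caller
To address the error concern, you could claim rows via a separate ScanStartedTime column (a new column, which the question allows) instead of LastScanTime, and let a cleanup job reset rows whose scan started long ago but never finished.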
I am working towards counting customer subscription ("package") changes. To do this, I select all data from my package table once every day. I am calling the daily query results "snapshots" (approx 500k rows). I then load the snapshot data into a new table. After 10 days I have a total of 5 million rows in the snapshots table (500k rows * 10 days). The majority of customers (65%) do not change packages. I need to report which of the remaining 35% of customers are switching packages, when they are switching, what changes they are making (from "package X" to "package Y"), and which customers are changing packages most frequently.
The query I have written uses a self-join. I am identifying the changes but my results contain duplicate rows.
This is my query:
select *
from UserPackageDump UPD1, UserPackageDump UPD2
where UPD1.user_id = UPD2.user_id
and UPD1.package_id <> UPD2.package_id
How can I change this query to yield only distinct results?
SELECT
DISTINCT *
FROM
UserPackageDump UPD1
JOIN UserPackageDump UPD2
ON UPD1.user_id = UPD2.user_id
WHERE
UPD1.package_id <> UPD2.package_id
You have many options for doing this, and I'm not sure your approach is the right one to take. Firstly, to answer your specific question, you could perform a DISTINCT as per #sqlab's answer. Or you could include the date in the join, ensuring that UPD1 only matches a record in UPD2 that is one day different.
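A sketch of that date-restricted join, assuming SQL Server syntax; the snapshot_date column is my invention, since the question doesn't name the column that says which daily snapshot a row came from:
SELECT
    UPD1.user_id,
    UPD1.package_id AS old_package_id,
    UPD2.package_id AS new_package_id,
    UPD2.snapshot_date AS switch_date
FROM UserPackageDump UPD1
JOIN UserPackageDump UPD2
    ON UPD1.user_id = UPD2.user_id
    AND UPD2.snapshot_date = DATEADD(day, 1, UPD1.snapshot_date)  -- pair each row with the next day's snapshot only
WHERE UPD1.package_id <> UPD2.package_id
Because each row pairs only with the following day's snapshot, a given switch appears exactly once, and the DISTINCT becomes unnecessary.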
However, to come back to the approach: there should be no need to take a full copy of all the data. You have lots of other options for more efficient data storage, some of which are:
Put a "LastUpdated" datetime2 field in the database, to be populated each time the row is changed. Copy only those rows that have a LastUpdated more recent than the last time the copy was made. Assuming the only change possible to the table is to change the package_id then you will now only have rows in the table for users that have changed.
Create a UserPackageHistory table into which rows are written each time a user subscribes to a package, at the same time that UserPackage is updated. This then leaves you with much the same result as the first bullet, but in advance of running the copy job.
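As an illustration of that second option, a minimal sketch, assuming SQL Server, that user_id identifies a row in UserPackage, and that package_id is the only column that changes; the column and trigger names are mine:
CREATE TABLE UserPackageHistory (
    user_id        int       NOT NULL,
    old_package_id int       NOT NULL,
    new_package_id int       NOT NULL,
    changed_at     datetime2 NOT NULL DEFAULT SYSDATETIME()
);
-- Writes a history row whenever a user's package changes
CREATE TRIGGER trg_UserPackage_History ON UserPackage
AFTER UPDATE
AS
INSERT INTO UserPackageHistory (user_id, old_package_id, new_package_id)
SELECT d.user_id, d.package_id, i.package_id
FROM deleted AS d
JOIN inserted AS i ON i.user_id = d.user_id
WHERE i.package_id <> d.package_id;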
Then, with any one of these sets of data, to satisfy the reporting requirements you could populate a cube. Your source would be a set of rows containing user_id, old_package_id, new_package_id and date. You would create a measure group containing these measures:
Distinct count of user_id
Count of switches (basically just the row count of the source data)
This measure group could then be related to the following dimensions:
Date, so you can see when the switches are taking place
User, so you can drill down on who is switching
Switch Type, which is a dimension built by selecting the old_package_id and new_package_id from your source data. This gives you the ability to see the popularity of particular shifts.
I made a SQL query to update records in a table. The table has about 15 million records. The update statement is like:
UPDATE temp_conafe
set apoyo = trim(apoyo)
where cve_status like '%APOYO%';
I keep checking the field v$transaction.used_urec to see if the query is rolling forward or rolling back, but when the number of undo records reaches more than 15 million, the query starts rolling back.
How do I get the update to complete successfully?
I'm not the DBA, just a programmer, but I can't keep developing until that thing updates my records.
It looks as if your transaction is too big. Try to add another limiting clause in the WHERE. If you have an id field, you can add something like this:
where cve_status like '%APOYO%'
AND id >= 1 AND id < 100000
You need to run it multiple times and change the range accordingly. If this is not an option, you have to talk to your DBA and ask them to give you more resources.
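The same idea can be written as a self-contained PL/SQL loop that commits after each batch, so the undo usage stays small. This is only a sketch, and note that it gives up the all-or-nothing semantics of the single big UPDATE:
BEGIN
  LOOP
    UPDATE temp_conafe
    SET apoyo = TRIM(apoyo)
    WHERE cve_status LIKE '%APOYO%'
      AND apoyo <> TRIM(apoyo)      -- only rows that still need trimming, so each pass shrinks
      AND ROWNUM <= 100000;         -- batch size
    EXIT WHEN SQL%ROWCOUNT = 0;     -- nothing left to update
    COMMIT;                         -- release the undo after each batch
  END LOOP;
  COMMIT;
END;
/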