Does SQL Server retrieve the same rows when you use the TOP statement?

My SQL table consists of three columns: Event (type xml), InsertedTime (type datetime) and status (type nvarchar, with possible values 'Processed' and 'Unprocessed'). None of them is a unique identifier, and all of them are mandatory.
As part of a select query, I retrieve the top 1000 rows of the table that have the 'Unprocessed' status, use the XML to extract some values, and would then like to update the status of exactly these 1000 rows to 'Processed'.
I'm using SELECT TOP 1000 * FROM table WHERE status = 'Unprocessed' ORDER BY InsertedTime to retrieve the rows and UPDATE TOP 1000 table SET status = 'Processed' WHERE status = 'Unprocessed' ORDER BY InsertedTime to update them.
I understand that in Oracle I can use the rowid pseudocolumn to make sure I'm updating the same rows that were retrieved in the first place. But how do I achieve this in SQL Server when the table has no unique identifier or primary key?
Note: The table is being written to continuously.

You're selecting rows to be handled and then updating the status of those rows? Instead of doing a SELECT followed by an UPDATE, you could use the OUTPUT clause, with something like this:
UPDATE TOP (1000) table
set status = 'Processed'
output deleted.Event, deleted.InsertedTime, deleted.status
where status = 'Unprocessed'
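This both updates the rows and returns their Event, InsertedTime and status values (the old values, from the deleted pseudo-table) in one statement, so the rows you get back are exactly the rows that were updated. If you need the new values, use the inserted pseudo-table instead. Note that UPDATE TOP does not accept ORDER BY, so the 1000 rows it picks are not necessarily the oldest; if the order matters, select the TOP (1000) ... ORDER BY InsertedTime rows in a common table expression and run the UPDATE ... OUTPUT against that CTE.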

Assuming that processes may try to insert new rows while your two queries are processing, you have a few options:
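Wrap the two queries in a transaction, with an isolation level or locking hints strict enough to stop other sessions from changing the selected rows in between (for example SERIALIZABLE, or UPDLOCK and HOLDLOCK hints on the SELECT; the default READ COMMITTED level does not do this). The pair then behaves as one unit, at the cost of extra locking on the table.
Find the latest (maximum) InsertedTime among the rows returned by the first query and use it as an upper bound in the WHERE clause of the second query. Since InsertedTime is not unique, rows sharing that boundary timestamp can still slip in.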
Combine the UPDATE and SELECT into a single statement via an OUTPUT clause.
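A minimal sketch of the first option, using SQLite through Python's sqlite3 module rather than SQL Server so it can run anywhere. SQLite's implicit rowid plays the part of Oracle's rowid from the question: the rows are read and then updated by rowid inside one BEGIN IMMEDIATE transaction, which takes SQLite's write lock up front, so no other connection can insert or change rows in between. The table name events and the helper claim_batch are hypothetical.

```python
import sqlite3


def claim_batch(conn, batch_size):
    """Select the oldest unprocessed rows and mark exactly those rows processed.

    Expects a connection opened with isolation_level=None, so that the
    explicit BEGIN/COMMIT below fully control the transaction.
    """
    # BEGIN IMMEDIATE acquires the write lock now: no other connection can
    # insert or update rows between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT rowid, Event, InsertedTime FROM events "
            "WHERE status = 'Unprocessed' ORDER BY InsertedTime LIMIT ?",
            (batch_size,),
        ).fetchall()
        # Target the rows by rowid, so only the rows actually read are updated.
        conn.executemany(
            "UPDATE events SET status = 'Processed' WHERE rowid = ?",
            [(row[0],) for row in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # Hand back the payload columns without the internal rowid.
    return [row[1:] for row in rows]


# Small demo: three unprocessed events, claim the two oldest.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (Event TEXT, InsertedTime TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, 'Unprocessed')",
    [("<e3/>", "2024-01-03"), ("<e1/>", "2024-01-01"), ("<e2/>", "2024-01-02")],
)
batch = claim_batch(conn, 2)
remaining = conn.execute(
    "SELECT Event FROM events WHERE status = 'Unprocessed'"
).fetchall()
```

The same shape (one transaction, a row locator captured by the read and reused by the write) is what the locking hints achieve in SQL Server; the OUTPUT clause avoids needing a locator at all.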

Related

How to identify which records were not updated after executing an update query in Oracle?

I am trying to update 100 records in Oracle with the following UPDATE statement, but when I execute it, it reports 90 rows updated, even though the WHERE clause lists 100 codes.
How can I identify which 10 of the 100 codes were not updated?
In the statement below, I want to know which code names were not updated. Is there a simple way to find out?
update table1 a set a.column1='Yes'
where a.column2 in ('code1','code2','code3','code4',........,'code100');
You can't really get this directly from the update, but you could use the same list of values (assuming it's a hard-coded list, not one coming from a table) as a collection, and then look for values in the collection that are not in the table:
select *
from table(sys.odcivarchar2list('code1','code2','code3','code4',........,'code100')) t
where not exists (
select null
from table1 a
where a.column2 = t.column_value
);
db<>fiddle with a smaller set to demonstrate the idea.
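The same idea can be sketched outside Oracle. This version runs on SQLite through Python's sqlite3 module, where an inline VALUES list in a WITH clause stands in for sys.odcivarchar2list (an assumption made for the sake of a runnable example); table1, column1 and column2 follow the question, and the five codes are made-up sample data.

```python
import sqlite3


def missing_codes(conn, codes):
    """Return the codes in `codes` that match no row in table1.column2.

    Assumes `codes` is non-empty (an empty VALUES list is not valid SQL).
    """
    # One "(?)" row per code: the list becomes an inline table named wanted,
    # playing the role of the Oracle collection.
    values = ", ".join("(?)" for _ in codes)
    sql = (
        f"WITH wanted(code) AS (VALUES {values}) "
        "SELECT w.code FROM wanted AS w "
        "WHERE NOT EXISTS (SELECT NULL FROM table1 AS a WHERE a.column2 = w.code) "
        "ORDER BY w.code"
    )
    return [row[0] for row in conn.execute(sql, list(codes))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES ('No', ?)", [("code1",), ("code2",), ("code4",)]
)
codes = ["code1", "code2", "code3", "code4", "code5"]
placeholders = ", ".join("?" for _ in codes)
updated = conn.execute(
    f"UPDATE table1 SET column1 = 'Yes' WHERE column2 IN ({placeholders})", codes
).rowcount
not_updated = missing_codes(conn, codes)
```

Here the update reports 3 rows, and the query names the two codes that have no matching row.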
You could also modify your update to touch only rows that are not already 'Yes'. If you did that, then before actually running the update you could look either for collection values that don't exist in the table at all, or for those that exist but don't need to be updated (db<>fiddle).
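The variant above can be sketched the same way (SQLite via Python's sqlite3, hypothetical table1 data). Before the update runs, the query lists each code the update would skip and why: 'missing' if no row has that code, 'already Yes' if every matching row is already 'Yes'. The update itself then only touches rows that are not yet 'Yes', so its row count reflects real changes only.

```python
import sqlite3


def codes_that_will_not_change(conn, codes):
    """Classify listed codes that an update to 'Yes' would skip.

    Run this BEFORE the update. Assumes `codes` is non-empty.
    """
    values = ", ".join("(?)" for _ in codes)
    sql = (
        f"WITH wanted(code) AS (VALUES {values}) "
        "SELECT w.code, "
        "CASE WHEN EXISTS (SELECT NULL FROM table1 AS a WHERE a.column2 = w.code) "
        "THEN 'already Yes' ELSE 'missing' END AS reason "
        "FROM wanted AS w "
        # Keep only codes with no row that still needs the update.
        "WHERE NOT EXISTS (SELECT NULL FROM table1 AS a "
        "WHERE a.column2 = w.code AND (a.column1 IS NULL OR a.column1 <> 'Yes')) "
        "ORDER BY w.code"
    )
    return conn.execute(sql, list(codes)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("No", "code1"), ("Yes", "code2"), ("No", "code4")],
)
codes = ["code1", "code2", "code3", "code4", "code5"]
report = codes_that_will_not_change(conn, codes)

# The update only touches rows that actually change.
placeholders = ", ".join("?" for _ in codes)
updated = conn.execute(
    f"UPDATE table1 SET column1 = 'Yes' WHERE column2 IN ({placeholders}) "
    "AND (column1 IS NULL OR column1 <> 'Yes')",
    codes,
).rowcount
```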
odcivarchar2list is a built-in collection type, but you could use your own.
If you already have the values in a collection or table you can use that directly, both for this query and for the update.

Querying a SQL table and only transferring updated rows to a different database

I have a database table that is constantly updated. I want to query only the changes and additions made to rows with a specific value in a column, e.g. get the rows that have been changed or added and whose 'description' column is "xyz". My end goal is to copy these rows to another table in another database. Is this even possible? I don't want to simply query everything and overwrite the rows in the other database, because that is inefficient.
What I have tried so far:
I can run a select query on the table, but it gives me all the rows, not just the ones that have been changed or recently added. If I add these rows to the table in the other database, my only option is to overwrite the existing rows.
A log table records the changes to the table, but I can't add filters in SQL that tell me which of these changes belong to rows whose 'description' column is 'xyz'.
Write your update statements to make use of OUTPUT to capture the before and after values and log them to a table of your choice.
Here is a very simple UPDATE example that uses OUTPUT to store the RowID and the before and after values of the ActivityType column:
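-- Table variables are declared with @ (a # prefix is for temporary tables)
DECLARE @MyTableVar table (
SummaryBefore nvarchar(max),
SummaryAfter nvarchar(max),
RowID int
);
-- No WHERE clause: every row in the table is updated and logged
UPDATE DBA.dbo.dtest SET ActivityType = 3
OUTPUT deleted.ActivityType,
inserted.ActivityType,
inserted.RowID
INTO @MyTableVar;
SELECT * FROM @MyTableVar;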
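SQLite has no OUTPUT clause, so this sketch swaps in a different technique to get the same before/after log: an AFTER UPDATE trigger whose OLD and NEW row values play the roles of deleted and inserted. It runs through Python's sqlite3 module; dtest and the log table dtest_log are hypothetical stand-ins for the tables in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dtest (RowID INTEGER PRIMARY KEY, ActivityType INTEGER);
    CREATE TABLE dtest_log (RowID INTEGER, SummaryBefore INTEGER, SummaryAfter INTEGER);

    -- Fires once per updated row; OLD/NEW correspond to deleted/inserted.
    CREATE TRIGGER dtest_log_update AFTER UPDATE OF ActivityType ON dtest
    BEGIN
        INSERT INTO dtest_log (RowID, SummaryBefore, SummaryAfter)
        VALUES (NEW.RowID, OLD.ActivityType, NEW.ActivityType);
    END;
    """
)
conn.executemany(
    "INSERT INTO dtest (RowID, ActivityType) VALUES (?, ?)", [(1, 1), (2, 2)]
)
# Same statement as the SQL Server example: no WHERE, so every row changes.
conn.execute("UPDATE dtest SET ActivityType = 3")
conn.commit()
log = conn.execute(
    "SELECT RowID, SummaryBefore, SummaryAfter FROM dtest_log ORDER BY RowID"
).fetchall()
```

A trigger logs every update regardless of which statement issued it, whereas OUTPUT only captures the statements you write it into.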
You can do it in two ways:
Add date columns such as update_time and/or create_time (they can have defaults if needed). These columns record when each row was created or last changed, but update_time has to be set on every UPDATE, either by the application or by a trigger. Save the time of each run as previous_run_time; the next run selects the records whose update_time/create_time is greater than previous_run_time and moves them to the new database.
Enable Change Data Capture (CDC) on the source table. CDC is built into SQL Server, but it has to be switched on for the database and then for each table (sys.sp_cdc_enable_db, sys.sp_cdc_enable_table); after that you can move only the records it reports as changed.
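A minimal sketch of the first approach (an update_time column plus a saved watermark), using two SQLite connections via Python's sqlite3 module to stand in for the source and target databases. The table names items and items_copy, the payload column and the sample data are hypothetical; timestamps are stored as ISO-8601 text so they compare correctly as strings. It assumes update_time is set on every insert and update; a row committed late with a timestamp at or below the saved watermark would be missed, so treat this as a starting point rather than a complete sync.

```python
import sqlite3


def copy_changes(src, dst, previous_run_time):
    """Copy 'xyz' rows created/updated after previous_run_time from src to dst.

    Returns the watermark to save as previous_run_time for the next run.
    """
    rows = src.execute(
        "SELECT id, description, payload, update_time FROM items "
        "WHERE description = 'xyz' AND update_time > ? ORDER BY update_time",
        (previous_run_time,),
    ).fetchall()
    # Upsert by primary key, so a row changed again later overwrites its old copy.
    dst.executemany(
        "INSERT OR REPLACE INTO items_copy (id, description, payload, update_time) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    dst.commit()
    # Use the newest copied timestamp, not the clock, as the next watermark.
    return rows[-1][3] if rows else previous_run_time


src = sqlite3.connect(":memory:")  # stands in for the source database
dst = sqlite3.connect(":memory:")  # stands in for the target database
schema = "(id INTEGER PRIMARY KEY, description TEXT, payload TEXT, update_time TEXT)"
src.execute("CREATE TABLE items " + schema)
dst.execute("CREATE TABLE items_copy " + schema)
src.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "xyz", "a", "2024-01-01 10:00"),
        (2, "abc", "b", "2024-01-01 11:00"),
        (3, "xyz", "c", "2024-01-02 09:00"),
    ],
)
src.commit()

# First run: only row 3 is 'xyz' and newer than the saved watermark.
watermark = copy_changes(src, dst, "2024-01-01 12:00")
# Row 1 changes later; its update_time is bumped along with the data.
src.execute(
    "UPDATE items SET payload = 'a2', update_time = '2024-01-03 08:00' WHERE id = 1"
)
src.commit()
watermark = copy_changes(src, dst, watermark)
copied = dst.execute("SELECT id, payload FROM items_copy ORDER BY id").fetchall()
```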

How to remove duplicate rows and keep one in an Access database?

I need to remove duplicate rows in my Access database. Does anyone have a generic query for this? I have this problem in multiple tables.
There are two things you need to do:
Determine what the criteria are for a unique record - what is the list of columns where two, or more, records would be considered duplicates, e.g. JobbID and HisGuid
Decide what you want to do with the duplicate records - do you want to hard delete them, or set the IsDeleted flag that you have on the table
Once you've determined the criteria that you want to use for uniqueness you then need to pick 1 record from each group of duplicates to retain. A query along the lines of:
SELECT MAX(ID)
FROM MyTable
GROUP BY JobbID, HisGuid
This will give you the highest ID for each group of records where JobbID and HisGuid are both the same (I've assumed that the ID column is an auto-increment/identity column that is unique across all records in the table). You could use MIN(ID) instead if you prefer; you just need to pick ONE record from each group to keep.
Assuming that you want to set the IsDeleted flag on the records you don't want to keep, you can then incorporate this into an update query:
UPDATE MyTable
SET IsDeleted = 1
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY JobbID, HisGuid
)
This takes the result of the query that retrieves the highest IDs and uses it to say set IsDeleted to 1 for all the records where the ID isn't the highest ID for each group of records where JobbID and HisGuid are the same.
The only part I can't help you with is running these queries in Access as I don't have it installed on the PC I'm using right now and my memory is a bit rusty regarding how/where to run arbitrary queries.
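The flag-the-duplicates queries above can be tried out on SQLite through Python's sqlite3 module rather than Access, so they can be executed anywhere. MyTable, JobbID, HisGuid and IsDeleted follow the answer; the sample rows are made up. For a hard delete instead of a flag, the same subquery works with DELETE FROM MyTable WHERE ID NOT IN (...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "JobbID INTEGER, HisGuid TEXT, IsDeleted INTEGER NOT NULL DEFAULT 0)"
)
# Rows 1, 2 and 4 are duplicates of each other on (JobbID, HisGuid).
conn.executemany(
    "INSERT INTO MyTable (JobbID, HisGuid) VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (1, "a"), (2, "c")],
)

# Keep the highest ID per (JobbID, HisGuid) group; flag every other row.
conn.execute(
    """
    UPDATE MyTable
       SET IsDeleted = 1
     WHERE ID NOT IN (SELECT MAX(ID)
                        FROM MyTable
                       GROUP BY JobbID, HisGuid)
    """
)
conn.commit()

flagged = [r[0] for r in conn.execute("SELECT ID FROM MyTable WHERE IsDeleted = 1 ORDER BY ID")]
kept = [r[0] for r in conn.execute("SELECT ID FROM MyTable WHERE IsDeleted = 0 ORDER BY ID")]
```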

SQL update statement to populate numerical series for each distinct subset of table records

I need the SQL update statement to assign consecutive sequence numbers to subsets of records in a table. I'm using MS access.
Let's say the current table has records like:
notebook,blue
notebook,yellow
pencil,yellow
chair,blue
desk,green
desk,blue
I would like to add another field to the table and populate it as follows:
notebook,blue,1
notebook,yellow,1
pencil,yellow,2
chair,blue,2
desk,green,1
desk,blue,3
As you can see, I have assigned consecutive numbers based on a certain set of criteria. In this example, the criterion is a distinct value in the second field (in real life, the criterion will be a distinct combination of values from several fields, but all the relevant fields are in the same table, so no join is needed). Since there are three records with blue in the second field, these are numbered 1, 2, 3. And since there are two records with yellow, they are numbered 1, 2.
So I can't derive the numbering from the row number, since I have several numbering series in the same table all starting with 1.
Also, I need it to be a query where I don't have to explicitly specify the value in the second field. I just want each unique value in the second field to get its own numbering series. That is, I don't want to have to write one query to generate the numbers for "blue" and a separate query to generate the numbers for "yellow".
The maximum number of records in a series is under 1000, so I don't mind creating an auxiliary table with 1000 records, with a field containing the values 1 to 1000. The update statement on the primary table could then pull in the next value from the auxiliary table.
But I don't know the SQL syntax to use for this update statement, or for the update statement for any other approach. So I need your advice.
I'm not sure how to do this with a single SQL statement, but here are 2 SQL statements that could be used to handle each case:
insert into table (field1, field2, field3)
select 'desk', 'blue', 1
where not exists (select field3 from table where field1 = 'desk' and field2 = 'blue');
insert into table (field1, field2, field3)
select field1, field2, count(1) + 1
from table
where field1 = 'desk'
and field2 = 'blue'
group by field1, field2;
Create Table #TableAutoIncrement (ID int identity(1000 , 1) , item varchar(20), COLOR varchar(20) )
Insert INTO #TableAutoIncrement
(item, COLOR )
SELECT item, COLOR FROM YOURTABLE
--- GETTING all the values from the temporary table
SELECT * FROM #TableAutoIncrement
A colleague of mine worked out the necessary SQL. Here's the generalized solution. (Note that I really needed to number the multiple series in my data set based on a combination of two fields. In the simplified example in my original post I used only one field, color, but since I really need two fields, that's what this solution shows.)
SELECT *,
(SELECT COUNT(T1.ID)
FROM
[TableName] AS T1
WHERE T1.ID >= T2.ID and t1.[NameCriteriaField1] = t2.[NameCriteriaField1]
and t1.[NameCriteriaField2]= t2.[NameCriteriaField2])
AS Sequence into OutputResultsTableName
FROM
[TableName] AS T2
ORDER BY [NameCriteriaField2] , [NameCriteriaField1]
The source table has a field named "ID" with an integer value. Every record must have a unique ID, but it does not matter whether there are gaps in the IDs or how the records are sequenced against the ID. (For example, the typical MS Access AutoNumber primary key field serves this purpose.)
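A minimal sketch of the colleague's correlated-subquery technique, run on SQLite through Python's sqlite3 module instead of Access (an assumption made so the example is runnable). It uses the single criterion from the question (color) and the question's sample rows, with ID as an INTEGER PRIMARY KEY standing in for an AutoNumber field, and it returns the rows rather than writing them INTO a new table. One deliberate change: it compares T1.ID <= T2.ID. With >=, as in the query above, each group is numbered starting from its highest ID; with <=, numbering follows ascending ID, which reproduces the numbers in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID stands in for an Access AutoNumber: unique, increasing integers.
conn.execute("CREATE TABLE items (ID INTEGER PRIMARY KEY, item TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO items (item, color) VALUES (?, ?)",
    [
        ("notebook", "blue"),
        ("notebook", "yellow"),
        ("pencil", "yellow"),
        ("chair", "blue"),
        ("desk", "green"),
        ("desk", "blue"),
    ],
)

# For each row T2, count the rows T1 in the same group whose ID is <= T2.ID:
# the lowest ID in a group gets 1, the next gets 2, and so on.
rows = conn.execute(
    """
    SELECT T2.item, T2.color,
           (SELECT COUNT(T1.ID)
              FROM items AS T1
             WHERE T1.ID <= T2.ID AND T1.color = T2.color) AS Sequence
      FROM items AS T2
     ORDER BY T2.color, T2.ID
    """
).fetchall()
```

Because the subquery re-counts each group for every row, the cost grows roughly with the square of the group size; with groups under 1000 records, as in the question, that is usually acceptable.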
This query is set up to assume that there are two fields in your data set that you want to use to group your records and assign a numerical series count to each record within each group. (Thus your table may contain multiple groups, and each group has its own numbering series starting with 1. But the way the query is formulated, there are exactly 2 criteria that define the group.) You cannot use any where clauses to further filter the records that get counted. Through experimentation, I found that adding where clauses gives unreliable results where records can get omitted. So if you need the results to be filtered so that some records are not to be included in the numerical series for a particular group, then do one of the following before running my query:
run a query to delete the undesired records from the source table
first copy all records from the source table into a new table and delete the records from the new table that should not be numbered, and run my query on the new table
Deleting extraneous records before running this query is needed only if those records qualify as members of a group defined by criteria 1 and criteria 2. Extraneous records that don't match those two criteria can stay in the table, because they do not affect the numbering of the records within the groups you care about. They will just get their own independent numbering, which you can ignore.
The numbering of each group starts at 1, and the query defines the groups dynamically from the distinct combinations of criteria1 and criteria2. However, records that do not belong to any group are all numbered 0. In my testing, criteria1 and criteria2 were always non-null values. (In theory, at least in Microsoft Access, an empty string is different from Null, but I did not test with empty strings either.) If a record has Null in criteria1 or criteria2, MS Access considers it as not belonging to any group and numbers it 0, because a comparison such as t1.[NameCriteriaField1] = t2.[NameCriteriaField1] is never true when either side is Null. That is, the groups have to be defined by non-null values of criteria1 and criteria2, which is different from the way SQL's DISTINCT works (DISTINCT treats Nulls as equal to each other).
If you need to have NULL as a valid criteria for defining the group (and thus to have groups defined by NULL numbered), it's very simple. Prior to running my query, first run an update statement that changes all instances of null values in criteria1 or criteria2 to the phrase "placeholder for null field". Then run my query. On the result set (after the numbering has been assigned to the groups), run another update to change all occurrences of the placeholder phrase back to null.
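The placeholder workaround can be sketched the same way (SQLite via Python's sqlite3, with a hypothetical items table and sample rows, numbering by one criterion in ascending ID order). The first query shows the Null rows getting 0. The placeholder is then swapped in, the numbered output is written to a new table with CREATE TABLE ... AS (SQLite's counterpart of SELECT ... INTO), and the placeholder is turned back into Null in both the output and the source table. The placeholder text must never occur as real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (ID INTEGER PRIMARY KEY, item TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO items (item, color) VALUES (?, ?)",
    [("pen", None), ("desk", "blue"), ("cup", None)],
)

# Single-criterion numbering; the aliases keep plain column names in the output.
SEQUENCE_SQL = """
SELECT T2.ID AS ID, T2.item AS item, T2.color AS color,
       (SELECT COUNT(T1.ID) FROM items AS T1
        WHERE T1.ID <= T2.ID AND T1.color = T2.color) AS Sequence
FROM items AS T2
ORDER BY T2.ID
"""

# NULL = NULL is not true, so a NULL-color row matches nothing, not even itself.
before = [row[3] for row in conn.execute(SEQUENCE_SQL)]

PLACEHOLDER = "placeholder for null field"
conn.execute("UPDATE items SET color = ? WHERE color IS NULL", (PLACEHOLDER,))
conn.execute("CREATE TABLE numbered AS " + SEQUENCE_SQL)
# Turn the placeholder back into NULL in the output and in the source table.
conn.execute("UPDATE numbered SET color = NULL WHERE color = ?", (PLACEHOLDER,))
conn.execute("UPDATE items SET color = NULL WHERE color = ?", (PLACEHOLDER,))
conn.commit()
after = conn.execute("SELECT item, color, Sequence FROM numbered ORDER BY ID").fetchall()
```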
Adjustment to syntax if your group is defined by a single criteria field
SELECT *,
(SELECT COUNT(T1.ID)
FROM
[TableName] AS T1
WHERE T1.ID >= T2.ID and t1.[NameCriteriaField1] = t2.[NameCriteriaField1] )
AS Sequence into OutputResultsTableName
FROM
[TableName] AS T2
ORDER BY [NameCriteriaField1]
Adjustment to syntax if your group is defined by a combination of 3 criteria fields
SELECT *,
(SELECT COUNT(T1.ID)
FROM
[TableName] AS T1
WHERE T1.ID >= T2.ID and t1.[NameCriteriaField1] = t2.[NameCriteriaField1]
and t1.[NameCriteriaField2]= t2.[NameCriteriaField2]
and t1.[NameCriteriaField3]= t2.[NameCriteriaField3])
AS Sequence into OutputResultsTableName
FROM
[TableName] AS T2
ORDER BY [NameCriteriaField3], [NameCriteriaField2], [NameCriteriaField1]

Updating rows in order with SQL

I have a table with 4 columns. The first column is unique for each row, but it's a string (URL format).
I want to update my table, but instead of using "WHERE", I want to update the rows in order.
The first query will update the first row, the second query updates the second row and so on.
What's the SQL code for that? I'm using SQLite.
Edit: My table schema
CREATE table (
url varchar(150),
views int(5),
clicks int(5)
)
Edit2: What I'm doing right now is a loop of SQL queries
update table set views = 5, clicks = 10 where url = 'http://someurl.com';
There are around 4 million records in the database, and the update takes around 16 seconds on my server. Since the loop updates the rows in order (the first query updates the first row, and so on), I'm wondering whether updating the rows in order could be faster than using a WHERE clause, which has to search through 4 million rows.
You can't do what you want without using WHERE as this is the only way to select rows from a table for reading, updating or deleting. So you will want to use:
UPDATE table SET url = ... WHERE url = '<whatever>'
HOWEVER... SQLite has an extra feature: the automatically generated ROWID column, which you can use in queries. You don't see this data by default, so if you want it you need to request it explicitly, e.g.:
SELECT ROWID, * FROM table
What this means is that you may be able to do what you want referencing this column directly:
UPDATE table SET url = ... WHERE ROWID = 1
You still need the WHERE clause, but this lets you access the rows in insertion order without doing anything else.
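A minimal sketch of the ROWID approach using Python's built-in sqlite3 module; the table name pages and the sample URLs are hypothetical. The rows are read once together with their ROWIDs and then updated one at a time by ROWID, all inside a single transaction that is committed at the end. Looking a row up by ROWID is a search on the table's own B-tree key, not a scan of all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url VARCHAR(150), views INT, clicks INT)")
conn.executemany(
    "INSERT INTO pages (url, views, clicks) VALUES (?, 0, 0)",
    [("http://a.example",), ("http://b.example",), ("http://c.example",)],
)

# ROWID is not part of SELECT *, so ask for it explicitly.
rows = conn.execute("SELECT ROWID, url FROM pages ORDER BY ROWID").fetchall()

# Update row by row in ROWID order; each UPDATE still needs its WHERE clause.
for rowid, url in rows:
    conn.execute(
        "UPDATE pages SET views = ?, clicks = ? WHERE ROWID = ?",
        (rowid * 5, rowid * 10, rowid),
    )
conn.commit()  # one commit for the whole loop instead of one per statement

result = conn.execute("SELECT ROWID, views, clicks FROM pages ORDER BY ROWID").fetchall()
```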
CAVEAT
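ROWID effectively stores the INSERT order of the rows. If you delete rows from the table, the ROWIDs of the remaining rows do NOT change, so it is possible to have gaps in the ROWID sequence. This is by design, and there is no workaround short of re-creating the table and re-populating the data. Note also that VACUUM may change the ROWIDs of tables that do not have an explicit INTEGER PRIMARY KEY column, so ROWIDs are not guaranteed to be permanent.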
PORTABILITY
Note that this only applies to SQLite - you may not be able to do the same thing with other SQL engines should you ever need to port this. It would be MUCH better to add an EXPLICIT auto-number column (aka an IDENTITY field) that you can use and manage.
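A minimal sketch of that advice in SQLite terms, with a hypothetical pages table: declaring id INTEGER PRIMARY KEY makes the column an alias for ROWID, so its values appear in SELECT *, are preserved by VACUUM, and can be referenced by a normal column name. AUTOINCREMENT additionally stops SQLite from ever reusing the id of a deleted row. In other engines the rough equivalents are an IDENTITY column (SQL Server) or an AutoNumber field (Access).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id aliases ROWID; AUTOINCREMENT prevents reuse of deleted ids.
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "url VARCHAR(150), views INT, clicks INT)"
)
conn.executemany(
    "INSERT INTO pages (url, views, clicks) VALUES (?, 0, 0)",
    [("http://a.example",), ("http://b.example",), ("http://c.example",)],
)
conn.execute("DELETE FROM pages WHERE id = 2")  # leaves a gap, as described above
conn.execute("INSERT INTO pages (url, views, clicks) VALUES ('http://d.example', 0, 0)")
conn.commit()

# id and ROWID are the same value for every row.
rows = conn.execute("SELECT id, ROWID, url FROM pages ORDER BY id").fetchall()
```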