My company has a large Access database which lists every user that has ever existed for a particular client of ours. This database is curated manually. I have been asked to delete 500+ users. This means I have to modify three columns for each user:
Status (must be changed to "deleted")
Date Deleted (to current date)
Date Revised (to current date)
Obviously, I don't want to have to Ctrl+F and change these fields manually for over 500 users. What is the easiest way to go about doing this more quickly?
I have the list of users that need to be deleted in Excel. I tried to create a query that shows all of these users in one table so that I don't have to sort through the users who don't need to be modified. It looked something like this:
SELECT UserID, Status, [Date Deleted], [Date Revised]
FROM [table name]
WHERE UserID = 'a'
OR UserID = 'b'
-- (and then 500+ more OR conditions, one for each UserID)
ORDER BY UserID;
I figured that if I could at least do this, I'd have all the users I need to edit in front of me so that I don't have to Ctrl+F. But this didn't work, because it exceeded the 1,024 character limit in Access. Any ideas for how I can accomplish this?
Don't attempt to write 500+ UserID values into your SQL statement. Instead, import the Excel list as a table into your Access database. Then you can use that list of UserID values to identify which rows of your main table should be updated.
UPDATE MainTable AS m
SET m.Status = 'deleted', m.[Date Deleted] = Date(), m.[Date Revised] = Date()
WHERE m.UserID IN (SELECT UserID FROM tblFromExcel)
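The same import-a-list-then-UPDATE…IN pattern can be sketched outside Access. Below is a minimal illustration using Python's built-in sqlite3 module; the table and column names mirror the answer's hypothetical `MainTable` and `tblFromExcel`, and the schema is made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the main Access table.
cur.execute("CREATE TABLE MainTable (UserID TEXT, Status TEXT, DateDeleted TEXT, DateRevised TEXT)")
cur.executemany("INSERT INTO MainTable (UserID, Status) VALUES (?, 'active')",
                [("a",), ("b",), ("c",), ("d",)])

# Stand-in for the table imported from Excel: just the UserIDs to delete.
cur.execute("CREATE TABLE tblFromExcel (UserID TEXT)")
cur.executemany("INSERT INTO tblFromExcel (UserID) VALUES (?)", [("a",), ("c",)])

# One UPDATE flags every listed user, no matter how long the list is.
cur.execute("""
    UPDATE MainTable
    SET Status = 'deleted', DateDeleted = date('now'), DateRevised = date('now')
    WHERE UserID IN (SELECT UserID FROM tblFromExcel)
""")
conn.commit()

deleted = [r[0] for r in cur.execute(
    "SELECT UserID FROM MainTable WHERE Status = 'deleted' ORDER BY UserID")]
print(deleted)  # ['a', 'c']
```

The point is that the list lives in a table, so the SQL statement stays the same size whether it targets 2 users or 500.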
One of our staff members made a change on the live DB that wiped the address details of circa 3000 users. I have attached yesterday's backup to a new DB, meaning I have a table of all the correct addresses in a different database.
Is there an easy way I can essentially copy this table from the backup DB to the live DB? I will of course test it first, as I wish this user had. I don't want to restore from the backup as it will erase a lot in different tables.
I have attached a backup of the DB in order to extract the records needed.
I probably should've added this, but the entire table hasn't been erased, just a number of records within the table. The table itself has columns such as:
SerialNum, FirstName, LastName, Address 1, PostCode, etc.
The SerialNum, FirstName, and LastName are all still fine; it's just the address fields that have been erased through a bad data import.
You want something like this:
INSERT INTO ProdDB.dbo.[TableName]
SELECT *
FROM RestoredDB.dbo.[TableName] r
WHERE NOT EXISTS (SELECT 1 FROM ProdDB.dbo.[TableName] p WHERE p.ID = r.ID)
Of course you'll need to adjust that based on the real primary key and name of the table, and you may also need to turn on IDENTITY_INSERT.
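Given the clarification that the rows still exist and only the address fields were wiped, an UPDATE driven by the restored table may be closer to what's needed than an INSERT. Here is a sketch of that idea using Python's built-in sqlite3 module, with two in-memory tables standing in for the live and restored databases (all names and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# "live" stands in for the production table, "backup" for the restored copy.
cur.execute("CREATE TABLE live (SerialNum INTEGER PRIMARY KEY, FirstName TEXT, Address1 TEXT)")
cur.execute("CREATE TABLE backup (SerialNum INTEGER PRIMARY KEY, FirstName TEXT, Address1 TEXT)")
cur.executemany("INSERT INTO live VALUES (?, ?, ?)",
                [(1, "Ann", ""), (2, "Bob", "12 Oak Rd"), (3, "Cat", None)])
cur.executemany("INSERT INTO backup VALUES (?, ?, ?)",
                [(1, "Ann", "5 Elm St"), (2, "Bob", "12 Oak Rd"), (3, "Cat", "9 Fir Ln")])

# Restore only the wiped address fields, leaving intact rows alone.
cur.execute("""
    UPDATE live
    SET Address1 = (SELECT b.Address1 FROM backup b WHERE b.SerialNum = live.SerialNum)
    WHERE Address1 IS NULL OR Address1 = ''
""")
conn.commit()

addresses = [r[0] for r in cur.execute("SELECT Address1 FROM live ORDER BY SerialNum")]
print(addresses)  # ['5 Elm St', '12 Oak Rd', '9 Fir Ln']
```

In SQL Server the same thing would be an UPDATE joining the two tables on the primary key; the WHERE clause limits the repair to the damaged rows, so no IDENTITY_INSERT is needed.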
I have a database with 1Mil+ rows in it.
This database consists (for the sake of this question) of 2 columns; user_id, and username.
These values are not controlled by my application; I am not always certain that these are the current correct values. All I know is that the user_id is guaranteed to be unique. I get periodic updates which allow me to update my database to ensure I have an "eventually consistent" version of the user_id/username mapping.
I would like to be able to retrieve the latest addition of a certain username; "older" results should be ignored.
I believe there are two possible approaches here:
- Indexing: keep an index of username → row (a hash map?) where each username always points to the last-added row for that name, updated on every insert or update.
- Setting username as unique, and doing an on-conflict update to set the old row's username to the empty string and the new row's to the username.
From what I've understood about indexing, it sounds like it's the faster option (and won't require me checking the uniqueness of 1Mil rows in my database). I also hear hash indexes are a pain because they require rebuilding, so feel free to suggest other ideas.
My current implementation does a full search over the entire database, which is beginning to get quite slow at 1Mil+ rows. It currently takes the "last" matching value, which I am not even sure is a valid assumption at this point.
Given a sample database:
user_id, username
3 , bob
2 , alice
4 , joe
1 , bob
I would expect a search of `username = bob` to return (1, bob).
I cannot rely on ID ordering to solve this, since there is no linearity to which ID is assigned to which username.
You can do this using:
select distinct on (s.username) s.*
from sample s
where s.username = 'bob'
order by s.username, s.id desc;
For performance, you want an index on sample(username, id).
Alternatively, if you are doing periodic bulk updates, then you can construct a version of the table with unique rows per username:
create table most_recent_sample as
select max(id) as id, username
from sample
group by username;
create index idx_most_recent_sample_username on most_recent_sample(username);
This might take a short amount of time, but you are doing the update anyway.
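That lookup-table build can be sketched with Python's built-in sqlite3 module and the sample data from the question. Note this follows the answer's assumption that a higher id means a more recent row, which the asker warns may not hold:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, username TEXT)")
cur.executemany("INSERT INTO sample VALUES (?, ?)",
                [(3, "bob"), (2, "alice"), (4, "joe"), (1, "bob")])

# One row per username, keeping the highest id seen for each.
cur.execute("""
    CREATE TABLE most_recent_sample AS
    SELECT MAX(id) AS id, username
    FROM sample
    GROUP BY username
""")
cur.execute("CREATE INDEX idx_most_recent_sample_username ON most_recent_sample(username)")

row = cur.execute(
    "SELECT id, username FROM most_recent_sample WHERE username = 'bob'").fetchone()
print(row)  # (3, 'bob')
```

Lookups against `most_recent_sample` then hit a single indexed row per username instead of scanning 1Mil+ rows.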
I have an ODBC database that I've linked to an Access table. I've been using Access to generate some custom queries/reports.
However, this ODBC database changes frequently and I'm trying to discover where the discrepancy is coming from. (hundreds of thousands of records to go through, but I can easily filter it down into what I'm concerned about)
Right now I've been manually pulling the data each day, exporting to Excel, counting the totals for each category I want to track, and logging in another Excel file.
I'd rather automate this in Access if possible, but haven't been able to get my head around it yet.
I've already linked the ODBC databases I'm concerned with, and can generate the query I want to generate.
What I'm struggling with is how to capture this daily and then log that total so I can trend it over a given time period.
If the data were constant, this would be easy for me to understand and do. However, the data can change daily.
EX: This is a database of work orders. Work orders (which are basically my primary key) are assigned to different departments. A single work order can belong to many different departments and have multiple tasks/holds/actions tied to it.
Work Order 0237153-03 could be assigned to Department A today, but then could be reassigned to Department B tomorrow.
These work orders also have "ranking codes" such as Priority A, B, C. These too can be changed at any given time. Today Work Order 0237153-03 could be priority A, but tomorrow someone may decide that it should actually be Priority B.
This is why I want to capture all available data each day (The new work orders that have come in overnight, and all the old work orders that may have had changes made to them), count the totals of the different fields I'm concerned about, then log this data.
Then repeat this everyday.
The question you ask is very vague, so here is a general answer.
You are counting the items you get from a database table.
You may not need to actually count them every day: if the table in the database stores all the data for every day, you simply need a query that counts the items for each day stored in the table.
You are right that this would be best done in Access.
You might not even need the "log the counts in another table" step, though.
It seems you are quite new to Access, so you might benefit from these links: videos numbered 61 and 70 here, and also video 7 here.
These will help, or buy a book / use web resources.
PART 2.
If you have to bodge it because you can't get the ODBC database to use triggers/data macros to log a history, you could store a history yourself like this... BUT you have to do it EVERY day.
1. On day 1, take a full copy of the ODBC data as YOURTABLE. Add a field "DumpNumber" and set it to 1 in every row.
2. Link to the ODBC data every day.
3. Join from YOURTABLE to the ODBC table and find any records that have changed (i.e. test just the fields you want to monitor and see whether any of them have changed).
4. Append these changed records to YOURTABLE with a new DumpNumber of 2. This value MUST increment with every dump!
You can now write SQL to get the most recent record for each primary key.
SELECT Mytable.*
FROM Mytable
INNER JOIN
(
SELECT PrimaryKeyFields, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
GROUP BY PrimaryKeyFields
) AS T1
ON T1.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND T1.MAXDumpNumber = Mytable.DumpNumber
You can compare the most recent records with any previous records.
To get the previous dump, note that simply changing the join condition in the SQL above to the following will NOT work (unless you always keep every record!):
AND T1.MAXDumpNumber - 1 = Mytable.DumpNumber
Use something like this to get the previous row:
SELECT Mytable.*
FROM Mytable
INNER JOIN
(
SELECT Mytable.PrimaryKeyFields
, MAX(Mytable.DumpNumber) AS MAXDumpNumber
FROM Mytable
INNER JOIN
(
SELECT PrimaryKeyFields
, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
GROUP BY PrimaryKeyFields
) AS TabLatest
ON TabLatest.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND
TabLatest.MAXDumpNumber <> Mytable.DumpNumber
-- Note that the <> is VERY important
GROUP BY Mytable.PrimaryKeyFields
) AS T1
ON T1.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND T1.MAXDumpNumber = Mytable.DumpNumber
Save the two queries above as MS Access named queries (or SQL Server views) and then treat them like tables to do comparisons.
Make sure you have indexes created on the PK fields and the DumpNumber, and that they are unique - this will speed things up.
Finish it in time for Christmas... and flag this as an answer!
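The latest-vs-previous idea can be sketched end to end with Python's built-in sqlite3 module. The table and columns below are made up (a work-order history with a DumpNumber per daily snapshot), but the two queries mirror the SQL above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Mytable (WorkOrder TEXT, DumpNumber INTEGER, Dept TEXT)")
cur.executemany("INSERT INTO Mytable VALUES (?, ?, ?)", [
    ("WO-1", 1, "A"), ("WO-1", 2, "B"), ("WO-1", 3, "C"),
    ("WO-2", 1, "A"),
])

# Latest row per work order: the highest dump number seen.
latest = cur.execute("""
    SELECT m.WorkOrder, m.Dept
    FROM Mytable m
    JOIN (SELECT WorkOrder, MAX(DumpNumber) AS MaxDump
          FROM Mytable GROUP BY WorkOrder) t
      ON t.WorkOrder = m.WorkOrder AND t.MaxDump = m.DumpNumber
    ORDER BY m.WorkOrder
""").fetchall()
print(latest)  # [('WO-1', 'C'), ('WO-2', 'A')]

# Previous row per work order: the highest dump number below the latest.
previous = cur.execute("""
    SELECT m.WorkOrder, m.Dept
    FROM Mytable m
    JOIN (SELECT m2.WorkOrder, MAX(m2.DumpNumber) AS PrevDump
          FROM Mytable m2
          JOIN (SELECT WorkOrder, MAX(DumpNumber) AS MaxDump
                FROM Mytable GROUP BY WorkOrder) t2
            ON t2.WorkOrder = m2.WorkOrder AND m2.DumpNumber <> t2.MaxDump
          GROUP BY m2.WorkOrder) p
      ON p.WorkOrder = m.WorkOrder AND p.PrevDump = m.DumpNumber
    ORDER BY m.WorkOrder
""").fetchall()
print(previous)  # [('WO-1', 'B')] -- WO-2 has only one dump, so no previous row
```

Note how the `<>` in the inner join excludes the latest dump before taking MAX again, which is exactly why it matters: without it the "previous" query would just return the latest rows again.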
I have a very simple table where I keep player bans. There are only two columns (not actually, but for simplicity's sake) - a player's unique ID (uid) and the ban expire date (expiredate)
I don't want to keep expired bans in the table, so all bans where expiredate < currentdate need to be deleted.
To see if a player is banned, I query this table with his uid and the current date to see if there are any entries. If there are - we determine that the player is banned.
So I need to run two queries. One that would fetch me the bans, and another that would clean up the table of redundant bans.
I was wondering if it would be possible to combine these queries into one. Select and return the entry if it is still relevant, and remove the entry and return nothing if it is not.
Are there any nice ways to do this in a single select query?
Edit:
To clarify, I actually have some other information in the table, such as the ban reason, the ban date, etc. so I need to return the row as well as delete irrelevant entries.
Unfortunately you cannot have a delete statement inside a select statement... I got you a link from the SQLite docs for your reference.
Select statements are only used for retrieving data. Although a select can be used in a sub-query for a delete statement, it is still only retrieving data.
You must execute the two statements separately in your case.
delete from player_bans
where expiredate < current_date;
Is this what you are trying to do?
DELETE FROM TableName
WHERE expiredate < CURDATE();
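Since the delete and the lookup cannot be combined into one select, the usual pattern is to run them back to back, ideally in the same transaction. Here is a minimal sketch in Python with the stdlib sqlite3 module; the column names follow the question, and the extra `reason` column stands in for the other ban details mentioned in the edit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE player_bans (uid TEXT, expiredate TEXT, reason TEXT)")
cur.executemany("INSERT INTO player_bans VALUES (?, ?, ?)", [
    ("p1", "2099-01-01", "cheating"),  # still active
    ("p2", "2000-01-01", "spam"),      # long expired
])

def get_ban(uid):
    # Statement 1: purge anything already expired.
    cur.execute("DELETE FROM player_bans WHERE expiredate < date('now')")
    # Statement 2: fetch the (still relevant) ban row, if any.
    row = cur.execute(
        "SELECT uid, expiredate, reason FROM player_bans WHERE uid = ?", (uid,)
    ).fetchone()
    conn.commit()
    return row

print(get_ban("p1"))  # ('p1', '2099-01-01', 'cheating')
print(get_ban("p2"))  # None -- the expired ban was cleaned up first
```

Running the delete first means the select only ever sees relevant bans, so "row returned" maps directly to "player is banned".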
I'm trying to think of an efficient way to allow a group of people to work through a queue of data entry tasks. Previously we've just had one person doing this so it hasn't been an issue. The back-end is an RDBMS and the front-end is a web-application.
Currently we do something like this:
To assign a record for editing:
SELECT * FROM records WHERE in_edit_queue LIMIT 1;
Then,
To save changes to a previously assigned record:
UPDATE records SET ..., in_edit_queue = false
WHERE id = ? AND in_edit_queue = true;
This means it's possible for two users to be assigned the same record to edit, and we favor the first one that submits, failing silently on subsequent submissions, e.g.:
User A loads up record 321 for editing
User B loads up record 321 for editing
User B submits changes (they are saved in the DB)
User A submits changes (they are not saved in the DB)
(Note: We can trust all our users to submit acceptable data, so there is no need for us to keep the data from the second UPDATE.)
The problem with this method is when users start at the same time and edit at roughly the same speed, they are often updating the same records but only 1 of them is getting saved. In other words, wasting a lot of man-hours. I can mitigate this to some extent by picking random rows but I'd prefer something a bit more guaranteed.
So here's what I'm thinking...
Have a table called: locked_records (record_id integer, locked_until timestamp)
-- Assign a record for editing:
-- Same as before but also make sure the
-- record is not listed in locked_records...
SELECT * FROM records
WHERE in_edit_queue AND id NOT IN (
SELECT record_id FROM locked_records
WHERE locked_until > now() )
LIMIT 1;
-- ..and effectively remove it from
-- the queue for the next 5 minutes
INSERT INTO locked_records (record_id, locked_until)
VALUES (?, now() + interval '5 minutes');
Then:
UPDATE records SET ..., in_edit_queue = false
WHERE id = ? AND in_edit_queue = true;
DELETE FROM locked_records WHERE record_id = ?;
A typical edit takes about 30 seconds to 1 minute, so 5 minutes out of the queue should be a good amount. I can also have an XHR on the web app keep extending the lock if that turns out to be advantageous.
Can anyone offer thoughts on this? Sound like a good way of doing things? Sound like a terrible way? Done this before? I'd love to hear some feedback.
Thanks!
J
What about the RDBMS's internal list of locks? Would altering the SELECT statement to SELECT ... FOR UPDATE be an option?
Another idea: these records have two additional columns: assigned_to and completed.
When someone wants to edit a record, do something like
update records set assigned_to = ? # assigning to 'me'
where assigned_to is null
and completed = false
limit 1 # only assign one record at a time
Then, to get that row back:
select ...
from records
where assigned_to = ? # assigned to 'me'
and completed = false
And once you're done you set the completed to 'true'.
You could have an additional timestamp column for when a record was assigned to someone, and then add an OR alternative to the "assigned_to is null" part of the where clause in the update statement above, requiring a certain recency for an assignment to remain valid.
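The claim-then-fetch idea can be sketched with Python's built-in sqlite3 module. SQLite has no UPDATE ... LIMIT, so this sketch claims one row via a subquery instead; the column names are the hypothetical ones from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE records (
    id INTEGER PRIMARY KEY, payload TEXT,
    assigned_to TEXT, completed INTEGER DEFAULT 0)""")
cur.executemany("INSERT INTO records (payload) VALUES (?)", [("r1",), ("r2",)])

def claim(worker):
    # Atomically assign one unclaimed, uncompleted record to this worker.
    cur.execute("""
        UPDATE records SET assigned_to = ?
        WHERE id = (SELECT id FROM records
                    WHERE assigned_to IS NULL AND completed = 0
                    ORDER BY id LIMIT 1)
    """, (worker,))
    conn.commit()
    # Fetch back the row now assigned to 'me'.
    return cur.execute(
        "SELECT id, payload FROM records WHERE assigned_to = ? AND completed = 0",
        (worker,)).fetchone()

a = claim("alice")
b = claim("bob")
print(a, b)  # (1, 'r1') (2, 'r2') -- each worker gets a different record
```

Because the assignment happens inside a single UPDATE, two workers can never claim the same record, which is exactly the double-edit problem the question describes.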