Can an mft_reference correspond to two different files at different times? - ntfs

I am working on parsing USN Journal files, and what I know is that each USN Journal log entry contains an mft_reference field, which references the corresponding FileRecord in the MFT table.
After a period of time, the USN Journal may accumulate quite a lot of file change records, such as file additions, file modifications, and file deletions.
If I get an mft_reference number (a 64-bit integer) mft_refer_1 at the very beginning of the USN Journal file, and another mft_reference number mft_refer_2 at the end of the USN Journal file, and they are equal in value (mft_refer_1 == mft_refer_2), can I say the two journal records specify the same file? What I am not quite sure about is whether a later-added FileRecord can take the position of a previously deleted FileRecord.
Thank you in advance!

I figured this out by experimenting with the "fsutil usn" tool.
First we should know how mft_refer is composed:
0xAAAABBBBBBBBBBBB, where AAAA stands for the update (sequence) number and BBBBBBBBBBBB stands for the FileRecord index into the MFT table.
First I created a text document named "daniel.txt" and found that its mft_refer was 0x00050000000c6c3f.
Then I deleted it to the Recycle Bin; its name changed to something like "$R2QW90X.txt", but its mft_refer was still 0x00050000000c6c3f.
Next I deleted it permanently from the Recycle Bin and created another document, also named "daniel.txt"; the new document's mft_refer was 0x00040000000c6c48.
I then created several other temporary files, and one of these files occupied the 0xc6c3f-th file record with an updated mft_refer of 0x00060000000c6c3f.
So my conclusion is that file record space is very precious in the MFT: if a previous file has been permanently deleted, its file record slot will be reclaimed for a newly created file, but the "update number" field in mft_refer will be incremented. Two journal records with equal 64-bit mft_reference values therefore do specify the same file.
For the detailed experiment process, see here
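Given that 0xAAAABBBBBBBBBBBB layout, the two fields can be pulled apart with plain bit operations. A minimal Python sketch; the function name is my own, and the constants are the values observed in the experiment above:

```python
# Split a 64-bit NTFS MFT reference into its two components,
# following the 0xAAAABBBBBBBBBBBB layout described above.

def parse_mft_reference(ref):
    """Return (update_number, record_index) for a 64-bit mft_refer."""
    update_number = ref >> 48            # high 16 bits (AAAA)
    record_index = ref & 0xFFFFFFFFFFFF  # low 48 bits (BBBBBBBBBBBB)
    return update_number, record_index

# The two references from the experiment share a record index but
# differ in update number, so they name different files:
print(parse_mft_reference(0x00050000000C6C3F))  # (5, 814143)
print(parse_mft_reference(0x00060000000C6C3F))  # (6, 814143)
```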

Related

Suggestions for file and data transforms using SQL Query Results to manipulate existing PDF Files

Apologies if something similar to the question I'm asking has already been addressed. I'm not even sure how to best frame my question but I haven't been able to find any posts that are obviously germane. I'm hoping someone has some experience with this and might be willing to offer some suggestions. My company has already contracted to have the bulk of our database converted to HTML for ETL purposes and we simply can't afford to double the already barely-manageable costs of this project by adding this additional requirement to the scope.
We have a SQL database from an EMR software vendor that our company has now left. Due to recent economic factors, we just can't afford to stay with them any longer. When we left, this ex-vendor begrudgingly provided us with a backup copy of our SQL database along with copies of all the scanned images our users have uploaded via their application GUI over the years. I was told they stored the uploads as BLOB data, but it turns out they didn't. They weren't actually storing the files in the database at all. Instead, they moved each image to a storage location and wrote the ID, DocType, Filename, DirPath and other document information to the Document table of the DB. It makes sense but leaves us in a bind, mainly because the filenames appear to have been randomly generated at upload. So we now have 50,000 image files with unintelligible filenames stored in a date-based folder structure, with no way to correlate any of them with the patients to whom they belong. A couple of examples are as follows:
/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf
/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg
I compiled a list of attribs so I can make any of them available to the transform. I pulled:
SELECT * FROM document d
JOIN patients p ON d.PatientId = p.pid
JOIN users u ON d.PatientId = u.uid
WHERE u.UserType = '3' AND d.fileformat is NOT NULL AND d.dirpath LIKE 'm%'
ORDER BY u.ulname;
This gave me all patient and document attribs, resulting in a list with 197 columns. The challenge is that the new EMR vendor can only import these files if all the files for each patient are in a dedicated folder at the patient level, so I need the files in a new folder structure. I am trying to do it without abandoning things like PatientID, Scan Date, Description (the customName column), Scanned By, and possibly a couple of other points.
I'll probably end up making the file name something like a concat of customName+docID for identification purposes. Then I'll just need to get the files in something like a /Patient/Docs.extension folder structure.
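To illustrate the naming scheme just described, here is a hedged Python sketch of building the per-patient destination path. The values passed in are placeholders for the PatientId, customName, and docID columns, not data from the actual database:

```python
import os

def destination_path(root, patient_id, custom_name, doc_id, ext):
    """Build <root>/<PatientId>/<customName>_<docID>.<ext>."""
    filename = "{}_{}.{}".format(custom_name, doc_id, ext)
    return os.path.join(root, str(patient_id), filename)

# hypothetical values, purely for illustration
print(destination_path("new_root", 1017, "DriversLicense", 42, "pdf"))
```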
I went ahead and flattened all the files into a single folder figuring that would make it easier to manipulate. I batched them out like so:
md "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\All\"
cd /d "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\"
for /r %d in (*) do copy "%d" "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\All\"
Now I have them all together.
Screenshot
I still have to figure out how to get them into the new folder structure by patient though.
Just to have it mentioned, I was originally considering using SQL so I could recreate the files and assign the desired attribs as file attribs in one step.
To answer the question asked about the HTML conversion, we have tons of Progress Notes, Doctors Notes, Prescriptions, etc in the database. The only way to get them to the new EMR is to export them to HTML and group them at the patient level so the new vendor can import them.
Honestly, after having to wrestle with all this garbage, I would prefer to avoid this situation in the future by refusing to upload them to the new EMR at all. Instead, just put all these documents on OUR file server and give the new EMR a hyper-link to insert into each patient's patient record that would open all the patient files. The new EMR is browser-based so it could be feasible but I doubt I'll be able to get them to write files to our file server moving forward so doing so would likely just end up making the end-user experience more disjointed.
I don't think your contractors did anything wrong, tbh. Taking uploaded files with all their problem characters/duplicated names (got more than one patient called JohnSmith.jpg?) and renaming them to a GUID so they can coexist alongside other images without overwriting them is (a) sensible and (b) what I would do.
I also wouldn't store images inside a database, because then the only thing you can do with them is get them out again, something you have to do every time you want to do anything with them. Being able to map an images folder to a URL on your web server and then send HTML using just the file name means that the web server can serve the image without having to pull it out of the db; the db doesn't have to involve itself in pointless IO.
The way to correlate these images with the patients to whom they belong is through the database. Somewhere else in the db structure there will be, e.g., a Patient record with a DocumentId column that links to this document record, or a PatientDocuments table that holds PatientId/DocumentId pairs.
If there is not, then storing the document bytes in the db wouldn't have helped relate them to the patient either, because this relation is not about where the bytes of an image are; it's about what other data was stored to make a usable system. As it stands, your idea of uploading tens of thousands of images into a db just so you can... er... get them all out again would seem to indicate you haven't yet fully grasped the reasons behind why your contractors did what they did.
Because you're under the impression that you can do this, you presumably know how the db relates a document to a patient (if it doesn't, then your proposed process will fail), and as such you can arrange a suitable renaming process without needing to move the image data anywhere. In essence, you're failing to see that a file system storing file data against unique paths is no different from a database table storing file data against unique ids. Your documents table thus effectively links to your file system; the file system can be viewed as an extension of the documents table. You need the other tables in the db to make sense of the files, but you need the other tables in a db to make sense of any table in a db. These are key concepts of modelling related data.
I don't recommend you undertake the process you propose, but I'm sure that won't dissuade you. Consider then (because you didn't really post any details we can work with) this assumed scenario:
Patients
Name,DocumentId
John Smith,1
Jane Doe,2
Documents
Id,FilePath
1,'/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf'
2,'/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg'
SELECT CONCAT('REN ', d.FilePath, ' "', p.Name, RIGHT(d.FilePath, 4), '"')
FROM
Patients p
INNER JOIN Documents d ON p.DocumentId = d.Id
The results of the query are essentially a batch file full of rename commands that rename each file after its patient's name.
And now all your multiple patients with the same name will overwrite each other, and everything will end up in a mess.
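A hedged sketch of one way around that collision: include the unique document id in the new name. This is plain Python over the same assumed two-table example, with a third row invented here purely to force the duplicate-name case:

```python
import os

# assumed data mirroring the Patients/Documents example above,
# plus an invented second "John Smith" to create a name clash
patients = [("John Smith", 1), ("Jane Doe", 2), ("John Smith", 3)]
documents = {
    1: "/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf",
    2: "/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg",
    3: "/root/2021/01012021/00000000-0000-0000-0000-000000000000.pdf",
}

commands = []
for name, doc_id in patients:
    path = documents[doc_id]
    ext = os.path.splitext(path)[1]
    # "John Smith_1.pdf" and "John Smith_3.pdf" cannot collide
    commands.append('REN "{}" "{}_{}{}"'.format(path, name, doc_id, ext))

for cmd in commands:
    print(cmd)
```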
It also makes my point for me about "don't store files in the db": look how easy it is to manipulate files when they're in a file system, using existing commands that understand filesystems and files and can do things like rename files, extract EXIF data, rotate, resize and print. If all those images were in your db, the only thing you could do with them is get them out again; SQL Server cannot rotate, resize or print BLOB data, but there are thousands of tools out there that understand files and can convert them. Those tools cannot understand your db, so putting files into a db saddles you with the problem that they become useless until dug out again.
Your contractors may not have been as daft as you think; pause a moment before you set about hacking apart all they did, and question whether your reason for doing so is actually correct. If Jane from reception needs to see a picture of John Smith with driver's license XY1234 to ID him, don't provide her with a shared drive full of everyone's pictures and let her double-click, drag, and accidentally delete her way around the file system. Provide her with an app that looks in the db, gets the unintelligible but helpfully unique filename off disk, and opens it for her to view. And make the file system read-only to everyone other than the app, so that users can't break things.

How can I control multiple users accessing the same database

I am making my first project using VB.NET and Access. I am trying to develop a project in which patient data is added from different counters. What I developed works fine when only one counter adds data to the database, but when two counters access the same database and try to save a record, it gives the error that the primary key cannot be duplicated.
What I am doing: first I generate a primary key number, i.e. a patient no. that is unique to every patient. The patient no. is one increment past the last saved record. Then the user (counter data entry operator) enters the patient details and hits the save button.
In a multi-user environment, both operators generate the same patient no. when they hit the new record button, as both see the same last saved record. While saving, one operator saves the record successfully, but the other operator gets the duplicate primary key error.
Pessimistic and optimistic locks are not working for me, or I am not understanding how to use them.
Dim rsS As New ADODB.Recordset
rsS.Open(str, conn, ADODB.CursorTypeEnum.adOpenDynamic, ADODB.LockTypeEnum.adLockPessimistic)
What I have tried:
I also tried to solve this problem by saving the patient no. in another variable, oldPatientno, and before saving the record I checked whether there was any change in the database; if so, I regenerated the patient no. But this is not working.
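For what it's worth, the usual fix for this race is to let the insert itself detect the clash and retry with a regenerated number, rather than checking beforehand. A hedged Python/sqlite3 sketch of the retry pattern; the table and column names are invented, but the same idea applies to a VB.NET/Access backend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_no INTEGER PRIMARY KEY, name TEXT)")

def save_patient(conn, name, max_retries=5):
    """Generate the next patient no.; retry if another counter won the race."""
    for _ in range(max_retries):
        last = conn.execute(
            "SELECT COALESCE(MAX(patient_no), 0) FROM patients").fetchone()[0]
        try:
            conn.execute("INSERT INTO patients VALUES (?, ?)", (last + 1, name))
            conn.commit()
            return last + 1
        except sqlite3.IntegrityError:
            continue  # duplicate key: regenerate the number and try again
    raise RuntimeError("could not allocate a patient number")

print(save_patient(conn, "John Smith"))  # 1
print(save_patient(conn, "Jane Doe"))    # 2
```

An Access autonumber column removes the problem entirely, since the database allocates the key for you.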

How do I prevent a duplicate file from being imported into an access table

I have an Access database I am building for sales reporting. I am automating a process to import sales transactions from the point of sale on a weekly basis. I want to develop a way to perform a simple check to validate that the file being imported is not the same as the previous week's file.
The file will always have the same file name and be in the same folder, which Access will look in when the macro I have written runs.
My proposed solution was to create a staging table for loading the sales transactions into, and a backup of that staging table for comparison. Each week I would back up the staging table, which would contain last week's transactions, and then load the new file into the staging table. To validate that the newly loaded file is not identical to the previous week's, I would sum the values in the "total sell" column of the backup table and the staging table and compare them.
I need help creating the code/query to do this and inserting it into the macro I have built, or help coming up with any other solutions.
I have searched quite a bit on the web but haven't found a solution.
This is a link to sample data
https://drive.google.com/file/d/0BwD_Ubcf_4voSnN2elFvTWI2QTA/view?usp=sharing
Please read my comment on #Gene Skuratovsky's solution. I'd suggest creating another table (in pseudo code):
TABLE ImportedFiles
(
ImportID Integer PrimaryKey,
FilePath String,
ImportDate Date
)
Before you start importing a file, you need to check whether a record for it already exists in that table. ;)
You can use the DLookup function to check whether the record exists.
Function FileHasBeenImported(ByVal sFullFileName As String) As Boolean
    FileHasBeenImported = Not IsNull(DLookup("[FilePath]", "ImportedFiles", "[FilePath] = '" & sFullFileName & "'"))
End Function
Each record should be DateTime "stamped" (you would certainly design something like a [Transaction_DateTime] field into your record structure). If the files you import are indeed strictly weekly, you can check just one record to see whether this is a new file or an old one. Otherwise, check them all.
Edit:
You do not need the Time part; the Date is sufficient. Assuming you import data into a recordset, you will need something like this (brute force will work just fine, and quickly too):
rstX.MoveFirst
Do
    If rstX("Trans Date") > Date - 7 Then
        MsgBox "Found a transaction less than 1 week old!"
        Exit Do
    End If
    rstX.MoveNext
Loop Until rstX.EOF
If rstX.EOF Then
    MsgBox "All transactions are at least 1 week old!"
End If
Modify this as appropriate.
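An alternative to summing the "total sell" column, sketched here in Python rather than Access/VBA: hash the whole file and skip the import when the hash matches the previous week's. The file names and contents below are invented for the demo:

```python
import hashlib
import os
import tempfile

def file_digest(path):
    """SHA-256 of the file's bytes; identical files give identical digests."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# demo with two throwaway files holding the same content
with tempfile.TemporaryDirectory() as d:
    this_week = os.path.join(d, "sales_this_week.csv")
    last_week = os.path.join(d, "sales_last_week.csv")
    for p in (this_week, last_week):
        with open(p, "w") as f:
            f.write("total sell\n100\n")
    if file_digest(this_week) == file_digest(last_week):
        print("Same file as last week - skip the import")
```

Unlike a column sum, a hash also catches two different files whose totals happen to be equal.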

Optimal way to add / update EF entities if added items may or may not already exist

I need some guidance on adding/updating SQL records using EF. Let's say I am writing an application that stores info about the files on a hard disk in an EF4 database. When you press a button, it will scan all the files in a specified path (maybe the whole drive) and store information in the database like the file size, change date, etc. Sometimes a file will already be recorded from a previous run, so its properties should be updated; sometimes a batch of files will be detected for the first time and will need to be added.
I am using EF4, and I am seeking the most efficient way of adding new file information and updating existing records. As I understand it, when I press the search button and files are detected, I will have to check for the presence of a file entity, retrieve its ID field, and use that to add or update related information; if it does not already exist, I will need to create a tree that represents it and its related objects (e.g. its folder path) and add that. I will also have to handle merging the folder path object.
It occurs to me that if there are many millions of files, as there might be on a server, loading the whole database into the context is neither ideal nor practical. So for every file, I might conceivably have to make a round trip to the database on disk to detect whether the entry already exists, retrieve its ID if it does, and then make another trip to update it. Is there a more efficient way to insert/update multiple file object trees in one trip to the DB? If there were an entity context method like 'Insert If It Doesn't Exist And Update If It Does', for example, then I could wrap multiple operations in a transaction.
I imagine this is a fairly common requirement; how is it best done in EF? Any thoughts would be appreciated. (Oh, my DB is SQLite, if that makes a difference.)
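Since the DB here is SQLite, it's worth noting that the wished-for "Insert If It Doesn't Exist And Update If It Does" does exist at the SQL level as UPSERT (SQLite 3.24+), even if EF4 won't emit it for you. A hedged sketch using the sqlite3 module directly, just to show the shape of the statement; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER, mtime TEXT)")

scanned = [
    ("/data/a.txt", 100, "2020-01-01"),
    ("/data/b.txt", 200, "2020-01-02"),
    ("/data/a.txt", 150, "2020-02-01"),  # already present: becomes an update
]

# whole batch in one transaction; no per-file existence query needed
with conn:
    conn.executemany(
        """INSERT INTO files (path, size, mtime) VALUES (?, ?, ?)
           ON CONFLICT(path) DO UPDATE SET size = excluded.size,
                                           mtime = excluded.mtime""",
        scanned,
    )

print(conn.execute("SELECT size FROM files WHERE path = '/data/a.txt'").fetchone()[0])  # 150
```

From EF you could reach the same effect by sending the statement as raw SQL for the batch, rather than round-tripping per file.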
You can check whether the record already exists in the DB. If not, create and add it. You can then set the fields of the record that are common to insert and update, as in the sample code below.
var strategy_property_in_db = _dbContext.ParameterValues.Where(r => r.Name == strategy_property.Name).FirstOrDefault();
if (strategy_property_in_db == null)
{
strategy_property_in_db = new ParameterValue() { Name = strategy_property.Name };
_dbContext.AddObject("ParameterValues", strategy_property_in_db);
}
strategy_property_in_db.Value = strategy_property.Value;

Fortran 90 OPEN file

I've been working on my project about bank account transactions (withdraw, deposit, check cashing, and balance inquiry) using "account.txt".
My TA said that I have to use a temporary file. This temporary file will be read line by line to find what the user is looking for. However, I did not understand this temporary OPEN file at all. Can anyone explain what it is and, if possible, attach an example?
Here are the project instructions:
This project is about writing a program to perform transactions on bank accounts. You will be given a file which contains all the accounts in the bank (the file is named “account.txt”). Your program is to provide an interactive menu for users to perform transactions on these accounts. Your program needs to update the account file after each transaction. The user may perform transactions on accounts that are not available; your program needs to print an error message on the screen and return to the menu. In addition, your program needs to print whether a transaction is successful. For an unsuccessful transaction, your program will print out the reason for the failure.
Your program needs to be able to handle the following transactions:
Deposit money into an account
Withdraw money from an account
Check cashed against an account
Balance inquiry of an account
There is a limit on how many checks can be cashed against a saving account. The limit is 2 checks per month. There is a $0.25 penalty for each check cashed over the limit. If there are enough funds to cash the check but not the penalty, the transaction should go through and the resulting balance would be zero.
Here is the format in the account file for one account (data fields are separated by exactly one space):
Account type, S for saving, C for checking (1 character)
Account number of 5 digits
Last name of account holder (15 characters)
First name of account holder (15 characters)
Balance of the account in the form xxxxx.xxx
An integer field indicating how many checks have been cashed this month (three digits)
An interest rate in the form of xx.xx (e.g. 10.01 = 10.01%)
For names with fewer than 15 characters, the data will be padded to have width of 15 characters.
Here is an example of the account file:
C 12345 Smith John 100.000 10 0.00
S 45834 Doe Jane 3462.340 0 0.30
C 58978 Bond Jones 13.320 5 0.00
*Creating a temporary file
There is a way in FORTRAN to create a temporary file. Use:
OPEN(UNIT = , STATUS = "SCRATCH", ...)
There is no need to provide (FILE = ””). By using a temporary file, you can copy the accounts from the account file to the temporary file. Then, as you copy the data back from the temporary file to the account file, perform the necessary transactions. Your program should not copy accounts between these two files if a transaction is going to fail.
Please forgive my english, I'm Japanese.
They are saying that with a statement such as:
OPEN (7, ACCESS = 'DIRECT', STATUS = 'SCRATCH')
you can create a temporary file, one that will only live until you close it and will not be saved to disk. This file needs no name (it is never going to be referred to by name), just a unit number (in my example, 7).
You can use this file to hold the account information temporarily during a transaction. You need this because, when you are inserting rows into the real file, you don't want to overwrite subsequent data. So they are saying:
Copy everything to a temporary file
If the transaction succeeds, copy the data back to the main file but
Omit rows that are to be deleted
Add in the rows that are to be inserted
Does that help?