Optimal way to add/update EF entities when added items may or may not already exist - SQL

I need some guidance on adding / updating SQL records using EF. Let's say I am writing an application that stores info about files on a hard disk in an EF4 database. When you press a button, it will scan all the files in a specified path (maybe the whole drive) and store information in the database such as the file size, change date, etc. Sometimes the file will already be recorded from a previous run, so its properties should be updated; sometimes a batch of files will be detected for the first time and will need to be added.
I am using EF4, and I am seeking the most efficient way of adding new file information and updating existing records. As I understand it, when I press the search button and files are detected, I will have to check for the presence of a file entity, retrieve its ID field, and use that to add or update related information; but if it does not exist already, I will need to create a tree that represents it and its related objects (e.g. its folder path) and add that. I will also have to handle merging the folder path object as well.
It occurs to me that if there are many millions of files, as there might be on a server, loading the whole database into the context is not ideal or practical. So for every file, I might conceivably have to make a round trip to the database on disk to detect whether the entry exists already, retrieve its ID if it does, and then make another trip to update it. Is there a more efficient way to insert/update multiple file object trees in one trip to the DB? If there were an Entity context method like 'Insert If It Doesn't Exist And Update If It Does', for example, then I could wrap up multiple of these in a transaction.
I imagine this is a fairly common requirement; how is it best done in EF? Any thoughts would be appreciated. (Oh, my DB is SQLite, if that makes a difference.)

You can check if the record already exists in the DB. If not, create and add the record. You can then set the fields of the record which are common to both insert and update, as in the sample code below.
// look up the existing row by name; null means it has not been stored yet
var strategy_property_in_db = _dbContext.ParameterValues().Where(r => r.Name == strategy_property.Name).FirstOrDefault();
if (strategy_property_in_db == null)
{
    // not found: create it and attach it to the context for insertion
    strategy_property_in_db = new ParameterValue() { Name = strategy_property.Name };
    _dbContext.AddObject("ParameterValues", strategy_property_in_db);
}
// set the fields that are common to both the insert and the update paths
strategy_property_in_db.Value = strategy_property.Value;
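If the per-file round trips turn out to be the bottleneck, one option (plainly a different technique from the change-tracker approach above) is to push the upsert down to SQLite itself, e.g. batched through ObjectContext.ExecuteStoreCommand inside a single transaction. This is only a minimal sketch: the Files table, its columns and the unique index on FullPath are all assumptions, and INSERT ... ON CONFLICT needs SQLite 3.24 or newer (older builds only have INSERT OR REPLACE, which rewrites the whole row).
BEGIN TRANSACTION;

-- one statement per detected file; in practice these would be parameterised
INSERT INTO Files (FullPath, SizeBytes, ChangedUtc)
VALUES ('C:\scan\report.txt', 1024, '2012-05-01T10:00:00Z')
ON CONFLICT (FullPath) DO UPDATE SET
    SizeBytes = excluded.SizeBytes,
    ChangedUtc = excluded.ChangedUtc;

COMMIT;
Batching many of these into one transaction keeps it to a single trip per batch rather than two trips per file.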

Related

How to import updated records from XML files into SQL Database?

For the last few weeks I have been trying to design an SSIS package that reads some XML files I have and moves the data from them into the multiple tables I want.
These files contain different nodes like Individual (parent node) and Address, Alias, Articles (all child nodes of Individual), etc.
Data in those files look like this:
<Individuals>
  <Individual>
    <UniqueID>1001</UniqueID>
    <Name>Ben</Name>
    <Soft_Delete>N</Soft_Delete>
    <Soft_Delete_Date>NULL</Soft_Delete_Date>
  </Individual>
  <Addresses>
    <Address>
      <Address_Line_1>House no 280</Address_Line_1>
      <Address_Line_2>NY</Address_Line_2>
      <Country>US</Country>
      <Soft_Delete>N</Soft_Delete>
      <Soft_Delete_Date>NULL</Soft_Delete_Date>
    </Address>
    <Address>
      <Address_Line_1>street 100</Address_Line_1>
      <Address_Line_2>California</Address_Line_2>
      <Country>US</Country>
      <Soft_Delete>N</Soft_Delete>
      <Soft_Delete_Date>NULL</Soft_Delete_Date>
    </Address>
  </Addresses>
</Individuals>
I was successful in designing it and now I have a different task.
The files I had were named like this: Individual_1.xml,Individual_2.xml,Individual_3.xml etc.
Now I have received some new files which are named like this:
Individual_UPDATE_20220716.xml,Individual_UPDATE_20220717.xml,Individual_UPDATE_20220718.xml,Individual_UPDATE_20220720.xml etc
Basically these files contain the updated information for previously inserted records
OR
totally new records.
For example:
A record, or a particular piece of information like the Address of an Individual, was soft deleted.
Now I am wondering how would I design or modify my current SSIS package to update the data from these new files into my database?
Any guidance would be appreciated....
Thank you...
It looks like you have no problem reading the XML, so I won't really talk about that. @Yitzack's comment on his prior answer is "a" way to do it. However, his answer assumes you can create staging tables. To do this entirely inside SSIS, the way to do it is as follows...
I would treat all the files the same (as long as they have the same data structure, and it seems like that is the case).
Read the XML as a source.
Proceed to a lookup.
Set the lookup to ignore errors (this is handled in the next step)
Point your lookup at the destination table, look up UniqueID, and add it to the data flow. Since you told it to ignore errors, SSIS will insert a null in that field if the lookup fails to find a match.
Add a Conditional Split based on destination.UniqueID == null; call that output "inserts" and rename the default output "updates".
Add a SQL statement to update the existing record and map your row to it (this is somewhat slow, which is why a staging-table merge is better with large data sets; see the sketch after this list).
Connect the "updates" output of the Conditional Split to the update SQL statement and map appropriately.
Add an insert destination, connect the "inserts" output of the Conditional Split, and map.
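For completeness, here is a rough sketch of the staging-table merge alluded to above. It assumes the data flow has already landed the parsed rows in a staging table; every object and column name below (dbo.Individual, dbo.Individual_Staging and the columns) is an assumption based on the sample XML, not something from the original answer.
-- set-based upsert from the staging table into the destination
MERGE dbo.Individual AS tgt
USING dbo.Individual_Staging AS src
    ON tgt.UniqueID = src.UniqueID
WHEN MATCHED THEN
    UPDATE SET tgt.Name = src.Name,
               tgt.Soft_Delete = src.Soft_Delete,
               tgt.Soft_Delete_Date = src.Soft_Delete_Date
WHEN NOT MATCHED BY TARGET THEN
    INSERT (UniqueID, Name, Soft_Delete, Soft_Delete_Date)
    VALUES (src.UniqueID, src.Name, src.Soft_Delete, src.Soft_Delete_Date);
You would typically run this in an Execute SQL Task after the data flow and then truncate the staging table.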
Note: It looks like you are processing from a file system, and it is likely that order is important. You may have to order your foreach loop. I will provide a simple example that you can modify.
Create a filesInOrder variable of type object.
Create a script task.
Add filesInOrder as a read/write variable
Enter script...
var diFiles = new System.IO.DirectoryInfo(@"path to your folder").GetFiles("*.xml");
// order by creation time (oldest first) and keep just the full paths
var files = diFiles.OrderBy(o => o.CreationTime).Select(s => s.FullName);
// hand the ordered list back to the package through the object variable
Dts.Variables["filesInOrder"].Value = files.ToArray();
Make sure you add using System.Linq; to your code.
Finally, use filesInOrder as the source for a foreach component based on an ADO enumerator.

Multiple users accessing a linked table occasionally see a message "Cannot update. Database or object is read-only"

We have a split MS Access database. When users log on, they are connected/linked to two separate Access databases (one for the specific project they are working on, and one for record locking and other global settings). The "locking" database is the one I need to find a solution for.
One of the tables, "tblTS_RecordLocking", simply stores a list of user names and the record ID of the record they are editing. It never has more than 50 records - usually closer to 5-10. But before a user can do anything on a record, the code opens "tblTS_RecordLocking" to see if the record is in use (so this happens a lot):
Set recIOC = CurrentDb.OpenRecordset("SELECT tblTSS_RecordLocking.* FROM tblTSS_RecordLocking WHERE (((tblTSS_RecordLocking.ProjectID_Lock)=1111) AND ((tblTSS_RecordLocking.RecordID_Lock)=123456));", , dbReadOnly)
If it's in use, the user simply gets a message and the form/record remains locked. If not in use, it will close the recordset and re-open it so that the user name is updated with the Record ID:
Set recIOC = CurrentDb.OpenRecordset("SELECT tblTSS_RecordLocking.* FROM tblTSS_RecordLocking WHERE (((tblTSS_RecordLocking.UserName_Lock)='John Smith'));")
If recIOC.EOF = True Then
    recIOC.AddNew
    recIOC.Fields![UserName_Lock] = "John Smith"
Else
    recIOC.Edit
End If
recIOC.Fields![RecordID_Lock] = 123456
recIOC.Fields![ProjectID_Lock] = 111
recIOC.Update
recIOC.Close: Set recIOC = Nothing
As soon as this finishes, everything relating to the second database is closed down (and the .laccdb file disappears).
So here's the problem. On very rare occasions, a user can get a message:
3027 - Cannot update. Database or object is read-only.
On even rarer occasions, it can flag the db as corrupt and needing to be compressed and re-indexed.
I really want to know the most reliable way to do the check/update. The process may run several hundred times in a day, and whilst I only see the issue once every few weeks, and for the most part handle it cleanly (on the front-end), I'm sure there is a better, more reliable way.
I agree with mamadsp that moving to SQL is the best option and am already in the process of doing this. However, whilst I was not able to create a fix for this issue, I was able to find a work-around that has yet to fail.
Instead of having a single lock table in the global database, I found that creating a lock table in the project database solved the problem. The main benefit of this is that there is much less activity on the table. So, not perfect - but it is stable.
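For reference, a minimal sketch of what such a per-project lock table might contain, reusing the field names from the code above (the DDL itself, and the type choices, are assumptions):
CREATE TABLE tblTS_RecordLocking (
    UserName_Lock TEXT(50),
    ProjectID_Lock LONG,
    RecordID_Lock LONG
);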

Suggestions for file and data transforms using SQL query results to manipulate existing PDF files

Apologies if something similar to the question I'm asking has already been addressed. I'm not even sure how to best frame my question but I haven't been able to find any posts that are obviously germane. I'm hoping someone has some experience with this and might be willing to offer some suggestions. My company has already contracted to have the bulk of our database converted to HTML for ETL purposes and we simply can't afford to double the already barely-manageable costs of this project by adding this additional requirement to the scope.
We have a SQL database from an EMR software vendor that our company has now left. Due to recent economic factors, we just can't afford to stay with them any longer. When we left, this ex-vendor begrudgingly provided us with a backup copy of our SQL database along with copies of all the scanned images our users have uploaded via their application GUI over the years. I was told they stored the uploads as BLOB data, but it turns out they did not. They weren't actually storing the files in the database at all. Instead, they moved each image to a storage location and wrote the ID, DocType, Filename, DirPath and other document information to the Document table of the DB. It makes sense, but it leaves us in a bind, mainly because the filename appears to have been randomly generated at upload. So we now have 50,000 image files with unintelligible filenames stored in a date-based folder structure, with no way to correlate any of them with the patients to whom they belong. A couple of examples are as follows:
/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf
/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg
I compiled a list of attribs so I can make any of them available to the transform. I pulled:
SELECT * FROM document d
JOIN patients p ON d.PatientId = p.pid
JOIN users u ON d.PatientId = u.uid
WHERE u.UserType = '3' AND d.fileformat is NOT NULL AND d.dirpath LIKE 'm%'
ORDER BY u.ulname;
This gave me all patient and document attribs, resulting in a list with 197 columns. The challenge is that the new EMR vendor can only import these files if all the files for each patient are in a dedicated folder at the patient level, so I need the files in a new folder structure. I am trying to do it without abandoning things like PatientID, Scan Date, Description (the customName column), Scanned By, and possibly a couple of other fields.
I'll probably end up making the file name something like a concat of customName+docID for identification purposes. Then I'll just need to get the files in something like a /Patient/Docs.extension folder structure.
I went ahead and flattened all the files into a single folder figuring that would make it easier to manipulate. I batched them out like so:
md "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\All\"
cd /d "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\"
for /r %d in (*) do copy "%d" "D:\OneDrive\Documents\Assets\eClinicalworks\PID\FTP\mobiledoc\Documents\All\"
Now I have them all together.
I still have to figure out how to get them into the new folder structure by patient though.
Just to have it mentioned, I was originally considering using SQL so I could recreate the files and assign the desired attribs as file attribs in one step.
To answer the question asked about the HTML conversion, we have tons of Progress Notes, Doctors Notes, Prescriptions, etc in the database. The only way to get them to the new EMR is to export them to HTML and group them at the patient level so the new vendor can import them.
Honestly, after having to wrestle with all this garbage, I would prefer to avoid this situation in the future by refusing to upload them to the new EMR at all. Instead, just put all these documents on OUR file server and give the new EMR a hyper-link to insert into each patient's patient record that would open all the patient files. The new EMR is browser-based so it could be feasible but I doubt I'll be able to get them to write files to our file server moving forward so doing so would likely just end up making the end-user experience more disjointed.
I don't think your contractors did anything wrong tbh. Taking uploaded files with all their problem characters/duplicated names (got more than one patient called JohnSmith.jpg?) etc and renaming them to a GUID so they can coexist alongside other images without overwriting them is a) sensible and b) what I would do.
I also wouldn't store images inside a database, as then the only thing you can do with them is get them out again; something you have to do every time you want to do anything with them. Being able to map an images folder to a URL on your web server and then send HTML using just the file name means that the web server can serve the image without having to pull it out of the db; the db doesn't have to involve itself in pointless IO.
The way to correlate these images with the patients to whom they belong is done by the database. Somewhere else in the db structure will be eg a Patient record with a DocumentId column that links to this document record or a PatientDocuments table that has PatientId/DocumentId pairs.
If there is not, then storing the document bytes in the db won't have helped relate them to the patient, because this relation is not about where the bytes of an image are; it's about what other data was stored to make for a usable system. As it stands, your thoughts on the matter (uploading tens of thousands of images into a db just so you can... er... get them all out again) would seem to indicate you haven't yet fully grasped the reasons behind why your contractors did what they did.
Because you're under the impression that you can do this, you presumably know how the db relates a document to a patient (if it doesn't, then your proposed process will fail), and as such you can arrange for a suitable renaming process without needing to move the image data anywhere. In essence, you're failing to see that a file system storing file data against unique paths is no different to a database table storing file data against unique ids. Your documents table thus clearly links to your file system; the file system can be viewed as an extension of the documents table. You need the other tables in the db to make sense of the files, but you need the other tables in a db to make sense of any table in a db. These are key concepts of modelling related data.
I don't recommend you undertake the process you propose, but I'm sure that won't dissuade you. Consider then (because you didn't really post any details we can work with) this assumed scenario:
Patients
Name,DocumentId
John Smith,1
Jane Doe,2
Documents
Id,FilePath
1,'/root/2020/05102019/69353829-e46b-47e7-ab56-a1762424f0dd.pdf'
2,'/root/2014/09282017/385ba21d-e108-4cbb-9287-91110c16edb0.jpg'
SELECT CONCAT('REN ', d.FilePath, ' "', p.Name, RIGHT(d.FilePath, 4), '"')
FROM Patients p
INNER JOIN Documents d ON p.DocumentId = d.Id
The results of the query will essentially be a batch file full of rename commands that renames all the files into a single folder, organized by patient name.
And now all your multiple patients with the same names will overwrite each other and everything will end up in a mess
It also makes my point for me about "don't store files in the db": look how easy it is to manipulate files when they're in a file system, using existing commands that understand filesystems and files and can do things like rename files, extract EXIF data, rotate, resize and print... If all those images were in your db, the only thing you could do with them is get them out again; SQL Server cannot rotate, resize or print BLOB data, but there are thousands of tools out there that understand files and can convert them. Those tools cannot understand your db, so putting files into a db saddles you with the problem that they become useless until dug out again.
Your contractors may not have been so daft as you think; pause a moment before you set about hacking apart all they did, and question whether your driver for doing so is actually correct. If Jane from reception needs to see a picture of John Smith with drivers license XY1234 to ID him, don't provide her with a shared drive full of everyone's pictures, and let her double click, drag and accidentally delete her way around the file system. Provide her with an app that looks in the db, gets the unintelligible but helpfully unique filename off disk and opens it for her to view. And make the file system read only to everyone other than the app, so that users can't break things
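That said, if the rename route is taken anyway, a collision-safe variant of the earlier sketch would at least fold the unique document id into the new name, much as the question itself suggested with customName + docID (same assumed two-table schema as above):
SELECT CONCAT('REN ', d.FilePath, ' "', p.Name, '_', d.Id, RIGHT(d.FilePath, 4), '"')
FROM Patients p
INNER JOIN Documents d ON p.DocumentId = d.Id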

Cosmos DB WHERE condition based on an external document

I have the following document structure (omitting all fields with an underscore prefix, like _self):
{
"id": "c5055e2b-efb2-4c86-907d-a0beb1dca4dc",
"Name": "John Johnson",
"partitionKey": "0ecdb989-01c6-4f11-9fd2-3e1dcc1c8cb9",
"FKToBeDeleted": "FK_c5055e2b-efb2-4c86-907d-a0beb1dca4dc_ToBeDeleted",
}
As you can see, there is a field named FKToBeDeleted, which I use to mark the document, but it has to be a reference, because a kind of database concurrency issue may occur in my app: the 1st app can GET the document and process it, the 2nd app can update the document during that processing, and the 1st one will not see any changes. Downloading the huge document again and updating it is RU-consuming, so I wanted to reduce the cost. Going further, I created a document for this:
{
"id": "FK_c5055e2b-efb2-4c86-907d-a0beb1dca4dc_ToBeDeleted",
"partitionKey": "0ecdb989-01c6-4f11-9fd2-3e1dcc1c8cb9",
"ToBeDeleted": false,
}
And now there is a problem, because my front-end app should not display any ToBeDeleted documents. This kind of cheats the user, because I just mark the document as deleted and only delete it later.
Now the question is what the SQL query should look like. Previously it was like the following query, because r.ToBeDeleted was a boolean.
SELECT r.id, r.Name, r.AddedAt, r._ts
FROM ROOT r
WHERE
(NOT(r.ToBeDeleted))
ORDER BY r.AddedAt desc
Now FKToBeDeleted is only a reference to another document, but the ID is in r.FKToBeDeleted, so I tried some nested SELECT but it didn't work.
Any suggestions what is the right way to achieve that?
EDIT (clarification)
Let's have a following situation.
There are two apps (you can also treat them as threads) which use the same Cosmos DB instance.
STEP 1 - the moment when processing of some data starts; the database document is needed, so the app GETs it (in fact only ToBeDeleted is of interest here).
STEP 2 - the moment when the user wants to remove this processed item because he is no longer interested in its results; the database document is also required here, so again there is a GET.
STEP 3 - the moment when the soft-delete job is done and the database document needs to be updated; the field is set to true.
STEP 4 - the moment when the common processing flow is over and the document is updated at the end. BUT Application 2 downloaded it before STEP 3, so it overrides what Application 1 did, which is bad.
So I made a solution for that.
As you can see, the steps are the same, but instead of updating the same document, I update a referenced document, so I don't have a problem with overriding data.
Now, the problem is how to write a SQL query that joins the two documents, so that the FK_1 id is replaced by the value of the ToBeDeleted field from the referenced document.
According to this article there is no possibility to join two documents, which of course does not help me at all, yet closes the topic.
The JOIN keyword exists in the language, but it is used to "unfold" nested containers; there is no way to join different documents.
Perhaps you can use a subquery instead of JOIN:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#mimic-join-with-external-reference-data
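The linked article's trick is to supply the reference data from outside the query, since Cosmos DB cannot join across documents. A hedged sketch of one two-step variant of that idea (SELECT VALUE and ARRAY_CONTAINS are standard Cosmos SQL; the specific id is just the one from the sample documents): first ask for the ids of the flag documents that are soft deleted.
SELECT VALUE f.id
FROM ROOT f
WHERE f.ToBeDeleted = true
Then run the list query, excluding any document whose FKToBeDeleted came back from the first query. The array below is shown as a literal for illustration; the application would build it from the first query's results or pass it in as a query parameter.
SELECT r.id, r.Name, r.AddedAt, r._ts
FROM ROOT r
WHERE NOT ARRAY_CONTAINS(["FK_c5055e2b-efb2-4c86-907d-a0beb1dca4dc_ToBeDeleted"], r.FKToBeDeleted)
ORDER BY r.AddedAt desc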

Copy one database table's contents into another on same database

I'm a bit of a newbie with the workings of phpMyAdmin. I have a database and there are now 2 parts within it: the original tables with the jos_ prefix, and the same again but with a different prefix, let's say ****_, which will be the finished database.
This has come about because I am upgrading my Joomla 1.5 site to 2.5. I used a migration tool for the bulk of the new database but one particular piece of information did not transfer because the new database has a different structure.
I want to copy the keyref= value held in jos_content, attribs across to ****_content, metadata as "xreference":"VALUE", if that makes sense. This will save manually typing in the information contained within 1000s of articles.
jos_content, attribs currently contains
show_title=
link_titles=
show_intro=
show_section=
link_section=
show_category=
link_category=
show_vote=
show_author=
show_create_date=
show_modify_date=
show_pdf_icon=
show_print_icon=
show_email_icon=
language=
keyref=41.126815,0.732623
readmore=
****_content, metadata currently contains
{"robots":"all","author":""}
but I want it to end up like this
{"robots":"","author":"","rights":"","xreference":"41.126815,0.732623","marker":""}
Could anyone tell me the SQL string that I would need to run to achieve this please?
If it makes any difference I have manually changed about 300 of these articles already and thought there must be a better way.
Edit: Being nervous of trying this I would like to try and find the exact syntax (if that's the right word) for the SQL Query to run.
The value I want to extract from the source table is just, and only, the numbers next to keyref=, and I want them to turn up in the destination table prefixed by "xreference": - so it shows "xreference":"VALUE", with VALUE being the required numbers. There is also an entry, "marker":"", in the destination table, so I guess the query needs to produce that as well?
Sorry for labouring this but if I get it wrong, maybe by guessing what to put, I don't really have the knowledge to put it all right again....
Thanks.
Please try this:
INSERT INTO tableone (column1, column2) SELECT column1, column2 FROM tablesecond;
If you do not have the target table yet, then use this query to create it from the source:
CREATE TABLE anyname_Table AS SELECT * FROM tablesource;
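Neither of those generic statements does the keyref-to-xreference transformation the question actually asks for. Here is a hedged sketch of one way to do that in MySQL. It assumes the two content tables share the same article id, that attribs holds one key=value pair per line (as in the sample above), and that the desired result is exactly the metadata string shown above. Replace ****_ with the real table prefix and back up both tables before trying it.
-- replace ****_ with your real prefix before running
UPDATE ****_content AS dst
JOIN jos_content AS src ON src.id = dst.id
SET dst.metadata = CONCAT(
    '{"robots":"","author":"","rights":"","xreference":"',
    -- take whatever follows keyref= up to the end of that line, trimming any stray \r
    TRIM(TRIM(BOTH '\r' FROM SUBSTRING_INDEX(SUBSTRING_INDEX(src.attribs, 'keyref=', -1), '\n', 1))),
    '","marker":""}'
)
WHERE src.attribs LIKE '%keyref=%';
It is worth running the CONCAT expression in a plain SELECT first to eyeball the generated JSON before updating anything.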