I'm using a database as a kind of cache - the purpose is to store currently known "facts" (the details aren't that interesting). Crucially, however, I'd like to avoid inserting duplicate facts. To do so, the current data access code uses insert or ignore in many places.
I'd like to use the Entity Framework to benefit from its manageability; I've had enough of writing boilerplate code for yet another subtly different query, and so far the Entity Framework's support for queries looks great.
However, I can't really seem to find good support for adding new facts - the obvious way seems to be to simply construct new entities in .NET, add them to the model, and save the changes to the database. However, this means I need to check in advance whether a particular row already exists, since if it does, insertion is not permitted. At best, that means extra roundtrips, but it's usually much worse: previously this code was highly parallel - the occasional duplicate insert didn't matter since the database would just ignore it - but now not only would I need to check whether a row exists beforehand, I'd need to do so in far smaller batches (which is typically many times slower due to transaction overhead), since just one already-existing row from a parallel insert would cause a rollback of the entire transaction.
So, does anyone know how to get similar behavior using the entity framework?
To be specific, I'd like to be able to perform several inserts and have the database simply ignore those inserts that are impossible - or more generally, to be able to express server-side conflict resolution code to avoid unnecessary roundtrips and transactions.
If it matters, I'm using SQLite.
This functionality can be implemented via stored procedures (which the Entity Framework can use) on DBs that support them (MSSQL, for instance). SQLite doesn't support stored procedures. However, a simple solution exists for SQLite as well, namely to use a trigger such as the following:
CREATE TRIGGER IF NOT EXISTS MyIgnoreTrigger BEFORE INSERT ON TheTable
FOR EACH ROW BEGIN
    -- re-issue the insert as INSERT OR IGNORE, so duplicates are silently dropped
    INSERT OR IGNORE
        INTO TheTable (col1, col2, col3)
        VALUES (NEW.col1, NEW.col2, NEW.col3);
    -- cancel the original insert; only the statement above takes effect
    SELECT RAISE(IGNORE);
END;
Alternatively, the trigger could be set up to directly verify the constraint and only raise "ignore" when the constraint is violated.
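Such a variant might look roughly like this, assuming the duplicate check is over the same three columns (a sketch, not tested against your schema):

CREATE TRIGGER IF NOT EXISTS MyIgnoreTrigger BEFORE INSERT ON TheTable
FOR EACH ROW
WHEN EXISTS (SELECT 1 FROM TheTable
             WHERE col1 = NEW.col1 AND col2 = NEW.col2 AND col3 = NEW.col3)
BEGIN
    -- a matching row already exists: silently skip this insert
    SELECT RAISE(IGNORE);
END;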
Scenario:
Entity 1 can have 0 or more Entity 2.
What I'm trying to do:
When a field in Entity 1 is updated, a field in Entity 2 is consecutively updated.
What I'm doing:
Update the field in Entity 1 with an update SQL statement, then query the related Entity 2 records (using SELECT ATTR FROM ENTITY2 WHERE ENTITY1.ID = ENTITY2.ENT1_ID) just to get the old value of the ENTITY2 attr before doing an update on those records. The type of update (e.g. subtract or add) on the ENTITY2 records is based on the update value on ENTITY1.
Alternative:
Using triggers to consecutively update these related records.
(I still plan to study how to implement triggers, but I am not sure if it is worth it.
Any help or links on this would also be appreciated.)
Is it better to use triggers? Or just stick to my current solution (which I think is quite slow due to the number of SQL executions, but easier to track down)?
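For reference, the trigger alternative described above might look roughly like this (SQLite-style syntax; the tables, columns and the add/subtract rule are invented for illustration):

CREATE TRIGGER entity1_cascade AFTER UPDATE OF amount ON ENTITY1
FOR EACH ROW BEGIN
    -- apply the same delta to the related ENTITY2 rows
    UPDATE ENTITY2
       SET attr = attr + (NEW.amount - OLD.amount)
     WHERE ent1_id = NEW.id;
END;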
There are people, such as Tom Kyte, who believe triggers should be used as little as possible, if at all.
There are others, such as Toon Koppelaars, who believe they should be used, if their use is considered carefully.
I am of the second camp and believe triggers may be used. However, this use should not be to 'automagically' cause cascade actions such as you are suggesting. Instead these triggers may be used to enforce integrity constraints that cannot be declared using the standard mechanism of a table constraint clause i.e. the triggers themselves do no DML other than SELECT from tables.
(Note: there are other mechanisms by which these constraints may be enforced, including materialized views or the introduction of additional columns and the use of specific indexing strategies.)
Therefore, I would suggest another alternative. Create triggers - or use these alternative mechanisms - to ensure no data that breaks your integrity constraints can be committed. Then create APIs, using PL/SQL, that encapsulate the multi-table data amendments required to ensure the integrity constraints are not broken, and use these as your update path.
In this way you can be assured that no invalid data exists in the database, but also that the actual DML required to achieve this is not hidden across the database in multiple program units and triggers but is stated explicitly in one place.
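A rough PL/SQL sketch of that API idea (table, column and procedure names are invented, not from the question):

create or replace procedure apply_entity1_change(
    p_ent1_id in number,
    p_delta   in number
) as
begin
    -- the multi-table amendment is stated explicitly in one place
    update entity1 set total_value = total_value + p_delta where id = p_ent1_id;
    update entity2 set attr = attr + p_delta where ent1_id = p_ent1_id;
end apply_entity1_change;
/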
Tom Kyte is brilliant. But he is, at heart, still just a DBA. Always keep that in mind when considering his advice on table design.
Can triggers be overused? Of course. But here's the rub: anything can be overused. I lean toward triggers because there is just no way to guarantee that all data manipulation will go through your app or any single channel. Or, if possible, define a foreign key relationship and let "cascade update" take care of everything. Tricky, I admit, and could be problematic, but don't reject any solution out of hand.
Having said that, I don't know if a trigger for this purpose is called for. I don't know why you are duplicating the data to a field in a different table. Without knowing your overall design and what you are trying to accomplish, there is no way to judge. But consider keeping the data in one field in one table, then use a view to expose that field as part of a second "table." Change the data where it resides and, voila, it is now changed wherever it appears.
Performance hit? Yes. But keeping duplicate data in different places and keeping them synchronized is a data integrity hit. Only you know (or are in a position to find out) which way this balance tilts.
Oh, can views be overused? Of course. But there's always that rub I mentioned; and besides, views are so chronically underused in most databases that overuse would be a long way away.
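To make the view suggestion concrete, a minimal sketch (table and column names invented):

create view entity2_view as
select e2.id, e2.ent1_id, e1.shared_attr   -- the value lives only in entity1
from entity2 e2
join entity1 e1 on e1.id = e2.ent1_id;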
I am working on a project in SQLite where several INSERT statements involve recursive WITH statements. These INSERT statements are within triggers that are watching for INSERT statements being called on other databases.
I've found that this leads to the recalculation of lots of values every time the WITH statement is triggered. Rather than introduce logic into the WITH statement to only recalculate values that actually need recalculating (which creates sprawl and is difficult for me to maintain), I'd rather take some of the temporary views defined in the WITH statement and make them permanent tables that cache values. I'd then chain triggers so that an update of table_1 leads to an update of table_2, which leads to an update of table_3, and so on. This makes the code easier to debug and maintain.
The issue is that, in the project, I want to make a clear distinction between tables that are user editable and those that are meant for caching intermediary values. One way to do this is (perhaps) to make the "caching" tables read-only EXCEPT when called via certain triggers. I've seen a few questions about similar issues in other dialects of SQL but nothing pertaining to SQLite and nothing looking to solve this problem, thus the new question.
Any help would be appreciated!
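To make the intended setup concrete, here is a rough sketch of what the chained-trigger, "read-only except via triggers" idea could look like in SQLite (all table and column names are invented; corresponding guards for UPDATE and DELETE on the cache table would be needed as well):

CREATE TABLE cache_guard (unlocked INTEGER NOT NULL DEFAULT 0);
INSERT INTO cache_guard (unlocked) VALUES (0);

-- reject direct writes to the caching table while the guard is locked
CREATE TRIGGER table_2_readonly BEFORE INSERT ON table_2
WHEN (SELECT unlocked FROM cache_guard) = 0
BEGIN
    SELECT RAISE(ABORT, 'table_2 is maintained by triggers only');
END;

-- the chained trigger unlocks the guard, refreshes the cache, then locks it again
CREATE TRIGGER table_1_to_table_2 AFTER INSERT ON table_1
BEGIN
    UPDATE cache_guard SET unlocked = 1;
    INSERT INTO table_2 (id, cached_value) VALUES (NEW.id, NEW.value);
    UPDATE cache_guard SET unlocked = 0;
END;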
I'm writing a PHP-based web application that should work with multiple database systems. The most important are MySQL and SQLite but PostgreSQL and others would also be nice. For this I try to use as portable SQL as possible. Where this isn't possible, I have defined some meta words in my queries that are handled by my DB layer and converted to platform-specific SQL commands.
I'm now trying to add sequences support. Every DBMS handles sequences differently, there is no common way to write them in SQL. I have read and understood the way PostgreSQL does it. I have found an interesting solution for MySQL that uses MyISAM tables to escape the isolation constraints of a transaction. After all, sequences are not rolled back with the transaction they're used in and that's exactly what I want. Sequences are supposed to be multi-user safe.
Now, I haven't found a solution for SQLite. It lacks built-in sequence support and doesn't provide a way to store data outside a running transaction. My current implementation is to lock the table long enough to do a SELECT MAX(...) and use that value, but I want to get rid of that entirely - in SQLite, this approach requires locking the whole database!
Does anybody know a solution for this with SQLite?
Just create a regular counter table. On creation of a sequence foo, do
create table foo(value int);
insert into foo(value) values(0);
Then, when asking for the next counter value, do
update foo set value=value+1;
While this gets rolled back when the transaction is aborted, it is multi-user safe: no two users will commit the same number. SQLite implements concurrency with a database lock, so the second writer will block anyway (not just because of the sequence update, but also because of the other changes it wants to make).
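In practice you would presumably increment the counter and read it back inside the same transaction, roughly like this:

begin;
update foo set value = value + 1;
select value from foo;   -- read back the freshly allocated number in the application
commit;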
I would use lastInsertRowID. This returns the rowid of the last inserted data (which equals the INTEGER PRIMARY KEY value of that row).
You won't need any sequence then.
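At the SQL level the same value is available via last_insert_rowid(); a tiny illustration (table name invented):

insert into facts (body) values ('some fact');
select last_insert_rowid();   -- rowid / INTEGER PRIMARY KEY of the row just inserted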
A similar question has been asked, but since it always depends, I'm asking for my specific situation separately.
I have a website page that shows some data that comes from a database, and to generate that data I have to run some fairly complex queries with multiple joins.
The data is being updated once a day (nightly).
I would like to pre-generate the data for the said view to speed up the page access.
For that I am creating a table that contains exact data I need.
Question: for my situation, is it reasonable to do a complete table wipe followed by an insert, or should I do an update/insert?
SQL-wise it seems like DELETE + INSERT will be easier (the INSERT part is a single SQL expression).
EDIT: RDBMS: MS SQL Server 2008 Ent
TRUNCATE will be faster than DELETE, so if you need to empty a table, do that instead.
You didn't specify your RDBMS vendor, but some of them also have MERGE/UPSERT commands. These enable you to update the table if the data exists and insert if it doesn't.
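Since the edit mentions SQL Server 2008, which does have MERGE, a rough sketch might look like this (table and column names invented):

MERGE dbo.ReportCache AS target
USING dbo.ReportSource AS source
    ON target.Id = source.Id
WHEN MATCHED THEN
    UPDATE SET target.Value = source.Value
WHEN NOT MATCHED THEN
    INSERT (Id, Value) VALUES (source.Id, source.Value);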
It partly depends on how the data is accessed. If you have a period of time with no (or very few) users accessing it, then there won't be much impact on the data disappearing (between the DELETE and the completion of the INSERT) for a short while.
Have you considered using a materialized view (MSSQL calls them indexed views) instead of doing it manually? This could also have other performance benefits, as an indexed view gives the query optimizer more choices when it's constructing execution plans for other queries that reference the table(s) in the view.
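A rough sketch of an indexed view (names invented; note that indexed views come with restrictions such as SCHEMABINDING, two-part names and COUNT_BIG(*) when grouping):

CREATE VIEW dbo.vReportData WITH SCHEMABINDING AS
SELECT o.CustomerId,
       SUM(ISNULL(o.Amount, 0)) AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.Orders o
GROUP BY o.CustomerId;
GO
-- the unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_vReportData ON dbo.vReportData (CustomerId);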
It depends on the size of the table and the recovery model of the database. If you are deleting many hundreds of thousands of records and reinstating them, versus updating a small batch of a few hundred and inserting tens of rows, it will add unnecessary size to your transaction logs. However, you could use TRUNCATE to get around this, as it is only minimally logged.
Do you have the option of a MERGE/UPSERT? If you're using MS-SQL you can use CROSS APPLY to do something similar if you don't.
One approach to handling this type of problem is to insert into a new table, then do a table rename. This will ensure that all new data is present at the same time.
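On SQL Server that could look roughly like this (names invented):

-- build the replacement table from the complex query
SELECT *
INTO dbo.ReportCache_new
FROM dbo.ReportSource;

-- swap the names so readers switch to the fresh data at once
BEGIN TRAN;
EXEC sp_rename 'dbo.ReportCache', 'ReportCache_old';
EXEC sp_rename 'dbo.ReportCache_new', 'ReportCache';
COMMIT;

DROP TABLE dbo.ReportCache_old;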
What if some data that was present yesterday is not there anymore? DELETE may be safer, or you could end up having to delete some records anyway.
And in the end it doesn't really matter which way you go.
Unless it's the case #kevinw mentioned.
Although I fully agree with SQLMenace's answer, I would like to point out that MERGE does NOT remove unneeded records! If you're sure that your new data will be a superset of the existing data, then MERGE is great; otherwise you'll either need to make sure that you delete any superfluous records later on, or use the TRUNCATE + INSERT method ...
(Personally I'm still a fan of the latter as it usually is quite fast, just make sure to drop all indexes/unique constraints upfront and rebuild them one by one. This has the benefit of the INSERT transaction being smaller and the index-adding being done in (smaller) transactions again later on). (**)
(**: yes, this might be tricky on a live system, but then again he already mentioned this was done during some kind of overnight job anyway; I'm extrapolating that there is no user access at that time)
What is the best, DBMS-independent way of generating an ID number that will be used immediately in an INSERT statement, keeping the IDs roughly in sequence?
DBMS-independent? That's a problem. The two most common methods are auto-incrementing columns and sequences, and most DBMSes do one or the other but not both. So the database-independent way is to have another table with one column with one value that you lock, select, update, and unlock.
Usually I say "to hell with DBMS independence" and do it with sequences in PostgreSQL, or autoincrement columns in MySQL. For my purposes, supporting both is better than trying to find out one way that works everywhere.
If you can create a Globally Unique Identifier (GUID) in your chosen programming language - consider that as your id.
They are harder to work with when troubleshooting (it is much easier to type in a where condition that is an INT) but there are also some advantages. By assigning the GUID as your key locally, you can easily build parent-child record relationships without first having to save the parent to the database and retrieve the id. And since the GUID, by definition, is unique, you don't have to worry about incrementing your key on the server.
There is auto-increment or a sequence.
What is the point of this? That is the least of your worries.
How will you handle SQL itself?
MySQL has Limit,
SQL Server has Top,
Oracle has Rank
Then there are a million other things like triggers, ALTER TABLE syntax, etc.
Yep, the obvious ways in raw SQL (and in my order of preference) are a) sequences b) auto-increment fields. The better, more modern, more DBMS-independent way is to not touch SQL at all, but to use a (good) ORM.
There's no universal way to do this. If there were, everyone would use it. SQL by definition abhors the idea - it's an antipattern for set-based logic (although a useful one, in many real-world cases).
The biggest problem you'd have trying to interpose an identity value from elsewhere is when a SQL statement involves several records, and several values must be generated simultaneously.
If you need it, then make it part of your selection requirements for a database to use with your application. Any serious DBMS product will provide its own mechanism to use, and it's easy enough to code around the differences in DML. The variations are pretty much all in the DDL.
I'd always go for the DB-specific solution, but if you really have to, the usual way of doing this is to implement your own sequence. Your RDBMS has to support transactions.
You create a sequence table which contains an int column and seed it with the first number. Your transaction logic then looks something like this:
begin transaction
    declare @myID int
    update tblSeq set intID = intID + 1
    select @myID = intID from tblSeq
    insert into tblData (intID, ...) values (@myID, ...)
commit transaction
The transaction forces a write lock such that the next queued insert cannot update the tblSeq value before the record has been inserted into tblData. As long as all inserts go through this transaction, your generated ID is in sequence.
Use an auto-incrementing id column.
Is there really a reason that they have to be in sequence? If you're just using it as an ID, then you should just be able to use part of a UUID or the first couple digits of md5(now()).
You could take the time and massage it. It'd be the equivalent of something like
DateTime.Now.Ticks
So it would be something like YYYYMMDDHHMMSSSS
It may be a bit of a lateral approach, but a good ORM-type library will probably be able to at least hide the differences. For example, in Ruby there is ActiveRecord (commonly used in, but not exclusively tied to, the Ruby on Rails web framework), which has Migrations. Within a table definition, which is declared in platform-agnostic code, implementation details such as datatypes, sequential id generation and index creation are pushed down below your vision.
I have transparently developed a schema on SQLite, then implemented it on MS SQL Server and later ported it to Oracle, without ever changing the code that generates my schema definition.
As I say, it may not be what you're looking for, but the easiest way to encapsulate what varies is to use a library that has already done the encapsulation for you.
With only SQL, the following could be one of the approaches:
1. Create a table to contain the starting id for your needs.
2. When the application is deployed for the first time, it should read the value into its context.
3. Thereafter, increment the id (in a thread-safe fashion) as required.
   3.1 Write the id to the database (in a thread-safe fashion) so it always holds the updated value, or
   3.2 Don't write it to the database; just keep incrementing it in memory (in a thread-safe manner).
4. If for any reason the server is going down, write the current id value to the database.
5. When the server is up again, it will pick up from where it left off last time.
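A minimal sketch of the backing table for step 1 (names invented; the thread-safe increment itself lives in application code):

create table app_sequence (
    name    varchar(50) primary key,
    next_id int not null
);
insert into app_sequence (name, next_id) values ('facts', 1);

-- on startup: read the value into the application's context
select next_id from app_sequence where name = 'facts';

-- on shutdown (or per step 3.1): persist the value currently held in memory
update app_sequence set next_id = 42 where name = 'facts';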