What's the best way to validate that the grails domain classes are in sync with a database? It's legacy database and i can't build it from the domain classes. An interesting idea here which implies fetching one row of each of the domains. However, it doesn't feel like a complete solution mainly because the test database against which I validate may not be so data rich as to have data in all tables.
Thanks in advance for taking time to read/reply.
That's a nice approach and must work even for empty tables - if a table is empty, you have no legacy data to worry about validating, right? Or, if you want to test Grails constraints for compatibility with DB constraints, create a new instance of the class and try to save() it in transaction - and always roll the transaction back.
If the database is small, I'd even go and remove max:1 from list() - to validate every record, because only some of the records may violate constraints.
I'd also replace println "${it}" with assert it.validate().
One last optimization, I'd limit the classes tested only to those that I know can violate some constraints. This will save a good part of such a test - and the test is going to take a plenty of time, you know - reading all the database with GORM overhead.
Related
We have an old repair database that has alot of relational tables and it works as it should but i need to update it to be able to handle different clients ( areas ) - currenty this is done as a single client only.
So i need to extend the tables and the sql statements so ex i can login as user A and he will see his own system only and user B will have his own system too.
Is it correctly understood that you wouldnt create new tables for each client but just add a clientID to every record in every ( base ) table and then just filter with a clientid in all sql statements to be able to achieve multiple clients ?
Is this also something that would work ( how is it done ) on hosted solutions ? Am worried about performance if thats an issue lets say i had 500 clients ( i wont but from a theoretic viewpoint ) ?
The normal situation is to add a client key to each table where appropriate. Many tables don't need them -- such as reference tables.
This is preferred for many reasons:
You have the data for all clients in one place, so you can readily answers a question such as "what is the average X for each client".
If you change the data structure, then it affects all clients at the same time.
Your backup and restore strategy is only implemented once.
Your optimization is only implemented once.
This is not always the best solution. You might have requirements that specify that data must be separated -- in which case, each client should be in a separate database. However, indexes on the additional keys are probably a minor consideration and you shouldn't worry about it.
This question has been asked before. The problem with adding the key to every table is that you say you have a working system, and this means every query needs to be updated.
Probably the easiest is to create a new database for each client, so that the only thing you need to change is the connection string. This also means you can get automated query tools for example to work without worrying about cross-client data leakage.
And it also allows you to backup, transfer, delete a single client easily as well.
There are of course pros and cons to this approach, but it will simplify the development effort. Also remember that if you plan to spin it up in a cloud environment then spinning up databases like this is also very easy.
I need to create an Audit table that is going to track the actions (insert, update, delete) of my tables in the database and add new row with date, row id, table name and a few more details, so I will know what action happened and when.
So basically from my understanding I need a trigger for each table which is going to track insert/update/delete and a trigger on the database which is going to track new table creation.
My main problem is understanding how to connect between those things so when a new table is being created a trigger will be created for that table which is going to track the actions and add new rows for the Audit table as needed.
Is it possible to make a DDL trigger for create_table and inside of it another trigger for insert / update / delete ?
What you're hoping for is not possible. And I'd strongly advise that you'd be better off thinking about what you really want to achieve at a business level with auditing. It will yield a much simpler and more practical solution.
First up
...trigger on the database which is going to track new table creation.
I cannot stress enough how terrible this idea is. Who exactly has such unfettered access to you database that they can create tables without going through code-review and QA? Which should of course be on the gated pathway towards production. Once you realise that schema changes should not happen ad-hoc, it's patently obvious that you don't need triggers (which are by their very nature reactive) to do something because the schema changed.
Even if you could write such triggers: it's at a meta-programming level that simply isn't worth the effort of trying to foresee all possible permutations.
Better options include:
Requirements assessment and acceptance: This is new information in the system. What are the audit requirements?
Design review: New table; does it need auditing?
Test design: How to test an audit requirements?
Code Review: You've added a new table. Does it need auditing?
Not to mention features provided by tools such as:
Source Control.
Db deployment utilities (whether home-grown or third party).
Part two
... a trigger will be created for that table which is going to track the actions and add new rows for the Audit table as needed.
I've already pointed out why doing the above automatically is a terrible. Now I'm going a step further to point out that doing the above at all is also a bad idea.
It's a popular approach, and I'm sure to get some flack from people who've nicely compartmentalised their particular flavour of it; swearing blind how much time it "saves" them. (There may even be claims to it being a "business requirement"; which I can assure you is more likely a misstated version of the real requirement.)
There are fundamental problems with this approach:
It's reactive instead of proactive. So it usually lacks context.
You'll struggle to audit attempted changes that get rolled back. (Which can be a nightmare for debugging and usually violates real business audit requirements.)
Interpreting audit will be a nightmare because it's just raw data. The information is lost in the detail.
As columns are added/renamed/deleted your audit data loses cohesion. (This is usually the least of problems though.)
These extra tables that always get updated as part of other updates can wreak havoc on performance.
Usually this style of auditing involves: every time a column is added to the "base" table, it's also added to the "audit" table. (This ultimately makes the "audit" table very much like a poorly architected persistent transaction log.)
Most people following this approach overlook the significance of NULLable columns in the "base" tables.
I can tell you from first hand experience, interpreting such audit trails in any but the simplest of cases is not easy. The amount of time wasted is ridiculous: investigating issues, training others to be able to interpret them correctly, writing utilities to try make working with these audit trails less painful, painstakingly documenting findings (because the information is not immediately apparent in the raw data).
If you have any sense of self-preservation you'll heed my advice.
Make it great
(Sorry, couldn't resist.)
A better approach is to proactively plan for what needs auditing. Push for specific business requirements. Note that different cases may need different auditing techniques:
If user performs action X, record A details about the action for legal traceability.
If user attempts to do Y but it prevented by system rules, record B details to track rule system integrity.
If user fails to log in, record C details for security purposes.
If system is upgraded, record D details for troubleshooting.
If certain system events occur, record E details ...
The important thing is that once you know the real business requirements, you won't be saying: "Uh, let's just track everything. It might be useful." Instead you'll:
Be able to produce a cleaner more appropriate and reliable design for each distinct kind of auditing.
Be able to test that it behaves as required!
Be able to use the audit data more easily whenever it's needed.
We need to insert data(8k records) into a CRM Entity, the data will come from other CRM Entities. Currently we are doing it through code but it takes too much time (Hours). I was wondering if we use SQL to insert directly into the CRM Database it will be a lot easier and will take only minutes. But before moving farward I have few questions:
Is it safe to insert directly into CRM Database, using SQL?
What is the best practice for insert data into CRM using SQL?
What things should i consider before trying it?
EDIT:
4: How do I increase the insert performance?
No, it is not. It is considered unsupported
Don't do it
Rollup 12 was just released and contains a new API feature. There is now a ExecuteMultipleRequest which could be used for batched bulk imports. See http://msdn.microsoft.com/en-us/library/jj863631.aspx
It shouldn't take hours to insert 8000 records. It would help to see your code, but here are some things to consider to improve performance:
Reuse your IOrganizationService. I've found a 10x increase in performance by reusing a IOrganizationService, rather than creating a new one with each record that is being updated
Use multi-threading. You have to be careful with this one, because it could lead to worse performance if the function to check for the entity existing is your bottle neck.
Tweak your exists function. If the check for the entity existing is taking a long time, consider pulling back the entire table and storing it in memory (assuming it's not ridiculously huge). This would remove 8000 separate select statements.
Turn off plugins that may be degrading performance. If you have any plugins registered on the entity, see if performance increases if you disable them during the import.
Create a new, "How do I increase the insert performance" question with your code posted for additional help.
I have not used the CRM application you are referring to, but if you bypass the code you might bypass certain restrictions or even triggers that the code has in place based on certain values sent in.
For example, if you sent a number in through the code, it might perform some mathematical function on that number and add it to some other value and end up storing two values in the database (one value for the number you entered, and another value representing the total including the newly added one).
So if you had just inserted the one value straight into the database, then the total wouldn't get updated with it.
That is just a hypothetical scenario. You may not run into any problems like that or any others, but there could be the chance.
Well i found this article very helpful. It says:
The direct SQL writes to CRM database are not supported.The reason for this is that creating a record in CRM database is so much more than just INSERT INTO…-statement. The first step of optimizing is to understand what happens behind the scenes and can affect the speed:
1. CRM entities usually consist of 2 physical tables.
2. Cascade rules/Sharing: If created record has any relationships with
cascade rules, web service will handle the cascades automatically.
For example cascaded sharing will lead to additional records being
created in PrincipalObjectAccess table. In case of one-time
migrations, disabling the cascade rules while migration runs can
save lot of time
3. Record Ownership: If you are inserting records, make sure you are
setting the owner as an attribute for create and not as an
additional owner assign request. Assigning owner actually takes
4. Money/Time: Web Service handles currencies and time zones.
5. Workflows/Plugins: If the system has any custom workflows and/or
plugins, I strongly recommend pausing them for the duration of
migration.
I've learned a lot about triggers and active databases in the last weaks, but I've some questions about real world examples for these.
At work we use the Entity Framework with ASP.Net and an MSSQL Server. We just use the auto generated constrains and no triggers.
When I heared about triggers I asked myself the following questions:
Which tasks can be performed by triggers?
e.g.: Generation of reporting data: currently the data for the reports is created in vb, but I think a trigger could handle this as well. The creation in vb takes a lot of time and the user should not need to wait for it, because it's not necessary for his work.
Is this an example for a perfect task for a trigger?
How does OR-Mapper handle trigger manipulated data?
e.g.: Do OR-Mapper recognize if a trigger manipulated data? The entity framework seems to cache a lot of data, so I'm not sure if it reads the updated data if a trigger manipulates the data, after the insert/update/delete from the framework is processed.
How much constraint handling should be within the database?
e.g.: Sometimes constrains in the database seem much easier and faster than in the layer above (vb.net,...), but how to throw exceptions to the upper layer that could be handled by the OR-Mapper?
Is there a good solution for handeling SQL exceptions (from triggers) in any OR-Mapper?
Thanks in advance
When you hear about a new tool or feture it doesn't mean you have to use it everywhere. You should think about design of your application.
Triggers are used a lot when the logic is in the database but if you build ORM layer on top of your database you want logic in the business layer using your ORM. It doesn't mean you should not use triggers. It means you should use them with ORM in the same way as stored procedures or database functions - only when it makes sense or when it improves performance. If you pass a lot of logic to database you can throw away ORM and perhaps whole your business layer and use two layered architecture where UI will talk directly to database which will do everything you need - such architecture is considered "old".
When using ORM trigger can be helpful for some DB generated data like audit columns or custom sequences of primary key values.
Current ORM mostly don't like triggers - they can only react to changes to currently processed record so for example if you save Order record and your update trigger will modify all ordered items there is no automatic way to let ORM know about that - you must reload data manually. In EF all data modified or generated in the database must be set with StoreGeneratedPattern.Identity or StoreGeneratedPattern.Computed - EF fully follows pattern where logic is either in the database or in the application. Once you define that value is assigned in the database you cannot change it in the application (it will not persist).
Your application logic should be responsible for data validation and call persistence only if validation passes. You should avoid unnecessary transactions and roundtrips to database when you can know upfront that transaction will fail.
I use triggers for two main purposes: auditing and updating modification/insertion times. When auditing, the triggers push data to related audit tables. This doesn't affect the ORM in any way as those tables are not typically mapped in the main data context (there's a separate auditing data context used when needed to look at audit data).
When recording/modifying insert/modification times, I typically mark those properties in the model as [DatabaseGenerated( DatabaseGenerationOptions.Computed )] This prevents any values set on in the datalayer from being persisted back to the DB and allows the trigger to enforce setting the DateTime fields properly.
It's not a hard and fast rule that I manage auditing and these dates in this way. Sometimes I need more auditing information than is available in the database itself and handle auditing in the data layer instead. Sometimes I want to force the application to update dates/times (since they may need to be the same over several rows/tables updated at the same time). In those cases I might make the field nullable, but [Required] in the model to force a date/time to be set before the model can be persisted.
The old Infomodeler/Visiomodeler ORM (not what you think - it was Object Role Modeling) provided an alternative when generating the physical model. It would provide all the referential integrity with triggers. For two reasons:
Some dbmses (notably Sybase/SQL Server) didn't have declarative RI yet, and
It could provide much more finely grained integrity - e.g. "no more than two children" or "sons or daughters but not both" or "mandatory son or daughter but not both".
So trigger logic related to the model in the same way that any RI constraint does. In SQL Server it handled violations with RAISERROR.
An conceptual issue with triggers is that they are essentially context-free - they always fire regardless of context (at least without great pain, and you might better include their logic with the rest of the context-specific logic.) So global domain constraints are the only place I find them useful - which I guess is another general way to identify "referential integrity".
Triggers are used to maintain integrity and consistency of data (by using constraints), help the database designer ensure certain actions are completed and create database change logs.
For example, given numeric input, if you want the value to be constrained to say, less then 100, you could write a trigger that fires for every row on update or insert, and raise an application error if the value of that column does not meet that contraint.
Suppose you want to log historical changes to a table. You could create a Trigger that fires AFTER each INSERT, UPDATE, and DELETE, which also inserts the data into a logging table. If you need to execute custom custom logic, then Triggers may appeal to you.
How are people coping with changes to redis object schemas - adding or removing properties from objects?
Sharing from my own experience (one year old project with thousands of user requests per second).
Usually, there were two scenarios for me:
Add new information to existing structures (like, "email" field to a user)
Remove or change existing values in existing structures (like, change format of some field)
Drop stuff from the database
For 1 I keep following simple strategy: degrade gracefully, e.g. if user doesn't have email record - treat it as empty email. Worked all the time.
For 2 and 3 it depends, whether data can be changed/calculated/fixed before releasing or after. I run a job on database that does all the work for me, for few millions of keys it takes considerable time (minutes). If that job can be run only after I release the new code - then degrading gracefully helps a lot, I simply release and then run the job.
PS: If you affect a lot of keys in redis then it is very important to use http://redis.io/topics/pipelining Saves a lot of time.
Take a list of all affected (i.e. you want to fix them in any way) keys or records in pipeline
Do whatever you want on them. If it's possible try to queue writing operations into pipeline too
Send queued operations to redis.
It is also very important for you to make indexes of your structures. I keep sets with ids. Then I simply iterate over SMEMBERS(set_with_ids).
It is much, much better than iterating over KEYS command.
For extremely simple versioning, you could use different database numbers. This could be quite limiting in cases where almost everything is the same between two versions but it's also a very clean way to do it if it will work for you.