I work in a QA department and, as part of my job, often look at the configuration files prepared by our developers. On occasion, I see particular keys being redefined as the configuration file goes on, for example:
A = 12
...
A = 34
To me this looks like a very bad practice. At the end of the day, it is not clear what A is equal to. Since the program logic is hidden from me, it is also not clear whether the application handles this case properly, or whether there is perhaps a very good reason for doing things this way.
Additionally, from time to time, I see
A = 12
...
A = puppies
The meaning assigned to the key seems very ambiguous. Is the developer building variations of what A may be equal to? Is the first A a typo? Many questions remain.
I wonder: is there ever a good reason for developers to configure their apps to redefine keys (that is, to include duplicate keys), or am I right to question this?
NB: The application works fine with the duplicate keys; however they are handled, it appears to work.
After speaking with several developers with decades of experience, the answer to my question is a resounding "NO". There is no good reason for a configuration file to contain duplicate keys.
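A check like this is easy to automate in QA. Below is a minimal Python sketch, assuming the simple one-key-per-line `KEY = value` format shown in the question, with `#` starting a comment line. The function name `find_duplicate_keys` is hypothetical, and real formats (INI sections, YAML, XML) would need their own parsers.

```python
import re
from collections import defaultdict

# Matches simple "KEY = value" lines (assumed format, as in the question).
LINE_RE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*(.*?)\s*$")

def find_duplicate_keys(lines):
    """Return {key: [(line_number, value), ...]} for every key defined more than once."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match:
            key, value = match.groups()
            seen[key].append((number, value))
    return {key: defs for key, defs in seen.items() if len(defs) > 1}
```

Run on `["A = 12", "B = 7", "A = puppies"]`, it reports `A` on lines 1 and 3 with the values `12` and `puppies`. For INI-style files, Python's own `configparser` already refuses duplicates: with its default `strict=True`, reading a section that repeats an option raises `DuplicateOptionError`.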
For some reason I only recently found out about unique constraints for Core Data. It looks way cleaner than the alternative (doing a fetch first, then inserting the missing entities in the designated context) so I decided to refactor all my existing persistence code.
If I got it right, the gist of it is to always insert a new entity and, as long as I have a proper merge policy, saving the context will take care of uniqueness, and more efficiently. The problem is that every time I save a context with the inserted entity I get an NSCoreDataConstraintViolationException, though no error is returned. When I do a fetch to make sure that
there is indeed only one instance with a unique field, and
other changes to this entity were applied,
everything seems to be okay. But I'm still concerned about this exception: I save often, so I hit it quite often, a few times per second in some cases.
My project is in Objective-C, and I know exceptions are expensive there, so I wonder whether I'm missing something.
Here is a sample project with this issue (just a few lines of code, be sure to add an exception breakpoint)
NSMergeByPropertyObjectTrumpMergePolicy and constraints are not useful tools and should also never be used. The correct way to manage uniqueness is with a fetch before the insert as it appears you have already been doing.
Let's start with why the only correct merge policy is NSErrorMergePolicy. You should only be writing to Core Data in one synchronous (serial) way: performBackgroundTask is not enough; you also need an operation queue. If you have two performBackgroundTask blocks running at the same time and they contradict each other, you will lose data. A merge policy answers the question "Which data would you like to lose?", and the correct answer is "Don't lose my data!", which is NSErrorMergePolicy.
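The point about serializing writes is not specific to Core Data. As a language-neutral illustration (Python rather than Objective-C, and not Core Data API), here is a minimal sketch of routing every write through one worker so that writes never overlap. `SerialWriter` is a hypothetical name; in Core Data the same role would presumably be played by an operation queue limited to one concurrent operation.

```python
import queue
import threading

class SerialWriter:
    """Runs submitted write jobs one at a time, in order, on a single worker thread."""

    def __init__(self):
        self.errors = []              # exceptions raised by jobs, kept for inspection
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                job()
            except Exception as exc:  # keep the worker alive after a failed job
                self.errors.append(exc)
            finally:
                self._jobs.task_done()

    def submit(self, job):
        """Queue a zero-argument callable; it runs after every job queued before it."""
        self._jobs.put(job)

    def wait(self):
        """Block until every queued job has run."""
        self._jobs.join()
```

Because each job runs to completion before the next one starts, two contradictory writes cannot race; the second simply sees the result of the first.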
The same issue happens when you have a constraint. Let's say you have an entity with a unique constraint on the phone number, and you try to insert another entity with the same phone number. What would you like to happen? It depends on what exactly the data is. It might be two different people, and the phone number should be made different (perhaps one was missing its area code), or it might be one person and the data should be merged. Or you might have a constraint on a uniqueID, and the number should just be incremented. But at the database level, Core Data doesn't know which case applies. It always just does a merge, and it will silently lose data.
You can create a custom NSMergePolicy and inspect each NSConstraintConflict to decide what to do. But in practice you'd have to think about every edit you make to the database and what each change means, which is very hard outside the context of the code that writes the change. In other words, the problem with constraints and merge policies is that they run at the wrong level of your application to deal with the problem effectively.
Using constraints with an error merge policy (NSErrorMergePolicy) is OK, as it is a way to find problems in your app (as long as you are monitoring crashes and fixing them). But you still need to do the fetch before the insert to make sure the error doesn't happen.
If you want to clean up the code, just have one place where you create your objects: something like objectWithId:createIfNeed:inContext:, which does the fetch and the create.
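A minimal Python sketch of that single creation point, mirroring the suggested `objectWithId:createIfNeed:inContext:`. `Context`, `Person` and `object_with_id` are hypothetical stand-ins for a managed object context, an entity and the helper; in Core Data the `fetch` step would be a fetch request filtered on the unique attribute. The `insert` method raising on a duplicate plays the role of a constraint combined with an error merge policy: it should never fire as long as all creation goes through the helper.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    unique_id: str
    name: str = ""

@dataclass
class Context:
    """Toy stand-in for a managed object context: objects keyed by unique_id."""
    objects: dict = field(default_factory=dict)

    def fetch(self, unique_id):
        return self.objects.get(unique_id)

    def insert(self, obj):
        # Acts like a unique constraint with an error merge policy.
        if obj.unique_id in self.objects:
            raise ValueError(f"duplicate unique_id {obj.unique_id!r}")
        self.objects[obj.unique_id] = obj

def object_with_id(unique_id, create_if_needed, context):
    """The single place objects are created: fetch first, insert only if missing."""
    existing = context.fetch(unique_id)
    if existing is not None or not create_if_needed:
        return existing
    created = Person(unique_id=unique_id)
    context.insert(created)
    return created
```

Calling `object_with_id("555-0100", True, ctx)` twice returns the same object and leaves one entry in the context.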
After many years of C/C++, PHP, some Ruby and other languages on the one hand, and different projects with different frameworks on the other, I now want to learn Rails.
After working through the Getting Started guides, I think Rails is powerful and fairly easy to learn, and I feel ready to start on an app that isn't a bookshop.
But a friend warned me about Rails' 'convention over configuration' and the way it 'does things'. I can't see a problem with that, but are there pitfalls?
And: are there things Rails does very differently from other frameworks?
You're going to either get zero replies or a bunch of opinions. I would recommend googling for "rails is opinionated". Hopefully that will turn up more examples of what you might run into.
Is it a problem? No, not really. Can it be a problem? Yes, absolutely.
Integrating with legacy databases can be a PITA sometimes. Or if you have some insane desire to name all your primary keys something other than "id" that can be a problem.
Not so much a problem really, but you're fighting a lot of convention.
Really, other than legacy databases, I can't think of anything off the top of my head that bothers me about its conventions.
What your friend says is only partly true. Rails' naming conventions are powerful and keep your brain free for other things.
But however much experience you think you have, when you are learning Rails you are back in school.
Rails' naming conventions are not just conventions; in a sense, they are Rails. So if you break a convention, you are off-road and soon in the middle of nowhere. I think the Guides could point out this part of Rails more clearly.
Let me give an example (you are tired of "books" and start with a little app around 'Pubs'):
You scaffold your Pup (intentional typo).
You then add some logic and put in some work, and then you notice your (oops) typo. Now dark clouds gather. Since you are experienced, you start correcting the typo: PupsController -> PubsController, the filename of PubsController (you are already off-road) ...
You will end up at the database table 'pups' ... (middle of nowhere)
This happened because you think you are experienced. A beginner would have built a new scaffold (without the typo) or asked here on SO how to correct it 'correctly'.
Another example is naming things 'more nicely'. After years and many projects, you are probably someone who never uses "unspecified" names like 'user', 'role', 'guest', 'owner' for classes and so on. So you start to name them (nicely?): PubUser, PubOwner, ... Nobody told you "DON'T".
You put everything in a namespace (many people here say "don't") with the nice name 'PubApp'.
Although your files are well organized, you will end up with table names like pub_app_pub_owners, not to mention the names of the associative tables between them.
And later on you will type something like
link_to 'add', new_pub_app_pub_pub_guest_url
link_to 'add', new_pub_app_pub_owner_pub_url
This is probably not what you intended when you set out to make things 'clean'. And take a look at the beginner's link:
link_to 'add', new_pub_guest_url
I do not want to claim that one style is better than the other.
I want to point out that, since you are not experienced with Rails, you don't know where the things you are doing (off-road) are leading you, and the way back is hard.
That is the pitfall.
But next time you will know about it and make a compromise: 'Pubowner' and 'Guest' (and 'pa' as the namespace, if you really want one).
so
link_to 'add', new_pa_pubowner_guest_url
That is not so bad. But it's hard to reverse things, so think first.
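To see how quickly these names grow, here is a rough Python sketch that builds the helper names by joining the pieces with underscores. `new_route_helper` is hypothetical and only approximates how Rails composes "new" route helpers for namespaced, nested resources; it is not Rails code.

```python
def new_route_helper(namespaces, parents, resource, suffix="url"):
    """Approximate a Rails-style 'new' route helper name.

    namespaces: e.g. ["pub_app"]; parents: singular names of the enclosing
    resources, e.g. ["pub"]; resource: singular name of the nested resource.
    """
    return "_".join(["new", *namespaces, *parents, resource, suffix])
```

With `(["pub_app"], ["pub"], "pub_guest")` it yields `new_pub_app_pub_pub_guest_url`; with no namespace or parent it yields `new_pub_guest_url`; and the compromise `(["pa"], ["pubowner"], "guest")` gives `new_pa_pubowner_guest_url`. Every prefix you add to a class or namespace name is repeated in every helper that mentions it.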
When you write an application with any framework, you may need to write a lot of configuration code. If you follow Rails' standard conventions, however, you can avoid most of that configuration, and in some cases need none at all. Explicit configuration is needed only where you can't follow the standard convention.
The following are conventions Rails provides:
Naming conventions: Active Record uses some naming conventions to find out how the mapping between models and database tables should be created. Rails will pluralize your class names to find the respective database table. So, for a class Book, you should have a database table called books.
Example:
Database Table - Plural with underscores separating words (e.g., book_clubs).
Model Class - Singular with the first letter of each word capitalized (e.g., BookClub).
Schema Conventions: Active Record uses naming conventions for the columns in database tables, depending on the purpose of these columns.
Foreign keys - These fields should be named following the pattern singularized_table_name_id (e.g., item_id, order_id). These are the fields that Active Record will look for when you create associations between your models.
Primary keys - By default, Active Record will use an integer column named id as the table's primary key. When using migrations to create your tables, this column will be automatically created.
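A rough Python sketch of the mapping described above. `table_name`, `foreign_key` and `naive_pluralize` are hypothetical helpers: `naive_pluralize` handles only a few English rules, whereas Rails' inflector knows many irregular plurals, and acronyms such as `HTMLPage` are not split the way Rails would split them. The optional `prefix` loosely mimics a namespace table prefix (Active Record's `table_name_prefix`), which is how names like `pub_app_pub_owners` arise.

```python
import re

def underscore(class_name):
    """BookClub -> book_club (CamelCase to snake_case; acronyms not handled)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()

def naive_pluralize(word):
    """Very rough English pluralization; Rails' inflector covers far more cases."""
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"

def table_name(class_name, prefix=""):
    """Model class -> table name: snake_case and pluralized, e.g. BookClub -> book_clubs."""
    return prefix + naive_pluralize(underscore(class_name))

def foreign_key(class_name):
    """Associated model -> foreign key column: singular name plus _id, e.g. Order -> order_id."""
    return underscore(class_name) + "_id"
```

`table_name("BookClub")` gives `book_clubs` and `foreign_key("Item")` gives `item_id`, matching the examples above.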
There is one thing I had never run into before, and it cost me a nerve or two: Rails caches a lot, even in the development environment (for God's sake, it does!).
Let me construct a scenario (no, not a construct; it happened to me in a more complex variant).
After enough hours of work, having done everything that was planned for the day, you finish with a little cleanup. You check that everything still works, smile, and log off.
sleep
Back at the computer, you start everything up and get a 'constant xy is not ...' error. But why, overnight?
The answer is easy (once someone has told you): Rails' caching does not (or rather cannot) check whether a class / file / method has simply been removed rather than altered (sometimes ...).
So the deleted file (one too many) removed the class it contained from the world, but not from Rails' cache. Powering off did the rest.
I have had subtler situations, which I solved with a Rails server restart after looking out of the window to check that I was still on Earth ...
What I am trying to point out is that there is no magic behind it, but be warned if you think you are smart enough to touch the framework code (for whatever reason and however you do it): the big cache will put you right back at beginner level.
I'm working on an ASP.net web application that uses SQL as its database back-end. One issue I have is that it sometimes takes a while to get my DBA to create or modify tables in the database, which under no circumstances am I allowed to modify on my own.
Here is something I do when I expect users to upload files with their data.
Suppose the user uploads a new record for a table called Student_Records. The user uploads a record with fname Bob and lname Smith. The record is assigned primary key 123. The user also uploads two files: attendance_record.pdf and homework_record.pdf. Let's suppose that I have a network share, \\foo\bar, where the files are saved.
One way of handling this situation would be to have a table Student_Records_Files that associates key 123 (Bob Smith's record) with his files. However, since I have trouble getting tables created, I've done something different: when I save the files on the server, I name them 123_attendance_record.pdf and 123_homework_record.pdf. That way, I can easily identify which table record each file is associated with, without creating a new SQL table. I am, in essence, using the file system itself as a join table (after all, the file system is a kind of database).
In my code for retrieving the files, I scan the directory \\foo\bar and look for files that begin with each primary key number from Student_Records.
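A minimal Python sketch of that lookup. The `directory` argument stands in for `\\foo\bar`, and `files_by_record` is a hypothetical name. Splitting on the first underscore, rather than testing `name.startswith(str(key))`, avoids a subtle bug in the plain prefix approach: key 12 would also match `123_attendance_record.pdf`. The sketch assumes keys never contain an underscore.

```python
import os
from collections import defaultdict

def files_by_record(directory, record_ids):
    """Map each primary key (as a string) to the original names of its files.

    Files are expected to be named '<key>_<original name>'; anything else is ignored.
    """
    wanted = {str(record_id) for record_id in record_ids}
    result = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        key, sep, original = name.partition("_")  # split on the FIRST underscore only
        if sep and key in wanted:
            result[key].append(original)
    return dict(result)
```

For the example in the question, record 123 maps to `["attendance_record.pdf", "homework_record.pdf"]`, and a file such as `12_notes.txt` is attributed only to record 12.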
It seems to work very well, but is it good practice?
There is nothing wrong with using the file system to store files. It's what it is used for.
There are a few things to keep in mind though.
I would consider a better method of storing the files - perhaps a directory for each user, rather than simply appending the user id to the filename.
Ensure that the file store is resilient and backed up as regularly as your database. If your database is configured to give you a backup every 10 minutes, but your file store is only backed up every day (or, worse, every week), then you might be in for a world of pain.
Also consider what would happen if the user uploads two documents with the same name.
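A minimal Python sketch of both suggestions: one directory per record, and never overwriting an existing upload that has the same name. `save_upload` and the ` (1)`, ` (2)` suffix scheme are assumptions, not a standard. Opening with mode `'xb'` fails if the file already exists, so the existence check and the write happen in one step.

```python
import os

def save_upload(root, record_id, filename, data):
    """Save data under root/<record_id>/<filename> without overwriting anything.

    If the name is taken, ' (1)', ' (2)', ... is appended before the extension.
    Returns the path actually written.
    """
    folder = os.path.join(root, str(record_id))
    os.makedirs(folder, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(filename))  # drop any directory part
    candidate = os.path.join(folder, stem + ext)
    counter = 1
    while True:
        try:
            # 'xb' raises FileExistsError instead of clobbering an existing file
            with open(candidate, "xb") as handle:
                handle.write(data)
            return candidate
        except FileExistsError:
            candidate = os.path.join(folder, f"{stem} ({counter}){ext}")
            counter += 1
```

`os.path.basename` strips directory parts from the user-supplied name; on a POSIX server it does not treat `\` as a separator, so Windows-style client paths would need extra cleanup.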
First of all, I think it's a bad practice, in general, to design your architecture based on how responsive your DBA is. Any given compromise based on this approach may or may not be a big deal, but over time it will result in a poorly designed system.
Second, making the file name this critical seems dangerous to me; there's no protection against a person or application modifying the filename without realizing its importance.
Third, one of the advantages of having a table to maintain the join between the person and the file is that you can add additional data, such as: when was the file uploaded, what is the MIME type, has the file been read by anyone through the system, is this file a newer version of a previous file, etc. etc. Metadata can be very powerful, and the filesystem offers only limited ways to store it.
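For comparison, a minimal sketch of the join table with metadata columns, using Python's built-in `sqlite3` so it runs anywhere. The table names come from the question; the column names and the `connect`/`files_for` helpers are hypothetical. `replaces_id` is one way to record that a file is a newer version of an earlier one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student_Records (
    id    INTEGER PRIMARY KEY,
    fname TEXT NOT NULL,
    lname TEXT NOT NULL
);
CREATE TABLE Student_Records_Files (
    id          INTEGER PRIMARY KEY,
    record_id   INTEGER NOT NULL REFERENCES Student_Records(id),
    stored_path TEXT NOT NULL UNIQUE,          -- where the file lives on the share
    file_name   TEXT NOT NULL,                 -- name as uploaded by the user
    mime_type   TEXT,
    uploaded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    replaces_id INTEGER REFERENCES Student_Records_Files(id)  -- previous version, if any
);
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn

def files_for(conn, record_id):
    """List (file_name, mime_type) for one student record, oldest upload first."""
    return conn.execute(
        "SELECT file_name, mime_type FROM Student_Records_Files "
        "WHERE record_id = ? ORDER BY uploaded_at, id",
        (record_id,),
    ).fetchall()
```

Unlike a file-name convention, the foreign key rejects a file row for a record that does not exist, and renaming a file on disk cannot silently break the association without the `stored_path` column showing it.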
There are really two questions here. One is, given that for administrative reasons you cannot get changes made to the database schema, is it acceptable to devise some workaround. To that I'd have to say yes. What else can you do? In theory, if it takes two weeks to get the DBA to make a schema change for you, then this two weeks should be added to any deadline that you are given. In practice, this almost never happens. I've often worked places where some paperwork or whatever required two weeks before I could even begin work, and then I'd be given two weeks and one day to do the project. Sometimes you just have to put it together with rubber bands and bandaids.
Two is, is it a good idea to build a naming convention into file names and use this to identify files and their relationship to other data. I've done this at times and it's generally worked for me, though I have a perhaps irrational emotional feeling that it's not a good idea.
On the plus side, (a) By building information into a file name, you make it easy for both the computer and a human being to identify file associations. (Human readable as long as the naming convention is straightforward enough, anyway.) (b) By eliminating the separate storage of a link, you eliminate the possibility of a bad link. A file with the appropriate name may not exist, of course, but a database record with appropriate keys may not exist, or the file reference in such a record may be null or invalid. So it seems to solve one problem there without creating any new problems.
Potential minuses are: (a) You may have characters in the key that are not legal in file names. You may be able to just strip such characters out, but this may cause duplicates. The only safe thing to do is to escape them in some way, which is a pain. (b) You may exceed the maximum length of a file name, though this is not as much of an issue as it was in the bad old 8.3 days. (c) You can't share files. If a database record points to a file, then two db records could point to the same file. If you must make two copies of a file instead, not only does this waste disk space, but it also means that if the file is updated, you must be sure to update all copies. If in your application it would make no sense to share files, then this isn't an issue.
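For minus (a), a minimal Python sketch of reversible escaping with percent-encoding. `key_to_filename_part` and `filename_part_to_key` are hypothetical helpers. `urllib.parse.quote` with `safe=""` escapes `/`, `\`, `:` and the other characters Windows forbids in file names, but it always leaves `_` alone, so the sketch escapes `_` as well, keeping it free for use as the key/name separator.

```python
from urllib.parse import quote, unquote

def key_to_filename_part(key):
    """Percent-encode a key so it is safe (and unambiguous) inside a file name."""
    # quote() never escapes '_', so do it by hand to reserve '_' as the separator.
    return quote(str(key), safe="").replace("_", "%5F")

def filename_part_to_key(part):
    """Reverse key_to_filename_part."""
    return unquote(part)
```

Escaped keys can be up to three times longer than the original, which bears on minus (b), and on a case-insensitive file system keys that differ only in case (`A` vs `a`) would still collide.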
You have to manage the files in some way, but you had to do that anyway.
I really can't think of any overriding minuses. As I say, I've done this on occasion and didn't run into any particular problems. I'm interested in seeing others' responses.
I think it is not good practice, because it makes your working application very dependent on specific implementation details, which would make it hard to maintain in the future, or to work with if other people later needed access to your code/API.
Now, whether you should do this or not is a whole different question. If you are really taking that much of a performance hit and it is significantly easier to work with the way you have it, then I would say go ahead and break the rules. Ideally it's good to follow best practices, but sometimes you have to bend the rules a little to make things work.
First, why is this a table change as opposed to a data change? Once you have the tables set up you should only need to update rows in that table every time that a user adds new files. If you have to put up with this one-time, two-week delay then bite the bullet and just get it done right.
Second, instead of trying to work around the problem why don't you try to fix the problem? Why is the process of implementing table changes so slow? Are you at least able to work on a development database (in which you have control to test and try out these changes)? Even if it's your own laptop you can at least continue on with development. Work with your manager, the DBA, and whoever else you need to, in order to improve the process. Would it help to speed things up if your scripts went through a formal testing process before you handed them off to the DBA so that he doesn't need to test the scripts, etc. himself?
Third, if this is a production database, then you should probably be building this two-week delay into your development cycle. You know that it takes two weeks for the DBA to review and implement changes in production, so if you have a deadline for releasing functionality, make sure you have enough lead time for it.
Building this kind of "data" into a filename has inherent problems as others have pointed out. You have no relational integrity guarantees and the "data" can be changed without knowledge of the rest of the application/database.
It's best to keep everything in the database.
Network file I/O is spotty at best. In addition, it's slower than DB I/O.
If the DBA is slow to get small changes into the database, you
may be dealing with:
A political control issue. Maybe he just knows DB stuff and is threatened
when he perceives others moving in on his turf. Whatever his reasons, you need
to GET WORK DONE. Period. Document all the extra time / communication / work
you need to do for each small change and take that up with the management.
If the first level of management is unwilling to see things your way,
(it does not matter what their reasons are), escalate the issue
to the next level of management. In the past, I've gotten results this way.
It was more of a political territory problem than a technical problem.
The DBA eventually gave up and gave me full access to the TEST system BUT
he also stipulated that I would need to learn his testing process,
naming convention, his DB standards and practices, his way of testing, etc.
I was game.
I would also need to fix any database problems arising from changes I introduced.
This was fair and I got to wear the DBA hat in addition to the developer hat.
I got the freedom I needed and he got one less thing to worry about.
A process issue. Maybe the DBA needs to put every small DB change you submit
through a gauntlet of testing and performance analysis. Maybe he has a highly
normalized DB schema and because he has the big picture, he needs to normalize or
denormalize your requested DB changes to fit into the existing schema.
Ask to work with him. Ask him for a full DB design diagram.
Get a good sense of his DB design philosophy. Implement your DB changes with
his DB design philosophy in mind. Show that you understand that he's trying
to keep the DB in good order (understand normalization, relational constraints,
check constraints). Give him less to worry about. He needs to trust that you
will not muck up his database.
Accumulate all the small changes into a lengthy script and submit them to the DBA.
This way, you won't have to wait for each small change to go through all of his
process / testing. In addition, you're giving him a bigger picture view of your
development planning (that is in step with his DB design philosophy) instead of
just the play by play.
I'm working with a legacy database which, due to poor management and design, has accumulated a sprawl of columns that have never been used or are no longer being used.
Is it possible to somehow query for column usage, as in how often a column is being selected (either specifically or with *, or joined on)?
It seems to me this is something we should be able to retrieve somehow, but I have been unable to find anything like it.
Greetings,
F.B. ten Kate
Unfortunately, this analysis on the DB side isn't really going to be a full answer. I've seen a LOT of instances where application code only needed 3 columns of a 10+ column table, but selected them all anyway.
Your column would still show up on a usage report in any sort of trace or profiling you did, but it still may not ACTUALLY be in use.
You might have to either a) analyze the entire collection of apps that use this database or b) start drafting a return-on-investment style document on whether it's worth rebuilding.
This article will give you a good idea of how to search all fixed code (procedures, views, functions and triggers) for the columns that are used. The code in the article searches for a specific table/column combination; you could easily adapt it to run for all columns. For anything dynamically executed, you'd probably have to set up a profiler trace.
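The article itself is not reproduced here. As a stand-in, here is a minimal Python sketch of the same idea against SQLite, whose `sqlite_master` table holds the SQL text of views and triggers (on SQL Server the comparable text lives in `sys.sql_modules`). `column_references` is a hypothetical name, and it loops over every column rather than one table/column pair. Like any text search, it can produce false positives and cannot see dynamic SQL or application code, which is exactly the limitation the other answers raise.

```python
import re
import sqlite3

def column_references(conn):
    """Return {(table, column): [views/triggers whose SQL mentions the column]}.

    Plain whole-word text search: it may report false positives and cannot
    see dynamic SQL or queries issued from application code.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    modules = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type IN ('view', 'trigger')").fetchall()
    usage = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]
            pattern = re.compile(rf"\b{re.escape(column)}\b", re.IGNORECASE)
            usage[(table, column)] = [name for name, sql in modules if sql and pattern.search(sql)]
    return usage
```

Columns that map to an empty list are candidates for a closer look, not proof that the column is unused.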
Even if you could determine whether a column had been used in the past X period of time, would that be good enough? There may be some obscure program out there that populates a column once a week, a month, a year; or once every time they click the mystery button that no one ever clicks, or to log the report that only Fred in accounting ever runs (he quit two years ago), or that gets logged to if that one rare bug happens (during daylight savings time, perhaps?)
My point is, the only way you can truly be certain that a column is absolutely not used by anything is to review everything -- every call, every line of code, every ad hoc Excel data dump, every possible contingency -- everything that references the database. As this may be all but unachievable, try to get a formally defined group of programs and procedures that must be supported, bend over backwards to make sure they are supported, and be prepared to fix things when some overlooked or forgotten piece of functionality turns up.
An interesting problem occurred recently, and I've been thinking about the "best" way (for a given value of "best") to deal with it.
In essence, it's a question of tracking notes against source code. The example that flagged this was getting a problem fixed in live within SLAs, and how best to achieve that. Without going into all the details, it came down to finding a function that's used in a number of places and may or may not be buggy, yet the problem was being reported in only a single location.
The fix to meet the SLAs was simply to add a check into the location where the problem was reported, rather than tweaking the common code and having to test everything that touches that function.
The interesting issue is then upstreaming. The "correct" method would be to go back and check the original function, validate that it's correct everywhere it's called, and then make the change "properly" if it's determined that the library function is wrong.
The problem is that this takes time, so upstreaming may simply take the workaround, etc. However, if the problem occurs again (say six months later) in another location calling the same library function, there isn't an easy way to link the two problems together. You can search the bug tracking database, but this isn't guaranteed to help: it depends on whether a note was added along the lines of "this library function needs more thorough checking, but no time to investigate now".
So the question is this: within a large team of developers (30 plus, split into teams of both support and on-going development), what methods do you use to manage (what are effectively) "sticky notes" against source code, short of adding a comment to the suspicious function's source code saying "this might be a bit dodgy"?
The problem with committing a comment is one of process: a change is a change, so committing a zero-change change (i.e., one where only comments are added) is not ideal; developers can make mistakes even when adding a comment (hitting a stray key or something), so it's always (IMO) better to commit only when actual changes are made.
Now, a wiki could be used to track per-file notes, but we've got a minimum of four branches and in excess of a few hundred files (SQL objects, source code, XML files, etc.), so a wiki would get unmanageable quite quickly.
This is the sort of thing it would be nice if SCMs could support: bits of metadata against files that are simply notes, that don't add to the SCM's version history, and that can be displayed when doing (say) an svn update, or viewed manually.
There may already be solutions out there -- so how do you manage this type of knowledge sharing?
Well we're now using this method: in each folder checked into SVN, we've created a .url shortcut (this is Windows we're dev'ing on) that links to a page on our development wiki about that folder. Thus we can update the Wiki info freely, and on checkout/update everyone gets a link that will take them to the appropriate Wiki page for that folder/module.
We've not long instigated it so we'll have to see how well it works long term -- but it's better than what we had before (i.e., nothing :-) ).
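A minimal Python sketch of generating those shortcuts. A Windows `.url` file is a small INI-style text file with an `[InternetShortcut]` section and a `URL=` line. The wiki base URL, the `wiki.url` file name and the one-page-per-folder-path scheme are all assumptions, and `write_wiki_shortcuts` is hypothetical.

```python
import os
from urllib.parse import quote

WIKI_BASE = "https://wiki.example.com/dev/"  # hypothetical wiki location

def write_wiki_shortcuts(repo_root, wiki_base=WIKI_BASE, name="wiki.url"):
    """Write <folder>/wiki.url in every folder, linking to a page named after its path."""
    written = []
    for folder, dirs, _files in os.walk(repo_root):
        dirs[:] = [d for d in dirs if not d.startswith(".")]  # skip .svn, .git, ...
        relative = os.path.relpath(folder, repo_root).replace(os.sep, "/")
        page = "" if relative == "." else quote(relative)
        path = os.path.join(folder, name)
        # newline="\r\n" writes Windows line endings, as .url files normally use
        with open(path, "w", newline="\r\n") as handle:
            handle.write("[InternetShortcut]\n")
            handle.write(f"URL={wiki_base}{page}\n")
        written.append(path)
    return written
```

The shortcut files are committed once; after that, only the wiki pages change, so the notes never add to the version history of the source files themselves.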