I have a multi-user application that keeps a centralized logfile for activity. Right now, that logging is going into text files to the tune of about 10MB-50MB / day. The text files are rotated daily by the logger, and we keep the past 4 or 5 days worth. Older than that is of no interest to us.
They're read rarely: either when developing the application for error messages, diagnostic messages, or when the application is in production to do triage on a user-reported problem or a bug.
(This is strictly an application log. Security logging is kept elsewhere.)
But when they are read, they're a pain in the ass. Grepping 10MB text files is no fun even with Perl: the fields (transaction ID, user ID, etc..) in the file are useful, but just text. Messages are written sequentially, one like at a time, so interleaved activity is all mixed up when trying to follow a particular transaction or user.
I'm looking for thoughts on the topic. Anyone done application-level logging with an SQL database and liked it? Hated it?
I think that logging directly to a database is usually a bad idea, and I would avoid it.
The main reason is this: a good log will be most useful when you can use it to debug your application post-mortem, once the error has already occurred and you can't reproduce it. To be able to do that, you need to make sure that the logging itself is reliable. And to make any system reliable, a good start is to keep it simple.
So having a simple file-based log with just a few lines of code (open file, append line, close file or keep it opened, repeat...) will usually be more reliable and useful in the future, when you really need it to work.
On the other hand, logging successfully to an SQL server will require that a lot more components work correctly, and there will be a lot more possible error situations where you won't be able to log the information you need, simply because the log infrastructure itself won't be working. And something worst: a failure in the log procedure (like a database corruption or a deadlock) will probably affect the performance of the application, and then you'll have a situation where a secondary component prevents the application of performing it's primary function.
If you need to do a lot of analysis of the logs and you are not comfortable using text-based tools like grep, then keep the logs in text files, and periodically import them to an SQL database. If the SQL fails you won't loose any log information, and it won't even affect the application's ability to function. Then you can do all the data analysis in the DB.
I think those are the main reasons why I don't do logging to a database, although I have done it in the past. Hope it helps.
We used a Log Database at my last job, and it was great.
We had stored procedures that would spit out overviews of general system health for different metrics that I could load from a web page. We could also quickly spit out a trace for a given app over a given period, and if I wanted it was easy to get that as a text file, if you really just like grep-ing files.
To ensure the logging system does not itself become a problem, there is of course a common code framework we used among different apps that handled writing to the log table. Part of that framework included also logging to a file, in case the problem is with the database itself, and part of it involves cycling the logs. As for the space issues, the log database is on a different backup schedule, and it's really not an issue. Space (not-backed-up) is cheap.
I think that addresses most of the concerns expressed elsewhere. It's all a matter of implementation. But if I stopped here it would still be a case of "not much worse", and that's a bad reason to go the trouble of setting up DB logging. What I liked about this is that it allowed us to do some new things that would be much harder to do with flat files.
There were four main improvements over files. The first is the system overviews I've already mentioned. The second, and imo most important, was a check to see if any app was missing messages where we would normally expect to find them. That kind of thing is near-impossible to spot in traditional file logging unless you spend a lot of time each day reviewing mind-numbing logs for apps that just tell you everything's okay 99% of the time. It's amazing how freeing the view to show missing log entries is. Most days we didn't need to look at most of the log files at all... something that would be dangerous and irresponsible without the database.
That brings up the third improvement. We generated a single daily status e-mail, and it was the only thing we needed to review on days that everything ran normally. The e-mail included showed errors and warnings. Missing logs were re-logged as warning by the same db job that sends the e-mail, and missing the e-mail was a big deal. We could send forward a particular log message to our bug tracker with one click, right from within the daily e-mail (it was html-formatted, pulled data from a web app).
The final improvement was that if we did want to follow a specific app more closely, say after making a change, we could subscribe to an RSS feed for that specific application until we were satisfied. It's harder to do that from a text file.
Where I'm at now, we rely a lot more on third party tools and their logging abilities, and that means going back to a lot more manual review. I really miss the DB, and I'm contemplated writing a tool to read those logs and re-log them into a DB to get these abilities back.
Again, we did this with text files as a fallback, and it's the new abilities that really make the database worthwhile. If all you're gonna do is write to a DB and try to use it the same way you did the old text files, it adds unnecessary complexity and you may as well just use the old text files. It's the ability to build out the system for new features that makes it worthwhile.
yeah, we do it here, and I can't stand it. One problem we have here is if there is a problem with the db (connection, corrupted etc), all logging stops. My other big problem with it is that it's difficult to look through to trace problems. We also have problems here with the table logs taking up too much space, and having to worry about truncating them when we move databases because our logs are so large.
I think its clunky compared to log files. I find it difficult to see the "big picture" with it being stored in the database. I'll admit I'm a log file person, I like being able to open a text file and look through (regex) it instead of using sql to try and search for something.
The last place I worked we had log files of 100 meg plus. They're a little difficult to open, but if you have the right tool it's not that bad. We had a system to log messages too. You could quickly look at the file and determine which set of log entries belonged which process.
We've used SQL Server centralized logging before, and as the previous posted mentioned, the biggest problem was that interrupted connectivity to the database would mean interrupted logging. I actually ended up adding a queuing routine to the logging that would try the DB first, and write to a physical file if it failed. You'd just have to add code to that routine that, on a successful log to the db, would check to see if any other entries are queued locally, and write those too.
I like having everything in a DB, as opposed to physical log files, but just because I like parsing it with reports I've written.
I think the problem you have with logging could be solved with logging to SQL, provided that you are able to split out the fields you are interested in, into different columns. You can't treat the SQL database like a text field and expect it to be better, it won't.
Once you get everything you're interested in logging to the columns you want it in, it's much easier to track the sequential actions of something by being able to isolate it by column. Like if you had an "entry" process, you log everything normally with the text "entry process" being put into the "logtype" column or "process" column. Then when you have problems with the "entry process", a WHERE statement on that column isolates all entry processes.
we do it in our organization in large volumes with SQL Server. In my openion writing to database is better because of the search and filter capability. Performance wise 10 to 50 MB worth of data and keeping it only for 5 days, does not affect your application. Tracking transaction and users will be very easy compare to tracking it from text file since you can filter by transaction or user.
You are mentioning that the files read rarely. So, decide if is it worth putting time in development effort to develop the logging framework? Calculate your time spent on searching the logs from log files in a year vs the time it will take to code and test. If the time spending is 1 hour or more a day to search logs it is better to dump logs in to database. Which can drastically reduce time spend on solving issues.
If you spend less than an hour then you can use some text search tools like "SRSearch", which is a great tool that I used, searches from multiple files in a folder and gives you the results in small snippts ("like google search result"), where you click to open the file with the result interested. There are other Text search tools available too. If the environment is windows, then you have Microsoft LogParser also a good tool available for free where you can query your file like a database if the file is written in a specific format.
Here are some additional pros and cons and the reason why I prefer log files instead of databases:
Space is not that cheap when using VPS's. Recovering space on live database systems is often a huge hassle and you might have to shut down services while recovering space. If your logs is so important that you have to keep them for years (like we do) then this is a real problem. Remember that most databases does not recover space when you delete data as it simply re-uses the space - not much help if you are actually running out of space.
If you access the logs fequently and you have to pull daily reports from a database with one huge log table and millions and millions of records then you will impact the performance of your database services while querying the data from the database.
Log files can be created and older logs archived daily. Depending on the type of logs massive amounts of space can be recovered by archiving logs. We save around 6x the space when we compress our logs and in most cases you'll probably save much more.
Individual smaller log files can be compressed and transferred easily without impacting the server. Previously we had logs ranging in the 100's of GB's worth of data in a database. Moving such large databases between servers becomes a major hassle, especially due to the fact that you have to shut down the database server while doing so. What I'm saying is that maintenance becomes a real pain the day you have to start moving large databases around.
Writing to log files in general are a lot faster than writing to DB. Don't underestimate the speed of your operating system file IO.
Log files only suck if you don't structure your logs properly. You may have to use additional tools and you may even have to develop your own to help process them, but in the end it will be worth it.
You could log to a comma or tab delimited text format, or enable your logs to be exported to a CSV format. When you need to read from a log export your CSV file to a table on your SQL server then you can query with standard SQL statements. To automate the process you could use SQL Integration Services.
I've been reading all the answers and they're great. But in a company I worked due to several restrictions and audit it was mandatory to log into a database. Anyway, we had several ways to log and the solution was to install a pipeline where our programmers could connect to the pipeline and log into database, file, console, or even forwarding log to a port to be consumed by another applications.
This pipeline doesn't interrupt the normal process and keeping a log file at the same time you log into the database ensures you rarely lose a line.
I suggest you investigate further log4net that it's great for this.
http://logging.apache.org/log4net/
I could see it working well, provided you had the capability to filter what needs to be logged and when it needs to be logged. A log file (or table, such as it is) is useless if you can't find what you're looking for or contains unnecessary information.
Since your logs are read rarely, I would write them on file (better performance and reliability).
Then, if and only if you need to read them, I would import the log file in a data base (better analysis).
Doing so, you get the advantages of both methods.
Related
I have the following use case for which I'm trying to find an optimal use of either filsystem, database (rdbms or a flavour of noSql solution). Any advice is welcome, as I want to see what is optimal.
Client application: will generate logs intervals of 1-3 seconds. By logs I mean structured log data (either about connections, applications used, processes used, screenshots, etc..). Some log data will be structured, some will be unstructured (where the schema can change thus).
Storage solution: will need to persist all this data very fast. Will sit on 1-* server(s). It doesn't matter if it's a hybrid solution between filesystem/rdbms/(any suitable flavour of) noSql.
Post processing: the data needs to be queryable ofcourse. E.g. just a key-value store would not suffice, that's a given (maybe for the screenshots only yes).
As a reference, here's a more concrete example:
User runs the client for 2-3 hours (during a "monitoring period"). It sends log data over the wire to the server (storage). Writing speed and data accuracy is vital here.
Management system accumulates the data and makes a report on certain characteristics. All log data should be able to be fetched if needed - but there will be a specific query for a set of users in a given monitoring period. Reading speed is less necessary here, but data accuracy and finding all log parts back eventually is necessary.
If I need to give more information, please let me know.
If you prefer to roll your own rather than use logging packages, I would stick with append only text files. You can certainly encode screenshots in Base64 and keep it in the same file, but I would rather store that separately in the file system with a generated filename stored in the log.
As for reporting, you can obviously read it through a text editor, but if you need a more sophisticated and regular management reporting, you can create an ETL of only the info you report on into a RDBMS. You can always go back and rerun ETL if you decide that you want more info later on.
I am open for the suggestions on the following:
have a file on S3
this file will be randomly downloaded by random people
the volume of downloads is low, maybe 200-300 at most per day, on a spike, but usually might be as low as 5-10.
file size is ~10-20mb.
I need somehow to count a) how many accesses to the file happened b) how many full (completed) downloads happened.
I believe the only good day is just have some Ruby or Node.js script. It'll count accesses, then somehow supply the file, and on final byte do the completed count.
Unfortunately, doesn't seem like a too nice of the approach.
Any better ideas?
I was also thinking about enabling access logging on S3 and then parsing logs, but that doesn't seem too good neither, as requires downloading and parsing logs.
I would stick with your first idea. Having some sort of server-side logic handling the counting stuff.
I don't know which type of clients are accessing your system, but with this approach you can parse additional data coming from your clients, like HTTP headers (if applicable), etc, and that can help you identifying the profile of your clients. That might not be useful at all to you, though.
Also, if you ever need to add more complicated logic (authentication, privileges, permissions, uploading files, etc), it will be much easier once you already have a backend application/script in place.
I'm looking to apply continuous delivery concepts to web app we are building, and wondering if there any solution to protecting the database from accidental erroneous commit. For example, a bug that erases whole table instead of a single record.
How this issue impact can be limited according to continuous delivery doctorine, where the application deployed gradually over segments of infrastructure?
Any ideas?
Well first you cannot tell just from looking what is a bad SQL statement. You might have wanted to delete the entire contents of the table. Therefore is is not physiucally possible to have an automated tool that detects intent.
So to protect your database, first make sure you are in full recovery (not simple) mode and have full backups nightly and transaction log backups every 15 minutes or so. Now you cannot lose much information no matter how badly the process breaks. Your dbas should be trained to be able to recover to a point in time. If you don't have any dbas, I'd suggest the best thing you can do to protect your data is hire some. This is a non-negotiable in any non-trivial database environment and it is terribly risky not to have trained, experienced dbas if your data is critical to the business.
Next, you need to treat SQL like any other code, it should be in source control in scripts. If you are terribly concerned about accidental deletions, then write the scripts for deletes to copy all deletes to a staging table and delete the content of the staging table once a week or so. Enforce this convention in the code reviews. Or better yet set up an auditing process that runs through triggers. Once all records are audited, it is much easier to get back the 150 accidental deletions without having to restore a database. I would never consider having any enterprise application without auditing.
All SQL scripts without exception should be code-reviewed just like other code. All SQL scripts should be tested on QA and passed before moving to porduction. This will greatly reduce the possiblility for error. No developer should have write rights to production, only dbas should have that. Therefore each script should be written so that is can just be run, not run one chunk at a time where you could accidentally forget to highlight the where clause. Train your developers to use transactions correctly in the scripts as well.
Your concern is bad data happening to the database. The solution is to use full logging of all transactions so you can back out of transactions that you want to. This would usually be used in a context of full backups/incremental backups/full logging.
SQL Server, for instance, allows you to restore to a point in time (http://msdn.microsoft.com/en-us/library/ms190982(v=sql.105).aspx), assuming you have full logging.
If you are creating and dropping tables, this could be an expensive solution, in terms of space needed for the log. However, it might meet your needs for development.
You may find that full-logging is too expensive for such an application. In that case, you might want to make periodic backups (daily? hourly?) and just keep these around. For this purpose, I've found LightSpeed to be a good product for fast and efficient backups.
One of the strategies that is commonly adopted is to log the incremental sql statements rather than a collective schema generation so you can control the change at a much granular levels:
ex:
change 1:
UP:
Add column
DOWN:
Remove column
change 2:
UP:
Add trigger
DOWN:
Remove trigger
Once the changes are incrementally captured like this, you can have a simple but efficient script to upgrade (UP) from any version to any version without having to worry about the changes that happening. When the change # are linked to build, it becomes even more effective. When you deploy a build the database is also automatically upgraded(UP) or downgraded(DOWN) to that specific build.
We have an pipeline app which does that at CloudMunch.
background:
I'm in the design phase of building an app.
I want the app to display text and images, the problem is that I will have A LOT of them. hundreds to thousands.
This is my largest app so far, and I am unsure on how to handle all the data.
The question???????:
What would be the best way to store and access these images and text?
Would I use a formal database approach like SQL?
Or would it be better to navigate files/folders e.g. dropping all the files in res/drawable?
potentially useful facts:
The database will be stored and accessed natively so it can be accessed off-line.
The user will not be adding to the database in anyway, only accessing the data.
the database will be updated every 6 months.
The application 'page' will display 1-5 images along with several blocks of text.
Concept:
the app will be like a recipe app...the user will pick some parameters e.g. ingredients, type, diet.. then select a recipe. And then several images and blocks of text will be displayed showing and detailing the process of some recipe.
I apologize if this is repeated but I didn't see a specific answer for my purposes.
The "Best" approach will depend on the functionality of the database server in question.
Generally, you should store the images "In" the database until that becomes a performance issue. Once you start storing images "Outside" of the database you will have to handle all the issue that are normally taken care of by the database. Disk space management, orphan records, file name conflicts, folder file limits, to name just a few. Depending on your situation these may be big issues or thay may be nothing to worry about.
I've seen several application where images (or attachements) were kept "Outside" the database, and in each case it was done poorly. There are just so many issues to handle, and most developers don't even think of half of them. In many cases the performance of storing the images "In" the databse was acceptable, but the developers decided against it because they just knew it would not perform well.
If your using SQL server 2008 the Filestream data type is ideal for your case. It stores the binary files outside of the database but behaves as a normal field. Also you are able to read/write the files using a stream instead of getting/setting the whole file as a byte array (like when using varbin(max))
If you don't have this functionality in your database, I would recommend storing the images outside of the DB
Its probably a better idea to use a file based approach for deployed static resources.
At the very least because taking a dependency on file system is typically easier to manage then taking a dependency on a DB.
Also this line indicates some sort of non-web client
The database will be stored and accessed natively so it can be accessed off-line."
This means if you go with the DB approach you'll have a couple of other interesting problems
Deployment
Depending on the platform deploying a DB can be a real bear depending on your target platform. What happens if they if already have the engine but its a different version.
Resources
Is your DB going to be client/server based (like MySQL/SQL Server etc)? If so then your app has to now manage the current state of its process. If not then you'll be using a file-based db SQL Lite/MS Access, at which point I would question why using a static DB is worth doing at all.
One final note. There's nothing stopping your Content Production environment from using a DB. Its quite common for Content producers to maintain a database for their content that will you will later use to produce the files for publishing/deployment.
I've worked in shops where I've implemented Exception Handling into the event log, and into a table in the database.
Each have their merits, of which I can highlight a few based on my experience:
Event Log
Industry standard location for exceptions (+)
Ease of logging (+)
Can log database connection problems here (+)
Can build report and viewing apps on top of the event log (+)
Needs to be flushed every so often, if alot is reported there (-)
Not as extensible as SQL logging [add custom fields like method name in SQL] (-)
SQL/Database
Can handle large volumes of data (+)
Can handle rapid volume inserts of exceptions (+)
Single storage location for exception in load balanced environment (+)
Very customizable (+)
A little easier to build reporting/notification off of SQL storage (+)
Different from where typical exceptions are stored (-)
Am I missing any major considerations?
I'm sure that a few of these points are debatable, but I'm curious what has worked best for other teams, and why you feel strongly about the choice.
You need to differentiate between logging and tracing. While the lines are a bit fuzzy, I tend to think of logging as "non developer stuff". Things like unhandled exceptions, corrupt files, etc. These are definitely not normal, and should be a very infrequent problem.
Tracing is what a developer is interested in. The stack traces, method parameters, that the web server returned an HTTP Status of 401.3, etc. These are really noisy, and can produce a lot of data in a short amount of time. Normally we have different levels of tracing, to cut back the noise.
For logging in a client app, I think that Event Logs are the way to go (I'd have to double check, but I think ASP.NET Health Monitoring can write to the Event Log as well). Normal users have permissions to write to the event log, as long as you have the Setup (which is installed by an admin anyway) create the event source.
Most of your advantages for Sql logging, while true, aren't applicable to event logging:
Can handle large volumes of data:
Do you really have large volumes of unhandled exceptions or other high level failures?
Can handle rapid volume inserts of exceptions: A single unhandled exception should bring your app down - it's inherently rate limited. Other interesting events to non developers should be similarly aggregated.
Very customizable: The message in an Event Log is pretty much free text. If you need more info, just point to a text or structured XML or binary file log
A little easier to build reporting/notification off of SQL storage: Reporting is built in with the Event Log Viewer, and notification systems are, either inherent - due to an application crash - or mixed in with other really critical notifications - there's little excuse for missing an Event Log message. For corporate or other networked apps, there's a thousand and 1 different apps that already cull from Event Logs for errors...chances are your sysadmin is already using one.
For tracing, of which the specific details of an exception or errors is a part of, I like flat files - they're easy to maintain, easy to grep, and can be imported into Sql for analysis if I like.
90% of the time, you don't need them and they're set to WARN or ERROR. But, when you do set them to INFO or DEBUG, you'll generate a ton of data. An RDBMS has a lot of overhead - for performance (ACID, concurrency, etc.), storage (transaction logs, SCSI RAID-5 drives, etc.), and administration (backups, server maintenance, etc.) - all of which are unnecessary for trace logs.
I wouldn't log straight to the database. As you say, database issues become tricky to log :)
I would log to the filesystem, and then have a job which bulk-inserts from files to the database. Personally I like having the logs in the database in the log run primarily for the scaling situation - I pretty much assume I'll have more than one machine running, and it's handy to be able to effectively have a combined log. (Each entry should state the machine it comes from, of course.)
Report and viewing apps can be done very easily from a database - there may be fewer log-specialized reporting tools out there at the moment, but pretty much all databases have generalised reporting functionality.
For ease of logging, I'd use a framework like log4net which takes a lot of the effort out of it, and is a tried and tested solution. Aside from anything else, that means you can change your output strategy with no code changes. You could even log to both the event log and the database if necessary, or send some logs to one place and some to the other. (I've assumed .NET here, but there are similar logging frameworks for many platforms.)
One thing that needs considering about event logging is that there are products out there which can monitor your servers' event logs (like Microsoft Operations Manager) and intelligently do notification, and gather statistics on their contents.
A "minus" of SQL-based logging is that it adds another layer of dependencies to your application, which may or may not always be acceptable. I've done both in my career. I once or twice even used a MSMQ based message queue to queue log events and empty the queue into a MSSQL database to eliminate the need for my client software to have a connection to the DB.
One note about writing to the event log: that requires certain permissions for your application users that in some environments may be restricted by default.
Where I'm at we do most of our logging to a database, with flat files as backup. It's pretty nice, we can do things like get an RSS feed for an app to watch for a few days when we make a change.