I am not sure this is the appropriate place to post this question, although I arrived at it in the context of writing a relatively complex SQL query. If it is not, please accept my apologies and, if I may abuse your kindness, point me to the appropriate Stack Exchange community where I might find a solution to my problem.
I had to implement a "simple" search; simple for the user, as it only had one field, but rather complicated for me to implement, because any expression entered in the search field could be interpreted in quite a few different ways, involving multiple tables and relationships. So I decided to break the search up into simpler queries, each dumping a group of IDs based on a specific set of criteria into a table variable. Finally, I would consolidate all the temporary results into a single set, joining in any missing fields and sorting the set according to the user's preferences. The results seemed accurate, but at times the performance was surprisingly and unacceptably slow.
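To make the pattern concrete, here is a stripped-down sketch of what I mean (table, column, and variable names are made up purely for illustration):

-- Hypothetical names, only to illustrate the pattern.
DECLARE @term nvarchar(100) = N'smith';
DECLARE @matches TABLE (CustomerId int NOT NULL);

-- Each INSERT covers one possible interpretation of the search expression.
INSERT INTO @matches (CustomerId)
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE c.LastName LIKE @term + N'%';

INSERT INTO @matches (CustomerId)
SELECT o.CustomerId
FROM dbo.Orders AS o
WHERE o.OrderNumber = @term;

-- Consolidate, join back for display columns, and sort per the user's preferences.
SELECT DISTINCT c.CustomerId, c.LastName, c.FirstName
FROM @matches AS m
JOIN dbo.Customers AS c
  ON c.CustomerId = m.CustomerId
ORDER BY c.LastName, c.FirstName;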
After some investigation, I learned that if a table variable is joined with other tables in SQL Server, it may result in slow performance due to inefficient query plan selection, because SQL Server does not maintain statistics or track the number of rows in a table variable when compiling a query plan. Apparently, however, I could improve the situation by enabling trace flag 2453, which allows a table variable to trigger a recompile when enough rows have changed. Indeed, times dropped from approximately 30 seconds to well under 1 second, so I decided to test the trace flag in other queries that use table variables containing potentially many rows.
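For reference, I am enabling it roughly like this (session scope; adding -1 would make it instance-wide, which I did not want):

-- Turn trace flag 2453 on for the current session only.
DBCC TRACEON (2453);

-- ... run the search query that joins the table variables here ...

-- Turn it back off when done.
DBCC TRACEOFF (2453);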
What I did not expect was that every enabling and disabling of trace flag 2453 would yield an informational event (similar to "DBCC TRACEON 2453, server process ID (SPID) 56. This is an informational message only; no user action is required") both in the SQL Server log and in the operating system event log. But some of the queries in which I have enabled the trace flag are run many times, which is causing those informational messages that require no user action to flood both logs, hiding other potentially useful messages that might require user action.
Through some additional investigation, I have learned that trace flag 2505 can prevent some of these events from being logged, but in my test environment (running SQL Server Express 2012 on Windows 7 SP1) this did not solve my issue.
Does anybody know how to prevent these messages from flooding the logs? Thanks!
I have an issue in our database (AS400 - DB2): in one of our tables, all the rows were deleted. I do not know whether it was a program or an SQL statement that a user executed. All I know is that it happened at roughly 3 AM. I did check for any scheduled jobs at that time, but found nothing.
We managed to get the data back from backups, but I want to investigate what deleted the records, or which user did it.
Are there any logs on the AS400 for physical files/tables that show what SQL was executed against a specific table, and when? This would help me determine what caused this.
I tried checking System i Navigator but could not find any logs... Is there a way of getting transactional data on a table using System i Navigator or the green screen? And, if possible, can I get the SQL that executed in that time frame?
Any help would be appreciated.
There was no mention of how the time was inferred/determined, but given the lack of journaling, I would suggest a good approach is to immediately gather information about the file and member: DSPOBJD for both *SERVICE and *FULL, DSPFD for *ALL, DMPOBJ, and perhaps even a copy of the row for the TABLE from the catalog [to include the LAST_ALTERED_TIMESTAMP (ALTEREDTS) column of SYSTABLES, or the based-on field DBXATS from QADBXREF]. Gathering those, worthwhile almost only if done before any other activity [especially before any recovery activity], can help establish the time of the event and perhaps allude to what the event was; most timestamps reflect only the most recent activity against the object [rather than serving as a historical log], so any recovery activity is likely to wipe out the timestamps that would have reflected the prior event/activity.
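For the catalog row, something along these lines would preserve it (library and table names here are placeholders; run it before any recovery activity):

-- Save a copy of the catalog row, including LAST_ALTERED_TIMESTAMP,
-- before recovery activity overwrites the object-level timestamps.
CREATE TABLE MYLIB.SAVED_SYSTABLES_ROW AS
  (SELECT *
     FROM QSYS2.SYSTABLES
    WHERE TABLE_SCHEMA = 'MYLIB'
      AND TABLE_NAME   = 'MYTABLE')
  WITH DATA;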
Even if there was no journal for the file and nothing in the plan cache, there may have been [albeit unlikely] an active SQL monitor. An active monitor should be visible somewhere in the iNav GUI as well. I am not sure of the visibility of a monitor that may have been active in a prior time frame.
Similarly, despite the lack of journaling, there may be some system-level object or user auditing in effect for which the event was tracked, either as a command string or as an action on the file/member; combined with the inferred timing, all audit records spanning from just before to just after the event can be reviewed.
Although there may have been nothing in the scheduled jobs, the History Log (DSPLOG) since that time may show jobs that ended, or jobs that started [perhaps shortly] before that time, which are more likely to have been responsible. In my experience, the name of the job is often indicative; for example, the name of the job matching the name of the file, perhaps due only to the request having been submitted from PDM. Any spooled [or otherwise still available] job logs could be reviewed for possible references to the file and/or member name; perhaps a completion message for a CLRPFM request.
If the action may have come from a program, the file may be recorded as a reference object, such that output from DSPPGMREF may reveal programs with the reference, and any [service] program that is an SQL program could have its embedded SQL statements revealed with PRTSQLINF; the last-used dates for those programs could be reviewed for possible matches. Note: module and program sources can also be searched, but there is no way to know into what name they were compiled, or into what they may have been bound if created only temporarily for the purpose of binding.
Using System i Navigator, expand Databases. Right-click on your system database. Select SQL Plan Cache -> Show Statements. From here, you can filter based on a variety of criteria.
This is not sure-fire, but it often saves me some time. Using System i Navigator, right-click on the table and choose Index Advisor. If you're lucky, one or more indexes are advised. If so, sort by the date last advised, right-click the index with the newest date, and select Show Statements... In that dialog box, either sort by date to help narrow things down or just scroll through the statements to find the one you're interested in. Right-click it, select Work with SQL Statement, and there you go.
I am creating an application that allows users to construct complex SELECT statements. The SQL that is generated cannot be trusted, and is totally arbitrary.
I need a way to execute the untrusted SQL in relative safety. My plan is to create a database user who only has SELECT privileges on the relevant schemas and tables. The untrusted SQL would be executed as that user.
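For concreteness, the role I have in mind would be created roughly like this (role, schema, and password are placeholders):

-- A login role with no elevated attributes.
CREATE ROLE report_reader LOGIN PASSWORD 'change-me'
  NOSUPERUSER NOCREATEDB NOCREATEROLE;

-- Let it see the relevant schema and read its tables, nothing more.
GRANT USAGE ON SCHEMA reporting TO report_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA reporting TO report_reader;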
What could possibly go wrong with that? :)
If we assume Postgres itself does not have critical vulnerabilities, the user could do a bunch of cross joins and overload the database. That could be mitigated with a session timeout.
I feel like there is a lot more that could go wrong, but I'm having trouble coming up with a list.
EDIT:
Based on the comments/answers so far, I should note that the number of people using this tool at any given time will be very near 0.
SELECT queries cannot change anything in the database. The lack of DBA privileges guarantees that global settings cannot be changed. So, overload is truly the only concern.
Overload can be the result of overly complex queries or of too many simple queries.
Overly complex queries can be ruled out by setting statement_timeout in postgresql.conf.
Receiving floods of simple queries can be avoided too. Firstly, you can set a parallel connection limit per user (ALTER USER ... CONNECTION LIMIT). And if you have some interface program between the user and PostgreSQL, you can additionally (1) add some extra wait after each query completes, or (2) introduce a CAPTCHA to avoid an automated DoS attack.
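For example (the role name is a placeholder; statement_timeout could also be set globally in postgresql.conf, but setting it per role keeps it scoped to the untrusted user):

-- Cap the number of concurrent connections for the untrusted role.
ALTER ROLE report_reader CONNECTION LIMIT 3;

-- Abort any single statement from this role that runs longer than 5 seconds.
ALTER ROLE report_reader SET statement_timeout = '5s';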
ADDITION: PostgreSQL's public system functions provide many possible attack vectors. They can be called like SELECT pg_advisory_lock(1), and every user has the privilege to call them. So you should restrict access to them. A good option is to create a whitelist of all "callable words", or, more precisely, of identifiers that may be followed by a "(", and to reject any query containing a call-like construct (an identifier followed by "(") whose identifier is not in the whitelist.
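For example, taking one of those functions away from ordinary users could look like this (run as a superuser; repeat for each function you want to block):

-- pg_advisory_lock is executable by PUBLIC by default; revoke that,
-- then grant it back only to roles that genuinely need it.
REVOKE EXECUTE ON FUNCTION pg_advisory_lock(bigint) FROM PUBLIC;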
Things that come to mind, in addition to having the user SELECT-only and revoking privileges on functions:
Read-only transactions. When a transaction is started with BEGIN READ ONLY, or with SET TRANSACTION READ ONLY as its first instruction, it cannot write anything, independently of the user's permissions (a minimal sketch appears at the end of this answer).
On the client side, if you want to restrict it to a single SELECT, it is better to use an SQL submission function that does not accept several queries bundled into one. For instance, the Swiss-army-knife PQexec function of the libpq API does accept such bundles, and so does every driver function built on top of it, like PHP's pg_query.
http://sqlfiddle.com/ is a service dedicated to running arbitrary SQL statements, which may be seen as something of a proof of concept that this is doable without being hacked or DDoS'ed all day long.
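A minimal sketch of the read-only transaction approach mentioned above (the role name and the placeholder query are illustrations only):

-- Per request: wrap whatever the user submitted in a read-only transaction.
BEGIN READ ONLY;
SELECT 1;   -- the untrusted query goes here
COMMIT;

-- Or make read-only the default for the restricted role.
ALTER ROLE report_reader SET default_transaction_read_only = on;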
The problem with this is that I'm not sure whether the SQL itself will continue to run in the background after a session timeout (I can't really find much evidence either way via Google, and I haven't attempted it myself either). If you're limiting access to just SELECT, I think this is about the worst that could happen, though. The real issue is what happens if you get a hundred users trying to do complex cross joins. Whether the session timeout drops the query or not, it will put a very heavy load on the database (which could very easily be enough to pull the database down entirely).
The only way (from my point of view) to protect the main server against DoS from crafted queries is to set up a read-only replica of the Postgres DB and a special limited user on that replica. This way, the main Postgres server won't be affected by queries run on the replica.
You also get a hot standby / continuously replicated DB for the case when the main DB fails for some reason.
I'm running the following setup:
Physical Server
Windows 2003 Standard Edition R2 SP2
IIS 6
ColdFusion 8
JDBC connection to iSeries AS400 using JT400 driver
I am running a simple SQL query against a file in the database:
SELECT
column1,
column2,
column3,
....
FROM LIB/MYFILE
No conditions.
The file has 81 columns (alphanumeric and numeric) and about 16,000 records.
When I run the query in the emulator using the STRSQL command, the query comes back immediately.
When I run the query on my Web Server, it takes about 30 seconds.
Why is this happening, and is there any way to reduce this time?
While I cannot address whatever overhead might be involved in your web server, I can say there are several other factors to consider:
This most likely has to do primarily with the differences in the way the two interfaces work.
Your interactive STRSQL session will start displaying results as quickly as it receives the first few pages of data. You are able to page down through that initial data, but generally at some point you will see a status message at the bottom of the screen indicating that it is now getting more data.
I assume your web server is waiting until it receives the entire result set. It wants to get all the data as it is building the HTML page, before it sends the page. Thus you will naturally wait longer.
If this is not how your web server application works, then it is likely to be a JT400 JDBC Properties issue.
If you have overridden any default settings, make sure that those are appropriate.
In some situations the OPTIMIZATION_GOAL settings might be a factor. But if you are reading the table (aka physical file or PF) directly, in its physical sequence, without any index or key, then that might not apply here.
Your interactive STRSQL session will default to a setting of *FIRSTIO, meaning that the query is optimized for returning the first pages of data quickly, which corresponds to the way it works.
Your JDBC connection will default to a "query optimize goal" of "0", which will translate to an OPTIMIZATION_GOAL setting of *ALLIO, unless you are using extended dynamic packages. *ALLIO means the optimizer will try to minimize the time needed to return the entire result set, not just the first pages.
Or, perhaps first try simply adding FOR READ ONLY onto the end of your SELECT statement.
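Applied to the query from the question, that would look like this:

SELECT
column1,
column2,
column3,
....
FROM LIB/MYFILE
FOR READ ONLY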
Update: a more advanced solution
You may be able to bypass the delay caused by waiting for the entire result set as part of constructing the web page to be sent.
Send a web page out to the browser without any records, or limited records, but use AJAX code to load the remainder of the data behind the scenes.
Use large block fetches whenever feasible, to grab plenty of rows in one clip.
One thing you need to remember: the i saves the access paths it creates within the job, in case they are needed again. This means that if you log out, log back in, and then run your query, it should take longer to run; the second time you run the query, it will be faster. When running queries in a web application, you may or may not be reusing a job, which means the access paths may have to be rebuilt.
If speed is important, I would:
Look into optimizing the query. I know there are better sources, but I can't find them right now.
Create a stored procedure. A stored procedure saves the access paths it creates (a minimal sketch follows this list).
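A minimal sketch of such a procedure (library and procedure names are placeholders; whether you write LIB.MYFILE or LIB/MYFILE depends on your connection's naming setting):

CREATE PROCEDURE MYLIB.GET_MYFILE ()
  DYNAMIC RESULT SETS 1
  LANGUAGE SQL
BEGIN
  -- Open a read-only result-set cursor over the file and return it to the caller.
  DECLARE c1 CURSOR WITH RETURN FOR
    SELECT column1, column2, column3
      FROM LIB.MYFILE
      FOR READ ONLY;
  OPEN c1;
END

The web application would then invoke it with CALL MYLIB.GET_MYFILE().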
With only 16,000 rows and no WHERE or ORDER BY, this thing should scream. Break the problem down to help diagnose where the bottleneck is. Go back to the IBM i, run your query from the SQL command line, and then use the B, BOT, or BOTTOM command to tell the database to show the last row. That will force the database to cough up the entire 16k result set and give you a better idea of the raw performance on the IBM side. If that's poor, have the IBM administrators run Navigator and monitor the performance for you. It might be something unexpected, like the 'table' really being a view, or the columns you are selecting being user-defined functions.
If the performance on the IBM side is OK, then look at what ColdFusion is doing with the result set. Not being a CF programmer, I'm no help there. But generally, when I am tasked with solving multi-platform performance issues, the client side tends to consume the entire result set and then use program logic to choose which rows to display or work with. The server is MUCH faster than the client, and given the right hints, the database optimiser can make some very good decisions about how to get at those rows.
I ran a T-SQL query on a SQL Server 2005 DB that was supposed to run for 4 days, but when I checked after 4 days I found the machine had been abruptly shut down. So I want to trace what happened with the query, or in other words, what the status of the query was when the machine went down.
Is there a mechanism to check this?
It may be very difficult to do such a post-mortem. But while it may not be possible to determine what portion of the long-running query was active when the shutdown occurred, it is important to try and find the root cause of the problem!
Depending on the way the query is written, SQL Server will attempt to roll back any SQL that wasn't completed (or any uncommitted transaction, if transactions were used). This will happen the first time the server is restarted; if you have any desire to analyze the SQL transaction log (yuck!), make a copy first.
Before getting to the part of the query that may have been running at the time of the shutdown, it may be preferable to look into the SQL logs (I mean the application logs, with informational messages, timestamps and such, not the SQL transaction logs for a given database), in search of basic facts, and possibly of an error message logged some time prior to the shutdown that could plausibly be the origin of the problem. Also check the Windows event logs, as they may indicate abnormal system-wide conditions that could be a cause or consequence of the SQL shutdown.
The SQL transaction log can be "decompiled" and used to understand what was actually updated/inserted/deleted in the database; however, this type of information for a 4-day run may be buried quite deep and possibly interspersed with unrelated queries. In fact, running out of disk space for the SQL log could possibly cause the server to become erratic (I would expect more elegant handling of such a situation, but if the drive in question was also the system drive, bad things could happen...).
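If you do decide to dig into the transaction log itself, the undocumented fn_dblog function is one way to peek at the log records still present in the active log of the current database (a diagnostic sketch only; run it in the context of the affected database):

-- Show transaction start/commit/rollback records still in the active log.
SELECT TOP (1000)
       [Current LSN],
       Operation,
       [Transaction ID],
       [Transaction Name],
       [Begin Time]
FROM   fn_dblog(NULL, NULL)
WHERE  Operation IN ('LOP_BEGIN_XACT', 'LOP_COMMIT_XACT', 'LOP_ABORT_XACT');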
Finally, by analyzing the "4-day-long" SQL script, it may be possible to infer which parts completed, thanks to various side effects of the script. If nothing else, this review may allow you to put the database back into a coherent state if necessary, or to truncate the script to exclude the parts already completed, in preparation for a new run to finish the work.
I'm working on a project where we are thinking of using SQLCacheDependency with SQL Server 2005/2008 and we are wondering how this will affect the performance of the system.
So we are wondering about the following questions:
Can the number of SQLCacheDependency objects (query notifications) have a negative effect on SQL Server performance, i.e. on insert, update, and delete operations on the affected tables?
What effect (performance-wise) would, for example, 50,000 different query notifications on a single table have in SQL Server 2005/2008 on insertions and deletions on that table?
Are there any recommendations on how to use SQLCacheDependencies? Any official do's and don'ts? We have found some information on the internet but haven't found anything on the performance implications.
If there is anyone here that has some answers to these questions that would be great.
SqlCacheDependency using the polling mechanism should not be a significant load on the SQL Server or the application server.
Let's look at the steps required for SqlCacheDependency to work and analyze them:
1. The database is enabled for SqlCacheDependency.
2. A table, say 'Employee', is enabled for SqlCacheDependency (this can be any number of tables).
3. Web.config is updated to enable SqlCacheDependency.
4. The page where you are using SQL cache dependency is configured.
That's it.
Internally:
Step 1 creates a table, 'AspNet_SqlCacheTablesForChangeNotification', in the database, which stores the name of each table ('Employee' here) for which SqlCacheDependency is enabled, and adds some stored procedures as well.
Step 2 inserts an 'Employee' entry into the 'AspNet_SqlCacheTablesForChangeNotification' table, and also creates an insert/update/delete trigger on the 'Employee' table.
Step 3 enables the application for SqlCacheDependency by providing the connection string and poll time.
Whenever there is a change in the 'Employee' table, the trigger fires, which in turn updates the 'AspNet_SqlCacheTablesForChangeNotification' table.
The application then polls the database, say every 5000 ms, and checks for any changes to the 'AspNet_SqlCacheTablesForChangeNotification' table. If there are any changes, the respective cache entries are removed from memory.
The great benefit is caching combined with freshness of data (at most, the data can be about 5 seconds stale with that poll interval). The polling is handled by a background process and should not be a performance hurdle, because, as you can see from the list above, the tasks are not very CPU demanding.
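If you want to watch that plumbing in action, the notification table created in step 1 can simply be queried (a quick sanity check; I am deliberately not naming its columns, which may vary by framework version):

-- One row per registered table; the trigger bumps a change counter here,
-- and the poll compares that counter against its cached value.
SELECT *
FROM dbo.AspNet_SqlCacheTablesForChangeNotification;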
SQLCacheDependency is implemented like an indexed view, and every time the table is modified this view's index gets updated. So many views (SQLCacheDependency objects) on the same table mean quite a performance hit for modifications; however, if you have one view (SQLCacheDependency object) per table, you should have no problems.
The cache-changed notification is asynchronous and is triggered when the server has resources.
You're right, not much information on this is provided, but there's a phrase related to your question on this page: http://msdn.microsoft.com/en-us/library/ms178604%28VS.80%29.aspx
"The database operations associated with SQL cache dependency are simple and therefore do not incur a heavy processing cost on the server."
Hope this helps you although your question is a little bit old already.
This page appears to have some good info on setup and on which technique to use (granted, I did just skim it).
All I can provide is anecdotal evidence for performance, but we use SqlCacheDependency as a sort of "messaging solution" for a large enterprise application that processes on the order of ten thousand messages per hour.
The basic architecture is that our company uses Perforce for source control, and we have a "subscription service" that receives messages from a trigger web service call that gets invoked on every p4 commit and inserts a record into a SQL database. Our application has the dependency set up to send subscription notifications for every changelist that affects a branch or path that you are monitoring.
The performance is fine. The trigger runs on the order of 200 ms, and we have never had a complaint about the latency of relaying the messages to end users.
As always, your mileage may vary.