I'm trying to understand how SQL Server 2005 replication works, and I'm looking at the views titled Msmerge_[Publication]_[Table]_VIEW. These views seem to define the merge filters and are pretty straightforward, except for one line of SQL in the WHERE clause:
AND ({fn ISPALUSER('1A381615-B57D-4915-BA4B-E16BF7A9AC58')} = 1)
What does the ISPALUSER function do? I cannot seem to find it anywhere under functions in Management Studio, nor really any mention of it on the web.
(The reason I'm looking at these views is that we have a performance issue when a client replicates new records up. SQL like if not exists (select 1 from [MSmerge_[Publication]_[Table]_VIEW] where [rowguid] = #rowguid) is running and taking 10+ seconds per row, which obviously kills performance when you have more than a couple of rows going up.)
Seems it's checking whether the user is in the special security role MSmerge_PAL_role, which seems to govern who has access to the replication functionality.
Therefore, ISPALUSER checks whether the user is in that specific role.
Still not sure what PAL stands for (Publication Access List, I believe).
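If you want to check the membership yourself, here's a minimal sketch (IS_MEMBER and sys.database_principals are standard in SQL Server 2005; the exact PAL role name in your publication database may carry a GUID suffix, so list the roles first):

-- list the merge replication PAL roles defined in the publication database
SELECT name
FROM sys.database_principals
WHERE type = 'R' AND name LIKE 'MSmerge_PAL%';

-- check whether the current user belongs to one of them
-- (returns 1 = member, 0 = not a member, NULL = no such role)
SELECT IS_MEMBER('MSmerge_PAL_role');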
I'm developing an app for which I need some simple information about a given Microsoft SQL Server:
The maximum size the server supports¹
The total free space still available for a database to grow²
I googled about this, and even though there are many pages on the matter with many different answers, I couldn't find anything that suits me, because either it was too fancy (e.g. writing complex tables to files) or it didn't actually work. The context I'm in is a simple piece of software to manage the database; it will get this info, store it in integer variables, and then use it to work. So my idea here is simply to do a query, SELECT style, grab the data, and move on. An example of the only "function" I managed to find that would fit these criteria:
SELECT size FROM sys.master_files WHERE DB_NAME(database_id) = 'DB_NAME_HERE' ²
(Note: this is just an example, the function doesn't actually return what I want)
¹: For example, the Express editions may allow up to 10 GB of storage capacity (it depends on the version). So the function here should return a value close to that.
²: To be clear up front: knowing my own database's size doesn't matter, since it's a server with many databases. My app's logic needs the free space available for any database to grow, so calculating "total space - my db space" won't work.
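A minimal sketch of a query-only approach, assuming SQL Server 2008 R2 SP1 or later (sys.dm_os_volume_stats reports the free bytes on the volume hosting each database file; SERVERPROPERTY answers the edition part):

-- edition string, e.g. 'Express Edition' (the per-database size cap depends on this)
SELECT SERVERPROPERTY('Edition') AS edition;

-- free space on each volume that hosts a database file
SELECT DISTINCT vs.volume_mount_point,
       vs.available_bytes / 1048576 AS free_mb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;

As far as I know, the maximum size the edition supports isn't exposed as a server property, so the app would have to map the edition string to the documented cap.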
I have the following script:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
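-- note: the comma above starts an implicit CROSS JOIN with CAT_TAB (no join condition)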
CAT_TAB CAT
ORDER BY
DEPT.F03
The tables are huge. When I execute the script in SQL Server directly it takes around 4 minutes, but when I run it in the third-party program (SMS LOC, based on Delphi) it gives me the error
<msg> out of memory</msg> <sql> the code </sql>
Is there any way I can lighten the script so it can be executed? Or has anyone had the same problem and solved it somehow?
I remember having had to resort to the ROBUST PLAN query hint once on a query where the query-optimizer kind of lost track and tried to work it out in a way that the hardware couldn't handle.
=> http://technet.microsoft.com/en-us/library/ms181714.aspx
But I'm not sure I understand why it would work for one 'technology' and not another.
Then again, the error message might not be from SQL but rather from the 3rd-party program that gathers the output and does so in a 'less than ideal' way.
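For reference, the hint just goes at the end of the statement; a minimal sketch applied to the query above:

SELECT
    DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
    CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
    DEPT_TAB DEPT
LEFT OUTER JOIN
    SDP_TAB SDP ON SDP.F03 = DEPT.F03,
    CAT_TAB CAT
ORDER BY
    DEPT.F03
OPTION (ROBUST PLAN)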
Consider adding paging to the user edit screen and the underlying data call; see the sketch below. The point being, you don't need to see all the rows at one time, but they are available to the user upon request.
This will alleviate much of your performance problem.
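A minimal paging sketch using ROW_NUMBER() (available since SQL Server 2005); @PageNumber and @PageSize are hypothetical parameters that the screen would supply:

DECLARE @PageNumber int, @PageSize int
SET @PageNumber = 1
SET @PageSize = 100

-- number the rows once, then return only the requested page
SELECT F03, F238, F04, F1022, F17, F1023, F1946
FROM (
    SELECT
        DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
        CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946,
        ROW_NUMBER() OVER (ORDER BY DEPT.F03) AS rn
    FROM DEPT_TAB DEPT
    LEFT OUTER JOIN SDP_TAB SDP ON SDP.F03 = DEPT.F03,
        CAT_TAB CAT
) AS numbered
WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize
ORDER BY rn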
I had a project where I had to add over 7 million individual lines of T-SQL code via batch (I couldn't figure out how to programmatically leverage the new SEQUENCE command). The problem was that there was a limited amount of memory available on my VM (I was allocated the maximum amount of memory for this VM). Because of the large number of lines of T-SQL code, I had to first test how many lines it could take before the server crashed. For whatever reason, SQL (2012) doesn't release the memory it uses for large batch jobs such as mine (we're talking around 12 GB of memory), so I had to reboot the server every million or so lines. This is what you may have to do if resources are limited for your project.
I have an sqlite3 table that tells when I gain/lose points in a game. Sample/query result:
SELECT time,p2 FROM events WHERE p1='barrycarter' AND action='points'
ORDER BY time;
1280622305|-22
1280625580|-9
1280627919|20
1280688964|21
1280694395|-11
1280698006|28
1280705461|-14
1280706788|-13
[etc]
I now want my running point total. Given that I start w/ 1000 points,
here's one way to do it.
SELECT DISTINCT(time),
       (SELECT 1000 + SUM(p2)
          FROM events e
         WHERE p1 = 'barrycarter'
           AND action = 'points'
           AND e.time <= e2.time) AS points
FROM events e2
WHERE p1 = 'barrycarter' AND action = 'points'
ORDER BY time
but this is highly inefficient. What's a better way to write this?
MySQL has @variables, so you can do things like:
SELECT time, @tot := @tot + points ...
but I'm using sqlite3, and the above isn't ANSI-standard SQL anyway.
More info on the db if anyone needs it: http://ccgames.db.94y.info/
EDIT: Thanks for the answers! My dilemma: I let anyone run any
single SELECT query on "http://ccgames.db.94y.info/". I want to give
them useful access to my data, but not to the point of allowing
scripting or allowing multiple queries with state. So I need a single
SQL query that can do accumulation. See also:
Existing solution to share database data usefully but safely?
SQLite is meant to be a small embedded database. Given that definition, it is not unreasonable to find many limitations with it. The task at hand is not solvable using SQLite alone, or it will be terribly slow as you have found. The query you have written is a triangular cross join that will not scale, or rather, will scale badly.
The most efficient way to tackle the problem is through the program that is making use of SQLite; e.g., if you were using Web SQL in HTML5, you could easily accumulate in JavaScript.
There is a discussion about this problem in the sqlite mailing list.
Your 2 options are:
Iterate through all the rows with a cursor and calculate the running sum on the client.
Store sums instead of, or as well as, storing points; if you only store sums, you can get the points back as sum(n) - sum(n-1), which is fast. (A sketch of this approach follows below.)
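A minimal SQLite sketch of the second option, using a trigger to maintain a made-up running_total column at insert time; the 1000 starting balance comes from the question:

-- hypothetical column to hold the accumulated total
ALTER TABLE events ADD COLUMN running_total INTEGER;

-- after each 'points' insert, set running_total to the previous total plus the new points
-- (assumes rows arrive in increasing time order)
CREATE TRIGGER events_running_total AFTER INSERT ON events
WHEN NEW.action = 'points'
BEGIN
    UPDATE events
       SET running_total = COALESCE(
               (SELECT running_total
                  FROM events
                 WHERE p1 = NEW.p1 AND action = 'points' AND time < NEW.time
                 ORDER BY time DESC
                 LIMIT 1),
               1000) + NEW.p2
     WHERE rowid = NEW.rowid;
END;

With that in place, the running total is a plain single-table SELECT, which also fits the one-SELECT-per-visitor constraint.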
I am currently trying to wring more performance out of my reporting website, which uses LINQ to SQL and a SQL Server Express 2008 database.
I am finding that, as I now approach a million rows in one of my more 'ugly' tables, performance is becoming a real issue, with one report in particular taking 3 minutes to generate.
Essentially, I have a loop that, for each user, hits the database and grabs a collection of data on them. This data is then queried in various ways (and more rows loaded as needed) until I have a nice little summary object that I can fire off to a set of Silverlight charts. Lazy loading is used, and the reporting pulls in data from around 8 linked tables.
The problem is I don't know where the bottleneck now is or how to improve performance. Due to certain constraints I was forced to use uniqueidentifiers for a number of primary keys in the tables involved - could this be an issue?
Basically, I need to put time into increasing performance, but don't have enough to do that for both the database and the LINQ to SQL. Is there any way I can see where the bottlenecks are?
As I'm running Express I don't have access to the Profiler. I am considering rewriting my queries as compiled LINQ to SQL, but fear the database may be the culprit.
I understand this question is a bit open-ended and it's hard to answer without knowing much more about my setup (database schema etc.), but any advice on how to find out where the bottlenecks are is much appreciated!
Thanks
UPDATE:
Thanks for all the great advice, guys, and the links to some great tools.
UPDATE for those interested
I have been unable to make my queries any quicker by tweaking the LINQ. The problem seems to be that the majority of my database access code takes place in a loop, and I can't see a way around it. Basically, I am building up a report by looking through a number of users' data - hence the loop. Pulling all the records up front seems a bit crazy - 800,000+ rows. My gut feeling is that there is a much better way, but it's a technological leap too far for me!
However, adding another index to one of the foreign keys in one of the tables boosted performance, so now the report takes 20 seconds to generate as opposed to 3 minutes!
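For anyone curious, the fix was nothing fancier than a plain nonclustered index on the foreign-key column; a sketch with made-up table and column names:

CREATE NONCLUSTERED INDEX IX_UserData_UserId
    ON dbo.UserData (UserId)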
I used this excellent tool: Linq2Sql profiler. It works on the application side, so there is no need for database-server profiling functionality.
You have to add one line of initialization code to your application, and then a separate desktop profiler application shows you the SQL for each LINQ query, the exact line of code where it was executed (.cs or .aspx), the database and application execution times, and it even detects some common performance problems like n+1 queries (a query executed per iteration) or unbounded result sets. You have to pay for it, but a trial version is also available.
As you're using SQL Express, which doesn't have Profiler, there is a free third-party profiler you can download here. I've used it when running SQL Express. That will allow you to trace what's going on in the database.
Also, you can query the Dynamic Management Views to see what the costly queries are:
e.g. the TOP 10 queries that have taken the most cumulative CPU (worker) time:
SELECT TOP 10 t.text, q.*, p.query_plan
FROM sys.dm_exec_query_stats q
CROSS APPLY sys.dm_exec_sql_text(q.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan (q.plan_handle) AS p
ORDER BY q.total_worker_time DESC
There are 2 tools I use for this: LINQPad and the Visual Studio debugger. First, check out LINQPad; even the free version is very powerful, showing you execution time and the SQL generated, and you can use it to run any code snippet... it's tremendously useful.
Second, you can use the Visual Studio debugger. This is something we use on our DataContext (note: only use this in debug builds; it's a performance hit and completely unnecessary outside of debugging):
#if DEBUG
private readonly Stopwatch Watch = new Stopwatch();
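// note: 'Current' below is assumed to be a static accessor for the active
// DataContext instance; it isn't shown in this snippet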
private static void Connection_StateChange(object sender, StateChangeEventArgs e)
{
if (e.OriginalState == ConnectionState.Closed && e.CurrentState == ConnectionState.Open)
{
Current.Watch.Start();
}
else if (e.OriginalState == ConnectionState.Open && e.CurrentState == ConnectionState.Closed)
{
Current.Watch.Stop();
string msg = string.Format("SQL took {0}ms", Current.Watch.ElapsedMilliseconds);
Trace.WriteLine(msg);
}
}
#endif
private static DataContext New
{
get
{
var dc = new DataContext(ConnectionString);
#if DEBUG
if (Debugger.IsAttached)
{
dc.Connection.StateChange += Connection_StateChange;
dc.Log = new DebugWriter();
}
#endif
return dc;
}
}
In a debug build, as an operation completes on each context, we see the elapsed time in the debug window along with the SQL it ran. The DebugWriter class you see can be found here (credit: Kris Vandermotten). We can quickly see if a query is taking a while. To use it, we just instantiate a DataContext by:
var DB = DataContext.New;
(The profiler is not an option for me since we don't use SQL Server; this answer is simply to give you some alternatives that have been very useful for me.)
I have been passed a piece of work that I can either do in my application or perhaps in SQL:
I have to get a date out of a string that may look like this:
1234567-DSP-01/01-VER-01/01
or like this:
1234567-VER-01/01-DSP-01/01
but may look like this:
00 12345 DISCH 01/01-VER-01/01 XXX X XXXXX
Yay. If it is a "DSP" then I want that date; if a "DISCH", then that date.
I am pulling the data out in a SQL Server view and would be happy to have the view transform the data for me. My application could do it, but that would add processor time. I could also see if the data could be manipulated before it is entered into the DB, I suppose.
Thank you for your time.
An option would be to check for the presence of DSP or DISCH and then substring out the date as necessary.
For example (I don't have SQL Server today, so I can't verify the syntax, sorry):
select
    date = case
               when charindex('DSP', date_attribute) > 0 then substring(date_attribute, beg, len)
               when charindex('DISCH', date_attribute) > 0 then substring(date_attribute, beg, len)
               else 'unknown'
           end
from myTable
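To make that concrete, a sketch against the sample values in the question, assuming the date always immediately follows the marker plus a single separator character and is 5 characters ('MM/DD') - adjust the lengths for the real data:

select
    case
        when charindex('DSP', date_attribute) > 0
            then substring(date_attribute, charindex('DSP', date_attribute) + 4, 5)
        when charindex('DISCH', date_attribute) > 0
            then substring(date_attribute, charindex('DISCH', date_attribute) + 6, 5)
        else 'unknown'
    end as extracted_date
from myTable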
Don't store multiple items in the same column!
Store the date in its own column when inserting the row! To migrate (a sketch follows this list):
add a new nullable column for the date
write an update that pulls the date out and sets the new column
alter the column to be not nullable
fix your save routine to pull the date out and insert it for you
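A minimal sketch of those steps, reusing the date_attribute/myTable names and the CHARINDEX/SUBSTRING logic from the answer above (the new column name is made up):

-- 1. add a new nullable column for the date
alter table myTable add extracted_date varchar(5) null

-- 2. backfill it by pulling the date out of the combined string
update myTable
set extracted_date = case
        when charindex('DSP', date_attribute) > 0
            then substring(date_attribute, charindex('DSP', date_attribute) + 4, 5)
        when charindex('DISCH', date_attribute) > 0
            then substring(date_attribute, charindex('DISCH', date_attribute) + 6, 5)
    end

-- 3. once every row has a value, make the column not nullable
alter table myTable alter column extracted_date varchar(5) not null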
If you do it in the view, you're adding processing time on SQL Server, which in general is a more expensive resource than an app, web, or some other type of client.
I'd recommend you try to format the data when you insert it, or handle it in the application tier. Scaling an app tier horizontally is so much easier than scaling your SQL Server.
Edit
I mean that the database server's physical resources are usually more expensive than a properly designed application server's physical resources. This is because it is very easy to scale an application horizontally, while it is, in my opinion, an order of magnitude more expensive to scale a DB server horizontally - especially if you're dealing with a transactional database and need to manage merging.
I am not saying it is not possible, just that scaling a database server horizontally is a much more difficult task; hence it's more expensive. The only reason I pointed this out is that the OP raised a concern about using CPU cycles on the app server vs the database server. Most applications I have worked with have been data-centric applications which processed through GBs of data to get a user an answer. We initially put everything on the database server because it was easier than doing it in classic ASP and VB6 at the time. Over time the DB server became more and more loaded, until scaling vertically was no longer an option.
Database servers are also designed for retrieving and joining data. You should leave the formatting of the data to the application and business rules (in general, of course).