I have been passed a piece of work that I can either do in my application or perhaps in SQL:
I have to get a date out of a string that may look like this:
1234567-DSP-01/01-VER-01/01
or like this:
1234567-VER-01/01-DSP-01/01
but may look like this:
00 12345 DISCH 01/01-VER-01/01 XXX X XXXXX
Yay. if it is a "DSP" then I want that date, if a "DISCH" then that date.
I am pulling the data out in a SQL Server view and would be happy to have the view transform the data for me. My application could do it but would add processor time. I could also see if the data could be manipulated before it is entered into the DB, I suppose.
Thank you for your time.
An option would be to check for the presence of DSP or DISCH then substring out the date as necessary.
For example (I don't have sqlserver today so I can verify syntax, sorry)
select
date = case date_attribute
when charindex('DSP',date_attribute) > 0 then substring(date_attribute,beg,end)
when charindex('DISCH',date_attribute) > 0 then substring(date_attribute,beg,end)
else 'unknown'
end
from myTable
don't store multiple items in the same column!
store the date in its own column when inserting the row!
add a new nullable column for the date
write an update that pulls the date out and sets the new column
alter the column to be not nullable
fix your save routine to pull the date out and insert it in for you
If you do it in the view your adding processing time on SQL which in general a more expensive resource then an app, web or some other type of client.
I'd recommend you try and format the data out when you insert the data, or you handle in the application tier. Scaling horizontally an app tier is so much easier then scalling your SQL.
Edit
I am talking the database server's physical resources are usually more expensive then a properly designed applications server's physical resources. This is because it is very easy to scale an application horizontally, it is in my opinion an order of magnitude more expensive to scale a DB server horizontally. Especially if your dealing with a transactional database and need to manage merging
I am not saying it is not possible just that scaling a database server horizontally is a much more difficult task, hence it's more expensive. The only reason I pointed this out is the OP raised a concern about using CPU cycles on the app server vs the database server. Most applications I have worked with have been data centric applications which processed through GB's of data to get a user an answer. We initially put everything on the database server because it was easier then doing it in classic asp and vb6 at the time. Over time the DB server was more and more loaded until scaling veritcally was no longer an option.
Database Servers are also designed at retrieving and joining data together. You should leave the formating of the data to the application and business rules (in general of course)
Related
I'm developing an app for which I need some simple information about a given Microsoft SQL Server:
The maximum size the server supports¹
The total free space still available for a database to grow²
I googled about this and even though there are many pages on the matter with many different answers, I couldn't find anything that suits me because either it was too fancy (e.g. writting complex tables to files) or didn't actually work. The context I'm in is a simple software to manage the database; it will get those info, store in integer variables and then use them to work. So my idea here is simply to do a query, select style, grab one data and move on. An example of the only "function" I managed to find that would fit these criteria:
SELECT size FROM sys.master_files WHERE DB_NAME(database_id) = \'DB_NAME_HERE\'²
(Note: this is just an example, the function doesn't actually return what I want)
¹: For example, the Express editions may have up til 10Gb of allowed storage capacity (it depends on the version). So the function here should return a value close to that.
²: In advance: to know my database's size doesn't matter since it's a server with many databases. My app's logic needs the free space available for any database to grow, so calculating "total space - my db space" won't work.
I have a simple, pasted below, statement called against an Oracle database. This result set contains names of businesses but it has 24,000 results and these are displayed in a drop down list.
I am looking for ideas on ways to reduce the result set to speed up the data returned to the user interface, maybe something like Google's search or a completely different idea. I am open to whatever thoughts and any direction is welcome.
SELECT BusinessName FROM MyTable ORDER BY BusinessName;
Idea:
SELECT BusinessName FROM MyTable WHERE BusinessName LIKE "A%;
I'm know all about how LIKE clauses are not wise to use but like I said this is a LARGE result set. Maybe something along the lines of a BINARY search?
The last query can perform horribly. String comparisons inside the database can be very slow, and depending on the number of "hits" it can be a huge drag on performance. If that doesn't concern you that's fine. This is especially true if the Company data isn't normalized into it's own db table.
As long as the user knows the company he's looking up, then I would identify an existing JavaScript component in some popular JavaScript library that provides a search text field with a dynamic dropdown that shows matching results would be an effective mechanism. But you might want to use '%A%', if they might look for part of a name. For example, If I'm looking for IBM Rational, LLC. do I want it to show up in results when I search for "Rational"?
Either way, watch your performance and if it makes sense cache that data in the company look up service that sits on the server in front of the DB. Also, make sure you don't respond to every keystroke, but have a timeout 500ms or so, to allow the user to type in multiple chars before going to the server and searching. Also, I would NOT recommend bringing all of the company names to the client. We're always looking to reduce the size and frequency of traversals to the server from the browser page. Waiting for 24k company names to come down to the client when the form loads (or even behind the scenes) when shorter quicker very specific queries will perform sufficiently well seems more efficient to me. Again, test it and identify the performance characteristics that fit your use case best.
These are techniques I've used on projects with large data, like searching for a user from a base of 100,000+ users. Our code was a custom Dojo widget (dijit), I 'm not seeing how to do it directly with the dijit code, but jQuery UI provides the autocomplete widget.
Also use limit on this query with a text field so that the drop down only provides a subset of all the matches, forcing the user to further refine the query.
SELECT BusinessName FROM MyTable ORDER BY BusinessName LIMIT 10
I have a project in mind that will require the majority of queries to be keyed off of lat/long as well as date + time.
Initially, I was thinking of a standard RDBMS where lat, long, and the datetime field are properly indexed. Then, I began thinking of a document based system where the document was essentially a timestamp and each document has lat/long with in it. Each document could have n objects associated with it.
I'm looking for advice on what would be the best type of storage engine for this sort of thing is - which of the above idea would be better or if there is something else completely that is the ideal solution.
Edit: Looking for an open source/free solution. Unfortunately price is an issue!
Thanks
I have used PostGres (free open source db) with PostGIS extensions for working with location data. They are extremely good, even though I was working in an MS environment with all production databases using MSSQL 2005, I used PostGres w/GIS to manipulate and precalculate a lot of geographic data.
PostGis has utilities for import arcview .shp files which is a huge plus since that's how most geographic data is present. It also provides a host of location based sql functions like contains( ... ) and near( ... ); and it provides a mechanism for indexing spatial data.
It's been awhile since I used it, but I remember it being rock solid and very useful.
PostGIS: http://postgis.refractions.net/
SQL Server 2008 has new data types for storing and processing geographic information, in addition to the usual date and time types.
See "Working with Spatial Data (Database Engine)".
SQL Express is a free database + has suppport to store lat/long & dateTime
just a tip - use UTC time for dateTime specially if your application needs to be aware of various locations...
hth.
I'm trying to understand how SQL Server 2005 replication works, and I'm looking at the views titled Msmerge_[Publication]_[Table]_VIEW. These views seems to define the merge filters, and are pretty straight forward, except for one line of sql in the WHERE clause:
AND ({fn ISPALUSER('1A381615-B57D-4915-BA4B-E16BF7A9AC58')} = 1)
What does the ISPALUSER function do? I cannot seem to find it anywhere under functions in management studio, nor really any mention of it on the web.
(The reason I'm looking at these views is that we have a performance issue when a client replicates up new records. Sql like if not exists (select 1 from [MSmerge_[Publication]_[Table]_VIEW] where [rowguid] = #rowguid) is running and taking 10+ seconds per row, which obiously kills performance when you have more than a couple of rows going up)
Seems its checking if the user is in the special security role MSmerge_PAL_role, which seems to govern who has access to the replication functionality.
Therefore, ISPALUSER checks if the user is in that specific role.
Still not sure what the PAL stands for.
I've read that (all things equal) PHP is typically faster than MySQL at arithmetic and string manipulation operations. This being the case, where does one draw the line between what one asks the database to do versus what is done by the web server(s)? We use stored procedures exclusively as our data-access layer. My unwritten rule has always been to leave output formatting (including string manipulation and arithmetic) to the web server. So our queries return:
unformatted dates
null values
no calculated values (i.e. return values for columns "foo" and "bar" and let the web server calculate foo*bar if it needs to display value foobar)
no substring-reduced fields (except when shortened field is so significantly shorter that we want to do it at database level to reduce result set size)
two separate columns to let front-end case the output as required
What I'm interested in is feedback about whether this is generally an appropriate approach or whether others know of compelling performance/maintainability considerations that justify pushing these activities to the database.
Note: I'm intentionally tagging this question to be dbms-agnostic, as I believe this is an architectural consideration that comes into play regardless of one's specific dbms.
I would draw the line on how certain layers could rotate out in place for other implementations. It's very likely that you will never use a different RDBMS or have a mobile version of your site, but you never know.
The more orthogonal a data point is, the closer it should be to being released from the database in that form. If on every theoretical version of your site your values A and B are rendered A * B, that should be returned by your database as A * B and never calculated client side.
Let's say you have something that's format heavy like a date. Sometimes you have short dates, long dates, English dates... One pure form should be returned from the database and then that should be formatted in PHP.
So the orthogonality point works in reverse as well. The more dynamic a data point is in its representation/display, the more it should be handled client side. If a string A is always taken as a substring of the first six characters, then have that be returned from the database as pre-substring'ed. If the length of the substring depends on some factor, like six for mobile and ten for your web app, then return the larger string from the database and format it at run time using PHP.
Usually, data formatting is better done on client side, especially culture-specific formatting.
Dynamic pivoting (i. e. variable columns) is also an example of what is better done on client side
When it comes to string manipulation and dynamic arrays, PHP is far more powerful than any RDBMS I'm aware of.
However, data formatting can use additional data which is also kept in the database. Like, the coloring info for each row can be stored in additional table.
You should then correspond the color to each row on database side, but wrap it into the tags on PHP side.
The rule of thumb is: retrieve everything you need for formatting in as few database round-trips as possible, then do the formatting itself on the client side.
I believe in returning the data pretty much as-is from the database and letting it be formatted on the front-end instead. I don't stick to it religously, but in general I think it's better as it provides greater flexibility - e.g. 1 sproc can service n different requirements for data, each of which can format the data as each individually needs. Otherwise, you end up either with multiple queries returning the same data with slightly different formatting from the DB (from a SQL Server point of view, thus reducing execution plan caching benefits - therefore negative impact on performance).
Leave output formatting to the web server