Query Performance: single column vs multiple column - sql

Which of the two table structures below is better?
In the first query I use the LIKE operator; in the second I use the AND operator.
Does the first table design have any advantages over the second when selecting data?
In what situations should I choose the first table structure over the second?

The first one would be better if you never needed to work with the Type or Currency attributes in any way and always used only the whole text stored as MEASUREMENT_NAME.
If you plan to work with the values of the Type or Currency attributes separately, e.g. using them in WHERE conditions, the second option will always be the better choice.
You could also create a combined structure containing both the whole text MEASUREMENT_NAME and the separated Type and Currency values for filtering purposes. This takes more disk space and is not optimized, but the whole MEASUREMENT_NAME text may in the future contain attributes that are unknown to you now; that can be a reason to keep MEASUREMENT_NAME in its raw format.
If MEASUREMENT_NAME is not something you get from external sources, but a data structure you built yourself, and you are looking for a way to store records with a flexible (changing) structure, you are better off storing it as JSON or XML data; Oracle has built-in functions for JSON data.
I also recommend using linked tables for the Type and Currency values, so that the main table contains only an ID as a foreign key.
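The combined structure with linked tables might be sketched like this (all table and column names here are illustrative assumptions, not from the question):

```sql
-- Hypothetical lookup tables for the separated attributes.
CREATE TABLE measurement_type (
    type_id   INT PRIMARY KEY,
    type_name VARCHAR(50) NOT NULL
);

CREATE TABLE currency (
    currency_id   INT PRIMARY KEY,
    currency_code CHAR(3) NOT NULL
);

-- Main table keeps the raw combined text plus foreign keys for filtering.
CREATE TABLE measurement (
    measurement_id   INT PRIMARY KEY,
    measurement_name VARCHAR(200),  -- raw text, kept as-is for future attributes
    type_id          INT REFERENCES measurement_type(type_id),
    currency_id      INT REFERENCES currency(currency_id)
);
```

Filtering then becomes a plain `WHERE type_id = ...` instead of string matching on the combined text.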

The second table obviously has advantages over the first. If you have to query type or currency from the first table, you have to resort to string functions such as RIGHT, LEFT, or similar.
Also, if you set up the keys/constraints properly, the second table follows second normal form.
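To illustrate the difference (column names assumed): the single-column design forces string parsing into every filter, while the two-column design filters directly:

```sql
-- Single combined column: parse the text on every query.
SELECT *
FROM measurement
WHERE SUBSTR(measurement_name, 1, 4) = 'COST'
  AND measurement_name LIKE '%USD%';

-- Separate columns: direct, index-friendly predicates.
SELECT *
FROM measurement
WHERE type = 'COST'
  AND currency = 'USD';
```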

Related

How to tag and store log descriptions in a SQL database

I have logs being captured that contains both a log message and a log "tag".
These "tags" describe the logging events as key-value pairs, for example:
CPU_USAGE: 52.3
USER_LOGGED_IN: "steve15"
NUMBER_OF_RETURNED_RESULTS: 125
I want to store these key-value pairs in a table. Each parameter can be either a string, float, or integer so I can't put all keys in one column and all values in a 2nd column. What is a good SQL table design to store this kind of data? How is the "tagging" of logs typically done? My ultimate goal is to be able to funnel this information into a dashboard so I can monitor resource usage and bottlenecks on charts.
A constraint is that I would like to be able to add new keys without needing to modify my database schema, so having each parameter as a separate column isn't good.
Various solutions I have thought of are:
Storing each value as a string and adding another column in my "tag" table that dictates the actual type
Use a JSON column
Just altering my table to add a new column every time I think of a new tag to log :(
In SQL, key-value pairs are often stored just using strings. You can use a view or application code to convert back to a more appropriate data type.
I notice that you have left out date/times -- those are a little trickier, because you really want canonical formats such as YYYYMMDD. Or perhaps Unix epoch times (number of seconds since 1970-01-01).
You can extend this by having separate value columns for each type you want to store, plus a separate type column that says which one to read.
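One sketch of that per-type-column idea (table and column names are assumptions):

```sql
CREATE TABLE log_tag (
    log_id    INT NOT NULL,
    tag_key   VARCHAR(100) NOT NULL,
    tag_type  VARCHAR(10)  NOT NULL,  -- 'string', 'float', or 'int'
    str_value VARCHAR(400),           -- populated when tag_type = 'string'
    num_value FLOAT,                  -- populated when tag_type = 'float'
    int_value BIGINT,                 -- populated when tag_type = 'int'
    PRIMARY KEY (log_id, tag_key)
);

INSERT INTO log_tag (log_id, tag_key, tag_type, num_value)
VALUES (1, 'CPU_USAGE', 'float', 52.3);
```

New keys need no schema change; only new rows are added.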
However, a key-value pair may not be the best approach for this type of data. A common solution is to do the following:
Determine if any columns are "common" enough that you really care about them. This might commonly be a date/time or user or multiple such columns.
Store these columns in separate columns, parsing them when you insert the data (or perhaps using triggers).
Store the rest (or all) as JSON within the row as "extra details".
Or, in Postgres, you can just store the whole thing as JSONB and have indexes on the JSONB column. That gives you both performance and simplicity on inserting the data.
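A minimal Postgres sketch of the approach described above, pulling the "common" columns out and keeping the rest as JSONB (names are assumptions):

```sql
CREATE TABLE log_event (
    log_id    BIGSERIAL PRIMARY KEY,
    logged_at TIMESTAMPTZ NOT NULL DEFAULT now(),  -- common column pulled out
    message   TEXT,
    tags      JSONB                                -- everything else
);

-- A GIN index makes containment queries on the tags fast.
CREATE INDEX idx_log_event_tags ON log_event USING GIN (tags);

-- e.g. all events where steve15 logged in:
SELECT * FROM log_event
WHERE tags @> '{"USER_LOGGED_IN": "steve15"}';
```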

multiple different record types within same file must be converted to sql table

One for the SQL data definition gurus:
I have a mainframe file that contains about 35-100 different record types. Depending on the record type, the layout is redefined, and any column can take on a different length or type. I don't really want to split this thing up into 35-100 different tables and relate them together. I did find out that Postgres has %ROWTYPE with cursor- or table-based records, but in all the examples the data looked the same. How can I set up a table that would handle this, and what SQL queries would be needed to return the data? It doesn't have to be Postgres, but that was the only thing I could find that looked similar to my problem.
I would just make a table with all TEXT datatype fields at first. TEXT is variable-length, so it only takes up the space it needs, and it performs very well. From there, you may find it quicker to move the data into better-formed tables once you know which data deserves a more specific data type.
It's easier to do it in this order, because bulk insert with COPY is very picky... with TEXT you only have to worry about the number of columns to get the data in.
EDIT: I'm referring to Postgres with this answer. Not sure if you wanted another DB specific answer.
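A Postgres sketch of that staging approach (table names, column names, and the file path are all illustrative):

```sql
-- Staging table: every field is TEXT, so COPY only has to match column count.
CREATE TABLE mainframe_raw (
    record_type TEXT,
    col01 TEXT,
    col02 TEXT,
    col03 TEXT
    -- ... as many columns as the widest record layout needs
);

COPY mainframe_raw FROM '/path/to/extract.txt' WITH (FORMAT csv);

-- Later, move one record type into a properly typed table
-- (hypothetical target table "payments"):
INSERT INTO payments (amount, paid_on)
SELECT col01::numeric, col02::date
FROM mainframe_raw
WHERE record_type = 'PAY';
```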

SQL lookup in SELECT statement

I've got an SQL Express database I need to extract some data from. I have three fields: ID, NAME, DATA. The DATA column contains values like "654;654;526" -- yes, semicolons included. Those numbers relate to a second table with two fields, ID and NAME: the numbers in the DATA column match the ID field of that second table. How can I do a replace or lookup in SQL so that instead of the number 654;653;526 I get the NAME field instead?
See the photo. Might explain this better
http://i.stack.imgur.com/g1OCj.jpg
Redesign the database unless this is a third-party database you are supporting. This will never be a good design and should never have been built this way. This is one of those times you bite the bullet and fix it before things get worse, which they will. You need a related table to store the values in. One of the very first rules of database design is to never store more than one piece of information in a field.
And hopefully those aren't your real field names; they are atrocious too. You need more descriptive field names.
If it is a third-party database, look up a split function or create your own. You will want to transform the data into relational form in a temp table or table variable to use in the join later.
The following may help: How to use GROUP BY to concatenate strings in SQL Server?
This can be done, but it won't be nice. You should create a scalar-valued function that takes in the string of IDs and returns a string of names.
This denormalized structure is similar to the way values were stored in the quasi-object-relational database known as PICK. Cool database, in many respects ahead of its time, though in other respects, a dinosaur.
If you want to return the multiple names as a delimited string, it's easy to do with a scalar function. If you want to return them as multiple rows, your engine has to support table-valued functions.
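On SQL Server 2016 or later, the built-in STRING_SPLIT function can do the splitting; on older versions you would write your own split function. A sketch, assuming tables t1(ID, NAME, DATA) and t2(ID, NAME) as described in the question:

```sql
-- Split the semicolon-delimited DATA column into rows,
-- then join each piece to the lookup table.
SELECT t1.ID,
       t2.NAME
FROM t1
CROSS APPLY STRING_SPLIT(t1.DATA, ';') AS s
JOIN t2
  ON t2.ID = CAST(s.value AS INT);
```

To get the names back as a single delimited string per row, STRING_AGG (SQL Server 2017+) can re-concatenate them.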

When are computed columns appropriate?

I'm considering designing a table with a computed column in Microsoft SQL Server 2008. It would be a simple calculation like (ISNULL(colA,(0)) + ISNULL(colB,(0))) - like a total. Our application uses Entity Framework 4.
I'm not completely familiar with computed columns so I'm curious what others have to say about when they are appropriate to be used as opposed to other mechanisms which achieve the same result, such as views, or a computed Entity column.
Are there any reasons why I wouldn't want to use a computed column in a table?
If I do use a computed column, should it be persisted or not? I've read about different performance results using persisted, not persisted, with indexed and non indexed computed columns here. Given that my computation seems simple, I'm inclined to say that it shouldn't be persisted.
In my experience, they're most useful/appropriate when they can be used in other places like an index or a check constraint, which sometimes requires that the column be persisted (physically stored in the table). For further details, see Computed Columns and Creating Indexes on Computed Columns.
If your computed column is not persisted, it will be calculated every time you access it in e.g. a SELECT. If the data it's based on changes frequently, that might be okay.
If the data doesn't change frequently, e.g. if you have a computed column to turn your numeric OrderID INT into a human-readable ORD-0001234 or something like that, then definitely make your computed column persisted - in that case, the value will be computed and physically stored on disk, and any subsequent access to it is like reading any other column on your table - no re-computation over and over again.
We've also come to use (and highly appreciate!) computed columns to extract certain pieces of information from XML columns and surface them on the table as separate (persisted) columns. That makes querying against those items much more efficient than constantly having to poke into the XML with XQuery to retrieve the information. For this use case, I think persisted computed columns are a great way to speed up your queries!
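The total from the question could be sketched like this in SQL Server (table name is an assumption):

```sql
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    colA    INT NULL,
    colB    INT NULL,
    -- Non-persisted: recomputed on every read.
    Total AS (ISNULL(colA, 0) + ISNULL(colB, 0)),
    -- Persisted: stored on disk, recomputed only on writes, and indexable.
    TotalPersisted AS (ISNULL(colA, 0) + ISNULL(colB, 0)) PERSISTED
);

CREATE INDEX IX_Orders_TotalPersisted ON Orders (TotalPersisted);
```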
Let's say you have a computed column called ProspectRanking that is the result of the evaluation of the values in several columns: ReadingLevel, AnnualIncome, Gender, OwnsBoat, HasPurchasedPremiumGasolineRecently.
Let's also say that many decentralized departments in your large mega-corporation use this data, and they all have their own programmers on staff, but you want the ProspectRanking algorithms to be managed centrally by IT at corporate headquarters, who maintain close communication with the VP of Marketing. Let's also say that the algorithm is frequently tweaked to reflect some changing conditions, like the interest rate or the rate of inflation.
You'd want the computation to be part of the back-end database engine and not in the client consumers of the data, if managing the front-end clients would be like herding cats.
If you can avoid herding cats, do so.
Make Sure You Are Querying Only Columns You Need
I have found computed columns to be very useful, even when not persisted, especially in an MVVM model where you only fetch the columns you need for a specific view. As long as you don't put poorly performing logic in the computed column's expression, you should be fine. The bottom line is that those (non-persisted) computed values would have to be computed somewhere anyway if you are using that data.
When it Comes to Performance
For performance, narrow your query to just the rows and computed columns you need. If you were putting an index on the computed column, I would be cautious: the execution engine might decide to use that index and hurt performance by computing those columns. Most of the time you are just getting a name or description from a join table, so I think this is fine.
Don't Brute Force It
The only time it wouldn't make sense to use a lot of computed columns is if you are using a single view-model class that captures all the data in all columns including those computed. In this case, your performance is going to degrade based on the number of computed columns and number of rows in your database that you are selecting from.
Computed Columns Work Great with an ORM
An object-relational mapper such as Entity Framework allows you to query a subset of the columns in your query. This works especially well using LINQ to Entities. With computed columns you don't have to clutter your ORM class with mapped views for each of the model types.
var data = from e in db.Employees
           select new NarrowEmployeeView { Id = e.Id, Name = e.Name };
Only the Id and Name are queried.
var data = from e in db.Employees
           select new WiderEmployeeView { Id = e.Id, Name = e.Name, DepartmentName = e.DepartmentName };
Assuming DepartmentName is a computed column, its computation is executed only for the latter query.
Performance Profiler
If you use a performance profiler and filter on SQL queries, you can see that computed columns are in fact ignored when they are not in the SELECT statement.
Computed columns can be appropriate if you plan to query by that information.
For instance, if you have a dataset that you are going to present in the UI, having a computed column will allow you to page the view while still allowing sorting and filtering on the computed column. If that computed column existed only in code, it would be much more difficult to reasonably sort or filter the dataset for display based on that value.
A computed column is a business rule, and it's more appropriate to implement it on the client, not in the storage layer. A database is for storing/retrieving data, not for business-rule processing. The fact that it can do something doesn't mean you should do it that way. You are also free to jump off the Eiffel Tower, but it would be a bad decision :)

do i need a separate table for nvarchar(max) descriptions

At one of my previous companies we used to have a separate table where we stored long descriptions in a TEXT-type column. I think this was done because of the limitations that came with the TEXT type.
I'm now designing the tables for the application I am working on, and this question comes to mind. I am leaning towards storing the long description of my items in the same item table, in a varchar(max) column. I understand that I cannot index this column, but that is OK, as I will not be searching on these columns.
So far I cannot see any reason to separate this column into another table.
Can you please give me input if I am missing something, or whether storing my descriptions in the same table in varchar(max) is a good approach? Thanks!
Keep the fields in the table where they belong. Since SQL Server 2005 the engine got a lot smarter in regard to large data types and even variable length short data types. The old TEXT, NTEXT and IMAGE types are deprecated. The new types with MAX length are their replacement. With SQL 2005 each partition has 3 types of underlying allocation units: one for rows, one for LOBs and one for row-overflow. The MAX types are stored in the LOB allocation unit, so in effect the engine is managing for you a separate table to store large objects. The row overflow unit is for in-row variable length data that after an update would no longer fit in the page, so it is 'overflown' into a separate unit.
See Table and Index Organization.
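In other words, the column can simply live on the item table, and the engine handles LOB storage for you (sketch with assumed names):

```sql
CREATE TABLE Item (
    ItemID      INT PRIMARY KEY,
    Name        NVARCHAR(100) NOT NULL,
    Description VARCHAR(MAX) NULL  -- large values are moved to LOB pages automatically
);
```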
It depends on how often you use them, but yes, you may want them in a separate table. Before you make the decision, you'll want to read up on SQL file paging, page splits, and the details of how SQL Server stores the data.
The short answer is that varchar(max) can definitely cause a decrease in performance when those field lengths change a lot, due to an increase in page splits, which are expensive operations.