Database: Does the appendix 'HX' have significant value? - sql

I'm working with a legacy application with surprise surprise, next to no useful documentation on naming convensions or over all data structure.
There are similarly named tables in the database with one of them ending in HX. Does this hold any significance in anyones database experience?
The data seems to be replicated so I would assume that it is a Historical table of sorts, but I just want to be sure before I avoid populating it.
Thanks in Advance.
Cory

I've seen this before but I don't think it's usage was part of any standard. When I saw it was was used to prefix tables (hx_ReportingUsage) that stored information on historical index usage. Something similar to what this article talks about.
Again, I think this was just an internal naming convention. Your best bet would probably be to search all stored procs, UDFs and code you have access to for the table name and see if you can piece together how it's being used.
If you are using SQL Server you can use this query to look for text in stored procs:
SELECT OBJECT_NAME(id)
FROM syscomments
WHERE [text] LIKE '%foobar%'
AND OBJECTPROPERTY(id, 'IsProcedure') = 1
GROUP BY OBJECT_NAME(id)

In the few places I've come across this it was related to history tables. It has depended on the system. I've seen this mostly in the Oil & Gas world, not so much outside that industry.
I shouldn't just assume that's the case though. If it's possible, I'd do a dependency search to find out if there are any scripts dependent upon the table - if they're history tables, you'll likely find a copy procedure or trigger somewhere to keep them populated.
Check out sp_depends to see if that can shed any light on the subject.
Exec sp_depends 'table_name_hx'
You should get a list of everything that's got references to it and what type of object is referencing.

It might have come from the medical field. They often have abbreviations like Rx for prescription, Sx for symptoms, etc. Hx is medical history: http://en.wikipedia.org/wiki/Medical_history. This could have found its way into other fields and even databases.

Related

What is the difference between a hard coded table and a look up table in SQL?

Someone told me to avoid hard coding values in tables and to rather use look up tables. I'm not sure what the difference is. Can someone please explain?. Thanks in advance.
Hi a lookup table is a normal table - in general you put information in there that is used repeatedly. You may or may not use ids for the values (often you do - integers are ideal for fast lookups, but short character codes can be more user-friendly).
Examples are:
status_codes
id
status
1
pending
2
approved
3
denied
us_states
abbreviation
name
AL
Alabama
AK
Alaska
AR
Arkansas
The advantage of a lookup table are two-fold:
It easy to have two different values "hard-coded" two different way - a situation to be avoided at all costs!
it lets you "maintain" the table in a consistent way through regular updates and even through administration pages in you application. If you hard code data into your queries and stored procedures - well, if anything changes you have to find all the places where values are hard-coded and change them in place.
In your question, the particular phrase hard coded table is somewhat ambiguous - perhaps it means creating a temp table in a query or stored procedure. The reasoning about it, with the pros and cons, is pretty much the same though - ease of use, consistency, and maintainability.
As with many things, it's probably not something that's always true all of the time. If you had a small "lookup table" that you really only have in one query and is highly unlikely to ever need to be updated or used in other queries - its not the end of the world to have it hard-coded in. But if the coding standards where you are dictate otherwise, its best to just go with the flow.

Dynamic Database/Key - Value/Entity - Key Value Dillemma

I have been programming relational database for many years, but now have come across an unusual and tricky problem:
I am building an application that needs to have very quick and easily defined entities (by the user). Instances of these entities could then be created, updated, deleted etc.
There are two options I can think of.
Option 1 - Dynamically created tables
The first option is to write an engine to dynamically generate the tables, and insert the data into these. However, this would become very tricky, as every query would also need to be dynamic, or at least dynamically created stored procedures etc.
Option 2 - Entity - Key - Value Pattern
This is the only realistic option I can think of, where I have 5 table structure:
EntityTypes
EntityTypeID int
EntityTypeName nvarchar(50)
Entities
EntityID int
EntityTypeID int
FieldTypes
FieldTypeID int
FieldTypeName nvarchar(50)
SQLtype int
FieldValues
EntityID int
FIeldID int
Value nvarchar(MAX)
Fields
FieldID int
FieldName nvarchar(50)
FieldTypeID int
The "FieldValues" table would work a little like a datawarehouse fact table, and all my inserts/updates would work by filling a "Key/Value" table valued parameter and passing this to a SPROC (to avoid multiple inserts/updates).
All the tables would be heavily indexed, and I would end up doing many self joins to obtain the data.
I have read a lot about how bad Key/Value databases are, but for this problem it still seems to be the best.
Now my questions!
Can anyone suggest another approach or pattern other than these two options?
Would option two be feasible for medium sized datasets (1 million rows max)?
Are there further optimizations for option 2 I could use?
Any direction and advice much appreciated!
Personally I would just use a "noSQL" (key/value) database like MongoDB.
But if you need to use a relational database option 2 is the way to go. A good example of that kind of model is the Alfresco Data Dictionary (Alfresco is an enterprise content management system). It's design is similar to what you describe, although they have multiple columns for field values (for every simple type available in the database). If you add a good cache system to that (for example Ehcache) it should work fine.
As others have suggested NoSQL, I'm going to say that, in my opinion, schemaless databases really is best suited for use-cases with no schema.
From the description, and the schema you came up with, it looks like your case is not in fact "no schema", but rather it seems to be "user-defined schema".
In fact, the schema you came up with looks very similar to the internal meta-schema of a relational database. (You're sort of building a relational database on top of a relational database, which in my experience is not a good idea, as this "meta-database" will have at least twice the overhead and complexity for any basic operation - tables will get very large, which doesn't scale well, and the data will be difficult to query and update, problems will be difficult to debug, and so on.)
For use-cases like that, you probably want DDL: Data Definition Language.
You didn't say which SQL database you're using, but most SQL databases (such as MySQL, PostgreSQL and MS-SQL) support some dialect of DDL extensions to SQL syntax, which let you manipulate the actual schema.
I've done this successfully for use-cases like yours in the past. It works well for cases where the schema rarely changes, and the data volumes are relatively low for each user. (For high volumes or frequent schema updates, you might want schemaless or some other type of NoSQL database.)
You might need some tables on the side for additional field information that doesn't fit in SQL schema - you may want to duplicate some schema information there as well, as this can be difficult or inefficient to read back from actual schema.
Ensuring atomic updates to your field information tables and the schema probably requires transactions, which may not be supported by your database engine - PostgreSQL at least does support transactional schema updates.
You have to be vigilant when it comes to security - you don't want to open yourself up to users creating, storing or deleting things they're not supposed to.
If it suits your use-case, consider using not only separate tables, but separate databases, which can also by created and destroyed on demand using DDL. This could be applicable if each customer has ownership of data collections that can't, shouldn't, or don't need to be queried across customers. (Arguably, these are rare - typically, you want at least analytics or something across customers, but there are cases where each customer "owns" an isolated, hosted wiki, shop or CMS/DMS of some sort.)
(I saw in your comment that you already decided on NoSQL, so just posting this option here for completeness.)
It sounds like this might be a solution in search of a problem. Is there any chance your domain can be refactored? If not - theres still hope.
Your scalability for option 2 will depend a lot on the width of the custom objects. How many fields can be created dynamically? 1 million entities when each entity has 100 fields could be a drag... Efficient indexing could make performance bearable.
For another option - you could have one data table that has a few string fields, a few double fields, and a few integer fields. For example, a table with String1, String2, String3, Int1, Int2, Int3. A second table with have rows that define a user object and map your "CustomObjectName" => String1, and such. A stored procedure reading INFORMATION_SCHEMA and some dynamic sql would be able to read the schema table and return a strongly typed recordset...
Yet another option (for recent versions of SQL Server) would be to store a row with an id, a type name, and an XML field that contains a XML document that contains the object data. In MS Sql Server this can be queried against directly, and maybe even validated against a schema.
PErsonally I would take the time to define as many attritbutes as you can ratheer than use EAV for everything. Surely you know some of the attributes. Then you only need EAv for the things that are truly client specific.
But if all must be EAV, then a nosql databse is the way to go. Or you can use a relationsla datbase for some stuff and a nosql database for the rest.

Most efficient method for persisting complex types with variable schemas in SQL

What I'm doing
I am creating an SQL table that will provide the back-end storage mechanism for complex-typed objects. I am trying to determine how to accomplish this with the best performance. I need to be able to query on each individual simple type value of the complex type (e.g. the String value of a City in an Address complex type).
I was originally thinking that I could store the complex type values in one record as an XML, but now I am concerned about the search performance of this design. I need to be able to create variable schemas on the fly without changing anything about the database access layer.
Where I'm at now
Right now I am thinking to create the following tables.
TABLE: Schemas
COLUMN NAME DATA TYPE
SchemaId uniqueidentifier
Xsd xml //contains the schema for the document of the given complex type
DeserializeType varchar(200) //The Full Type name of the C# class to which the document deserializes.
TABLE: Documents
COLUMN NAME DATA TYPE
DocumentId uniqueidentifier
SchemaId uniqueidentifier
TABLE: Values //The DocumentId+ValueXPath function as a PK
COLUMN NAME DATA TYPE
DocumentId uniqueidentifier
ValueXPath varchar(250)
Value text
from these tables, when performing queries I would do a series of self-joins on the value table. When I want to get the entire object by the DocumentId, I would have a generic script for creating a view mimics a denormalized datatable of the complex-type.
What I want to know
I believe there are better ways to accomplish what I am trying to, but I am a little too ignorant about the relative performance benefits of different SQL techniques. Specifically I don't know the performance cost of:
1 - comparing the value of a text field versus of a varchar field.
2 - different kind of joins versus nested queries
3 - getting a view versus an xml document from the sql db
4 - doing some other things that I don't even know I don't know would be affecting my query but, I am experienced enough to know exist
I would appreciate any information or resources about these performance issues in sql as well as a recommendation for how to approach this general issue in a more efficient way.
For Example,
Here's an example of what I am currently planning on doing.
I have a C# class Address which looks like
public class Address{
string Line1 {get;set;}
string Line2 {get;set;}
string City {get;set;}
string State {get;set;}
string Zip {get;set;
}
An instance is constructed from new Address{Line1="17 Mulberry Street", Line2="Apt C", City="New York", State="NY", Zip="10001"}
its XML value would be look like.
<Address>
<Line1>17 Mulberry Street</Line1>
<Line2>Apt C</Line2>
<City>New York</City>
<State>NY</State>
<Zip>10001</Zip>
</Address>
Using the db-schema from above I would have a single record in the Schemas table with an XSD definition of the address xml schema. This instance would have a uniqueidentifier (PK of the Documents table) which is assigned to the SchemaId of the Address record in the Schemas table. There would then be five records in the Values table to represent this Address.
They would look like:
DocumentId ValueXPath Value
82415E8A-8D95-4bb3-9E5C-AA4365850C70 /Address/Line1 17 Mulberry Street
82415E8A-8D95-4bb3-9E5C-AA4365850C70 /Address/Line2 Apt C
82415E8A-8D95-4bb3-9E5C-AA4365850C70 /Address/City New York
82415E8A-8D95-4bb3-9E5C-AA4365850C70 /Address/State NY
82415E8A-8D95-4bb3-9E5C-AA4365850C70 /Address/Zip 10001
Just Added a Bounty...
My objective is to obtain the resources I need in order to give my application a data access layer that is fully searchable and has a data-schema generated from the application layer that does not require direct database configuration (i.e. creating a new SQL table) in order to add a new aggregate root to the domain model.
I am open to the possibility of using .NET compatible technologies other than SQL, but I will require that any such suggestions be adequately substantiated in order to be considered.
How about looking for a solution at the architectural level? I was also breaking my head on complex graphs and performance until I discovered CQRS.
[start evangelist mode]
You can go document-based or relational as storage. Even both! (Event Sourcing)
Nice separation of concerns: Read Model vs Write Model
Have your cake and eat it too!
Ok, there is an initial learning / technical curve to get over ;)
[end evangelist mode]
As you stated: "I need to be able to create variable schemas on the fly without changing anything about the database access layer." The key benefit is that your read model can be very fast since it's made for reading. If you add Event Sourcing to the mix, you can drop and rebuild your Read Model to whatever schema you want... even "online".
There are some nice opensource frameworks out there like nServiceBus which saves lots of time and technical challenges. All depends on how far you want to take these concepts what you're willing/can spend time on. You can even start with just basics if you follow Greg Young's approach. See the info in the links below.
See
CQRS Examples and Screencasts
CQRS Questions
Intro (Also see the video)
Somehow what you want sounds like a painful thing to do in SQL. Basically, you should treat the inside of a text field as opaque as when querying an SQL database. Text fields were not made for efficient queries.
If you just want to store serialized objects in a text field, that is fine. But do not try to build queries that look inside the text field to find objects.
Your idea sounds like you want to perform some joins, XML parsing, and XPath application to get to a value. This doesn't strike me as the most efficient thing to do.
So, my advise:
Either just store serialized objects in the db, and do nothing more than load them and perform all other operations in memory
Or, if you need to query complex data structures, you may really want to look into document stores/databases like CouchDB or MongoDB; you can also check Wikipedia on the subject. There are even databases specifically designed for storing XML, even though I personally don't like them very much.
Addendum, per your explanations above
Simply put, don't go over the top with this thing:
If you just want to persist C#/.NET objects, just use the XML Serialization already built into the framework, a single table and be done with it.
If you, for some reason, need to store complex XML, use a dedicated XML store
If you have a fixed database schema, but it is too complex for efficient queries, use a Document Store in memory where you keep a denormalized version of your data for faster queries (or just simplify your database schema)
If you don't really need a fixed schema, use just a Document Store, and forget about having any "schema definition" at all
As for your solution, yes, it could work somehow. As could a plain SQL schema if you set it up right. But for applying an XPath, you'll probably parse the whole XML document each time you access a record, which wouldn't be very efficient to begin with.
If you want to check out Document databases, there are .NET drivers for CouchDB and MongoDB. The eXist XML database offers a number of Web protocols, and you can probably create a client class easily with VisualStudio's point-and-shoot interface. Or just google for someone who already did.
I need to be able to create variable
schemas on the fly without changing
anything about the database access
layer.
You are re-implementing the RDBMS within an RDBMS. The DB can do this already - that is what the DDL statements like create table and create schema are for....
I suggest you look into "schemas" and SQL security. There is no reason with the correct security setup you cannot allow your users to create their own tables to store document attributes in, or even generate them automatically.
Edit:
Slightly longer answer, if you don't have full requirements immediately, I would store the data as XML data type, and query them using XPath queries. This will be OK for occasional queries over smallish numbers of rows (fewer than a few thousand, certainly).
Also, your RDBMS may support indexes over XML, which may be another way of solving your problem. CREATE XML INDEX in SqlServer 2008 for example.
However for frequent queries, you can use triggers or materialized views to create copies of relevant data in table format, so more intensive reports can be speeded up by querying the breakout tables.
I don't know your requirements, but if you are responsible for creating the reports/queries yourself, this may be an approach to use. If you need to enable users to create their own reports that's a bigger mountain to climb.
I guess what i am saying is "are you sure you need to do this and XML can't just do the job".
In part, it will depend of your DB Engine. You're using SQL Server, don't you?
Answering your topics:
1 - Comparing the value of a text field versus of a varchar field: if you're comparing two db fields, varchar fields are smarter. Nvarchar(max) stores data in unicode with 2*l+2 bytes, where "l" is the lengh. For performance issues, you will need consider how much larger tables will be, for selecting the best way to index (or not) your table fields. See the topic.
2 - Sometimes nested queries are easily created and executed, also serving as a way to reduce query time. But, depending of the complexity, would be better to use different kind of joins. The best way is try to do in both ways. Execute two or more times each query, for the DB engine "compiles" a query on first executing, then the subsequent are quite faster. Measure the times for different parameters and choose the best option.
"Sometimes you can rewrite a subquery to use JOIN and achieve better performance. The advantage of creating a JOIN is that you can evaluate tables in a different order from that defined by the query. The advantage of using a subquery is that it is frequently not necessary to scan all rows from the subquery to evaluate the subquery expression. For example, an EXISTS subquery can return TRUE upon seeing the first qualifying row." - link
3- There's no much information in this question, but if you will get the xml document directly from the table, would be a good idea insted a view. Again, it will depends of the view and the document.
4- Other issues is about the total records expected for your table; the indexing of the columns, in wich you need to consider sorting, joining, filtering, PK's and FK's. Each situation could demmand different aproaches. My sugestion is to invest some time reading about your database engine and queries functioning and relating to your system.
I hope I've helped.
Interesting question.
I think you may be asking the wrong question here. Broadly speaking, as long as you have a FULLTEXT index on your text field, queries will be fast. Much faster than varchar if you have to use wild cards, for instance.
However, if I were you, I'd concentrate on the actual queries you're going to be running. Do you need boolean operators? Wildcards? Numerical comparisons? That's where I think you will encounter the real performance worries.
I would imagine you would need queries like:
"find all addresses in the states of New York, New Jersey and Pennsylvania"
"find all addresses between house numbers 1 and 100 on Mulberry Street"
"find all addresses where the zipcode is missing, and the city is New York"
At a high level, the solution you propose is to store your XML somewhere, and then de-normalize that XML into name/value pairs for querying.
Name/value pairs have a long and proud history, but become unwieldy in complex query situations, because you're not using the built-in optimizations and concepts of the relational database model.
Some refinements I'd recommend is to look at the domain model, and at least see if you can factor out separate data types into the "value" column; you might end up with "textValue", "moneyValue", "integerValue" and "dateValue". In the example you give, you might factor "address 1" into "housenumber" (as an integer) and "streetname".
Having said all this - I don't think there's a better solution other than completely changing tack to a document-focused database.

Have any idea about to add column by stored procedure?

I want to add column to table by stored procedure and the name of column should be parameter.'s value.
You'll have to compose a dynamic DDL SQL statement where you concatenate the name of the column received as argument:
CREATE PROCEDURE AddColumnToTable
#columnName VARCHAR(128)
AS
EXEC ('ALTER TABLE tableName ADD' + SPACE(1) + #columnName + SPACE(1) + 'VARCHAR(MAX) NULL')
Note that in this example the name of the table as well as the column's type are hard-coded in the SQL statement. You may want to consider adding them as parameters to the stored procedure for a more generic solution.
Related resources:
Execute (Transact-SQL)
The Curse and Blessings of Dynamic SQL
This is a terrible idea in general. If your schema is requiring changes to tables that need to be done by sp, then they are happening too frequently and the design should be reviewed.
But if you are stuck with this horrible process (and I can't emphasize enough what a truly bad idea it is.) then you also need to have an input pvalue for the data type for the data as well and a nullable one for the size of the data type if need be, varchar(max) is a poor choice for every possible column to be added. It should only be used when you expect to have more than 8000 characters in the column for indexing reasons.
Why is this bad? Well to begin with, you have lost control over the schena. People can anything and there is no review to make sure you don't get twelve versions of the same things with slighly differnt names added. Next how do you intend to fix the application to use these fieldss without knowing what was added? Since you should never return more fields than you need, your production code should not be using select *. Therefore how do you know which fields to add if your users have added them. Users in general aren't knowledgeable enough to add fields. They don't understand database structure or design and don't understand how to normalize and performance tune. Letting people willy-nilly add fields to data tables is short-sighted and will lead to badly performing databases and awkward, hard-to-use interfaces that annoy the customers. If you have properly done your work in designing the database and application, there should be very little that a user needs to add. If you are basing your work on the user being able to have the flexibilty to add, you have a disaster of a project that not only will be hard to maintain, it will perform badly and in general your users will hate it. I've been forced to work work with some of these horrible commercial products (look at Clarity if you want an example of how flexibilty trumped design and made a product that is virtually unuseable).

Strategy for identifying unused tables in SQL Server 2000?

I'm working with a SQL Server 2000 database that likely has a few dozen tables that are no longer accessed. I'd like to clear out the data that we no longer need to be maintaining, but I'm not sure how to identify which tables to remove.
The database is shared by several different applications, so I can't be 100% confident that reviewing these will give me a complete list of the objects that are used.
What I'd like to do, if it's possible, is to get a list of tables that haven't been accessed at all for some period of time. No reads, no writes. How should I approach this?
MSSQL2000 won't give you that kind of information. But a way you can identify what tables ARE used (and then deduce which ones are not) is to use the SQL Profiler, to save all the queries that go to a certain database. Configure the profiler to record the results to a new table, and then check the queries saved there to find all the tables (and views, sps, etc) that are used by your applications.
Another way I think you might check if there's any "writes" is to add a new timestamp column to every table, and a trigger that updates that column every time there's an update or an insert. But keep in mind that if your apps do queries of the type
select * from ...
then they will receive a new column and that might cause you some problems.
Another suggestion for tracking tables that have been written to is to use Red Gate SQL Log Rescue (free). This tool dives into the log of the database and will show you all inserts, updates and deletes. The list is fully searchable, too.
It doesn't meet your criteria for researching reads into the database, but I think the SQL Profiler technique will get you a fair idea as far as that goes.
If you have lastupdate columns you can check for the writes, there is really no easy way to check for reads. You could run profiler, save the trace to a table and check in there
What I usually do is rename the table by prefixing it with an underscrore, when people start to scream I just rename it back
If by not used, you mean your application has no more references to the tables in question and you are using dynamic sql, you could do a search for the table names in your app, if they don't exist blow them away.
I've also outputted all sprocs, functions, etc. to a text file and done a search for the table names. If not found, or found in procedures that will need to be deleted too, blow them away.
It looks like using the Profiler is going to work. Once I've let it run for a while, I should have a good list of used tables. Anyone who doesn't use their tables every day can probably wait for them to be restored from backup. Thanks, folks.
Probably too late to help mogrify, but for anybody doing a search; I would search for all objects using this object in my code, then in SQL Server by running this :
select distinct '[' + object_name(id) + ']'
from syscomments
where text like '%MY_TABLE_NAME%'