I have a series of T-SQL queries that are running very slowly. One part of the query that I suspect is causing problems is a series of casts that I have to do on the columns.
This is the problem: I have to combine four columns together as an nvarchar/varchar, as the combination of them forms a (semi-)unique key for an entry in another table (a horrible idea, I know, but I'm stuck with it).
The four columns are:
t_orno, t_reno, t_pono, t_srnb: all INT columns without indexes.
The way I have been doing this is like so:
CAST(t_orno AS nvarchar(10)) + '-' + CAST(t_reno AS nvarchar(10)) +
'-' + CAST(t_pono AS nvarchar(5)) + '-' + CAST(t_srnb AS nvarchar(5))
Unfortunately I'm stuck with having to merge these columns. Is there a better way of doing this? The queries need to be more efficient, and there has got to be a better way than casting all four columns individually.
Assume that the database is completely unchangeable -- which, sadly, it is (I don't want to get into that).
Thanks for your help.
EDIT: As per a request for more info on the tables:
Both tables being queried contain only one index, which is on the PK column. Again, note that nothing can be added or changed on these tables.
The table being joined contains the combination of those four columns:
BaanID: nvarchar, no index.
Have you tried the reverse, i.e. splitting "an entry in another table" on the "-" character and casting each piece to int? That may yield better performance.
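For illustration, here is a minimal sketch of that reverse split in T-SQL, assuming the key always has the four-part 'n-n-n-n' format and the joined column is named BaanID (PARSENAME splits on dots, so the dashes are swapped for dots first; the table name OtherTable is hypothetical):
SELECT
    CAST(PARSENAME(REPLACE(BaanID, '-', '.'), 4) AS int) AS t_orno,  -- part 4 = leftmost
    CAST(PARSENAME(REPLACE(BaanID, '-', '.'), 3) AS int) AS t_reno,
    CAST(PARSENAME(REPLACE(BaanID, '-', '.'), 2) AS int) AS t_pono,
    CAST(PARSENAME(REPLACE(BaanID, '-', '.'), 1) AS int) AS t_srnb
FROM OtherTable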
I would try to use an indexed view. Here is an article that may help you: http://technet.microsoft.com/en-us/library/cc917715.aspx.
Or you could add a computed column to the table containing the t_* columns and index this column.
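A minimal sketch of that computed-column idea, assuming you were allowed to alter the table (the asker is not) and with a hypothetical table name:
-- Persisted computed column mirroring the concatenation, plus an index on it
ALTER TABLE dbo.SourceTable
ADD BaanKey AS (CAST(t_orno AS nvarchar(10)) + '-' + CAST(t_reno AS nvarchar(10)) +
                '-' + CAST(t_pono AS nvarchar(5)) + '-' + CAST(t_srnb AS nvarchar(5))) PERSISTED;

CREATE INDEX IX_SourceTable_BaanKey ON dbo.SourceTable (BaanKey);
The join against the other table's BaanID column could then seek on the index instead of computing the concatenation row by row.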
I believe this is the crucial point:
The table being joined contains the combination of those four columns: BaanID (nvarchar, no index).
Unless you are dealing with relatively small tables, joining two tables together on columns that are not indexed is likely to be costly.
In SQL Server, is it possible to generate a GUID using a specific piece of data as an input value? For example:
DECLARE @seed1 VARCHAR(10) = 'Test'
DECLARE @seed2 VARCHAR(10) = 'Testing'
SELECT NEWID(@seed1) -- will always return the same output value
SELECT NEWID(@seed2) -- will always return the same output value, and will be different from the example above
I know this completely goes against the point of GUIDs, in that the ID would not be unique. I'm looking for a way to detect duplicate records based on certain criteria (the @seed value).
I've tried generating a VARBINARY string using the HASHBYTES function; however, joining between tables on VARBINARY seems extremely slow. I'm hoping to find a similar alternative that is more efficient.
Edit: more information on why I'm looking to achieve this.
I'm looking for a fast and efficient way of detecting duplicate information that exists in two tables. For example, I have first name, last name and email. When these are concatenated, they can be used to check whether a record exists in both table A and table B.
Simply joining on these fields is possible and provides the correct result, but it is quite slow. Therefore, I was hoping to find a way of transforming the data into something such as a GUID, which would make the joins much more efficient.
I think you can use the CHECKSUM function, which returns an int.
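For example, a quick sketch with hypothetical column names (CHECKSUM takes a column list and returns an int):
SELECT CHECKSUM(FirstName, LastName, Email) AS RowHash
FROM TableA;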
You should use HASHBYTES and not CHECKSUM, like this:
SELECT HASHBYTES('MD5', 'JOHN' + ',' + 'SMITH' + ',' + 'JSMITH@EXAMPLE.COM')
Although there is only a small chance that CHECKSUM will produce the same number for two completely different values, I've had it happen with datasets of around a million rows. As iamdave noted (thanks!), it's a good idea to throw in some kind of delimiter (a comma in my example) so that you don't compare 'JOH' + 'NSMITH' and 'JOHN' + 'SMITH' as the same.
http://www.sqlservercentral.com/blogs/microsoft-business-intelligence-and-data-warehousing/2012/02/01/checksum-vs-hashbytes/
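Since the asker found joining directly on VARBINARY slow, one variation (a sketch, with hypothetical table and column names) is to persist the hash in a computed column and index it, so the hash is computed once per row rather than per join:
ALTER TABLE dbo.TableA
ADD RowHash AS HASHBYTES('MD5', FirstName + ',' + LastName + ',' + Email) PERSISTED;

CREATE INDEX IX_TableA_RowHash ON dbo.TableA (RowHash);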
I am using a dynamic SQL string to loop through table names. Now I need to add a wildcard to the table name so that it picks up the new tables I get. Example below:
WHILE @Year_Id <= 2018
BEGIN
    SET @YearVar = CONVERT(varchar(4), @Year_Id)
    SET @TABLENAME = '[SWDI].[dbo].[hail-' + @YearVar + ']'
    SET @SQLQUERY = 'SELECT CELL_ID, LAT, LON, SEVPROB, PROB, MAXSIZE, _ZTIME' +
                    ' FROM ' + @TABLENAME +
So my earlier tables were hail-2001, hail-2002, hail-2003, and so on up to 2017. Now I get tables named hail-201801, hail-201802, etc.
I want to incorporate the extra 01, 02 as a wildcard when referencing the table.
Thanks a lot for the help. I am new to this.
Uh, no you don't. You clearly don't have a complete understanding of how tables work in a database or in SQL Server.
You gain nothing by having multiple tables with the exact same columns and types, whose names are differentiated by numbers or dates. That is not how SQL works. You lose a lot: foreign key references, query simplicity, maintainability, and more.
Instead, include the date column in the data and store everything in one table.
If you are concerned about performance, then you can create an index on the date column to get the data that you need. Another method (if the data is large) is to store the data in separate data partitions. These are an important part of SQL Server functionality.
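A sketch of what the single-table design could look like, using the column list from the question (the types are assumptions):
CREATE TABLE dbo.hail (
    CELL_ID  int,
    LAT      float,
    LON      float,
    SEVPROB  int,
    PROB     int,
    MAXSIZE  float,
    _ZTIME   datetime  -- the date column: filter on this instead of picking a table
);

CREATE INDEX IX_hail_ZTIME ON dbo.hail (_ZTIME);

SELECT CELL_ID, LAT, LON, SEVPROB, PROB, MAXSIZE, _ZTIME
FROM dbo.hail
WHERE _ZTIME >= '2018-01-01' AND _ZTIME < '2019-01-01';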
As a general solution, you could do something like this:
SET @TableName = '[SWDI].[dbo].[hail-' + @YearVar + ']';
-- Check if the year table exists
IF (OBJECT_ID(@TableName, 'U') IS NULL) BEGIN
    -- Implement your 'wildcard' logic here
    SET @NumVar = '01';
    SET @TableName = '[SWDI].[dbo].[hail-' + @YearVar + @NumVar + ']';
END
Another solution would be to have the missing numbered tables as views on top of the existing tables, but this might have negative performance effects.
A third option is to have yearly views on top of the new numbered tables; with clever constraints on the tables and in the view definitions, this can have insignificant overhead.
Last, but not least, you should consider building a partitioned view on top of these tables and maintaining that view. You can then query the view directly without messing with table names all the time.
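A sketch of such a partitioned view over the tables named in the question (only a few branches shown):
CREATE VIEW dbo.hail_all
AS
SELECT * FROM [SWDI].[dbo].[hail-2016]
UNION ALL
SELECT * FROM [SWDI].[dbo].[hail-2017]
UNION ALL
SELECT * FROM [SWDI].[dbo].[hail-201801];
Each time a new monthly table arrives, the view is re-created with an extra branch, and queries against dbo.hail_all stay unchanged.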
Please read Gordon's answer!
In any case, I'd highly suggest being careful with the dynamic queries. You might want to take a look at functions like PARSENAME and QUOTENAME.
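For instance, QUOTENAME brackets a dynamically built name safely, which avoids hand-assembling the [ and ] characters:
DECLARE @YearVar varchar(6) = '201801';
DECLARE @TableName sysname = QUOTENAME('hail-' + @YearVar);  -- yields [hail-201801]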
So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, i.e. all the rows for which the target string is a superstring of the column value. At the moment I'm using a query along the lines of:
SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')
My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.
You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.
Alternatively, you could consider simply selecting column from table and performing the match in a high level language. Depending on how many rows are in table, this approach might make more sense. Offloading heavy tasks to a client server (web server) can often be a win because it reduces load on the database server.
If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.
For instance, you have these superstrings:
www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov
If your rules contain "mafia" and you want the first two to match, then what I'll say doesn't apply.
Otherwise, you can parse your URLs into elements like: ['www', 'mafia', 'gov', 'ru']
Then, it will be much easier to look up each element in your table.
Well it appears the answer is that you don't. This type of indexing is generally not available and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.
Thanks to everyone that responded!
I created a search solution using views that needed to be robust enough to grow with the customer's needs. For example:
CREATE TABLE tblMyData
(
MyId bigint identity(1,1),
Col01 varchar(50),
Col02 varchar(50),
Col03 varchar(50)
)
CREATE VIEW viewMySearchData
as
SELECT
MyId,
ISNULL(Col01,'') + ' ' +
ISNULL(Col02,'') + ' ' +
ISNULL(Col03,'') + ' ' AS SearchData
FROM tblMyData
SELECT
t1.MyId,
t1.Col01,
t1.Col02,
t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
ON t1.MyId = t2.MyId
WHERE t2.SearchData like '%search string%'
If they then decide to add columns to tblMyData and want those columns to be searched, then modify viewMySearchData by adding the new columns to the "AS SearchData" section.
If they decide that there are too many columns in the search, then just modify viewMySearchData by removing the unwanted columns from the "AS SearchData" section.
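For example, if a hypothetical Col04 were added to tblMyData, the view change is just:
ALTER VIEW viewMySearchData
as
SELECT
    MyId,
    ISNULL(Col01,'') + ' ' +
    ISNULL(Col02,'') + ' ' +
    ISNULL(Col03,'') + ' ' +
    ISNULL(Col04,'') + ' ' AS SearchData
FROM tblMyData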
I have a MySQL table with 3 fields:
Location
Variable
Value
I frequently use the following query:
SELECT *
FROM Table
WHERE Location = '$Location'
AND Variable = '$Variable'
ORDER BY Location, Variable
I have over a million rows in my table and queries are somewhat slow. Would it increase query speed if I added a field VariableLocation, which is the Variable and the Location combined? I would be able to change the query to:
SELECT *
FROM Table
WHERE VariableLocation = '$Location$Variable'
ORDER BY VariableLocation
I would add a covering index, for columns location and variable:
ALTER TABLE tablename
ADD INDEX (variable, location);
...though if the variable & location pairs are unique, they should be the primary key.
Combining the columns will likely cause more grief than it's worth. For example, if you need to pull out records by location or variable only, you'd have to substring the values in a subquery.
Try adding an index that covers the two fields. You should then still get a performance boost, while keeping your data understandable: combining two columns that don't belong together just for performance would make the schema confusing.
I would advise against combining the fields. Instead, create an index that covers both fields in the same order as your ORDER BY clause:
ALTER TABLE tablename ADD INDEX (location, variable);
Combined indices and keys are only used in queries that involve the index's fields from left to right. In other words: if you filter on location in a WHERE condition, this index can be used, but a query that only filters or sorts on variable cannot use it.
When trying to optimize queries, the EXPLAIN command is quite helpful: see EXPLAIN in the MySQL docs.
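For example, to verify that the new index is actually used (values hypothetical, table name as in the question):
EXPLAIN SELECT *
FROM `Table`
WHERE Location = 'Berlin' AND Variable = 'temperature'
ORDER BY Location, Variable;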
Correction Update:
Courtesy of @paxdiablo:
A column in the table will make no difference. All you need is an index over both columns and the MySQL engine will use it. Adding a column to the table is actually worse than that, since it breaks 3NF and wastes space. See http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html, which states that for SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2; the appropriate rows can be fetched directly if a multiple-column index exists on col1 and col2.
OK, here is my dilemma: I have a database set up with about 5 tables, all with the exact same data structure. The data is separated in this manner for localization purposes, and to split up a total of about 4.5 million records.
A majority of the time only one table is needed and all is well. However, sometimes data is needed from 2 or more of the tables and it needs to be sorted by a user defined column. This is where I am having problems.
data columns:
id, band_name, song_name, album_name, genre
MySQL statement:
SELECT * from us_music, de_music where `genre` = 'punk'
MySQL spits out this error:
#1052 - Column 'genre' in where clause is ambiguous
Obviously, I am doing this wrong. Anyone care to shed some light on this for me?
I think you're looking for the UNION clause, a la
(SELECT * from us_music where `genre` = 'punk')
UNION
(SELECT * from de_music where `genre` = 'punk')
It sounds like you'd be happier with a single table. The fact that the five tables have the same schema, and sometimes need to be presented as if they came from one table, points to putting it all in one table.
Add a new column which can be used to distinguish among the five languages (I'm assuming it's language that is different among the tables since you said it was for localization). Don't worry about having 4.5 million records. Any real database can handle that size no problem. Add the correct indexes, and you'll have no trouble dealing with them as a single table.
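A sketch of that consolidation, reusing the columns from the question (the locale column and the table name music are assumptions):
CREATE TABLE music (
    id         bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
    band_name  varchar(255),
    song_name  varchar(255),
    album_name varchar(255),
    genre      varchar(64),
    locale     char(2) NOT NULL,          -- 'us', 'de', ... replaces the per-country tables
    KEY idx_locale_genre (locale, genre)
);

-- All punk rows, regardless of locale:
SELECT * FROM music WHERE genre = 'punk';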
Any of the above answers are valid; an alternative is to qualify the column name with the table name, e.g.:
SELECT * from us_music, de_music where `us_music`.`genre` = 'punk' AND `de_music`.`genre` = 'punk'
The column is ambiguous because it appears in both tables, so you need to specify the WHERE (or sort) field fully, such as us_music.genre or de_music.genre. You'd usually specify two tables only if you were then going to join them together in some fashion. The structure you're dealing with is occasionally referred to as a partitioned table, although it's usually done to separate the dataset into distinct files as well, rather than to just split it arbitrarily. If you're in charge of the database structure and there's no good reason to partition the data, then I'd build one big table with an extra "origin" field that contains a country code, but you're probably doing it for a legitimate performance reason.
Either use a UNION to combine the tables you're interested in (http://dev.mysql.com/doc/refman/5.0/en/union.html), or use the MERGE storage engine (http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html).
Your original attempt to span both tables creates an implicit JOIN. This is frowned upon by most experienced SQL programmers because it separates the tables being combined from the condition that says how to combine them.
The UNION is a good solution for the tables as they are, but there should be no reason they can't be put into the one table with decent indexing. I've seen adding the correct index to a large table increase query speed by three orders of magnitude.
A UNION statement can take a great deal of time on huge data sets. It is better to perform the select in two steps (see the sketch below):
select the ids first,
then select from the main table using them.
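As I read it, the suggestion amounts to something like this sketch (tables from the question; the literal ids stand in for the output of step 1):
-- Step 1: collect just the matching ids (cheap if genre is indexed)
SELECT id FROM us_music WHERE genre = 'punk'
UNION
SELECT id FROM de_music WHERE genre = 'punk';

-- Step 2: fetch the full rows for those ids
SELECT * FROM us_music WHERE id IN (1, 5, 42);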