slowquery what's wrong? - sql

I caught this query in the slow query log.
SELECT GREATEST( IFNULL(`twk-status`.`status`,0) , `groups`.`status`)
FROM `groups`
LEFT JOIN `twk-status` ON
(`twk-status`.`groupID` = `groups`.`ID`
AND `twk-status`.`startdate` <= FROM_UNIXTIME(1317204000)
AND `twk-status`.`enddate` >= FROM_UNIXTIME(1317204000)
AND ISNULL(`twk-status`.`deleted`)
AND `twk-status`.`groupID` = 11)
WHERE `groups`.`ID` = 11;
I ran the query using EXPLAIN producing the following report:
id  table       possible_keys              key
1   twk-status  groupID,groupID_2,enddate  NULL
1   groups      PRIMARY                    PRIMARY
This means that MySQL didn't use any of the three possible keys for the twk-status table. Correct?
Here are the keys on twk-status:
groupID (groupID)
groupID_2 (groupID, startdate, enddate, deleted)
enddate (enddate, createdate, deleted)
The query uses groupID, startdate, enddate and deleted, so I thought groupID_2 would be the key.
Or is this a query problem and not an index problem?
Thanks in advance.

From your EXPLAIN I can tell that the order of the tables is swapped.
For each record from twk-status a corresponding record from groups is being searched, since you have a PRIMARY KEY join condition among the others (twk-status.groupID = groups.ID).
However, I'm almost sure that there is something wrong with this EXPLAIN, because MySQL is not able to swap the order for LEFT JOIN.
Could you please post the exact output of EXPLAIN?
Update:
When building the plans, MySQL checks existence of constant values searched in indexed fields.
Some of these checks failed. It's impossible to tell which exactly, but either you don't have a record with groupID = 11 in twk-status, or you don't have one with enddate >= FROM_UNIXTIME(1317204000).
MySQL just replaced the whole join with an empty row value for twk-status, and will return it along with the data for group with groupID = 11.
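The effect on the result can be sketched outside MySQL. The following is a minimal illustration using Python's built-in sqlite3 module as a stand-in, so it says nothing about MySQL's optimizer itself; the sample rows and the date literal are made up, and SQLite's two-argument max() is used in place of GREATEST(). With no matching twk-status row, the LEFT JOIN supplies NULLs and the expression falls back to the group's own status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (ID INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE "twk-status" (
        groupID INTEGER, status INTEGER,
        startdate TEXT, enddate TEXT, deleted TEXT
    );
    INSERT INTO "groups" VALUES (11, 2);
    -- deliberately no "twk-status" row for group 11 (hypothetical data)
    INSERT INTO "twk-status" VALUES (12, 5, '2011-01-01', '2012-01-01', NULL);
""")

# SQLite's two-argument max() plays the role of MySQL's GREATEST() here.
row = conn.execute("""
    SELECT max(ifnull(t.status, 0), g.status), t.groupID
    FROM "groups" AS g
    LEFT JOIN "twk-status" AS t
           ON t.groupID = g.ID
          AND t.startdate <= '2011-09-28'
          AND t.enddate >= '2011-09-28'
          AND t.deleted IS NULL
    WHERE g.ID = 11
""").fetchone()
print(row)
```

The second column comes back as NULL, which is the "empty row" for twk-status described above.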

Related

SQL Server : index for finding latest value which is greater than a passed value

I have a table with 4 columns
USER_ID: numeric
EVENT_DATE: date
VERSION: date
SCORE: decimal
I have a clustered index on (USER_ID, EVENT_DATE, VERSION). These three values together are unique.
I need to get the maximum EVENT_DATE for a set of USER_IDs (~1000 different ids) where the SCORE is larger than a specific value, considering only entries with a specific VERSION.
SELECT M.*
FROM (VALUES
( 5237 ),
………1000 more
( 27054 ) ) C (USER_ID)
CROSS APPLY
(SELECT TOP 1 C.USER_ID, M.EVENT_DATE, M.SCORE
FROM MY_HUGE_TABLE M
WHERE C.USER_ID = M.USER_ID
AND M.VERSION = 'xxxx-xx-xx'
AND M.SCORE > 2 --Comment M.SCORE > 2
ORDER BY M.EVENT_DATE DESC) M
When I execute the query, the runtime is poor, which I suppose is due to a missing index on the SCORE column.
If I remove the filter “M.SCORE > 2”, I get my results ten times faster; however, the latest scores may then be less than “2”.
Could anyone please give me a hint on how to set up an index that would improve my query performance?
Thank you very much in advance.
For your query, the optimal index would be on (USER_ID, VERSION, EVENT_DATE DESC, SCORE).
Unfortunately, your clustered index doesn't match. Only the first and third columns match, but they need to match in order. So, only the User_ID can help but that probably doesn't do much to filter the data.
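As a rough illustration of such an index (user, then version, then event date descending, then score), here is a small sketch using Python's sqlite3 module in place of SQL Server. The index name ix_user_version_date, the sample rows and the use of LIMIT 1 instead of TOP 1 are assumptions for the demo, and SQLite's planner is not SQL Server's, so this only shows the shape of the index and that a query of this form can be answered from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The clustered index from the question is left out of this sketch.
    CREATE TABLE my_huge_table (
        user_id INTEGER, event_date TEXT, version TEXT, score REAL
    );
    -- Equality columns first, then the sort column, then the filtered column.
    CREATE INDEX ix_user_version_date
        ON my_huge_table (user_id, version, event_date DESC, score);
    INSERT INTO my_huge_table VALUES
        (5237, '2020-01-01', '2020-06-30', 3.5),
        (5237, '2020-02-01', '2020-06-30', 1.0),
        (5237, '2020-03-01', '2019-12-31', 9.0);
""")

query = """
    SELECT event_date, score
    FROM my_huge_table
    WHERE user_id = ? AND version = ? AND score > 2
    ORDER BY event_date DESC
    LIMIT 1
"""
params = (5237, '2020-06-30')
latest = conn.execute(query, params).fetchone()
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query, params))
print(latest)
print(plan)
```

In the sample data the most recent row for the version has a score below the threshold, so the query has to step back to an earlier date; with the date stored in index order this is a short walk rather than a scan of the user's whole history.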

How to insert values from column A of table X to column B of table Y - and order them randomly

I need to collect the values from the column "EmployeeID" of the table "Employees" and insert them into the column "EmployeeID" of the table "Incident".
At the end, the Values in the rows of the column "EmployeeID" should be arranged randomly.
More precisely;
I created 10 employees with their ID's, counting from 1 up to 10.
Those Employees, in fact the ID's, should receive random Incidents to work on.
So ... there are 10 ID's to spread on all Incidents - which might be 1000s.
How do I do this?
It's just for personal exercise on my local machine.
I googled, but didn't find an explicit answer to my problem.
Should be simple to solve for you champs. :)
May anyone help me, please?
NOTES:
1) I've already created a column called "EmployeeID" in the table "Incident", therefore I'll need an update statement, won't I?
2) Schema:
[dbo].[EmployeeType]
[dbo].[Company]
[dbo].[Division]
[dbo].[Team]
[dbo].[sysdiagrams]
[dbo].[Incident]
[dbo].[Employees]
3) 1. Pre-solution:
CREATE TABLE IncidentToEmployee
(
IncidentToEmployeeID BIGINT IDENTITY(1,1) NOT NULL,
EmployeeID BIGINT NULL,
Incident FLOAT NULL,
PRIMARY KEY CLUSTERED (IncidentToEmployeeID)
)
INSERT INTO IncidentToEmployee
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
SELECT * FROM IncidentToEmployee
GO
3) 2. Output by INNER JOIN ON
In case you are wondering about the "Alias" column;
Nobody really knows which persons are behind the ID's - that's why I used an Alias column.
SELECT Employees.Alias,
IncidentToEmployee.Incident
FROM Employees
INNER JOIN
IncidentToEmployee ON
Employees.EmployeeID = IncidentToEmployee.EmployeeID
ORDER BY Alias
4) Final Solution
As I mentioned, I added at first a column called "EmployeeID" already to my "Incident" table. That's why I couldn't use an INSERT INTO statement at first and had to use an UPDATE statement. I found the most suitable solution now - without creating a new table as I did as a pre-solution.
Take a look at the following code:
ALTER Table Incident
ADD EmployeeID BIGINT NULL
UPDATE Incident
SET Incident.EmployeeID = EmployeeID
FROM Incident INNER JOIN Employees
ON Incident = EmployeeID
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Thank you all for your help. It took way longer to find a solution than I thought it would, but I finally made it. Thanks!
UPDATE
I think you need to allocate different tasks to different users. A better approach would be to create a new table, say EmployeeIncidents, with the columns Id (primary key), EmployeeID and IncidentID.
You can then insert random EmployeeID and IncidentID pairs into the new table; this way you also keep a record of the assignments.
Updating the Incident table would not be a smart choice.
INSERT INTO EmployeeIncidents
SELECT TOP ( 10 )
EmployeeID ,
IncidentID
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Written by hand, so the syntax may need tweaking, but something like this should do it. The Rand() function will give the same value unless seeded, so you can seed it with something like the date to get randomness.
Insert Into Incidents
Select Top 10
EmployeeID
From Employees
Order By
Rand(GetDate())
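For comparison, here is a minimal sketch of giving every incident one randomly chosen employee, written with Python's sqlite3 module rather than SQL Server. It assumes the employee IDs are exactly 1 to 10, as in the question; the IncidentID column and the table layout are invented for the demo. SQLite's random() is evaluated once per updated row; in SQL Server the per-row expression would need a different form, for example something based on NEWID().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Alias TEXT)")
# IncidentID is a hypothetical key column, not taken from the original schema.
conn.execute("CREATE TABLE Incident (IncidentID INTEGER PRIMARY KEY, EmployeeID INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO Incident (IncidentID) VALUES (?)",
                 [(i,) for i in range(1, 1001)])

# random() is drawn separately for each updated row.
# abs(random() % 10) + 1 maps the draw onto the IDs 1..10 (assumes contiguous IDs).
conn.execute("UPDATE Incident SET EmployeeID = abs(random() % 10) + 1")

per_employee = conn.execute(
    "SELECT EmployeeID, COUNT(*) FROM Incident GROUP BY EmployeeID ORDER BY EmployeeID"
).fetchall()
print(per_employee)
```

Each incident ends up with exactly one employee, and the counts per employee come out roughly even without being identical.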

Delete duplicates with no primary key

I want to delete rows that have a duplicated value in one column (Product), which will then be used as the primary key.
The column is of type nvarchar and we don't want to have 2 rows for one product.
The database is a large one, with thousands of rows that we need to remove.
When querying for the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet; we want to create it after removing the duplicates.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, mostly getting an error similar to:
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
One method I tried:
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The approach I would prefer, and which I am trying to learn and adapt to my code, is something like this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the lines I have more doubt on them
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM Word)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductsTable and victims.
Or also you can select Product only, and then do a DELETE for some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, you will have the DB in an unstable state, with all products that have at least one duplicate simply gone.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then if you have only one row per product, all well and good: it will get selected. If you have more, the one for which (FieldWhichShouldNotBeNULL IS NULL) is FALSE (i.e. the one where FieldWhichShouldNotBeNULL is actually not null, as it should be) will be selected first, and inserted. All the others will bounce silently, thanks to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check I didn't mix up true and false in my clause!), but it ought to work.
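The sorting trick can be tried out in miniature. This sketch uses Python's sqlite3 module, where INSERT OR IGNORE is SQLite's counterpart of MySQL's INSERT IGNORE; the Description column stands in for FieldWhichShouldNotBeNULL and the rows are sample data, so treat it as an illustration of the idea rather than a tested MySQL script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductsTable (Product TEXT, Description TEXT);
    INSERT INTO ProductsTable VALUES
        ('CBPD10', NULL),            -- invalid duplicate, stored first
        ('CBPD10', 'C-Beam Prj'),
        ('CBPD11', 'C Proj Mk2'),
        ('CBPD14', NULL);            -- only copy, must survive

    CREATE TABLE deduped (Product TEXT, Description TEXT);
    CREATE UNIQUE INDEX deduped_ndx ON deduped (Product);

    -- Rows with a non-NULL Description sort first (0 before 1), so they claim
    -- the unique index entry; later copies are skipped by OR IGNORE.
    INSERT OR IGNORE INTO deduped
    SELECT * FROM ProductsTable ORDER BY Product, Description IS NULL;
""")
rows = conn.execute("SELECT * FROM deduped ORDER BY Product").fetchall()
print(rows)
```

The valid CBPD10 row wins even though the invalid copy was stored first, and CBPD14 survives because it has no competitor.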
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( "AC5", NULL ), ( "AC5", NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicate unless some condition may be found, distinguishing among them one "good" record and declaring all others "invalid" (maybe repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again that if a record has no valid records and yet it is a duplicate, then all copies of that record will be killed - there will be no survivors.
To avoid this, you may first SELECT (using the query above, the check "which should return nothing") the rows representing this mode of failure into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
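Here is the whole walk-through condensed into one runnable sketch, using Python's sqlite3 module instead of MySQL. SQLite has no DELETE ... JOIN, so the join against Dups is written as an IN subquery; that substitution is mine, the data is the sample data from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTable (Product TEXT, Description TEXT);
    INSERT INTO ProductTable VALUES
        ('CBPD10', 'C-Beam Prj'), ('CBPD11', 'C Proj Mk2'), ('CBPD12', 'C Proj Mk3'),
        ('CBPD10', NULL), ('CBPD14', NULL), ('AC5', NULL), ('AC5', NULL);

    -- Products that occur more than once
    CREATE TEMPORARY TABLE Dups AS
        SELECT Product, COUNT(*) AS Duplicates
        FROM ProductTable GROUP BY Product HAVING COUNT(*) > 1;

    -- Delete the invalid rows, but only for products listed in Dups
    DELETE FROM ProductTable
    WHERE Description IS NULL
      AND Product IN (SELECT Product FROM Dups);
""")
remaining = conn.execute(
    "SELECT Product, Description FROM ProductTable ORDER BY Product"
).fetchall()
print(remaining)
```

CBPD14 is kept, the spurious CBPD10 row is gone, and both AC5 rows disappear, which is exactly the "no survivors" case warned about above.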
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc.) from the old table to the new. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
If it's not that big and you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have a situation like your CTE, except that the rows themselves are truly distinct. So now you can do the following:
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 [etc]
This gives you all the 'good' duplicates. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 [etc]
)
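A compact variant of the same idea, keeping the row with the lowest temporary ID in each group, can be sketched with Python's sqlite3 module. An AUTOINCREMENT column plays the role of the identity field added through the GUI; the data and the simplified NOT IN form (without the self-join) are assumptions made for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- MyTempIDField stands in for the identity column added through the GUI
    CREATE TABLE table_a (
        MyTempIDField INTEGER PRIMARY KEY AUTOINCREMENT,
        field1 TEXT, field2 TEXT
    );
    INSERT INTO table_a (field1, field2) VALUES
        ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'x'), ('c', 'z'), ('b', 'y');

    -- Keep the lowest id in each (field1, field2) group and delete the rest
    DELETE FROM table_a
    WHERE MyTempIDField NOT IN (
        SELECT MIN(MyTempIDField) FROM table_a GROUP BY field1, field2
    );
""")
kept = conn.execute("SELECT * FROM table_a ORDER BY MyTempIDField").fetchall()
print(kept)
```

Rows that were never duplicated are their own group minimum, so they are kept without any special handling.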
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.

finding consecutive date pairs in SQL

I have a question here that looks a little like some of the ones that I found in search, but with solutions for slightly different problems and, importantly, ones that don't work in SQL 2000.
I have a very large table with a lot of redundant data that I am trying to reduce down to just the useful entries. It's a history table, and the way it works, if two entries are essentially duplicates and consecutive when sorted by date, the latter can be deleted. The data from the earlier entry will be used when historical data is requested from a date between the effective date of that entry and the next non-duplicate entry.
The data looks something like this:
id  user_id  effective_date  important_value  useless_value
1   1        1/3/2007        3                0
2   1        1/4/2007        3                1
3   1        1/6/2007        NULL             1
4   1        2/1/2007        3                0
5   2        1/5/2007        12               1
6   3        1/1/1899        7                0
With this sample set, we would consider two consecutive rows duplicates if the user_id and the important_value are the same. From this sample set, we would only delete row with id=2, preserving the information from 1-3-2007, showing that the important_value changed on 1-6-2007, and then showing the relevant change again on 2-1-2007.
My current approach is awkward and time-consuming, and I know there must be a better way. I wrote a script that uses a cursor to iterate through the user_id values (since that breaks the huge table up into manageable pieces), and creates a temp table of just the rows for that user. Then to get consecutive entries, it takes the temp table, joins it to itself on the condition that there are no other entries in the temp table with a date between the two dates. In the pseudocode below, UDF_SameOrNull is a function that returns 1 if the two values passed in are the same or if they are both NULL.
WHILE (@@FETCH_STATUS <> -1)
BEGIN
    -- copy the current user's rows into a temp table
    SELECT * INTO #history FROM History WHERE user_id = @UserId
    --return entries to delete
    SELECT h2.id
    INTO #delete_history_ids
    FROM #history h1
    JOIN #history h2 ON
        h1.effective_date < h2.effective_date
        AND dbo.UDF_SameOrNull(h1.important_value, h2.important_value)=1
    WHERE NOT EXISTS (SELECT 1 FROM #history hx WHERE hx.effective_date > h1.effective_date and hx.effective_date < h2.effective_date)
    DELETE h1
    FROM History h1
    JOIN #delete_history_ids dh ON
        h1.id = dh.id
    -- drop the temp tables so the next iteration can recreate them
    DROP TABLE #history
    DROP TABLE #delete_history_ids
    FETCH NEXT FROM UserCursor INTO @UserId
END
It also loops over the same set of duplicates until there are none, since taking out rows creates new consecutive pairs that are potentially dupes. I left that out for simplicity.
Unfortunately, I must use SQL Server 2000 for this task and I am pretty sure that it does not support ROW_NUMBER() for a more elegant way to find consecutive entries.
Thanks for reading. I apologize for any unnecessary backstory or errors in the pseudocode.
OK, I think I figured this one out, excellent question!
First, I made the assumption that the effective_date column will not be duplicated for a user_id. I think it can be modified to work if that is not the case - so let me know if we need to account for that.
The process basically takes the table of values and self-joins on equal user_id and important_value and prior effective_date. Then, we do 1 more self-join on user_id that effectively checks to see if the 2 joined records above are sequential by verifying that there is no effective_date record that occurs between those 2 records.
It's just a select statement for now - it should select all records that are to be deleted. So if you verify that it is returning the correct data, simply change the select * to delete tcheck.
Let me know if you have questions.
select
*
from
History tcheck
inner join History tprev
on tprev.[user_id] = tcheck.[user_id]
and tprev.important_value = tcheck.important_value
and tprev.effective_date < tcheck.effective_date
left join History checkbtwn
on tcheck.[user_id] = checkbtwn.[user_id]
and checkbtwn.effective_date < tcheck.effective_date
and checkbtwn.effective_date > tprev.effective_date
where
checkbtwn.[user_id] is null
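To check the logic against the sample data from the question, here is the same self-join run through Python's sqlite3 module. SQLite is only a stand-in for SQL Server 2000, and the dates are rewritten as ISO strings (reading 1/3/2007 as January 3) so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE History (
        id INTEGER PRIMARY KEY, user_id INTEGER, effective_date TEXT,
        important_value INTEGER, useless_value INTEGER
    );
    INSERT INTO History VALUES
        (1, 1, '2007-01-03', 3, 0),
        (2, 1, '2007-01-04', 3, 1),
        (3, 1, '2007-01-06', NULL, 1),
        (4, 1, '2007-02-01', 3, 0),
        (5, 2, '2007-01-05', 12, 1),
        (6, 3, '1899-01-01', 7, 0);
""")

# tprev: an earlier row of the same user with the same value.
# checkbtwn: any row of that user strictly between the two; none may exist.
ids_to_delete = [row[0] for row in conn.execute("""
    SELECT tcheck.id
    FROM History AS tcheck
    INNER JOIN History AS tprev
            ON tprev.user_id = tcheck.user_id
           AND tprev.important_value = tcheck.important_value
           AND tprev.effective_date < tcheck.effective_date
    LEFT JOIN History AS checkbtwn
           ON checkbtwn.user_id = tcheck.user_id
          AND checkbtwn.effective_date < tcheck.effective_date
          AND checkbtwn.effective_date > tprev.effective_date
    WHERE checkbtwn.user_id IS NULL
""")]
print(ids_to_delete)
```

Only id 2 is reported: row 4 has the same value as rows 1 and 2, but the NULL entry of 1/6/2007 sits in between, so it is not a consecutive duplicate.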
OK guys, I did some thinking last night and I think I found the answer. I hope this helps someone else who has to match consecutive pairs in data and for some reason is also stuck in SQL Server 2000.
I was inspired by the other results that say to use ROW_NUMBER(), and I used a very similar approach, but with an identity column.
--create table with identity column
CREATE TABLE #history (
id int,
user_id int,
effective_date datetime,
important_value int,
useless_value int,
idx int IDENTITY(1,1)
)
--insert rows ordered by effective_date and now indexed in order
INSERT INTO #history
SELECT * FROM History
WHERE user_id = @user_id
ORDER BY effective_date
--get pairs where consecutive values match
SELECT *
FROM #history h1
JOIN #history h2 ON
h1.idx+1 = h2.idx
WHERE h1.important_value = h2.important_value
With this approach, I still have to iterate over the results until it returns nothing, but I can't think of any way around that and this approach is miles ahead of my last one.
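The identity-column idea can be reproduced in a small sketch with Python's sqlite3 module, where an AUTOINCREMENT column stands in for IDENTITY(1,1). The table name history_tmp and the hard-coded user_id are assumptions for the demo; the rows are user 1's sample data from the question, stored out of order on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE History (id INTEGER, user_id INTEGER, effective_date TEXT,
                          important_value INTEGER, useless_value INTEGER);
    INSERT INTO History VALUES
        (4, 1, '2007-02-01', 3, 0),      -- stored out of date order on purpose
        (1, 1, '2007-01-03', 3, 0),
        (3, 1, '2007-01-06', NULL, 1),
        (2, 1, '2007-01-04', 3, 1);

    -- idx plays the role of the IDENTITY(1,1) column
    CREATE TEMP TABLE history_tmp (
        idx INTEGER PRIMARY KEY AUTOINCREMENT,
        id INTEGER, user_id INTEGER, effective_date TEXT,
        important_value INTEGER, useless_value INTEGER
    );
    -- Inserting in date order numbers the rows consecutively by date
    INSERT INTO history_tmp (id, user_id, effective_date, important_value, useless_value)
    SELECT * FROM History WHERE user_id = 1 ORDER BY effective_date;
""")

# Neighbouring idx values are consecutive entries; keep pairs with equal values.
pairs = conn.execute("""
    SELECT h1.id, h2.id
    FROM history_tmp AS h1
    JOIN history_tmp AS h2 ON h1.idx + 1 = h2.idx
    WHERE h1.important_value = h2.important_value
""").fetchall()
print(pairs)
```

The second id of each pair is the row that would be deleted; for this sample that is row 2 only.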

Getting the last record in SQL in WHERE condition

I have a table loanTable that contains two fields, loan_id and status:
loan_id status
==============
1 0
2 9
1 6
5 3
4 5
1 4 <-- How do I select this??
4 6
In this situation I need to show the last status of loan_id 1, i.e. status 4. Can you please help me with this query?
Since the 'last' row for ID 1 is neither the minimum nor the maximum, you are living in a state of mild confusion. Rows in a table have no order. So, you should be providing another column, possibly the date/time when each row is inserted, to provide the sequencing of the data. Another option could be a separate, automatically incremented column which records the sequence in which the rows are inserted. Then the query can be written.
If the extra column is called status_id, then you could write:
SELECT L1.*
FROM LoanTable AS L1
WHERE L1.Status_ID = (SELECT MAX(Status_ID)
FROM LoanTable AS L2
WHERE L2.Loan_ID = 1);
(The table aliases L1 and L2 could be omitted without confusing the DBMS or experienced SQL programmers.)
As it stands, there is no reliable way of knowing which is the last row, so your query is unanswerable.
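A minimal sketch of the extra-column approach, using Python's sqlite3 module as a stand-in for whatever DBMS is in use: status_id is an auto-incremented column that records insertion order, and the rows are the ones from the question, inserted in the order shown there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- status_id is the extra auto-incremented column proposed above
    CREATE TABLE LoanTable (
        status_id INTEGER PRIMARY KEY AUTOINCREMENT,
        loan_id INTEGER, status INTEGER
    );
    INSERT INTO LoanTable (loan_id, status) VALUES
        (1, 0), (2, 9), (1, 6), (5, 3), (4, 5), (1, 4), (4, 6);
""")

# The row with the highest status_id for the loan is the most recently inserted one.
last_status = conn.execute("""
    SELECT L1.status
    FROM LoanTable AS L1
    WHERE L1.status_id = (SELECT MAX(L2.status_id)
                          FROM LoanTable AS L2
                          WHERE L2.loan_id = ?)
""", (1,)).fetchone()[0]
print(last_status)
```

With the sequencing column in place, "last" has a definite meaning and the query returns status 4 for loan 1.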
Does your table happen to have a primary id or a timestamp? If not then what you want is not really possible.
If yes then:
SELECT TOP 1 status
FROM loanTable
WHERE loan_id = 1
ORDER BY primaryId DESC
-- or
-- ORDER BY yourTimestamp DESC
I assume that with "last status" you mean the record that was inserted most recently? AFAIK there is no way to make such a query unless you add timestamp into your table where you store the date and time when the record was added. RDBMS don't keep any internal order of the records.
But if last = last inserted, that's not possible for current schema, until a PK addition:
select top 1 status, loan_id
from loanTable
where loan_id = 1
order by id desc -- PK
Use a data reader. Store the value from each row as you read it; when the while loop exits, the variable holds the value from the last row. As the other posters stated, unless you put a sort on the query, the row order could change. Even if there is a clustered index on the table, it might not return the rows in that order (without a sort on the clustered index).
SqlDataReader rdr = SQLcmd.ExecuteReader();
string lastVal = null;
while (rdr.Read())
{
    // remember the first column of the row just read
    lastVal = rdr[0].ToString();
}
rdr.Close();
You could also use a ROW_NUMBER() but that requires a sort and you cannot use ROW_NUMBER() directly in the Where. But you can fool it by creating a derived table. The rdr solution above is faster.
In oracle database this is very simple.
select * from (select * from loanTable order by rownum desc) where rownum=1
Hi if this has not been solved yet.
To get the last record for any field from a table, the easiest way would be to add an ID to each record, say pID. Say that in your table you would like to get the last record for each 'Name'; then run this simple query:
SELECT Name, MAX(pID) as LastID
INTO [TableName]
FROM [YourTableName]
GROUP BY [Name]/[Any other field you would like your last records to appear by]
You should now have a table containing the Names in one column and the last available ID for that Name.
Now you can use a join to get the other details from your primary table, say this is some price or date then run the following:
SELECT a.*,b.Price/b.date/b.[Whatever other field you want]
FROM [TableName] a LEFT JOIN [YourTableName] b
ON a.Name = b.Name and a.LastID = b.pID
This should then give you the last record for each Name. For the first record, run the same queries but replace MAX with MIN.
This should be easy to follow and should run quickly as well.
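The two steps can also be folded into a single query by joining against the grouped result directly. This is a sketch with Python's sqlite3 module; the loan data from the question is used in place of the Name example, and pID is assumed to be an auto-incremented column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- pID is the added ID column that records insertion order
    CREATE TABLE LoanTable (
        pID INTEGER PRIMARY KEY AUTOINCREMENT,
        loan_id INTEGER, status INTEGER
    );
    INSERT INTO LoanTable (loan_id, status) VALUES
        (1, 0), (2, 9), (1, 6), (5, 3), (4, 5), (1, 4), (4, 6);
""")

# Inner query: highest pID per loan. Outer join: fetch the rest of that row.
last_per_loan = conn.execute("""
    SELECT t.loan_id, t.status
    FROM LoanTable AS t
    JOIN (SELECT loan_id, MAX(pID) AS LastID
          FROM LoanTable
          GROUP BY loan_id) AS latest
      ON latest.loan_id = t.loan_id AND latest.LastID = t.pID
    ORDER BY t.loan_id
""").fetchall()
print(last_per_loan)
```

Swapping MAX for MIN in the inner query gives the first record per loan instead.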
If you don't have any identifying column that captures the insert order, you can always do it like this. It's hacky, though, and not very pretty.
select
t.row1,
t.row2,
ROW_NUMBER() OVER (ORDER BY t.[count]) AS rownum from (
select
tab.row1,
tab.row2,
1 as [count]
from table tab) t
So basically you get the 'natural order' if you can call it that, and add some column with all the same data. This can be used to sort by the 'natural order', giving you an opportunity to place a row number column on the next query.
Personally, if the system you are using hasn't got a time stamp/identity column, and the current users are using the 'natural order', I would quickly add a column and use this query to create some sort of time stamp/incremental key. Rather than risking having some automation mechanism change the 'natural order', breaking the data needed.
I think this code may help you:
WITH cte_Loans
AS
(
SELECT LoanID
,[Status]
,ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS RN
FROM LoanTable
)
SELECT LoanID
,[Status]
FROM cte_Loans L1
WHERE RN = ( SELECT max(RN)
FROM cte_Loans L2
WHERE L2.LoanID = L1.LoanID)