How to select data from table with big rows where column like column1%column2 - sql

I have a table Product with 240 000 rows.
I want to select from this table data where idproduct = idproductcomponent
Output makes a table with 3 columns A12345678 35655455952625 9638520963258960, so in different column.
Always idproductcomponent of idproducttype 1 is 0, and idproductcomponent of idproducttype 2,3 are the same of idproduct of idproducttype 1.
Pelase, can you share with me any idea for this select ?

Assuming your database is SQL Server (you don't say which one it is) the query could look like:
select
a.name,
b.name,
c.name
from t a
left join t b on b.idproductcomponent = a.idproduct and b.idproducttype = 2
left join t c on c.idproductcomponent = a.idproduct and c.idproducttype = 3
where a.idproducttype = 1
and a.idproduct = 11163 -- parameter you are searching for
To increase the performance of this query you can add the index:
create index ix1 on t (idproductcomponent);

Related

How to check on which column to create Index to optimize performance

I have below query which is costing too much time and i have to optimize the query performance. There is no index on any of the table.
But now for query performance optimization i am thinking to create index. But not sure on particulary which filtered column i have to create index.
I am thinking i will do group by and count the number of distinct records for all the filtered column condition and then decide on which column i should create index but not sure about this.
Select * from ORDER_MART FOL where FOL.PARENT_PROD_SRCID
IN
(
select e.PARENT_PROD_SRCID
from SRC_GRP a
JOIN MAR_GRP b ON a.h_lpgrp_id = b.h_lpgrp_id
JOIN DATA_GRP e ON e.parent_prod_srcid = b.H_LOCPR_ID
WHERE a.CHILD_LOCPR_ID != 0
AND dt_id BETWEEN 20170101 AND 20170731
AND valid_order = 1
AND a.PROD_TP_CODE like 'C%'
)
AND FOL.PROD_SRCID = 0 and IS_CAPS = 1;
Below is my query execution plan:
Select *
from ORDER_MART FOL
INNER JOIN (
select distinct e.PARENT_PROD_SRCID
from SRC_GRP a
JOIN MAR_GRP b ON a.h_lpgrp_id = b.h_lpgrp_id
JOIN DATA_GRP e ON e.parent_prod_srcid = b.H_LOCPR_ID
WHERE a.CHILD_LOCPR_ID != 0 -- remove the lines from INT_CDW_DV.S_LOCAL_PROD_GRP_MAIN with child prod srcid equal to 0
AND dt_id BETWEEN 20170101 AND 20170731
AND valid_order = 1 --and is_caps=1
AND a.PROD_TP_CODE like 'C%'
) sub ON sub.PARENT_PROD_SRCID=FOL.PARENT_PROD_SRCID
where FOL.PROD_SRCID = 0 and IS_CAPS = 1;
What if you use JOIN instead of IN and add distinct to reduce amount of rows in the subquery.

Why is Selecting From Table Variable Far Slower than List of Integers

I have a pretty big MSSQL stored procedure that I need to conditionally check for certain IDs:
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (1,2,3,4,5)
I wanted to conditionally check the SomeID field, so I did the following:
if #enteredText = 'This'
INSERT INTO #AwesomeIDs
VALUES(1),(2),(3)
if #enteredText = 'That'
INSERT INTO #AwesomeIDs
VALUES(4),(5)
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where b.SomeID in (Select ID from #AwesomeIDs)
Nothing else has changed, yet I can't even get the latter query to grab 5 records. The top query returns 5000 records in less than 3 seconds. Why is selecting from a table variable so much drastically slower?
Two other possible options you can consider
Option 1
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where
( b.SomeID IN (1,2,3) AND #enteredText = 'This')
OR
( b.SomeID IN (4,5) AND #enteredText = 'That')
Option 2
Select SomeColumns
From BigTable b
Join LotsOfTables l on b.LongStringField = l.LongStringField
Where EXISTS (Select 1
from #AwesomeIDs
WHERE b.SomeID = ID)
Mind you for Table variables , SQL Server always assumes there is only ONE row in the table (except sql 2014 , assumption is 100 rows) and it can affect the estimated and actual plans. But 1 row against 3 not really a deal breaker.

T-SQL cursor or if or case when

I have this table:
Table_NAME_A:
quotid itration QStatus
--------------------------------
5329 1 Assigned
5329 2 Inreview
5329 3 sold
4329 1 sold
4329 2 sold
3214 1 assigned
3214 2 Inreview
Result output should look like this:
quotid itration QStatus
------------------------------
5329 3 sold
4329 2 sold
3214 2 Inreview
T-SQL query, so basically I want the data within "sold" status if not there then "inreview" if not there then "assigned" and also at the same time if "sold" or "inreview" or "assigned" has multiple iteration then i want the highest "iteration".
Please help me, thanks in advance :)
This is a prioritization query. One way to do this is with successive comparisons in a union all:
select a.*
from table_a a
where quote_status = 'sold'
union all
select a.*
from table_a a
where quote_status = 'Inreview' and
not exists (select 1 from table_a a2 where a2.quoteid = a.quoteid and a2.quotestatus = 'sold')
union all
select a.*
from table_a a
where quote_status = 'assigned' and
not exists (select 1
from table_a a2
where a2.quoteid = a.quoteid and a2.quotestatus in ('sold', 'Inreview')
);
For performance on a larger set of data, you would want an index on table_a(quoteid, quotestatus).
You want neither cursors nor if/then for this. Instead, you'll use a series of self-joins to get these results. I'll also use a CTE to simplify getting the max iteration at each step:
with StatusIterations As
(
SELECT quotID, MAX(itration) Iteration, QStatus
FROM table_NAME_A
GROUP BY quotID, QStats
)
select q.quotID, coalesce(sold.Iteration,rev.Iteration,asngd.Iteration) Iteration,
coalesce(sold.QStatus, rev.QStatus, asngd.QStatus) QStatus
from
--initial pass for list of quotes, to ensure every quote is included in the results
(select distinct quotID from table_NAME_A) q
--one additional pass for each possible status
left join StatusIterations sold on sold.quotID = q.quotID and sold.QStatus = 'sold'
left join StatusIterations rev on rev.quotID = q.quotID and rev.QStatus = 'Inreview'
left join StatusIterations asngd on asngd.quotID = q.quotID and asngd.QStatus = 'assigned'
If you have a table that equates a status with a numeric value, you can further improve on this:
Table: Status
QStatus Sequence
'Sold' 3
'Inreview' 2
'Assigned' 1
And the code becomes:
select t.quotID, MAX(t.itration) itration, t.QStatus
from
(
select t.quotID, MAX(s.Sequence) As Sequence
from table_NAME_A t
inner join Status s on s.QStatus = t.QStatus
group by t.quotID
) seq
inner join Status s on s.Sequence = seq.Sequence
inner join table_NAME_A t on t.quotID = seq.quotID and t.QStatus = s.QStatus
group by t.quoteID, t.QStatus
The above may look like complicated at first, but it can be faster and it will scale easily beyond three statuses without changing the code.

sql using select and adding to a table

SELECT `pro`.`St`, `s`.`Quantity`
FROM `s`
LEFT JOIN `web`.`pro` ON `s`.`Pro_id` = `pro`.`ProdID`
the above query results in a table like
st quantity
132 1
11 1
st is from one table and quantity is from another
using this select statement I want to add the resulting quantity to the st row in the other table.
Basically updating the second table by adding the quantity i get from this select statement.
UPDATE `web`.`pro`
SET `pro`.`st` = `pro`.`st` + `s`.`Quantity`
FROM `web`.`pro`
JOIN `shopping cart` `s`
ON `s`.`Pro_id` = `pro`.`ProdID`;
Something like that ?

COUNT (DISTINCT column_name) Discrepancy vs. COUNT (column_name) in SQL Server 2008?

I'm running into a problem that's driving me nuts.
When running the query below, I get a count of 233,769
SELECT COUNT(distinct Member_List_Link.UserID)
FROM Member_List_Link with (nolock)
INNER JOIN MasterMembers with (nolock)
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And
Member_List_Link.GroupID = 5 AND
MasterMembers.ValidUsers = 1 AND
Member_List_Link.Status = 1
But if I run the same query without the distinct keyword, I get a count of 233,748
SELECT COUNT(Member_List_Link.UserID)
FROM Member_List_Link with (nolock)
INNER JOIN MasterMembers with (nolock)
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5
AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1
To test, I recreated all the tables and place them into temp tables and ran the queries again:
SELECT COUNT(distinct #Temp_Member_List_Link.UserID)
FROM #Temp_Member_List_Link with (nolock)
INNER JOIN #Temp_MasterMembers with (nolock)
ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID
WHERE #Temp_MasterMembers.Active = 1 And
#Temp_Member_List_Link.GroupID = 5 AND
#Temp_MasterMembers.ValidUsers = 1 AND
#Temp_Member_List_Link.Status = 1
And without the distinct keyword
SELECT COUNT(#Temp_Member_List_Link.UserID)
FROM #Temp_Member_List_Link with (nolock)
INNER JOIN #Temp_MasterMembers with (nolock)
ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID
WHERE #Temp_MasterMembers.Active = 1 And
#Temp_Member_List_Link.GroupID = 5 AND
#Temp_MasterMembers.ValidUsers = 1 AND
#Temp_Member_List_Link.Status = 1
On a side note, I recreated the temp tables by simply running (select * from Member_List_Link into #temp...)
And now when I check to see the difference between COUNT(column) vs. COUNT(distinct column) with these temp tables, I don't see any!
So why is there a discrepancy with the original tables?
I'm running SQL Server 2008 (Dev Edition).
UPDATE - Including statistics profile
PhysicalOp column only for the first query (without distinct)
NULL
Compute Scalar
Stream Aggregate
Clustered Index Seek
PhysicalOp column only for the first query (with distinct)
NULL
Compute Scalar
Stream Aggregate
Parallelism
Stream Aggregate
Hash Match
Hash Match
Bitmap
Parallelism
Index Seek
Parallelism
Clustered Index Scan
Rows and Executes for the 1st query (without distinct)
1 1
0 0
1 1
1 1
Rows and Executes for the 2nd query (with distinct)
Rows Executes
1 1
0 0
1 1
16 1
16 16
233767 16
233767 16
281901 16
281901 16
281901 16
234787 16
234787 16
Adding OPTION(MAXDOP 1) to the 2nd query (with distinct)
Rows Executes
1 1
0 0
1 1
233767 1
233767 1
281901 1
548396 1
And the resulting PhysicalOp
NULL
Compute Scalar
Stream Aggregate
Hash Match
Hash Match
Index Seek
Clustered Index Scan
FROM http://msdn.microsoft.com/en-us/library/ms187373.aspx
NOLOCK Is equivalent to READUNCOMMITTED. For more information, see READUNCOMMITTED later in this topic.
READUNCOMMITED will read rows twice if they are the subject of a transation- since both the roll foward and roll back rows exist within the database when the transaction is IN process.
By default all queries are read committed which excludes uncommitted rows
When you insert into a temp table the select will give you only committed rows - I believe this covers all the symptoms you are trying to explain
I think i have got the answer to your question but tell me first is userid a primary key in your original table ?
if yes,then CTAS query to create temp table would not copy any primary key of original table ,it only copy NOT NULL constraint that is not a part of primary key..fine?
now what happened your original table had a primary key so count(distinct column_name) doesnt include tuples with null records and while you created temp tables , primary key doesnt get copied and hence the NOT NULL constraint doesnt get to the temp table!!
is that clear to you?
It's hard to reproduce this behaviour, so I'm punching in the dark here:
The WITH (NOLOCK) statement enables reading of uncommitted data. I'm guessing you've added that to not lock anything for your users? If you remove those and issue a
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Prior to executing the query, you should get more reliable results. But then, the tables may receive locks while executing the query.
If that doesn't work, my guess is that DISTINCT use an index to optimize. Check the queryplan, and rebuild indexes as necessary. Could be the source of your problem.
What result do you get with
SELECT count(*) FROM (
SELECT distinct Member_List_Link.UserID
FROM Member_List_Link with (nolock)
INNER JOIN MasterMembers with (nolock)
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And
Member_List_Link.GroupID = 5 AND
MasterMembers.ValidUsers = 1 AND
Member_List_Link.Status = 1
) as m
AND WITH:
SELECT count(*) FROM (
SELECT distinct Member_List_Link.UserID
FROM Member_List_Link
INNER JOIN MasterMembers
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And
Member_List_Link.GroupID = 5 AND
MasterMembers.ValidUsers = 1 AND
Member_List_Link.Status = 1
) as m
Ray, please try the following
SELECT COUNT(*)
FROM
(
SELECT Member_List_Link.UserID, ROW_NUMBER() OVER (PARTITION BY Member_List_Link.UserID ORDER BY (SELECT NULL)) N
FROM Member_List_Link with (nolock)
INNER JOIN MasterMembers with (nolock)
ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1 And
Member_List_Link.GroupID = 5 AND
MasterMembers.ValidUsers = 1 AND
Member_List_Link.Status = 1
) A
WHERE N = 1
when you use count with distinct column it doesn't count columns having values null.
create table #tmp(name char(4) null)
insert into #tmp values(null)
insert into #tmp values(null)
insert into #tmp values("AAA")
Query:-
1> select count(*) from #tmp
2> go
3
1> select count(distinct name) from #tmp
2> go
1
1> select distinct name from #tmp
2> go
name
NULL
AAA
but it works in derived table
1> select count(*) from ( select distinct name from #tmp) a
2> go
2
Note:- I tested it in Sybase