SQL Server Query Slow with Smaller Database

I have 2 tables:
asset - with id_asset, name, ticker (60k rows)
quote_close - with id_asset, refdate, quote_close (22MM rows)
I want to filter on name and ticker and return:
id_asset
name
ticker
min(refdate) for the id_asset
max(refdate) for the id_asset
quote_close at max(refdate) for the id_asset
I wrote this query:
WITH tableAssetFiltered AS
(
    SELECT id_asset, ticker, name
    FROM asset
    WHERE ticker LIKE '%VALE%' AND name LIKE '%PUT%'
)
SELECT
    ast.id_asset, ast.ticker, ast.name,
    xx.quote_close AS LastQuote, xx.MinDate,
    xx.refdate AS LastDate
FROM tableAssetFiltered ast
LEFT JOIN
(
    SELECT qc.id_asset, qc.refdate, qc.quote_close, tm.MinDate
    FROM quote_close qc
    INNER JOIN
    (
        SELECT t.id_asset, MAX(t.refdate) AS MaxDate, MIN(t.refdate) AS MinDate
        FROM
        (
            SELECT qc.id_asset, qc.refdate, qc.quote_close
            FROM quote_close qc
            WHERE qc.id_asset IN (SELECT id_asset FROM tableAssetFiltered)
        ) t
        GROUP BY t.id_asset
    ) tm ON qc.id_asset = tm.id_asset AND qc.refdate = tm.MaxDate
) xx ON xx.id_asset = ast.id_asset
ORDER BY ast.ticker
The timings with different filters on name and ticker are:
With ticker LIKE '%VALE%' AND name LIKE '%PUT%' it took 00:02:28 and returned 491 rows.
With only name LIKE '%PUT%' it took 00:00:02 and returned 16697 rows.
With only ticker LIKE '%VALE%' it took 00:00:02 and returned 1102 rows.
With no LIKE filters it took 00:00:03 and returned 51847 rows.
What I can't understand is that the query
SELECT id_asset,ticker, name
FROM Viper.dbo.asset
WHERE ticker like ('%VALE%') AND name like ('%PUT%')
took 00:00:00 to run.
Why does a smaller set of rows take more time to run? Is there any way to make it faster?

The slowness could be caused by many things: hardware, network, caching, etc.
To make the query faster:
1. Make sure that there is an index on ticker.
2. Run UPDATE STATISTICS on the table.
3. Try to find a way to remove the '%' at the beginning of the string, since a leading wildcard prevents an index seek.
This is okay: 'VALE%'
This will slow down your query: '%VALE'
A rewrite that flattens the nested derived tables may also help; see the sketch below.
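For reference, here is one such rewrite as a minimal sketch, assuming the table and column names from the question. It aggregates quote_close once per filtered asset and joins back a single time for the closing quote:

WITH tableAssetFiltered AS
(
    SELECT id_asset, ticker, name
    FROM asset
    WHERE ticker LIKE '%VALE%' AND name LIKE '%PUT%'
),
agg AS
(
    -- one pass over quote_close for the filtered assets only
    SELECT qc.id_asset,
           MIN(qc.refdate) AS MinDate,
           MAX(qc.refdate) AS MaxDate
    FROM quote_close qc
    INNER JOIN tableAssetFiltered f ON f.id_asset = qc.id_asset
    GROUP BY qc.id_asset
)
SELECT ast.id_asset, ast.ticker, ast.name,
       qc.quote_close AS LastQuote, agg.MinDate, agg.MaxDate AS LastDate
FROM tableAssetFiltered ast
LEFT JOIN agg ON agg.id_asset = ast.id_asset
LEFT JOIN quote_close qc
       ON qc.id_asset = agg.id_asset
      AND qc.refdate  = agg.MaxDate
ORDER BY ast.ticker;

If the timings are still inconsistent after updating statistics, comparing the actual execution plans of the fast and slow variants will show where the optimizer's row estimates diverge.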

Related

SQL query to return average of results using JOIN and GROUP BY

I have a simple manufacturing job card system that tracks parts and labor for an assigned job.
It consists of a JobHeader table that holds the job card number (JobHeader.JobNo), the ID of the part being manufactured (JobHeader.RegNo), and the quantity to be manufactured (JobHeader.BOMQty).
There is a child table (JobLabour) that tracks all the time that has been worked on the job (JobLabour.WorkedTime).
I'm looking for a query that will return the average time taken to produce a part across the last 5 job cards for that particular part.
The following query
SELECT TOP 5 JobHeader.RegNo, JobHeader.BOMQty, SUM(JobLabour.WorkedTime) AS TotalTime
FROM JobHeader
INNER JOIN JobLabour ON JobHeader.JobNo = JobLabour.JobNo
WHERE JobHeader.RegNo = 'RM-BRU-0134'
GROUP BY JobHeader.BOMQty, JobHeader.JobNo, JobHeader.RegNo
will return one row per job card with its BOMQty and TotalTime (the result grid was posted as an image in the original question).
But what I'm looking for is a query that will return a single row with the average BOMQty and average TotalTime across those job cards.
Is there a way to do this?
Your question explicitly mentions the "last five" but does not specify how that is determined. Presumably, you have some sort of date/time column in the data that defines this.
In SQL Server, you can use apply:
select jh.RegNo, jl.avg_BOMQty, jl.avg_totalTime
from (select distinct RegNo from JobHeader) jh outer apply
     (select avg(t.BOMQty) as avg_BOMQty, avg(t.totalTime) as avg_totalTime
      from (select top (5) h.BOMQty, sum(l.WorkedTime) as totalTime
            from JobHeader h
            inner join JobLabour l on l.JobNo = h.JobNo
            where h.RegNo = jh.RegNo
            group by h.JobNo, h.BOMQty
            order by max(h.<some datetime>) desc -- however you determine the last five
           ) t
     ) jl;
You can add a WHERE clause to the outer query (for example, WHERE jh.RegNo = 'RM-BRU-0134') to filter on one or more particular parts.
If I understand you correctly, this will do the work.
Note that it handles one RegNo ('RM-BRU-0134') at a time:
with topFive as (
SELECT TOP 5 JobHeader.RegNo, JobHeader.BOMQty, sum(JobLabour.WorkedTime) AS TotalTime
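-- note: TOP 5 without an ORDER BY picks an arbitrary five job cards;
-- add ORDER BY <some datetime> DESC to make it the latest five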
FROM JobHeader
INNER JOIN JobLabour ON JobHeader.JobNo = JobLabour.JobNo
WHERE JobHeader.RegNo = 'RM-BRU-0134'
GROUP BY JobHeader.BOMQty, JobHeader.JobNo, JobHeader.RegNo
)
select RegNo, avg(BOMQty) as BOMQty, avg(TotalTime) as TotalTime
from topFive
group by RegNo
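If you need the averages for every part at once rather than one RegNo per run, a sketch using ROW_NUMBER may help. It assumes JobHeader has some date column, the hypothetical CompletedDate below, that defines "last five":

WITH perJob AS (
    SELECT jh.RegNo, jh.BOMQty,
           SUM(jl.WorkedTime) AS TotalTime,
           -- number each part's job cards from newest to oldest
           ROW_NUMBER() OVER (PARTITION BY jh.RegNo
                              ORDER BY MAX(jh.CompletedDate) DESC) AS rn
    FROM JobHeader jh
    INNER JOIN JobLabour jl ON jh.JobNo = jl.JobNo
    GROUP BY jh.RegNo, jh.JobNo, jh.BOMQty
)
SELECT RegNo, AVG(BOMQty) AS BOMQty, AVG(TotalTime) AS TotalTime
FROM perJob
WHERE rn <= 5
GROUP BY RegNo;

The WHERE rn <= 5 keeps the five most recent job cards per part before averaging.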

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
Step 2: retrieve the full quote, with all products listed, for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data science guys have just admitted that the ETL process may have some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts?
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
    select
        * -- simplifying here to show the important parts
        ,row_number() over (
            partition by f.master_quote_number, f.quote_version_number, d.quote_line_number
            order by f.maintenance_months desc) as seqnum
    from w_quote_line_d d
    inner join product_quotes pq
        on pq.master_quote_number = d.master_quote_number
    inner join w_quote_f f
        on f.quote_line_number = d.quote_line_number
        and f.master_quote_number = d.master_quote_number
        and f.quote_version_number = d.quote_version_number
) x
where seqnum = 1
The use of row_number() with that partition by and order by guarantees that exactly ONE row for each combination of quote number / version number / line number gets the value 1, and it will be the one with the highest maintenance_months (if your colleagues are right, there would only be one with a value > 0 anyway).
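Since the question's database is Postgres, DISTINCT ON is a terser alternative to the subquery. A sketch using the same join, again assuming the maintenance column is w_quote_f.maintenance_months:

SELECT DISTINCT ON (f.master_quote_number, f.quote_version_number, d.quote_line_number)
       d.*, f.*
FROM w_quote_line_d d
INNER JOIN product_quotes pq
        ON pq.master_quote_number = d.master_quote_number
INNER JOIN w_quote_f f
        ON f.quote_line_number = d.quote_line_number
       AND f.master_quote_number = d.master_quote_number
       AND f.quote_version_number = d.quote_version_number
-- DISTINCT ON keeps the first row per key in this sort order,
-- so the row with maintenance > 0 wins when one exists
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number,
         f.maintenance_months DESC;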
Can you do something like...
select *
from w_quote_line_d d
inner join
(
    select
        ...
        ,max(maintenance) as maintenance
    from w_quote_line_d
    group by
        ...
) d1
    on d1.id = d.id
    and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the GROUP BY!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?
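Filling that idea in with the duplicate key the question names, a sketch; it assumes the maintenance column is w_quote_f.maintenance_months:

SELECT f.*
FROM w_quote_f f
INNER JOIN
(
    -- one row per duplicate key, carrying the largest maintenance value
    SELECT master_quote_number, quote_version_number, quote_line_number,
           MAX(maintenance_months) AS max_maintenance
    FROM w_quote_f
    GROUP BY master_quote_number, quote_version_number, quote_line_number
) m ON m.master_quote_number = f.master_quote_number
   AND m.quote_version_number = f.quote_version_number
   AND m.quote_line_number = f.quote_line_number
   AND m.max_maintenance = f.maintenance_months;

Note that this keeps both copies when two duplicates share the same maintenance_months value, which the row_number() approach above avoids.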

Using the HAVING Clause with GROUP by to Return Unique Records

Good day all,
I am having difficulty understanding the mechanics of the GROUP BY and HAVING clauses and was hoping for some advice.
I am trying to query two tables - PRODUCTS and ORDER_ITEMS. The PRODUCT_ID column links these two tables.
I wish to view products which have been ordered from a certain supplier (filtered using the SUPPLIER_ID column in ORDER_ITEMS); have been successfully ordered before (ORDER_STATUS 6 in ORDER_ITEMS); and which have not been deleted (RECORD_DELETED column in ORDER_ITEMS). I only use the PRODUCTS table to show the name of the product. Furthermore, I only want distinct products returned, meaning I want to exclude any results which duplicate the PRODUCT_ID column.
This is the query that I am using:
SELECT
PD.PRODUCT_ID,
PD.PRODUCT_NAME,
PD.BARCODE,
PD.SUPPLIER_BARCODE,
COUNT(PD.PRODUCT_ID) AS COUNTED,
ODI.ORDER_ITEM_ID
FROM PRODUCTS PD
INNER JOIN ORDER_ITEMS ODI
ON PD.PRODUCT_ID = ODI.PRODUCT_ID
WHERE ODI.SUPPLIER_ID = 34359738399
AND ORDER_STATUS = 6
AND ODI.RECORD_DELETED = 0
GROUP BY PD.PRODUCT_ID,PD.PRODUCT_NAME,PD.BARCODE,PD.SUPPLIER_BARCODE,ODI.ORDER_ITEM_ID
HAVING COUNT(ODI.PRODUCT_ID) = 1
ORDER BY PRODUCT_ID ASC
Unfortunately this is returning 502 records, with many of them duplicating the PRODUCT_ID. If I remove the ORDER_ITEM_ID column from the query, 175 records are returned. These 175 records are products that meet the criteria given above. The problem is that I also need to pull the ORDER_ITEM_ID from ORDER_ITEMS (along with some other columns).
I vaguely understand that when I include ORDER_ITEM_ID the query groups the data by that column too, and so counts the PRODUCT_ID values within each individual ORDER_ITEM_ID. This results in there always being a count of 1 for each product.
How does one get around this? Also, is there a more suitable way of carrying out this task which would allow me to include one ORDER_ITEM record for every duplicated product, rather than omitting them altogether as I am doing above?
This is some of the data that is returned by the query above:
PRODUCT_ID,PRODUCT_NAME,BARCODE,SUPPLIER_BARCODE,COUNTED,ORDER_ITEM_ID
34359738628,ADCORTYL INTRA-ARTIC/DERMAL 10MG/ML 5ML,5099627022132,5012712000037,1,34359755708
34359739609,ARTELAC 3.2MG/ML EYE DROPS SOLN,5099627456722,5027519008933,1,34359741719
34359739626,ASACOLON 500MG SUPPOSITORIES,5099627516587,5015313012737,1,34359742783
34359739767,ATROVENT 250MCG/1ML UDV NEB SOLN,5099627639637,5012816012561,1,34359738421
34359739770,ATROVENT 500MCG/2ML UDV NEB SOLN,5099627460293,5012816012592,1,34359743524
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359749091
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359749687
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359749715
34359739893,AZOPT 10MG/ML EYE DROPS SUSP,5099627831543,5015664002753,1,34359754053
34359740053,BACTIGRAS MED DRSS 10CMX10CM STERILE GMS,5099627672368,5000223421984,1,34359748101
34359740062,BACTROBAN 2% OINTMENT,5099627053914,5099211003165,1,34359755226
34359740558,BETNOVATE RD CREAM,5099627005692,5099211001642,1,34359752422
34359740558,BETNOVATE RD CREAM,5099627005692,5099211001642,1,34359738487
34359741045,BISODOL ANTACID TABS,5099627057707,5014398001438,1,34359750542
34359741995,BROLENE 0.1% EYE DROPS SOLN,5099627006323,50982790,1,34359746555
34359741995,BROLENE 0.1% EYE DROPS SOLN,5099627006323,50982790,1,34359751650
34359741995,BROLENE 0.1% EYE DROPS SOLN,5099627006323,50982790,1,34359751783
34359742132,BURINEX 1MG TABS,5099627551328,5702191004212,1,34359749705
34359742152,BUSCOPAN 20MG/ML SOLN FOR INJ,5099627006620,5012816018532,1,34359749083
In the example above, several records were returned with duplicate PRODUCT_ID values, e.g. ASACOLON 500MG SUPPOSITORIES.
You need a GROUP_CONCAT/LISTAGG equivalent in SQL Server. You can use XML, STUFF and a correlated subquery as a replacement.
If PRODUCT_ID is UNIQUE you can use:
WITH cte AS
(
SELECT
PD.PRODUCT_ID,
PD.PRODUCT_NAME,
PD.BARCODE,
PD.SUPPLIER_BARCODE,
ODI.ORDER_ITEM_ID
FROM PRODUCTS PD
JOIN ORDER_ITEMS ODI
ON PD.PRODUCT_ID = ODI.PRODUCT_ID
WHERE ODI.SUPPLIER_ID = 34359738399
AND ORDER_STATUS = 6
AND ODI.RECORD_DELETED = 0
)
SELECT PRODUCT_ID,
PRODUCT_NAME,
BARCODE,
SUPPLIER_BARCODE,
[COUNTED] = COUNT(PRODUCT_ID),
[ORDER_ITEM_ID] = STUFF((SELECT CONCAT(',' , ORDER_ITEM_ID)
FROM cte c2
WHERE c2.PRODUCT_ID = c1.PRODUCT_ID
ORDER BY c2.ORDER_ITEM_ID
FOR XML PATH ('')), 1, 1, '')
FROM cte c1
GROUP BY PRODUCT_ID,PRODUCT_NAME,BARCODE,SUPPLIER_BARCODE
HAVING COUNT(PRODUCT_ID) = 1
ORDER BY PRODUCT_ID ASC;
Otherwise, correlate using multiple columns:
[ORDER_ITEM_ID] = STUFF((SELECT CONCAT(',' , ORDER_ITEM_ID)
                         FROM cte c2
                         WHERE c2.PRODUCT_ID = c1.PRODUCT_ID
                           AND c2.PRODUCT_NAME = c1.PRODUCT_NAME
                           AND ...
                         ORDER BY c2.ORDER_ITEM_ID
                         FOR XML PATH ('')), 1, 1, '')
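On SQL Server 2017 or later, STRING_AGG can replace the XML/STUFF workaround entirely. A sketch against the same cte; the HAVING COUNT(...) = 1 filter is dropped, since duplicated products now collapse into one row listing all their ORDER_ITEM_IDs:

SELECT PRODUCT_ID,
       PRODUCT_NAME,
       BARCODE,
       SUPPLIER_BARCODE,
       COUNT(PRODUCT_ID) AS COUNTED,
       -- aggregate the order item ids into one comma-separated list per product
       STRING_AGG(CAST(ORDER_ITEM_ID AS varchar(20)), ',')
           WITHIN GROUP (ORDER BY ORDER_ITEM_ID) AS ORDER_ITEM_ID
FROM cte
GROUP BY PRODUCT_ID, PRODUCT_NAME, BARCODE, SUPPLIER_BARCODE
ORDER BY PRODUCT_ID ASC;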

SQL WHERE IN ... to JOIN table

SELECT
sum(CheckFinal.SUM) AS SUME,
strftime('%Y - %m', CheckDate) AS CheckDate
FROM
CheckFinal
WHERE CheckFinal.NUMER IN (
SELECT
CheckDetail.NUMER
FROM
CheckDetail
WHERE
CheckDetail.NUMER IN (
SELECT
PriceList.UniqID AS PriceListUniqID,
PriceList.Name AS PriceListName,
Category.UniqID
FROM
PriceList Join Category on PriceList.CATEGORY = Category.UniqID
WHERE (Category.UniqID = 2)
)
)
GROUP BY strftime('%Y %m', CheckDate);
I have this query to combine data from 4 tables:
— Category (100 records)
— PriceList (20'000 records)
— CheckDetail (10'000'000 records)
— CheckFinal (2'000'000 records)
In plain words, I'm looking for the PriceList items that are marked as children of Category.UniqID #2; then I want to collect all matching CheckDetail.NUMER values to determine the total sales of those PriceList items. Furthermore, I'm looking for a way to collect all the CheckFinal.NUMERs.
The problem I have is:
It's not possible to nest SELECTs three (3) levels deep (SQLite 3), so I think it's time to use JOINs, but I have no experience with joining.
CheckDetail is a HUGE data set; it takes 2 seconds to find just one PriceList item across 10 million records, and I have 3'000 items in my query WHERE (Category.UniqID = 2).
In my case, I would have to look up 3'000 times through 5'000'000 records, but I have 10 such sets, so the query would take about 10 hours to complete.
Will a JOIN optimize the query time? How do I write such a JOIN query?
Are there any GUI tools to build such a query with a constructor or something like that?
UPD:
http://sqlfiddle.com/#!5/50a93/2 (use SQL.js for inserting several rows of data)
With JOINs, your query would look like the following (note that the innermost subquery in your version selects three columns, which an IN (...) predicate cannot use):
SELECT
    sum(CF.SUM) AS SUME,
    strftime('%Y - %m', CF.CheckDate) AS CheckDate
FROM PriceList
JOIN Category
    ON PriceList.CATEGORY = Category.UniqID
    AND Category.UniqID = 2
JOIN CheckDetail CD
    ON CD.NUMER = PriceList.UniqID
JOIN CheckFinal CF
    ON CF.NUMER = CD.NUMER
GROUP BY strftime('%Y - %m', CF.CheckDate);
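A JOIN alone won't help much if every lookup still scans CheckDetail; indexes on the join columns matter more. A sketch, assuming SQLite and the column names from the question (the index names are hypothetical):

-- let SQLite probe the big tables by index instead of scanning them
CREATE INDEX IF NOT EXISTS idx_pricelist_category ON PriceList(CATEGORY);
CREATE INDEX IF NOT EXISTS idx_checkdetail_numer  ON CheckDetail(NUMER);
CREATE INDEX IF NOT EXISTS idx_checkfinal_numer   ON CheckFinal(NUMER);

With those in place, the query can be driven from the small Category/PriceList side, probing CheckDetail and CheckFinal by index for each matching item.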

SQL to query by date dependencies

I have a table of patients which has the following columns: patient_id, obs_id, obs_date. obs_id is the ID of a clinical observation (such as a weight reading, blood pressure reading, etc.), and obs_date is when that observation was taken. Each patient could have several readings on different dates. Currently I have a query to get all patients that had obs_id = 1 and insert them into a temporary table (it has two columns: patient_id, and a flag which I set to 0 here):
insert into temp_table (select patient_id, 0 from patients_table
where obs_id = 1 group by patient_id having count(*) >= 1)
I also execute an update statement to set the flag to 1 for all patients that also had obs_id = 5:
UPDATE temp_table SET flag = 1 WHERE EXISTS (
    SELECT 1 FROM patients_table p
    WHERE p.obs_id = 5 AND p.patient_id = temp_table.patient_id
    GROUP BY p.patient_id HAVING count(*) >= 1
)
Here's my question: how do I modify both queries (without combining them or removing the GROUP BY) so that I can answer the following: "get all patients who had obs_id = 5 after obs_id = 1"? If I add a min(obs_date) or max(obs_date) to the SELECT of each query and then add "AND v.obs_date > temp_table.obs_date" to the second one, is that correct?
The reason I can't remove the GROUP BY or combine the queries is that they are generated by a code generator (from a web app), and I'd like to make this modification without messing up the code generator or rewriting it.
Many thanks in advance.
The advantage of SQL is that it works with sets. You don't need to create temporary tables or get all procedural.
As you describe the problem (find all patients who have obs_id 5 after obs_id 1), I'd start with something like this
select distinct p1.patient_id
from patients_table p1, patients_table p2
where
p1.obs_id = 1 and
p2.obs_id = 5 and
p2.patient_id = p1.patient_id and
p2.obs_date > p1.obs_date
Of course, that doesn't help you deal with your code generator. Sometimes, tools that make things easier can also get in the way.
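That said, if the generated queries must keep their two-step shape, the asker's own suggestion can work. A sketch, assuming temp_table gains an obs_date column to hold the earliest obs_id = 1 date:

insert into temp_table (select patient_id, 0, min(obs_date)
    from patients_table
    where obs_id = 1
    group by patient_id
    having count(*) >= 1);

update temp_table set flag = 1
where exists (
    select 1
    from patients_table p
    where p.obs_id = 5
      and p.patient_id = temp_table.patient_id
    group by p.patient_id
    -- any obs_id = 5 reading strictly later than the first obs_id = 1 reading
    having max(p.obs_date) > temp_table.obs_date
);

Patients left with flag = 1 then had an obs_id = 5 observation after their first obs_id = 1 observation, which matches the "after" requirement without removing the GROUP BY.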