Copy and Cascade insert using PL/SQL - sql

Given data structure:
I have the following table My_List, where Sup_ID is Primary Key
My_List
+--------+----------+-----------+
| Sup_ID | Sup_Name | Sup_Code |
+--------+----------+-----------+
| 1 | AA | 23 |
| 2 | BB | 87 |
| 3 | CC | 90 |
+--------+----------+-----------+
And the following table My_List_details, where Buy_ID is the primary key and Sup_ID is a foreign key that references My_List.Sup_ID
My_List_details
+--------+--------+------------+------------+------------+
| Buy_ID | Sup_ID | Sup_Detail | Max_Amount | Min_Amount |
+--------+--------+------------+------------+------------+
| 23 | 1 | AAA | 1 | 10 |
| 33 | 2 | BBB | 11 | 20 |
| 43 | 3 | CCC | 21 | 30 |
+--------+--------+------------+------------+------------+
Finally, I have the table My_Sequence as follows:
My_Sequence
+-----+------+
| Seq | Name |
+-----+------+
| 4 | x |
| 5 | y |
| 6 | z |
+-----+------+
---------------------------------------------------
Objectives
Write PL/SQL script to:
Using a cursor, copy the My_List records and re-insert them with new Sup_ID values taken from My_Sequence.Seq.
Copy the My_List_details records and re-insert them with the new Sup_ID foreign key.
------------------------------------------------------------------------------
Expected Outcome
My_List
+--------+----------+----------+
| Sup_ID | Sup_Name | Sup_Code |
+--------+----------+----------+
| 1 | AA | 23 |
| 2 | BB | 87 |
| 3 | CC | 90 |
| 4 | AA | 23 |
| 5 | BB | 87 |
| 6 | CC | 90 |
+--------+----------+----------+
My_List_details
+--------+--------+------------+------------+------------+
| Buy_ID | Sup_ID | Sup_Detail | Max_Amount | Min_Amount |
+--------+--------+------------+------------+------------+
| 23 | 1 | AAA | 1 | 10 |
| 33 | 2 | BBB | 11 | 20 |
| 43 | 3 | CCC | 21 | 30 |
| 53 | 4 | AAA | 1 | 10 |
| 63 | 5 | BBB | 11 | 20 |
| 73 | 6 | CCC | 21 | 30 |
+--------+--------+------------+------------+------------+
What I have started with is the following:
DECLARE
NEW_Sup_ID Sup_ID%type := Seq;
c_Sup_Name Sup_Name%type;
c_Sup_Code Sup_Code%type;
c_Buy_ID Buy_ID%type;
c_Sup_Detail Sup_Detail%type;
c_Max_Amount Max_Amount%type
c_My_Min_Amount Min_Amount%type
CURSOR c_My_List
IS
SELECT * FROM My_List;
CURSOR c_My_List_details
IS
SELECT * FROM My_List_details
BEGIN
FETCH c_My_List INTO NEW_Sup_ID, c_Sup_Name, c_Sup_Code;
INSERT INTO My_List;
FETCH c_My_List_details INTO c_Buy_ID, NEW_Sup_ID, c_Sup_Detail, c_Max_Amount, c_Min_Amount
INSERT INTO My_List_details
END;
/
Aside from the syntax errors, my script does not copy the rows one by one and insert them into both tables. Also, My_Sequence holds more records than My_List, so if My_List has 50 records, the script should use only the first 50 Seq values from My_Sequence.
---------------------------------------------------------------------------------
Question
How can I achieve this result? I searched and found Tom Kyte's package for cascade update, but I am not sure whether I need it. I am a beginner in PL/SQL, and such a comprehensive package is a bit complicated for me to use. Besides, it is meant for cascade update, whereas my case is about re-inserting. I'd appreciate any help.

The following SQL statements will perform the task on the schema defined at this SqlFiddle. Note that I have changed a couple of field and table names because they clash with Oracle terms. SqlFiddle seems to have some problems with my code, but it has been tested on another (amphibious) client which shall remain nameless.
The crucial point (as I said in my comments) is deriving a rule to map each old sequence number to a new one. The view SEQUENCE_MAP performs this task in the queries below.
You may be disappointed by my reply because it depends upon there being the exact same number of sequence records as LIST/LIST_DETAILS, and hence it can only be run once. Your final PL/SQL can perform the necessary checks, I hope.
Hopefully it is a matter of refining the sequence_map logic to get you where you want to be.
Avoid using cursors; ideally, when manipulating relational data you need to think in terms of sets of data rather than rows. This is because with set-based thinking Oracle can do its magic in optimising, parallelising and so on. Oracle is brilliant at scaling up: if a table is split over multiple disks, for example, it may process your request with data from the multiple disks simultaneously. If you force it into row-by-row, procedural logic, you may find that the applications you write do not scale up well.
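-- Pair the n-th smallest existing SUP_ID with the n-th smallest SEQ value.
CREATE OR REPLACE VIEW SEQUENCE_MAP AS (
  SELECT OLD_SEQ, NEW_SEQ FROM
  (
    ( SELECT ROWNUM AS RN, SUP_ID AS OLD_SEQ FROM
        (SELECT SUP_ID FROM LIST ORDER BY SUP_ID) ) O
    JOIN
    ( SELECT ROWNUM AS RN, SUP_ID AS NEW_SEQ FROM
        (SELECT SEQ AS SUP_ID FROM SEQUENCE_TABLE ORDER BY SEQ) ) N
    ON N.RN = O.RN
  )
);
-- Copy the parent rows under their new keys.
INSERT INTO LIST
(
  SELECT
    NEW_SEQ, SUB_NAME, SUB_CODE
  FROM
    SEQUENCE_MAP
    JOIN LIST L ON
      L.SUP_ID = SEQUENCE_MAP.OLD_SEQ
);
-- Copy the child rows and point them at the new parent keys.
-- Note: BUY_ID is copied unchanged here; if BUY_ID is the primary key,
-- it has to be replaced with a newly generated value.
INSERT INTO LIST_DETAILS
(
  SELECT
    BUY_ID, NEW_SEQ, SUB_DETAIL, MAX_FIELD, MIN_FIELD
  FROM
    SEQUENCE_MAP
    JOIN LIST_DETAILS L ON
      L.SUP_ID = SEQUENCE_MAP.OLD_SEQ
);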
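To try the mapping idea without an Oracle instance, here is a minimal Python sketch using the standard sqlite3 module (it needs an SQLite build with window functions, 3.25 or later). It is an illustration under assumptions, not the answer's tested code: the lowercase table names follow the question, the extra sequence row (7, 'w') is invented to show that surplus sequence values are ignored, the helper names copy_with_new_ids and demo are hypothetical, and buy_id is left to SQLite's automatic key assignment because the question does not say how new Buy_ID values are generated. The map is stored in a temporary table before any insert, so it does not change once the copied rows appear in my_list.

```python
import sqlite3


def copy_with_new_ids(conn):
    """Copy every my_list row, and its detail rows, under the next unused seq value."""
    cur = conn.cursor()
    # Pair the n-th smallest sup_id with the n-th smallest seq.
    # The inner join drops sequence values that have no partner.
    cur.execute("""
        CREATE TEMP TABLE sequence_map AS
        SELECT o.sup_id AS old_id, n.seq AS new_id
        FROM (SELECT sup_id, ROW_NUMBER() OVER (ORDER BY sup_id) AS rn
              FROM my_list) AS o
        JOIN (SELECT seq, ROW_NUMBER() OVER (ORDER BY seq) AS rn
              FROM my_sequence) AS n
          ON n.rn = o.rn
    """)
    # Copy the parent rows under their new keys.
    cur.execute("""
        INSERT INTO my_list (sup_id, sup_name, sup_code)
        SELECT m.new_id, l.sup_name, l.sup_code
        FROM sequence_map AS m
        JOIN my_list AS l ON l.sup_id = m.old_id
    """)
    # Copy the child rows. buy_id is omitted on purpose so that SQLite
    # assigns a fresh key (an assumption; the real rule is unknown).
    cur.execute("""
        INSERT INTO my_list_details (sup_id, sup_detail, max_amount, min_amount)
        SELECT m.new_id, d.sup_detail, d.max_amount, d.min_amount
        FROM sequence_map AS m
        JOIN my_list_details AS d ON d.sup_id = m.old_id
    """)
    conn.commit()


def demo():
    """Build the question's sample data in memory and run the copy once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE my_list (
            sup_id INTEGER PRIMARY KEY, sup_name TEXT, sup_code INTEGER);
        CREATE TABLE my_list_details (
            buy_id INTEGER PRIMARY KEY,
            sup_id INTEGER REFERENCES my_list(sup_id),
            sup_detail TEXT, max_amount INTEGER, min_amount INTEGER);
        CREATE TABLE my_sequence (seq INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO my_list VALUES (1, 'AA', 23), (2, 'BB', 87), (3, 'CC', 90);
        INSERT INTO my_list_details VALUES
            (23, 1, 'AAA', 1, 10), (33, 2, 'BBB', 11, 20), (43, 3, 'CCC', 21, 30);
        INSERT INTO my_sequence VALUES (4, 'x'), (5, 'y'), (6, 'z'), (7, 'w');
    """)
    copy_with_new_ids(conn)
    return conn
```

Calling demo() and selecting from my_list shows the three original rows plus their copies under Sup_ID 4, 5 and 6; the sequence value 7 stays unused.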

I would use two nested loops and look up the next sequence value to use.
I imagine the new Buy_ID is assigned via a trigger using a sequence, or something equivalent; otherwise you'll have to generate it in your code.
I have no Oracle database available to test this, so treat the syntax as a sketch.
DECLARE
  v_new_sup_id My_List.Sup_ID%TYPE;
BEGIN
  -- Outer loop: every existing supplier row.
  FOR r_list IN (SELECT Sup_ID, Sup_Name, Sup_Code FROM My_List ORDER BY Sup_ID) LOOP
    -- Take the smallest sequence value that has not been used yet.
    SELECT MIN(Seq) INTO v_new_sup_id FROM My_Sequence;
    INSERT INTO My_List (Sup_ID, Sup_Name, Sup_Code)
    VALUES (v_new_sup_id, r_list.Sup_Name, r_list.Sup_Code);
    -- Inner loop: the detail rows of the supplier being copied.
    FOR r_det IN (SELECT Sup_Detail, Max_Amount, Min_Amount
                  FROM My_List_details
                  WHERE Sup_ID = r_list.Sup_ID) LOOP
      -- Buy_ID is not listed: it is assumed to be filled in by a trigger.
      INSERT INTO My_List_details (Sup_ID, Sup_Detail, Max_Amount, Min_Amount)
      VALUES (v_new_sup_id, r_det.Sup_Detail, r_det.Max_Amount, r_det.Min_Amount);
    END LOOP;
    -- Mark the sequence value as consumed.
    DELETE FROM My_Sequence WHERE Seq = v_new_sup_id;
  END LOOP;
  COMMIT;
END;
/

Related

How to fetch records from DB which fulfill a certain criteria

I have the following problem and wanted to ask if this is the correct way to do it or if there is a better way of doing it:
Assume I have the following table/data in my DB:
|---|----|------|-------------|---------|---------|
|id |city|street|street_number|lastname |firstname|
|---|----|------|-------------|---------|---------|
| 1 | ar | K1 | 13 |Davenport| Hector |
| 2 | ar | L1 | 27 |Cannon | Teresa |
| 3 | ar | A1 | 135 |Brewer | Izaac |
| 4 | dc | A2 | 8 |Fowler | Milan |
| 5 | fr | C1 | 18 |Kaiser | Ibrar |
| 6 | fr | C1 | 28 |Weaver | Kiri |
| 7 | ny | O1 | 37 |Petersen | Derrick |
I now get some requests of the following structure: (city/street/street_number)
E.g.: {(ar,K1,13),(dc,A2,8),(ny,01,37)}
I want to retrieve the last name of the person living at each address. Since the number of requests is quite large, I don't want to process them one by one. My current implementation inserts the request data into a temporary table and joins against it.
Is this the right approach or is there some better way of doing this?
You can construct a query using in with tuples:
select t.*
from t
where (city, street, street_number) in ( ('ar', 'K1', '13'), ('dc', 'A2', '8'), ('ny', '01', '37') );
However, if the data starts in the database, then a temporary table or subquery is better than bringing the results back to the application and constructing such a query.
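For comparison, the temporary-table approach described in the question can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions: the table name people, the temporary table name req, and the helper names build_demo_db and lastnames_for are invented here, and the third request uses the street spelling O1 exactly as it appears in the sample table.

```python
import sqlite3

# Sample rows from the question: id, city, street, street_number, lastname, firstname.
PEOPLE = [
    (1, "ar", "K1", 13, "Davenport", "Hector"),
    (2, "ar", "L1", 27, "Cannon", "Teresa"),
    (3, "ar", "A1", 135, "Brewer", "Izaac"),
    (4, "dc", "A2", 8, "Fowler", "Milan"),
    (5, "fr", "C1", 18, "Kaiser", "Ibrar"),
    (6, "fr", "C1", 28, "Weaver", "Kiri"),
    (7, "ny", "O1", 37, "Petersen", "Derrick"),
]


def build_demo_db():
    """Create an in-memory database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, city TEXT, street TEXT, "
        "street_number INTEGER, lastname TEXT, firstname TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)", PEOPLE)
    return conn


def lastnames_for(conn, requests):
    """Resolve many (city, street, street_number) requests with one join."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS req "
        "(city TEXT, street TEXT, street_number INTEGER)"
    )
    cur.execute("DELETE FROM req")  # start from an empty request table
    cur.executemany("INSERT INTO req VALUES (?, ?, ?)", requests)
    cur.execute("""
        SELECT p.lastname
        FROM people AS p
        JOIN req AS r
          ON r.city = p.city
         AND r.street = p.street
         AND r.street_number = p.street_number
        ORDER BY p.id
    """)
    return [row[0] for row in cur.fetchall()]
```

With requests = [("ar", "K1", 13), ("dc", "A2", 8), ("ny", "O1", 37)], lastnames_for(build_demo_db(), requests) returns the three matching last names in a single round trip, whatever the number of requests.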
I think you can use a hierarchical query and string functions as follows:
WITH YOUR_INPUT_DATA AS
(SELECT '(ar,K1,13),(dc,A2,8),(ny,01,37)' AS INPUT_STR FROM DUAL),
--
CTE AS
( SELECT REGEXP_SUBSTR(STR,'[^(),]+',1,1) AS STR1, -- 1st token between the parentheses
REGEXP_SUBSTR(STR,'[^(),]+',1,2) AS STR2, -- 2nd token
REGEXP_SUBSTR(STR,'[^(),]+',1,3) AS STR3 -- 3rd token
FROM (SELECT SUBSTR(INPUT_STR,
INSTR(INPUT_STR,'(',1,LEVEL),
INSTR(INPUT_STR,')',1,LEVEL) - INSTR(INPUT_STR,'(',1,LEVEL) + 1) STR
FROM YOUR_INPUT_DATA
CONNECT BY LEVEL <= REGEXP_COUNT(INPUT_STR,'\),\(') + 1))
--
SELECT * FROM YOUR_TABLE WHERE (city,street,street_number)
IN (SELECT STR1,STR2,STR3 FROM CTE);

Mutating error on an AFTER insert trigger

CREATE OR REPLACE TRIGGER TRG_INVOICE
AFTER INSERT
ON INVOICE
FOR EACH ROW
DECLARE
V_SERVICE_COST FLOAT;
V_SPARE_PART_COST FLOAT;
V_TOTAL_COST FLOAT;
V_INVOICE_DATE DATE;
V_DUEDATE DATE;
V_REQ_ID INVOICE.SERVICE_REQ_ID%TYPE;
V_INV_ID INVOICE.INVOICE_ID%TYPE;
BEGIN
V_REQ_ID := :NEW.SERVICE_REQ_ID;
V_INV_ID := :NEW.INVOICE_ID;
SELECT SUM(S.SERVICE_COST) INTO V_SERVICE_COST
FROM INVOICE I, SERVICE_REQUEST SR, SERVICE S, SERVICE_REQUEST_TYPE SRT
WHERE I.SERVICE_REQ_ID = SR.SERVICE_REQ_ID
AND SR.SERVICE_REQ_ID = SRT.SERVICE_REQ_ID
AND SRT.SERVICE_ID = S.SERVICE_ID
AND I.SERVICE_REQ_ID = V_REQ_ID;
SELECT SUM(SP.PRICE) INTO V_SPARE_PART_COST
FROM INVOICE I, SERVICE_REQUEST SR, SERVICE S, SERVICE_REQUEST_TYPE SRT,
SPARE_PART_SERVICE SRP,
SPARE_PART SP
WHERE I.SERVICE_REQ_ID = SR.SERVICE_REQ_ID
AND SR.SERVICE_REQ_ID = SRT.SERVICE_REQ_ID
AND SRT.SERVICE_ID = S.SERVICE_ID
AND S.SERVICE_ID = SRP.SERVICE_ID
AND SRP.SPARE_PART_ID = SP.SPARE_PART_ID
AND I.SERVICE_REQ_ID = V_REQ_ID;
V_TOTAL_COST := V_SERVICE_COST + V_SPARE_PART_COST;
SELECT SYSDATE INTO V_INVOICE_DATE FROM DUAL;
SELECT ADD_MONTHS(SYSDATE, 1) INTO V_DUEDATE FROM DUAL;
UPDATE INVOICE
SET COST_SERVICE_REQ = V_SERVICE_COST, COST_SPARE_PART =
V_SPARE_PART_COST,
TOTAL_BALANCE = V_TOTAL_COST, PAYMENT_DUEDATE = V_DUEDATE, INVOICE_DATE =
V_INVOICE_DATE
WHERE INVOICE_ID = V_INV_ID;
END;
I'm trying to calculate some columns after the user inserts a row.
Using the service_request_id I want to calculate the service/parts/total cost. Also, I would like to generate the creation and due dates. But, I keep getting
INVOICE is mutating, trigger/function may not see it
Not sure how the table is mutating after the insert statement.
Imagine a simple table:
create table x(
x int,
my_sum int
);
and an AFTER INSERT FOR EACH ROW trigger, similar to yours, which calculates a sum of all values in the table and updates my_sum column.
Now imagine this insert statement:
insert into x( x )
select 1 as x from dual
connect by level <= 1000;
This single statement inserts 1000 records, each with the value 1; see this demo: http://sqlfiddle.com/#!4/0f211/7
Since in SQL each individual statement must be ATOMIC (more on this here: Statement-Level Read Consistency), Oracle is free to perform this query in any way as long as the final result is correct (consistent). It can save the records in the order of execution, or maybe in reverse order, or it can divide the batch into 10 threads and process them in parallel.
Since the trigger fires individually after each row is inserted and cannot know the "final" result in advance, all of the results below are possible, depending on the internal method Oracle chooses to execute this query. As you can see, these results do not meet the definition of consistency, and Oracle prevents this by raising the mutating table error.
In other words, your assumptions are wrong and your design is flawed; you need to change it.
| X | MY_SUM |
|---|--------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
...
...
or maybe :
| X | MY_SUM |
|---|--------|
| 1 | 1000 |
| 1 | 1000 |
| 1 | 1000 |
| 1 | 1000 |
| 1 | 1000 |
| 1 | 1000 |
| 1 | 1000 |
...
or maybe:
| X | MY_SUM |
|---|--------|
| 1 | 4 |
| 1 | 8 |
| 1 | 12 |
| 1 | 16 |
| 1 | 20 |
| 1 | 24 |
| 1 | 28 |
...
...
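The dependence on execution order can be imitated outside the database. The following Python sketch is a toy model only and says nothing about how Oracle actually schedules the insert: it assumes (hypothetically) that rows become visible in batches of batch_size, and that a row-level trigger then fires once per row and sums whatever is visible at that moment. The function name simulate and both parameters are invented for this illustration.

```python
def simulate(n_rows, batch_size):
    """Return (x, my_sum) pairs as a row-level trigger would compute them
    if rows became visible batch_size at a time."""
    visible = []   # rows the trigger can see so far
    result = []
    for start in range(0, n_rows, batch_size):
        batch = [1] * min(batch_size, n_rows - start)
        visible.extend(batch)          # the whole batch becomes visible
        for x in batch:                # the trigger fires once per row
            result.append((x, sum(visible)))
    return result
```

With batch_size=1 the sums count up 1, 2, 3, ...; with batch_size equal to n_rows every row gets the final total; anything in between gives yet another answer. The same statement thus yields different data depending on an execution detail the user does not control.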

Trying to optimize a *random* query in Oracle SQL

I need to optimize a procedure in Oracle SQL, mainly using indexes. This is the statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
FOR I IN (SELECT * FROM (SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE)WHERE ROWNUM<=cuantos)
LOOP
DELETE FROM OBSERVATIONS WHERE nplate=i.nplate AND odatetime=i.odatetime;
END LOOP;
end del_obs;
My plan was to create an index related to rownum, since that is what appears to be used to do the deletes, but I don't know whether it would be worth it. The problem with this procedure is that its randomness causes a lot of consistent gets. Can anyone help me with this? Thanks :)
Note: I cannot change the code, only make improvements afterwards
Use the ROWID pseudo-column to filter the rows:
CREATE OR REPLACE PROCEDURE DEL_OBS(
cuantos number
)
IS
BEGIN
DELETE FROM OBSERVATIONS
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM observations
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= cuantos
);
END del_obs;
If you have an index on the table then it can use an index fast full scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 50000;
Query 1: No Index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 123 | 00:00:02 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 123 | 00:00:02 |
| 3 | VIEW | VW_NSO_1 | 10000 | 120000 | 121 | 00:00:02 |
| 4 | SORT UNIQUE | | 1 | 120000 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 19974 | 239688 | 121 | 00:00:02 |
| * 7 | SORT ORDER BY STOPKEY | | 19974 | 239688 | 121 | 00:00:02 |
| 8 | TABLE ACCESS FULL | TABLE_NAME | 19974 | 239688 | 25 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 12 | 1 | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
Query 2 Add an index:
ALTER TABLE table_name ADD CONSTRAINT tn__id__pk PRIMARY KEY ( id )
Query 3 With the index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 37 | 13 | 00:00:01 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 37 | 13 | 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 9968 | 119616 | 11 | 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 119616 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 9968 | 119616 | 11 | 00:00:01 |
| * 7 | SORT ORDER BY STOPKEY | | 9968 | 119616 | 11 | 00:00:01 |
| 8 | INDEX FAST FULL SCAN | TN__ID__PK | 9968 | 119616 | 9 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 25 | 1 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
If you cannot do it in a single SQL statement using ROWID, then you can rewrite your existing procedure to use exactly the same queries but with the FORALL statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number)
IS
  -- Collection type, plus a variable of that type to hold the sampled rows.
  TYPE obs_tab_t IS TABLE OF observations%ROWTYPE;
  obs_tab obs_tab_t;
BEGIN
  SELECT *
  BULK COLLECT INTO obs_tab
  FROM (
    SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE
  )
  WHERE ROWNUM <= cuantos;
  -- FORALL hands all the DELETEs to the SQL engine in one batch.
  FORALL i IN 1 .. obs_tab.COUNT
    DELETE FROM OBSERVATIONS
    WHERE nplate = obs_tab(i).nplate
    AND odatetime = obs_tab(i).odatetime;
END del_obs;
What you definitely need is an index on OBSERVATIONS to allow the DELETE to use index access.
CREATE INDEX cuantos ON OBSERVATIONS(nplate, odatetime);
Executing the procedure will lead to one FULL TABLE SCAN of the OBSERVATIONS table and to one INDEX ACCESS for each deleted record.
For a limited number of deleted records it will behave similarly to the set-based DELETE proposed in the other answer; for a larger number of deleted records the elapsed time will scale linearly with the number of deletes.
For a non-trivial number of deleted records you must assume that the index is not completely in the buffer pool and that a lot of disk access will be required. So you'll end up with approximately 100 deleted rows per second.
In other words, deleting 100K rows will take about a quarter of an hour.
Deleting 1M rows will take about 2 3/4 hours.
As you can see, when deleting at this scale the first part of the task, the FULL SCAN of your table, is negligible; it takes only a few minutes. The only way to get an acceptable response time in this case is to switch the logic to a single DELETE statement, as proposed in the other answers.
This behavior is summed up by the rule "Row by Row is Slow by Slow" (i.e. processing in a loop works fine, but only with a limited number of records).
You can do this using a single delete statement:
delete from observations o
where (o.nplate, o.odatetime) in (select nplate, odatetime
from (select o2.nplate, o2.odatetime
from observations o2
order by DBMS_RANDOM.VALUE
) o2
where rownum <= v_cuantos
);
This is often faster than executing multiple queries for each row being deleted.
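The single-statement idea can be tried locally with Python's standard sqlite3 module. Note the assumptions: this is SQLite rather than Oracle, so random() and LIMIT stand in for DBMS_RANDOM.VALUE and ROWNUM, the column types are guessed, and the helper names make_observations and delete_random are invented for the sketch.

```python
import sqlite3


def make_observations(n_rows):
    """Create an in-memory observations table with n_rows dummy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (nplate TEXT, odatetime TEXT, "
        "PRIMARY KEY (nplate, odatetime))"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?)",
        [("P%05d" % i, "2020-01-01T00:00:00") for i in range(n_rows)],
    )
    conn.commit()
    return conn


def delete_random(conn, how_many):
    """Delete how_many randomly chosen rows with one statement; return the count."""
    cur = conn.execute(
        "DELETE FROM observations WHERE rowid IN "
        "(SELECT rowid FROM observations ORDER BY random() LIMIT ?)",
        (how_many,),
    )
    conn.commit()
    return cur.rowcount
```

The random sample is chosen once inside the subquery, and the delete then addresses the rows directly by their row identifier instead of issuing one lookup per loop iteration.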
Try this. I tested it on MSSQL and hope it will also work on Oracle. Please report back on how it goes.
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
DELETE OBSERVATIONS FROM OBSERVATIONS
join (select * from OBSERVATIONS ORDER BY VALUE ) as i on
nplate=i.nplate AND
odatetime=i.odatetime AND
i.ROWNUM<=cuantos;
End DEL_OBS;
Since you say that nplate and odatetime are the primary key of observations, then I am guessing the problem is here:
SELECT * FROM (
SELECT *
FROM observations
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM<=cuantos;
There is no way to prevent that from performing a full scan of observations, plus a lot of sorting if that's a big table.
You need to change the code that runs. By far, the easiest way to change the code is to change the source code and recompile it.
However, there are ways to change the code that executes without changing the source code. Here are two:
(1) Use DBMS_FGAC to add a policy that detects whether you are in this procedure and, if so, add a predicate to the observations table like this:
AND rowid IN
( SELECT obs_sample.rowid
FROM observations sample (0.05) obs_sample)
(2) Use DBMS_ADVANCED_REWRITE to rewrite your query changing:
FROM observations
.. to ..
FROM observations SAMPLE (0.05)
Using the text of your query in the re-write policy should prevent it from affecting other queries against the observations table.
Neither of these is easy (at all), but they can be worth a try if you are really stuck.

Pick a record based on a given value in postgres

I have a table in postgres like below,
alg_campaignid | alg_score | cp | sum
----------------+-----------+---------+----------
9829 | 30.44056 | 12.4000 | 12.4000
9880 | 29.59280 | 12.0600 | 24.4600
9882 | 29.59280 | 12.0600 | 36.5200
9827 | 29.27504 | 11.9300 | 48.4500
9821 | 29.14840 | 11.8800 | 60.3300
9881 | 29.14840 | 11.8800 | 72.2100
9883 | 29.14840 | 11.8800 | 84.0900
10026 | 28.79280 | 11.7300 | 95.8200
10680 | 10.31504 | 4.1800 | 100.0000
From this table I have to select a record based on a randomly generated number from 0 to 100, i.e. the first record should be returned if the random number is between 0 and 12.4000, the second if it is between 12.4000 and 24.4600, and so on, down to the last if it is between 95.8200 and 100.0000.
For example:
if the random number picked is 8 then the first record should be returned
or
if the random number picked is 48 then the fourth record should be returned
Is it possible to do this in Postgres? If so, kindly recommend a solution.
Yes, you can do this in Postgres. If you want to generate the number in the database:
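with r as (
    select random() * 100 as r
)
select t.*
from your_table t cross join r   -- your_table is a placeholder for the real table name
where t.sum >= r.r               -- first row whose running sum reaches the random number
order by t.sum
limit 1;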
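The selection rule from the question (return the first row whose running sum reaches the random number) can also be checked outside the database. Below is a minimal Python sketch using the standard bisect module; the list CAMPAIGNS copies the alg_campaignid and sum columns from the question, while the function name pick and its optional r parameter are assumptions made for this illustration.

```python
import random
from bisect import bisect_left

# (alg_campaignid, running sum) pairs from the question, sorted by running sum.
CAMPAIGNS = [
    (9829, 12.40), (9880, 24.46), (9882, 36.52), (9827, 48.45), (9821, 60.33),
    (9881, 72.21), (9883, 84.09), (10026, 95.82), (10680, 100.00),
]


def pick(rows, r=None):
    """Return the id of the first row whose running sum is >= r."""
    if r is None:
        r = random.uniform(0, 100)      # draw the random number if none is given
    sums = [s for _, s in rows]
    i = bisect_left(sums, r)            # index of the first sum >= r
    i = min(i, len(rows) - 1)           # guard against r above the last sum
    return rows[i][0]
```

Here pick(CAMPAIGNS, 8) returns 9829 (the first record) and pick(CAMPAIGNS, 48) returns 9827 (the fourth), matching the two examples given in the question.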

Eliminate full table scan due to BETWEEN (and GROUP BY)

Description
According to the explain command, there is a range that is causing a query to perform a full table scan (160k rows). How do I keep the range condition and reduce the scanning? I expect the culprit to be:
Y.YEAR BETWEEN 1900 AND 2009 AND
Code
Here is the code that has the range condition (the STATION_DISTRICT is likely superfluous).
SELECT
COUNT(1) as MEASUREMENTS,
AVG(D.AMOUNT) as AMOUNT,
Y.YEAR as YEAR,
MAKEDATE(Y.YEAR,1) as AMOUNT_DATE
FROM
CITY C,
STATION S,
STATION_DISTRICT SD,
YEAR_REF Y FORCE INDEX(YEAR_IDX),
MONTH_REF M,
DAILY D
WHERE
-- For a specific city ...
--
C.ID = 10663 AND
-- Find all the stations within a specific unit radius ...
--
6371.009 *
SQRT(
POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
(COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND
-- Get the station district identification for the matching station.
--
S.STATION_DISTRICT_ID = SD.ID AND
-- Gather all known years for that station ...
--
Y.STATION_DISTRICT_ID = SD.ID AND
-- The data before 1900 is shaky; insufficient after 2009.
--
Y.YEAR BETWEEN 1900 AND 2009 AND
-- Filtered by all known months ...
--
M.YEAR_REF_ID = Y.ID AND
-- Whittled down by category ...
--
M.CATEGORY_ID = '003' AND
-- Into the valid daily climate data.
--
M.ID = D.MONTH_REF_ID AND
D.DAILY_FLAG_ID <> 'M'
GROUP BY
Y.YEAR
Update
The SQL is performing a full table scan, which results in MySQL performing a "copy to tmp table", as shown here:
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | Y | range | YEAR_IDX | YEAR_IDX | 4 | NULL | 160422 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.Y.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | S | eq_ref | PRIMARY | PRIMARY | 4 | climate.SD.ID | 1 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
Answer
After using the STRAIGHT_JOIN:
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | S | ALL | PRIMARY | NULL | NULL | NULL | 7795 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.S.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | Y | ref | PRIMARY,STAT_YEAR_IDX | STAT_YEAR_IDX | 4 | climate.S.STATION_DISTRICT_ID | 1650 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
Related
http://dev.mysql.com/doc/refman/5.0/en/how-to-avoid-table-scan.html
http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
Optimize SQL that uses between clause
Thank you!
One request... It looks like you KNOW your data. Add the keyword "STRAIGHT_JOIN" and see the results...
SELECT STRAIGHT_JOIN ... the rest of your query...
STRAIGHT_JOIN tells MySQL to join the tables in the order I have listed them. Your CITY table is first in the FROM list, indicating that you expect it to be the primary table... Additionally, your WHERE clause on CITY is the immediate filter. With that being said, it will probably fly through the rest of the query...
Hope it helps... It has worked for me with government data of millions of records, queried and joined to 10+ lookup tables, where MySQL was trying to think for me.
To do efficient BETWEEN queries you will want a B-tree index on your YEAR column. For example:
CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);
B-tree indexes allow efficient range queries. If this is in fact the root problem, an index like this should get rid of the full table scan so that only the part of the table in the range is scanned. You can read more about B-trees on Wikipedia.
However, as with any optimisation advice, you should measure to make sure that you don't do more harm than good.
Can you change from searching within a radius to search in a bounding box?
You know the city so you can calculate a bounding box in your application.
Perhaps this
S.LATITUDE_DECIMAL >= latitude_lower and
S.LATITUDE_DECIMAL <= latitude_upper and
S.LONGITUDE_DECIMAL >= longitude_lower and
S.LONGITUDE_DECIMAL <= longitude_upper
could be a little faster?
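One way the application could compute such a box is sketched below in Python. The Earth radius is the constant already used in the query, and the returned keys reuse the names from the snippet above; the function name bounding_box is an assumption, and the approximation ignores the poles and the 180-degree meridian. The box is slightly larger than the circle, so the exact distance test can still be applied to the stations that pass the box filter.

```python
import math

EARTH_RADIUS_KM = 6371.009  # same constant as in the query


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return the latitude/longitude limits of a box that contains the circle."""
    # Angular radius of the circle, converted to degrees of latitude.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks by cos(latitude), so the box must be wider.
    dlon = dlat / math.cos(math.radians(lat_deg))
    return {
        "latitude_lower": lat_deg - dlat,
        "latitude_upper": lat_deg + dlat,
        "longitude_lower": lon_deg - dlon,
        "longitude_upper": lon_deg + dlon,
    }
```

The four values can be bound as query parameters, which lets indexes on the latitude and longitude columns narrow the candidate stations before the distance formula runs.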