Oracle SQL: Indexes not being used

I have the following indexes
create index i_payment_amount ON payment(amount);
create index i_customer_createdate ON customer(createdate);
and the following query
select c.createdate, c.firstname, c.lastname, round(sum(p.amount)) as spentmoney
from customer c
join rental r
on c.customerid = r.customerid
join payment p
on p.rentalid = r.rentalid
where c.createdate > date '2019-06-01' + 30
or (select round(sum(pp.amount)) from payment pp
join rental rr
on rr.rentalid = pp.rentalid
where rr.rentalid = r.rentalid) < 50
group by c.firstname, c.lastname,c.createdate
order by c.firstname, c.lastname;
The query returns the customers who registered in one month before 2019-06-01, as well as the customers who did not spend more than $50. I want to optimize it with the help of indexes.
I created B-tree indexes and, as a first step, I just wanted to make the indexes appear in the query plan, but they didn't.
I also couldn't create a function-based index on payment, because function-based indexes do not support group functions such as sum.
Are there any suggestions for creating an index that the optimizer will actually use and that will speed up the query?

I don't see any obvious candidates for indexing in the query.
Say there are 2 million rows in customer, and 1 million of them have createdate > date '2019-06-01' + 30 (which can be simplified to createdate > date '2019-07-01'). Using an index to find those 1 million rows and then visiting the customer table a million times is going to be a lot more I/O than just full-scanning the table once. Range-partitioning customer might help, if you are licensed for it and depending on the data distribution.
Possibly an index on payment(rentalid, amount) could be treated by the optimiser as a skinny table which would be more efficient to full-scan than the payment table itself, since those are the only two columns you need from the table, making the join to payment more efficient. However, that is only one of three tables involved in the query so I wouldn't expect a massive improvement.
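A minimal sketch of the "skinny table" index described above, in Oracle syntax. The index name i_payment_rental_amount and the test query are made up for illustration; whether the optimiser actually prefers the index over the table depends on statistics and data volumes, so check the plan rather than assuming it.

```sql
-- Hypothetical index name; the columns are the only two the query reads from payment.
create index i_payment_rental_amount on payment(rentalid, amount);

-- Ask the optimiser for its plan on a cut-down version of the join.
explain plan for
select r.customerid, sum(p.amount)
from rental r
join payment p
on p.rentalid = r.rentalid
group by r.customerid;

-- Show the plan; look for the index name in place of a full scan of PAYMENT.
select * from table(dbms_xplan.display);
```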
I notice in your question you mention that you want the customers who registered in one month before 2019-06-01, but I don't see that condition in your query. If registration date is c.createdate then perhaps
where c.createdate > date '2019-06-01' + 30
should be something more like
where c.createdate between date '2019-06-01' and date '2019-07-01'
in which case an index on c.createdate starts to look more useful, but that's a different query.

Would a HAVING clause do any good? You already have all those values, so you could avoid the subquery entirely (note that HAVING tests the total per group, whereas the original subquery sums the payments per rental). Something like this:
SELECT c.createdate,
c.firstname,
c.lastname,
ROUND (SUM (p.amount)) AS spentmoney
FROM customer c
JOIN rental r ON c.customerid = r.customerid
JOIN payment p ON p.rentalid = r.rentalid
GROUP BY c.firstname, c.lastname, c.createdate
HAVING SUM (p.amount) < 50
OR c.createdate > DATE '2019-06-01' + 30
ORDER BY c.firstname, c.lastname;


Where clause within Case Statement

I need to drill into the detail of an existing report that shows the total profitability per customer, per month, for a telecoms company.
The company sells calls and service charges, and gives discounts.
This is the query I'm using, filtered to one customer in particular:
SELECT
custumer_name,
ISNULL(SUM(CASE CONVERT(VARCHAR(10), month_end, 103)
WHEN '31/10/2016'
THEN totcall + isnull(totrec, 0) - discount
ELSE 0
END), 0) AS '31/10/2016',
SUM(totcall) AS 'Total Calls',
SUM(totrec) AS 'Total Rec',
SUM(discount) AS 'Discounts'
FROM
total_sales
INNER JOIN
customer_details ON billingaddress = customer_details.siteid
INNER JOIN
sales_month d ON total_sales.periodid = d.monthid
INNER JOIN
customer b ON customer_details.id = b.id AND b.is_customer = 1
WHERE
b.custumer_name = '2
GROUP BY
b.custumer_name
ORDER BY
b.custumer_name ASC
This brings back the correct total. However, when I need to show the drilldown of the total into calls, rec and discounts, it shows me the sum of every month's data stored in that table.
I'm not sure how I can get the last 3 columns to itemise the total without specifying the actual month (the report has columns for the last 12 months of data).
Please help?
Thank you!
PS. The DB is a SQL Server 2008

Get the price of purchase at the moment of the sale

I'm trying to get the purchase price at the moment of the sale. There are different purchase prices for the same product in my table.
A purchase price is valid between two dates, one for the start of validity and one for the end of validity: DATE_DEB_VALID and DATE_FIN_VALID.
If I want to know how much I earned at the time of sale (FLIGHT_DATE), I have to know the purchase price for the same period.
My flight date must be between "DATE_DEB_VALID" and "DATE_FIN_VALID".
The two tables :
TB_DW_VAB_SALES
- ID_TEC_SALES
- TRANSACTION_NUMBER
- CARRIER_CODE
- MASTER_FLIGHT_NUMBER
- FLIGHT_NUMBER
- FLIGHT_DATE
- FLIGHT_CITY_DEP
- FLIGHT_CITY_ARR
- PRODUCT_CODE
- QUANTITY
- UNIT_SALES_PRICE
- PROMOTION_CODE
- CREW_PRICE
- COMPLEMENTARY
- SALES_TYPE
- DATE_CHGT
TB_DW_VAB_PURCHASE
- PRODUCT_CODE
- PRODUCT_CODE_GUEST
- LIB_PRODUCT
- PRICE
- DATE_DEB_VALID
- DATE_FIN_VALID
- DATE_CHGT
My query:
SELECT (TB_DW_VAB_PURCHASE.PRODUCT_CODE),PRICE
FROM TB_DW_VAB_PURCHASE,TB_DW_VAB_SALES
WHERE FLIGHT_DATE BETWEEN DATE_DEB_VALID AND DATE_FIN_VALID AND to_char(FLIGHT_DATE,'YYYY')= '2016' OR to_char(FLIGHT_DATE,'YYYY')= '2015'
GROUP BY TB_DW_VAB_PURCHASE.PRODUCT_CODE, PRICE
Here is the whole query (for information):
SELECT to_char(DATE_VENTE,'MM/YYYY'),sum(MARGE_TOTALE) FROM (
SELECT
CTE1.CA AS CHIFFRE_AFFAIRE_TOTAL,
CTE2.PRICE AS COUT_UNITAIRE,
CTE1.FLIGHT_DATE as DATE_VENTE,
CTE1.QTE*CTE2.PRICE AS COUT_ACHAT,
(CTE1.CA-CTE1.QTE*CTE2.PRICE) AS MARGE,
sum((CTE1.CA-CTE1.QTE*CTE2.PRICE)) as MARGE_TOTALE
FROM (
SELECT PRODUCT_CODE,
sum(QUANTITY*UNIT_SALES_PRICE) AS CA,
FLIGHT_DATE,
sum(QUANTITY) as QTE
FROM TB_DW_VAB_SALES
where SALES_TYPE = 'SALES' and to_char(FLIGHT_DATE,'YYYY')= '2015' OR to_char(FLIGHT_DATE,'YYYY')= '2016'
group by to_char(FLIGHT_DATE,'MM'),FLIGHT_DATE,PRODUCT_CODE
ORDER BY to_char(FLIGHT_DATE,'MM') ASC
)CTE1
inner join
(
SELECT (TB_DW_VAB_PURCHASE.PRODUCT_CODE),PRICE
FROM TB_DW_VAB_PURCHASE,TB_DW_VAB_SALES
WHERE FLIGHT_DATE BETWEEN DATE_DEB_VALID AND DATE_FIN_VALID AND to_char(FLIGHT_DATE,'YYYY')= '2016' OR to_char(FLIGHT_DATE,'YYYY')= '2015'
GROUP BY TB_DW_VAB_PURCHASE.PRODUCT_CODE, PRICE
) CTE2
on CTE1.PRODUCT_CODE=CTE2.PRODUCT_CODE
group by to_char(FLIGHT_DATE,'MM'), FLIGHT_DATE, 'MM', CTE1.FLIGHT_DATE, (CTE1.CA-CTE1.QTE*CTE2.PRICE),
CTE1.CA, CTE1.QTE, CTE2.PRICE, CTE1.QTE*CTE2.PRICE
) group by to_char(DATE_VENTE,'MM/YYYY') ORDER BY to_char(DATE_VENTE,'MM/YYYY') ASC;
Thank you !
As Nemeros already pointed out, you have a cross-join in your CTE2 subquery, as you aren't linking sales and purchases together - other than by the flight/valid dates, but across all products, which can't be what you intended.
It looks like you're calculating things in your inline views (naming them CTE1/2 is slightly confusing as that usually refers to common table expressions or subquery factoring) which you then use for further calculations, but I don't think the intermediate steps or values are needed.
It looks like your query can be simplified to something like:
select to_char(trunc(s.flight_date, 'MM'),'MM/YYYY') as mois_du_sales,
sum(s.quantity*(s.unit_sales_price - p.price)) as marge_totale
from tb_dw_vab_sales s
join tb_dw_vab_purchase p
on p.product_code = s.product_code
and s.flight_date between p.date_deb_valid and p.date_fin_valid
where s.sales_type = 'SALES'
and s.flight_date >= date '2015-01-01'
and s.flight_date < date '2017-01-01'
group by trunc(s.flight_date, 'MM')
order by trunc(s.flight_date, 'MM') asc;
Of course without sample data or expected results I can't verify I've interpreted it correctly. It may also not solve all of your performance problems - you still need to look at the tables and indexes and the execution plan to see what it's actually doing, and you might want to consider partitioning if you have a large volume of data and a convenient partition key (like sales year).
Using a filter like to_char(FLIGHT_DATE,'YYYY')= '2015' forces every value in that column (or at least those that match sales_type, if that is indexed and selective enough) to be converted to a string and then compared to the fixed value, twice since you're checking both years separately, and that stops any index on flight_date being used. Using a date range allows the index to be used, though unless you have data spanning many years it may still not be selective enough, and the optimiser may still choose full table scans. Even then, each table is scanned only once, not twice as your original query was potentially doing.
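To make the contrast concrete, here is a small Oracle sketch of the two filter styles. The index name i_sales_flight_date is an assumption for illustration, and the counts are only stand-ins for the real query; compare the execution plans of the two statements on your own data.

```sql
-- Hypothetical index name; flight_date is the column filtered in the rewritten query.
create index i_sales_flight_date on tb_dw_vab_sales(flight_date);

-- Not index-friendly: the column is wrapped in to_char, so a plain index
-- on flight_date cannot be used for a range scan.
select count(*)
from tb_dw_vab_sales
where to_char(flight_date, 'YYYY') in ('2015', '2016');

-- Index-friendly: the bare column is compared against a half-open date range.
select count(*)
from tb_dw_vab_sales
where flight_date >= date '2015-01-01'
and flight_date < date '2017-01-01';
```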
I will just fix your initial query, since you forgot your basic Boolean algebra (AND has higher precedence than OR):
SELECT (TB_DW_VAB_PURCHASE.PRODUCT_CODE),PRICE
FROM TB_DW_VAB_PURCHASE,TB_DW_VAB_SALES
WHERE FLIGHT_DATE BETWEEN DATE_DEB_VALID AND DATE_FIN_VALID AND
(to_char(FLIGHT_DATE,'YYYY')= '2016' OR to_char(FLIGHT_DATE,'YYYY')= '2015')
GROUP BY TB_DW_VAB_PURCHASE.PRODUCT_CODE, PRICE
Now, if you had used explicit join syntax, you would also have seen that you had forgotten a join condition on product_code, so finally:
SELECT PUR.PRODUCT_CODE, PRICE
FROM TB_DW_VAB_PURCHASE PUR
inner join TB_DW_VAB_SALES SAL ON PUR.PRODUCT_CODE = SAL.PRODUCT_CODE
AND FLIGHT_DATE BETWEEN DATE_DEB_VALID AND DATE_FIN_VALID
WHERE to_char(FLIGHT_DATE,'YYYY')= '2016' OR to_char(FLIGHT_DATE,'YYYY')= '2015'
GROUP BY PUR.PRODUCT_CODE, PRICE

SQL query feels inefficient - how can I improve it?

I'm using the SQL code below in SQLite to get a list of trades from a table containing trades and then combining it with total portfolio value on the day from a holdings table that has position and price data for a set of instruments.
The holdings table has about 150000 records and the trades table has about 1700
SELECT t.*, (SELECT p.adjclose FROM prices AS p
WHERE t.instrument = p.instrument
AND p.date = "2013-02-28 00:00:00") as close,
su.mv as mv
FROM trades AS t
left outer join
(SELECT h.date, SUM(h.price * h.position) as mv FROM holdings AS h
WHERE h.portfolio = "usequity"
AND h.date >= "2013-01-11 00:00:00"
AND h.date <= "2013-02-2"
GROUP BY h.date) as su
ON t.date = su.date
WHERE t.portname = "usequity"
AND t.date >= "2013-01-11 00:00:00"
AND t.date <= "2013-02-28 00:00:00";
Running the SQL code returns
[2014-12-01 19:21:00] 123 row(s) retrieved starting from 1 in 572/627 ms
Which seems really slow for a small dataset. Both tables are indexed on instrument and date.
I don't know how to index the table su on the fly so I'm not sure how to improve this code. Any help greatly appreciated.
EDIT
explain query plan shows
selectid,order,from,detail
1,0,0,"SEARCH TABLE holdings AS h USING AUTOMATIC COVERING INDEX (portfolio=?) (~7 rows)"
1,0,0,"USE TEMP B-TREE FOR GROUP BY"
0,0,0,"SCAN TABLE trades AS t (~11111 rows)"
0,1,1,"SEARCH SUBQUERY 1 AS su USING AUTOMATIC COVERING INDEX (date=?) (~3 rows)"
0,0,0,"EXECUTE CORRELATED SCALAR SUBQUERY 2"
2,0,0,"SEARCH TABLE prices AS p USING INDEX p1 (instrument=? AND date=?) (~9 rows)"
The lookup on prices is fast (it's using the index for both columns).
You could create a temporary table for the su subquery and add an index to that, but the AUTOMATIC INDEX shows that the database is already doing this.
The lookup on holdings is done with a temporary index; you should create an explicit index for that. (An index on both portfolio and date would be even more efficient.)
You could avoid the need for a temporary table by looking up the values from holdings dynamically, like you're already doing for the closing price (but this might not be an improvement if there are many trades on the same day):
SELECT t.*,
(SELECT p.adjclose
FROM prices AS p
WHERE p.instrument = t.instrument
AND p.date = '2013-02-28 00:00:00'
) AS close,
(SELECT SUM(h.price * h.position)
FROM holdings AS h
WHERE h.portfolio = 'usequity'
AND h.date = t.date
) AS mv
FROM trades AS t
WHERE t.portname = 'usequity'
AND t.date BETWEEN '2013-01-11 00:00:00'
AND '2013-02-28 00:00:00';
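The explicit index suggested above for holdings could be sketched as follows in SQLite. The index name holdings_portfolio_date is made up, and the date range is assumed to match the one used for trades; rerun EXPLAIN QUERY PLAN afterwards to check whether the AUTOMATIC index on holdings has been replaced by the named one.

```sql
-- Hypothetical index name; portfolio is the equality column, date the range column.
CREATE INDEX holdings_portfolio_date ON holdings(portfolio, date);

-- Re-check the plan for the aggregation over holdings.
EXPLAIN QUERY PLAN
SELECT h.date, SUM(h.price * h.position) AS mv
FROM holdings AS h
WHERE h.portfolio = 'usequity'
AND h.date >= '2013-01-11 00:00:00'
AND h.date <= '2013-02-28 00:00:00'
GROUP BY h.date;
```

Putting the equality column first lets SQLite seek to one portfolio and then walk the dates in order within it.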

Grouping matching names with totals Firebird 2.5

I wrote a basic Firebird report that lists all debtors and transactions.
The report query looks as follows:
SELECT
POSPAY.TXNO,
DEBTORS.COMPANY,
POSPAY.AMOUNT,
POSINVTRANS.TXDATE
FROM
POSPAY
INNER JOIN DEBTORS ON (POSPAY.ACCTNUMBER = DEBTORS.ACCOUNT)
INNER JOIN POSINVTRANS ON (POSPAY.TXNO = POSINVTRANS.TXNO)
WHERE
PAYMNTTYPID = '7'
and
weekly = :weekly and
txdate >= :fromdate and
txdate <= :todate
This works correctly and gives me output on Debtor Name, TXNO, TXDATE, AMOUNT.
I now want to write a similar report, but I need to group the debtors and give totals on the transactions, i.e. I need the output: Debtor name (if JOHN appears twice, list him once), Total amount (sum of John's transactions).
I still need the inner join on debtors but no longer on posinvtrans. I was thinking it should look something like
SELECT
POSPAY.TXNO,
DEBTORS.COMPANY,
POSPAY.AMOUNT
FROM
POSPAY
INNER JOIN DEBTORS ON (POSPAY.ACCTNUMBER = DEBTORS.ACCOUNT)
WHERE
PAYMNTTYPID = '7'
and
weekly = :weekly and
txdate >= :fromdate and
txdate <= :todate
Group by DEBTORS.COMPANY
but no luck; I get an error on the GROUP BY:
'invalid expression in the select list (not containing in either an aggregate function or the GROUP BY CLAUSE)'
any suggestions?
Every field in the select list has to be either also listed in the group by list or wrapped in an aggregate function like count(*), max(amount), etc.
The problem is that you have not told Firebird what to do with POSPAY.TXNO and POSPAY.AMOUNT, and it cannot guess what you want to happen to them.
I suggest you remove those 2 fields from the query and have a select list of DEBTORS.COMPANY, sum(POSPAY.AMOUNT) as a starting point.
If you use GROUP BY you either need to include a column in the GROUP BY, or apply an aggregate function on the column. In your example you need to leave out POSPAY.TXNO, as that is transaction specific (or you could use the aggregate function LIST), and you need to apply the aggregate function SUM to AMOUNT to get the total:
SELECT
DEBTORS.COMPANY,
SUM(POSPAY.AMOUNT)
FROM
POSPAY
INNER JOIN DEBTORS ON (POSPAY.ACCTNUMBER = DEBTORS.ACCOUNT)
WHERE
PAYMNTTYPID = '7'
and
weekly = :weekly and
txdate >= :fromdate and
txdate <= :todate
Group by DEBTORS.COMPANY
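If the transaction numbers are still wanted next to each total, the LIST aggregate mentioned above can collect them per debtor. This is only a sketch: the column aliases TOTAL_AMOUNT and TXNOS are made up, and the WHERE clause is copied unchanged from the query above.

```sql
SELECT
DEBTORS.COMPANY,
SUM(POSPAY.AMOUNT) AS TOTAL_AMOUNT,
-- LIST concatenates the values of each group into one delimited text BLOB
LIST(POSPAY.TXNO, ', ') AS TXNOS
FROM
POSPAY
INNER JOIN DEBTORS ON (POSPAY.ACCTNUMBER = DEBTORS.ACCOUNT)
WHERE
PAYMNTTYPID = '7'
and
weekly = :weekly and
txdate >= :fromdate and
txdate <= :todate
Group by DEBTORS.COMPANY
```

If txdate actually lives in POSINVTRANS, the join to that table from the original report has to stay in place for the date filter to work.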

Count records with a criteria like "within days"

I have a table as below on sql.
OrderID  Account  OrderMethod  OrderDate  DispatchDate  DispatchMethod
2145     qaz      14           20/3/2011  23/3/2011     2
4156     aby      12           15/6/2011  25/6/2011     1
I want to count all records that were reordered within 30 days of the dispatch date, where DispatchMethod is '2' and OrderMethod is '12', and where the reorder came from the same Account.
Can this all be achieved with one query, or do I need to create different tables and do it in stages, as I think I will have to do now? Can someone please help with a code/query?
Many thanks
T
Try the following, replacing [tablename] with the name of your table.
SELECT Count(OriginalOrders.OrderID) AS [Total_Orders]
FROM [tablename] AS OriginalOrders
INNER JOIN [tablename] AS Reorders
ON OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
AND Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12';
By using an inner join you'll be sure to only grab orders that meet all the criteria.
By joining the table to itself under two aliases on Account, you make sure only orders under the same account are counted.
The results from the join are further filtered based on the criteria you mentioned requiring only orders that have been placed within 30 days of the dispatch date of a previous order.
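One caveat with the self-join: it produces one row per (original order, reorder) pair, so a reorder that follows several earlier orders of the same account is counted several times. A possible variant, assuming each qualifying reorder should be counted once, uses EXISTS instead of a join. The alias Total_Reorders is made up, and the filters mirror the query above.

```sql
SELECT COUNT(*) AS [Total_Reorders]
FROM [tablename] AS Reorders
WHERE Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12'
-- keep a reorder if at least one earlier order of the same account
-- was dispatched no more than 30 days before it
AND EXISTS (
SELECT 1
FROM [tablename] AS OriginalOrders
WHERE OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
);
```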
Totally possible with one query, though my SQL is a little stale.
select count(*) from myTable
where DispatchMethod = 2
AND OrderMethod = 12
AND DATEDIFF(day, OrderDate, DispatchDate) <= 30;
(Untested, but it's something similar)
One query can do it.
SELECT COUNT(*) FROM myTable reOrder
INNER JOIN myTable originalOrder
ON reOrder.Account = originalOrder.Account
AND reOrder.OrderID <> originalOrder.OrderID
-- all re-orders that are within 30 days of the
-- original order's dispatch date
AND DATEDIFF(d, originalOrder.DispatchDate, reOrder.OrderDate) <= 30
WHERE reOrder.DispatchMethod = 2
AND reOrder.OrderMethod = 12
You need a self-join.
The query below assumes that a given account will have either 1 or 2 records in the table - 2 if they've reordered, else 1.
If 3 records exist for a given account, 2 orders + 1 reorder then this won't work - but we'd then need more information on how to distinguish between an order and a reorder.
SELECT COUNT(*) FROM myTable new, myTable prev
WHERE new.DispatchMethod = 2
AND new.OrderMethod = 12
AND DATEDIFF(day, prev.DispatchDate, new.OrderDate) <=30
AND prev.Account = new.Account
AND prev.OrderDate < new.OrderDate
Can we use GROUP BY in this case, such as the following?
SELECT COUNT(Account)
FROM myTable
WHERE DispatchMethod = 2 AND OrderMethod = 12
AND DATEDIFF(d, DispatchDate, OrderDate) <=30
GROUP BY Account
Will the above work or am I missing something here?