SQL Query optimization Statistics - sql

I was wondering if someone could give me a query that produces the result table below.
Initial Table:
ID Year Quantity
Result Table:
ID 2009_Quantity 2010_Quantity 2011_Quantity ...
What I have tried so far:
Select ID
, (select sum(Id.Quantity) from Initial_Database id where id.Year = 2009)
, (select sum(Id.Quantity) from Initial_Database id where id.Year = 2010)
from Initial_Database i
Group BY ID
But this takes hours on millions of records, which is not an option.
I also tried:
Select ID
, CASE WHEN i.Year = 2009 THEN sum(i.Quantity) ELSE 0 END
, CASE WHEN i.Year = 2010 THEN sum(i.Quantity) ELSE 0 END
from Initial_Database i
Group BY ID
This is faster, but it gives me two rows per ID, which I don't want.

Try it like this, with the CASE moved inside the SUM so that each ID collapses into a single row:
Select ID
, SUM(CASE WHEN i.Year = 2009 THEN i.Quantity ELSE 0 END) AS Quantity_2009
, SUM(CASE WHEN i.Year = 2010 THEN i.Quantity ELSE 0 END) AS Quantity_2010
from Initial_Database i
Group BY ID
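A minimal way to try this conditional aggregation locally, assuming Python's built-in sqlite3 module as a stand-in for the asker's database. The sample rows are made up, and the output columns are named Quantity_2009 and so on because an identifier that starts with a digit, like 2009_Quantity, would need quoting:

```python
import sqlite3

# Hypothetical sample data: table and column names mirror the question
# (Initial_Database with ID, Year, Quantity); the values are invented.
ROWS = [
    (1, 2009, 5), (1, 2009, 7), (1, 2010, 3),
    (2, 2010, 4), (2, 2011, 6),
]

# The CASE sits inside SUM, so one scan yields one row per ID.
PIVOT_SQL = """
SELECT i.ID,
       SUM(CASE WHEN i.Year = 2009 THEN i.Quantity ELSE 0 END) AS Quantity_2009,
       SUM(CASE WHEN i.Year = 2010 THEN i.Quantity ELSE 0 END) AS Quantity_2010,
       SUM(CASE WHEN i.Year = 2011 THEN i.Quantity ELSE 0 END) AS Quantity_2011
FROM Initial_Database AS i
GROUP BY i.ID
ORDER BY i.ID
"""


def pivot_by_year(rows):
    """Return one (ID, q2009, q2010, q2011) tuple per ID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Initial_Database (ID INTEGER, Year INTEGER, Quantity INTEGER)")
    con.executemany("INSERT INTO Initial_Database VALUES (?, ?, ?)", rows)
    result = con.execute(PIVOT_SQL).fetchall()
    con.close()
    return result


print(pivot_by_year(ROWS))
# → [(1, 12, 3, 0), (2, 0, 4, 6)]
```

Each year becomes one more SUM(CASE ...) column, so the table is read once instead of once per year column.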

Related

SQL return results only if all rows match in join

I have the following tables
table name          column names
----------------    ------------------------
delivery_service    svc_name | svc_cost
product             prod_id
service_prod        svc_name | prod_id
order               order_id
order_item          order_id | prod_id
Now I want to calculate the total delivery service cost (svc_cost) of all items in an order. Of course, this total makes sense only if all items in the order are eligible for that delivery service.
For instance, the product fresh tomatoes only has express delivery service, whereas the product dvd has both express and normal shipping. Consequently, the delivery cost of an order with the items fresh tomatoes and dvd should only take the express delivery cost into account, since normal shipping is not eligible for the order as a whole.
I'm not sure how I should translate this into SQL.
Any tips on where to start are welcome.
Just an untested notepad scribble in anticipation of clarifications.
SELECT order_id, product_count
, CASE
WHEN strictly_express_cost = 0 THEN strictly_normal_cost + whatever_normal_cost
WHEN strictly_normal_cost = 0 THEN strictly_express_cost + whatever_express_cost
ELSE strictly_express_cost + strictly_normal_cost + whatever_normal_cost
END AS total_cost
FROM
(
SELECT order_id
, SUM(CASE WHEN express_cost > 0 AND normal_cost = 0 THEN express_cost ELSE 0 END) AS strictly_express_cost
, SUM(CASE WHEN express_cost = 0 AND normal_cost > 0 THEN normal_cost ELSE 0 END) AS strictly_normal_cost
, SUM(CASE WHEN express_cost > 0 AND normal_cost > 0 THEN express_cost ELSE 0 END) AS whatever_express_cost
, SUM(CASE WHEN express_cost > 0 AND normal_cost > 0 THEN normal_cost ELSE 0 END) AS whatever_normal_cost
, COUNT(DISTINCT prod_id) AS product_count
FROM
(
SELECT orditm.order_id, orditm.prod_id
, SUM(CASE WHEN svcprd.svc_name = 'express' THEN dlvsvc.svc_cost ELSE 0 END) AS express_cost
, SUM(CASE WHEN svcprd.svc_name = 'normal' THEN dlvsvc.svc_cost ELSE 0 END) AS normal_cost
FROM order_item AS orditm
LEFT JOIN service_prod AS svcprd
ON svcprd.prod_id = orditm.prod_id
LEFT JOIN delivery_service AS dlvsvc
ON dlvsvc.svc_name = svcprd.svc_name
GROUP BY orditm.order_id, orditm.prod_id
) q1
GROUP BY order_id
) q2
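A different technique from the scribble above, sometimes called relational division: group by order and service, and keep a group only when the number of items eligible for that service equals the number of items in the order. This is a sketch on SQLite via Python's sqlite3 module; the costs and product rows are hypothetical, and it assumes service_prod contains no duplicate (svc_name, prod_id) rows:

```python
import sqlite3

# Hypothetical costs and products; only the table/column names come from the question.
SETUP = """
CREATE TABLE delivery_service (svc_name TEXT PRIMARY KEY, svc_cost INTEGER);
CREATE TABLE service_prod (svc_name TEXT, prod_id TEXT, PRIMARY KEY (svc_name, prod_id));
CREATE TABLE order_item (order_id INTEGER, prod_id TEXT);
INSERT INTO delivery_service VALUES ('express', 10), ('normal', 3);
INSERT INTO service_prod VALUES ('express', 'tomatoes'), ('express', 'dvd'), ('normal', 'dvd');
INSERT INTO order_item VALUES (1, 'tomatoes'), (1, 'dvd'), (2, 'dvd');
"""

# Keep an (order, service) group only if the service covers every item row of the order.
ELIGIBLE_COSTS_SQL = """
SELECT oi.order_id, ds.svc_name, SUM(ds.svc_cost) AS total_cost
FROM order_item AS oi
JOIN service_prod AS sp ON sp.prod_id = oi.prod_id
JOIN delivery_service AS ds ON ds.svc_name = sp.svc_name
GROUP BY oi.order_id, ds.svc_name
HAVING COUNT(*) = (SELECT COUNT(*)
                   FROM order_item AS all_items
                   WHERE all_items.order_id = oi.order_id)
ORDER BY oi.order_id, ds.svc_name
"""


def eligible_costs():
    """Return (order_id, svc_name, total_cost) for every service valid for the whole order."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(ELIGIBLE_COSTS_SQL).fetchall()
    con.close()
    return rows


print(eligible_costs())
# → [(1, 'express', 20), (2, 'express', 10), (2, 'normal', 3)]
```

Order 1 (tomatoes and dvd) only gets an express total, matching the example in the question, while order 2 (dvd only) gets both services.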

BigQuery: group counters by month after self-join

I have a table that looks like this:
I'm trying to build a query that will show counters for specific partnerId values, grouped by keywordName and month.
To solve the first part (without grouping by month), I built this query:
SELECT keywordName, COUNT(keywordName) as total, IFNULL(b.ebay_count, 0) as ebay, IFNULL(c.amazon_count, 0) as amazon,
FROM LogFilesv2_Dataset.FR_Clickstats_v2 a
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='eBay' THEN 1 ELSE 0 END) as ebay_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'eBay' GROUP BY kw) b
ON keywordName = b.kw
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='AmazonApi' THEN 1 ELSE 0 END) as amazon_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'AmazonApi' GROUP BY kw) c
ON keywordName = c.kw
WHERE keywordName = 'flipper' -- just to filter out single kw.
GROUP BY keywordName, ebay, amazon
It works quite well and returns the following output:
Now I'm trying to make additional group by month, but all my attempts returned incorrect results.
The final output is supposed to look similar to this:
You can do this with conditional aggregation:
select
date_trunc(dt, month) dt,
keywordName,
count(*) total,
sum(case when partnerId = 'eBay' then 1 else 0 end) ebay,
sum(case when partnerId = 'AmazonApi' then 1 else 0 end) amazon
from LogFilesv2_Dataset.FR_Clickstats_v2
group by date_trunc(dt, month), keywordName
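The same conditional aggregation can be tried outside BigQuery. This sketch assumes SQLite via Python's sqlite3 module, where DATE_TRUNC is not available, so strftime('%Y-%m', dt) is swapped in to bucket rows by month. The dt column name comes from the answer, and the rows are made up:

```python
import sqlite3

# Hypothetical click rows (dt, keywordName, partnerId).
CLICKS = [
    ("2019-01-05", "flipper", "eBay"),
    ("2019-01-07", "dolphin", "eBay"),
    ("2019-01-20", "flipper", "AmazonApi"),
    ("2019-01-21", "flipper", "eBay"),
    ("2019-02-02", "flipper", "AmazonApi"),
    ("2019-02-03", "flipper", "SomeOtherPartner"),
]

# strftime('%Y-%m', dt) plays the role of BigQuery's DATE_TRUNC(dt, MONTH) here.
MONTHLY_SQL = """
SELECT strftime('%Y-%m', dt) AS month,
       keywordName,
       COUNT(*) AS total,
       SUM(CASE WHEN partnerId = 'eBay' THEN 1 ELSE 0 END) AS ebay,
       SUM(CASE WHEN partnerId = 'AmazonApi' THEN 1 ELSE 0 END) AS amazon
FROM FR_Clickstats_v2
GROUP BY strftime('%Y-%m', dt), keywordName
ORDER BY month, keywordName
"""


def monthly_counts(rows):
    """Return (month, keywordName, total, ebay, amazon) per month and keyword."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE FR_Clickstats_v2 (dt TEXT, keywordName TEXT, partnerId TEXT)")
    con.executemany("INSERT INTO FR_Clickstats_v2 VALUES (?, ?, ?)", rows)
    result = con.execute(MONTHLY_SQL).fetchall()
    con.close()
    return result


for row in monthly_counts(CLICKS):
    print(row)
```

No self-joins are needed: each partner counter is just another SUM(CASE ...) over the same grouped rows.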

SQL query: transform rows to columns

Here's an example of my table.
I need a query that shows the IDs that have a fee of 0 in either of two months (11 or 12), or in both.
So in the example, I need to show IDs 1, 3 and 4 but not 2, as in the screenshot below.
I tried the query below:
SELECT
t1.id, t1.month, t1.fee, t2.id, t2.month, t2.fee
FROM
table t1, table t2
WHERE t1.id = t2.id
AND t1.month = '11'
AND t2.month = '12'
AND (t1.fee = 0 OR t2.fee = 0);
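But with this query, I only see IDs 1 and 3, not ID 4. I guess it's because of the t1.id = t2.id condition (presumably ID 4 has no row for one of the two months, so the join finds no pair for it), but I have no idea how to do it otherwise.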
You can use conditional aggregation. In Postgres, this can make use of the filter syntax:
SELECT t.id,
11 as month,
MAX(t.fee) FILTER (WHERE t.month = 11) as fee_11,
12 as month,
MAX(t.fee) FILTER (WHERE t.month = 12) as fee_12
FROM t
GROUP BY t.id
HAVING MAX(t.fee) FILTER (WHERE t.month = 11) = 0 OR
MAX(t.fee) FILTER (WHERE t.month = 12) = 0;
Note: The two month columns are redundant.
You need conditional aggregation, grouping by id only so that each id becomes one row:
select id,
max(case when month=11 then fee end) as fee11,
max(case when month=12 then fee end) as fee12
from (
select * from table t1
where t1.id in ( select id from table where fee=0 and month in (11, 12))
) a group by id
An ANSI SQL-compliant query:
SELECT id,
MAX(CASE WHEN MONTH = 11 THEN MONTH ELSE NULL END) AS month11,
MAX(CASE WHEN MONTH = 11 THEN fee ELSE NULL END) AS fee11,
MAX(CASE WHEN MONTH = 12 THEN MONTH ELSE NULL END) AS month12,
MAX(CASE WHEN MONTH = 12 THEN fee ELSE NULL END ) AS fee12
FROM t
GROUP BY id
HAVING ( MAX(CASE WHEN MONTH = 11 THEN fee ELSE NULL END) = 0 OR MAX(CASE WHEN MONTH = 12 THEN fee ELSE NULL END ) = 0 )
ORDER BY id
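To see why this keeps ID 4 while the self-join dropped it, here is the ANSI query run against SQLite through Python's sqlite3 module. The rows are invented, with ID 4 deliberately missing its month 12 row (an assumption about the screenshot data), and the month11/month12 columns are left out for brevity:

```python
import sqlite3

# Hypothetical (id, month, fee) rows: id 4 has no row for month 12 at all.
FEES = [
    (1, 11, 0), (1, 12, 5),
    (2, 11, 5), (2, 12, 7),
    (3, 11, 3), (3, 12, 0),
    (4, 11, 0),
]

# One row per id; a missing month simply yields NULL in its column.
ZERO_FEE_SQL = """
SELECT id,
       MAX(CASE WHEN month = 11 THEN fee ELSE NULL END) AS fee11,
       MAX(CASE WHEN month = 12 THEN fee ELSE NULL END) AS fee12
FROM t
GROUP BY id
HAVING MAX(CASE WHEN month = 11 THEN fee ELSE NULL END) = 0
    OR MAX(CASE WHEN month = 12 THEN fee ELSE NULL END) = 0
ORDER BY id
"""


def ids_with_zero_fee(rows):
    """Return (id, fee11, fee12) for ids with a zero fee in month 11 or 12."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, month INTEGER, fee INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(ZERO_FEE_SQL).fetchall()
    con.close()
    return result


print(ids_with_zero_fee(FEES))
# → [(1, 0, 5), (3, 3, 0), (4, 0, None)]
```

For ID 4 the month 12 test evaluates to NULL, and TRUE OR NULL is TRUE, so the HAVING clause still keeps it.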

How to improve slow sql query with aggregate functions

I want to show the top ten customers with their sales and margin, limited to customers registered during this accounting year. The query takes about 65 seconds to run, which is not acceptable :-(
As you can see, I am not good at SQL, and I would be very happy for help improving the query.
SELECT Top 10
AcTr.R3, Actor.Nm,
SUM(CASE WHEN AcTr.AcNo<='3999' THEN AcAm*-1 ELSE 0 END) AS Sales ,
SUM(AcAm*-1) AS TB
FROM AcTr, Actor
WHERE (Actor.CustNo = AcTr.R3) AND
(Actor.CustNo <> '0') AND
(Actor.CreDt >= '20180901') AND
(Actor.CreDt <= '20190430') AND
AcTr.AcYr = '2018' AND
AcTr.AcPr <= '8' AND
AcTr.AcNo>='3000' AND
AcTr.AcNo <= '4999'
GROUP BY AcTr.R3, Actor.Nm
ORDER BY Sales DESC
Welcome to the community. You have a good start, but in the future it is more helpful to provide (as commented) the CREATE TABLE declarations so users know the actual data types. It is not always required, but it helps.
As for your query layout, it is more common to use explicit JOIN syntax instead of expressing the relationships between tables in the WHERE clause, but that comes with time and practice.
Indexes help and should be based on a combination of the WHERE/JOIN criteria AND the grouping fields. Also, if fields are numeric, do not 'quote' them; just leave them as numbers. For example, your AcYr and AcPr, and possibly AcNo, although I would think an account number really would be a string value rather than a number for accounting purposes.
I would suggest the following indexes on your tables:
Table    Index
Actr     ( AcYr, AcPr, AcNo, R3 )
Actor    ( CustNo, CreDt )
For the Actr index, I put the filtering criteria first and R3 last to help optimize the GROUP BY. The Actor index is on the customer number, then CreDt (create date?). Is CreDt really a string, or is it a date field? If it is a date field, the date criteria would be something like '2018-09-01' and '2019-04-30'.
select TOP 10
Actor.Nm,
PreSum.Sales,
PreSum.TB
from
( select
R3,
SUM(CASE WHEN AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales,
SUM( AcAm * -1) AS TB
from
AcTr
where
AcTr.AcYr = 2018
AND AcTr.AcPr <= 8
AND AcTr.AcNo >= '3000'
AND AcTr.AcNo <= '4999'
GROUP BY
AcTr.R3 ) PreSum
JOIN Actor
on PreSum.R3 = Actor.CustNo
AND Actor.CustNo <> 0
AND Actor.CreDt >= '20180901'
AND Actor.CreDt <= '20190430'
order by
Sales DESC
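A scaled-down sketch of the pre-aggregation idea together with the suggested indexes, assuming SQLite via Python's sqlite3 module. SQLite has no TOP, so LIMIT 10 is swapped in; the index names and column types are hypothetical, and the rows are invented. The CustNo comparison stays quoted because this sketch assumes CustNo is a string:

```python
import sqlite3

# Hypothetical schema: column types are guesses, since no CREATE TABLE was posted.
SETUP = """
CREATE TABLE AcTr (R3 TEXT, AcYr INTEGER, AcPr INTEGER, AcNo TEXT, AcAm INTEGER);
CREATE TABLE Actor (CustNo TEXT, Nm TEXT, CreDt TEXT);
CREATE INDEX ix_actr_filter ON AcTr (AcYr, AcPr, AcNo, R3);
CREATE INDEX ix_actor_cust ON Actor (CustNo, CreDt);
INSERT INTO AcTr VALUES
    ('C1', 2018, 3, '3100', -100), ('C1', 2018, 5, '4100', -40),
    ('C2', 2018, 2, '3500', -250), ('C2', 2018, 9, '3500', -999),
    ('C3', 2018, 1, '3200', -80);
INSERT INTO Actor VALUES
    ('C1', 'Alpha', '20181015'), ('C2', 'Beta', '20190102'), ('C3', 'Gamma', '20170101');
"""

# Aggregate AcTr first, then join the smaller per-customer result to Actor.
TOP_SQL = """
SELECT Actor.Nm, PreSum.Sales, PreSum.TB
FROM (SELECT R3,
             SUM(CASE WHEN AcNo <= '3999' THEN AcAm * -1 ELSE 0 END) AS Sales,
             SUM(AcAm * -1) AS TB
      FROM AcTr
      WHERE AcYr = 2018 AND AcPr <= 8 AND AcNo >= '3000' AND AcNo <= '4999'
      GROUP BY R3) AS PreSum
JOIN Actor
  ON PreSum.R3 = Actor.CustNo
 AND Actor.CustNo <> '0'
 AND Actor.CreDt >= '20180901'
 AND Actor.CreDt <= '20190430'
ORDER BY PreSum.Sales DESC
LIMIT 10
"""


def top_customers():
    """Return (Nm, Sales, TB) for the top customers by sales."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(TOP_SQL).fetchall()
    con.close()
    return rows


print(top_customers())
# → [('Beta', 250, 250), ('Alpha', 100, 140)]
```

The period-9 row for C2 is filtered out before grouping, and C3 is dropped by the CreDt range on the join.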
Per the latest inquiry/comment: you want a year-by-year comparison and no longer want to limit the results to the top 10 performers for a given time period.
select
Actor.Nm,
PreSum.Sales2018,
PreSum.Sales2019,
PreSum.TB2018,
PreSum.TB2019
from
( select
AcTr.R3,
SUM(CASE WHEN AcTr.AcYr = 2018
AND AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales2018,
SUM(CASE WHEN AcTr.AcYr = 2019 AND AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales2019,
SUM( CASE WHEN AcTr.AcYr = 2018
THEN AcAm * -1 else 0 end ) AS TB2018,
SUM( CASE WHEN AcTr.AcYr = 2019
THEN AcAm * -1 else 0 end ) AS TB2019
from
AcTr
where
AcTr.AcYr IN ( 2018, 2019 )
AND AcTr.AcPr <= 8
AND AcTr.AcNo >= '3000'
AND AcTr.AcNo <= '4999'
GROUP BY
AcTr.R3 ) PreSum
JOIN Actor
on PreSum.R3 = Actor.CustNo
AND Actor.CustNo <> 0
AND Actor.CreDt >= '20180901'
AND Actor.CreDt <= '20190430'
order by
Sales2018 DESC
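The year-over-year version can be checked the same way. This again assumes SQLite via Python's sqlite3 module and invented rows, and it orders by Sales2018, an arbitrary choice for this sketch since the query produces one sales column per year:

```python
import sqlite3

# Hypothetical rows; column types are the same guesses as before.
SETUP = """
CREATE TABLE AcTr (R3 TEXT, AcYr INTEGER, AcPr INTEGER, AcNo TEXT, AcAm INTEGER);
CREATE TABLE Actor (CustNo TEXT, Nm TEXT, CreDt TEXT);
INSERT INTO AcTr VALUES
    ('C1', 2018, 3, '3100', -100), ('C1', 2018, 5, '4100', -40),
    ('C1', 2019, 2, '3300', -60),
    ('C2', 2018, 2, '3500', -250), ('C2', 2019, 4, '4200', -30);
INSERT INTO Actor VALUES ('C1', 'Alpha', '20181015'), ('C2', 'Beta', '20190102');
"""

# One pass over AcTr yields a column per year instead of one query per year.
YEARLY_SQL = """
SELECT Actor.Nm, PreSum.Sales2018, PreSum.Sales2019, PreSum.TB2018, PreSum.TB2019
FROM (SELECT R3,
             SUM(CASE WHEN AcYr = 2018 AND AcNo <= '3999' THEN AcAm * -1 ELSE 0 END) AS Sales2018,
             SUM(CASE WHEN AcYr = 2019 AND AcNo <= '3999' THEN AcAm * -1 ELSE 0 END) AS Sales2019,
             SUM(CASE WHEN AcYr = 2018 THEN AcAm * -1 ELSE 0 END) AS TB2018,
             SUM(CASE WHEN AcYr = 2019 THEN AcAm * -1 ELSE 0 END) AS TB2019
      FROM AcTr
      WHERE AcYr IN (2018, 2019) AND AcPr <= 8 AND AcNo >= '3000' AND AcNo <= '4999'
      GROUP BY R3) AS PreSum
JOIN Actor
  ON PreSum.R3 = Actor.CustNo
 AND Actor.CustNo <> '0'
 AND Actor.CreDt >= '20180901'
 AND Actor.CreDt <= '20190430'
ORDER BY PreSum.Sales2018 DESC
"""


def yearly_comparison():
    """Return (Nm, Sales2018, Sales2019, TB2018, TB2019) per customer."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(YEARLY_SQL).fetchall()
    con.close()
    return rows


print(yearly_comparison())
# → [('Beta', 250, 0, 250, 30), ('Alpha', 100, 60, 140, 60)]
```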

Multiple Queries in different table

(Also posted here.)
So I have two tables: an invalid table and a valid table.
valid table: id, status, date
invalid table: id, status, date
I have to produce a report with this output:
date on-time late total valid invalid1 invalid2 total rate
--------- ------- ---- ----- ----- -------- -------- ----- ----
9/10/2011 4 10 14 3 3 3 6
date: the field common to both tables and the field to group by (how many records each day has)
on-time: count of all IDs in the valid table
late: count of all records (IDs) in the invalid table
total: on-time plus late
valid: count of IDs in the valid table with the "valid" status
invalid1: count of IDs in the invalid table with the "invalid1" status
invalid2: count of IDs in the invalid table with the "invalid2" status
total: valid plus invalid1 plus invalid2
rate: average of the totals
It's basically multiple queries against different tables. How can I achieve this?
Something like this?
SELECT
*,
(result.total + result._total) / 2 AS rate
FROM (
SELECT
date,
SUM(CASE WHEN data.valid = 1 THEN 1 ELSE 0 END) AS ontime,
SUM(CASE WHEN data.valid = 0 THEN 1 ELSE 0 END) AS late,
COUNT(*) AS total,
SUM(CASE WHEN data.valid = 1 AND data.status = 'valid' THEN 1 ELSE 0 END) AS valid,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2,
SUM(CASE WHEN data.status IN ('valid', 'invalid1', 'invalid2') THEN 1 ELSE 0 END) AS _total
FROM (
SELECT
date,
status,
valid = 1
FROM
Valid
UNION ALL
SELECT
date,
status,
valid = 0
FROM
InValid ) AS data
GROUP BY
date) AS result
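A runnable version of this UNION ALL idea, assuming SQLite via Python's sqlite3 module. Here is_valid stands in for the answer's valid = 1 column alias (T-SQL syntax), rate follows the answer's reading of "average of totals" as (total + status total) / 2, and the rows, including the 'pending' status, are invented:

```python
import sqlite3

SETUP = """
CREATE TABLE valid (id INTEGER, status TEXT, date TEXT);
CREATE TABLE invalid (id INTEGER, status TEXT, date TEXT);
INSERT INTO valid VALUES (1, 'valid', '2011-09-10'), (2, 'valid', '2011-09-10'),
                         (3, 'pending', '2011-09-10'), (7, 'valid', '2011-09-11');
INSERT INTO invalid VALUES (4, 'invalid1', '2011-09-10'), (5, 'invalid2', '2011-09-10'),
                           (6, 'invalid2', '2011-09-10');
"""

# Tag each row with its source table, then count everything in one GROUP BY.
REPORT_SQL = """
SELECT r.*,
       r.valid_count + r.invalid1 + r.invalid2 AS status_total,
       (r.total + r.valid_count + r.invalid1 + r.invalid2) / 2.0 AS rate
FROM (SELECT date,
             SUM(is_valid) AS ontime,
             SUM(1 - is_valid) AS late,
             COUNT(*) AS total,
             SUM(CASE WHEN is_valid = 1 AND status = 'valid' THEN 1 ELSE 0 END) AS valid_count,
             SUM(CASE WHEN is_valid = 0 AND status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
             SUM(CASE WHEN is_valid = 0 AND status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2
      FROM (SELECT date, status, 1 AS is_valid FROM valid
            UNION ALL
            SELECT date, status, 0 FROM invalid) AS data
      GROUP BY date) AS r
ORDER BY r.date
"""


def daily_report():
    """Return one report row per date."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(REPORT_SQL).fetchall()
    con.close()
    return rows


for row in daily_report():
    print(row)
```

Dividing by 2.0 rather than 2 keeps the rate from being truncated by integer division.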
SELECT date, ontime, late, ontime+late total, valid, invalid1, invalid2, valid+invalid1+invalid2 total
FROM
(SELECT date,
COUNT(*) late,
COUNT(IIF(status = 'invalid1', 1, NULL)) invalid1,
COUNT(IIF(status = 'invalid2', 1, NULL)) invalid2
FROM invalid
GROUP BY date
) i JOIN (
SELECT date,
COUNT(*) ontime,
COUNT(IIF(status = 'valid', 1, NULL)) valid
FROM valid
GROUP BY date
) v USING (date)
First of all, it seems that you are holding exactly the same information in two tables. I would recommend merging those tables and adding an additional boolean column called valid to hold each record's validity.
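If the tables are merged as suggested, the migration and a daily report could look like the following sketch, which assumes SQLite via Python's sqlite3 module. SQLite has no BOOLEAN type, so valid is a 0/1 INTEGER with a CHECK constraint; the merged table name records is hypothetical and the rows are invented:

```python
import sqlite3

MIGRATION = """
CREATE TABLE valid (id INTEGER, status TEXT, date TEXT);
CREATE TABLE invalid (id INTEGER, status TEXT, date TEXT);
INSERT INTO valid VALUES (1, 'valid', '2011-09-10'), (2, 'pending', '2011-09-10');
INSERT INTO invalid VALUES (3, 'invalid1', '2011-09-10'), (4, 'invalid2', '2011-09-11');

-- Merged table: valid = 1 for rows from the valid table, 0 otherwise.
CREATE TABLE records (
    id INTEGER,
    status TEXT,
    date TEXT,
    valid INTEGER NOT NULL CHECK (valid IN (0, 1))
);
INSERT INTO records (id, status, date, valid)
    SELECT id, status, date, 1 FROM valid
    UNION ALL
    SELECT id, status, date, 0 FROM invalid;
"""

# With one table, the per-day counts need no UNION at query time.
DAILY_SQL = """
SELECT date, SUM(valid) AS ontime, SUM(1 - valid) AS late, COUNT(*) AS total
FROM records
GROUP BY date
ORDER BY date
"""


def merged_report():
    """Return (date, ontime, late, total) from the merged table."""
    con = sqlite3.connect(":memory:")
    con.executescript(MIGRATION)
    rows = con.execute(DAILY_SQL).fetchall()
    con.close()
    return rows


print(merged_report())
# → [('2011-09-10', 2, 1, 3), ('2011-09-11', 0, 1, 1)]
```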
The query on your existing DB structure might look something like this:
SELECT unioned.date,
SUM(unioned.valid) AS valid,
SUM(unioned.invalid1) AS invalid1,
SUM(unioned.invalid2) AS invalid2
FROM (
SELECT v.date AS date, COUNT(v.id) AS valid, 0 AS invalid1, 0 AS invalid2 FROM valid v WHERE v.status = 'valid' GROUP BY v.date
UNION ALL
SELECT i1.date, 0, COUNT(i1.id), 0 FROM invalid i1 WHERE i1.status = 'invalid1' GROUP BY i1.date
UNION ALL
SELECT i2.date, 0, 0, COUNT(i2.id) FROM invalid i2 WHERE i2.status = 'invalid2' GROUP BY i2.date
) AS unioned GROUP BY unioned.date