Get the average with quantity - sql

I'm having trouble calculating a simple average. My table:
id / user / action / data
1 / a / unit_price / 40
2 / a / quantity / 1
3 / b / unit_price / 70
4 / b / quantity / 2
unit_price is the price for a user and quantity is the quantity bought. So I should get:
(40 + 70 + 70) / 3 = 60
If I do
(AVG(data) WHERE action = unit_price)
I get:
(70 + 40) / 2 = 55
If I do
(SUM(data) WHERE action = unit_price) / (SUM(data) WHERE action = quantity)
I get:
110 / 3 = 36.67
The easiest way I found is to store the total price instead of the unit_price and then divide in the PHP code to get the unit_price, but I was hoping SQL could do something for me.

select coalesce(sum(quantity * unit_price) /sum(quantity), 0) from
(select
sum(case when action='unit_price' then data else 0 end) as unit_price,
sum(case when action='quantity' then data else 0 end) as quantity
from test
group by user) as a
SqlFiddle

You can use something like this which basically pivots the data to a more usable format and then gets the values that you need:
select avg(unit_price) AvgUnitPrice,
sum(unit_price*quantity)/sum(quantity) AvgPrice
from
(
select user,
max(case when action = 'unit_price' then data end) unit_price,
max(case when action = 'quantity' then data end) quantity
from table1
group by user
) x;
See SQL Fiddle With Demo
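To check the pivot query against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite and the REAL column type are my assumptions (the question doesn't say which DBMS is in use); the table name table1 comes from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question only lists the column names.
conn.execute('CREATE TABLE table1 (id INTEGER, "user" TEXT, action TEXT, data REAL)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "a", "unit_price", 40),
        (2, "a", "quantity", 1),
        (3, "b", "unit_price", 70),
        (4, "b", "quantity", 2),
    ],
)

query = """
SELECT AVG(unit_price)                            AS avg_unit_price,
       SUM(unit_price * quantity) / SUM(quantity) AS avg_price
FROM (
    -- Pivot: one row per user with unit_price and quantity side by side.
    SELECT "user",
           MAX(CASE WHEN action = 'unit_price' THEN data END) AS unit_price,
           MAX(CASE WHEN action = 'quantity'   THEN data END) AS quantity
    FROM table1
    GROUP BY "user"
) AS x
"""
avg_unit_price, avg_price = conn.execute(query).fetchone()
print(avg_unit_price, avg_price)  # 55.0 60.0
```

The first column is the plain average the question already had (55), the second is the quantity-weighted average it was after (60).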

Ok, obviously your table design is not optimal, you should have unit_price and quantity as separate columns. But, working with what you have, try this:
SELECT SUM(A.data*B.data)/SUM(B.data) Calculation
FROM ( SELECT user, data
FROM YourTable
WHERE action = 'unit_price') AS A
INNER JOIN (SELECT user, data
FROM YourTable
WHERE action = 'quantity') AS B
ON A.user = B.user

I would join the table to itself in order to get the two records belonging together on one line
SELECT
SUM(unit_price * quantity) / SUM(quantity) AS average_unit_price
FROM
(SELECT
U.data AS unit_price, Q.data AS quantity
FROM
theTable U
INNER JOIN theTable Q
ON U.user = Q.user
WHERE
U.action = 'unit_price' AND
Q.action = 'quantity') t
If you have more than two records per user and the ids of the two records belonging together are consecutive, then you would have to change the WHERE clause to
WHERE
U.action = 'unit_price' AND
Q.action = 'quantity' AND
U.id + 1 = Q.id
Note:
If you calculate AVG(unit_price * quantity) you get the average sum per user.
(1*40 + 2*70) / 2 = 90
If you calculate SUM(unit_price * quantity) / SUM(quantity) you get the average unit price.
(1*40 + 2*70) / 3 = 60

Something like this should work; I didn't try it out, so the syntax might not be perfect, but you get the main idea at least. Note that it divides the plain sum of the unit prices by the total quantity (110 / 3 for the sample data), so the prices are not weighted by quantity.
SELECT sumUnitPrice.sum / sumQuantity.sum
FROM
(SELECT SUM(data) as sum
FROM WhateverTheHellYourTableIsNamed
WHERE action = 'unit_price') sumUnitPrice,
(SELECT SUM(data) as sum
FROM WhateverTheHellYourTableIsNamed
WHERE action = 'quantity') sumQuantity

Your table design doesn't look good.
Make 2 tables instead:
ITEM
ItemId int not null PK,
Name varchar(200) not null,
UnitPrice decimal (10,2) not null
SALES
SalesId int not null PK,
ItemId int not null FK,
Quantity decimal(10,2)
PK - primary key, FK - foreign key
Average:
select
I.Name, avg(I.UnitPrice * S.Quantity) as avgSales
from
Sales S
join Item I on I.ItemId = S.ItemId
group by
I.Name


How to get a fraction of counters of subquery from different subqueries in one select?

I have a table with reviews for products. I want to sort product_ids that have more than 100 verified reviews (a verified review is one with verified_purchase=True) by the fraction of 5-star reviews among all reviews. I tried to do this in one select, but after numerous attempts I ended up needing views. I managed to write a query that counts the number of 5-star reviews, but I can't get any further. Can anybody give me a hint?
My best query:
select *,count(*)
from (
select *
from reviews
where star_rating = 5
) low_reviews
left join (
select distinct filtered_reviews.product_id
from (
select *
from (
select verified_reviews.product_id, count(*) as verified_reviews_number
from (
select *
from reviews
where verified_purchase=True
) as verified_reviews
) as counted_verified_reviews
where counted_verified_reviews.verified_reviews_number > 100
) as filtered_reviews
) filtered_product_ids on low_reviews.product_id = filtered_product_ids.product_id;
Data example:
review_id customer_id product_id star_rating helpful_votes total_votes vine verified_purshase review_headline review_body review_date
14830128 R158AS05ZMH7VQ 0615349439 5 2 2 N false Planting a Church ... Witnessing To Dracula... 2011-02-14
I want to sort product_ids that have more than 100 verified reviews (a verified review is one with verified_purchase=True) by the fraction of 5-star reviews among all reviews.
You don't provide sample data, but I would expect a query like this:
select product_id
from reviews
where verified_purchase
group by product_id
having count(*) > 100
order by avg( (star_rating = 5)::int ) desc;
The expression avg( (star_rating = 5)::int ) is a shorthand way of saying count(*) filter (where star_rating = 5) * 1.0 / count(star_rating). It works because it converts the expression star_rating = 5 to an int, which is 1 for true and 0 for false. The average is the proportion of times when it is true.
Actually, the above assumes that you only care about star ratings for verified purchases. If you want to include all reviews (even non-verified ones) for the ordering:
select product_id
from reviews
group by product_id
having count(*) filter (where verified_purchase) > 100
order by avg( (star_rating = 5)::int ) desc;
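To see the averaging trick on a handful of rows, here is a small sketch of the second query in an in-memory SQLite database via Python's sqlite3. The sample rows, the 0/1 encoding of verified_purchase and the threshold MIN_VERIFIED = 2 (standing in for 100) are made up for illustration. SQLite evaluates star_rating = 5 to 0 or 1 directly, so the PostgreSQL ::int cast is not needed here, and SUM(verified_purchase) takes the place of count(*) filter (...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (product_id TEXT, star_rating INTEGER, verified_purchase INTEGER)"
)
rows = (
    [("A", 5, 1)] * 3 + [("A", 3, 1)]        # A: 4 verified reviews, 3 of 4 are 5-star
    + [("B", 5, 1)] + [("B", 1, 1)] * 2      # B: 3 verified reviews, 1 of 3 is 5-star
    + [("C", 5, 1)] + [("C", 5, 0)] * 4      # C: only 1 verified review
)
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)

MIN_VERIFIED = 2  # hypothetical stand-in for the 100 in the question

query = """
SELECT product_id,
       AVG(star_rating = 5) AS five_star_fraction   -- mean of 0/1 flags
FROM reviews
GROUP BY product_id
HAVING SUM(verified_purchase) > ?                   -- number of verified reviews
ORDER BY five_star_fraction DESC
"""
result = conn.execute(query, (MIN_VERIFIED,)).fetchall()
for product_id, fraction in result:
    print(product_id, fraction)
```

Product C is dropped by the HAVING clause because it has only one verified review, even though all of its reviews are 5-star.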

Cross joins in results

Mapping: A single address id can have different tracking ids. Each tracking id and each address id will have distinct lat and long pairs. Each tracking id can have multiple route ids, although most of the time a tracking id maps to a single route id.
Update: The tracking ids that I am selecting from T1_2 may or may not exist in the other tables. Also, there are no duplicates in any of the temp tables I am using for the final select statement (based on the key value).
I am having a problem with the results of the following query. The query is supposed to produce metrics for distance deviations of delivery points from addresses. It's performing some cross joins on the columns, and hence it returns more rows than it should. I know this is related to granularity and is a basic mistake, but it's hard for me to find where I went wrong. If someone can give me some pointers, please do. A subset of the results has been attached as a link, and I have also highlighted a sample tracking id which should have come up only once (with only one route id). The results should include address ids repeating as many times as there are distinct tracking_ids, which in turn should be in sync with the no_pkg column. The query is also attached for reference.
Results subset
CREATE OR REPLACE FUNCTION f_stop_distance (Float, Float, Float, Float) /* This calculates distance in meters between two sets of lat and long */
RETURNS FLOAT
IMMUTABLE
AS $$
SELECT
2 * 6373000 * ASIN( SQRT( ( SIN( RADIANS(($3 - $1) / 2) ) ) ^ 2 + COS(RADIANS($1)) * COS(RADIANS($3)) * (SIN(RADIANS(($4 - $2) / 2))) ^ 2))
$$ LANGUAGE sql
;
CREATE TEMPORARY TABLE T1 AS /* This is to get top 1000 address ids which are unique identifiers for addresses in terms of orders frequency which is decided by number of distinct ordering order ids */
SELECT destination_address_id
,COUNT(DISTINCT ordering_order_id)a
,COUNT(DISTINCT tracking_id) no_pkg
FROM lmaa_pm.perfectmile_onroad_events_na
where shipment_status = 'DELIVERED'
AND delivery_station_code = 'DCH1'
AND event_day BETWEEN '2018-12-01' AND '2018-12-31'
AND tracking_id IS NOT NULL
GROUP BY destination_address_id,delivery_station_code
ORDER BY a DESC
LIMIT 1000
;
CREATE TEMPORARY TABLE T1_2 AS /* This is to get tracking ids corresponding to those top 1000 address ids */
SELECT DISTINCT destination_address_id
,tracking_id
FROM lmaa_pm.perfectmile_onroad_events_na
WHERE destination_address_id IN (SELECT destination_address_id FROM T1)
AND event_day BETWEEN '2018-12-01' AND '2018-12-31'
AND shipment_status = 'DELIVERED'
AND delivery_station_code = 'DCH1'
AND tracking_id IS NOT NULL
GROUP BY 1,2
;
CREATE TEMPORARY TABLE T2 AS /* This is to get lat long pairs for addresses and delivery point respectively */
SELECT DISTINCT gdd.lat1
,gdd.long1
,gdd.external_address_id destination_address_id
,gdd.tracking_id
,gdd.actual_lat
,gdd.actual_long
,ROW_NUMBER() OVER(PARTITION BY tracking_id ORDER BY deliverydate DESC) rn /* This is to avoid duplicates since this table contains duplicates */
FROM gtech.geocoding_data_daily_na gdd
WHERE gdd.shipment_status_id in (51,'DELIVERED')
AND tracking_id IN(SELECT tracking_id FROM T1_2)
AND confidence1 = 'high'
AND gdd.station_code='DCH1'
AND deliverydate BETWEEN '2018-12-01' AND '2018-12-31'
AND actual_lat IS NOT NULL
AND actual_long IS NOT NULL
;
CREATE TEMPORARY TABLE T2_2 AS
SELECT *
FROM T2
WHERE rn = 1
;
CREATE TEMPORARY TABLE T3 AS
SELECT T2_2.lat1
,T2_2.long1
,T2_2.actual_lat
,T2_2.actual_long
,T2_2.tracking_id
,T2_2.destination_address_id
,CASE /* This function is for identifying distance deviations in the order of 0 - 10 metres, 10-20 metres and so on */
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) <=10 THEN '0_to_10'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) >10
and f_stop_distance(lat1,long1,actual_lat,actual_long) <=20 THEN '10_to_20'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long)>20
and f_stop_distance(lat1,long1,actual_lat,actual_long) <=50 THEN '20_to_50'
WHEN f_stop_distance(lat1,long1,actual_lat,actual_long) >50 THEN 'gt_50'
END AS Dev_from_address
FROM T2_2
ORDER BY T2_2.tracking_id
;
CREATE TEMPORARY TABLE T4 AS /* Doing some percentage calculations based on the new buckets created in the previous temp table namely percentage calculations out of total */
SELECT SUM(CASE WHEN Dev_from_address = '0_to_10' THEN 1 ELSE 0 END)a
,SUM(CASE WHEN Dev_from_address = '10_to_20' THEN 1 ELSE 0 END)b
,SUM(CASE WHEN Dev_from_address = '20_to_50' THEN 1 ELSE 0 END)c
,SUM(CASE WHEN Dev_from_address = 'gt_50' THEN 1 ELSE 0 END)d
,tracking_id
,(a/(a+b+c+d)::DECIMAL(10,2) * 100) AS e
,(b/(a+b+c+d)::DECIMAL(10,2) * 100) AS f
,(c/(a+b+c+d)::DECIMAL(10,2) * 100) AS g
,(d/(a+b+c+d)::DECIMAL(10,2) * 100) AS h
FROM T3
GROUP BY tracking_id
;
CREATE TEMPORARY TABLE T5 AS /* adding info for route id to the existing data */
SELECT DISTINCT route_id
,tracking_id
,ROW_NUMBER() OVER (PARTITION BY tracking_id ORDER BY DATE DESC) rnnn /* to avoid duplicates */
FROM omw.route_actuals_na
WHERE tracking_id IN (SELECT tracking_id FROM T1_2)
AND stop_type = 'Dropoff'
AND scan_status = 'DELIVERED'
;
CREATE TEMPORARY TABLE T5_final AS
SELECT *
FROM T5
WHERE rnnn = 1
;
/* final select */
SELECT DISTINCT T1_2.destination_address_id
,T3.lat1
,T3.long1
,T3.actual_lat
,T3.actual_long
,T3.Dev_from_address
,T1_2.tracking_id
,T1.no_pkg
,T4.e
,T4.f
,T4.g
,T4.h
,T5_final.route_id
FROM T3
JOIN T4 ON T4.tracking_id = T3.tracking_id
JOIN T1 ON T1.destination_address_id = T3.destination_address_id
JOIN T1_2 ON T1_2.destination_address_id = T3.destination_address_id
JOIN T5_final ON T5_final.tracking_id = T3.tracking_id
ORDER BY T1_2.destination_address_id
Strictly speaking, there are no full cross joins there; however, you may have a many-to-many join.
To track this down, look at each of your joins to see whether any key value occurs more than once:
select tracking_id,count(*) from t4 group by 1 having count(*) > 1;
select destination_address_id,count(*) from t1 group by 1 having count(*) > 1;
select tracking_id ,count(*) from t5_final group by 1 having count(*) > 1;
Where values are returned, that could be your cause. This may help you identify where you have a many-to-many join.
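The same check can be applied to T1_2. In the posted query T1_2 holds one row per (destination_address_id, tracking_id) pair but is joined on destination_address_id alone, so it is a plausible source of the extra rows. The sketch below reproduces that effect on two made-up rows in an in-memory SQLite database via Python's sqlite3; the tables are stripped down to the join keys and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t3   (destination_address_id TEXT, tracking_id TEXT);
CREATE TABLE t1_2 (destination_address_id TEXT, tracking_id TEXT);
-- One address with two packages, present in both tables.
INSERT INTO t3   VALUES ('addr1', 'trk1'), ('addr1', 'trk2');
INSERT INTO t1_2 VALUES ('addr1', 'trk1'), ('addr1', 'trk2');
""")

# Join on the address only: every t3 row pairs with every t1_2 row of that address.
by_address = conn.execute("""
    SELECT COUNT(*) FROM t3
    JOIN t1_2 ON t1_2.destination_address_id = t3.destination_address_id
""").fetchone()[0]

# Join on address and tracking id: exactly one match per t3 row.
by_both = conn.execute("""
    SELECT COUNT(*) FROM t3
    JOIN t1_2 ON t1_2.destination_address_id = t3.destination_address_id
             AND t1_2.tracking_id = t3.tracking_id
""").fetchone()[0]

# The duplicate-key check from the answer, applied to t1_2.
dup_keys = conn.execute("""
    SELECT destination_address_id, COUNT(*) FROM t1_2
    GROUP BY 1 HAVING COUNT(*) > 1
""").fetchall()

print(by_address, by_both, dup_keys)  # 4 2 [('addr1', 2)]
```

If the check returns rows for T1_2 on the real data, adding the tracking_id condition to that join (as in the second query) would be one thing to try.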

SQL Query to split shipping cost over multiple items / rows

I am trying to create a query that will allow me to split a single figure over multiple rows.
For example, purchase order x may have 15 items allocated to it. Shipping cost is, say, 9.95. How can I calculate 1/15th of 9.95 and then update the cost price of the stock with 1/15th of the shipping cost?
Therefore the cost price of the item would increase from 4.50 to 5.16 (4.50 + 0.66).
Here's a solution for SQL Server:
update ol
set price = price + 5.0 / ol.LineCount
from [Order] o
join (
select *
, count(*) over (partition by OrderID) as LineCount
from OrderLine
) ol
on o.ID = ol.OrderID
where o.OrderNr = 'Ord1';
Live example at SQL Fiddle.
If you're using another DBMS, please update your post!
I have created a product table and updated the shipping price of items.
Hope that's what you are looking for.
INSERT INTO [TravelAgentDB].[dbo].[Product]
([ID]
,[Name]
,[Price]
,[shippingPrice])
VALUES
(1,'Nexus 7' , 250 , 20),
(2,'Nexus 7 case' , 50 , 20),
(3,'Nexus 7 headphone' , 20 , 20)
GO
select * from product
Declare @itemsCount int
Select @itemsCount = count(*) From product where id in (1,2,3)
Declare @totalShippingPrice Decimal(18,2) = 9.95
Declare @shippingPriceperItem Decimal(18,2) = @totalShippingPrice / @itemsCount
Update Product
set [shippingPrice] = @shippingPriceperItem
Where id in (1,2,3)
select * from product
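For the numbers in the question (9.95 shipping over 15 items at 4.50 each), the split can be sketched as a single UPDATE with a correlated count. This runs in an in-memory SQLite database via Python's sqlite3; the order_line table and its columns are hypothetical, since the question doesn't show its schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per stock item on a purchase order.
conn.execute(
    "CREATE TABLE order_line (id INTEGER PRIMARY KEY, order_id INTEGER, cost_price REAL)"
)
conn.executemany(
    "INSERT INTO order_line (order_id, cost_price) VALUES (?, ?)",
    [(1, 4.50)] * 15 + [(2, 10.00)] * 3,
)

SHIPPING = 9.95
ORDER_ID = 1

# Add an equal share of the shipping cost to every line of the chosen order.
conn.execute(
    """
    UPDATE order_line
    SET cost_price = cost_price + ? / (
        SELECT COUNT(*) FROM order_line AS ol2
        WHERE ol2.order_id = order_line.order_id
    )
    WHERE order_id = ?
    """,
    (SHIPPING, ORDER_ID),
)

prices = [r[0] for r in conn.execute(
    "SELECT cost_price FROM order_line WHERE order_id = 1")]
untouched = [r[0] for r in conn.execute(
    "SELECT cost_price FROM order_line WHERE order_id = 2")]
print(round(prices[0], 2), untouched[0])  # 5.16 10.0
```

The shares are stored unrounded here. If each share is rounded to 2 decimals before it is stored, 15 shares of 0.66 add up to 9.90 rather than 9.95, so the remainder has to be put somewhere.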

select least row per group in SQL

I am trying to select the minimum price for each condition category. I did some searching and wrote the code below. However, it shows null for the selected fields. Any solution?
SELECT Sales.Sale_ID, Sales.Sale_Price, Sales.Condition
FROM Items
LEFT JOIN Sales ON ( Items.Item_ID = Sales.Item_ID
AND Sales.Expires_DateTime > NOW( )
AND Sales.Sale_Price = (
SELECT MIN( s2.Sale_Price )
FROM Sales s2
WHERE Sales.`Condition` = s2.`Condition` ) )
WHERE Items.ISBN =9780077225957
A little more complicated solution, but one that includes your Sale_ID is below.
SELECT TOP 1 Sale_Price, Sale_ID, Condition
FROM Sales
WHERE Sale_Price IN (SELECT MIN(Sale_Price)
FROM Sales
WHERE
Expires_DateTime > NOW()
AND
Item_ID IN
(SELECT Item_ID FROM Items WHERE ISBN = 9780077225957)
GROUP BY Condition )
The 'TOP 1' is there in case more than 1 sale had the same minimum price and you only wanted one returned.
(internal query taken directly from @Michael Ames' answer)
If you don't need Sales.Sale_ID, this solution is simpler:
SELECT MIN(Sale_Price), Condition
FROM Sales
WHERE Expires_DateTime > NOW()
AND Item_ID IN
(SELECT Item_ID FROM Items WHERE ISBN = 9780077225957)
GROUP BY Condition
Good luck!
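If you need one row per condition and still want the Sale_ID, another common pattern is to join Sales back to the grouped minimums. This is only a sketch of that alternative, not the query from either answer: it runs in an in-memory SQLite database via Python's sqlite3, the rows are invented, and the expiry filter and the ISBN lookup are left out (the item is passed as a plain item_id) to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, item_id INTEGER,
                    sale_price REAL, "condition" TEXT);
INSERT INTO sales VALUES
    (1, 10, 25.0, 'new'),
    (2, 10, 20.0, 'new'),
    (3, 10, 12.0, 'used'),
    (4, 10, 15.0, 'used'),
    (5, 99,  1.0, 'used');   -- different item, must not show up
""")

query = """
SELECT s.sale_id, s.sale_price, s."condition"
FROM sales AS s
JOIN (
    -- Cheapest price per condition for the item.
    SELECT "condition", MIN(sale_price) AS min_price
    FROM sales
    WHERE item_id = ?
    GROUP BY "condition"
) AS m
  ON m."condition" = s."condition"
 AND m.min_price   = s.sale_price
WHERE s.item_id = ?
ORDER BY s."condition"
"""
ITEM_ID = 10  # hypothetical item
result = conn.execute(query, (ITEM_ID, ITEM_ID)).fetchall()
print(result)  # [(2, 20.0, 'new'), (3, 12.0, 'used')]
```

If two sales tie for the minimum within a condition, this returns both of them.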

Update and nested select statement?

I want to update the prices of products that have not been purchased for 1 year. How do I do that?
My current query is:
UPDATE product
SET price = price * 0.9
WHERE date_purchase > SYSDATE - 365
AND pid IN ([How do i select the items thats not been purchased in 1year??]);
I have 2 tables:
Product => pid, p_name, etc... (pid = product id, p_name = product name)
Purchase => pid, date_purchase, etc
I'd go with a NOT EXISTS as it makes the requirement more transparent.
update product
set price = price * 0.9
where not exists
(select 1 from PURCHASE pchase
WHERE pchase.pid = PRODUCT.pid
and pchase.date_purchase > add_months(sysdate,-12))
Of course, you would want to consider what to do with products that have only just been introduced (e.g. a week old) and never sold.
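Here is a quick way to try the NOT EXISTS version on three made-up products: one sold recently, one last sold over a year ago, one never sold. It is a sketch in an in-memory SQLite database via Python's sqlite3, so SQLite's date('now', '-12 months') stands in for Oracle's add_months(sysdate,-12), and dates are stored as ISO text (an assumption of this example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (pid INTEGER PRIMARY KEY, p_name TEXT, price REAL);
CREATE TABLE purchase (pid INTEGER, date_purchase TEXT);
INSERT INTO product VALUES
    (1, 'recent', 100.0), (2, 'stale', 100.0), (3, 'never sold', 100.0);
INSERT INTO purchase VALUES
    (1, date('now', '-10 days')),
    (2, date('now', '-400 days'));
""")

# Discount every product that has no purchase within the last 12 months.
conn.execute("""
UPDATE product
SET price = price * 0.9
WHERE NOT EXISTS (
    SELECT 1 FROM purchase
    WHERE purchase.pid = product.pid
      AND purchase.date_purchase > date('now', '-12 months')
)
""")

prices = dict(conn.execute("SELECT pid, price FROM product"))
print(prices)
```

Product 3 is discounted as well, which is the never-sold case mentioned above.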
I think this might come close: exclude every product that has a purchase within the last year.
update product
set price = price * 0.9
where pid NOT IN (
select pu.pid
from purchase pu
where pu.pid is not null
and pu.date_purchase >= (SYSDATE - 365)
);