calculate rate of attribute by id sql - sql

First, this is my table schema:
order_id, product_id, add_to_cart_order, reordered
My problem is to calculate the reorder rate per product. The "add_to_cart_order" column is irrelevant here, and I'm not sure whether "order_id" matters. "reordered" can take the values '1' and '0'.
For the moment, I can get the count of "reordered" rows by product_id with
SELECT
product_id,
COUNT(reordered)
FROM
train
WHERE
reordered = '1'
GROUP BY
product_id;
and the number of occurrences of each product with
SELECT
product_id, COUNT(*)
FROM
train
GROUP BY
product_id;
I tried
SELECT
t1.product_id,
COUNT(t1.product_id) / (SELECT COUNT(reordered)
FROM train t2
WHERE t2.reordered = '1'
AND t1.product_id = t2.product_id
GROUP BY product_id)
FROM
train t1
GROUP BY
t1.product_id;
But it takes too long (I don't know whether it's the right query, because I don't have results yet).

Is this what you are looking for?
SELECT Product_id, SUM(CASE WHEN reordered=1 THEN 1.0 ELSE 0 END ) /
COUNT(*) AS ReorderedRate
FROM
train
GROUP BY Product_id

Try this:
SELECT t1.product_id, SUM(CASE WHEN reordered = 1 THEN 1.0 ELSE 0 END) / COUNT(t1.product_id)
FROM train t1
GROUP BY t1.product_id;

I think the simplest method is to use AVG():
SELECT product_id,
AVG(CASE WHEN reordered = '1' THEN 1.0 ELSE 0 END)
FROM train
GROUP BY product_id;
If reordered is really a number that only takes on the values 0 and 1, then you can further simplify this to either:
SELECT product_id, AVG(reordered)
FROM train
GROUP BY product_id;
or:
SELECT product_id, AVG(reordered * 1.0)
FROM train
GROUP BY product_id;
The second is needed in databases where the average of an integer is returned as an integer.
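As a minimal sketch of the AVG() approach, the following runs the query against an in-memory SQLite database from Python. The sample rows and the function name reorder_rates are made up for illustration; only the table name train and its columns come from the question.

```python
import sqlite3

def reorder_rates(rows):
    """Return {product_id: reorder rate} for (product_id, reordered) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE train (product_id INTEGER, reordered INTEGER)")
    conn.executemany("INSERT INTO train VALUES (?, ?)", rows)
    # "* 1.0" guards against engines whose AVG of integers returns an integer.
    query = """
        SELECT product_id, AVG(reordered * 1.0) AS reorder_rate
        FROM train
        GROUP BY product_id
        ORDER BY product_id
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result

# Hypothetical sample: product 1 reordered 3 times out of 4, product 2 never.
sample = [(1, 1), (1, 0), (1, 1), (1, 1), (2, 0), (2, 0)]
print(reorder_rates(sample))
```

Because the table is scanned once and grouped once, this avoids the per-row correlated subquery that made the original attempt slow.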

This computes, for each product_id:
cnt_prd: the number of rows in train
cnt_prd_reord: the number of rows in train that were reordered
SELECT t1.product_id, COUNT(t1.product_id) as cnt_prd,
COUNT(case when t1.reordered='1' then 1 else NULL end ) as cnt_prd_reord
from train t1 group by t1.product_id;
You can then compute the rate from that result:
select st.product_id , st.cnt_prd , st.cnt_prd_reord * 1.0 / st.cnt_prd
from (
SELECT t1.product_id, COUNT(t1.product_id) as cnt_prd,
COUNT(case when t1.reordered='1' then 1 else NULL end ) as cnt_prd_reord
from train t1 group by t1.product_id
) as st ;

Related

How to select the max of revenue for each user_id with row number in SQL?

In my dataset, each user_id has several numbered rows (from 1 to n), and each row has a specific revenue. I want to select the maximum revenue for each user_id, together with the row number that this revenue belongs to. The query should return the highlighted rows.
One method is a correlated subquery:
select t.*
from t
where t.revenue = (select max(t2.revenue) from t t2 where t2.user_id = t.user_id);
If there are ties for the maximum, this returns all the highest value rows.
Alternatively, a window function can flag the maximum-revenue rows:
select *,
case when revenue = max(revenue) over (partition by user_id) then 1 else 0 end as highlight
from T;
Or join against the per-user maximum:
select tt.*
from #tbl tt
join (select user_Id, max(revenue) as revenue from #tbl group by user_Id) tm on tt.user_Id = tm.user_Id and tt.revenue = tm.revenue;
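The correlated-subquery approach can be tried out with a small sketch like the one below, which uses an in-memory SQLite database. The column name row_num, the sample data, and the function name top_revenue_rows are assumptions made for illustration; the question does not give the real schema.

```python
import sqlite3

def top_revenue_rows(rows):
    """Return the (user_id, row_num, revenue) rows holding each user's maximum revenue."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (user_id INTEGER, row_num INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # Keep a row only if its revenue equals the maximum for the same user.
    query = """
        SELECT t.user_id, t.row_num, t.revenue
        FROM t
        WHERE t.revenue = (SELECT MAX(t2.revenue)
                           FROM t t2
                           WHERE t2.user_id = t.user_id)
        ORDER BY t.user_id, t.row_num
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: user 2 has a tie, so both of its rows are returned.
sample = [(1, 1, 10.0), (1, 2, 30.0), (1, 3, 20.0), (2, 1, 5.0), (2, 2, 5.0)]
print(top_revenue_rows(sample))
```

If only one row per user is wanted even when there are ties, a tie-breaking rule (for example the lowest row number) would have to be added.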

sql - select all rows that have all multiple same cols

I have a table with 4 columns.
date
store_id
product_id
label_id
and I need to find all store_ids that have all product_ids with the same label_id (for example 4) in one day.
For example:
store_id | label_id | product_id | date
4        | 4        | 5          | 9/2
5        | 4        | 7          | 9/2
4        | 3        | 12         | 9/2
4        | 4        | 7          | 9/2
So it should return 4, because it's the only store that contains all possible products with label 4 on one day.
I have tried something like this:
(select store_id, date
from table
where label_id = 4
group by store_id, date
order by date)
I don't know how to write the outer query. I tried:
select * from table
where product_id = all(Inner query)
but it didn't work.
Thanks
It is unclear from your question whether the labels are specific to a given day or through the entire period. But a variation of Tim's answer seems appropriate. For any label:
SELECT t.date, t.label_id, t.store_id
FROM t
GROUP BY t.date, t.label_id, t.store_id
HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
FROM t t2
WHERE t2.label_id = t.label_id
);
For a particular label:
SELECT t.date, t.store_id
FROM t
WHERE t.label_id = 4
GROUP BY t.date, t.store_id
HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
FROM t t2
WHERE t2.label_id = 4
);
If the labels are specific to the date, then you need that comparison in the outer queries as well.
Here is one way:
SELECT date, store_id
FROM yourTable t1
GROUP BY date, store_id
HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(DISTINCT product_id)
FROM yourTable t2
WHERE t2.date = t1.date)
ORDER BY date, store_id;
This query reads in a pretty straightforward way: it finds every store, on some date, whose distinct product count is the same as the distinct product count on the same day across all stores.
I'd probably aggregate to lists of products in a string or array:
with products_per_day_and_store as
(
select
store_id,
date,
string_agg(distinct product_id::text, ',' order by product_id::text) as products
from mytable
where label_id = 4
group by store_id, date
)
, products_per_day as
(
select
date,
string_agg(distinct product_id::text, ',' order by product_id::text) as products
from mytable
where label_id = 4
group by date
)
select distinct ppdas.store_id
from products_per_day_and_store ppdas
join products_per_day ppd using (date, products);
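The COUNT(DISTINCT ...) comparison for a particular label can be checked against the sample rows from the question with a sketch like this one, run on an in-memory SQLite database. The table name sales and the function name stores_with_all_products are hypothetical, and the sketch assumes that the set of products is evaluated per date.

```python
import sqlite3

def stores_with_all_products(rows, label):
    """Return (date, store_id) pairs where the store carries every product of `label` seen that day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (date TEXT, store_id INTEGER, product_id INTEGER, label_id INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    # Compare each store's distinct product count with the day's overall distinct count.
    query = """
        SELECT s.date, s.store_id
        FROM sales s
        WHERE s.label_id = :label
        GROUP BY s.date, s.store_id
        HAVING COUNT(DISTINCT s.product_id) = (
            SELECT COUNT(DISTINCT s2.product_id)
            FROM sales s2
            WHERE s2.label_id = :label AND s2.date = s.date
        )
        ORDER BY s.date, s.store_id
    """
    result = conn.execute(query, {"label": label}).fetchall()
    conn.close()
    return result

# Rows from the question, as (date, store_id, product_id, label_id).
sample = [("9/2", 4, 5, 4), ("9/2", 5, 7, 4), ("9/2", 4, 12, 3), ("9/2", 4, 7, 4)]
print(stores_with_all_products(sample, 4))
```

Store 4 qualifies because it has both label-4 products (5 and 7) on 9/2, while store 5 has only product 7.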

GROUP BY after CASE WHEN

I am trying to create a table from a join and summing some fields based on id. This part is working great. I am also trying to add an additional column and using a case when statement I want to populate it.
Here is the script
CREATE TABLE TABLE1
AS
SELECT ID, IDC, SUM(AMOUNT) PRICE, SUM(COST) COST, SUM(AMOUNT-COST) PROFIT,
CASE PROFIT
WHEN PROFIT < 1000 THEN 'Low'
WHEN PROFIT < 5000 THEN 'Medium'
ELSE 'High'
END AS PROFITLEVEL
FROM
(SELECT DISTINCT ID, IDC, AMOUNT, COST
FROM ORDER_ITEMS
LEFT JOIN ORDERS
ON ID = IDC)
GROUP BY ID, IDC;
This, however, returns an ORA-00905: missing keyword error.
Any help would be appreciated
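You are using CASE incorrectly: you mix the simple form (CASE PROFIT WHEN value ...) with the searched form (CASE WHEN condition ...). In addition, you try to use the alias PROFIT at the same level where you define it.
You need to edit your CASE to use the searched form, with the expression that yields the profit instead of the alias PROFIT: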
CREATE TABLE TABLE1 AS
SELECT ID,
IDC,
SUM(AMOUNT) PRICE,
SUM(COST) COST,
SUM(AMOUNT - COST) PROFIT,
CASE
WHEN SUM(AMOUNT - COST) < 1000 THEN 'Low'
WHEN SUM(AMOUNT - COST) < 5000 THEN 'Medium'
ELSE 'High'
END AS PROFITLEVEL
FROM (SELECT DISTINCT ID,
IDC,
AMOUNT,
COST
FROM ORDER_ITEMS LEFT JOIN ORDERS ON ID = IDC)
GROUP BY ID, IDC;
The way you tried to use the CASE is useful if you need to check single values; for example:
select level,
case level
when 1 then 'one'
when 2 then 'two'
else 'other'
end
from dual
connect by level <=3
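To see the searched CASE form working on an aggregate, here is a minimal sketch that runs the same pattern on an in-memory SQLite database rather than Oracle. The simplified order_items table (without the join), the sample amounts, and the function name profit_levels are assumptions for illustration only.

```python
import sqlite3

def profit_levels(rows):
    """Return (id, profit, level) per id, classifying the summed profit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_items (id INTEGER, amount INTEGER, cost INTEGER)")
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", rows)
    # Searched CASE: each WHEN holds a full condition on the aggregate,
    # repeated explicitly because the alias is not visible at this level.
    query = """
        SELECT id,
               SUM(amount - cost) AS profit,
               CASE
                   WHEN SUM(amount - cost) < 1000 THEN 'Low'
                   WHEN SUM(amount - cost) < 5000 THEN 'Medium'
                   ELSE 'High'
               END AS profit_level
        FROM order_items
        GROUP BY id
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical rows: (id, amount, cost).
sample = [(1, 500, 100), (1, 300, 200), (2, 4000, 1000), (3, 9000, 1000), (3, 100, 0)]
print(profit_levels(sample))
```

The CASE expression does not need to appear in GROUP BY, since it depends only on aggregates of the grouped rows.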

How to query specific values for some columns and sum of values in others SQL

I'm trying to query some data from SQL such that it sums some columns, gets the max of another column and the corresponding row for a third column. For example,
dataset:
shares | date     | price
100    | 05/13/16 | 20.4
200    | 05/15/16 | 21.2
300    | 06/12/16 | 19.3
400    | 02/22/16 | 20.0
I want my output to be:
shares | date     | price
1000   | 06/12/16 | 19.3
The shares have been summed up, the date is max(date), and the price is the price at max(date).
So far, I have:
select sum(shares), max(date), max(price)
but that gives me an incorrect price.
EDIT:
I realize I was unclear in my original post: all the other relevant data is in one table, and the price is in another. My full code is:
select id, stock, side, exchange, max(startdate), max(enddate),
sum(shares), sum(execution_price*shares)/sum(shares), max(limitprice), max(price)
from table1 t1
INNER JOIN table2 t2 on t2.id = t1.id
where location = 'CHICAGO' and startdate > '1/1/2016' and order_type = 'limit'
group by id, stock, side, exchange
You can do this with window functions and aggregation. Here is an example:
select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
from (select t.*, row_number() over (order by date desc) as seqnum
from t
) t;
EDIT:
If the results that you are looking at are in fact the result of a query, you can do:
with t as (<your query here>)
select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
from (select t.*, row_number() over (order by date desc) as seqnum
from t
) t;
Here's one way to do it. The join would obviously also include the ticker symbol for the share:
select
a.sum_share,
a.max_date,
b.price
FROM
(
select ticker , sum(shares) sum_share, max(date) max_date from table where ticker = 'MSFT' group by ticker
) a
inner join table b on a.max_date = b.date and a.ticker = b.ticker
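The join-on-maximum-date idea can be sketched against the sample data from the question as follows, using an in-memory SQLite database. The table name trades, the column name trade_date, and the function name summarize are hypothetical, and the dates are rewritten in ISO format (an assumption) so that MAX() on text compares them chronologically.

```python
import sqlite3

def summarize(rows):
    """Return (total shares, latest date, price on the latest date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (shares INTEGER, trade_date TEXT, price REAL)")
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)
    # Aggregate once, then join back to pick up the price of the latest row.
    query = """
        SELECT a.sum_shares, a.max_date, b.price
        FROM (SELECT SUM(shares) AS sum_shares, MAX(trade_date) AS max_date
              FROM trades) a
        JOIN trades b ON b.trade_date = a.max_date
    """
    result = conn.execute(query).fetchone()
    conn.close()
    return result

# The question's rows with ISO-formatted dates.
sample = [
    (100, "2016-05-13", 20.4),
    (200, "2016-05-15", 21.2),
    (300, "2016-06-12", 19.3),
    (400, "2016-02-22", 20.0),
]
print(summarize(sample))
```

If several rows share the latest date, the join produces one output row per such row, so a tie-breaking rule would be needed in that case.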

sqlite: get the average of the top X% for every item

Is it possible to get the average of the top X% items in a group?
For example:
I have a table with an item_id, a timestamp, and a price column. The output should be grouped by item_id and timestamp, and the price column should be averaged. Only the lowest X% of prices within each group should be used for the average.
I've found similar questions (How to select top x records for every group) but this won't work with sqlite.
Getting the top n records within each group requires counting. Assuming that there are no duplicates, the following query returns the number of records for an item:
select t.*,
(select count(*) from t t2 where t2.item_id = t.item_id
) as NumPrices
from t
This is called a correlated subquery. Now, let's extend the idea to include a rank and then calculate the average for the right group:
select item_id, avg(price)
from (select t.*,
(select count(*) from t t2 where t2.item_id = t.item_id
) as NumPrices,
(select count(*) from t t2 where t2.item_id = t.item_id and t2.price <= t.price
) as PriceRank
from t
) t
where (100.0*PriceRank / NumPrices) <= X
group by item_id
To improve performance, you will want an index on (item_id, price).
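A minimal sketch of this rank-based query, run from Python against an in-memory SQLite database, might look like the following. The sample prices, the index name, and the function name avg_lowest_percent are assumptions for illustration, and X is passed in as a bound parameter. Like the query above, it groups by item_id only and assumes no duplicate prices within an item.

```python
import sqlite3

def avg_lowest_percent(rows, x):
    """Return (item_id, average of the lowest x% of prices) per item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (item_id INTEGER, price REAL)")
    conn.execute("CREATE INDEX idx_item_price ON t (item_id, price)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # num_prices counts the rows of the item; price_rank counts rows priced at or below this one.
    query = """
        SELECT item_id, AVG(price)
        FROM (SELECT t.*,
                     (SELECT COUNT(*) FROM t t2
                      WHERE t2.item_id = t.item_id) AS num_prices,
                     (SELECT COUNT(*) FROM t t2
                      WHERE t2.item_id = t.item_id AND t2.price <= t.price) AS price_rank
              FROM t) ranked
        WHERE 100.0 * price_rank / num_prices <= ?
        GROUP BY item_id
        ORDER BY item_id
    """
    result = conn.execute(query, (x,)).fetchall()
    conn.close()
    return result

# Hypothetical prices: item 1 has four rows, item 2 has two.
sample = [(1, 10.0), (1, 20.0), (1, 30.0), (1, 40.0), (2, 5.0), (2, 15.0)]
print(avg_lowest_percent(sample, 50))
```

With X = 50, item 1 keeps the prices 10 and 20, and item 2 keeps only the price 5.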
To get the count of records in the group with ID I and timestamp T, use this query:
SELECT COUNT(*)
FROM MyTable
WHERE item_id = I
AND timestamp = T
To get the limit, multiply with X, and use ROUND/CAST to convert to an integer:
SELECT CAST(ROUND(COUNT(*) * X / 100.0) AS INTEGER)
FROM MyTable
WHERE item_id = I
AND timestamp = T
To get all records in a specific group that are inside that limit, order the records in the group by price, and limit the returned count:
SELECT *
FROM MyTable
WHERE item_id = I
AND timestamp = T
ORDER BY price
LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100.0) AS INTEGER)
FROM MyTable
WHERE item_id = I
AND timestamp = T)
In theory, to get the group averages, add GROUP BY around that:
SELECT item_id,
timestamp,
(SELECT AVG(price)
FROM (SELECT price
FROM MyTable T2
WHERE T2.item_id = T1.item_id
AND T2.timestamp = T1.timestamp
ORDER BY price
LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100.0) AS INTEGER)
FROM MyTable T3
WHERE T3.item_id = T1.item_id
AND T3.timestamp = T1.timestamp)
)
) AS AvgPriceLowestX
FROM MyTable T1
GROUP BY item_id,
timestamp
However, it appears that SQLite does not allow access to correlation variables from inside the LIMIT clause, so this does not work in practice.
You would have to get the IDs of all groups (SELECT DISTINCT item_id, timestamp FROM MyTable) and execute the third query above for each group.
In any case, ensure that you have one index on the three columns item_id, timestamp, and price to get good performance.
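The per-group approach described above (fetch the distinct groups, then run the limited query once per group) could be driven from application code roughly like this. It is only a sketch using Python's sqlite3 module; the helper names make_sample_db and avg_lowest_percent_per_group, the index name, and the sample rows are hypothetical.

```python
import sqlite3

def make_sample_db():
    """Build an in-memory MyTable with made-up rows and the suggested three-column index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")
    conn.execute("CREATE INDEX idx_group_price ON MyTable (item_id, timestamp, price)")
    rows = [(1, 100, 10.0), (1, 100, 20.0), (1, 100, 30.0), (1, 100, 40.0),
            (1, 200, 7.0), (1, 200, 9.0)]
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    return conn

def avg_lowest_percent_per_group(conn, x):
    """Return {(item_id, timestamp): average of the lowest x% of prices in that group}."""
    groups = conn.execute(
        "SELECT DISTINCT item_id, timestamp FROM MyTable ORDER BY item_id, timestamp"
    ).fetchall()
    results = {}
    for item_id, ts in groups:
        # Compute the row limit for this group in SQL, as in the second query above.
        (limit,) = conn.execute(
            "SELECT CAST(ROUND(COUNT(*) * ? / 100.0) AS INTEGER) "
            "FROM MyTable WHERE item_id = ? AND timestamp = ?",
            (x, item_id, ts),
        ).fetchone()
        # The limit is bound as a plain parameter, so no correlated LIMIT is needed.
        (avg_price,) = conn.execute(
            "SELECT AVG(price) FROM (SELECT price FROM MyTable "
            "WHERE item_id = ? AND timestamp = ? ORDER BY price LIMIT ?)",
            (item_id, ts, limit),
        ).fetchone()
        results[(item_id, ts)] = avg_price
    return results

conn = make_sample_db()
print(avg_lowest_percent_per_group(conn, 50))
```

A group whose computed limit rounds to 0 yields None, because AVG over no rows is NULL.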