Remove duplicates from SQL Window function - sql

I'm trying to sum values inside a window function but I can't figure out how to prevent summing duplicates. Below is a snippet of the results I have right now. For the last column I want to calculate REG_MOVEMENT summed across unique STORE_IDs and then divide it by the number of unique stores. This column should be 5603.5 ((9359 + 1848)/2), since there are 3 rows with the same STORE_ID and one row with a different one.
KEY_ID  PRODUCT_ID  STORE_ID  REG_MOVEMENT  (No column name)
154     5214266     28002     9359          7481.25
155     5214266     28002     9359          7481.25
156     5214266     28002     9359          7481.25
173     5214266     28005     1848          7481.25
My current code is
SELECT
KEY_ID,
PRODUCT_ID,
STORE_ID,
REG_MOVEMENT,
SUM(REG_MOVEMENT) OVER(PARTITION BY PRODUCT_ID) / COUNT(STORE_ID) OVER(PARTITION BY PRODUCT_ID)
FROM yourTable

You need a distinct count in the denominator, but SQL Server does not allow this in a single count window function call. As a workaround, we can use DENSE_RANK:
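WITH cte AS (
SELECT *,
DENSE_RANK() OVER (PARTITION BY PRODUCT_ID ORDER BY STORE_ID) dr,
ROW_NUMBER() OVER (PARTITION BY PRODUCT_ID, STORE_ID ORDER BY KEY_ID) rn
FROM yourTable
)
SELECT
KEY_ID,
PRODUCT_ID,
STORE_ID,
REG_MOVEMENT,
-- rn = 1 keeps one row per store, so each store's REG_MOVEMENT is summed once;
-- 1.0 * guards against integer division if REG_MOVEMENT is an integer column
1.0 * SUM(CASE WHEN rn = 1 THEN REG_MOVEMENT END) OVER (PARTITION BY PRODUCT_ID) /
MAX(dr) OVER (PARTITION BY PRODUCT_ID) AS new_col
FROM cte
ORDER BY PRODUCT_ID, STORE_ID;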
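A runnable sketch of this idea on the four sample rows from the question. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions). DENSE_RANK supplies the distinct store count, and a ROW_NUMBER flag, added here for illustration, lets each store's REG_MOVEMENT enter the sum only once.

```python
import sqlite3

# Sample rows from the question; the table name yourTable follows the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (KEY_ID INT, PRODUCT_ID INT, STORE_ID INT, REG_MOVEMENT INT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (154, 5214266, 28002, 9359),
        (155, 5214266, 28002, 9359),
        (156, 5214266, 28002, 9359),
        (173, 5214266, 28005, 1848),
    ],
)

query = """
WITH cte AS (
    SELECT *,
           -- dr numbers the distinct stores within a product: 1, 1, 1, 2
           DENSE_RANK() OVER (PARTITION BY PRODUCT_ID ORDER BY STORE_ID) AS dr,
           -- rn = 1 marks exactly one row per (product, store)
           ROW_NUMBER() OVER (PARTITION BY PRODUCT_ID, STORE_ID ORDER BY KEY_ID) AS rn
    FROM yourTable
)
SELECT KEY_ID, STORE_ID,
       1.0 * SUM(CASE WHEN rn = 1 THEN REG_MOVEMENT END) OVER (PARTITION BY PRODUCT_ID)
           / MAX(dr) OVER (PARTITION BY PRODUCT_ID) AS new_col
FROM cte
ORDER BY KEY_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # every row carries new_col = (9359 + 1848) / 2 = 5603.5
```

The highest dense rank in a partition equals the number of distinct STORE_ID values in it, which is why MAX(dr) can replace COUNT(DISTINCT STORE_ID) here.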

One way with a subquery to de-duplicate (store_id, reg_movement) rows:
select
KEY_ID, PRODUCT_ID, STORE_ID, REG_MOVEMENT,
(select avg(reg_movement)
from (select distinct store_id, reg_movement
from Tbl) Unq
) As NewCol
from Tbl
(Tbl is yourtable)

SELECT AVG(reg_movement)
FROM (
SELECT DISTINCT store_id,
CAST(reg_movement AS FLOAT) AS reg_movement
FROM Table1
) a

Related

Oracle SQL Count distinct values in a certain column

I am trying to query a table with a certain logic and I want to remove the records which have a count of 2 or more distinct values in PERSON_ID column. I cannot find an appropriate window query to achieve this. I already tried using:
SELECT
CUSTOMER_ID, PERSON_ID, CODE,
DENSE_RANK() OVER (PARTITION BY CUSTOMER_iD, PERSON_ID ORDER BY PERSON_ID ASC) AS NR
FROM TBL_1;
But I get the following result:
I want to achieve the result below, which counts the distinct values within PERSON_ID column based on a certain CUSTOMER_ID. In my case Customer "444333" would be a record which I want to remove because it has 2 distinct Person_Id's
Here is what you need:
SELECT
customer_id, count(distinct PERSON_ID) distinct_person_count
FROM TBL_1
group by customer_id
And if you want to show it for each row, you can join it back to the table:
select * from TBL_1 t
join (
select customer_id, count(distinct PERSON_ID) distinct_person_count
from TBL_1
group by customer_id
) tt
on t.customer_id = tt.customer_id
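Note: several databases (SQL Server and PostgreSQL, for example) do not allow DISTINCT inside window functions, so the join form above is the portable one.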
If you want the distinct count on each row, then use a window function:
select t.*,
count(distinct person_id) over (partition by customer_id)
from t;
Oracle does support distinct in window functions.
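A small sketch of the join form, extended with a filter that drops customers with two or more distinct PERSON_ID values, which is what the question asks for. It runs on SQLite via Python's sqlite3 rather than Oracle (an assumption, chosen so the example is self-contained), and the rows are hypothetical apart from customer "444333", which the question names as the one to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL_1 (CUSTOMER_ID TEXT, PERSON_ID TEXT, CODE TEXT)")
# Hypothetical rows: customer 444333 has two distinct PERSON_ID values.
conn.executemany(
    "INSERT INTO TBL_1 VALUES (?, ?, ?)",
    [
        ("111222", "P1", "A"),
        ("111222", "P1", "B"),
        ("444333", "P2", "A"),
        ("444333", "P3", "A"),
    ],
)

query = """
SELECT t.CUSTOMER_ID, t.PERSON_ID, t.CODE, tt.distinct_person_count
FROM TBL_1 t
JOIN (
    SELECT CUSTOMER_ID, COUNT(DISTINCT PERSON_ID) AS distinct_person_count
    FROM TBL_1
    GROUP BY CUSTOMER_ID
) tt ON t.CUSTOMER_ID = tt.CUSTOMER_ID
WHERE tt.distinct_person_count < 2   -- drop customers with 2 or more persons
ORDER BY t.CUSTOMER_ID, t.CODE
"""
kept_rows = conn.execute(query).fetchall()
print(kept_rows)
```

Only the rows of customer 111222 remain, each with a distinct person count of 1.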

Selecting City from Customer ID in SQL

Customers have ordered from different cities, so we have multiple cities against the same customer_id. For each customer id I want to display the city that occurs the maximum number of times; where a customer has placed the same number of orders from multiple cities, the city of the most recent order should be selected. I have tried something like
SELECT customer_id,delivery_city,COUNT(DISTINCT delivery_city)
FROM analytics.f_order
GROUP BY customer_id,delivery_city
HAVING COUNT(DISTINCT delivery_city) > 1
WITH cte as (
SELECT customer_id,
delivery_city,
COUNT(delivery_city) as city_count,
MAX(order_date) as last_order
FROM analytics.f_order
GROUP BY customer_id, delivery_city
), ranking as (
SELECT *, row_number() over (partition by customer_id
order by city_count DESC, last_order DESC) as rn
FROM cte
)
SELECT *
FROM ranking
WHERE rn = 1
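To see the tie-break at work, here is a sketch of the same two-step query on a handful of hypothetical orders. It runs on SQLite through Python's sqlite3 (an assumption for the sake of a self-contained example), so the table is called f_order instead of analytics.f_order and order_date is stored as ISO text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# f_order stands in for analytics.f_order; all rows are hypothetical.
conn.execute("CREATE TABLE f_order (customer_id INT, delivery_city TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO f_order VALUES (?, ?, ?)",
    [
        (1, "Pune", "2020-01-01"),
        (1, "Pune", "2020-01-05"),
        (1, "Delhi", "2020-02-01"),
        (2, "Mumbai", "2020-01-10"),   # customer 2: one order per city, a tie
        (2, "Chennai", "2020-03-15"),  # the later order wins the tie
    ],
)

query = """
WITH cte AS (
    SELECT customer_id, delivery_city,
           COUNT(*) AS city_count,
           MAX(order_date) AS last_order
    FROM f_order
    GROUP BY customer_id, delivery_city
), ranking AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id
                                 ORDER BY city_count DESC, last_order DESC) AS rn
    FROM cte
)
SELECT customer_id, delivery_city FROM ranking WHERE rn = 1 ORDER BY customer_id
"""
top_cities = conn.execute(query).fetchall()
print(top_cities)
```

Customer 1 gets the city with the most orders, while customer 2, whose cities tie on count, gets the city of the latest order.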
select customer_id,
delivery_city,
amount
from
(
select t.*,
rank() over (partition by customer_id order by amount desc) as rnk
from (
-- COUNT(*) counts orders per city; COUNT(DISTINCT delivery_city) would always be 1 in this grouping
SELECT customer_id,
delivery_city,
COUNT(*) as amount
FROM analytics.f_order
GROUP BY customer_id, delivery_city
) t
) r
where rnk = 1

query with partition and count

Given the following table (it records users' item viewing history with session)
create table view_log (
server_time timestamp,
device char(2),
session_id char(10),
uid char(7),
item_id char(7)
);
I'm trying to understand what the following code does..
create table coo_cs as
select
item_id,
session_id,
count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;
I've tried to break down the line with the partition to understand what it's doing, but then it emits the error "DISTINCT is not implemented for window functions".
I understand basic partition and group by but can't make sense of the above sql..
edit
There's a rather large data set for testing:
http://pakdd2017.recobell.io/site_view_log_small.csv000.gz
Some databases do not (yet) support count(distinct) as a window function. For this query, the count(distinct) is not necessary, because you are aggregating by the same column used for the count(distinct). Hence, count(distinct session_id) is 1 on each row.
Your query is essentially:
select item_id, session_id,
1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
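A quick way to check this reading is to run the simplified query on a few rows. The sketch below uses SQLite via Python's sqlite3 (an assumption; the question appears to target a different DBMS), keeps only the two columns the query touches, and uses hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE view_log (session_id TEXT, item_id TEXT)")
# Hypothetical rows; only the two columns the query needs are kept.
conn.executemany(
    "INSERT INTO view_log VALUES (?, ?)",
    [
        ("s1", "A"), ("s1", "A"),  # the same session views item A twice
        ("s2", "A"),
        ("s3", "A"),
        ("s1", "B"),
    ],
)

query = """
SELECT item_id, session_id,
       1.0 / COUNT(session_id) OVER (PARTITION BY item_id) AS cs
FROM view_log
GROUP BY item_id, session_id
ORDER BY item_id, session_id
"""
per_session = conn.execute(query).fetchall()
for row in per_session:
    print(row)
```

After the GROUP BY there is one row per (item, session) pair, so the window count is the number of sessions that viewed the item: each of the three sessions for item A gets 1/3, and the single session for item B gets 1.0.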
I wouldn't be surprised if you wanted the ratios at the level of item_id, so the intended query is:
select item_id, count(distinct session_id),
count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;
If so, the equivalent logic can use a subquery:
select vl.*, numsessions * 1.0 / sum(numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
from view_log vl
group by item_id
) vl;
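As an illustration of the item-level ratio, the following sketch computes each item's share of all distinct sessions with the subquery form. It assumes SQLite through Python's sqlite3 and a trimmed, hypothetical view_log with only the columns involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE view_log (session_id TEXT, item_id TEXT)")
# Hypothetical rows: item A is seen in 3 sessions, item B in 1.
conn.executemany(
    "INSERT INTO view_log VALUES (?, ?)",
    [("s1", "A"), ("s1", "A"), ("s2", "A"), ("s3", "A"), ("s1", "B")],
)

query = """
SELECT item_id, numsessions,
       numsessions * 1.0 / SUM(numsessions) OVER () AS cs
FROM (
    SELECT item_id, COUNT(DISTINCT session_id) AS numsessions
    FROM view_log
    GROUP BY item_id
) AS vl
ORDER BY item_id
"""
shares = conn.execute(query).fetchall()
print(shares)
```

The empty OVER () makes the sum run over every item row, so the cs values add up to 1 across items.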

SQL query for table with multiple keys?

I am sorry if this seems too easy but I was asked this question and I couldn't answer even after preparing SQL thoroughly :(. Can someone answer this?
There's a table - Seller id, product id, warehouse id, quantity of products at each warehouse for each product as per each seller.
We have to list the Product Ids with Seller Id who has highest number of products for that product and the total number of units he has for that product.
I think I got confused because there were 3 keys in the table.
It's not quite clear which DBMS you are using. The query below should work if your DBMS supports window functions.
You can find the count of rows for each product and seller, rank the sellers within each product using the window function rank, and then filter to keep only the top-ranked seller(s) in each product along with the count.
select
product_id,
seller_id,
no_of_products
from (
select
product_id,
seller_id,
count(*) no_of_products,
rank() over (partition by product_id order by count(*) desc) rnk
from your_table
group by
product_id,
seller_id
) t where rnk = 1;
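A runnable sketch of the ranking approach on hypothetical data, using SQLite via Python's sqlite3 (an assumption). Note one deliberate difference: it ranks on SUM(quantity) rather than COUNT(*), on the assumption that the question's "total number of units" means stock summed across warehouses; the column name quantity is also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (seller_id INT, product_id INT, warehouse_id INT, quantity INT)"
)
# Hypothetical stock levels per seller, product and warehouse.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, 100, 1, 5),
        (1, 100, 2, 7),   # seller 1 holds 12 units of product 100 in total
        (2, 100, 1, 10),
        (1, 200, 1, 3),
        (2, 200, 2, 4),   # seller 2 holds the most of product 200
    ],
)

query = """
WITH totals AS (
    SELECT product_id, seller_id, SUM(quantity) AS total_units
    FROM your_table
    GROUP BY product_id, seller_id
), ranked AS (
    SELECT *, RANK() OVER (PARTITION BY product_id ORDER BY total_units DESC) AS rnk
    FROM totals
)
SELECT product_id, seller_id, total_units
FROM ranked
WHERE rnk = 1
ORDER BY product_id
"""
top_sellers = conn.execute(query).fetchall()
print(top_sellers)
```

Because RANK gives tied sellers the same rank, a product with several equally stocked top sellers would return all of them.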
If window functions are not supported, you can use a correlated subquery to achieve the same effect:
select
product_id,
seller_id,
count(*) no_of_products
from your_table a
group by
product_id,
seller_id
having count(*) = (
select max(cnt)
from (
select count(*) cnt
from your_table b
where b.product_id = a.product_id
group by seller_id
) t
);
Don't know why having id columns would mess you up... group by the right columns, sum up the totals and just return the first row:
select *
from (
select sellerid, productid, sum(quantity) as total_sold
from theres_a_table
group by sellerid, productid
) x
order by total_sold desc
fetch first 1 row only
Without thinking about optimization, a straightforward answer looks like this:
select *
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqo
inner join
(
select product_id, max(seller_prod_qty) as max_prod_qty
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqi
group by product_id
) pmaxq
on spqo.product_id = pmaxq.product_id
and spqo.seller_prod_qty = pmaxq.max_prod_qty
Both spqi (inner) and spqo (outer) give you the seller, the product, and the sum of quantity across warehouses. pmaxq gives you the maximum of that sum for each product, and the final inner join keeps a seller's total only if it is the highest (max) for the product (there could be multiple sellers with the same quantity). I think this is the answer you are looking for. However, I'm sure the query can be improved, since what I'm posting is the "conceptual" one :)

Maximum count - PostgreSQL

I'm looking to pull the max(count(*)) of something from a table.
Effectively what I'm trying to do is pull out a customer's favourite brand. So they buy 300 bars of soap a year, but I'd like to know which is their favourite. So the max(count(brand_id)), basically.
I was thinking of doing it like this:
SELECT
foo.customer_id,
max(occ)
FROM
( SELECT
transaction.customer_id,
count(transaction.brand_id) as occ
FROM
transaction
GROUP BY
transaction.customer_id
) AS foo
GROUP BY
foo.customer_id
Thanks in advance
You can do it like this:
with cte as (
select customer_id, brand_id, count(*) as cnt
from test1
group by customer_id, brand_id
)
select distinct on (customer_id)
customer_id, brand_id, cnt
from cte
order by customer_id, cnt desc
Keep in mind that if there is more than one brand with an equal count for some customer, you'll end up with one arbitrary record. If you want to get all such records, use the dense_rank() function:
with cte1 as (
select customer_id, brand_id, count(*) as cnt
from test1
group by customer_id, brand_id
), cte2 as (
select
customer_id, brand_id,
dense_rank() over(partition by customer_id order by cnt desc) as rn
from cte1
)
select customer_id, brand_id
from cte2
where rn = 1
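The tie behaviour of the dense_rank() version can be checked with a sketch like the one below. It runs on SQLite via Python's sqlite3 instead of PostgreSQL (an assumption made for a self-contained example; SQLite has no DISTINCT ON, so only the dense_rank() form is shown), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (customer_id INT, brand_id INT)")
# Hypothetical purchases, one row per transaction.
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?)",
    [
        (1, 10), (1, 10), (1, 20),   # customer 1 prefers brand 10
        (2, 30), (2, 40),            # customer 2: brands 30 and 40 tie
    ],
)

query = """
WITH cte1 AS (
    SELECT customer_id, brand_id, COUNT(*) AS cnt
    FROM test1
    GROUP BY customer_id, brand_id
), cte2 AS (
    SELECT customer_id, brand_id, cnt,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY cnt DESC) AS rn
    FROM cte1
)
SELECT customer_id, brand_id, cnt
FROM cte2
WHERE rn = 1
ORDER BY customer_id, brand_id
"""
favourites = conn.execute(query).fetchall()
print(favourites)
```

Customer 1 yields a single favourite, while customer 2 yields both tied brands, which is the case where the DISTINCT ON form would keep just one of them.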
For PostgreSQL 8.3:
select distinct on (customer_id)
customer_id, brand_id, cnt
from (
select customer_id, brand_id, count(*) as cnt
from test1
group by customer_id, brand_id
) as c
order by customer_id, cnt desc;
or like this
with cte as (
SELECT
transaction.customer_id,
count(transaction.brand_id) as occ
FROM
transaction
GROUP BY
transaction.customer_id
)
select max(occ) from cte