Cannot set Percent_rank() - defaults to 0.0 - sql

I'm trying to SET a column, but the result will end up as 0.0 for each row.
If I use the same syntax (the select part of it) in SELECT, the results display correctly.
UPDATE table1
SET ranking = (SELECT
PERCENT_RANK() OVER(PARTITION BY city ORDER BY sales DESC)
from table1
group by store_id)
Is it possible to make this work?

The subquery:
SELECT PERCENT_RANK() OVER(PARTITION BY city ORDER BY sales DESC)
from table1
group by store_id
returns 1 row for each store_id and SQLite picks just one and updates with that row's value of PERCENT_RANK() all the rows of the table.
You must correlate the subquery with table1
UPDATE table1
SET ranking = (
SELECT pr
FROM (
SELECT store_id,
PERCENT_RANK() OVER(PARTITION BY city ORDER BY sales DESC) pr
FROM table1
GROUP BY store_id
) t
WHERE table1.store_id = t.store_id
);
Or, if your version of SQLite is 3.33.0 use the UPDATE...FROM... syntax:
UPDATE table1 AS t1
SET ranking = t.pr
FROM (
SELECT store_id,
PERCENT_RANK() OVER(PARTITION BY city ORDER BY sales DESC) pr
FROM table1
GROUP BY store_id
) t
WHERE t1.store_id = t.store_id;

Related

SUM most recent ID/Product combinations for the latest date

select * from
(select Id, Prodcut, Billing_date
, row_number() over (partition by Id, product order by Billing_date desc) as RowNumber
,sum(Revenue)
from Table1
group by 1,2,3,4,1) a
where a.rowNumber = 1
There are rows where Id+product combination repeats for latest billing date and which causing some data to be missed out. I am trying to add sum with row_number to sum all the ID&product combinations for the latest date but not able to make it work.
Can anyone please help me out here!
Data Sample Image
Database: Athena, Dbeaver
I would expect this to do what you want:
select *
from (select Id, Product, Billing_date,
row_number() over (partition by Id, product order by Billing_date desc) as seqnum,
sum(Revenue)
from Table1
group by Id, Product, Billing_date
) t1
where seqnum = 1;
Your group by columns do not seem correct. I'm surprised your query runs in any datbase.

How to get last date on max count transaction_id by branch in sql

I would like to get last date on max count txn_id base on branch_name.
This is the data
I want the result like this
here is my script. but I get only one row.
select max(date),
account,
branch_name,
province,
district
from
(select date,
account,
branch_name,
province,
district,
RANK() OVER (ODER BY txn_no desc) rnk
from
(select count(tr.txn_id) txn_no,
tr.date,
u.account,
b.branch_name,
b.province,
b.district
from transaction tr
inner join users u
on u.user_id = tr.user_id
inner join branch b
on b.user_id = u.user_id
where 1=1
and tr.date >= to_date('01/04/2021','dd/mm/yyyy') and tr.date < to_date('30/04/2021','dd/mm/yyyy')
group by tr.date,
u.account,
b.branch_name,
b.province,
b.district
))
where rnk = 1
group by tr.date,
u.account,
b.branch_name,
b.province,
b.district
WITH
cte1 AS ( SELECT *, COUNT(*) OVER (PARTITION user, branch) cnt
FROM source_table ),
cte2 AS ( SELECT *, RANK() OVER (PARTITION BY user ORDER BY cnt DESC) rnk
FROM cte1 )
SELECT *
FROM cte2
WHERE rnk = 1
If more than one branch have the same amount of rows then all of them will be returned. If only one of them must be returned in this case then according additional criteria must be used and ORDER BY expression clause in cte2 must be accordingly expanded. If any (random) must be returned in this case then RANK() in cte2 must be replaced with ROW_NUMBER().
I would do it with SELECT TOP 1 ... instead of RANK(), something like:
SELECT date,
txn_id,
account,
branch_name,
province,
district
FROM transaction t
WHERE branch_name = (
SELECT TOP 1 branch_name
FROM (
SELECT branch_name, count(*) as cnt
FROM transaction
WHERE account = t.account
GROUP BY branch_name
ORDER BY 2 DESC
) s
AND date = (
SELECT TOP 1 date
FROM (
SELECT date, count(*) as cnt
FROM transaction
WHERE account = t.account
AND branch_name = (
SELECT TOP 1 branch_name
FROM (
SELECT branch_name, count(*) as cnt
FROM transaction
WHERE account = t.account
GROUP BY branch_name
ORDER BY 2 DESC
) s2
GROUP BY branch_name
ORDER BY 2 DESC
) s2
)
If there are multiple transactions on the last date, they all are going to be returned though.
You want the most recent transaction from the account/branch with the most transactions. If so, you can do this as:
select t.*
from (select t.*,
max(cnt) over (partition by account) as max_cnt
from (select t.*,
count(*) over (partition by account, branch_name) as cnt,
row_number() over (partition by account, branch_name order by date desc) as seqnum
from t
) t
) t
where max_cnt = cnt and seqnum = 1;
Note: If there are multiple branches that have the same count, then they are all returned. Your question does not specify how to deal with such duplicates.

Selecting the latest order

I need to select the data of all my customers with the records displayed in the image. But I need to get the most recent record only, for example I need to get the order # E987 for John and E888 for Adam. As you can see from the example, when I do the select statement, I get all the order records.
You don't mention the specific database, so I'll answer with a generic solution.
You can do:
select *
from (
select t.*,
row_number() over(partition by name order by order_date desc) as rn
from t
) x
where rn = 1
You can use analytical function row_number.
Select * from
(Select t.*,
Row_number() over (partition by customer_id order by order_date desc) as rn
From your_table t) t
Where rn = 1
Or you can use not exists as follows:
Select *
From yoir_table t
Where not exists
(Select 1 from your_table tt
Where t.customer_id = tt.custome_id
And tt.order_date > t.order_date)
You can do it with a subquery that finds the last order date.
SELECT t.*
FROM yoir_table t
JOIN (SELECT tt.custome_id,
MAX(tt.order_date) MaxOrderDate
FROM yoir_table tt
GROUP BY tt.custome_id) AS tt
ON t.custome_id = tt.custome_id
AND t.order_date = tt.MaxOrderDate

How to delete 90% of records from each group of a table (postgres)

I have a table called 'sales' in postgres which has a column called 'region'. I am trying to find out a way to delete 90% of records from each 'region' of the same table.
I am using the below query. But the same is not working in postgres and also the table does not have a primary/unique key column
delete from table
( select row_number() over (partition by region) as PAR
from sales
)b
where PAR >=
( select S*0.1 as ninety
from
( select region, count(*) as S
from sales
group by region
)a
and b.region = a.region
can anyone provide any better solution to this.
If you have an unique id in the table, you can do:
delete
from t
using (select t.*,
row_number() over (partition by region order by region) as seqnum, -- I always include order by
count(*) over (partition by region) as cnt
from t
) tt
where t.id = tt.id and
tt.seqnum < 0.9 * cnt;

SQL Return MAX Values from Multiple Rows

I have tried many solutions and nothing seems to work. I am trying to return the MAX status date for a project. If that project has multiple items on the same date, then I need to return the MAX ID. So far I have tried this:
SELECT PRJSTAT_ID, PRJSTAT_PRJA_ID, PRJSTAT_STATUS, PRJSTAT_DATE
From Project_Status
JOIN
(SELECT MAX(PRJSTAT_PRJA_ID) as MaxID, MAX(PRJSTAT_DATE) as MaxDate
FROM Project_Status
Group by PRJSTAT_PRJA_ID)
On
PRJSTAT_PRJA_ID = MaxID and PRJSTAT_DATE = MaxDate
Order by PRJSTAT_PRJA_ID
It returns the following:
I am getting multiple records for PRJSTAT_PRJA_ID, but I only want to return the row with the MAX PRJSTAT_ID. Any thoughts?
Take out the MAX on the ID on the subquery:
SELECT PRJSTAT_ID, PRJSTAT_PRJA_ID, PRJSTAT_STATUS, PRJSTAT_DATE
From Project_Status
JOIN
(SELECT PRJSTAT_PRJA_ID as ID, MAX(PRJSTAT_DATE) as MaxDate
FROM Project_Status
Group by PRJSTAT_PRJA_ID)
On
PRJSTAT_PRJA_ID = ID and PRJSTAT_DATE = MaxDate
Order by PRJSTAT_PRJA_ID
Or remove the need to join:
SELECT * FROM
(SELECT PRJSTAT_ID, PRJSTAT_PRJA_ID, PRJSTAT_STATUS, PRJSTAT_DATE,
ROW_NUMBER() OVER (PARTITION BY PRJSTAT_PRJA_ID ORDER BY PRJSTAT_DATE DESC)
AS SEQ,
ROW_NUMBER() OVER (PARTITION BY PRJSTAT_PRJA_ID ORDER BY PRJSTAT_PRJA_ID
DESC) AS IDSEQ
From Project_Status
)PR
WHERE SEQ = 1
AND IDSEQ = 1
Your problem is ties. You want the record with the maximum date per PRJSTAT_PRJA_ID and in case of a tie the record with the highest ID. The easiest way to rank records per group and only keep the best record is ROW_NUMBER:
select prjstat_id, prjstat_prja_id, prjstat_status, prjstat_date
from
(
select
project_status.*,
row_number() over (partition by prjstat_prja_id
order by prjstat_date desc, prjstat_id desc) as rn
from project_status
)
where rn = 1
order by prjstat_prja_id;