Add row_number to maximum value - sql

This is a simplified version of my problem (and I couldn't come up with an example that makes sense in real-life).
Assume I have a table Person:
ID  Name      Number  Category
1   Follett   null    Thriller
2   Rowling   null    Fantasy
3   Martin    80      Fantasy
4   Cage      55      Thriller
5   Baldacci  null    Thriller
Now I want to get the following Result:
ID  Name      Number  Category
1   Follett   56      Thriller
2   Rowling   81      Fantasy
3   Martin    80      Fantasy
4   Cage      55      Thriller
5   Baldacci  57      Thriller
The steps are:
Group by Category.
Select the maximum Number value of each Category.
Add the row_number (partitioned by Category) to that maximum and set it as the new value, (edit:) BUT only for rows whose Number was null before.
The parts I have so far (NOT WORKING, more to illustrate what I'd like to do, I do know why this can't possibly work)
UPDATE Person P
SET Number = sub.current + sub.row
FROM (
SELECT
Id,
max(Number) as current,
(ROW_NUMBER() OVER(PARTITION BY Category)) AS row
FROM Person
GROUP BY Category
) as sub
WHERE P.Id = sub.Id
Note: For the corner case where all Numbers are null for a Category the max(Number) should just be 0 and the new values should simply be the row_numbers().
I am using Postgresql.

You can get the value in a select using:
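select p.*,
(coalesce(max(number) over (partition by category), 0) +
-- nulls first: the rows to be filled get row numbers 1, 2, ... (the default ascending order puts nulls last)
row_number() over (partition by category order by number nulls first, id)
) as newnumber
from person p;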
You can then put this in an update statement as:
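update person p
set number = pp.newnumber
from (select p.*,
(coalesce(max(number) over (partition by category), 0) +
-- nulls first: the rows to be filled get row numbers 1, 2, ... (the default ascending order puts nulls last)
row_number() over (partition by category order by number nulls first, id)
) as newnumber
from person p
) pp
where pp.id = p.id and p.number is null;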
As a note: If you are attempting to create a unique value doing this, it might not work. The sequential numbers for a particular category might conflict with the numbers from another category. If this is what you are trying to do, then ask another question with more details.
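A minimal sketch of the same idea, run against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only to keep the example self-contained; it needs an SQLite build with window functions, 3.25 or later). The window sorts on (number IS NOT NULL) so that the null rows get row numbers 1, 2, ... within each category; using id as the tiebreaker is also an assumption about the desired order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, number INTEGER, category TEXT);
    INSERT INTO person VALUES
        (1, 'Follett',  NULL, 'Thriller'),
        (2, 'Rowling',  NULL, 'Fantasy'),
        (3, 'Martin',   80,   'Fantasy'),
        (4, 'Cage',     55,   'Thriller'),
        (5, 'Baldacci', NULL, 'Thriller');
""")

# Per-category maximum (0 when every number is NULL) plus a row number
# that counts the NULL rows first, ordered by id.
candidates = conn.execute("""
    SELECT id,
           COALESCE(MAX(number) OVER (PARTITION BY category), 0)
           + ROW_NUMBER() OVER (PARTITION BY category
                                ORDER BY (number IS NOT NULL), id) AS newnumber
    FROM person
""").fetchall()

# Apply the computed value only where the number is still NULL.
conn.executemany(
    "UPDATE person SET number = ? WHERE id = ? AND number IS NULL",
    [(newnumber, pid) for pid, newnumber in candidates],
)

result = dict(conn.execute("SELECT id, number FROM person").fetchall())
print(result)
```

The update is issued row by row here only to avoid relying on UPDATE ... FROM support in the stand-in database; in PostgreSQL a single UPDATE ... FROM statement does the same job.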

Related

SQL query which will extract conditionally the values from top categories the first and the 2nd where CATEGORY is OTHER

I have this table. It is just a small example; the real table has more rows.
id  CATEGORY  AMOUNT
1   TECH      120
1   FUN       220
2   OTHER     340
2   PARENTS   220
It holds, per id, the amount spent in each category. I want to select the ID and the category in which that ID spent the most, but if that category is OTHER I want the second-highest spending category instead.
I have a constraint: I CANNOT use a subquery that filters with WHERE CATEGORY <> 'OTHER'. It makes my machine run out of memory (I don't know why).
This is what I have tried.
I created a row number with row_number() over (partition by id order by amount desc) as rn
and then ran
select id, category from table where rn = 1 group by 1,2
But I don't know how to tell the query: if CATEGORY is OTHER, take rn = 2 instead.
id  CATEGORY  AMOUNT  ROW NUM
1   TECH      120     2
1   FUN       220     1
2   OTHER     340     1
2   PARENTS   220     2
Another thing I was thinking of is a QUALIFY clause:
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC) = 1
But this also returns only the first record per ID, which can still be OTHER. I would like to exclude it inside QUALIFY, i.e. not consider a row when CATEGORY is 'OTHER'.
I am using Databricks.
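One possible way to express "skip OTHER" without a WHERE CATEGORY <> 'OTHER' filter is to sort OTHER rows last inside the window, so that rank 1 is the top non-OTHER category whenever one exists. The sketch below is an illustration only: it runs on Python's sqlite3 module rather than Databricks, and the table name spend is hypothetical. SQLite has no QUALIFY, so the rank is filtered in an outer query; on an engine with QUALIFY the same ORDER BY could sit inside the QUALIFY window instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spend (id INTEGER, category TEXT, amount INTEGER);
    INSERT INTO spend VALUES
        (1, 'TECH',    120),
        (1, 'FUN',     220),
        (2, 'OTHER',   340),
        (2, 'PARENTS', 220);
""")

# The CASE expression pushes OTHER behind every real category;
# amount DESC then picks the largest spend among the rest.
top = conn.execute("""
    SELECT id, category
    FROM (
        SELECT id, category,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY CASE WHEN category = 'OTHER' THEN 1 ELSE 0 END,
                            amount DESC
               ) AS rn
        FROM spend
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(top)
```

With this ordering an id whose only category is OTHER still returns OTHER; whether that is the wanted behaviour is not stated in the question.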

Get occurrence count of specific categories in a table

Looking to get the transition count of categories from a table. For Name type B, category transitions from Good to Bad so count is 2. For Name type A, it transitions from Good - Moderate - Good - Moderate - Bad, hence gets a count of 5.
Any help would be appreciated.
This is my input data:
Name  order no  category
A     1         Good
A     2         Good
A     3         MODERATE
A     4         Good
A     5         MODERATE
A     6         Bad
A     7         Bad
B     1         Good
B     2         Good
B     3         Good
B     4         BAD
And this is my desired output:
Name  category_transition_count
A     5
B     2
select name
,count(cnt) as category_transition_count
from
(select name
,case when category <> lag(category) over(partition by Name order by order_no)
or lag(category) over(partition by Name order by order_no) is null
then 1 end as cnt
from t) t
group by name
name  category_transition_count
A     5
B     2
You could use the lag window function to get the category of the previous row, compare it with the current row to see if it changed, and count those occurrences. Note that the lag of the first row in each partition is null, and comparing a value with null yields null rather than true, so the first row is never counted as a change. You need to handle that explicitly, here by adding 1 to the count:
SELECT name, COUNT(changed) + 1
FROM (SELECT name,
CASE WHEN category <> LAG(category) OVER (PARTITION BY name ORDER BY order_no ASC)
THEN 1
END AS changed
FROM mytable) t
GROUP BY name
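The LAG comparison can be checked against the sample data from the question. This is a small sketch using Python's sqlite3 module (an assumption for self-containment; it needs SQLite 3.25 or later for window functions), with the same table and column names as the query above. String comparison is case-sensitive here, so 'Bad' and 'BAD' count as different values, as they would in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (name TEXT, order_no INTEGER, category TEXT);
    INSERT INTO mytable VALUES
        ('A', 1, 'Good'), ('A', 2, 'Good'), ('A', 3, 'MODERATE'),
        ('A', 4, 'Good'), ('A', 5, 'MODERATE'), ('A', 6, 'Bad'), ('A', 7, 'Bad'),
        ('B', 1, 'Good'), ('B', 2, 'Good'), ('B', 3, 'Good'), ('B', 4, 'BAD');
""")

# changed is 1 when the category differs from the previous row and NULL
# otherwise (including the first row, whose LAG is NULL). COUNT skips
# NULLs, so + 1 accounts for the starting category.
rows = conn.execute("""
    SELECT name, COUNT(changed) + 1 AS category_transition_count
    FROM (SELECT name,
                 CASE WHEN category <> LAG(category) OVER (PARTITION BY name
                                                           ORDER BY order_no)
                      THEN 1
                 END AS changed
          FROM mytable) t
    GROUP BY name
    ORDER BY name
""").fetchall()

print(rows)
```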

Complex SQL query or queries

I looked at other examples, but I don't know enough about SQL to adapt it to my needs. I have a table that looks like this:
ID Month NAME COUNT First LAST TOTAL
------------------------------------------------------
1 JAN2013 fred 4
2 MAR2013 fred 5
3 APR2014 fred 1
4 JAN2013 Tom 6
5 MAR2014 Tom 1
6 APR2014 Tom 1
This could be done in separate queries, but I need 'First' to equal the first month in which a particular name is used, so every row with fred would have JAN2013 in the First field, for example. I need the 'Last' column to equal the month of the last record for each name, and finally I need the 'Total' column to be the sum of all the counts for each name, so every row for fred would have a total of 10 in this sample data. This is over my head. Can one of you assist?
This is crude but should do the trick. I renamed your fields a bit because you are using a bunch of "RESERVED" sql words and that is bad form.
;WITH cte as
(
Select
[NAME]
,[txtMONTH]
,[nmCOUNT]
-- Assumes txtMONTH sorts chronologically (for example a date column);
-- text such as 'JAN2013' sorts alphabetically and would need converting first.
,ROW_NUMBER() over (partition by [NAME] order by [txtMONTH] ASC) as 'FirstMonth'
,ROW_NUMBER() over (partition by [NAME] order by [txtMONTH] DESC) as 'LastMonth'
-- A windowed SUM puts the per-name total on every row, so no GROUP BY is needed
,SUM([nmCOUNT]) over (partition by [NAME]) as 'TotNameCount'
From [Table]
)
,cteFirst as
(
Select
[NAME]
,[nmCOUNT]
,[TotNameCount]
,[txtMONTH] as 'ansFirst'
From cte
Where FirstMonth = 1
)
,cteLast as
(
Select
[NAME]
,[txtMONTH] as 'ansLast'
From cte
Where LastMonth = 1
)
Select c.NAME, c.nmCOUNT, c.ansFirst, l.ansLast, c.TotNameCount
From cteFirst c
LEFT JOIN cteLast l on c.NAME = l.NAME
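If the first month, last month and total are wanted on every row rather than once per name, window functions alone may be enough. The sketch below is an illustration under stated assumptions: it runs on Python's sqlite3 module rather than SQL Server, the table name monthly_counts and the column names are hypothetical, and it assumes the ID column follows chronological order, because text such as 'JAN2013' does not sort by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_counts (id INTEGER PRIMARY KEY, month TEXT, name TEXT, cnt INTEGER);
    INSERT INTO monthly_counts VALUES
        (1, 'JAN2013', 'fred', 4),
        (2, 'MAR2013', 'fred', 5),
        (3, 'APR2014', 'fred', 1),
        (4, 'JAN2013', 'Tom',  6),
        (5, 'MAR2014', 'Tom',  1),
        (6, 'APR2014', 'Tom',  1);
""")

# FIRST_VALUE over ascending id gives the earliest month per name,
# over descending id the latest; SUM without ORDER BY covers the
# whole partition, so every row carries the per-name total.
rows = conn.execute("""
    SELECT id, month, name, cnt,
           FIRST_VALUE(month) OVER (PARTITION BY name ORDER BY id)      AS first_month,
           FIRST_VALUE(month) OVER (PARTITION BY name ORDER BY id DESC) AS last_month,
           SUM(cnt)           OVER (PARTITION BY name)                  AS total
    FROM monthly_counts
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```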

Invalid count and sum in cross tab query using PostgreSQL

I am using PostgreSQL 9.3 version database.
I want to count the number of sales per product, sum the sale amounts per product, and show each city where the product was sold as its own column.
Example
Setup
create table products (
name varchar(20),
price integer,
city varchar(20)
);
insert into products values
('P1',1200,'London'),
('P1',100,'Melborun'),
('P1',1400,'Moscow'),
('P2',1560,'Munich'),
('P2',2300,'Shunghai'),
('P2',3000,'Dubai');
Crosstab query:
select * from crosstab (
'select name,count(*),sum(price),city,count(city)
from products
group by name,city
order by name,city
'
,
'select distinct city from products order by 1'
)
as tb (
name varchar(20),TotalSales bigint,TotalAmount bigint,London bigint,Melborun bigint,Moscow bigint,Munich bigint,Shunghai bigint,Dubai bigint
);
Output
name totalsales totalamount london melborun moscow munich shunghai dubai
---------------------------------------------------------------------------------------------------------
P1 1 1200 1 1 1
P2 1 3000 1 1 1
Expected Output:
name  totalsales  totalamount  london  melborun  moscow  munich  shunghai  dubai
--------------------------------------------------------------------------------
P1    3           2700         1       1         1
P2    3           6860                                   1       1         1
Your first mistake seems to be simple. According to the 2nd parameter of the crosstab() function, 'Dubai' must come as first city (sorted by city). Details:
PostgreSQL Crosstab Query
The unexpected values for totalsales and totalamount represent values from the first row for each name group. "Extra" columns are treated like that. Details:
Pivot on Multiple Columns using Tablefunc
To get sums per name, run window functions over your aggregate functions. Details:
Get the distinct sum of a joined table column
select * from crosstab (
'select name
,sum(count(*)) OVER (PARTITION BY name)
,sum(sum(price)) OVER (PARTITION BY name)
,city
,count(city)
from products
group by name,city
order by name,city
'
-- ,'select distinct city from products order by 1' -- replaced
,$$SELECT unnest('{Dubai,London,Melborun
,Moscow,Munich,Shunghai}'::varchar[])$$
) AS tb (
name varchar(20), TotalSales bigint, TotalAmount bigint
,Dubai bigint
,London bigint
,Melborun bigint
,Moscow bigint
,Munich bigint
,Shunghai bigint
);
Better yet, provide a static set as 2nd parameter. Output columns are hard coded, so generating the data columns dynamically may be unreliable: if you add another row with a new city, this would break.
This way you can also order your columns as you like. Just keep output columns and 2nd parameter in sync.
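The effect of running a window function over aggregates can be seen without crosstab(). The sketch below spells the idea out in two steps (aggregate per name and city in a subquery, then sum those aggregates per name with a window), which is equivalent to the nested sum(count(*)) OVER (...) form in the query above. It uses Python's sqlite3 module as a stand-in for PostgreSQL, which is an assumption made only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price INTEGER, city TEXT);
    INSERT INTO products VALUES
        ('P1', 1200, 'London'),
        ('P1', 100,  'Melborun'),
        ('P1', 1400, 'Moscow'),
        ('P2', 1560, 'Munich'),
        ('P2', 2300, 'Shunghai'),
        ('P2', 3000, 'Dubai');
""")

# Inner query: one row per (name, city) with its count and amount.
# Outer query: the window repeats the per-name totals on each of
# those rows, so whichever row crosstab() keeps has the right sums.
rows = conn.execute("""
    SELECT name,
           SUM(cnt)    OVER (PARTITION BY name) AS totalsales,
           SUM(amount) OVER (PARTITION BY name) AS totalamount,
           city,
           cnt
    FROM (SELECT name, city, COUNT(*) AS cnt, SUM(price) AS amount
          FROM products
          GROUP BY name, city) AS per_city
    ORDER BY name, city
""").fetchall()

for row in rows:
    print(row)
```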
Honestly I think your database needs some drastic normalization and your results in several columns (one for each city name) is not something I would do myself.
Nevertheless if you want to stick to it you can do it this way.
For the first step you need to get the correct amounts. This would do the trick quite fast:
select name, count(1) totalsales, sum(price) totalAmount
from products
group by name;
This will be your result:
NAME TOTALSALES TOTALAMOUNT
P2 3 6860
P1 3 2700
You would get the Products/City this way:
select name, city, count(1) totalCityName
from products
group by name, city
order by name, city;
This result:
NAME CITY TOTALCITYNAME
P1 London 1
P1 Melborun 1
P1 Moscow 1
P2 Dubai 1
P2 Munich 1
P2 Shunghai 1
If you really would like a column per city you could do something like:
select name,
count(1) totalsales,
sum(price) totalAmount,
(select count(1)
from Products a
where a.City = 'London' and a.name = p.name) London,
...
from products p
group by name;
But I would not recommend it!!!
This would be the result:
NAME TOTALSALES TOTALAMOUNT LONDON ...
P1 3 2700 1
P2 3 6860 0

Simple SQL query with select and group by

I have trouble understanding something.
I have the following table:
ID PROD PRICE
1 A 10
2 B 20
3 C 30
4 A 1
5 B 12
6 C 2
7 A 7
8 B 8
9 C 9
10 A 5
11 B 2
I want to get the minimum price of every prod, meaning I want to get 3 records, one with the minimum price for each prod.
From the example above, this is what I want to get:
ID PROD MIN(PRICE)
4 A 1
11 B 2
6 C 2
This is the query I wrote:
select id, prod, min(price)
from A1
group by(prod);
But this is the records I got:
ID PROD MIN(PRICE)
1 A 1
2 B 2
3 C 2
As you can see, the ID value is wrong; it only gives me some kind of line counter and not the actual ID value.
You can check it at the next link
What am I doing wrong?
SELECT a.*
FROM A1 a
INNER JOIN
(
SELECT Prod, MIN(Price) minPrice
FROM A1
GROUP BY Prod
) b ON a.Prod = b.Prod AND
a.Price = b.minPrice
For MSSQL
SELECT ID, Prod, Price
FROM
(
SELECT ID, Prod, Price,
ROW_NUMBER() OVER(Partition BY Prod ORDER BY Price ASC) s
FROM A1
) a
WHERE s = 1
You must be using MySQL or perhaps PostgreSQL.
In standard SQL, all non-aggregate columns in the select-list must be cited in the GROUP BY clause.
I'm not clear whether you need the ID column. If not, then use:
SELECT prod, MIN(price) AS min_price
FROM A1
GROUP BY prod;
If you need the matching ID number, then that becomes a sub-query:
-- Columns are qualified with A1 because prod exists in both A1 and A2
SELECT A1.id, A1.prod, A1.price
FROM A1
JOIN (SELECT prod, MIN(price) AS min_price
FROM A1
GROUP BY prod
) AS A2 ON A1.prod = A2.prod AND A1.price = A2.min_price;
Can you please explain what is the problem with what I wrote, and yes I need the ID column.
select id, prod, min(price)
from A1
group by(prod);
In standard SQL, you would get an error message (or, if not standard, in most SQL DBMS).
Where you are allowed to omit the ID column from the GROUP BY clause, then you get a quasi-random value for ID for the correct prod and MIN(price) values. Basically, the optimizer will choose any convenient ID that it knows about, based on its whims. Specifically, it does not do the sub-query and join that the full answer does. For example, it might do a sequential scan, and the ID it returns might be the first, or last, that it encounters for the given prod value, or it might be some other value — I'm not even sure whether the ID returned for prod = 'A' has to be an ID that was associated with prod = 'A'; you'd have to read the manual carefully. Basically, your query is indeterminate, so many return values are permissible and 'correct' (but not what you wanted).
Note that if you grouped by ID and not prod, then the result in prod would be determinate. That's because the ID column is a candidate key (unique identifier) for the table. (I believe PostgreSQL distinguishes between the two cases — but I'm not certain of that; MySQL does not.)
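Both approaches from the answers, the join against the per-prod minimum and the ROW_NUMBER() ranking, can be checked against the sample table. This is a minimal sketch using Python's sqlite3 module (an assumption for self-containment; window functions need SQLite 3.25 or later), with the table and column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A1 (id INTEGER PRIMARY KEY, prod TEXT, price INTEGER);
    INSERT INTO A1 VALUES
        (1, 'A', 10), (2, 'B', 20), (3, 'C', 30),
        (4, 'A', 1),  (5, 'B', 12), (6, 'C', 2),
        (7, 'A', 7),  (8, 'B', 8),  (9, 'C', 9),
        (10, 'A', 5), (11, 'B', 2);
""")

# Approach 1: compute the minimum per prod, then join back to find the row.
join_rows = conn.execute("""
    SELECT a.id, a.prod, a.price
    FROM A1 a
    JOIN (SELECT prod, MIN(price) AS min_price
          FROM A1
          GROUP BY prod) b
      ON a.prod = b.prod AND a.price = b.min_price
    ORDER BY a.prod
""").fetchall()

# Approach 2: rank rows within each prod by price and keep rank 1.
window_rows = conn.execute("""
    SELECT id, prod, price
    FROM (SELECT id, prod, price,
                 ROW_NUMBER() OVER (PARTITION BY prod ORDER BY price ASC) AS s
          FROM A1) ranked
    WHERE s = 1
    ORDER BY prod
""").fetchall()

print(join_rows)
print(window_rows)
```

The two agree on this data because each prod has a single cheapest row. If several rows tie for the minimum, the join returns all of them, while ROW_NUMBER() keeps exactly one.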