Percentage by group - oracle - sql

I have this sample.
What I need is an average per key, not per key and value. However, the syntax I used appears to give me the average per key and value.
select avg(value2),KEY,VALUE from testavg
GROUP BY key,value
order by key, value
Doing otherwise (grouping by KEY alone while still selecting VALUE) yields an error. The results I need are as follows:
10 A 0.96
10 B 0.04
12 C 1
But the statement I used does not produce these results.
Can this be achieved with a single Oracle SELECT statement? I have included the statements to create and populate the table.
CREATE TABLE "TESTAVG"
( "KEY" NUMBER,
"VALUE" VARCHAR2(20 BYTE),
"VALUE2" NUMBER
);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (10,'A',12);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (10,'A',13);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (10,'B',1);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (12,'C',20);

This query might run faster on larger data, because it reads the table only once:
select distinct key, value,
sum(value2) over (partition by key, value) / sum(value2) over (partition by key) r
from testavg
/
KEY VALUE R
---------- -------------------- ----------
10 A .961538462
10 B .038461538
12 C 1
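A minimal, runnable sketch of the analytic-function answer, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions) as a stand-in for Oracle. One dialect difference matters: SQLite truncates integer division, so the per-(key, value) sum is cast to REAL first; Oracle's NUMBER arithmetic does not need this. The helper name make_testavg is hypothetical.

```python
import sqlite3

def make_testavg():
    """Build the question's TESTAVG table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE testavg (key INTEGER, value TEXT, value2 INTEGER)")
    conn.executemany(
        "INSERT INTO testavg (key, value, value2) VALUES (?, ?, ?)",
        [(10, "A", 12), (10, "A", 13), (10, "B", 1), (12, "C", 20)],
    )
    return conn

def share_per_key(conn):
    """Each (key, value) sum divided by its key's sum, in one pass over the table."""
    sql = """
        SELECT DISTINCT key, value,
               CAST(SUM(value2) OVER (PARTITION BY key, value) AS REAL)  -- avoid integer division
                 / SUM(value2) OVER (PARTITION BY key) AS r
        FROM testavg
        ORDER BY key, value
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    for row in share_per_key(make_testavg()):
        print(row)
```

The DISTINCT collapses the repeated (10, A) rows, exactly as in the Oracle query.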

select avg(value2),KEY from testavg
GROUP BY key
order by key;
8.66666666666666666666666666666666666667 10
20 12
EDIT: The specs are still not clear, but this might be what you need:
with gr1 as (select key,sum(value2) sumvalue
from testavg
group by key)
, gr2 as (select key,value,sum(value2) sumvalue
from testavg
GROUP BY key,value)
select gr1.key,gr2.value,gr2.sumvalue/gr1.sumvalue
from gr1
, gr2
where gr1.key = gr2.key;
10 B 0.0384615384615384615384615384615384615385
12 C 1
10 A 0.9615384615384615384615384615384615384615
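The EDIT's two-CTE query returned its rows in no particular order (10 B, 12 C, 10 A). A sketch of the same aggregate-twice-and-join idea, again in SQLite via Python's sqlite3 as a stand-in for Oracle, using an explicit JOIN ... ON and an ORDER BY so the output order is deterministic:

```python
import sqlite3

def share_via_ctes(conn):
    """Sum per key and per (key, value) in two CTEs, then join them on key."""
    sql = """
        WITH gr1 AS (
            SELECT key, SUM(value2) AS sumvalue
            FROM testavg
            GROUP BY key
        ),
        gr2 AS (
            SELECT key, value, SUM(value2) AS sumvalue
            FROM testavg
            GROUP BY key, value
        )
        SELECT gr1.key, gr2.value,
               CAST(gr2.sumvalue AS REAL) / gr1.sumvalue AS ratio  -- avoid integer division
        FROM gr1
        JOIN gr2 ON gr1.key = gr2.key
        ORDER BY gr1.key, gr2.value  -- without this the row order is unspecified
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testavg (key INTEGER, value TEXT, value2 INTEGER)")
conn.executemany(
    "INSERT INTO testavg (key, value, value2) VALUES (?, ?, ?)",
    [(10, "A", 12), (10, "A", 13), (10, "B", 1), (12, "C", 20)],
)
for row in share_via_ctes(conn):
    print(row)
```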

Related

SQL - group by occurrence and return id

I have a table of IDs and values:
ID Value
X 1
X 1
X 2
Y 5
Y 5
Y 5
Z 3
Z 6
I want to see which IDs contain more than one distinct value. In this case, return IDs X and Z, because X contains [1,2] and Z contains [3,6]:
ID
X
Z
I have tried this:
select ID from
(
SELECT ID
,count(*) over (partition by [Value]) as c
FROM mytable
) a
where c>1
But this does not return the desired answer.
I prefer aggregating this way:
SELECT ID
FROM mytable
GROUP BY ID
HAVING MIN(Value) <> MAX(Value);
On many databases, the above HAVING clause will be sargable, meaning that an index on (ID, Value) can be used. The version which checks COUNT(DISTINCT Value) may not be able to use such an index.
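A quick check that the MIN/MAX and COUNT(DISTINCT) forms agree on the question's data, sketched in SQLite through Python's sqlite3 module (an assumption; the question does not name its database). The (ID, Value) index is created so you can run EXPLAIN QUERY PLAN on either query yourself; whether the index is used depends on the engine and version, so no plan is claimed here.

```python
import sqlite3

ROWS = [("X", 1), ("X", 1), ("X", 2), ("Y", 5), ("Y", 5), ("Y", 5), ("Z", 3), ("Z", 6)]

def ids_with_multiple_values(conn, use_count_distinct=False):
    """Return IDs that have more than one distinct Value, using either HAVING form."""
    having = "COUNT(DISTINCT Value) > 1" if use_count_distinct else "MIN(Value) <> MAX(Value)"
    sql = f"SELECT ID FROM mytable GROUP BY ID HAVING {having} ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID TEXT NOT NULL, Value INTEGER NOT NULL)")
conn.executemany("INSERT INTO mytable (ID, Value) VALUES (?, ?)", ROWS)
# Index mentioned in the answer; inspect with EXPLAIN QUERY PLAN if curious.
conn.execute("CREATE INDEX ix_mytable_id_value ON mytable (ID, Value)")

print(ids_with_multiple_values(conn))                           # ['X', 'Z']
print(ids_with_multiple_values(conn, use_count_distinct=True))  # ['X', 'Z']
```

Both forms ignore NULLs; the NOT NULL columns here sidestep that difference entirely.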
Try this:
SELECT ID
FROM mytable
GROUP BY ID
HAVING COUNT(DISTINCT Value) > 1;
Just group the rows by ID and check whether each group has more than one distinct value in the Value field. Something like this:
SELECT ID
FROM yourtable
GROUP BY ID
HAVING COUNT(DISTINCT Value) > 1;
CREATE TABLE yourtable(
ID VARCHAR(30) NOT NULL
,Value int NOT NULL
);
INSERT INTO yourtable
(ID,Value) VALUES
('X',1),
('X',1),
('X',2),
('Y',5),
('Y',5),
('Y',5),
('Z',3),
('Z',6);
Other approaches are far better, but I used RANK() and a subquery to find IDs with more than one distinct value.
SELECT DISTINCT ID
FROM   (SELECT *,
               Rank()
                 OVER(
                   partition BY ID
                   ORDER BY Value) ID2
        FROM   yourtable) a
WHERE ID2 > 1
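Without DISTINCT, the RANK()-based filter returns an ID once for every row whose rank is greater than 1, so an ID with three distinct values appears twice. A sketch in SQLite via Python's sqlite3 (used here as a stand-in engine), with one hypothetical extra group, W, added to the question's data to expose the duplicates:

```python
import sqlite3

def rank_based_ids(conn, distinct):
    """The answer's RANK() filter, optionally with SELECT DISTINCT."""
    select = "SELECT DISTINCT ID" if distinct else "SELECT ID"
    sql = f"""
        {select}
        FROM (SELECT ID, RANK() OVER (PARTITION BY ID ORDER BY Value) AS ID2
              FROM yourtable) a
        WHERE ID2 > 1
        ORDER BY ID
    """
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID TEXT NOT NULL, Value INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO yourtable (ID, Value) VALUES (?, ?)",
    [("X", 1), ("X", 1), ("X", 2), ("Y", 5), ("Y", 5), ("Y", 5),
     ("Z", 3), ("Z", 6),
     ("W", 7), ("W", 8), ("W", 9)],  # hypothetical group W: ranks 1, 2, 3
)
print(rank_based_ids(conn, distinct=False))  # ['W', 'W', 'X', 'Z']
print(rank_based_ids(conn, distinct=True))   # ['W', 'X', 'Z']
```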

SQL recursively creating matching groups based on reference table

Imagine you had a data source like:
Id Val Data_Date
1 A 2022-01-01
2 B 2022-01-05
3 C 2022-01-09
4 D 2022-01-31
5 E 2022-02-01
With a reference table matching values in this way:
Target_Val Matching_Val Valid_Start Valid_End
B A 2022-01-04 2022-01-06
C B 2022-01-09 2022-01-09
D A 2022-01-31 2022-01-31
Imagine you want to create a table grouping values together where there is a match in the reference table within X days, say 4.
And you want to apply this matching recursively.
The output would be something like this:
Group_Id Id
1 1
1 2
1 3
2 4
3 5
The logic here is that C matches B in the appropriate date range, and B matches A in the appropriate date range, so all three form one group.
But although D matches A, the two are too far apart (more than 4 days), and E doesn't match anything.
There could be any depth (A > B > C > D ...).
Is there an appropriate algorithm in SQL to accomplish this? The values of the group IDs are unimportant and just meant to group data points together.
Here's my attempt. You do indeed need a recursive CTE, but you need to join the source table to the Groups table and then join back to the source table to ensure that the child fits within the parent's 4-day window. For example, in the case of D and A, as you mention, they match, but they aren't close enough to be counted.
Then I added a calculated flag to work out which rows belong to a valid hierarchy and used it in the recursive join, because anything that is not part of a hierarchy can be excluded.
After that we need to order the records by their depth so we know which parent record comes first, e.g. in the case of A > B > C.
Then a DENSE_RANK over the results gives your final groups. This will need some testing with deeper levels of recursion, but it should point you in the right direction:
CREATE TABLE SourceData
(
Id INTEGER,
Val CHAR(1),
Data_Date DATE
);
CREATE TABLE Groups
(
Target_Val CHAR(1),
Matching_Val CHAR(1),
Valid_Start DATE,
Valid_End DATE
);
INSERT INTO SourceData (Id, Val, Data_Date) VALUES (1,'A','2022-01-01');
INSERT INTO SourceData (Id, Val, Data_Date) VALUES (2,'B','2022-01-05');
INSERT INTO SourceData (Id, Val, Data_Date) VALUES (3,'C','2022-01-09');
INSERT INTO SourceData (Id, Val, Data_Date) VALUES (4,'D','2022-01-31');
INSERT INTO SourceData (Id, Val, Data_Date) VALUES (5,'E','2022-02-01');
INSERT INTO Groups (Target_Val, Matching_Val, Valid_Start, Valid_End ) VALUES ('B','A','2022-01-04','2022-01-06');
INSERT INTO Groups (Target_Val, Matching_Val, Valid_Start, Valid_End ) VALUES ('C','B','2022-01-09','2022-01-09');
INSERT INTO Groups (Target_Val, Matching_Val, Valid_Start, Valid_End ) VALUES ('D','A','2022-01-31','2022-01-31');
WITH sourceCTE AS
(
SELECT sd.Id, sd.Val, sd.Data_Date, g.Valid_Start, g.Valid_End, IIF(s.Val IS NULL, sd.Val, g.Matching_Val) [ParentVal], CAST(NULL AS DATE) [start], CAST(NULL AS DATE) [end], 1 [Depth],
IIF(s.Val IS NULL, 0, 1) IsHeirarchy
FROM SourceData sd
LEFT JOIN Groups g ON g.Target_Val = sd.Val AND sd.Data_Date BETWEEN g.Valid_Start AND g.Valid_End
LEFT JOIN SourceData s ON s.Val = g.Matching_Val AND ABS(DATEDIFF(DAY, s.Data_Date, sd.Data_Date)) < 5
UNION ALL
SELECT s.Id, s.Val, s.Data_Date, g.Valid_Start, g.Valid_End, g.Matching_Val, g.Valid_Start, g.Valid_End, s.[Depth] + 1, 1
FROM sourceCTE s
INNER JOIN Groups g ON g.Target_Val = s.[ParentVal] AND s.IsHeirarchy = 1
),
ResultCTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY [Depth] DESC) [RNum]
FROM sourceCTE
)
SELECT DENSE_RANK() OVER (ORDER BY ParentVal) [Group_Id], Id
FROM ResultCTE
WHERE [RNum] = 1
I can't promise this is the best solution, because just like the query optimiser I gave up after about 2 hours, ha.
Also, for any future questions, please provide sample data in script format to save time creating the structure.
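A simplified sketch of the same idea, not the answer's query: instead of tracking depth, it walks each row up to its top-most ancestor and ranks the roots. It assumes SQLite 3.25 or later via Python's sqlite3, ISO-format date strings, and that each row has at most one valid parent (a row with two valid parents would receive two group IDs). "Within 4 days" is read as at most 4 whole days apart, matching the answer's `< 5`. The CTE names edges, walk and roots are illustrative.

```python
import sqlite3

# "Groups" is quoted because GROUPS is a window-frame keyword in newer SQLite.
SCHEMA = """
CREATE TABLE SourceData (Id INTEGER, Val TEXT, Data_Date TEXT);
CREATE TABLE "Groups" (Target_Val TEXT, Matching_Val TEXT, Valid_Start TEXT, Valid_End TEXT);
INSERT INTO SourceData VALUES
  (1, 'A', '2022-01-01'), (2, 'B', '2022-01-05'), (3, 'C', '2022-01-09'),
  (4, 'D', '2022-01-31'), (5, 'E', '2022-02-01');
INSERT INTO "Groups" VALUES
  ('B', 'A', '2022-01-04', '2022-01-06'),
  ('C', 'B', '2022-01-09', '2022-01-09'),
  ('D', 'A', '2022-01-31', '2022-01-31');
"""

QUERY = """
WITH RECURSIVE
edges AS (  -- child row -> parent row, only when the dates are within 4 days
    SELECT child.Id AS child_id, parent.Id AS parent_id
    FROM SourceData AS child
    JOIN "Groups" AS g
      ON g.Target_Val = child.Val
     AND child.Data_Date BETWEEN g.Valid_Start AND g.Valid_End
    JOIN SourceData AS parent
      ON parent.Val = g.Matching_Val
     AND ABS(julianday(child.Data_Date) - julianday(parent.Data_Date)) <= 4
),
walk(id, ancestor_id) AS (  -- every row is its own ancestor, then climb edges
    SELECT Id, Id FROM SourceData
    UNION                   -- UNION (not UNION ALL) drops repeated pairs
    SELECT w.id, e.parent_id
    FROM walk AS w
    JOIN edges AS e ON e.child_id = w.ancestor_id
),
roots AS (  -- keep only ancestors that have no parent of their own
    SELECT w.id, w.ancestor_id AS root_id
    FROM walk AS w
    WHERE NOT EXISTS (SELECT 1 FROM edges AS e WHERE e.child_id = w.ancestor_id)
)
SELECT DENSE_RANK() OVER (ORDER BY root_id) AS Group_Id, id AS Id
FROM roots
ORDER BY Id;
"""

def group_rows():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for group_id, row_id in group_rows():
        print(group_id, row_id)
```

On the sample data this yields the groups from the question: Ids 1, 2 and 3 share group 1, Id 4 is group 2 and Id 5 is group 3.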

Add column to ensure composite key is unique

I have a table which needs to have a composite primary key based on 2 columns (Material number, Plant).
For example, this is how it is currently (note that these rows are not unique):
MATERIAL_NUMBER PLANT NUMBER
------------------ ----- ------
000000000000500672 G072 1
000000000000500672 G072 1
000000000000500672 G087 1
000000000000500672 G207 1
000000000000500672 G207 1
However, I'll need to add the additional column (NUMBER) to the composite key such that each row is unique, and it must work like this:
For each MATERIAL_NUMBER, for each PLANT, let NUMBER start at 1 and increment by 1 for each duplicate record.
This would be the desired output:
MATERIAL_NUMBER PLANT NUMBER
------------------ ----- ------
000000000000500672 G072 1
000000000000500672 G072 2
000000000000500672 G087 1
000000000000500672 G207 1
000000000000500672 G207 2
How would I go about achieving this, specifically in SQL Server?
Best Regards!
SOLVED.
See below:
SELECT MATERIAL_NUMBER, PLANT, (ROW_NUMBER() OVER (PARTITION BY MATERIAL_NUMBER, PLANT ORDER BY VALID_FROM)) as NUMBER
FROM Table_Name
This outputs the table in question, with the NUMBER column properly populated.
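A sketch of the ROW_NUMBER() numbering on the sample rows, run in SQLite via Python's sqlite3 rather than SQL Server. The sample has no VALID_FROM column, so rowid (insertion order) stands in as the ordering column; that substitution is an assumption, and with real data you would order by a column that makes the numbering repeatable.

```python
import sqlite3

MAT = "000000000000500672"
PLANTS = ["G072", "G072", "G087", "G207", "G207"]

def numbered_rows(conn):
    """Number duplicates within each (MATERIAL_NUMBER, PLANT) pair, starting at 1."""
    sql = """
        SELECT MATERIAL_NUMBER, PLANT,
               ROW_NUMBER() OVER (PARTITION BY MATERIAL_NUMBER, PLANT
                                  ORDER BY rowid) AS NUMBER  -- rowid replaces VALID_FROM
        FROM Table_Name
        ORDER BY MATERIAL_NUMBER, PLANT, NUMBER
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_Name (MATERIAL_NUMBER TEXT, PLANT TEXT)")
conn.executemany("INSERT INTO Table_Name VALUES (?, ?)", [(MAT, p) for p in PLANTS])
for row in numbered_rows(conn):
    print(*row)
```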
Suppose this is the actual table:
create table #temp1(MATERIAL_NUMBER varchar(30),PLANT varchar(30), NUMBER int)
If you want to insert only a single record:
declare @Num int
select @Num=isnull(max(number),0) from #temp1 where MATERIAL_NUMBER='000000000000500672' and PLANT='G072'
insert into #temp1 (MATERIAL_NUMBER,PLANT , NUMBER )
values ('000000000000500672','G072',@Num+1)
For a bulk insert, suppose your sample data looks like this:
create table #temp11(MATERIAL_NUMBER varchar(30),PLANT varchar(30))
insert into #temp11 (MATERIAL_NUMBER,PLANT)values
('000000000000500672','G072')
,('000000000000500672','G072')
,('000000000000500672','G087')
,('000000000000500672','G207')
,('000000000000500672','G207')
To insert `#temp11` into `#temp1` while continuing the NUMBER sequence:
insert into #temp1 (MATERIAL_NUMBER,PLANT , NUMBER )
select t11.MATERIAL_NUMBER,t11.PLANT
,ROW_NUMBER()over(partition by t11.MATERIAL_NUMBER,t11.PLANT order by (select null))+isnull(maxnum,0) as Number from #temp11 t11
outer apply(select MATERIAL_NUMBER,PLANT,max(NUMBER)maxnum from #temp1 t where t.MATERIAL_NUMBER=t11.MATERIAL_NUMBER
and t.PLANT=t11.PLANT group by MATERIAL_NUMBER,PLANT) t
select * from #temp1
drop table #temp1
drop table #temp11
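OUTER APPLY is SQL Server syntax. A sketch of the same bulk insert in SQLite via Python's sqlite3, with a LEFT JOIN to a per-key MAX(NUMBER) subquery in place of OUTER APPLY. The tables temp1 and temp11 stand in for the #temp tables, and rowid stands in for the arbitrary ORDER BY (SELECT NULL). Existing rows keep their numbers and new rows continue from the current maximum.

```python
import sqlite3

MAT = "000000000000500672"

def append_numbered(conn):
    """Copy temp11 into temp1, continuing NUMBER after the current max per key."""
    conn.execute("""
        INSERT INTO temp1 (MATERIAL_NUMBER, PLANT, NUMBER)
        SELECT s.MATERIAL_NUMBER, s.PLANT,
               ROW_NUMBER() OVER (PARTITION BY s.MATERIAL_NUMBER, s.PLANT
                                  ORDER BY s.rowid)   -- arbitrary but stable order
               + COALESCE(m.maxnum, 0)                -- 0 when the key is new
        FROM temp11 AS s
        LEFT JOIN (SELECT MATERIAL_NUMBER, PLANT, MAX(NUMBER) AS maxnum
                   FROM temp1
                   GROUP BY MATERIAL_NUMBER, PLANT) AS m
          ON m.MATERIAL_NUMBER = s.MATERIAL_NUMBER AND m.PLANT = s.PLANT
    """)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp1 (MATERIAL_NUMBER TEXT, PLANT TEXT, NUMBER INTEGER)")
    conn.execute("CREATE TABLE temp11 (MATERIAL_NUMBER TEXT, PLANT TEXT)")
    # One G072 row is already numbered, as after the single-record insert above.
    conn.execute("INSERT INTO temp1 VALUES (?, 'G072', 1)", (MAT,))
    conn.executemany(
        "INSERT INTO temp11 VALUES (?, ?)",
        [(MAT, p) for p in ("G072", "G072", "G087", "G207", "G207")],
    )
    append_numbered(conn)
    return conn.execute(
        "SELECT PLANT, NUMBER FROM temp1 ORDER BY PLANT, NUMBER"
    ).fetchall()

print(demo())
```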
The main question is why you need the NUMBER column at all. In most cases you don't need to store it: you can compute ROW_NUMBER() OVER (PARTITION BY MATERIAL_NUMBER, PLANT ORDER BY (SELECT NULL)) wherever you need to display it, which is more efficient.
Otherwise, please describe the actual situation and the number of rows involved where you need the NUMBER column.

Select random row for each group

I have a table like this:
ID ATTRIBUTE
1 A
1 A
1 B
1 C
2 B
2 C
2 C
3 A
3 B
3 C
I'd like to select just one random attribute for each ID. The result could therefore look like this (although this is just one of many options):
ATTRIBUTE
B
C
C
This is my attempt at this problem:
SELECT
"ATTRIBUTE"
FROM
(
SELECT
"ID",
"ATTRIBUTE",
row_number() OVER (PARTITION BY "ID" ORDER BY random()) rownum
FROM
like_this
) shuffled
WHERE
rownum = 1
However, I don't know if this is a good solution, as I need to introduce row numbers, which is a bit cumbersome.
Do you have a better one?
select distinct on (id) id, attribute
from like_this
order by id, random()
If you only need the attribute column:
select distinct on (id) attribute
from like_this
order by id, random()
Notice that you still need to order by id first, because PostgreSQL requires the ORDER BY to start with the DISTINCT ON expressions.
If you only want the distinct attributes:
select distinct attribute
from (
select distinct on (id) attribute
from like_this
order by id, random()
) s
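DISTINCT ON is a PostgreSQL extension. On engines without it, the ROW_NUMBER() pattern from the question is the usual portable form. Below it is run in SQLite via Python's sqlite3 (a stand-in engine; SQLite also has a random() function). Because the pick is random, the checks cover only invariants: one row per id, and an attribute that actually belongs to that id.

```python
import sqlite3

ROWS = [(1, "A"), (1, "A"), (1, "B"), (1, "C"),
        (2, "B"), (2, "C"), (2, "C"),
        (3, "A"), (3, "B"), (3, "C")]

def random_attribute_per_id(conn):
    """Shuffle rows within each id and keep the first one."""
    sql = """
        SELECT id, attribute
        FROM (SELECT id, attribute,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY random()) AS rn
              FROM like_this) shuffled
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE like_this (id INTEGER, attribute TEXT)")
conn.executemany("INSERT INTO like_this VALUES (?, ?)", ROWS)
print(random_attribute_per_id(conn))  # one (id, attribute) pair per id; varies per run
```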
Put a large random number in front of each record and, within each group (id), choose the record with the lowest random number.
$ cat test.txt
\N 1 a
\N 2 b
\N 2 c
\N 2 d
\N 3 e
\N 4 f
$ mysql
USE test;
DROP TABLE IF EXISTS test;
CREATE TABLE test (id0 INT NOT NULL AUTO_INCREMENT, id VARCHAR(1), attribute VARCHAR(1), PRIMARY KEY (id0));
LOAD DATA LOCAL INFILE '~/mysql/test.txt' INTO TABLE test FIELDS TERMINATED BY '\t';
DROP TABLE IF EXISTS rtest;
CREATE TABLE rtest (random INT(8), id0 INT, id VARCHAR(1), attribute VARCHAR(1), PRIMARY KEY (id, random));
INSERT INTO rtest
SELECT CAST(1000000. * rand() AS SIGNED) AS random, test.* FROM test;
SELECT rtest.* FROM rtest,
(SELECT id, min(random) AS random FROM rtest GROUP BY id) AS sample WHERE rtest.random=sample.random AND rtest.id=sample.id;
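The same "lowest random number wins" idea in SQLite via Python's sqlite3, as a sketch rather than a translation of the MySQL session: SQLite's random() already returns a random 64-bit integer, so no scaling or CAST is needed, and CREATE TABLE ... AS SELECT replaces the separate rtest definition. As in the original, two rows in one group that drew the same number would both be returned, though with 64-bit values that is very unlikely.

```python
import sqlite3

# Same rows as test.txt above; id0 plays the role of the AUTO_INCREMENT key.
ROWS = [("1", "a"), ("2", "b"), ("2", "c"), ("2", "d"), ("3", "e"), ("4", "f")]

def sample_one_per_id(conn):
    """Tag every row with a random number, then keep the lowest per id."""
    conn.execute("DROP TABLE IF EXISTS rtest")
    conn.execute(
        "CREATE TABLE rtest AS SELECT random() AS rnd, id0, id, attribute FROM test"
    )
    sql = """
        SELECT r.id, r.attribute
        FROM rtest AS r
        JOIN (SELECT id, MIN(rnd) AS rnd FROM rtest GROUP BY id) AS sample
          ON r.id = sample.id AND r.rnd = sample.rnd
        ORDER BY r.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id0 INTEGER PRIMARY KEY AUTOINCREMENT, id TEXT, attribute TEXT)"
)
conn.executemany("INSERT INTO test (id, attribute) VALUES (?, ?)", ROWS)
print(sample_one_per_id(conn))
```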

change "many-to-many" to "one-to-many"

I have the following table and data:
create table Foo
(
id int not null,
hid int not null,
value int not null
)
insert into Foo(id, hid, value) values(1,1,1) -- use this as 1 < 3
insert into Foo(id, hid, value) values(1,2,3)
insert into Foo(id, hid, value) values(2,3,3) -- use this as 3 < 5
insert into Foo(id, hid, value) values(2,4,5)
insert into Foo(id, hid, value) values(3,2,2) -- use this or next one as value are the same
insert into Foo(id, hid, value) values(3,3,2)
Currently "id" and "hid" have a many-to-many association. What I want is to make "hid" the "one" side instead of "many", so that each id maps to a single hid. The rule is to keep the row with the minimum "value" (see the comments in the SQL above).
Is it possible to achieve this with a query instead of a cursor?
Thanks!
SQL Server 2005:
WITH X AS ( SELECT id, min(value) as minval from Foo group by id )
SELECT * FROM
(
SELECT Foo.*, RANK() OVER ( PARTITION by Foo.id order by Foo.hid, Foo.value ) as Rank
FROM Foo JOIN X on Foo.id = X.id and Foo.value = X.minval
) tmp
WHERE Rank = 1
id hid value Rank
----------- ----------- ----------- --------------------
1 1 1 1
2 3 3 1
3 2 2 1
The first line (WITH clause) gets a set of ids with the min value (my arbitrary choice).
The RANK is used to eliminate duplicates - there may be a better way.
With MySQL or SQL Server 2000, I guess you could do this with a more complicated set of subqueries.
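The answer notes that RANK() is used to eliminate duplicates and that there may be a better way. One option is a single ROW_NUMBER() ordered by value, then hid, which picks exactly one row per id even if an identical (hid, value) pair occurs twice. The hid tie-break mirrors the answer's ordering and is an assumption; this sketch runs in SQLite via Python's sqlite3 rather than SQL Server 2005.

```python
import sqlite3

FOO = [(1, 1, 1), (1, 2, 3), (2, 3, 3), (2, 4, 5), (3, 2, 2), (3, 3, 2)]

def one_hid_per_id(conn):
    """Keep exactly one row per id: lowest value, ties broken by lowest hid."""
    sql = """
        SELECT id, hid, value
        FROM (SELECT id, hid, value,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY value, hid) AS rn
              FROM Foo) ranked
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foo (id INTEGER NOT NULL, hid INTEGER NOT NULL, value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO Foo (id, hid, value) VALUES (?, ?, ?)", FOO)
print(one_hid_per_id(conn))  # [(1, 1, 1), (2, 3, 3), (3, 2, 2)]
```

This also drops the separate MIN(value) CTE, since the ORDER BY inside the window already puts the minimum value first.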
Not sure if you are looking for a query, or instructions on how to modify your schema, but here is a query:
select id, min(hid) as hid, min(value) as value
from Foo
group by id
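One caution with taking MIN(hid) and MIN(value) separately: the two minimums can come from different rows. The query happens to give the right answer on the question's data, but a sketch in SQLite via Python's sqlite3 with two hypothetical rows shows it can return a (hid, value) pair that does not exist in the table.

```python
import sqlite3

def independent_mins(rows):
    """Run the per-column MIN query; return (result rows, set of existing rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Foo (id INTEGER NOT NULL, hid INTEGER NOT NULL, value INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO Foo (id, hid, value) VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT id, MIN(hid) AS hid, MIN(value) AS value FROM Foo GROUP BY id ORDER BY id"
    ).fetchall()
    return result, set(rows)

# Question's data: every result row happens to be a real row.
question_rows = [(1, 1, 1), (1, 2, 3), (2, 3, 3), (2, 4, 5), (3, 2, 2), (3, 3, 2)]
print(independent_mins(question_rows)[0])  # [(1, 1, 1), (2, 3, 3), (3, 2, 2)]

# Hypothetical rows: the lowest hid (1) does not carry the lowest value (3).
result, existing = independent_mins([(1, 1, 5), (1, 2, 3)])
print(result, result[0] in existing)  # [(1, 1, 3)] False
```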