Select random row for each group - sql

I have a table like this
ID ATTRIBUTE
1 A
1 A
1 B
1 C
2 B
2 C
2 C
3 A
3 B
3 C
I'd like to select just one random attribute for each ID. The result could therefore look like this (although this is just one of many options):
ATTRIBUTE
B
C
C
This is my attempt at this problem:
SELECT
"ATTRIBUTE"
FROM
(
SELECT
"ID",
"ATTRIBUTE",
row_number() OVER (PARTITION BY "ID" ORDER BY random()) rownum
FROM
table
) shuffled
WHERE
rownum = 1
However, I don't know if this is a good solution, as I need to introduce row numbers, which is a bit cumbersome.
Do you have a better one?

select distinct on (id) id, attribute
from like_this
order by id, random()
If you only need the attribute column:
select distinct on (id) attribute
from like_this
order by id, random()
Note that you still need to order by id first, as it is the column used in distinct on.
If you only want the distinct attributes:
select distinct attribute
from (
select distinct on (id) attribute
from like_this
order by id, random()
) s
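distinct on is PostgreSQL-specific, so on engines without it the row_number() form from the question remains the portable option. Below is a minimal sketch of that form, run through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions); the table name like_this is taken from the answer above, while the helper name pick_random_per_group is hypothetical.

```python
import sqlite3


def pick_random_per_group(conn):
    """Return one randomly chosen (id, attribute) pair per id."""
    sql = """
        SELECT id, attribute
        FROM (
            SELECT id, attribute,
                   row_number() OVER (PARTITION BY id ORDER BY random()) AS rn
            FROM like_this
        ) AS shuffled
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE like_this (id INTEGER, attribute TEXT)")
conn.executemany(
    "INSERT INTO like_this VALUES (?, ?)",
    [(1, "A"), (1, "A"), (1, "B"), (1, "C"),
     (2, "B"), (2, "C"), (2, "C"),
     (3, "A"), (3, "B"), (3, "C")],
)

# One row per id; the attribute differs from run to run.
picked = pick_random_per_group(conn)
print(picked)
```

Because the shuffle happens inside each partition, every id contributes exactly one row regardless of how many duplicates it has.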

Put a large random number in front of each record and, within each group (id), choose the record with the lowest random number.
$ cat test.txt
\N 1 a
\N 2 b
\N 2 c
\N 2 d
\N 3 e
\N 4 f
$ mysql
USE test;
DROP TABLE IF EXISTS test;
CREATE TABLE test (id0 INT NOT NULL AUTO_INCREMENT, id VARCHAR(1), attribute VARCHAR(1), PRIMARY KEY (id0));
LOAD DATA LOCAL INFILE '~/mysql/test.txt' INTO TABLE test FIELDS TERMINATED BY '\t';
DROP TABLE IF EXISTS rtest;
-- id0 is INT here so that it matches test.id0
CREATE TABLE rtest (random INT, id0 INT, id VARCHAR(1), attribute VARCHAR(1), PRIMARY KEY (id, random));
INSERT INTO rtest
-- MySQL's CAST accepts SIGNED/UNSIGNED rather than INT
SELECT CAST(1000000 * rand() AS UNSIGNED) AS random, test.* FROM test;
SELECT rtest.* FROM rtest,
(SELECT id, min(random) AS random FROM rtest GROUP BY id) AS sample WHERE rtest.random=sample.random AND rtest.id=sample.id;
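The same idea can be tried without a MySQL server. The sketch below is an assumption-laden port to SQLite via Python's sqlite3, not the answer's exact script: the random key is stored once in a copy of the table (column name rnd is hypothetical), and each group keeps the row with the smallest key. If two rows of one group ever drew the same key, both would be returned; with SQLite's 64-bit random() that is very unlikely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id0 INTEGER PRIMARY KEY, id TEXT, attribute TEXT)"
)
conn.executemany(
    "INSERT INTO test (id, attribute) VALUES (?, ?)",
    [("1", "a"), ("2", "b"), ("2", "c"), ("2", "d"), ("3", "e"), ("4", "f")],
)

# Store the random key once so MIN() and the join compare the same values.
conn.execute(
    "CREATE TABLE rtest AS "
    "SELECT random() AS rnd, id0, id, attribute FROM test"
)

sample = conn.execute(
    """
    SELECT r.id, r.attribute
    FROM rtest AS r
    JOIN (SELECT id, MIN(rnd) AS rnd FROM rtest GROUP BY id) AS s
      ON r.id = s.id AND r.rnd = s.rnd
    ORDER BY r.id
    """
).fetchall()
print(sample)
```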

Related

SQL - group by occurrence and return id

I have a table of IDs and value:
ID Value
X 1
X 1
X 2
Y 5
Y 5
Y 5
Z 3
Z 6
I want to see which IDs contain more than one distinct value. In this case it should return IDs X and Z, because X contains [1,2] and Z contains [3,6]:
ID
X
Z
I have tried this:
select ID from
(
SELECT ID
,count(*) over (partition by [Value]) as c
FROM mytable
) a
where c>1
But this is not returning the desired answer.
I prefer aggregating this way:
SELECT ID
FROM mytable
GROUP BY ID
HAVING MIN(Value) <> MAX(Value);
On many databases, the above HAVING clause will be sargable, meaning that an index on (ID, Value) can be used. The version which checks COUNT(DISTINCT Value) may not be able to use such an index.
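As a quick check that the MIN/MAX form and the COUNT(DISTINCT ...) form agree on the sample data, here is a small sketch using Python's sqlite3. It only compares results; it says nothing about index usage, which depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID TEXT NOT NULL, Value INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("X", 1), ("X", 1), ("X", 2),
     ("Y", 5), ("Y", 5), ("Y", 5),
     ("Z", 3), ("Z", 6)],
)

# IDs whose smallest and largest values differ
by_min_max = [row[0] for row in conn.execute(
    "SELECT ID FROM mytable GROUP BY ID "
    "HAVING MIN(Value) <> MAX(Value) ORDER BY ID"
)]

# IDs with more than one distinct value
by_count = [row[0] for row in conn.execute(
    "SELECT ID FROM mytable GROUP BY ID "
    "HAVING COUNT(DISTINCT Value) > 1 ORDER BY ID"
)]

print(by_min_max, by_count)  # ['X', 'Z'] ['X', 'Z']
```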
Try this,
SELECT ID
FROM mytable
GROUP BY ID
HAVING COUNT(DISTINCT Value) > 1;
Just group them by ID and check whether each group has more than one distinct value in the Value field. Something like this:
SELECT ID
FROM mytable
GROUP BY ID
HAVING COUNT(DISTINCT Value) > 1
CREATE TABLE yourtable(
ID VARCHAR(30) NOT NULL
,Value int NOT NULL
);
INSERT INTO yourtable
(ID,Value) VALUES
('X',1),
('X',1),
('X',2),
('Y',5),
('Y',5),
('Y',5),
('Z',3),
('Z',6);
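Other approaches are far better, but I used Rank and a subquery to find the IDs with more than one distinct value. DISTINCT keeps an ID from being listed several times when it has three or more different values.
SELECT DISTINCT ID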
FROM   (SELECT *,
               Rank()
                 OVER(
                   partition BY ID
                   ORDER BY Value) ID2
        FROM   yourtable) a
WHERE ID2 > 1

Need Sorting With External Array or Comma Separated data

I am working with PostgreSQL 8.0.2 and have this table:
create table rate_date (id serial, rate_name text);
and its data is
id rate_name
--------------
1 startRate
2 MidRate
3 xlRate
4 xxlRate
After a select, the data is shown in the default order, or ordered by a column of the same table. My requirement is different: I get the desired order from a separate entity as (xlRate, MidRate, startRate, xxlRate), and I want to use this list to sort the select on table rate_date. I have tried a join against VALUES, but it is not working, and I cannot think of another solution. If anyone has an idea, please share the details.
Output should be
xlRate
MidRate
startRate
xxlRate
My attempt/thinking:
select id, rate_name
from rate_date r
join (
VALUES (1, 'xlRate'),(2, 'MidRate')
) as x(a,b) on x.b = r.rate_name
I am not sure if this is helpful but in Oracle you could achieve that this way:
select *
from
(
select id, rate_name,
case rate_name
when 'xlRate' then 1
when 'MidRate' then 2
when 'startRate' then 3
when 'xxlRate' then 4
else 100
end my_order
from rate_date r
)
order by my_order
Maybe you can do something like this in PostgreSQL?
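Since the desired order arrives from outside the database, the CASE expression can be generated from that list instead of being typed by hand. The sketch below does this with Python's sqlite3 purely as an illustration; the helper name order_by_list is hypothetical, and the same generated SQL shape should carry over to other engines that support a CASE expression in ORDER BY.

```python
import sqlite3


def order_by_list(conn, names):
    """Return rate_name values sorted by their position in `names`.

    Assumes `names` is non-empty; names not in the list sort last.
    """
    # One "WHEN ? THEN n" branch per name; the names are bound as parameters.
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(1, len(names) + 1))
    sql = (
        "SELECT rate_name FROM rate_date "
        f"ORDER BY CASE rate_name {whens} ELSE {len(names) + 1} END, id"
    )
    return [row[0] for row in conn.execute(sql, list(names))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rate_date (id INTEGER PRIMARY KEY, rate_name TEXT)")
conn.executemany(
    "INSERT INTO rate_date (rate_name) VALUES (?)",
    [("startRate",), ("MidRate",), ("xlRate",), ("xxlRate",)],
)

ordered = order_by_list(conn, ["xlRate", "MidRate", "startRate", "xxlRate"])
print(ordered)  # ['xlRate', 'MidRate', 'startRate', 'xxlRate']
```

Only the positions are interpolated into the SQL text; the rate names themselves are passed as bound parameters.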

SQL grouping by distinct values in a multi-value string column

I want to perform a group-by based on the distinct values in a string column that has multiple values.
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For everything below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a,b,c,d you can try one of these.
Note that this only works if no value is a substring of another, such as test and test_new: the pattern for test would also match every test_new row, and the counts would be wrong.
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
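The caveat about similar values can be seen on a two-row example. The sketch below uses Python's sqlite3 with made-up rows ('test,foo' and 'test_new'), and also shows one common workaround that is not part of the answer above: wrapping both the column and the search value in the delimiter before matching. In MySQL the concatenation would be written with CONCAT instead of ||.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER PRIMARY KEY, test TEXT)")
conn.executemany(
    "INSERT INTO tmp (test) VALUES (?)",
    [("test,foo",), ("test_new",)],  # hypothetical rows, for illustration only
)

# Plain substring match: 'test' also matches the 'test_new' row.
naive = conn.execute(
    "SELECT COUNT(*) FROM tmp WHERE test LIKE '%test%'"
).fetchone()[0]

# Delimiter-wrapped match: only whole list items equal to 'test' count.
delimited = conn.execute(
    "SELECT COUNT(*) FROM tmp WHERE ',' || test || ',' LIKE '%,test,%'"
).fetchone()[0]

print(naive, delimited)  # 2 1
```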
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and keep only the rows whose number is less than or equal to the number of elements in the collection. Then we use the split_part function to read the item at that index. Once we have the exploded table, we do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1
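The explode-then-group logic can be sanity-checked outside the database. The following is a plain Python sketch of the same two steps, not Redshift code; the function name count_items is hypothetical, and the rows are assumed to be plain comma-separated strings as in the agg CTE above.

```python
from collections import Counter


def count_items(rows):
    """Explode comma-separated rows into items, then count each item."""
    exploded = [item for row in rows for item in row.split(",")]
    return Counter(exploded)


counts = count_items(["a,b", "b,c", "b,c,a", "d"])
for item in sorted(counts):
    print(item, counts[item])
# a 2
# b 3
# c 2
# d 1
```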

How to select only the next smaller value

I am trying to select the next smaller number from the database with SQL.
I have a table with records like this:
ID NodeName NodeType
4 A A
2 B B
2 C C
1 D D
0 E E
and other columns like name and type.
If I pass "4" as a parameter then I want to receive the next smallest number records:
ID NodeName NodeType
2 B B
2 C C
Right now, if I use the < sign, it gives me:
ID NodeName NodeType
2 B B
2 C C
1 D D
0 E E
How can I get this working?
You can use WITH TIES clause:
SELECT TOP (1) WITH TIES *
FROM mytable
WHERE ID < 4
ORDER BY ID DESC
The TOP clause, in conjunction with WHERE and ORDER BY, selects the largest value below 4. The WITH TIES clause guarantees that all rows with that value are returned, in case there is more than one.
select ID
from dbo.your_table
where ID in
(
select top 1 ID
from dbo.your_table
where ID < 4
order by ID desc
);
Note: where dbo.your_table is your source table
What this does is use an inner query to find the next smallest ID below your selected value. The outer query then pulls all records whose ID matches that value.
Here's a full working example:
use TestDatabase;
go
create table dbo.TestTable1
(
ID int not null
);
go
insert into dbo.TestTable1 (ID)
values (6), (4), (2), (2), (1), (0);
go
select ID
from dbo.TestTable1
where ID in
(
select top 1 ID
from dbo.TestTable1
where ID < 4
order by ID desc
);
/*
ID
2
2
*/
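For engines without TOP, the inner query can be written with MAX instead. Below is a minimal sketch using Python's sqlite3; the table name nodes and the helper name next_smaller_rows are hypothetical, and the data is the sample from the question.

```python
import sqlite3


def next_smaller_rows(conn, limit):
    """Return all rows whose ID is the largest ID strictly below `limit`."""
    sql = """
        SELECT ID, NodeName, NodeType
        FROM nodes
        WHERE ID = (SELECT MAX(ID) FROM nodes WHERE ID < ?)
        ORDER BY NodeName
    """
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (ID INTEGER, NodeName TEXT, NodeType TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(4, "A", "A"), (2, "B", "B"), (2, "C", "C"), (1, "D", "D"), (0, "E", "E")],
)

print(next_smaller_rows(conn, 4))  # [(2, 'B', 'B'), (2, 'C', 'C')]
```

When no smaller ID exists, MAX returns NULL, the comparison matches nothing, and the result is an empty list.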

Percentage by group - oracle

I have this sample.
What I need is the average per key, not per key and value. However, the syntax I used appears to give me the average per key and value.
select avg(value2),KEY,VALUE from testavg
GROUP BY key,value
order by key, value
Doing otherwise yields a syntax error. The results I need are as follows:
10 A 0.96
10 B 0.04
12 C 1
But the statement I used yields the incorrect results above.
Could this be achieved with a single Oracle select statement? I have included the statements to create the entire table.
CREATE TABLE "TESTAVG"
( "KEY" NUMBER,
"VALUE" VARCHAR2(20 BYTE),
"VALUE2" NUMBER
);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (10,'A',12);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (10,'A',13);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (10,'B',1);
Insert into TESTAVG (KEY,VALUE,VALUE2) values (12,'C',20);
This query might run faster on larger data - only reads the table once:
select distinct key, value,
sum(value2) over (partition by key, value) / sum(value2) over (partition by key) r
from testavg
/
KEY VALUE R
---------- -------------------- ----------
10 A .961538462
10 B .038461538
12 C 1
select avg(value2),KEY from testavg
GROUP BY key
order by key;
8.66666666666666666666666666666666666667 10
20 12
EDIT: Specs are still not clear but this might be what you need...
with gr1 as (select key,sum(value2) sumvalue
from testavg
group by key)
, gr2 as (select key,value,sum(value2) sumvalue
from testavg
GROUP BY key,value)
select gr1.key,gr2.value,gr2.sumvalue/gr1.sumvalue
from gr1
, gr2
where gr1.key = gr2.key;
10 B 0.0384615384615384615384615384615384615385
12 C 1
10 A 0.9615384615384615384615384615384615384615
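The windowed form of the share-per-key calculation can be reproduced outside Oracle. The sketch below uses Python's sqlite3 and assumes SQLite 3.25 or later for window functions; the column alias share is hypothetical. One difference from Oracle is worth flagging: SQLite divides integers as integers, so the numerator is multiplied by 1.0 to force a fractional result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE testavg ("key" INTEGER, value TEXT, value2 INTEGER)')
conn.executemany(
    "INSERT INTO testavg VALUES (?, ?, ?)",
    [(10, "A", 12), (10, "A", 13), (10, "B", 1), (12, "C", 20)],
)

shares = conn.execute(
    """
    SELECT DISTINCT "key", value,
           SUM(value2) OVER (PARTITION BY "key", value) * 1.0
             / SUM(value2) OVER (PARTITION BY "key") AS share
    FROM testavg
    ORDER BY "key", value
    """
).fetchall()

for key, value, share in shares:
    print(key, value, round(share, 4))
# 10 A 0.9615
# 10 B 0.0385
# 12 C 1.0
```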