Query Distinct Column Names with Highest 2nd Column - sql

My title may be poor, so feel free to reword it. I am trying to write a query for the data below:
letter number
A 2
A 1
A 7
B 3
B 9
C 1
C 1
C 0
C 7
C 5
D 8
D 8
D 4
E 2
I want it to display the distinct letters along with the highest number for each letter. So something like this:
A 7
B 9
C 7
D 8
E 2
I think I have the basics down, but I can't seem to get only one result for each letter.
SELECT DISTINCT letter, number from "Table name"

Use the following SQL:
select letter,
max(number) as max_number
from your_table
group by letter
order by letter;
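As a quick check of the GROUP BY answer above, here is a minimal sketch that runs the same query against the question's sample rows. It assumes Python's built-in sqlite3 module as a stand-in database, and the table name "letters" is hypothetical.

```python
import sqlite3

# Sample rows copied from the question; the table name "letters" is hypothetical.
rows = [("A", 2), ("A", 1), ("A", 7), ("B", 3), ("B", 9),
        ("C", 1), ("C", 1), ("C", 0), ("C", 7), ("C", 5),
        ("D", 8), ("D", 8), ("D", 4), ("E", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (letter TEXT, number INTEGER)")
conn.executemany("INSERT INTO letters VALUES (?, ?)", rows)

# GROUP BY collapses each letter to a single row; MAX picks its largest number.
max_per_letter = conn.execute(
    "SELECT letter, MAX(number) AS max_number "
    "FROM letters GROUP BY letter ORDER BY letter"
).fetchall()

for letter, max_number in max_per_letter:
    print(letter, max_number)
```

SELECT DISTINCT letter, number only removes rows that are identical in both columns, which is why it still returns several rows per letter.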

Related

Shuffle a specific column in a table on BigQuery

I have a table that looks like this:
id label
1 A
2 A
3 A
4 B
5 C
6 C
7 A
8 A
9 C
10 B
I want to get another column label_shuffled that is the existing column label but shuffled. I need it to be efficient and fast.
Desired output:
id label label_shuffled
1 A A
2 A B
3 A C
4 B A
5 C C
6 C A
7 A C
8 A A
9 C B
10 B A
Any suggestions?
One option is to use the window function ROW_NUMBER to number the rows in random order and then join:
WITH shuffle AS (
  SELECT
    id,
    label,
    -- sequential position of each row
    ROW_NUMBER() OVER () AS row_num,
    -- random position of each row
    ROW_NUMBER() OVER (ORDER BY RAND()) AS row_num_shuffled
  FROM labels
)
SELECT
  l.id,
  l.label,
  s.label AS label_shuffled
FROM shuffle l
JOIN shuffle s ON l.row_num = s.row_num_shuffled
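The pairing idea above can be tried locally. The following is a sketch only: it assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of BigQuery, uses RANDOM() where BigQuery has RAND(), and the short aliases rn and rn_shuffled are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (id INTEGER, label TEXT)")
# ids 1..10 with the labels from the question
conn.executemany("INSERT INTO labels VALUES (?, ?)",
                 list(enumerate("AAABCCAACB", start=1)))

# Each row gets a fixed position (rn) and a random position (rn_shuffled).
# Joining rn to rn_shuffled hands every row the label of exactly one other row.
shuffled = conn.execute("""
    WITH shuffle AS (
        SELECT id, label,
               ROW_NUMBER() OVER (ORDER BY id) AS rn,
               ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn_shuffled
        FROM labels
    )
    SELECT l.id, l.label, s.label AS label_shuffled
    FROM shuffle AS l
    JOIN shuffle AS s ON l.rn = s.rn_shuffled
    ORDER BY l.id
""").fetchall()

for row in shuffled:
    print(*row)
```

Because both row numberings cover 1 to N exactly once, the join is one-to-one, so the shuffled column contains the same labels as the original, only in a different order.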

How to update table with concatenation

I have a table like this:
create table aaa (id int not null, data varchar(50), numb int);
with data like this:
begin
for i in 1..30 loop
insert into aaa
values (i, dbms_random.string('L',1),dbms_random.value(0,10));
end loop;
end;
Now I am running this:
select a.id, a.data, a.numb,
count(*) over (partition by a.numb order by a.data) count,
b.id, b.data,b.numb
from aaa a, aaa b
where a.numb=b.numb
and a.data!=b.data
order by a.data;
I want to update every row where the numbers are the same but the letters differ, so that the data column ends up holding more than one letter (for example, "a c d e"). In other words, I want to concatenate the letters within each group. How can I do that? The point is to do something like a GROUP BY on the number, with the grouped column holding the combined values.
This is how it looks at the beginning:
id | data |numb
1 q 1
2 z 8
3 i 7
4 a 2
5 q 4
6 h 1
7 b 9
8 u 9
9 s 4
This is what I would like to get at the end:
id | data |numb
1 q h 1
2 z 8
3 i 7
4 a 2
5 q s 4
7 b u 9
Try this:
SELECT MIN(id),
       LISTAGG(data, ' ') WITHIN GROUP (ORDER BY data) data,
       numb
FROM aaa
GROUP BY numb
ORDER BY 1
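To see the grouping on the question's nine sample rows without an Oracle instance, here is a rough sketch using Python's sqlite3 module. SQLite has no LISTAGG, so GROUP_CONCAT is swapped in; it does not reproduce the WITHIN GROUP (ORDER BY data) part, so the order of letters inside each group is not guaranteed here.

```python
import sqlite3

# The nine sample rows from the question: (id, data, numb)
rows = [(1, "q", 1), (2, "z", 8), (3, "i", 7), (4, "a", 2), (5, "q", 4),
        (6, "h", 1), (7, "b", 9), (8, "u", 9), (9, "s", 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aaa (id INTEGER NOT NULL, data TEXT, numb INTEGER)")
conn.executemany("INSERT INTO aaa VALUES (?, ?, ?)", rows)

# One output row per numb: the smallest id of the group and its letters joined
# by spaces. GROUP_CONCAT stands in for Oracle's LISTAGG (an assumption of
# this sketch), and the letter order within a group is arbitrary.
grouped = conn.execute(
    "SELECT MIN(id) AS id, GROUP_CONCAT(data, ' ') AS data, numb "
    "FROM aaa GROUP BY numb ORDER BY 1"
).fetchall()

for row in grouped:
    print(*row)
```

The result has six rows, one per distinct numb, matching the shape of the desired output in the question.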
This selects 10 random strings 1 to 4 letters long, letters in words may repeat:
select level, dbms_random.string('l', dbms_random.value(1, 4))
from dual connect by level <= 10
This selects 1 to 10 random strings 1 to 26 letters long, letters do not repeat and are sorted:
with aaa(id, data, numb) as (
select level, dbms_random.string('L', 1),
round(dbms_random.value(0, 10))
from dual connect by level <= 30)
select numb, listagg(data) within group (order by data) list
from (select distinct data, numb from aaa)
group by numb

hive query join on columns which are alias of max yields no result

I have two Hive tables, t1 and t2, with exactly the same content, shown below: two columns, 'a' and 'b', and 9 rows.
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
The problem is that the following HiveQL query returns nothing:
select
    t2_t.a,
    t2_t.m
from
    (select
        a,
        max(b) as m
    from
        t1
    group by
        a
    ) t1_t
join
    (select
        a,
        max(b) as m
    from
        t2
    group by
        a
    ) t2_t
on
    t1_t.m = t2_t.m
But if I change
t2_t.a,
t2_t.m
to '*', the query works as expected and the output is:
1 a 1 a
2 b 2 b
3 c 3 c
4 d 4 d
5 e 5 e
6 f 6 f
7 g 7 g
8 h 8 h
9 i 9 i
My Hive client version is 1.2.1.
Please help me understand this.
The problem occurs when the IdentityProjectRemover optimization is applied to the operator tree of the Hive query: the SEL operator above the FIL operator is removed incorrectly.
That SEL operator was pruning columns out of the input tuples, so removing it makes the query return an incorrect result.
To work around this problem, set hive.optimize.remove.identity.project=false to turn off this optimization.
For more details, please refer to Hive issue HIVE-10996.

How to get an index of different category returned by "order by" sql in oracle?

We can easily get a SQL result like the following:
SQL>select Name, Value from table order by Name;
Name Value
------------
A 1
A 2
B 1
C 5
C 6
C 7
However, is there a way to link each name to a number so that an index of the distinct names is formed? Suppose we don't know how many different names are in the table or what they are.
Name Value idx
-----------------
A 1 0
A 2 0
B 1 1
C 5 2
C 6 2
C 7 2
This can easily be done using a window function:
select Name,
Value,
dense_rank() over (order by name) - 1 as idx
from table
order by Name;
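The DENSE_RANK approach above can be checked on the question's six rows. This is a minimal sketch assuming SQLite 3.25 or later through Python's sqlite3 module; the table name "items" is hypothetical, chosen because "table" is a reserved word.

```python
import sqlite3

# Rows from the question; "items" is a hypothetical table name.
rows = [("A", 1), ("A", 2), ("B", 1), ("C", 5), ("C", 6), ("C", 7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, value INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# DENSE_RANK gives equal names the same rank and leaves no gaps between ranks;
# subtracting 1 makes the index start at 0.
indexed = conn.execute(
    "SELECT name, value, DENSE_RANK() OVER (ORDER BY name) - 1 AS idx "
    "FROM items ORDER BY name, value"
).fetchall()

for row in indexed:
    print(*row)
```

RANK would not work here, since it skips numbers after ties (A, A, B would give 1, 1, 3), whereas DENSE_RANK keeps the index consecutive.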

Finding unique values with multiple columns using certain condition

ID? A B C
--- -- -- --
1 J 1 B
2 J 1 S
3 M 1 B
4 M 1 S
5 M 2 B
6 M 2 S
7 T 1 B
8 T 2 S
9 C 1 B
10 C 1 S
11 C 2 B
12 N 1 S
13 N 2 S
14 N 3 S
15 Q 1 S
16 Q 1 S
17 Z 1 B
I need to find unique values across multiple columns with some added conditions. A unique value is a combination of columns A, B and C.
If column A has only two rows (like records 1 and 2), column B is the same in both, and column C differs, then I don't need those records.
If column A has multiple rows (like records 3 to 6) with different column B and C combinations, we want to see those values.
If column A has multiple rows (like records 7 to 8) with different column B and C combinations, we want to see those values.
If column A has multiple rows (like records 9 to 11) with similar or different column B and C combinations, we want to see those values.
If column A has multiple rows (like record 12 onwards) with the same column C and the same or different column B, we don't need those values.
If there is a single row, like row 17, there is no need to display it either.
I have tried a lot but am not getting the exact answer. Any help is greatly appreciated.
Trying to go through all the logic, I think you want all rows where the values of column B or column C differ within a given value of A. An easy way to see whether records differ is to compare the min and max values, and you can do this using analytic functions:
select A, B, C
from (select t.*,
             count(*) over (partition by A) as Acnt,
             min(B) over (partition by A) as Bmin,
             max(B) over (partition by A) as Bmax,
             min(C) over (partition by A) as Cmin,
             max(C) over (partition by A) as Cmax
      from t
     ) t
where (Bmin <> Bmax or Cmin <> Cmax)
Your example data does not have any actual duplicates, so I don't think a count(distinct) is necessary. Your rules say nothing about what to do when A only appears once. This version will filter those rows out.
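To see which rows the min/max filter keeps, here is a sketch that runs it over the 17 sample rows. It assumes SQLite 3.25 or later through Python's sqlite3 module, and an id column is added to the hypothetical table t so the surviving rows are easy to identify.

```python
import sqlite3

# (id, A, B, C) rows from the question
rows = [(1, "J", 1, "B"), (2, "J", 1, "S"), (3, "M", 1, "B"), (4, "M", 1, "S"),
        (5, "M", 2, "B"), (6, "M", 2, "S"), (7, "T", 1, "B"), (8, "T", 2, "S"),
        (9, "C", 1, "B"), (10, "C", 1, "S"), (11, "C", 2, "B"),
        (12, "N", 1, "S"), (13, "N", 2, "S"), (14, "N", 3, "S"),
        (15, "Q", 1, "S"), (16, "Q", 1, "S"), (17, "Z", 1, "B")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a TEXT, b INTEGER, c TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Within each value of a, min <> max means the column has at least two
# distinct values, so the filter keeps groups where b or c varies.
kept = conn.execute("""
    SELECT id, a, b, c
    FROM (SELECT t.*,
                 MIN(b) OVER (PARTITION BY a) AS bmin,
                 MAX(b) OVER (PARTITION BY a) AS bmax,
                 MIN(c) OVER (PARTITION BY a) AS cmin,
                 MAX(c) OVER (PARTITION BY a) AS cmax
          FROM t) AS x
    WHERE bmin <> bmax OR cmin <> cmax
    ORDER BY id
""").fetchall()

for row in kept:
    print(*row)
```

On this data the filter keeps ids 1 to 14 and drops the Q and Z groups. The J and N groups are kept, which the question's first and fifth rules appear to exclude, so the WHERE clause may need tightening for those cases.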