Shuffle a specific column in a table on BigQuery - sql

I have a table that looks like this:
id label
1 A
2 A
3 A
4 B
5 C
6 C
7 A
8 A
9 C
10 B
I want to get another column, label_shuffled, that contains the values of the existing column label in shuffled order. I need it to be fast and efficient.
Desired output:
id label label_shuffled
1 A A
2 A B
3 A C
4 B A
5 C C
6 C A
7 A C
8 A A
9 C B
10 B A
Any suggestions?

One option is to use the window function ROW_NUMBER to number the rows randomly and then join:
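WITH shuffle AS (
  SELECT
    id,
    label,
    ROW_NUMBER() OVER () AS row_number,                          -- numbers rows 1..n in arbitrary order
    ROW_NUMBER() OVER (ORDER BY RAND()) AS row_number_shuffled   -- numbers rows 1..n in random order
  FROM labels
)
SELECT
  l.id,
  l.label,
  s.label AS label_shuffled
FROM shuffle l
-- each row takes the label of the row whose random number equals its own number
JOIN shuffle s ON l.row_number = s.row_number_shuffled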
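To check that the self-join really produces a permutation of the original labels, here is a small sketch that runs the same query against an in-memory SQLite database from Python. SQLite is an assumption standing in for BigQuery: it spells the random function RANDOM() instead of RAND(), and it needs version 3.25 or later for window functions. The non-random numbering uses ORDER BY id only to make it deterministic; any numbering of 1..n works, because the join pairs each number with exactly one randomly numbered row.

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for BigQuery here;
# SQLite's RANDOM() plays the role of BigQuery's RAND().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (id INTEGER, label TEXT)")
data = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "C"),
        (6, "C"), (7, "A"), (8, "A"), (9, "C"), (10, "B")]
conn.executemany("INSERT INTO labels VALUES (?, ?)", data)

query = """
WITH shuffle AS (
    SELECT
        id,
        label,
        ROW_NUMBER() OVER (ORDER BY id) AS rn,                -- 1..n by id
        ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rn_shuffled  -- 1..n at random
    FROM labels
)
SELECT l.id, l.label, s.label AS label_shuffled
FROM shuffle l
JOIN shuffle s ON l.rn = s.rn_shuffled  -- each rn matches exactly one rn_shuffled
ORDER BY l.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each run prints a different label_shuffled column, but it always holds the same multiset of labels as label.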

Related

sql - query for all values in table with limit

I have an SQL query which I run in Amazon Athena:
select
A,
B,
C,
D
from
T
where
A = '1000'
order by
B desc
limit 1
Here I order by B and take only the first row for A = '1000'. However, I want to run this query for every value of A in T, i.e. for each A in T get only the first row and append it to the results.
How do I do this?
Example of table data:
A B C D
1000 '12/01/2021' 1 7
1000 '10/01/2020' 2 8
1333 '06/01/1920' 3 9
1333 '07/01/1920' 4 10
1999 '09/03/1960' 5 11
1999 '09/03/1950' 6 12
and the result I want to get is:
1000 '12/01/2021' 1 7
1333 '07/01/1920' 4 10
1999 '09/03/1960' 5 11
You can use the ROW_NUMBER window function to do this:
SELECT A,
B,
C,
D
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY A ORDER BY B DESC) rn
FROM T
) t1
WHERE rn = 1
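A minimal sketch that runs the ROW_NUMBER query above on the sample rows, using in-memory SQLite from Python as a stand-in for Athena (an assumption; SQLite 3.25+ is needed for window functions). The dates are stored as ISO 'YYYY-MM-DD' text, also an assumption: with the question's 'MM/DD/YYYY' strings, ORDER BY B DESC compares the month first, which only happens to give the right answer for this sample. ORDER BY A is added so the output order is predictable.

```python
import sqlite3

# SQLite stands in for Athena (assumption). B is ISO text so that
# ORDER BY B DESC sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A TEXT, B TEXT, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        ("1000", "2021-12-01", 1, 7),
        ("1000", "2020-10-01", 2, 8),
        ("1333", "1920-06-01", 3, 9),
        ("1333", "1920-07-01", 4, 10),
        ("1999", "1960-09-03", 5, 11),
        ("1999", "1950-09-03", 6, 12),
    ],
)

query = """
SELECT A, B, C, D
FROM (
    SELECT *,
           -- rn = 1 marks the latest B within each A
           ROW_NUMBER() OVER (PARTITION BY A ORDER BY B DESC) AS rn
    FROM T
) t1
WHERE rn = 1
ORDER BY A
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```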

SQL Order By Custom Sequence

I have data in this order:
Id Value
-- ----
1 a
1 b
1 c
2 a
2 c
3 b
4 c
4 b
4 a
I want to sort the data in this order:
Id Value
-- ----
1 a
2 a
3 b
4 c
1 b
2 c
4 b
1 c
4 a
You seem to want to interleave the ids. For this purpose, you can use row_number():
order by row_number() over (partition by id order by value),
id
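Note that numbering each id's rows by value orders id 4's rows a, b, c, so the query above puts 4 a in the first round rather than 4 c as in the desired output. SQL tables have no inherent row order, so reproducing the exact sequence needs a column that records it. The sketch below, run with in-memory SQLite from Python (an assumption; 3.25+ for window functions), adds a hypothetical seq column for that purpose and compares both versions.

```python
import sqlite3

# SQLite stands in for the original database (assumption). The seq column
# is hypothetical: it records the original row order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, id INTEGER, value TEXT)")
data = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "c"),
        (3, "b"), (4, "c"), (4, "b"), (4, "a")]
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, id_, v) for i, (id_, v) in enumerate(data, start=1)],
)

# The answer as written: number rows within each id by value, then interleave.
by_value = conn.execute("""
    SELECT id, value FROM t
    ORDER BY ROW_NUMBER() OVER (PARTITION BY id ORDER BY value), id
""").fetchall()

# Same idea, but numbering rows within each id by their original position.
by_seq = conn.execute("""
    SELECT id, value FROM t
    ORDER BY ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq), id
""").fetchall()

print(by_value)
print(by_seq)
```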

SQL query to find the entries corresponding to the maximum count of each type

I have a table X in Postgres with the following entries:
A B C
2 3 1
3 3 1
0 4 1
1 4 1
2 4 1
3 4 1
0 5 1
1 5 1
2 5 1
3 5 1
0 2 2
1 2 3
For each value of B (i.e. grouping by B), I would like to find the row with the maximum value of column C, using the most efficient query possible, and return the corresponding A and B.
Expected Output:
A B C
1 2 3
2 3 1
0 4 1
0 5 1
Please help me with this problem. Thank you.
Using DISTINCT ON:
SELECT DISTINCT ON (B)
A, B, C
FROM
my_table
ORDER BY B, C DESC, A
DISTINCT ON returns exactly the first row of each group, in ORDER BY order. In this case the groups are defined by B.
The ORDER BY must start with B, because the DISTINCT ON expressions have to match the leftmost ORDER BY expressions. Within each group, C DESC then moves the maximum C to the top; if several rows tie on the maximum C, ordering by A moves the smallest A to the top.
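To make the DISTINCT ON semantics concrete, here is a plain-Python emulation (not Postgres itself): sort the rows exactly as the ORDER BY does, then keep the first row of each run of equal B values. The function name distinct_on_b is hypothetical.

```python
from itertools import groupby

# Rows of table X from the question, as (A, B, C).
rows = [(2, 3, 1), (3, 3, 1), (0, 4, 1), (1, 4, 1), (2, 4, 1), (3, 4, 1),
        (0, 5, 1), (1, 5, 1), (2, 5, 1), (3, 5, 1), (0, 2, 2), (1, 2, 3)]

def distinct_on_b(rows):
    """Mimic SELECT DISTINCT ON (B) A, B, C ... ORDER BY B, C DESC, A."""
    # Sort exactly as the ORDER BY does: B ascending, C descending, A ascending.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    # Keep only the first row of each run of equal B values.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[1])]

result = distinct_on_b(rows)
print(result)
```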
This looks like a greatest-n-per-group problem:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY B ORDER BY C DESC, A ASC) AS rnk
FROM t
)
SELECT *
FROM cte
WHERE rnk = 1
You don't say which A should be returned when several rows tie on the maximum C; the query above returns the row with the smallest A.
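A quick check of the RANK() query on the question's data, using in-memory SQLite from Python as a stand-in for Postgres (an assumption; SQLite 3.25+ supports RANK). The outer ORDER BY B is added only to fix the output order.

```python
import sqlite3

# Rows of table X from the question, as (A, B, C).
rows = [(2, 3, 1), (3, 3, 1), (0, 4, 1), (1, 4, 1), (2, 4, 1), (3, 4, 1),
        (0, 5, 1), (1, 5, 1), (2, 5, 1), (3, 5, 1), (0, 2, 2), (1, 2, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    -- rnk = 1 for the max C per B, smallest A breaking ties
    SELECT *, RANK() OVER (PARTITION BY B ORDER BY C DESC, A ASC) AS rnk
    FROM t
)
SELECT A, B, C FROM cte
WHERE rnk = 1
ORDER BY B
"""
result = conn.execute(query).fetchall()
print(result)
```

With RANK(), rows that tie on both C and A would all get rank 1; ROW_NUMBER() would return exactly one row per B instead.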
It seems to me you need max():
select A,B, max(c) from table_name
group by A,B
This will work:
select * from (
  select t.*,
         -- rank 1 = max C per B, smallest A breaking ties
         rank() over (partition by B order by C desc, A) as rank
  from tablename t
) ranked
where rank = 1;

SQL - after sorting, return only rows with certain consecutive values in a column

I have columns name, timestamp, doing. I've already sorted by name, then by timestamp, and I expect that moving down the doing column within a group with the same name looks like A, A, A, B, B, A, A, ..., i.e. alternating runs of A and B. I need only the rows that are the first B after a transition from A to B within each group with the same name.
name timestamp doing
1 1 A
1 2 A
1 3 B
1 4 B
1 5 A
2 2 B
2 4 A
2 6 B
2 8 A
I would like to return
name timestamp doing
1 3 B
2 6 B
But not
2 2 B
because it is not a transition from A to B within name = 2
I think you just want lag():
select t.*
from (select t.*,
lag(doing) over (partition by name order by timestamp) as prev_doing
from t
) t
where prev_doing = 'A' and doing = 'B';
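A small sketch running the lag() query on the sample data with in-memory SQLite from Python (a stand-in for the original database, which the question does not name; SQLite 3.25+ is needed for LAG). The outer query selects the three original columns instead of t.* so prev_doing is not returned, and the final ORDER BY only fixes the output order. The first row of each name has prev_doing = NULL, so prev_doing = 'A' is not true there, which is why 2 2 B is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name INTEGER, timestamp INTEGER, doing TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "A"), (1, 2, "A"), (1, 3, "B"), (1, 4, "B"), (1, 5, "A"),
     (2, 2, "B"), (2, 4, "A"), (2, 6, "B"), (2, 8, "A")],
)

query = """
SELECT name, timestamp, doing
FROM (
    SELECT t.*,
           -- value of doing on the previous row of the same name
           LAG(doing) OVER (PARTITION BY name ORDER BY timestamp) AS prev_doing
    FROM t
) s
WHERE prev_doing = 'A' AND doing = 'B'
ORDER BY name, timestamp
"""
rows = conn.execute(query).fetchall()
print(rows)
```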

Pairs with no duplicates grouped together

I have a table
ID GROUPID NAME
== ======= ========
1 100 A
2 100 B
3 200 C
4 200 D
5 300 E
6 100 F
I would like to create a table containing all ordered pairs (permutations) of IDs within each group, excluding pairs whose first and second IDs are the same, like this:
PAIRID FIRST SECOND
====== ===== ======
1 1 2
2 1 6
3 2 1
4 2 6
5 3 4
6 4 3
7 6 1
8 6 2
I would like to do it in PL/SQL or with straight SQL inserts if possible. I already did this in Java, using a recursive function to go through the permutations.
You could self join the table:
SELECT ROW_NUMBER() OVER (ORDER BY a.id, b.id) AS pairid,
a.id AS FIRST, b.id AS second
FROM mytable a
JOIN mytable b ON a.groupid = b.groupid AND a.id <> b.id
ORDER BY 1 ASC;
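Since the question asks for straight SQL inserts, here is a sketch that stores the self-join result in a table with INSERT ... SELECT, run with in-memory SQLite from Python (an assumption standing in for Oracle; SQLite 3.25+ is needed for ROW_NUMBER). The target table pairs and its columns are hypothetical; "first" and "second" are quoted to avoid clashing with SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, groupid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 100, "A"), (2, 100, "B"), (3, 200, "C"),
     (4, 200, "D"), (5, 300, "E"), (6, 100, "F")],
)

# Hypothetical target table for the pairs.
conn.execute('CREATE TABLE pairs (pairid INTEGER, "first" INTEGER, "second" INTEGER)')
conn.execute("""
    INSERT INTO pairs (pairid, "first", "second")
    SELECT ROW_NUMBER() OVER (ORDER BY a.id, b.id),
           a.id, b.id
    FROM mytable a
    -- same group, but never a row paired with itself
    JOIN mytable b ON a.groupid = b.groupid AND a.id <> b.id
""")

rows = conn.execute('SELECT pairid, "first", "second" FROM pairs ORDER BY pairid').fetchall()
for row in rows:
    print(row)
```

Group 300 has a single member, so it contributes no pairs.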