RANK() with NULLs first in BigQuery based on multiple columns - sql

I have data like the sample shown below:
Subject_id T1 T2 T3 T4 T5
1234
1234 21 22 23 24 25
3456 34 31
3456 34 31 36 37 39
5678 65 64 62 61 67
5678 65 64 62 67
9876 12 13 14 15 16
4790 47 87 52 13 16
As you can see above, subject_ids 1234, 3456 and 5678 are repeated.
For these repeated subjects, I would like to remove the rows that have a null/empty/blank value in any of the columns T1, T2, T3, T4, T5.
The problem is that my real table has more than 250 columns, and I am not sure I can write a WHERE clause with 250 NULL checks. So I was trying ROW_NUMBER() and RANK(), but I am not sure which one is better. Below is what I tried:
SELECT *,ROW_NUMBER() OVER(PARTITION BY subject_id,T1,T2,T3,T4,T5) NULLS FIRST
from table A;
But it throws this error: Syntax error: Unexpected keyword NULLS at [1:62]
I expect my output to look like this:
Subject_id T1 T2 T3 T4 T5
1234 21 22 23 24 25
3456 34 31 36 37 39
5678 65 64 62 61 67
9876 12 13 14 15 16
4790 47 87 52 13 16
As you can see, the output excludes every row that had at least one null/empty/blank value in columns T1, T2, T3, T4, T5.
Can you help, please?

The query below is for BigQuery Standard SQL:
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE NOT REGEXP_CONTAINS(FORMAT('%t', t), r'NULL')
If you apply it to the sample data from your question, the output is:
Row Subject_id t1 t2 t3 t4 t5
1 1234 21 22 23 24 25
2 3456 34 31 36 37 39
3 5678 65 64 62 61 67
4 9876 12 13 14 15 16
5 4790 47 87 52 13 16
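The FORMAT('%t', t) call turns the whole row into one string in which NULL fields show up as the text NULL, so a single regex covers every column. Two caveats: a string column that legitimately contains the text NULL would also be filtered out, and empty strings (if any) are not caught. As a rough sketch of a portable alternative, assuming SQLite through Python's sqlite3 module and a hypothetical table named subjects (the positions of the blanks in the short sample rows are also assumed, since the original spacing was lost), the per-column checks can be generated from the table's column list so nobody has to type 250 conditions by hand:

```python
import sqlite3

# Hypothetical sample mirroring the question; which columns are blank in the
# short rows is an assumption.
rows = [
    (1234, None, None, None, None, None),
    (1234, 21, 22, 23, 24, 25),
    (3456, 34, 31, None, None, None),
    (3456, 34, 31, 36, 37, 39),
    (5678, 65, 64, 62, 61, 67),
    (5678, 65, 64, 62, None, 67),
    (9876, 12, 13, 14, 15, 16),
    (4790, 47, 87, 52, 13, 16),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (subject_id INTEGER, T1, T2, T3, T4, T5)")
conn.executemany("INSERT INTO subjects VALUES (?, ?, ?, ?, ?, ?)", rows)


def complete_rows(conn, table):
    """Return rows with no NULL, empty or whitespace-only value in any column."""
    # PRAGMA table_info returns one row per column; field 1 is the column name.
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    # Generate one pair of checks per column instead of typing them by hand.
    checks = " AND ".join(f"{c} IS NOT NULL AND TRIM({c}) <> ''" for c in cols)
    sql = f"SELECT * FROM {table} WHERE {checks} ORDER BY rowid"
    return conn.execute(sql).fetchall()


result = complete_rows(conn, "subjects")
for row in result:
    print(row)
```

Because the column names come from PRAGMA table_info and are pasted into the SQL, this assumes they are plain identifiers; names that need quoting would have to be wrapped in double quotes.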

I think you want:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY subject_id
ORDER BY (T1 IS NULL OR T2 IS NULL OR T3 IS NULL OR T4 IS NULL OR T5 IS NULL) DESC
)
FROM table A;
I might approach this problem differently, but this appears to be what you are trying to write.
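The ORDER BY in this answer sorts on a boolean: it is true when any of the listed columns is NULL, and DESC puts those rows first within each subject_id, which is the NULLS FIRST effect the question was after. A small sketch, assuming SQLite through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions) and a hypothetical subjects table, shows the resulting numbering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (subject_id INTEGER, T1, T2, T3, T4, T5)")
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1234, None, None, None, None, None),
        (1234, 21, 22, 23, 24, 25),
        (9876, 12, 13, 14, 15, 16),
    ],
)

# The OR expression evaluates to 1 when any column is NULL and 0 otherwise,
# so ordering it DESC numbers the incomplete rows first in each partition.
query = """
SELECT subject_id, T1,
       ROW_NUMBER() OVER (
           PARTITION BY subject_id
           ORDER BY (T1 IS NULL OR T2 IS NULL OR T3 IS NULL
                     OR T4 IS NULL OR T5 IS NULL) DESC
       ) AS seqnum
FROM subjects
ORDER BY subject_id, seqnum
"""
numbered = conn.execute(query).fetchall()
for row in numbered:
    print(row)
```

With DESC, the incomplete row is the one numbered 1, so filtering on seqnum = 1 would keep the wrong row for this question. Ordering the flag ASC (complete rows first) and keeping seqnum = 1 in an outer query is one way to keep a single complete row per subject_id.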

Related

SQL query to extract matching diagonal pairs in SQL Server database

I have a database table (mytable) with two columns, x and y, as shown below, from which I intend to extract the rows that form matching diagonal pairs (x, y) and (y, x), e.g., 4 22 and 22 4:
x y
86 86
27 27
45 45
95 95
11 11
18 8
85 85
2 2
77 77
91 91
15 15
84 84
51 51
32 32
35 35
8 8
92 92
67 67
62 62
33 33
13 13
15 11
18 18
3 3
38 38
80 80
34 34
6 6
72 72
14 12
44 44
4 22
90 90
47 47
78 78
23 3
42 42
56 56
79 79
55 55
65 65
17 17
64 64
4 4
28 28
19 19
17 9
36 36
25 25
81 81
60 60
48 48
5 5
88 88
7 19
21 21
29 29
52 52
9 17
9 9
13 13
16 10
1 1
31 31
46 46
7 7
58 58
23 23
87 87
83 83
66 66
93 93
24 2
98 98
53 53
20 6
61 61
20 20
96 96
99 99
73 73
2 24
14 14
71 71
5 21
22 4
75 75
6 20
97 97
41 41
26 26
22 22
8 18
74 74
40 40
21 5
94 94
76 76
49 49
11 15
59 59
89 89
68 68
24 24
37 37
12 12
63 63
43 43
16 16
100 100
39 39
25 1
69 69
54 54
50 50
30 30
10 10
I tried the accepted answer to a similar question on Stack Overflow on my table, and it gives me the expected results on Oracle DB:
select least(x, y) as x, greatest(x, y) as y
from mytable
group by least(x, y), greatest(x, y)
having count(*) = 2
union all
select x, y
from mytable
where not exists (select 1 from mytable mytable2 where mytable2.y = mytable.x and mytable2.x = mytable2.y)
order by x asc;
Now I need to run the same query on MS SQL Server, but as far as I understand, SQL Server does not support the LEAST and GREATEST functions. I have tried to use CASE expressions instead; for instance, for the first part of the query from that answer I am considering the query below, but so far I cannot replicate the results:
select x,y,z
from (
select x, y,
case
when (x < y) then x
when (y > x) then y
end as z
from mytable
group by x, y
) as t
Any suggestions on what I need to change to complete the query in SQL Server so that it produces the final output below?
It would also be great if somebody has an idea of how I can use SQL's LAG() function to achieve the same result. For instance, I am trying something like this:
;with t1 as (
select x as x1, y as y1, lag(x,1) over(order by x asc) as z1
from mytable
),
t2 as (
select x as x2, y as y2, lag(y,1) over(order by x asc) as z2
from mytable
)
select t1.*,t2.*
from t1 full outer join t2 on t1.x1 = t2.x2
Expected output:
x y
2 24
4 22
5 21
6 20
8 18
9 17
11 15
13 13
The equivalent of the functions LEAST() and GREATEST() is to use CASE expressions:
SELECT CASE WHEN x < y THEN x ELSE y END AS x,
CASE WHEN x > y THEN x ELSE y END AS y
FROM mytable
GROUP BY CASE WHEN x < y THEN x ELSE y END,
CASE WHEN x > y THEN x ELSE y END
HAVING COUNT(*) = 2 -- change to COUNT(*) > 1 if each combination may exist more than twice
ORDER BY x, y;
The above query will return a row for a combination of (x, y) that exists twice even if (y, x) does not exist.
If this is not what you want, use a self join and UNION ALL:
SELECT DISTINCT t1.*
FROM mytable t1 INNER JOIN mytable t2
ON t2.x = t1.y AND t2.y = t1.x
WHERE t1.x < t1.y
UNION ALL
SELECT x, y
FROM mytable
WHERE x = y
GROUP BY x, y
HAVING COUNT(*) > 1
ORDER BY x, y;
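To check the CASE-based query and the self-join variant side by side, here is a sketch assuming SQLite through Python's sqlite3 module. It uses a small subset of the question's rows plus one hypothetical extra copy of (25, 1), added only to show the caveat that the GROUP BY version returns a pair that appears twice in the same orientation. SQLite's multi-argument min()/max() and SQL Server 2022's LEAST()/GREATEST() could replace the CASE expressions, but the CASE form is kept here to mirror the answer; ORDER BY uses column positions to sidestep any alias-versus-column ambiguity.

```python
import sqlite3

# Subset of the question's rows; the second (25, 1) is a hypothetical
# duplicate added to demonstrate the GROUP BY caveat.
rows = [(18, 8), (8, 18), (4, 22), (22, 4), (13, 13), (13, 13),
        (7, 7), (14, 12), (25, 1), (25, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

# CASE expressions stand in for LEAST()/GREATEST().
grouped = conn.execute("""
    SELECT CASE WHEN x < y THEN x ELSE y END AS x,
           CASE WHEN x > y THEN x ELSE y END AS y
    FROM mytable
    GROUP BY CASE WHEN x < y THEN x ELSE y END,
             CASE WHEN x > y THEN x ELSE y END
    HAVING COUNT(*) = 2
    ORDER BY 1, 2
""").fetchall()

# Self-join variant: keeps (x, y) only when (y, x) is also present,
# plus diagonal pairs that occur more than once.
joined = conn.execute("""
    SELECT DISTINCT t1.x, t1.y
    FROM mytable t1 INNER JOIN mytable t2
      ON t2.x = t1.y AND t2.y = t1.x
    WHERE t1.x < t1.y
    UNION ALL
    SELECT x, y
    FROM mytable
    WHERE x = y
    GROUP BY x, y
    HAVING COUNT(*) > 1
    ORDER BY 1, 2
""").fetchall()

print(grouped)
print(joined)
```

The GROUP BY result includes (1, 25) only because (25, 1) was duplicated, while the self-join result does not, which is exactly the difference the answer describes.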

SQL that uses one table and outputs all possible combinations

I have a table with the following information, and I would like the output to list every combination of rows: each record should be paired with every other record, once, until the whole table has been covered. I then want to use the four values to calculate a relationship between Value1 / Value2 and the paired row's Value1 / Value2.
id Value1 Value2
100 34 48
101 35 45
102 22 15
103 35 17
104 37 10
and the output should be
100 34 48 101 35 45
100 34 48 102 22 15
100 34 48 103 35 17
100 34 48 104 37 10
101 35 45 102 22 15
101 35 45 103 35 17
101 35 45 104 37 10
102 22 15 103 35 17
102 22 15 104 37 10
103 35 17 104 37 10
As can be seen, those are all the combinations of rows in the table, but I have thousands of rows I want to do this for.
Is there an SQL query that produces this format, going through the table and creating output rows without duplicates?
Thank you
You can use a self-join:
select t1.id, t1.value1, t1.value2, t2.id, t2.value1, t2.value2
from t t1 join
t t2
on t1.id < t2.id
order by t1.id, t2.id;
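The condition t1.id < t2.id is what keeps each pair exactly once: it drops self-pairs (t1.id = t2.id) and mirrored duplicates (t1.id > t2.id), so n rows yield n(n - 1)/2 output rows, for example 10 for the five sample rows and 499,500 for 1,000 rows. A quick check, assuming SQLite through Python's sqlite3 module and the answer's table name t:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value1 INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(100, 34, 48), (101, 35, 45), (102, 22, 15), (103, 35, 17), (104, 37, 10)],
)

# t1.id < t2.id keeps each unordered pair once and skips self-pairs.
pairs = conn.execute("""
    SELECT t1.id, t1.value1, t1.value2, t2.id, t2.value1, t2.value2
    FROM t t1 JOIN t t2 ON t1.id < t2.id
    ORDER BY t1.id, t2.id
""").fetchall()

for p in pairs:
    print(*p)
```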

How to batch select in Oracle SQL

I have a mock table, table_a, as below:
id a b c d
1 11 22 33 44
2 22 33 44 55
3 33 44 55 66
4 44 55 66 77
5 55 66 77 88
6 66 77 88 99
7 77 88 99 100
8 88 99 11 22
Suppose the known information is c and d. If I want to get the entries with id 2 and 6, I can run:
select * from table_a where (c, d) in ((44, 55), (88, 99));
Here is my question: if this table has 1 million rows and I want to get 1 thousand rows out, knowing only their c and d values, is there a better way to do it? My concern with using the above query is performance. Thanks.
If you have an index on (c, d), then Oracle should use the index for the in query:
create index idx_table_a_c_d on table_a(c, d);
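A sketch of the index-backed lookup, assuming SQLite through Python's sqlite3 module rather than Oracle. One change from the question is a swapped-in technique, not part of the answer: instead of a literal IN list, the thousand known (c, d) pairs are loaded into a hypothetical temporary table named wanted and matched with a row-value IN subquery (SQLite 3.15 or later). The composite index on (c, d) is the part taken from the answer. Whether the optimizer actually uses the index depends on the engine and its statistics, so checking the execution plan is still advisable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, a, b, c, d)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?)",
    [(1, 11, 22, 33, 44), (2, 22, 33, 44, 55), (3, 33, 44, 55, 66),
     (4, 44, 55, 66, 77), (5, 55, 66, 77, 88), (6, 66, 77, 88, 99),
     (7, 77, 88, 99, 100), (8, 88, 99, 11, 22)],
)
# Composite index on the lookup columns, as the answer suggests.
conn.execute("CREATE INDEX idx_table_a_c_d ON table_a (c, d)")

# Hypothetical key table: load the known (c, d) pairs instead of
# building a long literal IN list.
wanted = [(44, 55), (88, 99)]
conn.execute("CREATE TEMP TABLE wanted (c, d)")
conn.executemany("INSERT INTO wanted VALUES (?, ?)", wanted)

ids = [r[0] for r in conn.execute(
    "SELECT id FROM table_a WHERE (c, d) IN (SELECT c, d FROM wanted) ORDER BY id"
)]
print(ids)
```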

How to divide a result set into equal parts?

I have a table, new_table:
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 ?
2 19 43 58 ?
3 29 43 58 ?
4 31 43 58 ?
5 35 43 58 ?
6 37 43 58 ?
7 38 43 58 ?
8 39 43 58 ?
9 58 43 58 ?
10 79 43 58 ?
How can I select all proc_ids and update new_staff, for example:
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 15
2 19 43 58 15
3 29 43 58 15
4 31 43 58 15
5 35 43 58 23
6 37 43 58 23
7 38 43 58 23
8 39 43 58 28
9 58 43 58 28
10 79 43 58 28
15 - 4 proc_ids
23 - 3 proc_ids
28 - 3 proc_ids
58 - is busy
where 15, 23, 28 and 58 are staff members in the same department.
"how to divide equal parts"
Oracle has a function, NTILE(), which splits a result set into buckets of nearly equal size (bucket sizes differ by at most one). For instance, this query puts your posted data into four buckets:
select id
     , proc_id
     , ntile(4) over (order by id asc) as gen_staff
from new_table;
ID PROC_ID GEN_STAFF
---------- ---------- ----------
1 15 1
2 19 1
3 29 1
4 31 2
5 35 2
6 37 2
7 38 3
8 39 3
9 58 4
10 79 4
10 rows selected.
This isn't quite the solution you want, but you need to clarify your requirements before a complete answer is possible.
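Worth noting: NTILE(3) over the same 10 rows gives buckets of 4, 3 and 3 rows, which happens to match the split in the question's example. A sketch, assuming SQLite through Python's sqlite3 module (3.25 or later for window functions) and assuming staff 15, 23 and 28 are assigned to the buckets in id order with busy staff 58 left out; the staff mapping is hypothetical. It also reproduces the NTILE(4) buckets shown in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_table (id INTEGER PRIMARY KEY, proc_id INTEGER, "
             "dep_id INTEGER, old_staff INTEGER, new_staff INTEGER)")
proc_ids = [15, 19, 29, 31, 35, 37, 38, 39, 58, 79]
conn.executemany(
    "INSERT INTO new_table VALUES (?, ?, 43, 58, NULL)",
    list(enumerate(proc_ids, start=1)),
)

# Same query as the answer: four buckets of sizes 3, 3, 2, 2.
four = [b for (b,) in conn.execute(
    "SELECT NTILE(4) OVER (ORDER BY id) FROM new_table ORDER BY id")]

# Hypothetical mapping of bucket number to available staff member.
staff = {1: 15, 2: 23, 3: 28}

# NTILE(3) over 10 rows gives buckets of 4, 3 and 3 rows.
buckets = conn.execute(
    "SELECT id, NTILE(3) OVER (ORDER BY id) FROM new_table ORDER BY id"
).fetchall()
conn.executemany(
    "UPDATE new_table SET new_staff = ? WHERE id = ?",
    [(staff[b], row_id) for row_id, b in buckets],
)

assigned = conn.execute("SELECT id, new_staff FROM new_table ORDER BY id").fetchall()
for row in assigned:
    print(row)
```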
update new_table
set new_staff='15'
where ID in('1','2','3','4');
update new_table
set new_staff='23'
where ID in('5','6','7');
update new_table
set new_staff='28'
where ID in('8','9','10');
Not sure if this is what you mean.

SQL: I need the highest number from a column + count duplicate values

I'm looking for a query that lists the total RepairCost for each BikeNumber, where the costs of repeated BikeNumbers are added together. So BikeNumber 18 costs 22 + 58 = 80 in total.
Id RepairCost BikeNumber
16 82 23
88 51 20
12 20 19
33 22 18
40 58 18
69 41 17
10 2 16
66 35 15
If I understand the question, the query is pretty simple:
SELECT BikeNumber, SUM(RepairCost) AS TotalRepairCost
FROM YourTable
GROUP BY BikeNumber;
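A quick check of the GROUP BY answer, assuming SQLite through Python's sqlite3 module and the answer's placeholder table name YourTable. It also shows one possible reading of "highest number" in the title (the bike with the largest total), which is an assumption the answer does not make; on this sample that is bike 23 at 82, just ahead of bike 18 at 80.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER, RepairCost INTEGER, BikeNumber INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(16, 82, 23), (88, 51, 20), (12, 20, 19), (33, 22, 18),
     (40, 58, 18), (69, 41, 17), (10, 2, 16), (66, 35, 15)],
)

# One row per bike; costs of repeated BikeNumbers are added together.
totals = dict(conn.execute(
    "SELECT BikeNumber, SUM(RepairCost) FROM YourTable GROUP BY BikeNumber"
))

# Assumed interpretation of "highest": the bike with the largest total.
top = conn.execute(
    "SELECT BikeNumber, SUM(RepairCost) AS Total FROM YourTable "
    "GROUP BY BikeNumber ORDER BY Total DESC LIMIT 1"
).fetchone()

print(totals[18])  # 22 + 58
print(top)
```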