I have a table column like this:
A, A, B, C, A, A, B, D, E, E, E
I would like to number each block of consecutive identical values and label every row with its block number, like this:
(A, 1), (A, 1), (B, 2), (C, 3), (A, 4), (A, 4), (B, 5), (D, 6), (E, 7), (E, 7), (E, 7)
How can I do this? Thank you.
Assuming you have a table like this:
SELECT * FROM t ORDER BY ord
let, ord
--------
A, 1
A, 2
B, 3
C, 4
A, 5
A, 6
B, 7
D, 8
E, 9
E, 10
E, 11
If you do this:
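with cte as (
  -- flag each row whose value differs from the previous row's value
  select let, ord,
         case when lag(let) over (order by ord) <> let then 1 else 0 end as letchanged
  from t
)
select let,
       -- running total of the change flags, starting the counter at 1
       1 + sum(letchanged) over (order by ord rows unbounded preceding) as ctr
from cte
order by ord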
Then you will get:
let, ctr
--------
A, 1
A, 1
B, 2
C, 3
A, 4
A, 4
B, 5
D, 6
E, 7
E, 7
E, 7
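To make the mechanics of the query easier to follow, here is a minimal Python sketch of the same idea: flag each row whose value differs from the previous one (the LAG comparison), then keep a running sum of those flags. The function name `label_blocks` and the list `rows` are hypothetical names chosen for this illustration, and the list is assumed to be already in `ord` order.

```python
# Stand-in for the `let` column, already sorted by `ord`.
rows = ["A", "A", "B", "C", "A", "A", "B", "D", "E", "E", "E"]


def label_blocks(values):
    """Return (value, block_number) pairs; the counter starts at 1."""
    labelled = []
    ctr = 1
    prev = None
    for i, value in enumerate(values):
        # Analogue of: CASE WHEN LAG(let) <> let THEN 1 ELSE 0 END
        # (the first row has no predecessor, so its flag is 0).
        changed = 1 if i > 0 and value != prev else 0
        # Analogue of: 1 + SUM(letchanged) OVER (ORDER BY ord ...)
        ctr += changed
        labelled.append((value, ctr))
        prev = value
    return labelled


result = label_blocks(rows)
print(result)
```

The counter only moves when the value changes, so a value that reappears later (the second run of A) gets a new block number instead of reusing the old one.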
When removing duplicate rows in BigQuery based on multiple columns, a common solution is to use row_number() and partition by the columns that define a duplicate. In our case, we have a wide table (30 columns) and want to remove duplicates based on the uniqueness of 29 of these columns:
with
t1 as (
select 1 as a, 2 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g, 8 as h, 9 as i union all
select 2 as a, 3 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g, 8 as h, 9 as i union all
select 3 as a, 4 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g, 8 as h, 9 as i union all
select 4 as a, 5 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g, 8 as h, 9 as i union all
select 5 as a, 6 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g, 8 as h, 9 as i union all
select 6 as a, 2 as b, 3 as c, 4 as d, 5 as e, 6 as f, 7 as g, 8 as h, 9 as i
)
In the table above, we want to remove duplicates considering all columns except column a. Rows 1 and 6 are therefore duplicates, and we want to remove one of them, preferably the row with the higher value in column a, so row 6 in this example. Is this possible without using row_number() over (partition by b,c,d,e,f,g,h,i,...)?
Consider the query below:
SELECT *
FROM t1
QUALIFY ROW_NUMBER() OVER (
PARTITION BY TO_JSON_STRING((SELECT AS STRUCT t1.* EXCEPT(a)))
ORDER BY a ASC
) = 1;
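The query above builds one partition key from every column except a, then keeps the row with the lowest a in each partition. As a rough illustration of that logic outside BigQuery, here is a minimal Python sketch; `dedupe_except` is a hypothetical helper written for this example, and the tuple key plays the role of the TO_JSON_STRING value.

```python
# Sample rows from the question: columns a..i.
columns = "abcdefghi"
t1 = [
    dict(zip(columns, (1, 2, 3, 4, 5, 6, 7, 8, 9))),
    dict(zip(columns, (2, 3, 3, 4, 5, 6, 7, 8, 9))),
    dict(zip(columns, (3, 4, 3, 4, 5, 6, 7, 8, 9))),
    dict(zip(columns, (4, 5, 3, 4, 5, 6, 7, 8, 9))),
    dict(zip(columns, (5, 6, 3, 4, 5, 6, 7, 8, 9))),
    dict(zip(columns, (6, 2, 3, 4, 5, 6, 7, 8, 9))),
]


def dedupe_except(rows, ignored="a"):
    """Per group of rows that are equal on every column except `ignored`,
    keep the row with the smallest `ignored` value."""
    best = {}
    for row in rows:
        # Analogue of PARTITION BY TO_JSON_STRING(<row> EXCEPT(a)).
        key = tuple((k, v) for k, v in sorted(row.items()) if k != ignored)
        # Analogue of ORDER BY a ASC ... = 1: remember the lowest `a` seen.
        if key not in best or row[ignored] < best[key][ignored]:
            best[key] = row
    return sorted(best.values(), key=lambda r: r[ignored])


kept = dedupe_except(t1)
print([r["a"] for r in kept])
```

On the sample data the rows with a = 1 and a = 6 share a key, so only a = 1 survives from that pair.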
Another option, using min(a) so that the row with the lower value in column a is the one kept:
select any_value(t).* replace(min(a) as a)
from t1 t
group by to_json_string((select as struct * except(a) from unnest([t])))
I had the following code snippet
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
And I got the following output
Row matching_rows
1 2
2 3
According to the WHERE condition, only the third row should match. Why is the second row returned as well?
Struct fields are unioned by position, not name. So this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
is equivalent to this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS a, 4 AS b)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS a, 3 AS b), STRUCT (7 AS a, 4 AS b)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
Also, in every branch of the union except the first, you can remove the AS <name> aliases, since they don't affect the result.
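To illustrate the positional matching described above, here is a small Python model of it (not BigQuery itself): the first branch fixes the field names, and the values of later branches are assigned by position, whatever names those branches declared. The names `branches`, `Struct` and `matching_rows` are chosen for this sketch only.

```python
from collections import namedtuple

# Each row: (id, list of (declared field names, values)) as written in the query.
branches = [
    (1, [(("a", "b"), (0, 1))]),
    (2, [(("b", "a"), (2, 4))]),
    (3, [(("b", "a"), (5, 3)), (("b", "a"), (7, 4))]),
]

# The first branch of the UNION ALL decides the field names of the result.
first_names = branches[0][1][0][0]
Struct = namedtuple("Struct", first_names)

# Later branches contribute values by position; their declared names are ignored.
sequences = [
    (row_id, [Struct(*values) for _declared_names, values in structs])
    for row_id, structs in branches
]

# Analogue of: WHERE EXISTS (SELECT 1 FROM UNNEST(some_numbers) WHERE b > 3)
matching_rows = [
    row_id
    for row_id, some_numbers in sequences
    if any(s.b > 3 for s in some_numbers)
]
print(matching_rows)  # [2, 3]
```

Row 2 was written as (2 AS b, 4 AS a), but by position it becomes a = 2, b = 4, which is why it passes the b > 3 filter.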
I'm looking for a way to group by a number of columns in BigQuery but keep more detail than otherwise possible of the rows being aggregated.
Data:
ID A B C D
2 1 2 3 4
2 2 3 4 5
1 1 2 1 3
My query would look something like this:
SELECT id, TAKE_ANY(a), sum(b), count(d), max(d), MAGIC(a,b,c,d) FROM table GROUP BY 1
And the output I would like is something like:
1, 1, 2, 1, 3, [ (1,2,1,3)]
2, 2, 5, 2, 5, [ (1,2,3,4), (2,3,4,5) ]
Does anything like the MAGIC function exist that will package the data into a structure of some sort?
The option below (for BigQuery Standard SQL) covers the case where by [ (1,2,3,4), (2,3,4,5) ] you actually mean a STRING rather than an ARRAY of STRUCTs (the question is not very clear on this, but I see it as possible):
#standardSQL
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
Applied to the sample data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 2 id, 1 a, 2 b, 3 c, 4 d UNION ALL
SELECT 2, 2, 3, 4, 5 UNION ALL
SELECT 1, 1, 2, 1, 3
)
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
ORDER BY id
the result will be:
Row id any_a sum_b count_d max_d a_b_c_d
1 1 1 2 1 3 [(1,2,1,3)]
2 2 1 5 2 5 [(1,2,3,4),(2,3,4,5)]
Inside of your select list, use ARRAY_AGG with the STRUCT function. For example,
SELECT id, ARRAY_AGG(STRUCT(a, b, c, d))
FROM table
GROUP BY id
This will return an array containing all the values of those columns for each group.
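As a rough picture of what grouping with ARRAY_AGG(STRUCT(...)) produces, here is a minimal Python sketch over the sample data from the question. It computes the plain aggregates and also keeps the detail rows of each group as a list of tuples. The key names (`any_a`, `details`, and so on) are chosen for this illustration, and taking the first row's a is just one valid way to model "any value".

```python
from collections import defaultdict

table = [
    {"id": 2, "a": 1, "b": 2, "c": 3, "d": 4},
    {"id": 2, "a": 2, "b": 3, "c": 4, "d": 5},
    {"id": 1, "a": 1, "b": 2, "c": 1, "d": 3},
]

# GROUP BY id
groups = defaultdict(list)
for row in table:
    groups[row["id"]].append(row)

result = {}
for id_, rows in sorted(groups.items()):
    result[id_] = {
        "any_a": rows[0]["a"],                      # one arbitrary value of a
        "sum_b": sum(r["b"] for r in rows),         # SUM(b)
        "count_d": sum(1 for r in rows if r["d"] is not None),  # COUNT(d)
        "max_d": max(r["d"] for r in rows),         # MAX(d)
        # Analogue of ARRAY_AGG(STRUCT(a, b, c, d)): one tuple per source row.
        "details": [(r["a"], r["b"], r["c"], r["d"]) for r in rows],
    }

for id_, agg in result.items():
    print(id_, agg)
```

Unlike the formatted string, the list of tuples keeps each column's value and type, so the detail rows can be processed further.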
I'm quite sure it's possible, but I can't quite remember how.
Consider following table:
A B C
1 1 A
1 2 A
1 2 B
2 1 C
2 2 A
2 2 B
2 2 C
I would like to present it as:
A B C
1 1 A
1 2 A
    B
2 1 C
2 2 A
    B
    C
In other words, group on a unique (A,B).
I was thinking along the lines of GROUP BY ROLLUP, but I can't really figure out how to just make rows null without a group by function.
(note: I imagine this has been asked before, but I just can't find the right search terms to find it. Thanks)
Try this:
create table t
(a number,
b number,
c varchar2(1));
insert into t values(1, 1, 'A');
insert into t values(1, 2, 'A');
insert into t values(1, 2, 'B');
insert into t values(2, 1, 'C');
insert into t values(2, 2, 'A');
insert into t values(2, 2, 'B');
insert into t values(2, 2, 'C');
select case when rn = 1
then a
else null end as a,
case when rn = 1
then b
else null end as b,
c
from (select a, b, c,
row_number() over (partition by a, b order by c) as rn,
row_number() over (order by a, b, c) as rn_total
from t)
order by rn_total;
A B C
- - -
1 1 A
1 2 A
    B
2 1 C
2 2 A
    B
    C
And finally, clean your test environment:
drop table t purge;
You can do it even without a subquery:
select case when row_number() over (partition by a, b order by c) = 1
then a
else null end as a,
case when row_number() over (partition by a, b order by c) = 1
then b
else null end as b,
c
from t
order by t.a, t.b, c ;
Tested at SQL-Fiddle
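The effect of the row_number() test in both queries is to show a and b only on the first row of each (a, b) group. A minimal Python sketch of the same presentation logic follows; `suppress_repeats` is a hypothetical helper name, and None stands in for SQL NULL.

```python
rows = [
    (1, 1, "A"),
    (1, 2, "A"),
    (1, 2, "B"),
    (2, 1, "C"),
    (2, 2, "A"),
    (2, 2, "B"),
    (2, 2, "C"),
]


def suppress_repeats(rows):
    """Blank out a and b on every row except the first of each (a, b) group."""
    out = []
    prev_key = None
    for a, b, c in sorted(rows):  # ORDER BY a, b, c
        key = (a, b)
        if key == prev_key:
            # row_number() within the (a, b) partition is > 1 here
            out.append((None, None, c))
        else:
            # first row of the partition: row_number() = 1
            out.append((a, b, c))
        prev_key = key
    return out


presented = suppress_repeats(rows)
for a, b, c in presented:
    print(" " if a is None else a, " " if b is None else b, c)
```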
ID A B C D E(Time)
---------------------------
1 J 1 A B 1
2 J 1 A S 2
3 M 1 A B 1
4 M 1 A B 2
5 M 2 A S 3
6 M 2 A S 4
7 T 1 A B 1
8 T 2 A S 2
9 T 1 A B 3
10 k 1 A B 1
11 k 1 A B 2
I need to find unique values across multiple columns, with some added conditions. The unique value is the combination of columns A, B and D.
If a value in column A has only two rows (like records 1 and 2), column B is the same on both rows, and column D differs, but the S only comes after the B, we don't want to see those records.
If a value in column A has multiple rows (like records 3 to 6) with different values in columns B and D, and in column D the S rows come after the B rows, we don't want to see those records.
If a value in column A has multiple rows (like records 7 to 9) with different values in columns B and D, and in column D there is an S before a B, we want to see those records.
If a value in column A has multiple rows (like records 10 to 11) with different column B and the same column D, we don't want to see those records.
Any input is welcome. I was able to get the first and last rows using PARTITION BY with an unbounded window frame in the query.
The basic logic seems to be: check whether an S precedes any B in column D, and if so, show all the records of that partition.
The desired output is rows 7-9. This is based on the logic that, for the same value in column A, there was a Sell before a Buy from the customer in column D when ordered by column E (time).
ID A B C D E(Time)
---------------------------------------------------
7 T 1 A B 1
8 T 2 A S 2
9 T 1 A B 3
I started to write a query to do this but ran out of spare time; your criteria are very hard to follow. If you comment out the WHERE at the bottom of the query, it runs, but it doesn't yet produce your desired result.
Possibly this can point you in a direction to achieve your goal:
WITH Src AS (
SELECT 1 AS ID, 'J' AS A, 1 AS B, 'A' AS C, 'B' AS D, 1 AS E
UNION ALL SELECT 2, 'J', 1, 'A', 'S', 2
UNION ALL SELECT 3, 'M', 1, 'A', 'B', 1
UNION ALL SELECT 4, 'M', 1, 'A', 'B', 2
UNION ALL SELECT 5, 'M', 2, 'A', 'S', 3
UNION ALL SELECT 6, 'M', 2, 'A', 'S', 4
UNION ALL SELECT 7, 'T', 1, 'A', 'B', 1
UNION ALL SELECT 8, 'T', 2, 'A', 'S', 2
UNION ALL SELECT 9, 'T', 1, 'A', 'B', 3
UNION ALL SELECT 10, 'k', 1, 'A', 'B', 1
UNION ALL SELECT 11, 'k', 1, 'A', 'B', 2
), ACnt AS (
SELECT A, Count(*) AS Cnt
FROM Src
GROUP BY A
), FirstD AS (
SELECT A, D
FROM Src
WHERE E=1
), FirstSRow AS (
SELECT A, Min(E) AS E
FROM Src
WHERE D='S'
GROUP BY A
), LastBRow AS (
SELECT A, Max(E) AS E
FROM Src
WHERE D='B'
GROUP BY A
), Mins AS (
SELECT A, Min(D) AS D, Min(B) AS B
FROM Src
GROUP BY A
), Maxs AS (
SELECT A, Max(D) AS D, Max(B) AS B
FROM Src
GROUP BY A
)
SELECT Src.*
FROM Src
JOIN ACnt ON ACnt.A=Src.A
JOIN FirstD ON FirstD.A=Src.A
JOIN Mins ON Mins.A=Src.A
JOIN Maxs ON Maxs.A=Src.A
LEFT JOIN FirstSRow ON FirstSRow.A=Src.A
LEFT JOIN LastBRow ON LastBRow.A=Src.A
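WHERE
-- NOTE: the last two predicates are the same condition, once negated and once not,
-- so this WHERE returns no rows as written; comment it out to see the joined data.
NOT (ACnt.Cnt=2 AND Mins.B=Maxs.B AND Mins.D<>Maxs.D AND FirstSRow.E < LastBRow.E)
AND NOT (ACnt.Cnt>=3 AND Mins.B<>Maxs.B AND Mins.D<>Maxs.D AND FirstD.D='B')
AND (ACnt.Cnt>=3 AND Mins.B<>Maxs.B AND Mins.D<>Maxs.D AND FirstD.D='B')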
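One possible reading of the requirement, which is an assumption and not something the asker confirmed, is: keep every row of a group in column A when some S occurs at an earlier time E than some B. That is the same as comparing the values computed by the FirstSRow and LastBRow CTEs above (first S time < last B time). A minimal Python sketch under that assumption, with a hypothetical function name:

```python
# Sample data from the question: (ID, A, B, C, D, E).
src = [
    (1, "J", 1, "A", "B", 1),
    (2, "J", 1, "A", "S", 2),
    (3, "M", 1, "A", "B", 1),
    (4, "M", 1, "A", "B", 2),
    (5, "M", 2, "A", "S", 3),
    (6, "M", 2, "A", "S", 4),
    (7, "T", 1, "A", "B", 1),
    (8, "T", 2, "A", "S", 2),
    (9, "T", 1, "A", "B", 3),
    (10, "k", 1, "A", "B", 1),
    (11, "k", 1, "A", "B", 2),
]


def groups_with_sell_before_buy(rows):
    """Return the A values for which some S has an earlier E than some B."""
    first_s = {}  # A -> earliest E with D = 'S' (like the FirstSRow CTE)
    last_b = {}   # A -> latest E with D = 'B'  (like the LastBRow CTE)
    for _id, a, _b, _c, d, e in rows:
        if d == "S":
            first_s[a] = min(e, first_s.get(a, e))
        elif d == "B":
            last_b[a] = max(e, last_b.get(a, e))
    # An S precedes a B exactly when the first S is earlier than the last B.
    return {a for a in first_s if a in last_b and first_s[a] < last_b[a]}


keep = groups_with_sell_before_buy(src)
selected = [row for row in src if row[1] in keep]
print([row[0] for row in selected])
```

On the sample data only group T qualifies, which matches the desired output of rows 7 to 9. Groups with no S at all (like k) are excluded because they never enter `first_s`.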