Concatenate an array and two values into an array of unique values - google-bigquery

I have a table with three columns. Two of the columns (str1 and str2) are nullable strings and the other is a nullable array of strings (arr). I need to collect all of these into a single column arr while removing possible duplicate values (the original array column is guaranteed not to have duplicates).
The best I've been able to do is add one string column at a time, but I'd like to know if there's a better way:
WITH foo AS (
SELECT 'A' AS str1, 'B' AS str2, ['C', 'D'] AS arr UNION ALL
-- all get concatenated
-- ['C', 'D', 'A', 'B']
SELECT 'A' AS str1, 'B' AS str2, ['C', 'A'] AS arr UNION ALL
SELECT 'A' AS str1, 'A' AS str2, ['C', 'D'] AS arr UNION ALL
-- 'A' is not duplicated
-- ['C', 'A', 'B'] and ['C', 'D', 'A']
SELECT NULL AS str1, 'B' AS str2, ['C', 'A'] AS arr UNION ALL
-- NULL str1 (or str2) is ignored
-- ['C', 'A', 'B']
SELECT 'A' AS str1, 'B' AS str2, NULL AS arr UNION ALL
-- NULL array is ignored, str1 and 2 are array_concat'ed
-- ['A', 'B']
SELECT NULL AS str1, NULL AS str2, NULL AS arr
-- handles all NULLs
-- NULL ([] or [''] would be ok)
)
SELECT CASE WHEN str2 IS NULL
OR str2 IN UNNEST(arr)
THEN arr
WHEN arr IS NULL
THEN [str2]
ELSE ARRAY_CONCAT(arr, [str2])
END AS arr
FROM (
SELECT CASE WHEN str1 IS NULL
OR str1 IN UNNEST(arr)
THEN arr
WHEN arr IS NULL
THEN [str1]
ELSE ARRAY_CONCAT(arr, [str1])
END AS arr
, str2
FROM foo
)
In this implementation, if all the component values are NULL, arr will be NULL. However, an empty array or an array with a single '' would also be acceptable in this situation. Also, the order of items in the final array is irrelevant.
So, is there a cleaner, smarter way of doing this?

Consider the query below:
SELECT ARRAY(SELECT DISTINCT str
FROM UNNEST([str1, str2] || IFNULL(arr, [])) str
WHERE str IS NOT NULL
) AS arr
FROM foo;
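To sanity-check what that query is expected to return for the sample rows, here is a rough Python model of the same logic. It is only a sketch: `merge_unique` is a made-up helper name, and it keeps first-seen order purely for readability (the question says order is irrelevant). Note that the all-NULL row comes out as an empty list, which the question accepts.

```python
def merge_unique(str1, str2, arr):
    """Combine two nullable strings and a nullable list into unique non-null values."""
    # (arr or []) plays the role of IFNULL(arr, [])
    candidates = [str1, str2] + (arr or [])
    # dict.fromkeys drops duplicates (DISTINCT) while keeping first-seen order
    return list(dict.fromkeys(v for v in candidates if v is not None))


# Same six rows as the foo CTE in the question
rows = [
    ("A", "B", ["C", "D"]),
    ("A", "B", ["C", "A"]),
    ("A", "A", ["C", "D"]),
    (None, "B", ["C", "A"]),
    ("A", "B", None),
    (None, None, None),
]

for row in rows:
    print(row, "->", merge_unique(*row))
```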

Below is a slightly unorthodox approach, but it can still be useful (at least from a learning perspective):
select array(
select distinct str
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+)"')) str
where not str in ('str1', 'str2', 'arr')
) arr
from foo t

Related

Query to select distinct values of a column and count them, but also detail them by another column

Basically, how do I turn this
Type  Subtype  Notes
A     S1       string1
A     S2       string1
A     S2       string1
A     S2       string1
A     S3       string1
A     S3       string1
into this
Type  Notes    Details
A     string1  S1 (1), S2 (3), S3 (2)
Is it even possible?
In Oracle, I've been going about it with SELECT DISTINCTs, GROUP BYs and some JOINs, but I'm not really getting what I want.
Aggregate twice:
SELECT type,
notes,
LISTAGG(subtype || ' (' || num_subtypes || ')', ', ')
WITHIN GROUP (ORDER BY subtype) AS details
FROM (
SELECT type,
notes,
subtype,
COUNT(*) AS num_subtypes
FROM table_name
GROUP BY type, notes, subtype
)
GROUP BY type, notes
Which, for the sample data:
CREATE TABLE table_name (Type, Subtype, Notes) AS
SELECT 'A', 'S1', 'string1' FROM DUAL UNION ALL
SELECT 'A', 'S2', 'string1' FROM DUAL UNION ALL
SELECT 'A', 'S2', 'string1' FROM DUAL UNION ALL
SELECT 'A', 'S2', 'string1' FROM DUAL UNION ALL
SELECT 'A', 'S3', 'string1' FROM DUAL UNION ALL
SELECT 'A', 'S3', 'string1' FROM DUAL;
Outputs:
TYPE  NOTES    DETAILS
A     string1  S1 (1), S2 (3), S3 (2)
fiddle
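If it helps to see the "aggregate twice" idea outside SQL, here is a small Python sketch of the same two steps on the sample rows. The function name `summarize` is made up for illustration: the `Counter` stands in for the inner COUNT(*) ... GROUP BY, and the string join stands in for LISTAGG ... WITHIN GROUP (ORDER BY subtype).

```python
from collections import Counter, defaultdict

# (type, subtype, notes) rows from the sample data
rows = [
    ("A", "S1", "string1"),
    ("A", "S2", "string1"),
    ("A", "S2", "string1"),
    ("A", "S2", "string1"),
    ("A", "S3", "string1"),
    ("A", "S3", "string1"),
]


def summarize(rows):
    # First aggregation: count rows per (type, notes, subtype)
    counts = Counter((t, n, s) for t, s, n in rows)
    # Second aggregation: per (type, notes), list "subtype (count)" ordered by subtype
    grouped = defaultdict(list)
    for (t, n, s), c in sorted(counts.items()):
        grouped[(t, n)].append(f"{s} ({c})")
    return {key: ", ".join(parts) for key, parts in grouped.items()}


print(summarize(rows))
# {('A', 'string1'): 'S1 (1), S2 (3), S3 (2)'}
```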

Snowflake: replacing column values

I have a column that contains both purely numeric and alphanumeric values. For the numeric values, I am just trying to replace them with a different set of numbers. For the alphanumeric values, I am replacing the letters with different letters. Below are a few values:
select * from t1;
1234
AB145C
678BC
8765
3786CA
The SQL below is not working for some reason:
select col1,
case
when regexp_like(col1,'^[A-Z]+$')
then replace(replace(replace(replace(replace(replace(col1,'A','Z'),'B','Y'),'C','X'),'D','W'),'E','V'),'F','U')
when try_to_number(col1) is not null
then round(to_number(col1)*1.5)
end as col1_replaced
from t1;
What could I be doing wrong here?
Output I'm getting now:
COL1 COL1_REPLACED
1234 1851
AB145C NULL
678BC NULL
8765 13148
3786CA NULL
Desired output:
COL1 COL1_REPLACED
1234 1851
AB145C ZY145X
678BC 678YX
8765 13148
3786CA 3786XZ
All branches of a CASE expression generally need to have the same type. Since the first branch generates text, the numeric branch should do the same, so cast the ROUND expression to text. Note also that '^[A-Z]+$' only matches values made up entirely of letters, so mixed values such as AB145C match neither branch and come back as NULL. One way around that is to test for numbers first and let the pattern accept digits as well:
SELECT col1,
CASE WHEN TRY_TO_NUMBER(col1) IS NOT NULL
THEN CAST(ROUND(TO_NUMBER(col1)*1.5) AS VARCHAR(15))
WHEN REGEXP_LIKE(col1, '^[A-Z0-9]+$')
THEN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(col1, 'A', 'Z'), 'B', 'Y'), 'C', 'X'), 'D', 'W'), 'E', 'V'), 'F', 'U')
END AS col1_replaced
FROM t1;
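The intended mapping can also be checked outside the database. Below is a hedged Python sketch of it, assuming that purely numeric values are multiplied by 1.5 and rounded half up, and that any other value just has the letters A-F swapped for Z-U. `replace_value` and `LETTER_MAP` are made-up names, and the sketch has only been checked against the five sample values.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# A->Z, B->Y, C->X, D->W, E->V, F->U (same pairs as the nested REPLACE calls)
LETTER_MAP = str.maketrans("ABCDEF", "ZYXWVU")


def replace_value(col1):
    """Return the replaced value as text, mirroring the two CASE branches."""
    try:
        number = Decimal(col1)          # rough stand-in for TRY_TO_NUMBER
    except InvalidOperation:
        return col1.translate(LETTER_MAP)
    scaled = (number * Decimal("1.5")).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return str(scaled)                  # text, like the CAST(... AS VARCHAR)


for value in ["1234", "AB145C", "678BC", "8765", "3786CA"]:
    print(value, replace_value(value))
```

Both branches return a string here, which mirrors the point about keeping the CASE branches the same type.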

log file with insert - how to easily combine column name with inserted value

Does anybody have a good solution for combining column names with the inserted values from a prepared INSERT statement?
I have a log file that contains an INSERT query. The query has over 100 columns, for example:
INSERT INTO tab
(col_001, col_002, col_003, col_004, col_005, col_006, col_007, col_008, col_009, col_010)
VALUES ('a', 'b', 'c,,,', 'd', 'e', 'f', 'g', 'h', 'i', 'j');
Do you have any ideas how to easily pair each column name with its value, like below:
col_001 = 'a'
col_002 = 'b'
col_003 = 'c,,,'
col_004 = 'd'
col_005 = 'e'
col_006 = 'f'
col_007 = 'g'
col_008 = 'h'
col_009 = 'i'
col_010 = 'j'
Let's imagine that I need to find what value will be inserted into column col_067.
Thanks.
Use UNPIVOT to convert columns to rows (this assumes the row has already been inserted into tab):
select colname,
colvalue
from tab
unpivot
(
colvalue
for colname in (col_001, col_002, col_003, col_004, col_005, col_006, col_007, col_008, col_009, col_010)
) unpiv;
Demo in db<>fiddle
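If the statement only exists as text in the log and has not been run against a table, the pairing can also be done on the text itself. The following is a minimal Python sketch, not a full SQL parser: `pair_columns_with_values` is a made-up helper, and it assumes a single-row INSERT with an explicit column list and plain literals (single-quoted strings, numbers, NULL) with no function calls or nested parentheses in VALUES.

```python
import csv
import re


def pair_columns_with_values(insert_sql):
    """Map each column name to its literal in a single-row INSERT statement."""
    match = re.search(
        r"\((.*?)\)\s*VALUES\s*\((.*)\)\s*;?\s*$",
        insert_sql,
        re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("not a recognisable INSERT ... VALUES statement")
    columns = [name.strip() for name in match.group(1).split(",")]
    # csv with a single-quote quotechar keeps commas inside quoted strings intact
    values = next(csv.reader([match.group(2)], quotechar="'", skipinitialspace=True))
    return dict(zip(columns, values))


log_statement = """INSERT INTO tab
(col_001, col_002, col_003, col_004, col_005, col_006, col_007, col_008, col_009, col_010)
VALUES ('a', 'b', 'c,,,', 'd', 'e', 'f', 'g', 'h', 'i', 'j');"""

pairs = pair_columns_with_values(log_statement)
for column, value in pairs.items():
    print(f"{column} = '{value}'")
```

Looking up a single column is then just `pairs["col_003"]`. Unquoted literals such as NULL or numbers come back as plain strings in this sketch.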

how to custom sort data in sql server

I am using SQL Server 2008 R2 and need to order the following data:
CardNo
R-1
R-2
R-12
R-1A
R-3
R-2B
Result should look like this
CardNo
R-1
R-1A
R-2
R-2B
R-3
R-12
I have tried different combinations in the ORDER BY clause, but to no avail, for example:
select * from [Coll2012-13] where
SUBSTRING(CardNo, 1, 1) IN ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'W', 'X', 'Y', 'Z')
AND SUBSTRING(CardNo, 2, 1) IN ('-')
AND SUBSTRING(CardNo, 3, 1) IN ('1', '2', '3', '4', '5', '6', '7', '8', '9', '0')
and Landmark='Anandbagh' order by LEN(CardNo),CardNo ASC
Assumption: values are always of the format 'Letter-Alphanumeric string'.
Try this:
select CardNo
from [Coll2012-13]
-- 1st key: the leading letter
order by left(CardNo,1),
-- 2nd key: the numeric part after the dash, as an integer
case
when isnumeric(substring(CardNo,3, len(CardNo))) = 1
then cast(substring(CardNo,3, len(CardNo)) as int)
else cast(substring(CardNo,3, patindex('%[A-Z]%',substring(CardNo,3, len(CardNo)))-1) as int)
end,
-- 3rd key: the trailing letter, if there is one
case
when patindex('%[A-Z]%', substring(CardNo,3,len(CardNo))) > 0
then substring(CardNo,patindex('%[A-Z]%', substring(CardNo,3,len(CardNo)))+2,1)
end
How this works: First check the starting letter. Next, check if the alphanumeric part is in fact only numeric. If so, get the integer value of that part. If it is not, get the numeric part of it and use that as the sort value. Finally, if the alphanumeric part does contain a letter, use that as another sort value.
Demo here.
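The three sort keys described above (leading letter, numeric part, optional trailing letter) can be illustrated with a short Python sketch. `card_sort_key` is a made-up name, and the pattern assumes the 'Letter-digits-optional letters' format stated in the assumption.

```python
import re


def card_sort_key(card_no):
    """Split e.g. 'R-12A' into ('R', 12, 'A') so the number sorts numerically."""
    match = re.fullmatch(r"([A-Z])-(\d+)([A-Z]*)", card_no)
    if match is None:
        raise ValueError(f"unexpected card number format: {card_no!r}")
    letter, digits, suffix = match.groups()
    return (letter, int(digits), suffix)


cards = ["R-1", "R-2", "R-12", "R-1A", "R-3", "R-2B"]
print(sorted(cards, key=card_sort_key))
# ['R-1', 'R-1A', 'R-2', 'R-2B', 'R-3', 'R-12']
```

Converting the digits to an integer is what puts R-12 after R-3; a plain string comparison would put it right after R-1.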
You should replace all non-digit characters with '' and convert the rest to a number, then sort by that number.
For this, you can try a function like the one described in "SQL Server 2000: how do I return only the number from a phone number column".
A simple solution, though one that has to be maintained, is to create a sorting table and join it to your result set. The table contains all CardNo values and associates a sort order with each of them.
EDIT:
CREATE TABLE CardNoOrderHelper (
CardNo VARCHAR(16)
, OrderRank INT CONSTRAINT DF_CardNoOrderHelper_OrderRank DEFAULT 0
, PRIMARY KEY CLUSTERED (
CardNo ASC
)
);
-- Fill your table with the expected sort order (distinct insert, then adjust the order ranks)
SELECT
*
FROM
[Coll2012-13] AS T
LEFT JOIN CardNoOrderHelper CH
ON T.CardNo = CH.CardNo
ORDER BY
CH.OrderRank
As I said, you have to maintain this table. When your result set is small, this can be done manually.
Try this
with cte
AS
(
select
*
,substring(CardNo, 1, charindex('-', CardNo)-1) RealRank1
-- RealRank2 is text, so it sorts lexically ('12' before '2')
,substring(CardNo, charindex('-', CardNo)+ 1, 10) RealRank2
from
[Coll2012-13]
)
select
*
from
cte
order by
RealRank1
,RealRank2
I had a similar problem and used the query below.
To use this query you must know the maximum length of the ID string. I have adjusted the format in this query to Mark-Number+Alphabet (for example R-1, R-1A, R-11, R-11A, R-1AA, R-1B).
Query:
select
b.CardNo, b.separatorIndex
,b.Mark, b.Mark_length
,case b.isNumericMark1 + b.isNumericMark2 + b.isNumericMark3
when 1 then cast (b.Mark1 as int)
when 2 then cast (b.Mark1 + b.Mark2 as int)
when 3 then cast (b.Mark1 + b.Mark2 + b.Mark3 as int)
end as Mark1
from
(
select
a.CardNo
,charindex('-',a.cardNo,0) as separatorIndex
, len(a.cardNo) - charindex('-',a.cardNo,0) as Mark_length
, substring(a.CardNo,0,charindex('-',a.cardNo,0)) as Mark
, substring(a.CardNo,charIndex('-',a.cardNo,0)+1,1) as Mark1
, isnumeric(substring(a.CardNo,charIndex('-',a.cardNo,0)+1,1*1)) as isNumericMark1
, substring(a.CardNo,charIndex('-',a.cardNo,0)+2,1) as Mark2
, isnumeric(substring(a.CardNo,charIndex('-',a.cardNo,0)+2,1)) as isNumericMark2
, substring(a.CardNo,charIndex('-',a.cardNo,0)+3,1) as Mark3
, isnumeric(substring(a.CardNo,charIndex('-',a.cardNo,0)+3,1)) as isNumericMark3
from [Coll2012-13] a
) b
order by Mark,Mark1,Mark_length
Result:
CardNo separatorIndex Mark Mark_length Mark1
-------------------- -------------- -------------------- ----------- -----------
R-1 2 R 1 1
R-1A 2 R 2 1
R-2 2 R 1 2
R-2B 2 R 2 2
R-3 2 R 1 3
R-12 2 R 2 12
Hope this helps.

Oracle - DECODE - How will it sort when not every case is specified?

I have to use DECODE to implement custom sort:
SELECT col1, col2 FROM tbl ORDER BY DECODE(col1, 'a', 3, 'b', 2, 'c', 1) DESC
What will happen if col1 has more values than the three specified in the DECODE clause?
DECODE will return NULL for the values of col1 that are not specified.
Because the sort is DESC, the NULL values are placed at the front by default.
If you want to change this behavior, you can either define a default value in DECODE
SELECT col1, col2 FROM tbl ORDER BY DECODE(col1, 'a', 3, 'b', 2, 'c', 1, 0) DESC
or use NULLS LAST in the ORDER BY clause
SELECT col1, col2 FROM tbl ORDER BY DECODE(col1, 'a', 3, 'b', 2, 'c', 1) DESC NULLS LAST
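To make the effect of the default value concrete, here is a small Python model of the two ORDER BY variants. It is only a sketch: `decode` and `order_desc` are made-up helpers, with `order_desc` imitating Oracle's default for descending sorts (NULLS FIRST) by putting `None` keys at the front, and 'z' standing in for any value not listed in the DECODE.

```python
def decode(value, *pairs_and_default):
    """Rough analogue of DECODE: search/result pairs plus an optional default."""
    args = list(pairs_and_default)
    # An odd number of arguments means the last one is the default
    default = args.pop() if len(args) % 2 == 1 else None
    for search, result in zip(args[0::2], args[1::2]):
        if value == search:
            return result
    return default


def order_desc(values, key):
    """Descending sort with None keys first, like ORDER BY ... DESC in Oracle."""
    nulls = [v for v in values if key(v) is None]
    rest = sorted((v for v in values if key(v) is not None), key=key, reverse=True)
    return nulls + rest


col1 = ["a", "b", "c", "z"]

print(order_desc(col1, lambda v: decode(v, "a", 3, "b", 2, "c", 1)))
# ['z', 'a', 'b', 'c']  -> the unmatched value sorts first
print(order_desc(col1, lambda v: decode(v, "a", 3, "b", 2, "c", 1, 0)))
# ['a', 'b', 'c', 'z']  -> with default 0 it sorts last
```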
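The DECODE function will return NULL for those values, and with DESC they appear at the top of your sort (Oracle defaults to NULLS FIRST for descending order and NULLS LAST for ascending). You can verify the NULL result: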
select decode('z','a', 3, 'b', 2, 'c', 1) from dual;
You can also control where the NULL values appear with NULLS LAST/NULLS FIRST in the ORDER BY clause.
Normally DECODE takes a default result as its last argument; without one, NULL is all you get for unmatched values. So add a default value at the end, like this:
SELECT col1, col2 FROM tbl ORDER BY DECODE(col1, 'a', 3, 'b', 2, 'c', 1, 0) DESC
That way, any other values of col1 will all return 0.