Query table with unpredictable number of columns by BigQuery - google-bigquery

Let's say, I have a table from log, it looks like this:
Log Table
I need a result table like:
Result table
But the problem is my dataset is not only 7, but maybe 24, 100 value columns.
My query with 7 value columns is:
select
*
from My_Dataset
unpivot
(status for value in (value_1, value_2, value_3, value_4, value_5, value_6, value_7))```
But is there anyway to automatic this process for value_n?
Thank you.

Consider below approach
select id, arr[offset(1)] as value
from your_table t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv, ':') as arr)])
where starts_with(arr[offset(0)], 'value_')
if applied to sample data in your question (i used only three value_N columns but it works for any!)
Another option (maybe less verbose and simpler to swallow)
select id, val
from your_table t, unnest([to_json_string(t)]) json,
unnest(`bqutil.fn.json_extract_keys`(json)) col with offset
join unnest(`bqutil.fn.json_extract_values`(json)) val with offset
using(offset)
where starts_with(col, 'value_')
obviously with same output as above first option

It's possible via SQL scripting.
You have to get a column list of your table first and save it in variable. Then call dynamic query with EXECUTE IMMEDIATE
DECLARE field_list STRING;
SET field_list = ((
SELECT STRING_AGG(column_name) FROM `my_project_id`.my_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE column_name LIKE "value_%" AND table_name = 'my_table'
));
EXECUTE IMMEDIATE "SELECT id, SPLIT(value, '_')[OFFSET(1)] value FROM my_dataset.my_table UNPIVOT (status FOR value IN ("||field_list||"))"

Related

ORACLE TO_CHAR SPECIFY OUTPUT DATA TYPE

I have column with data such as '123456789012'
I want to divide each of each 3 chars from the data with a '/' in between so that the output will be like: "123/456/789/012"
I tried "SELECT TO_CHAR(DATA, '999/999/999/999') FROM TABLE 1" but it does not print out the output as what I wanted. Previously I did "SELECT TO_CHAR(DATA, '$999,999,999,999.99') FROM TABLE 1 and it printed out as "$123,456,789,012.00" so I thought I could do the same for other case as well, but I guess that's not the case.
There is also a case where I also want to put '#' in front of the data so the output will be something like this: #12345678901234. Can I use TO_CHAR for this problem too?
Is these possible? Because when I go through the documentation of oracle about TO_CHAR, it stated a few format that can be use for TO_CHAR function and the format that I want is not listed there.
Thank you in advance. :D
Here is one option with varchar2 datatype:
with test as (
select '123456789012' a from dual
)
select listagg(substr(a,(level-1)*3+1,3),'/') within group (order by rownum) num
from test
connect by level <=length(a)
or
with test as (
select '123456789012.23' a from dual
)
select '$'||listagg(substr((regexp_substr(a,'[0-9]{1,}')),(level-1)*3+1,3),',') within group (order by rownum)||regexp_substr(a,'[.][0-9]{1,}') num
from test
connect by level <=length(a)
output:
1st query
123/456/789/012
2nd query
$123,456,789,012.23
If you wants groups of three then you can use the group separator G, and specify the character to use:
SELECT TO_CHAR(DATA, 'FM999G999G999G999', 'NLS_NUMERIC_CHARACTERS=./') FROM TABLE_1
123/456/789/012
If you want a leading # then you can use the currency indicator L, and again specify the character to use:
SELECT TO_CHAR(DATA, 'FML999999999999', 'NLS_CURRENCY=#') FROM TABLE_1
#123456789012
Or combine both:
SELECT TO_CHAR(DATA, 'FML999G999G999G999', 'NLS_CURRENCY=# NLS_NUMERIC_CHARACTERS=./') FROM TABLE_1
#123/456/789/012
db<>fiddle
The data type is always a string; only the format changes.

Order by generated column

I have following data structure:
CREATE TABLE test(
id INTEGER PRIMARY KEY,
data TEXT NOT NULL
);
INSERT INTO test (data) VALUES ('10.20.3.40'), ('10.100.3'), ('10.20.20.40')
The problem is that i need to order by data column using integer logic (dot-separated string as array of integers).
Using order by it returns data sorted as text:
SELECT data FROM test ORDER BY data
10.100.3
10.20.20.40
10.20.3.40
Result I need to achieve:
10.20.3.40
10.20.20.40
10.100.3
The simplest method to sort it properly without reimplementing arrays in SQLite which I've found is to add zero-padding to each part of data.
So basically I need to:
Select all data from table;
Split value of data column;
Add zero-padding and join it back;
Join newly generated column with reformatted data
Order by this column
What have I already done:
WITH RECURSIVE split_str(source, part) AS (
SELECT '10.20.3.40' || '.', NULL
UNION ALL
SELECT
substr(source, instr(source, '.') + 1),
substr('000000' || source, instr(source, '.'), 6)
FROM split_str WHERE source != ''
)
SELECT group_concat(part, '.') AS new_data FROM split_str WHERE part IS NOT NULL
It splits constant string '10.20.3.40' by dot, add leading zeros to each part and join it back using group_concat(). It returns:
000010.000020.000003.000040
Now I need to apply such a modification to values of data column from test table and somehow use this values for sorting. That's result I'm trying to get:
I'm not an expert in SQL (obviously) and don't understand how to apply expression in WITH clause on each data column separately.
As you can't use GROUP_CONCAT() if you want to preserve the order, just build up another string instead.
Then, only take the records where there's no more 'unpadded' string still to be processed.
WITH RECURSIVE
test_set(original)
AS
(
SELECT '10.20.3.40'
UNION ALL
SELECT '10.100.3'
UNION ALL
SELECT '10.20.20.40'
),
split_str(original, remaining, padded)
AS
(
SELECT original, original || '.', '' FROM test_set
---------
UNION ALL
---------
SELECT
original,
substr(remaining, instr(remaining, '.') + 1),
padded || '.' || substr('000000' || remaining, instr(remaining, '.'), 6)
FROM
split_str
WHERE
remaining != ''
)
SELECT
original,
padded
FROM
split_str
WHERE
remaining = ''
ORDER BY
padded
Demo: db<>fiddle.uk
(You may or may not want to strip the leading ., depending on your needs.)

How to easily remove count=1 on aliased field in SQL?

I have the following data in a table:
GROUP1|FIELD
Z_12TXT|111
Z_2TXT|222
Z_31TBT|333
Z_4TXT|444
Z_52TNT|555
Z_6TNT|666
And I engineer in a field that removes the leading numbers after the '_'
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_31TBT|Z_TBT|333 <- to be removed
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
How can I easily query the original table for only GROUP's that correspond to GROUP_ALIASES with only one Distinct FIELD in it?
Desired result:
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
This is how I get all the GROUP_ALIAS's I don't want:
SELECT GROUP_ALIAS
FROM
(SELECT
GROUP1,FIELD,
case when instr(GROUP1, '_') = 2
then
substr(GROUP1, 1, 2) ||
ltrim(substr(GROUP1, 3), '0123456789')
else
substr(GROUP1 , 1, 1) ||
ltrim(substr(GROUP1, 2), '0123456789')
end GROUP_ALIAS
FROM MY_TABLE
GROUP BY GROUP_ALIAS
HAVING COUNT(FIELD)=1
Probably I could make the engineered field a second time simply on the original table and check that it isn't in the result from the latter, but want to avoid so much nesting. I don't know how to partition or do anything more sophisticated on my case statement making this engineered field, though.
UPDATE
Thanks for all the great replies below. Something about the SQL used must differ from what I thought because I'm getting info like:
GROUP1|GROUP_ALIAS|FIELD
111,222|,111|111
111,222|,222|222
etc.
Not sure why since the solutions work on my unabstracted data in db-fiddle. If anyone can spot what db it's actually using that would help but I'll also check on my end.
Here is one way, using analytic count. If you are not familiar with the with clause, read up on it - it's a very neat way to make your code readable. The way I declare column names in the with clause works since Oracle 11.2; if your version is older than that, the code needs to be re-written just slightly.
I also computed the "engineered field" in a more compact way. Use whatever you need to.
I used sample_data for the table name; adapt as needed.
with
add_alias (group1, group_alias, field) as (
select group1,
substr(group1, 1, instr(group1, '_')) ||
ltrim(substr(group1, instr(group1, '_') + 1), '0123456789'),
field
from sample_data
)
, add_counts (group1, group_alias, field, ct) as (
select group1, group_alias, field, count(*) over (partition by group_alias)
from add_alias
)
select group1, group_alias, field
from add_counts
where ct > 1
;
With Oracle you can use REGEXP_REPLACE and analytic functions:
select Group1, group_alias, field
from (select group1, REGEXP_REPLACE(group1,'_\d+','_') group_alias, field,
count(*) over (PARTITION BY REGEXP_REPLACE(group1,'_\d+','_')) as count from test) a
where count > 1
db-fiddle

How to use INTO and GROUP BY clause together

SELECT cast ( SUBSTRING ( CAST ("ProcessingDate" AS text), 5, 2 ) as integer),
COUNT(*) INTO resultValue1,resultValue2
FROM "DemoLogs"."Project"
WHERE "Addr" = 'Y' AND "ProcessingDate" >= 20160110
GROUP BY 1
ORDER BY 1;
In my database, the ProcessingDate is stored as YYYYMMDD. So, I am extracting its month from it.
This query is working fine if we remove the INTO clause but, I want to store the result to use it further.
So what should be the datatype of the variable resultValue1 and resultValue2 how to store the data(because data will be multiple).
As I am new to PostgreSQL I don't know how to do this can anybody help me out.
Here resultValue1 & resultValue2 will be tables not variables. you can group by using the column names.
Give some column alias names for both columns and group by using them.
You probably want this.
SELECT cast ( SUBSTRING ( cast ("ProcessingDate" as text),5 , 2 ) as
integer) AS resultValue1, COUNT(*) AS resultValue2
INTO <NewTable> --NewTable will be created with those two columns
FROM "DemoLogs"."Project"
-- conditions
Group By 1
-- other contitions/clauses
;
Kindly refer this INTO Documentation.
Hope this helps.
Try this:
SELECT cast ( SUBSTRING ( cast ("ProcessingDate" as text),5 , 2 ) as integer)resultValue1,
COUNT(*)resultValue2
INTO <Table Name>
FROM "DemoLogs"."Project"
WHERE "Addr" = 'Y'
AND "ProcessingDate" >= '20160110'
Group By 1
Order By 1;
Store the above query in the variable and remove that INTO clause from it.
So, query will be now :
query = SELECT cast ( SUBSTRING ( CAST ("ProcessingDate" AS text), 5, 2 ) as
integer),
COUNT(*)
FROM "DemoLogs"."Project"
WHERE "Addr" = 'Y' AND "ProcessingDate" >= 20160110
GROUP BY 1
ORDER BY 1;
Now declare result_data of type record and use in the following manner:
We are looping over result_data because it gives number of rows as output
and I have declared resultset of type text to store the result (as I needed further)
FOR result_data IN EXECUTE query
LOOP
RAISE NOTICE 'result : %',to_json(result_data);
resultset = resultset || to_json(result_data);
END LOOP;

Get group maxima from combined strings

I have a table with a column code containing multiple pieces of data like this:
001/2017/TT/000001
001/2017/TT/000002
001/2017/TN/000003
001/2017/TN/000001
001/2017/TN/000002
001/2016/TT/000001
001/2016/TT/000002
001/2016/TT/000001
002/2016/TT/000002
There are 4 items in 001/2016/TT/000001: 001, 2016, TT and 000001.
How can I extract the max for every group formed by the first 3 items? The result I want is this:
001/2017/TT/000003
001/2017/TN/000002
001/2016/TT/000002
002/2016/TT/000002
Edit
The subfield separator is /, and the length of subfields can vary.
I use PostgreSQL 9.3.
Obviously, you should normalize the table and split the combined string into 4 columns with proper data type. The function split_part() is the tool of choice if the separator '/' is constant in your string and the length of can vary.
CREATE TABLE tbl_better AS
SELECT split_part(code, '/', 1)::int AS col_1 -- better names?
, split_part(code, '/', 2)::int AS col_2
, split_part(code, '/', 3) AS col_3 -- text?
, split_part(code, '/', 4)::int AS col_4
FROM tbl_bad
ORDER BY 1,2,3,4 -- optionally cluster data.
Then the task is trivial:
SELECT col_1, col_2, col_3, max(col_4) AS max_nr
FROM tbl_better
GROUP BY 1, 2, 3;
Related:
Split comma separated column data into additional columns
Of course, you can do it on the fly, too. For varying subfield length you could use substring() with a regular expression like this:
SELECT max(substring(code, '([^/]*)$')) AS max_nr
FROM tbl_bad
GROUP BY substring(code, '^(.*)/');
Related (with basic explanation for regexp pattern):
Filter strings with regex before casting to numeric
Or to get only the complete string as result:
SELECT DISTINCT ON (substring(code, '^(.*)/'))
code
FROM tbl_bad
ORDER BY substring(code, '^(.*)/'), code DESC;
About DISTINCT ON:
Select first row in each GROUP BY group?
Be aware that data items cast to a suitable type may behave differently from their string representation. The max of 900001 and 1000001 is 900001 for text and 1000001 for integer ...
Use the LEFT and RIGHT functions.
SELECT MAX(RIGHT(code,6)) AS MAX_CODE
FROM yourtable
GROUP BY LEFT(code,12)
check this out, possible helpfull
select
distinct on (tab[4],tab[2]) tab[4],tab[3],tab[2],tab[1]
from
(
select
string_to_array(exe.x,'/') as tab,
exe.x
from
(
select
unnest
(
array
['001/2017/TT/000001',
'001/2017/TT/000002',
'001/2017/TN/000003',
'001/2017/TN/000001',
'001/2017/TN/000002',
'001/2016/TT/000001',
'001/2016/TT/000002',
'001/2016/TT/000001',
'002/2016/TT/000002']
) as x
) exe
) exe2
order by tab[4] desc,tab[2] desc,tab[3] desc;