I am trying to create a column with a CASE expression and then concatenate that column's values. Here is some example code.
WITH base AS (
SELECT ID, Date, Action, case when Date is null then Action || '**' else Action end as Action_with_no_date
FROM <Table_Name>
)
SELECT ID, "array_join"("array_agg"(DISTINCT Action_with_no_date), ', ') Action_with_no_date
FROM base
GROUP BY ID;
Basically, Action_with_no_date displays the concatenated values of Action for each ID, with the string '**' appended to the values whose Date is null.
After I did this, I found an edge case.
If the same Action (e.g. play) is taken more than once for one ID, and one row has a date while the other does not, the output contains both play and play** for that ID.
However, I want it to display just one play**.
Below is the example data for ID = 1
ID Date Action
1 1/2/22 read
1 1/3/22 play
1 NULL play
and expected result for the ID
ID Action_with_no_date
1 read, play**
How should I handle this?
You can calculate the ** suffix per id and action with an analytic max() over a case expression: the suffix is '**' if any row for that id and action has a null date, and an empty string otherwise. Then concatenate the suffix to the action.
Demo:
with mytable as (
SELECT * FROM (
VALUES
(1, '1/2/22', 'read'),
(1, '1/3/22', 'play'),
(1, NULL, 'play')
) AS t (id, date, action)
)
select id, array_join(array_agg(DISTINCT action||suffix), ', ')
from
(
select id, date, action,
max(case when date is null then '**' else '' end) over(partition by id, action) as suffix
from mytable
)s
group by id
Result:
1 play**, read
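For checking the logic outside the database, here is a minimal Python sketch of the same two steps (flag each id/action pair that has any null date, then join the distinct labels per id). The function name and the tuple layout are my own assumptions, and None stands in for NULL.

```python
from collections import defaultdict

def actions_with_suffix(rows):
    """rows: iterable of (id, date, action); date is None when missing."""
    # Step 1: per (id, action), remember whether any row lacks a date.
    # This plays the role of max(case ...) over (partition by id, action).
    has_null_date = defaultdict(bool)
    for id_, date, action in rows:
        has_null_date[(id_, action)] |= date is None

    # Step 2: per id, emit each distinct action once, with its suffix.
    per_id = defaultdict(list)
    for (id_, action), flagged in has_null_date.items():
        per_id[id_].append(action + ("**" if flagged else ""))

    # Sorting is only for a deterministic result in this sketch.
    return {id_: ", ".join(sorted(labels)) for id_, labels in per_id.items()}

rows = [(1, "1/2/22", "read"), (1, "1/3/22", "play"), (1, None, "play")]
print(actions_with_suffix(rows))
```

The max() trick in the query works because '**' compares greater than the empty string, so one null date in the partition is enough to set the suffix on every row of that id and action.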
Here is my BigQuery table. I am trying to find out the URLs that were displayed but not viewed.
create table dataset.url_visits(ID INT64 ,displayed_url string , viewed_url string);
select * from dataset.url_visits;
ID Displayed_URL Viewed_URL
1 url11,url12 url12
2 url9,url12,url13 url9
3 url1,url2,url3 NULL
In this example, I want to display
ID Displayed_URL Viewed_URL unviewed_URL
1 url11,url12 url12 url11
2 url9,url12,url13 url9 url12,url13
3 url1,url2,url3 NULL url1,url2,url3
Split each string into an array and unnest it. Use a case expression to check whether each displayed item is in the viewed list, then aggregate the results into an array or a string.
Select ID, string_agg(viewing ) as viewed,
string_agg(not_viewing ) as not_viewed,
array_agg(viewing ignore nulls) as viewed_array
from (
Select ID ,
case when display in unnest(split(Viewed_URL)) then display else null end as viewing,
case when display in unnest(split(Viewed_URL)) then null else display end as not_viewing,
from (
Select 1 as ID, "url11,url12" as Displayed_URL, "url12" as Viewed_URL UNION ALL
Select 2, "url9,url12,url13", "url9" UNION ALL
Select 3, "url1,url2,url3", NULL UNION ALL
Select 4, "url9,url12,url13", "url9,url12"
),unnest(split(Displayed_URL)) as display
)
group by 1
Consider below approach
select *, (
select string_agg(url)
from unnest(split(Displayed_URL)) url
where url not in unnest(split(ifnull(Viewed_URL, ''))) -- also handles rows with several viewed URLs
) unviewed_URL
from `project.dataset.table`
If applied to the sample data in your question, the output matches the desired result shown there.
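To sanity-check the expected values, the same "displayed minus viewed" logic can be sketched in Python. This is only an illustration with a made-up function name; None stands in for NULL. It tests membership in the split viewed list, which also covers rows where several URLs were viewed (such as ID 4 in the sample above), something a comparison against the whole Viewed_URL string would not handle.

```python
def unviewed_urls(displayed, viewed):
    """Return the displayed URLs that do not appear in the viewed list."""
    # A NULL (None) viewed value means nothing was viewed.
    viewed_set = set(viewed.split(",")) if viewed else set()
    remaining = [url for url in displayed.split(",") if url not in viewed_set]
    # Mirror string_agg, which yields NULL when there is nothing to aggregate.
    return ",".join(remaining) if remaining else None

samples = [
    (1, "url11,url12", "url12"),
    (2, "url9,url12,url13", "url9"),
    (3, "url1,url2,url3", None),
    (4, "url9,url12,url13", "url9,url12"),
]
for id_, displayed, viewed in samples:
    print(id_, unviewed_urls(displayed, viewed))
```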
I’m trying to write a query in BigQuery that produces the count of the unique transactions and the combination of column names populated.
I have a table:
TRAN CODE   Full Name   Given Name   Surname   DOB   Phone
The result set I’m after is:
TRAN CODE   UNIQUE TRANSACTIONS   NAME OF POPULATED COLUMNS
A           3                     Full Name
A           4                     Full Name,Phone
B           5                     Given Name,Surname
B           10                    Given Name,Surname,DOB,Phone
The result set shows that for TRAN CODE A
3 distinct customers provided Full Name
4 distinct customers provided Full Name and Phone #
For TRAN CODE B
5 distinct customers provided Given Name and Surname
10 distinct customers provided Given Name, Surname, DOB, Phone #
Currently to produce my results I’m doing it manually.
I tried using ARRAY_AGG but couldn’t get it working.
Any advice would be appreciated.
Thank you.
I think you want something like this:
select tran_code,
array_to_string(array[case when full_name is not null then 'full_name' end,
case when given_name is not null then 'given_name' end,
case when surname is not null then 'surname' end,
case when dob is not null then 'dob' end,
case when phone is not null then 'phone' end
], ','),
count(*)
from t
group by 1, 2
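If it helps to see the grouping idea outside SQL, here is a rough Python sketch: build the comma-separated list of populated column names for each row, then count rows per (tran_code, list) pair. The column list, the function name, and the sample rows are assumptions for illustration only. Like count(*), it counts rows rather than distinct customers.

```python
from collections import Counter

# Assumed column names, in the order they should appear in the label.
COLUMNS = ["full_name", "given_name", "surname", "dob", "phone"]

def populated_combinations(rows):
    """rows: list of dicts with 'tran_code' plus COLUMNS (None = not populated)."""
    counts = Counter()
    for row in rows:
        # Same role as array_to_string(array[case ... end, ...], ','):
        # keep only the names of the columns that hold a value.
        populated = ",".join(c for c in COLUMNS if row.get(c) is not None)
        counts[(row["tran_code"], populated)] += 1
    return dict(counts)

# Made-up sample rows.
rows = [
    {"tran_code": "A", "full_name": "Ann Lee"},
    {"tran_code": "A", "full_name": "Bob Ray", "phone": "555-0100"},
    {"tran_code": "B", "given_name": "Cy", "surname": "Dow"},
    {"tran_code": "B", "given_name": "Di", "surname": "Eng"},
]
print(populated_combinations(rows))
```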
Consider the approach below. It has no dependency on column names other than TRAN_CODE, so it is quite generic.
select TRAN_CODE,
count(distinct POPULATED_VALUES) as UNIQUE_TRANSACTIONS,
POPULATED_COLUMNS
from (
select TRAN_CODE,
( select as struct
string_agg(col, ', ' order by offset) POPULATED_COLUMNS,
string_agg(val order by offset) POPULATED_VALUES,
string_agg(cast(offset as string) order by offset) pos
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+?)":')) col with offset
join unnest(regexp_extract_all(to_json_string(t), r'"[^"]+?":("[^"]+?"|null)')) val with offset
using(offset)
where val != 'null'
and col != 'TRAN_CODE'
).*
from `project.dataset.table` t
)
group by TRAN_CODE, POPULATED_COLUMNS
order by TRAN_CODE, any_value(pos)
below is output example
@Gordon_Linoff's solution is the best, but an alternative would be to do the following:
SELECT
TRAN_CODE,
COUNT(TRAN_ROW) AS unique_transactions,
populated_columns
FROM (
SELECT
TRAN_CODE,
TRAN_ROW,
# COUNT(value) AS unique_transactions,
STRING_AGG(field, ",") AS populated_columns
FROM (
SELECT
* EXCEPT(DOB),
CAST(DOB AS STRING ) AS DOB,
ROW_NUMBER() OVER () AS TRAN_ROW
FROM
sample) UNPIVOT(value FOR field IN (Full_name,
Given_name,
Surname,
DOB,
Phone))
GROUP BY
TRAN_CODE,
TRAN_ROW )
GROUP BY
TRAN_CODE,
populated_columns
But this should be more expensive...
For example, a Hive table has a JSON array column (type: string) like:
"[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}......]"
How can I convert it into:
name age
alice 14
using Hive SQL?
I've tried lateral view explode, but it's not working.
thanks a lot!
This is a working example of how it can be parsed in Hive. Customize it and debug it on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with "}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
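To double-check what the pivot should return for a given string, the same reshaping can be sketched outside Hive in Python. This is only a reference for the expected result, not a Hive technique; the function name is made up.

```python
import json

def pivot_fields(json_text):
    """Turn '[{"field": ..., "value": ...}, ...]' into a {field: value} dict."""
    # Each array element becomes one key/value pair, which corresponds to
    # the max(case when field = ... then value end) columns in the queries.
    return {item["field"]: item["value"] for item in json.loads(json_text)}

s = ('[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, '
     '{"field":"something_else", "value":"somevalue"}]')
record = pivot_fields(s)
print(record["name"], record["age"])
```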
I use the query below to extract a value from a column that stores JSON objects.
The issue is that it only pulls the first value matching the regex inside SUBSTRING, which is -$4,000.00. Is there a parameter I can pass to SUBSTRING to pull the value -$1,990.00 into another column as well?
SELECT attribute_actions_text
, SUBSTRING(attribute_actions_text FROM '"Member [Dd]iscount:":"(.+?)"') AS column_1
, '' AS column_2
FROM (
VALUES
('[{"Member Discount:":"-$4,000.00"},{"Member discount:":"-$1,990.00"}]')
, (NULL)
) ls(attribute_actions_text)
Desired result :
column_1 column_2
-$4,000.00 -$1,990.00
Try this
WITH data(id,attribute_actions_text) as (
VALUES
(1,'[{"Member Discount:":"-$4,000.00"},{"Member Discount:":"-$1,990.00"}]')
, (2,'[{"Member Discount:":"-$4,200.00"},{"Member Discount:":"-$1,890.00"}]')
, (3,NULL)
), match as (
SELECT
id,
m,
ROW_NUMBER()
OVER (PARTITION BY id) AS r
FROM data, regexp_matches(data.attribute_actions_text, '"Member [Dd]iscount:":"(.+?)"', 'g') AS m
)
SELECT
id
,(select m from match where id = d.id AND r=1) as col1
,(select m from match where id = d.id AND r=2) as col2
FROM data d
Result
1,"{-$4,000.00}","{-$1,990.00}"
2,"{-$4,200.00}","{-$1,890.00}"
3,NULL,NULL
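The idea (collect every match of the pattern, then place the first and second match in separate columns) can be sketched in Python with the same regular expression. The function name and the width parameter are assumptions for this sketch; None stands in for NULL.

```python
import re

# Same pattern as in the queries above.
PATTERN = re.compile(r'"Member [Dd]iscount:":"(.+?)"')

def discounts(text, width=2):
    """Return the first `width` captured values, padded with None."""
    # findall returns every captured group, in order of appearance,
    # much like regexp_matches with the 'g' flag returns one row per match.
    matches = PATTERN.findall(text) if text is not None else []
    return tuple((matches + [None] * width)[:width])

text = '[{"Member Discount:":"-$4,000.00"},{"Member discount:":"-$1,990.00"}]'
print(discounts(text))
print(discounts(None))
```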
I have this
-- -- -- --
01 A1 B1 99
01 A1 B1 98
02 A2 B2 97
02 A2 B2 96
I need this
-- -- -- --
01 A1 B1 99
         98
02 A2 B2 97
         96
------------
I cannot repeat data in the output that I will present in Excel; my result needs to look exactly like this.
In my actual table, the last column holds form responses and the first columns (those that must not repeat) hold customer data (phone, name, ...).
The end result of this query will populate a DataTable and will be presented in an .xlsx file.
Thanks for sharing knowledge ^^
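If you have SQL Server 2012+:
SELECT
-- every LAG uses the same ordering as the final ORDER BY,
-- so "previous row" always means the row displayed directly above
ISNULL(NULLIF(Column1,LAG(Column1) OVER(ORDER BY Column1,Column2,Column3,Column4 DESC)),'')
,ISNULL(NULLIF(Column2,LAG(Column2) OVER(ORDER BY Column1,Column2,Column3,Column4 DESC)),'')
,ISNULL(NULLIF(Column3,LAG(Column3) OVER(ORDER BY Column1,Column2,Column3,Column4 DESC)),'')
,Column4
FROM #mytable
ORDER BY Column1,Column2,Column3,Column4 DESC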
It's a little messy, but you can do it in the database. You basically make a subquery that gets the largest value for each group, then join that to the regular table and blank out the values on rows that don't match. I created your sample set like this:
CREATE TABLE mytable (N1 VARCHAR(2), A VARCHAR(2), B VARCHAR(2), N2 VARCHAR(2))
INSERT INTO mytable VALUES
('01', 'A1', 'B1', '99'),
('01', 'A1', 'B1', '98'),
('02', 'A2', 'B2', '97'),
('02', 'A2', 'B2', '96')
And then was able to get the result like this:
SELECT
CASE WHEN O.N2 = I.N2 THEN O.N1 ELSE '' END,
CASE WHEN O.N2 = I.N2 THEN O.A ELSE '' END,
CASE WHEN O.N2 = I.N2 THEN O.B ELSE '' END,
O.N2
FROM
(SELECT MAX(N2) AS N2, N1, A, B FROM mytable GROUP BY N1, A, B) I
INNER JOIN mytable O
ON O.A = I.A AND O.B = I.B AND O.N1 = I.N1
ORDER BY O.N1 ASC, O.N2 DESC
We can use ROW_NUMBER to get the sequence and substitute '' for the repeated columns on all rows where the sequence is greater than 1:
with CTE
AS
( SELECT ID, ColumnA, ColumnB, value,ROW_NUMBER() over ( PARTITION by id order by id) as seq
FROM tableA
)
, CTE1
AS
(
select id, ColumnA, ColumnB, value, seq from CTE where seq =1
UNION
SELECT id ,'','', value , seq from CTE where seq >1
)
SELECT case when seq >1 THEN NULL ELSE id END as id, columnA, columnB, value from CTE1
You can achieve what you want using a query.
You haven't provided DDL, so I am going to assume your columns are called a, b, c and d respectively.
; WITH cte AS (
SELECT a
, b
, c
, d
, Row_Number() OVER (PARTITION BY a, b, c ORDER BY d) As sequence
FROM your_table
)
SELECT CASE WHEN sequence = 1 THEN a ELSE '' END As a
, CASE WHEN sequence = 1 THEN b ELSE '' END As b
, CASE WHEN sequence = 1 THEN c ELSE '' END As c
, d
FROM cte
ORDER
BY a
, b
, c
, d
The idea is to assign an incremental counter to each row, that restarts after each change of a + b + c.
We then use a conditional statement to show a value or not (basically only show on the first instance of each group)
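Since the result ends up in a DataTable anyway, the same "show the group columns only on the first row of each group" idea could also be applied after the query. Here is a minimal Python sketch of that logic; the function name and the key_len parameter are assumptions, and rows are sorted ascending on all columns to match the ORDER BY a, b, c, d above.

```python
def blank_repeats(rows, key_len=3):
    """rows: tuples whose first key_len items identify the group."""
    result = []
    previous_key = None
    for row in sorted(rows):
        key = row[:key_len]
        if key == previous_key:
            # Same group as the row above: blank the group columns,
            # like CASE WHEN sequence = 1 THEN ... ELSE '' END.
            result.append(("",) * key_len + row[key_len:])
        else:
            result.append(row)
        previous_key = key
    return result

rows = [
    ("01", "A1", "B1", "99"),
    ("01", "A1", "B1", "98"),
    ("02", "A2", "B2", "97"),
    ("02", "A2", "B2", "96"),
]
for row in blank_repeats(rows):
    print(row)
```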
The analytic ROW_NUMBER() function is good for this. I've made up column names because you didn't supply any. To assign a row number by customer, use something like this:
SELECT
Name,
Phone,
Address,
Response,
ROW_NUMBER() OVER (PARTITION BY Name, Phone, Address ORDER BY Response) AS CustRow
FROM myTable
That will assign row number within each customer. Try it yourself and I think it will make sense.
You can put it into a subquery or CTE from there and only show customer ID information like name, phone, and address when you're on the first row for each customer:
SELECT
CASE WHEN CustRow = 1 THEN Name ELSE '' END AS Name,
CASE WHEN CustRow = 1 THEN Phone ELSE '' END AS Phone,
CASE WHEN CustRow = 1 THEN Address ELSE '' END AS Address,
Response
FROM (
SELECT
Name,
Phone,
Address,
Response,
ROW_NUMBER() OVER (PARTITION BY Name, Phone, Address ORDER BY Response) AS CustRow
FROM myTable) custSubquery
ORDER BY Name, Phone, Address
The custSubquery on the second-to-last line is because SQL Server requires all subqueries to be aliased, even if the alias isn't used.
The most important thing is to determine how your last column will be ordered for display and to make sure that it's consistent in the ROW_NUMBER() function as well as the final ORDER BY.
If you need more help, please supply table and column names, and specify how results are ordered within each customer.