Create a nested JSON string in BigQuery - SQL

I want to create a nested JSON string where one phone number returns all product_name values in a single row.
I have tried, but the output of TO_JSON_STRING isn't what I need. Here is the image of the query result:
Image
Here is the query that I used:
select
  cus.phone,
  TO_JSON_STRING(STRUCT(line.product_name)) as attributes
from `dtm_med.t1_customer` cus
left join `dtm_med.t2_toa_total_sales_line` line on cus.phone = line.phone
left join `med_product.raw_cms_users` u on u.id = line.patient_id
where date_diff(current_date(), date(latest_order_date), week) < 26
  and sale_contribution > 3000000
  and transaction_count > 2
I want all the product_name values in one row per phone number, not duplicated.
Is there a way to do that in BigQuery?

This might help you.
Credit to the helpers of this question:
listagg function alternative in bigquery
You can use STRING_AGG() for a CSV string, or ARRAY_AGG() if you want a list-like structure (an array). Then GROUP BY the other column(s).
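A minimal sketch of how that could look applied to the query from the question (the table and column names are taken from the question; ARRAY_AGG(DISTINCT ... IGNORE NULLS) is my assumption about how you want duplicates and NULLs from the left join handled):
select
  cus.phone,
  TO_JSON_STRING(STRUCT(
    -- collect every product_name for this phone into one array,
    -- dropping duplicates and any NULLs produced by the left join
    ARRAY_AGG(DISTINCT line.product_name IGNORE NULLS) as product_name
  )) as attributes
from `dtm_med.t1_customer` cus
left join `dtm_med.t2_toa_total_sales_line` line on cus.phone = line.phone
left join `med_product.raw_cms_users` u on u.id = line.patient_id
where date_diff(current_date(), date(latest_order_date), week) < 26
  and sale_contribution > 3000000
  and transaction_count > 2
group by cus.phone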

Related

Count the most popular occurrences of a hashtag in a string column postgreSQL

I have a column in my dataset with the following format:
hashtags
1 [#newyears, #christmas, #christmas]
2 [#easter, #newyears, #fourthofjuly]
3 [#valentines, #christmas, #easter]
I have managed to count the hashtags like so:
SELECT hashtags, (LENGTH(hashtags) - LENGTH(REPLACE(hashtags, ',', '')) + 1) AS hashtag_count
FROM full_data
ORDER BY hashtag_count DESC NULLS LAST
But I'm not sure if it's possible to count the occurrences of each hashtag. Is it possible to return the count of the most popular hashtags in the following format:
hashtags count
christmas 3
newyears 2
The datatype is just varchar, but I'm a bit confused on how I should approach this. Any help would be appreciated!
Storing the data like this is a bad idea. It's risky because we don't know whether the text will always be stored in exactly this form. It would be better to save the different strings in separate columns.
Anyway, if you can't improve that and must deal with this structure, we can basically use a combination of UNNEST, STRING_TO_ARRAY and GROUP BY to split the hashtags and count them.
So the general idea is something like this:
WITH unnested AS
(SELECT
UNNEST(STRING_TO_ARRAY(hashtags, ',')) AS hashtag
FROM full_data)
SELECT hashtag, COUNT(hashtag)
FROM unnested
GROUP BY hashtag
ORDER BY COUNT(hashtag) DESC;
Due to the brackets and spaces within your column, this will not produce the correct result.
So we can additionally use TRIM and TRANSLATE to get rid of everything except the hashtags.
With your sample data, following construct will produce the intended outcome:
WITH unnested AS
(SELECT
TRIM(TRANSLATE(UNNEST(STRING_TO_ARRAY(hashtags, ',')),'#,[,]','')) AS hashtag
FROM full_data)
SELECT hashtag, COUNT(hashtag)
FROM unnested
GROUP BY hashtag
ORDER BY COUNT(hashtag) DESC;
See here
But as already said, this is unpleasant and risky.
So if possible, find out which hashtags can occur (it seems these are all special days) and then create columns or a mapping table for them.
That is, store 0 or 1 in each column to indicate whether the hashtag appears or not, and then sum the values per column.
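A minimal sketch of that idea (the flag table and its columns are hypothetical, not from the question):
-- Hypothetical restructured table: one 0/1 flag column per known hashtag.
-- CREATE TABLE full_data_flags (id int, christmas int, newyears int, easter int);
SELECT SUM(christmas) AS christmas,
       SUM(newyears)  AS newyears,
       SUM(easter)    AS easter
FROM full_data_flags;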
I think you should split the array data into records and then count them with GROUP BY (note this assumes hashtags is actually an array column). Something like this query:
SELECT hashtag, count(*) as hashtag_count
FROM full_data, unnest(hashtags) s(hashtag)
GROUP BY hashtag
ORDER BY hashtag_count DESC
Hopefully, it will match your request!
You can do it as follows:
select unnest(string_to_array(REGEXP_REPLACE(hashtags,'[^\w,]+','','g'), ',')) as tags, count(1)
from full_data
group by tags
order by count(1) desc
Result:
tags count
christmas 3
newyears 2
easter 2
fourthofjuly 1
valentines 1
REGEXP_REPLACE to remove any special characters.
string_to_array to generate an array
unnest to expand an array to a set of rows
Demo here

Postgresql ARRAY_AGG on array only returns first value

In Postgres 10 I'm having an issue converting an integer to a weekday name and grouping all record values via ARRAY_AGG to form a string.
The following subquery only returns the first value in the arrays indexed by timetable_periods.day (which is an integer)
SELECT ARRAY_TO_STRING(ARRAY_AGG((ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[timetable_periods.day]), '-')
FROM timetable_periods
WHERE courses.id = timetable_periods.course_id
GROUP BY timetable_periods.course_id
whereas this shows all days concatenated in a string, as expected:
SELECT ARRAY_TO_STRING(ARRAY_AGG(timetable_periods.day), ', ')
FROM timetable_periods
WHERE courses.id = timetable_periods.course_id
GROUP BY timetable_periods.course_id
E.g. a course has 2 timetable_periods, with day values 0 and 2 (i.e. Monday and Wednesday).
The first query only returns "Tue" instead of "Mon, Wed" (so both an indexing issue and only returning the first day).
The second query returns "0, 2" as expected.
Am I doing something wrong in the use of the ARRAY[...] of weekday names?
Thanks
Update: The queries above are subqueries, with the courses table in the main query's FROM
You should post correct SQL statements. I suspect a JOIN of courses and timetable_periods, but courses is missing in the FROM clause. Furthermore, both queries contain AND followed by GROUP BY - this will not work.
From your writings I guess you want something like:
select
c.id,
string_agg((array['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[tp.day + 1], ', ') as day_names
from
courses c
inner join timetable_periods tp on c.id = tp.course_id
group by
c.id
Your attempt to access the day-names array was essentially correct, but array indexing in Postgres is 1-based, so a day value of 0 points outside the array. Concatenating text values can be done with string_agg.
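A quick illustration of part of why the original query showed only "Tue": in Postgres an out-of-range subscript such as [0] yields NULL, and ARRAY_TO_STRING skips NULLs:
-- day 0 (Monday) indexes position 0, which is out of range and yields NULL;
-- day 2 hits position 2 ('Tue'); adding 1 gives the intended 'Mon'.
SELECT (ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[0],     -- NULL
       (ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[2],     -- 'Tue'
       (ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[0 + 1]; -- 'Mon'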

SQL Select n random groups and return all records

I can't seem to find a solution to this exact question without chaining 2 or more queries together via pandas manipulation. (I had previously been attempting random sampling in PostgreSQL in the vein of cur.execute("select distinct group from data where random() < {0}".format(rand_coef)), but I was unable to combine the resulting array into a single query, nor to specify the exact n value.)
A hypothetical dataset and query is as follows:
Say I want n = 3 random groups from the following data.
id, group, value
1,a,23
1,a,3
1,b,2
1,a,432
1,b,123
1,d,23
1,d,11
1,c,23
1,c,234
1,a,223
1,c,32
An example result of a query would be n=3 random groups (i.e. b,c,d):
id, group, value
1,b,2
1,b,123
1,d,23
1,d,11
1,c,23
1,c,234
1,c,32
How might this work?
One method would be:
select t.*
from t join
(select group
from t
group by group
order by random()
limit 3
) g
on t.group = g.group;
Note that group is a really bad name for a column, because it is a SQL keyword.
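Because group is reserved, the sketch above would need the identifier double-quoted to actually run in Postgres, along these lines:
select t.*
from t
join (select "group"
      from t
      group by "group"
      order by random()
      limit 3) g
  on t."group" = g."group";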

Group By Using Wildcards in Big Query

I have this query:
SELECT SomeTableA.*
FROM SomeTableB
LEFT JOIN SomeTableA USING (XYZ)
GROUP BY SomeTableA.*
I know that I cannot do the GROUP BY part with wildcards. At the same time, I don't really like listing all the columns (there can be up to 20) manually.
Could this be added as a new feature? Or is there any way to easily get the list of all 20 columns from SomeTableA for the GROUP BY part?
If you really have the exact query shown in your question, then try the below instead - no grouping required:
#standardSQL
SELECT DISTINCT *
FROM `project.dataset.tableA`
WHERE xyz IN (SELECT xyz FROM `project.dataset.tableB`)
As for the "Group By Using Wildcards in Big Query" part - this sounds more like grouping by a STRUCT, which is not supported, so you can submit a feature request if you want - https://issuetracker.google.com/issues/new?component=187149&template=0
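If you still want the explicit column list for a GROUP BY, one possible approach (a sketch, assuming the dataset's INFORMATION_SCHEMA.COLUMNS view is accessible to you; project.dataset and tableA are the placeholders from the answer above) is to generate the list from the table metadata and paste it into the query:
#standardSQL
-- Build a comma-separated list of SomeTableA's columns from the metadata,
-- which can then be pasted into the GROUP BY clause.
SELECT STRING_AGG(column_name, ', ' ORDER BY ordinal_position) AS group_by_list
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'tableA';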

Oracle Group by issue

I have the below query. The problem is that the last column, productdescr, returns two records and the query fails because of the DISTINCT. Now I need to add one more column to the WHERE clause of the inner select so that it returns one record. The issue is that the column I need to add should not be part of the GROUP BY clause.
SELECT product_billing_id,
billing_ele,
SUM(round(summary_net_amt_excl_gst/100)) gross,
(SELECT DISTINCT description
FROM RES.tariff_nt
WHERE product_billing_id = aa.product_billing_id
AND billing_ele = aa.billing_ele) productdescr
FROM bil.bill_sum aa
WHERE file_id = 38613 --1=1
AND line_type = 'D'
AND (product_billing_id, billing_ele) IN (SELECT DISTINCT
product_billing_id,
billing_ele
FROM bil.bill_l2 )
AND trans_type_desc <> 'Change'
GROUP BY product_billing_id, billing_ele
I want to modify the select statement in the way below, by adding a new filter to the WHERE clause so that it returns one record:
(SELECT DISTINCT description
FROM RES.tariff_nt
WHERE product_billing_id = aa.product_billing_id
AND billing_ele = aa.billing_ele
AND (rate_structure_start_date <= TO_DATE(aa.p_effective_date,'yyyymmdd')
AND rate_structure_end_date > TO_DATE(aa.p_effective_date,'yyyymmdd'))
) productdescr
The aa.p_effective_date column should not be part of the GROUP BY clause. How can I do it? Oracle is the database.
So there are multiple RES.tariff_nt records for a given product_billing_id/billing_ele, differentiated by the start/end dates.
You want the description for the record that encompasses the p_effective_date from bil.bill_sum. The kicker is that you can't (or don't want to) include that in the GROUP BY. That suggests you've got multiple rows in bil.bill_sum with different effective dates.
The issue is what you want to happen when you are summarising those multiple rows with different dates: which of those dates do you want to use as the one to get the description?
If it doesn't matter, simply use MIN(aa.p_effective_date), or MAX.
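A rough sketch of that idea (untested; it pre-aggregates bill_sum in an inline view so MIN(p_effective_date) is available to the scalar subquery without appearing in the GROUP BY, and it omits the bil.bill_l2 filter from the original query for brevity):
SELECT s.product_billing_id,
       s.billing_ele,
       s.gross,
       (SELECT DISTINCT t.description
          FROM RES.tariff_nt t
         WHERE t.product_billing_id = s.product_billing_id
           AND t.billing_ele = s.billing_ele
           AND t.rate_structure_start_date <= TO_DATE(s.eff_date, 'yyyymmdd')
           AND t.rate_structure_end_date   >  TO_DATE(s.eff_date, 'yyyymmdd')) productdescr
  FROM (SELECT product_billing_id,
               billing_ele,
               SUM(ROUND(summary_net_amt_excl_gst / 100)) gross,
               MIN(p_effective_date) eff_date   -- or MAX, whichever fits
          FROM bil.bill_sum
         WHERE file_id = 38613
           AND line_type = 'D'
           AND trans_type_desc <> 'Change'
         GROUP BY product_billing_id, billing_ele) s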
Have you looked into Oracle analytic functions? This is a good link: Analytical Functions by Example