Group Concatenate Strings (Rows) in BigQuery - sql

I am working with Google BigQuery & I have a query that looks like the following:
SELECT
prod.abc,
uniqueid,
variable2,
cust.variable1,
purch.variable2,
from mydata.order
left join
UNNEST(purchases) as purch,
UNNEST(codes_abs) as cod, UNNEST(cod.try_products) as prod
When I run this, it results in a table that looks like this:
|prod.abc| uniqueid | variable2 | ...|
|APP123 | customer1| value | ...|
|BLU155 | customer1| value | ...|
|TRI134 | customer1| value | ...|
|LO123 | customer2| value | ...|
|ZU9274 | customer2| value | ...|
|TO134 | customer3| value | ...|
What I would like to do is concatenate the values in column "prod.abc", group them by "uniqueid", and separate them by ",". I found numerous solutions online; however, since I have unnested other variables in my query, none of the solutions I found seems to work in my case. The values do not need to be ordered in any way. Basically, what I would like to end up with is:
|prod.abc | uniqueid | variable2 | ...|
|APP123, BLU155, TRI134 | customer1| value | ...|
|LO123, ZU9274 | customer2| value | ...|
|TO134 | customer3| value | ...|
It would also be okay to get a table like this where duplicates are kept, as I could remove them later on:
|prod.abc | uniqueid | variable2 | ...|
|APP123, BLU155, TRI134 | customer1| value | ...|
|APP123, BLU155, TRI134 | customer1| value | ...|
|APP123, BLU155, TRI134 | customer1| value | ...|
|LO123, ZU9274 | customer2| value | ...|
|LO123, ZU9274 | customer2| value | ...|
|TO134 | customer3| value | ...|
Any help is much appreciated. Thank you!

Do each unnest separately. Does aggregation then work?
SELECT STRING_AGG(prod.abc, ','),
uniqueid, variable2, cust.variable1, purch.variable2
FROM mydata.order LEFT JOIN
UNNEST(purchases) as purch
ON true LEFT JOIN
UNNEST(codes_abs) as cod
ON true LEFT JOIN
UNNEST(cod.try_products) as prod
ON true
GROUP BY uniqueid, variable2, cust.variable1, purch.variable2;

Below is for BigQuery Standard SQL
#standardSQL
SELECT
STRING_AGG(prod.abc, ', ') AS abc,
uniqueid,
variable2,
cust.variable1,
purch.variable2,
FROM mydata.order
LEFT JOIN UNNEST(purchases) AS purch
LEFT JOIN UNNEST(codes_abs) AS cod
LEFT JOIN UNNEST(cod.try_products) AS prod
GROUP BY uniqueid,
variable2,
cust.variable1,
purch.variable2
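If the extra UNNESTs produce duplicate codes in the concatenated string, STRING_AGG should also accept a DISTINCT qualifier, so the deduplication can happen inside the aggregation itself; only the aggregated expression changes:
STRING_AGG(DISTINCT prod.abc, ', ') AS abc,
The rest of the joins and the GROUP BY stay the same.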

Related

Postgresql query subtract from one table

I have one table in PostgreSQL and cannot figure out how to build the query.
The table contains the columns nr_serii and deleting_time. I am trying to count nr_serii and subtract from it the rows that have a deleting_time set.
My query:
select nr_serii, count(nr_serii) as ilosc, count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii, deleting_time
output is:
+--------------------+
| "666666";1;1 |
| "456456";1;0 |
| "333333";3;0 |
| "333333";1;1 |
| "111111";1;1 |
| "111111";3;0 |
+--------------------+
Part of the table with raw data:
+--------------------------------+
| "666666";"2020-11-20 14:08:13" |
| "456456";"" |
| "333333";"" |
| "333333";"" |
| "333333";"" |
| "333333";"2020-11-20 14:02:23" |
| "111111";"" |
| "111111";"" |
| "111111";"2020-11-20 14:08:04" |
| "111111";"" |
+--------------------------------+
And I need to subtract column ilosc_delete from column ilosc,
for example:
nr_serii 333333: ilosc 3 - 1 = 2
Expected output:
+-------------+
| "666666";-1 |
| "456456";1 |
| "333333";2 |
| "111111";2 |
| ... |
+-------------+
I think there is a very simple solution for this, but my mind has gone blank.
I see what you want now. You want to subtract the number where deleting_time is not null from the ones where it is null:
select nr_serii,
count(*) filter (where deleting_time is null) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;
Here is a db<>fiddle.
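If the FILTER clause is not available (it needs PostgreSQL 9.4 or later), a sketch of the same idea using CASE expressions, keeping the table and column names from the question:
select nr_serii,
-- rows with no deleting_time minus rows that have one
count(case when deleting_time is null then 1 end) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;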

SELECTing Related Rows Based on a Single Row Match

I have the following table running on PostgreSQL 9.5:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
|10 | 4567890 | abc123-ab |
|11 | 4567890 | gex890-aj |
|12 | 4567890 | ghi567-ef |
+---+------------+-------------+
I am looking for the rows for each trans_id based on a LIKE query, like this:
SELECT * FROM table
WHERE message LIKE '%def234%'
This, of course, returns just three rows, the three that match my pattern in the message column. What I am looking for, instead, is all the rows that share a trans_id with a matching row: if a single row matches the pattern, return every row that has the trans_id of that matching row.
That is, the results would be:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
+---+------------+-------------+
Notice rows 10, 11, and 12 were not SELECTed because none of them matched the %def234% pattern.
I have tried (and failed) to write a sub-query to get the all the related rows when a single message matches a pattern:
SELECT sub.*
FROM (
SELECT DISTINCT trans_id FROM table WHERE message LIKE '%def-234%'
) sub
WHERE table.trans_id = sub.trans_id
I could easily do this with two queries, but the first query to get a list of matching trans_ids to include in a WHERE trans_id IN (<huge list of trans_ids>) clause would be very large, and that would be a very inefficient way of doing this; I believe there is a way to do it with a single query.
Thank you!
This will do the job, I think:
WITH sub AS (
SELECT trans_id
FROM table
WHERE message LIKE '%def-234%'
)
SELECT *
FROM table JOIN sub USING (trans_id);
Hope this helps.
Try this:
SELECT ID, trans_id, message
FROM (
SELECT ID, trans_id, message,
COUNT(*) FILTER (WHERE message LIKE '%def234%')
OVER (PARTITION BY trans_id) AS pattern_cnt
FROM mytable) AS t
WHERE pattern_cnt >= 1
Using a FILTER clause with the windowed version of the COUNT function, we can get the number of records matching the predefined pattern within each trans_id slice. The outer query uses this count to filter out irrelevant slices.
Demo here
You can do this.
WITH trans
AS
(SELECT DISTINCT trans_id
FROM t1
WHERE message LIKE '%def234%')
SELECT t1.*
FROM t1,
trans
WHERE t1.trans_id = trans.trans_id;
I think this will perform better. If you have enough data, you can run EXPLAIN on both the subquery and the CTE versions and compare the output.
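Another common pattern for "return the whole group when any member matches" is a correlated EXISTS; a minimal sketch, reusing the t1 table name from the answer above:
SELECT t.*
FROM t1 t
WHERE EXISTS (
-- keep the row if any row with the same trans_id matches the pattern
SELECT 1
FROM t1 m
WHERE m.trans_id = t.trans_id
AND m.message LIKE '%def234%'
);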

HiveQL: extract based on a string

I have the following table:
ID | Keyword | Date
87NB | skill,love,hate,funny,very funny | 02/19/2004
27YV | funny,tiger,movie,king | 08/10/2014
92JK | sun,light,funny,baby | 06/27/2015
65TH | moon,cow,bird,car | 04/22/2017
From the above table, I want to obtain the IDs of everyone who has "funny" as a keyword. The result would be:
ID
87NB
27YV
92JK
You can use split and then the array_contains function:
select ID from yourtable where array_contains(split(Keyword, ","), "funny");
select ID
from t
where find_in_set('funny',Keyword) > 0
;
+------+
|  id  |
+------+
| 87NB |
| 27YV |
| 92JK |
+------+
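If you also need to know which keyword matched, or want to match with a LIKE pattern rather than an exact element, one option is to explode the keyword list into one row per keyword; a sketch in HiveQL, reusing the yourtable name from the first answer:
select distinct ID
from yourtable
-- split the comma-separated list and produce one row per keyword
lateral view explode(split(Keyword, ',')) kw as single_keyword
where single_keyword = 'funny';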

SSRS Distinct Count Of Condition Once Per Group

Let's say I have data similar to this:
|NAME |AMOUNT|RANDOM_FLAG|
|------|------|-----------|
|MARK | 100| X |
|MARK | 400| |
|MARK | 200| X |
|AMY | 100| X |
|AMY | 400| |
|AMY | 300| |
|ABE | 300| |
|ABE | 900| |
|ABE | 700| |
How can I get a distinct count of names with at least one RANDOM_FLAG set? In my total row, I want to see a count of 2, since both Mark and Amy had the flag set, regardless of how many times it is set. I have tried everything I can think of in SSRS. I'm guessing there is a way to nest aggregates to get to this, but I can't come up with it. I do have a group on NAME.
You can use a conditional COUNTDISTINCT in SSRS:
=COUNTDISTINCT(
IIF(NOT ISNOTHING(Fields!Random_Flag.Value) AND Fields!Random_Flag.Value <> "", Fields!Name.Value, Nothing),
"DataSetName"
)
Replace DataSetName with the name of your dataset.
select count(distinct name) from table_name where name in (select distinct name from table_name where random_flag = 'X');
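If the count can be pushed into the dataset query instead of the report expression, a conditional distinct count is a common alternative; a sketch against the same hypothetical table_name (names_with_flag is just an illustrative alias):
-- each name is counted once if it has at least one row with the flag set
select count(distinct case when random_flag = 'X' then name end) as names_with_flag
from table_name;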

SQL join based on Date

I have two tables:
Table A
+-------+----------+
| prop | str_date |
+-------+----------+
| AL408 | 3/1/2009 |
| AL408 | 4/1/2009 |
| AL408 | 5/1/2009 |
| AL408 | 6/1/2009 |
+-------+----------+
Table B
+---------+-----------+----------+
| prop_id | agrx_date | brand_id |
+---------+-----------+----------+
| AL408 | 5/5/1986 | CI |
| AL408 | 6/30/1994 | CI |
| AL408 | 5/3/1999 | CI |
| AL408 | 4/21/2006 | CI |
| AL408 | 3/20/2009 | QI |
+---------+-----------+----------+
I'd like to pull brand_id into my result query, but the brand_id changes over time, determined by comparing str_date to agrx_date. For the month after a brand_id has changed via the agrx_date, the result should reflect that new brand_id. All str_dates are monthly values.
The end result would look like this:
+-------+----------+--------+
| prop | str_date | Result |
+-------+----------+--------+
| AL408 | 3/1/2009 | CI |
| AL408 | 4/1/2009 | QI |
| AL408 | 5/1/2009 | QI |
| AL408 | 6/1/2009 | QI |
+-------+----------+--------+
Here's what I have so far (which is not correct) and I'm not sure how to get my end result.
select
a.prop
,a.str_date
,b.agrx_date
,b.brand_id
from tableA a
left join tableB b
on a.prop = b.prop_id
and a.str_date < b.agrx_date
where a.prop = 'AL408'
I'm passing this through Tableau, so I cannot use CTEs or other temp tables.
You could create a date range using a lead() analytical function. The date range could then be used as part of a theta join to pull in the correct brand. This is a pretty simple way to pull the date value from the next record; see the definition of next_agrx_date below.
The range would be inclusive at the start (>=) but non-inclusive at the end (<). You also need to handle the null case for open-ended ranges. You can find this logic in the join below.
select
a.prop
,a.str_date
,b.agrx_date
,b.brand_id
from tableA a
left join
( select
prop_id
,agrx_date
,brand_id
,lead(agrx_date) over (partition by prop_id order by agrx_date) next_agrx_date
from tableB ) b
on (b.prop_id = a.prop and a.str_date >= b.agrx_date and (a.str_date < b.next_agrx_date or b.next_agrx_date is null))
order by prop, str_date
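If the windowed subquery is awkward to push through Tableau, a correlated subquery that picks the latest agrx_date on or before each str_date is another option; a rough sketch, assuming a dialect that supports LIMIT (use TOP 1 or FETCH FIRST 1 ROW ONLY where needed):
select
a.prop
,a.str_date
-- latest brand change on or before the month's start date
,(select b.brand_id
from tableB b
where b.prop_id = a.prop
and b.agrx_date <= a.str_date
order by b.agrx_date desc
limit 1) as result
from tableA a
order by a.prop, a.str_date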
You can use DATE_FORMAT to change the dates to match formatting.
Example:
DATE_FORMAT(str_date,'%m-%d-%Y')
or whatever field and format you want to use.