How to parse integer values from regex and sum in BigQuery - google-bigquery

I have a column that contains a complex string, and I am trying to extract values from it. Here is a temp table with sample values:
with temp as (
select 1 as event_id, ';t-Tew00;1;1.00;252=100.00,;SM-R190;1;1.00;252=200.00,;SM-G998B/DS;1;6347.00;252=300.00,;EF-PG99P;1;249.00;252=400.00' as event_list union all
select 2 as event_id, ';asdI-Tww5300;1;1.00;252=99.00,,;EP-TA845;.252=49.00' as event_list union all
select 3 as event_id, ';asdI-Tww5300;1;1.00;252=10.00,,;EP-TA845;,.252=20.00,:etw:1002:2020,'
)
select *
from temp
I want to extract all the double/int values that follow each occurrence of 252= in the event_list column. For instance, in the first record, I would like to extract the values 100.00, 200.00, 300.00 and 400.00.
I would like to add a separate column to the output that sums all such values. So the output column for the first record would be 1000.00. Likewise, 99+49 for the 2nd record and 10+20 for the 3rd record.
If 252= does not appear at all, the output must be 0.
How can I achieve this in BigQuery?

Try below
select event_id,
(
select ifnull(sum(cast(value as float64)), 0)
from unnest(regexp_extract_all(event_list, r'252=(\d+\.?\d*)')) value
) as total_252
from temp
If applied to the sample data in your question, the output is 1000.0 for event_id 1, 148.0 for event_id 2 and 30.0 for event_id 3.
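For anyone who wants to sanity-check the pattern outside BigQuery, here is a minimal Python sketch that runs the same idea over the sample rows. It assumes a slightly stricter pattern than a bare `.` (escaped dot, at least one digit), and row 4 is a made-up record with no 252= at all, added only to show the 0 case. Python's re module is not the engine BigQuery uses, so treat this as an illustration of the logic rather than a test of the query itself.

```python
import re

# Sample rows copied from the question; row 4 is hypothetical (no "252=" at all).
rows = {
    1: ";t-Tew00;1;1.00;252=100.00,;SM-R190;1;1.00;252=200.00,;SM-G998B/DS;1;6347.00;252=300.00,;EF-PG99P;1;249.00;252=400.00",
    2: ";asdI-Tww5300;1;1.00;252=99.00,,;EP-TA845;.252=49.00",
    3: ";asdI-Tww5300;1;1.00;252=10.00,,;EP-TA845;,.252=20.00,:etw:1002:2020,",
    4: ";no-match;1;1.00",
}

# Capture the number that follows each "252=": digits, optional dot, optional digits.
PATTERN = re.compile(r"252=(\d+\.?\d*)")


def total_252(event_list: str) -> float:
    """Sum every value that follows '252='; returns 0.0 when there is none."""
    return sum((float(v) for v in PATTERN.findall(event_list)), 0.0)


totals = {event_id: total_252(s) for event_id, s in rows.items()}
print(totals)
```

The `0.0` start value plays the same role as ifnull(..., 0) in the query: a row without any match still gets a numeric total.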

Related

Query to select rows that don't partially match

I have the following table
STORE_ID|PRICE_1|PRODUCT_ID
--------+-------+----------
1052| 4.99|5157917035
1052| 4.99|5157917035
1052| 4.99|5157917036
1052| 4.99|5157917036
1052| 4.99|5157917037
As you can see, these product IDs start with "5157917". Is there a way to select only part of the value, in this case ignoring the last 3 digits, and then filter out rows that are not distinct?
Is there a way to select only part of the value?
Sure; usually, we use substr function. If column's datatype is one of the CHAR family, just apply it directly. Otherwise, if it is a NUMBER, first convert it to a string (using the to_char function).
For example:
SQL> create table test (col_n number, col_c varchar2(10));
Table created.
SQL> insert into test values (5157917035, '5157917035');
1 row created.
SQL> select substr(to_char(col_n), 1, 7) sub_n,
2 substr(col_c, 1, 7) sub_c
3 from test;
SUB_N SUB_C
---------------------------- ----------------------------
5157917 5157917
SQL>
I didn't quite understand what result you expect out of data set you posted, but - if you ran e.g.
select DISTINCT store_id,
price_1,
substr(product_id, 1, 7)
from your_table
you'd get only one row.
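If you want to try the DISTINCT + substr idea without an Oracle instance, here is a rough sketch using Python's built-in sqlite3 module. It assumes product_id is stored as text and reuses the placeholder table name your_table from the query above; SQLite's substr behaves like Oracle's for this simple case, but this is only an approximation of the original environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table (store_id integer, price_1 real, product_id text)"
)
# The five rows from the question.
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        (1052, 4.99, "5157917035"),
        (1052, 4.99, "5157917035"),
        (1052, 4.99, "5157917036"),
        (1052, 4.99, "5157917036"),
        (1052, 4.99, "5157917037"),
    ],
)

# Keep only the first 7 characters of product_id, then de-duplicate.
distinct_rows = conn.execute(
    "select distinct store_id, price_1, substr(product_id, 1, 7) from your_table"
).fetchall()
print(distinct_rows)
```

All five rows share the same store, price and 7-character prefix, so they collapse into a single row.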

Calculate sum based on distinct values in other column

I have a table where I want to sum based on distinct values in id column.
Main Table
Pipe separated main file
Id|col1|col2|col3|col4|col5|col6|col7|Dim1|Dim2|Values
r1||||1.2||||sc1|c1|1.2
r4||||||0.98||sc1|c1|0.98
r5|||0.89|||||sc1|c1|0.89
r1||||1.2||||sc2|c1|1.2
r2|||||0.98|||sc2|c1|0.98
r3||||1.22||||sc2|c1|1.22
r4|||||0.98|||sc2|c1|0.98
Output
Pipe separated result
col1|col2|col3|col4|col5|col6|Dim2
0|0|0.89|2.42|1.96|0.98|c1
For columns col1-col7, I want to do this:
select sum(col1),sum(col2),sum(col3),sum(col4)...sum(col7)
group by dim2 based on distinct values in id.
I could use row_number()over(partition by id) and use the row_number=1, but I can't use subquery. I am looking to do this in just one query.

Combining columns in a complex pivot

I have this table:
INPUT
I wish to transform it into another table, that contains
The Date/Id/Order columns (primary key columns)
A TotalCount column, containing the value of the original table's Count column where all the Cond columns are NULL
One Count column for each CondX column, containing the value of the original table's Count column where CondX = 1 and the rest of the Cond = NULL
One Count column for each combination of non-null (Cond1 OR Cond2 OR Cond3) + (CondA OR CondB), containing the value of the original table's Count column where the two applicable Cond = 1 and the rest = NULL
Example:
So basically, I want my new table to have these columns:
Date, Id, Order, TotalCount
Cond1Count, Cond2Count, Cond3Count, CondACount, CondBCount
Cond1AndCondACount, Cond1AndCondBCount, Cond2AndCondACount, Cond2AndCondBCount...
From the sample image, we'd have these values in the end:
DESIRED OUTPUT
(note: CondBCount = 0 for Order = 2, missed it in the image edition)
I'd show some SQL if I had any, but I'm actually not quite sure where to start with this problem. I could naively do a bunch of different SELECT Count WHERE ..., but I'm wondering if there's a better solution.
Without your table structure I can only sketch it, but you can sum multiple columns with SUM and VALUES, even in combination with CASE.
Example:
SELECT Date, Id, [Order], SUM(Cond1ACount) AS Cond1ACount
FROM (
SELECT Date, Id, [Order],
(SELECT SUM(v)
FROM (VALUES (ISNULL(Cond1,0)), (ISNULL(Cond2,0)),...) AS value(v)) Cond1ACount
FROM YourTable
) sub
GROUP BY Date, Id, [Order]
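Since the sample images are gone, here is a small, self-contained sketch of the "SUM in combination with CASE" idea (conditional aggregation), run through Python's sqlite3 module. Everything about the table is an assumption: the name source_counts, the columns day, order_no and cnt (stand-ins for Date, Order and Count) and the four sample rows are all made up for illustration, and only a few of the requested output columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: key columns, five nullable condition flags, one count.
conn.execute(
    "create table source_counts (day text, id integer, order_no integer, "
    "cond1 integer, cond2 integer, cond3 integer, "
    "conda integer, condb integer, cnt integer)"
)
conn.executemany(
    "insert into source_counts values (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2024-01-01", 1, 1, None, None, None, None, None, 100),  # no condition set
        ("2024-01-01", 1, 1, 1, None, None, None, None, 40),      # Cond1 only
        ("2024-01-01", 1, 1, None, None, None, 1, None, 30),      # CondA only
        ("2024-01-01", 1, 1, 1, None, None, 1, None, 10),         # Cond1 and CondA
    ],
)

# One SUM(CASE ...) per output column; each CASE picks exactly one flag pattern.
pivoted = conn.execute(
    """
    select day, id, order_no,
           sum(case when cond1 is null and cond2 is null and cond3 is null
                     and conda is null and condb is null then cnt else 0 end) as total_count,
           sum(case when cond1 = 1 and cond2 is null and cond3 is null
                     and conda is null and condb is null then cnt else 0 end) as cond1_count,
           sum(case when conda = 1 and cond1 is null and cond2 is null
                     and cond3 is null and condb is null then cnt else 0 end) as conda_count,
           sum(case when cond1 = 1 and conda = 1 and cond2 is null
                     and cond3 is null and condb is null then cnt else 0 end) as cond1_and_conda_count
    from source_counts
    group by day, id, order_no
    """
).fetchall()
print(pivoted)
```

The remaining columns (Cond2Count, Cond1AndCondBCount and so on) would follow the same pattern with a different combination of flags in the CASE.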

Taking out common data

I want to compare two columns from two different tables and take out the common rows that are present in both table 1 and table 2.
table 1      table 2            result
mobnum A     mobnum B
988123456    988124567201718    988123456
988124567    988123456201718    988123457
944123456    988623456201718
I'm not quite sure, since you haven't really formatted your data in a nice way, but I think the code below will give you what you want. I included the second table in the where () in order to select only matching values. If you need the rows, simply change "Select Num" to select the unique IDs and go from there.
Table Test_1:
Num
988123456
988124
988124567
944123456
Table Test_2:
Num
988123456
988123457
9881234
9886234
Query:
select Num from Test_1 where Num in (Select Num from Test_2)
Output:
Num
988123456
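To reproduce this quickly, here is a minimal sketch that loads the two sample tables above into an in-memory SQLite database via Python and runs the same IN query. It assumes Num is an integer column; the original database engine is not stated, so SQLite is only a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Test_1 (Num integer)")
conn.execute("create table Test_2 (Num integer)")
# Sample values from the two tables shown above.
conn.executemany(
    "insert into Test_1 values (?)",
    [(988123456,), (988124,), (988124567,), (944123456,)],
)
conn.executemany(
    "insert into Test_2 values (?)",
    [(988123456,), (988123457,), (9881234,), (9886234,)],
)

# Keep only the values of Test_1 that also occur in Test_2.
common = conn.execute(
    "select Num from Test_1 where Num in (select Num from Test_2)"
).fetchall()
print(common)
```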

Entering a dummy row when query returns with no data

I run a query whose result is used in Excel. The data is updated once a week through a procedure. Sometimes the procedure returns no data. In those cases I would like to add a dummy row with the date, so that the Excel users can see that there is no data for this date.
I have a column called TAG, which takes the values 1-4. If tag 3 returns no data, I would like to insert a row.
I just have a tmp table. If it does not contain data for tag 3, I would like to add a dummy row with just a date but no data.
CREATE procedure [dbo].[pr_quality_weekly]
as
begin
insert into #tmp(date_run,country,tag,company,campaign,seller,orderdate,reg_date,enddate,cust_desc1,cust_desc2,phone_1,phone_2,delivery,reason,date_of_birth,product)
Select * FROM prod_no
where date between datfrom and dateto
insert into #tmp(date_run,country,tag,company,campaign,seller,orderdate,reg_date,enddate,cust_desc1,cust_desc2,phone_1,phone_2,delivery,reason,date_of_birth,product)
Select * FROM prod_se
where date between datfrom and dateto
Insert into prod_weekly (date_run,country,tag,company,campaign,seller,orderdate,reg_date,enddate,cust_desc1,cust_desc2,phone_1,phone_2,delivery,reason,date_of_birth,product)
(Select date_run,country,tag,company,campaign,seller,orderdate,reg_date,enddate,cust_desc1,cust_desc2,phone_1,phone_2,delivery,reason,date_of_birth,product
from #tmp)
END
While I have no idea what your code is doing, creating that row could be simple. If you have NOT NULL columns in your prod_weekly table, though, you'll have to provide empty strings as filler.
Select date_run,country,days.tag,company,campaign
, seller,orderdate,reg_date,enddate
, cust_desc1,cust_desc2
, phone_1,phone_2,delivery,reason,date_of_birth,product
from (values (1),(2),(3),(4),(5)) days(tag)
left join #tmp on days.tag=#tmp.tag
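To see the left-join trick in isolation, here is a cut-down sketch using Python's sqlite3 module. It is not the original procedure: the table is called tmp (SQLite has no #tmp tables), it keeps only three of the columns, the sample rows and the run date '2024-01-07' are invented, and the coalesce that fills in the date for the missing tag is my own addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tmp (date_run text, country text, tag integer)")
# Hypothetical weekly load: tag 3 returned no data.
conn.executemany(
    "insert into tmp values (?, ?, ?)",
    [("2024-01-07", "NO", 1), ("2024-01-07", "SE", 2), ("2024-01-07", "NO", 4)],
)

# Start from the fixed list of tags and left join the data onto it, so every
# tag yields at least one row; missing data shows up as NULLs plus the date.
rows = conn.execute(
    """
    with tags(tag) as (values (1), (2), (3), (4))
    select coalesce(t.date_run, ?) as date_run, t.country, tags.tag
    from tags
    left join tmp t on t.tag = tags.tag
    order by tags.tag
    """,
    ("2024-01-07",),
).fetchall()
print(rows)
```

The row for tag 3 comes back with the date filled in and NULL (None in Python) for the data columns, which is the dummy row the question asks for.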