Applying BigQuery javascript UDF to groups - google-bigquery

I am trying to apply a javascript user defined function to groups. In the following code, --group by my_group within tuple is commented out. I want to apply the temp function test on every my_group within test_data. The code runs if group by is commented out. If I try to include group by, it produces a "scalar subquery produced more than one element". What change should I make so that I can output an array per group (my_group)?
#standardSQL create function
CREATE TEMP FUNCTION test(a ARRAY<STRING>)
RETURNS ARRAY< STRING >
LANGUAGE js AS '''
var combine = function(a) {
var fn = function(n, src, got, all) {
if (n == 0) {
if (got.length > 0) {
all[all.length] = got;
} return;
}
for (var j = 0; j < src.length; j++) {
fn(n - 1, src.slice(j + 1), got.concat([src[j]]), all);
} return;
}
var all = [];
for (var i = 1; i < a.length; i++) {
fn(i, a, [], all);
}
all.push(a);
return all;
}
return combine(a)
''';
WITH test_data AS (
SELECT 'Shirt' item, 'Cashless' my_group UNION ALL
SELECT 'Jeans', 'Cashless' UNION ALL
SELECT 'Jeans', 'Cash' UNION ALL
SELECT 'Cap', 'Cash' UNION ALL
SELECT 'Shirt', 'Cash' UNION ALL
SELECT 'Cap', 'Cashless'
),
tuple as (
SELECT ARRAY_AGG(DISTINCT item) items
FROM test_data
--group by my_group (uncommenting it creates error)
)
select * from unnest(test((select items from tuple)))
I am looking for an output like the following:
my_group Item
Cash Shirt
Cash Jeans
Cash Cap
Cash Shirt,Jeans
Cash Shirt,Cap
Cash Jeans,Cap
Cash Shirt,Jeans,Cap
Cashless Shirt
Cashless Jeans
Cashless Cap
Cashless Shirt,Jeans
Cashless Shirt,Cap
Cashless Jeans,Cap
Cashless Shirt,Jeans,Cap

Consider below (omitting the function piece to keep answer compact enough ...)
WITH test_data AS (
SELECT 'Shirt' item, 'Cashless' my_group UNION ALL
SELECT 'Jeans', 'Cashless' UNION ALL
SELECT 'Jeans', 'Cash' UNION ALL
SELECT 'Cap', 'Cash' UNION ALL
SELECT 'Shirt', 'Cash' UNION ALL
SELECT 'Cap', 'Cashless'
), tuple as (
SELECT my_group, ARRAY_AGG(DISTINCT item) items
FROM test_data
group by my_group
)
select my_group, item
from tuple,
unnest(test(items)) item
with output

Related

create a group of linked items

There is a list of users, who buy different product items. I want to group the item by user buying behavior. If any user buys two products, these shall be in the same group. The buying links the products.
user
item
1
cat food
1
cat toy
2
cat toy
2
cat snacks
10
dog food
10
dog collar
11
dog food
11
candy
12
candy
12
apples
15
paper
In this sample case all items for a cat shall be grouped together: "cat food" to "cat toy" to "cat snacks". The items with dog, candy, apples should be one group, because user buying’s link these. The paper is another group.
There are about 200 different products in the table and I need to do a disjoint-set union (DSU).
In JavaScript there several implementation of Disjoint Set Union (DSU), here this was used for the user defined function (UDF) in BigQuery. The main idea is to use a find and union function and to save the linking in a tree, represented as an array, please see here for details.
create temp function DSU(A array<struct<a string,b string>>)
returns array<struct<a string,b string>>
language js as
"""
// https://gist.github.com/KSoto/3300322fc2fb9b270dce2bf1e3d80cf3
// Disjoint-set bigquery
class DSU {
constructor() {
this.parents = [];
}
find(x) {
if(typeof this.parents[x] != "undefined") {
if(this.parents[x]<0) {
return x;
} else {
if(this.parents[x]!=x) {
this.parents[x]=this.find(this.parents[x]);
}
return (this.parents[x]);
}
} else {
this.parents[x]=-1;
return x;
}
}
union(x,y) {
var xpar = this.find(x);
var ypar = this.find(y);
if(xpar != ypar) {
this.parents[xpar]+=this.parents[ypar];
this.parents[ypar]=xpar;
}
}
console_print() {
// console.log(this.parents);
}
}
var dsu = new DSU();
for(var i in A){
dsu.union(A[i].a,A[i].b);
}
var out=[]
for(var i in A){
out[i]={b:dsu.find(A[i].a),a:A[i].a};
}
return out;
""";
with #recursive
your_table as (
SELECT 1 as user, "cat food" as item
UNION ALL SELECT 1, "cat toy"
UNION ALL SELECT 2, "cat snacks"
UNION ALL SELECT 2, "cat toy"
UNION ALL SELECT 10, "dog food"
union all select 10, "dog collar"
union all select 11, "dog food"
union all select 11, "candy"
union all select 12, "candy"
union all select 12, "apples"
union all select 15, "paper"
), helper as (
select distinct a, b
from (
Select user,min(item) as b, array_agg(item) as a_list
from your_table
group by 1
), unnest(a_list) as a
)
Select * except(tmp_count),
first_value(item) over(partition by b order by tmp_count desc,b) as item_most_common
from
(
select * ,
count(item) over(partition by b,item) as tmp_count
from your_table
left join (select X.a, min(X.b) as b from (select DSU(array_agg(struct(''||a,''||b))) as X from helper),unnest(X) X group by 1 order by 1) as combinder
on ''||item=combinder.a
)
The data is in the table your_table. A helper table is used to buid all pairs of two items, which any user brought. Combined as an array, this is giving to the UDF DSU. This function returns all items in column a and in column b the group. We want the most common item of the group to be shown as group name, therefore we use some window functions to determine it.

sql server, replace chars in string with values in table

how can i replace values in string with values that are in a table?
for example
select *
into #t
from
(
select 'bla'c1,'' c2 union all
select 'table'c1,'TABLE' c2 union all
select 'value'c1,'000' c2 union all
select '...'c1,'' c2
)t1
declare #s nvarchaR(max)='this my string and i want to replace all values that are in table #t'
i have some values in my table and i want to replace C1 with C2 in my string.
the results should be
this my string and i want to replace all 000 that are in TABLE #t
UPDATE:
i solved with a CLR
using System;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Data.Linq;
namespace ReplaceValues
{
public partial class Functions
{
[SqlFunction
(
//DataAccess = DataAccessKind.Read,
SystemDataAccess = SystemDataAccessKind.Read
)
]
public static string ReplaceValues(string row, string delimitator, string values, string replace/*, bool CaseSensitive*/)
{
//return row;
string[] tmp_values = values.Split(new string[] { delimitator }, StringSplitOptions.None);
string[] tmp_replace = replace.Split(new string[] { delimitator }, StringSplitOptions.None);
row = row.ToUpper();
for (int i = 0; i < Math.Min(tmp_values.Length, tmp_replace.Length); i++)
{
row = row.Replace(tmp_values[i].ToUpper(), tmp_replace[i]);
}
return row;
}
}
}
and then
select *
into #t
from
(
select 'value1'OldValue,'one'NewValue union all
select 'value2'OldValue,'two'NewValue union all
select 'value3'OldValue,'three'NewValue union all
select 'value4'OldValue,'four'NewValue
)t1
select dbo.ReplaceValues(t1.column,'|',t2.v,t2.r)
from MyTable t1
cross apply
(
select dbo.inlineaggr(i1.OldValue,'|',1,1)v,
dbo.inlineaggr(i1.NewValue,'|',1,1)r
from #t i1
)t2
i have to improved it to manage better the case sensitive, but performance are not bad.
(also 'inlineaggr' is a CLR i wrote years ago)
You can do this via recursion. Assuming you have a table of find-replace pairs, you can number the rows and then use recursive cte:
create table #t(c1 nvarchar(100), c2 nvarchar(100));
insert into #t(c1, c2) values
('bla', ''),
('table', 'table'),
('value', '000'),
('...', '');
declare #s nvarchar(max) = 'this my string and i want to replace all values that are in table #t';
with ncte as (
select row_number() over (order by (select null)) as rn, *
from #t
), rcte as (
select rn, replace(#s, c1, c2) as newstr
from ncte
where rn = 1
union all
select ncte.rn, replace(rcte.newstr, ncte.c1, ncte.c2)
from ncte
join rcte on ncte.rn = rcte.rn + 1
)
select *
from rcte
where rn = 4

How can convert SQL to lambda or LINQ

How can I convert below SQL to lambda or LINQ?
with cte
as (select * from test1
union all
select * from test2)
select * from cte
union all
select sum(columnA),sum(columnB),sum(columnC) from cte
In Linq UNION ALL is .Concat(), so:
var cte = test1.Concat(test2);
var sums = new MyModel
{
columnA = cte.Sum(c => c.columnA),
columnB = cte.Sum(c => c.columnB),
columnC = cte.Sum(c => c.columnC),
}
return cte.Concat(IEnumerable.Repeat(sums, 1));
You must remember that test1 and test2 must be type MyModel and MyModel contains only columnA, columnB and columnC.
I put two tables together in one datagridvie but in the last row of datagridview I need the total for both tables in the country, I can do one row in total for one table and another row for the other table I also don't need it, like I can only have one line with the total of both tables.
DataContex db = new DataContex();
var query = (
from v1 in db.View1
where shf.Date >= dpDate.Value && shf.Date <= dpDate1.Value
select new
{
v1.Name,
v1.Date,
v1.Quality,
v1.Rat,
v1.Total
}
).Concat
(
from v2 in db.View2
where f.Date >= dpDate.Value && f.Date <= dpDate1.Value
select new
{
v2.Name,
v2.Date,
v2.Quality,
v2.Rat,
v2.Total
}
).Concat
(from View2 in
(from v2 in db.View2
where v2.Date >= dpDate.Value && sh.Date <= dpDate1.Value
select new
{
v2.Name,
v2.Date,
v2.Quality,
v2.Rate,
v2.Total
})
group v2 by new { v2.NRFA } into g
select new
{
Name = "Total:",
Date = dpDate1.Value,
Quality = (decimal?)g.Sum(p => p.Quality),
Rate = (decimal?)g.Sum(p => p.Rate),
Total = (decimal?)g.Sum(p => p.Total)
}
);
Blockquote

Convert SQL to Linq select in where

I have three table below:
TABLE_PRODUCT (IdProduct, ProductName, ProductUnit)
TABLE_STORE_HOUSE (IdContain, IdProduct, ProductNumber, TimeInput)
TABLE_SELL (IdSell, IdContain, ProductNumberSell, TimeSell)
Current, How to using LinQ query get TABLE_STORE_HOUSE.IdProduct witch condition TABLE_STORE_HOUSE.ProductNumber - Sum(TABLE_SELL.ProductNumberSell) > 0 and TABLE_STORE_HOUSE.TimeInput is smallest
Help me convert Sql to Linq..............
select top 1 IdContain
from
TABLE_STORE_HOUSE
where IdProduct = '6'
and
ProductNumber - (select sum(ProductNumber)
from TABLE_SELL
Where TABLE_SELL.IdContain = IdContain)> 0
order by TimeInput desc;
Can you try this?
from t in TABLE_STORE_HOUSEs
let TSell = (
from s in TABLE_SELLs
where s.IdContain == t.IdContain
orderby s.ProductNumber
select new {
s.ProductNumber
}
)
where t.IdProduct == 6 && (t.ProductNumber - TSell.Sum(si => si.ProductNumber)) > 0
select new { t.IdContain }
for top 1 you can use Take() function.

How to transform path/string by collapsing repeated elements?

There is a filed in my table that represents pathways like below:
Item1->Item1->Item2-> Item3->Item3->Item3->Item1
In most cases this is quite looong sequence with many instances of same consecutive Items.
How I can shorted above path to something like below? in BigQuery!
Item1(x2)->Item2->Item3(x3)->Item1
I wanted to convince myself that this was possible just through array manipulation (using standard SQL), and I came up with a solution. An alternate way to solve the problem would be to use analytic functions, where you could detect changes in item along the path.
CREATE TEMPORARY FUNCTION PartsToString(
parts_and_offsets ARRAY<STRUCT<part STRING, off INT64>>) AS ((
SELECT
STRING_AGG(
CONCAT(part_and_offset.part,
IF(parts_and_offsets[OFFSET(off + 1)].off - part_and_offset.off = 1,
"",
CONCAT("(x", CAST(parts_and_offsets[OFFSET(off + 1)].off - part_and_offset.off AS STRING), ")"))))
FROM UNNEST(parts_and_offsets) AS part_and_offset WITH OFFSET off
WHERE off + 1 < ARRAY_LENGTH(parts_and_offsets)
));
CREATE TEMPORARY FUNCTION PathwayToParts(pathway STRING) AS ((
SELECT
ARRAY_CONCAT(
ARRAY_AGG(
STRUCT(part, off)),
[STRUCT("" AS part, ARRAY_LENGTH(ANY_VALUE(parts)) AS off)]) AS parts_and_offsets
FROM (SELECT SPLIT(pathway, "->") AS parts),
UNNEST(parts) AS part WITH OFFSET off
WHERE off = 0 OR part != parts[OFFSET(off - 1)]
));
WITH YourTable AS (
SELECT "Item1->Item2->Item2->Item2->Item3->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item2->Item3->Item3->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item4" AS pathway
UNION ALL SELECT "Item1->Item2->Item2->Item3->Item1->Item1->Item1->Item2->Item3->Item3->Item2->Item2->Item2->Item1->Item4" AS pathway
UNION ALL SELECT "Item1->Item1->Item1" AS pathway
UNION ALL SELECT "Item1->Item2->Item2" AS pathway
UNION ALL SELECT "Item1->Item1->Item2" AS pathway
UNION ALL SELECT "Item1->Item2->Item3" AS pathway
)
SELECT PartsToString(PathwayToParts(pathway)) AS parts_string
FROM YourTable;
Using Scalar JS UDF (Standard SQL) <-- would be my choice
CREATE TEMPORARY FUNCTION collapse_repeated(pathway STRING)
RETURNS STRING LANGUAGE js AS """
var items = pathway.split('->');
short = ''; elem = items[0]; count = 0;
for (var i = 0; i < items.length; i++) {
if (items[i] !== elem) {
if (short.length > 0) {short += '->'}
short += elem; if (count > 1) {short += '(x' + count.toString() + ')';}
elem = items[i]; count = 1;
} else {
count++;
}
}
if (short.length > 0) {short += '->'}
short += elem; if (count > 1) {short += '(x' + count.toString() + ')';}
return short;
""";
WITH YourTable AS (
SELECT "Item1->Item2->Item2->Item2->Item3->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item2->Item3->Item3->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item4" AS pathway
UNION ALL SELECT "Item1->Item2->Item2->Item3->Item1->Item1->Item1->Item2->Item3->Item3->Item2->Item2->Item2->Item1->Item4" AS pathway
UNION ALL SELECT "Item1->Item1->Item1" AS pathway
UNION ALL SELECT "Item1->Item2->Item2" AS pathway
)
SELECT collapse_repeated(pathway) AS shorten_pathway, pathway
FROM YourTable
Note: Same JS can be easily “translated” to JS UDF in Legacy SQL
Using Window Functions (Legacy SQL)
SELECT GROUP_CONCAT_UNQUOTED(IF(repeats=1, item, CONCAT(item, "(x", STRING(repeats), ")")), "->"), pathway
FROM (
SELECT MIN(pos) AS ord, MIN(item) AS item, COUNT(1) AS repeats, pathway
FROM (
SELECT item, pos, IFNULL(grp, 0)AS grp, pathway FROM (
SELECT item, pos, SUM(change) OVER(PARTITION BY pathway ORDER BY pos ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp, pathway
FROM (
SELECT item, pos, IF(item=next_item, 0, 1) AS change, pathway FROM (
SELECT item, pos, LEAD(item) OVER(PARTITION BY pathway ORDER BY pos) AS next_item, pathway
FROM (
SELECT item, POSITION(item) AS pos, pathway FROM (
SELECT SPLIT(pathway, "->") AS item, pathway FROM
(SELECT "Item1->Item2->Item2->Item2->Item3->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item2->Item3->Item3->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item2->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item1->Item4" AS pathway),
(SELECT "Item1->Item2->Item2->Item3->Item1->Item1->Item1->Item2->Item3->Item3->Item2->Item2->Item2->Item1->Item4" AS pathway),
(SELECT "Item1->Item1->Item1" AS pathway),
(SELECT "Item1->Item2->Item2" AS pathway)
)
)
)
)
)
)
GROUP BY grp, pathway
ORDER BY ord
)
GROUP BY pathway