create a group of linked items


There is a list of users who buy different products. I want to group the items by buying behavior: if any user buys two products, those two products belong to the same group. The purchase links the products.
user item
1 cat food
1 cat toy
2 cat toy
2 cat snacks
10 dog food
10 dog collar
11 dog food
11 candy
12 candy
12 apples
15 paper
In this sample, all cat items should be grouped together: "cat food" links to "cat toy", which links to "cat snacks". The dog items, candy and apples should form one group, because the users' purchases link them. The paper forms another group.
There are about 200 different products in the table, and I need to do a disjoint-set union (DSU).

In JavaScript there are several implementations of disjoint-set union (DSU); one of them is used here for the user-defined function (UDF) in BigQuery. The main idea is to use a find and a union function and to store the links in a tree represented as an array (see the link in the code comment for details).
create temp function DSU(A array<struct<a string,b string>>)
returns array<struct<a string,b string>>
language js as
"""
// https://gist.github.com/KSoto/3300322fc2fb9b270dce2bf1e3d80cf3
// Disjoint-set bigquery
class DSU {
  constructor() {
    // parents[x] < 0: x is a root and -parents[x] is the size of its set.
    // Otherwise parents[x] is the parent of x.
    this.parents = [];
  }
  find(x) {
    if (typeof this.parents[x] != "undefined") {
      if (this.parents[x] < 0) {
        return x;
      } else {
        // Path compression: point x directly at its root.
        if (this.parents[x] != x) {
          this.parents[x] = this.find(this.parents[x]);
        }
        return this.parents[x];
      }
    } else {
      // First time x is seen: it becomes its own root.
      this.parents[x] = -1;
      return x;
    }
  }
  union(x, y) {
    var xpar = this.find(x);
    var ypar = this.find(y);
    if (xpar != ypar) {
      this.parents[xpar] += this.parents[ypar];
      this.parents[ypar] = xpar;
    }
  }
  console_print() {
    // console.log(this.parents);
  }
}
var dsu = new DSU();
// Link every (a, b) pair.
for (var i in A) {
  dsu.union(A[i].a, A[i].b);
}
// For each input row, return the item (a) and the root of its group (b).
var out = [];
for (var i in A) {
  out[i] = {b: dsu.find(A[i].a), a: A[i].a};
}
return out;
""";
with #recursive
your_table as (
SELECT 1 as user, "cat food" as item
UNION ALL SELECT 1, "cat toy"
UNION ALL SELECT 2, "cat snacks"
UNION ALL SELECT 2, "cat toy"
UNION ALL SELECT 10, "dog food"
union all select 10, "dog collar"
union all select 11, "dog food"
union all select 11, "candy"
union all select 12, "candy"
union all select 12, "apples"
union all select 15, "paper"
), helper as (
select distinct a, b
from (
Select user,min(item) as b, array_agg(item) as a_list
from your_table
group by 1
), unnest(a_list) as a
)
Select * except(tmp_count),
first_value(item) over(partition by b order by tmp_count desc,b) as item_most_common
from
(
select * ,
count(item) over(partition by b,item) as tmp_count
from your_table
left join (select X.a, min(X.b) as b from (select DSU(array_agg(struct(''||a,''||b))) as X from helper),unnest(X) X group by 1 order by 1) as combinder
on ''||item=combinder.a
)
The data is in the table your_table. A helper table builds the pairs of items that the same user bought, linking each item of a user to that user's alphabetically first item. These pairs are combined into an array and passed to the UDF DSU. The function returns each item in column a and its group in column b. We want the most common item of each group to be shown as the group name, so window functions are used to determine it.
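To check the grouping logic outside BigQuery, the same idea can be run as plain JavaScript on the sample data. This is only a sketch, not the UDF above: the function name groupItems, the Map-based parent store and the input shape are assumptions made for illustration.

```javascript
// Minimal union-find over item names.
// `groupItems` and the { user, item } row shape are hypothetical.
function groupItems(purchases) {
  const parent = new Map();

  // Return the representative of x, compressing the path on the way back.
  function find(x) {
    if (!parent.has(x)) parent.set(x, x);
    const p = parent.get(x);
    if (p === x) return x;
    const root = find(p);
    parent.set(x, root);
    return root;
  }

  // Merge the sets that contain x and y.
  function union(x, y) {
    const rx = find(x);
    const ry = find(y);
    if (rx !== ry) parent.set(ry, rx);
  }

  // Link every item of a user to the first item seen for that user.
  const firstItemOfUser = new Map();
  for (const { user, item } of purchases) {
    if (!firstItemOfUser.has(user)) firstItemOfUser.set(user, item);
    union(firstItemOfUser.get(user), item);
  }

  // Collect the items under their representative.
  const groups = new Map();
  for (const item of parent.keys()) {
    const root = find(item);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(item);
  }
  return [...groups.values()].map((g) => g.sort());
}

const purchases = [
  { user: 1, item: "cat food" }, { user: 1, item: "cat toy" },
  { user: 2, item: "cat toy" }, { user: 2, item: "cat snacks" },
  { user: 10, item: "dog food" }, { user: 10, item: "dog collar" },
  { user: 11, item: "dog food" }, { user: 11, item: "candy" },
  { user: 12, item: "candy" }, { user: 12, item: "apples" },
  { user: 15, item: "paper" },
];
const groups = groupItems(purchases);
console.log(groups); // three groups: cat items, dog/candy/apples, paper
```

Linking each item only to the user's first item is enough, because union-find merges transitively; it avoids building every pair of items per user.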

Related

Query key values in a json column

I have a table "jobs" with a column called "check_list" (varchar(max)) that holds JSON values. An example value would be:
{
"items":[
{
"name":"machine 1",
"state":"",
"comment":"",
"isReleaseToProductionCheck":true,
"machine_id":10
},
{
"name":"machine 2",
"state":"",
"comment":"",
"isReleaseToProductionCheck":true,
"machine_id":12
}
]
}
Now how would I write a SQL query that returns only the rows where the column "check_list" has an item with machine_id = 12?

In the end, after some trial and error, this was the solution that worked for me. I had to add the ISJSON check because some of the older data was invalid.
WITH jobs (id, workorder, selectedMachine) AS(
SELECT
[id],
[workorder],
(
select
*
from
openjson(check_list, '$.items') with (machine_id int '$.machine_id')
where
machine_id = 12
) as selectedMachine
FROM
engineering_job_schedule
WHERE
ISJSON(check_list) > 0
)
Select
*
from
jobs
where
selectedMachine = 12
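The same two steps (skip invalid JSON, then look for an item with the wanted machine_id) can be illustrated in plain JavaScript. This is a sketch under assumptions: rowsWithMachine and the row shape are hypothetical, and the try/catch only plays the role that ISJSON plays in the query.

```javascript
// Hypothetical row shape: { id, workorder, check_list }, check_list as text.
function rowsWithMachine(rows, machineId) {
  return rows.filter((row) => {
    let parsed;
    try {
      parsed = JSON.parse(row.check_list); // invalid JSON throws
    } catch (e) {
      return false; // skip rows with broken JSON, like ISJSON(check_list) > 0
    }
    const items = parsed && Array.isArray(parsed.items) ? parsed.items : [];
    return items.some((it) => it.machine_id === machineId);
  });
}

const sampleRows = [
  { id: 1, workorder: "A", check_list: '{"items":[{"name":"machine 1","machine_id":10},{"name":"machine 2","machine_id":12}]}' },
  { id: 2, workorder: "B", check_list: '{"items":[{"name":"machine 1","machine_id":10}]}' },
  { id: 3, workorder: "C", check_list: "not json" },
];
const matching = rowsWithMachine(sampleRows, 12);
console.log(matching.map((r) => r.id));
```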

Applying BigQuery javascript UDF to groups

I am trying to apply a JavaScript user-defined function to groups. In the following code, the line --group by my_group inside the tuple CTE is commented out. I want to apply the temp function test to every my_group in test_data. The code runs while group by is commented out. If I include group by, it fails with "scalar subquery produced more than one element". What should I change so that I get an array per group (my_group)?
#standardSQL create function
CREATE TEMP FUNCTION test(a ARRAY<STRING>)
RETURNS ARRAY< STRING >
LANGUAGE js AS '''
var combine = function(a) {
var fn = function(n, src, got, all) {
if (n == 0) {
if (got.length > 0) {
all[all.length] = got;
} return;
}
for (var j = 0; j < src.length; j++) {
fn(n - 1, src.slice(j + 1), got.concat([src[j]]), all);
} return;
}
var all = [];
for (var i = 1; i < a.length; i++) {
fn(i, a, [], all);
}
all.push(a);
return all;
}
return combine(a)
''';
WITH test_data AS (
SELECT 'Shirt' item, 'Cashless' my_group UNION ALL
SELECT 'Jeans', 'Cashless' UNION ALL
SELECT 'Jeans', 'Cash' UNION ALL
SELECT 'Cap', 'Cash' UNION ALL
SELECT 'Shirt', 'Cash' UNION ALL
SELECT 'Cap', 'Cashless'
),
tuple as (
SELECT ARRAY_AGG(DISTINCT item) items
FROM test_data
--group by my_group (uncommenting it creates error)
)
select * from unnest(test((select items from tuple)))
I am looking for an output like the following:
my_group Item
Cash Shirt
Cash Jeans
Cash Cap
Cash Shirt,Jeans
Cash Shirt,Cap
Cash Jeans,Cap
Cash Shirt,Jeans,Cap
Cashless Shirt
Cashless Jeans
Cashless Cap
Cashless Shirt,Jeans
Cashless Shirt,Cap
Cashless Jeans,Cap
Cashless Shirt,Jeans,Cap

Consider below (omitting the function piece to keep answer compact enough ...)
WITH test_data AS (
SELECT 'Shirt' item, 'Cashless' my_group UNION ALL
SELECT 'Jeans', 'Cashless' UNION ALL
SELECT 'Jeans', 'Cash' UNION ALL
SELECT 'Cap', 'Cash' UNION ALL
SELECT 'Shirt', 'Cash' UNION ALL
SELECT 'Cap', 'Cashless'
), tuple as (
SELECT my_group, ARRAY_AGG(DISTINCT item) items
FROM test_data
group by my_group
)
select my_group, item
from tuple,
unnest(test(items)) item
with output
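For reference, the per-group logic can also be sketched in plain JavaScript: group the rows first, then call the combination function once per group, which is what unnest(test(items)) does per row of tuple. The names combinations and combinationsPerGroup are assumptions for this sketch, not part of the answer.

```javascript
// All non-empty combinations of `items`, smallest size first.
function combinations(items) {
  const all = [];
  function pick(start, chosen, size) {
    if (chosen.length === size) {
      all.push(chosen);
      return;
    }
    for (let j = start; j < items.length; j++) {
      pick(j + 1, chosen.concat(items[j]), size);
    }
  }
  for (let size = 1; size <= items.length; size++) {
    pick(0, [], size);
  }
  return all;
}

// One call to `combinations` per my_group, over the distinct items.
function combinationsPerGroup(rows) {
  const byGroup = new Map();
  for (const { my_group, item } of rows) {
    if (!byGroup.has(my_group)) byGroup.set(my_group, new Set());
    byGroup.get(my_group).add(item);
  }
  const out = [];
  for (const [my_group, items] of byGroup) {
    for (const combo of combinations([...items])) {
      out.push({ my_group, item: combo.join(",") });
    }
  }
  return out;
}

const testData = [
  { item: "Shirt", my_group: "Cashless" },
  { item: "Jeans", my_group: "Cashless" },
  { item: "Jeans", my_group: "Cash" },
  { item: "Cap", my_group: "Cash" },
  { item: "Shirt", my_group: "Cash" },
  { item: "Cap", my_group: "Cashless" },
];
const combos = combinationsPerGroup(testData);
console.log(combos.length); // 7 combinations for each of the 2 groups
```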

What SQL query is the equivalent to this function for retrieving a list of unique items

I'm trying to change this function into an SQL query (using Room). The goal is to return a list of items with no duplicates.
A duplicate is defined by either the item.id or any combination of linked ids being present.
fun removeDuplicates(items: List<Table>?) : List<Table>?{
val returnItems = ArrayList<Table>()
items?.distinctBy { _item ->
_item.id
}?.forEach { item ->
val LID1 = item.linked_id_1
val LID2 = item.linked_id_2
val isFoundReturnItem = returnItems.firstOrNull {
(it.linked_id_1 == LID2 && it.linked_id_2 == LID1) ||
(it.linked_id_1 == LID1 && it.linked_id_2 == LID2)
}
//only add to our new list if not already present
if(isFoundReturnItem == null)
returnItems.add(item)
}
return returnItems
}
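One way to see what a query would have to express is to treat the two linked ids as an unordered pair, so that (1, 2) and (2, 1) produce the same key. The JavaScript sketch below mirrors the Kotlin function under that reading. It assumes numeric linked ids, and the "lo|hi" key format is purely illustrative.

```javascript
// Keep the first item per id, then the first item per unordered linked-id pair.
// Assumes linked_id_1 and linked_id_2 are numbers.
function removeDuplicates(items) {
  const seenIds = new Set();
  const seenPairs = new Set();
  const result = [];
  for (const item of items) {
    if (seenIds.has(item.id)) continue; // same as distinctBy { it.id }
    seenIds.add(item.id);
    const lo = Math.min(item.linked_id_1, item.linked_id_2);
    const hi = Math.max(item.linked_id_1, item.linked_id_2);
    const key = `${lo}|${hi}`; // order-independent key for the pair
    if (seenPairs.has(key)) continue;
    seenPairs.add(key);
    result.push(item);
  }
  return result;
}

const unique = removeDuplicates([
  { id: 1, linked_id_1: 10, linked_id_2: 20 },
  { id: 2, linked_id_1: 20, linked_id_2: 10 }, // same pair, reversed
  { id: 1, linked_id_1: 30, linked_id_2: 40 }, // repeated id
  { id: 3, linked_id_1: 10, linked_id_2: 30 },
]);
console.log(unique.map((i) => i.id));
```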

If I read your question right, here is the answer for Microsoft SQL Server. Structure:
Select Distinct Field1, Field2, ...
From Table
Where Field1 between 'a' and 'm'
Your script: the DISTINCT keyword returns distinct rows.
Select Distinct Item
From YourTableName
You can also use GROUP BY, which allows aggregations over the distinct values:
Select Field1, Field2 = max(Field2), ...
From Table
Where Field1 between 'a' and 'm'
Group by Field1

How can convert SQL to lambda or LINQ

How can I convert below SQL to lambda or LINQ?
with cte
as (select * from test1
union all
select * from test2)
select * from cte
union all
select sum(columnA),sum(columnB),sum(columnC) from cte

In LINQ, UNION ALL is .Concat(), so:
var cte = test1.Concat(test2);
var sums = new MyModel
{
columnA = cte.Sum(c => c.columnA),
columnB = cte.Sum(c => c.columnB),
columnC = cte.Sum(c => c.columnC),
};
return cte.Concat(Enumerable.Repeat(sums, 1));
Note that test1 and test2 must both be of type MyModel, and MyModel must contain only columnA, columnB and columnC.

I put two tables together in one DataGridView, but in the last row of the DataGridView I need the total for both tables combined. I can produce one total row for one table and another total row for the other table, but I don't need that. How can I get a single row with the total of both tables?
DataContex db = new DataContex();
var query = (
from v1 in db.View1
where v1.Date >= dpDate.Value && v1.Date <= dpDate1.Value
select new
{
v1.Name,
v1.Date,
v1.Quality,
v1.Rate,
v1.Total
}
).Concat
(
from v2 in db.View2
where v2.Date >= dpDate.Value && v2.Date <= dpDate1.Value
select new
{
v2.Name,
v2.Date,
v2.Quality,
v2.Rate,
v2.Total
}
).Concat
(from View2 in
(from v2 in db.View2
where v2.Date >= dpDate.Value && v2.Date <= dpDate1.Value
select new
{
v2.Name,
v2.Date,
v2.Quality,
v2.Rate,
v2.Total
})
group v2 by new { v2.NRFA } into g
select new
{
Name = "Total:",
Date = dpDate1.Value,
Quality = (decimal?)g.Sum(p => p.Quality),
Rate = (decimal?)g.Sum(p => p.Rate),
Total = (decimal?)g.Sum(p => p.Total)
}
);

Safe casting a regexp match from REGEXP_REPLACE

I have a table with some code points (e.g. &#38;) which I want to strip out from a text value in BigQuery.
My strategy is to use a regexp replace that swaps each numeric code for the character it represents.
If I try:
WITH items as (SELECT "Test &#38; " as item)
SELECT
CODE_POINTS_TO_STRING([SAFE_CAST(REGEXP_EXTRACT(item, r"&#([0-9]{2})") AS INT64)]) as test_replace
FROM items
This will produce the output that I want for the entry
[
{
"test_replace": "&"
}
]
If I try:
WITH items as (SELECT "Test &#38; " as item)
SELECT
REGEXP_REPLACE(
item,
r"&#([0-9]{2});",
CODE_POINTS_TO_STRING([SAFE_CAST("\\1" as INT64)])
) as full_replace
FROM items
This will produce a null output
[
{
"full_replace": null
}
]
However if I hard code the value in:
WITH items as (SELECT "Test &#38; " as item)
SELECT
REGEXP_REPLACE(
item,
r"&#([0-9]{2});",
CODE_POINTS_TO_STRING([SAFE_CAST("38" as INT64)])
) as full_replace
FROM items
This works.
[
{
"full_replace": "Test & "
}
]
I know that the regexp is matching correctly, because if I try:
WITH items as (SELECT "Test &#38; " as item)
SELECT
REGEXP_REPLACE(
item,
r"&#([0-9]{2});",
CONCAT("\\1", "test")
) as part_replace
FROM ITEMS
This will return:
[
{
"part_replace": "Test 38test "
}
]
My question is therefore: how do I get the SAFE_CAST() function to evaluate the regexp match? (It seems to be evaluating the string literal.)

Try the approach in the example below:
#standardSQL
CREATE TEMP FUNCTION multiReplace(item STRING, arr ARRAY<STRUCT<x STRING, y STRING>>)
RETURNS STRING
LANGUAGE js AS """
for (var i = 0; i < arr.length; i++) {
item = item.replace(arr[i].x, arr[i].y);
}
return item;
""";
WITH items AS (
SELECT "Test &#38; abc &#39; xyz" AS item UNION ALL
SELECT "abc xyz"
)
SELECT item, multiReplace(item, points) full_replace
FROM (
SELECT
item,
ARRAY(
SELECT AS STRUCT val, CODE_POINTS_TO_STRING([SAFE_CAST(SUBSTR(val, -3, 2) AS INT64)]) point
FROM UNNEST(REGEXP_EXTRACT_ALL(item, r'(&#[0-9]{2};)')) val
) points
FROM items
)
with result
Row item full_replace
1 Test &#38; abc &#39; xyz Test & abc ' xyz
2 abc xyz abc xyz
Option 2
The simplest way to approach the above, though, is:
#standardSQL
CREATE TEMP FUNCTION multiReplace(item STRING)
RETURNS STRING
LANGUAGE js AS """
var decodeHtmlEntity = function(str) {
return str.replace(/&#([0-9]{2});/g, function(match, dec) {
return String.fromCharCode(dec);
});
};
return decodeHtmlEntity(item);
""";
WITH items AS (
SELECT "Test &#38; abc &#39; xyz" AS item UNION ALL
SELECT "abc xyz"
)
SELECT item, multiReplace(item) full_replace
FROM items
with the same output
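The null in the question comes from evaluation order: SAFE_CAST("\\1" AS INT64) is computed on the literal text before REGEXP_REPLACE ever sees a match, so it yields NULL and the whole replacement becomes NULL. The same pitfall can be reproduced in plain JavaScript, where "$1" plays the role of "\\1". This is an illustrative sketch only, and the variable names are assumptions.

```javascript
const input = "Test &#38; abc &#39; xyz";
const pattern = /&#([0-9]{2});/g;

// Eager: the replacement string is computed once, from the literal text "$1",
// before any match exists. parseInt("$1", 10) is NaN, so no digits are used.
const eager = input.replace(pattern, String.fromCharCode(parseInt("$1", 10)));

// Deferred: the callback runs once per match and receives the captured digits.
const deferred = input.replace(pattern, (match, digits) =>
  String.fromCharCode(parseInt(digits, 10))
);

console.log(JSON.stringify(eager));
console.log(deferred); // Test & abc ' xyz
```

The callback form is what Option 2 relies on: the conversion from digits to a character has to happen per match, not once up front.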
