Split multi-valued cells in more than one column into rows (OpenRefine)

I have been cleaning a table in OpenRefine. It now looks like this:
REF               Handle    Size     Price
2002, 2003        t-shirt1  M, L     23
3001, 3002, 3003  t-shirt2  S, M, L  24
I need to split those multivalued cells in REF and Size so that I get:
REF   Handle    Size  Price
2002  t-shirt1  M     23
2003  t-shirt1  L     23
3001  t-shirt2  S     24
3002  t-shirt2  M     24
3003  t-shirt2  L     24
Is it possible to do this in OpenRefine? The "Split multi-valued cells..." command only handles one column.
Thank you,
Ana Rita

Yes, it's possible:
1. Split the first column (REF) using ", " as the separator.
2. Move column 2 (Handle) to the first position.
3. Display your project as records (not rows).
4. Split column 3 (Size) using ", " as the separator.
5. Fill down columns 4 (Price) and 2 (Handle).
6. Reorder the columns.
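Here's my recipe as OpenRefine operation history JSON. It assumes the default column names Column 1 to Column 4; paste it into Undo / Redo > Apply... to replay it: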
[
{
"op": "core/multivalued-cell-split",
"description": "Split multi-valued cells in column Column 1",
"columnName": "Column 1",
"keyColumnName": "Column 1",
"separator": ", ",
"mode": "plain"
},
{
"op": "core/column-move",
"description": "Move column Column 2 to position 0",
"columnName": "Column 2",
"index": 0
},
{
"op": "core/multivalued-cell-split",
"description": "Split multi-valued cells in column Column 3",
"columnName": "Column 3",
"keyColumnName": "Column 2",
"separator": ", ",
"mode": "plain"
},
{
"op": "core/fill-down",
"description": "Fill down cells in column Column 4",
"engineConfig": {
"facets": [],
"mode": "record-based"
},
"columnName": "Column 4"
},
{
"op": "core/fill-down",
"description": "Fill down cells in column Column 2",
"engineConfig": {
"facets": [],
"mode": "record-based"
},
"columnName": "Column 2"
},
{
"op": "core/column-reorder",
"description": "Reorder columns",
"columnNames": [
"Column 1",
"Column 2",
"Column 3",
"Column 4"
]
}
]
Hervé
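For comparison, the same reshaping can be sketched in plain Python outside OpenRefine. This is a minimal sketch, assuming REF and Size always hold the same number of ", "-separated values in a given row (as in the example), so the values can be paired by position; `split_parallel` is a hypothetical helper, not an OpenRefine function. Copying the other cells into every new row plays the role of the fill-down step.

```python
def split_parallel(rows, columns, sep=", "):
    """Split several multi-valued columns in lockstep into one row per value.

    Assumes every column in `columns` holds the same number of values per row;
    all other columns are copied into each new row (the "fill down" step).
    """
    out = []
    for row in rows:
        parts = {col: row[col].split(sep) for col in columns}
        counts = {len(vals) for vals in parts.values()}
        if len(counts) != 1:
            raise ValueError(f"mismatched value counts in row: {row}")
        for i in range(counts.pop()):
            new_row = dict(row)  # copies Handle, Price, etc.
            for col in columns:
                new_row[col] = parts[col][i]
            out.append(new_row)
    return out


table = [
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"},
    {"REF": "3001, 3002, 3003", "Handle": "t-shirt2", "Size": "S, M, L", "Price": "24"},
]

for r in split_parallel(table, ["REF", "Size"]):
    print(r["REF"], r["Handle"], r["Size"], r["Price"])
```

Raising an error on mismatched counts is a deliberate choice: silently pairing "2002, 2003" with "S, M, L" would produce wrong rows.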

Just found a nice, free OpenRefine plugin that offers "Unpaired pivot":
VIB-Bits plugin
From their documentation:
3.2.1 Unpaired pivot...
Unpaired pivot is the transformation of data that is organized in rows to a representation of that
data in separate columns. A simple example would be transforming
Category  Value
a         1
a         2
b         3
c         2
into
Value a  Value b  Value c
1        3        2
2
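The unpaired pivot in the example can be sketched in a few lines of Python to make the behaviour explicit. This is a minimal sketch, assuming the input is a list of (category, value) pairs: values are grouped per category in their original order, and shorter columns are padded with blanks. The function name `unpaired_pivot` is hypothetical and not part of the VIB-Bits plugin.

```python
from itertools import zip_longest


def unpaired_pivot(pairs, prefix="Value "):
    """Turn (category, value) rows into one column per category."""
    columns = {}  # dicts keep insertion order, so categories stay in first-seen order
    for category, value in pairs:
        columns.setdefault(category, []).append(value)
    header = [prefix + c for c in columns]
    # zip_longest pads the shorter columns with "" so every row has equal width
    rows = [list(r) for r in zip_longest(*columns.values(), fillvalue="")]
    return header, rows


header, rows = unpaired_pivot([("a", 1), ("a", 2), ("b", 3), ("c", 2)])
print(header)  # ['Value a', 'Value b', 'Value c']
for row in rows:
    print(row)
```

Values in the same output row are not related to each other ("unpaired"); they simply share a position within their category.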

Related

Conditional nested data in Strapi

I'm trying to set up a Strapi CMS that will allow a user to create an entry with a set of conditional tags.
The requirement is that they can create a product which has tags. Those tags are to be selected from a pre-defined nested list that looks something like this:
- Category 1
  - Item 1
  - Item 2
  - Item 3
- Category 2
  - Item 4
  - Item 5
  - Item 6
- Category 3
  - Item 7
  - Item 8
  - Item 9
Each product can have multiple categories and each category can have multiple items. They need to select at least one category and for every category they select, they need to select at least one item.
So, for example, they could create a product which had a selection that looked like this:
- Category 2
  - Item 4
- Category 3
  - Item 8
  - Item 9
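The selection rule described above can be pinned down as a small check, which is the kind of validation a custom controller or lifecycle hook would have to perform if the schema alone can't enforce it. This is a hedged Python sketch of the rule only, not Strapi code; the data shapes (a hypothetical `CATALOG` dict mapping each category to its allowed items, and a selection dict mapping each chosen category to its chosen items) are assumptions for illustration.

```python
# Hypothetical catalog: each category and the only items it may contain.
CATALOG = {
    "Category 1": {"Item 1", "Item 2", "Item 3"},
    "Category 2": {"Item 4", "Item 5", "Item 6"},
    "Category 3": {"Item 7", "Item 8", "Item 9"},
}


def selection_errors(selection, catalog=CATALOG):
    """Return a list of rule violations for a {category: [items]} selection."""
    errors = []
    if not selection:
        errors.append("select at least one category")
    for category, items in selection.items():
        if category not in catalog:
            errors.append(f"unknown category: {category}")
            continue
        if not items:
            errors.append(f"{category}: select at least one item")
        for item in items:
            # The key constraint: an item must belong to its parent category.
            if item not in catalog[category]:
                errors.append(f"{category}: {item} is not one of its items")
    return errors


print(selection_errors({"Category 2": ["Item 4"], "Category 3": ["Item 8", "Item 9"]}))  # []
print(selection_errors({"Category 2": ["Item 8"]}))
```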
I attempted to create a component, which sort of worked, but the user could select any item from any category instead of being limited to the items of the parent category.
Is this something Strapi is capable of? If so, how do I go about it?
For reference, here are the schemas I set up for the category and items collections:
{
"kind": "collectionType",
"collectionName": "category",
"info": {
"singularName": "categories",
"pluralName": "category",
"displayName": "categories",
"description": ""
},
"options": {
"draftAndPublish": true
},
"pluginOptions": {},
"attributes": {
"Name": {
"type": "string",
"required": true
},
"categories": {
"type": "relation",
"relation": "oneToMany",
"target": "api::category.category",
"mappedBy": "category"
}
}
}
{
"kind": "collectionType",
"collectionName": "items",
"info": {
"singularName": "item",
"pluralName": "items",
"displayName": "items",
"description": ""
},
"options": {
"draftAndPublish": true
},
"pluginOptions": {},
"attributes": {
"Name": {
"type": "string",
"required": true
},
"domain": {
"type": "relation",
"relation": "manyToOne",
"target": "api::categories.categories",
"inversedBy": "items"
}
}
}
and here is the component I made:
{
"collectionName": "components_tags_tag_selectors",
"info": {
"displayName": "Tag Selector",
"icon": "tags",
"description": ""
},
"options": {},
"attributes": {
"domain": {
"type": "relation",
"relation": "oneToOne",
"target": "api::categories.categories"
},
"categories": {
"type": "relation",
"relation": "oneToMany",
"target": "api::item.item"
}
}
}

BigQuery select rows with two (or more / less) matches in a repeated field

I have a schema that looks like this:
[
{
"name": "name",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "frm",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "c",
"type": "STRING",
"mode": "REQUIRED"
},
{
"name": "n",
"type": "STRING",
"mode": "REQUIRED"
}
]
},
{
"name": "",
"type": "STRING",
"mode": "NULLABLE"
}
]
With a sample record that looks like this:
I am trying to write a query that selects this row when frm contains one element with c = 'X' and another element with c = 'Z'. Only when both conditions are true do I want to select the name of the parent row. I have no idea how to achieve this. Any suggestions?
For example, this works, but it unnests frm twice; I guess there must be a more efficient way:
SELECT name FROM `t2`
WHERE 'X' IN (SELECT f.c FROM UNNEST(frm) AS f)
AND 'Z' IN (SELECT f.c FROM UNNEST(frm) AS f)
Consider the approach below:
select name
from your_table t
where 2 = (
select count(distinct c)
from t.frm
where c in ('X', 'Z')
)
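The subquery counts how many of the wanted values appear among a row's frm.c values and keeps the row only when all of them do; DISTINCT ensures that two 'X' elements can't be counted as two matches. The same logic as a short Python sketch, assuming each row is a dict with a name and a list of frm structs (the sample rows and the helper name `rows_with_all` are made up for illustration):

```python
def rows_with_all(rows, wanted=("X", "Z")):
    """Yield names of rows whose frm array contains every value in `wanted`."""
    wanted = set(wanted)
    for row in rows:
        # Equivalent of: select count(distinct c) from t.frm where c in (...)
        found = {f["c"] for f in row["frm"] if f["c"] in wanted}
        if len(found) == len(wanted):
            yield row["name"]


sample = [
    {"name": "r1", "frm": [{"c": "X", "n": "1"}, {"c": "Z", "n": "2"}]},
    {"name": "r2", "frm": [{"c": "X", "n": "1"}, {"c": "X", "n": "3"}]},
    {"name": "r3", "frm": [{"c": "Y", "n": "1"}, {"c": "Z", "n": "2"}, {"c": "X", "n": "4"}]},
]
print(list(rows_with_all(sample)))  # ['r1', 'r3']
```

The pattern generalizes: to require n values, list them in the IN clause and compare the distinct count against n.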

Filtering out objects from multiple arrays in a JSONB column

I have a JSON structure with two arrays saved in a JSONB column. Slightly simplified, it looks like this:
{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
},
{
"itemId": "124",
"price": "9.00"
},
{
"itemId": "125",
"price": "8.00"
}
],
"groups": [
{
"groupId": "A",
"discount": "20",
"discountId": "1"
},
{
"groupId": "B",
"discount": "30",
"discountId": "2"
},
{
"groupId": "B",
"discount": "20",
"discountId": "3"
},
{
"groupId": "C",
"discount": "40",
"discountId": "4"
}
]
}
Schema:
CREATE TABLE campaign
(
id TEXT PRIMARY KEY,
data JSONB
);
Since each row (data column) can be fairly large, I'm trying to extract only the matching item objects and group objects from the items and groups arrays.
My current query is:
SELECT * FROM campaign
WHERE
(data -> 'items' @> '[{"itemId": "123"}]') OR
(data -> 'groups' @> '[{"groupId": "B"}]')
This returns rows containing either the matching group or the matching item. However, depending on the row, the data column can be a fairly large JSON object (there may be hundreds of objects in items and tens in groups, and I've omitted several keys for brevity in this example), and returning it whole hurts query performance. I've added GIN indexes on the items and groups arrays, so missing indexes are not why it's slow.
How can I filter out the items and groups arrays to only contain matching elements?
For the example row above, which matches, I'd like the result to be something like this. The matching item and group could be in different columns from the rest of the data column; they don't have to be returned in a single JSON object with two arrays like this, but I would prefer that if it doesn't affect performance or lead to a really hairy query:
{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
}
],
"groups": [
{
"groupId": "B",
"discount": "20",
"discountId": "3"
}
]
}
So far I've managed to unwrap and match an object in the items array with the query below. It removes the items array from the data column and returns the matching item object in a separate column, but I'm struggling to combine this with matches in the groups array.
SELECT data - 'items', o.obj
FROM campaign c
CROSS JOIN LATERAL jsonb_array_elements(c.data #> '{items}') o(obj)
WHERE o.obj ->> 'itemId' = '124'
How can I filter both arrays in one query?
Bonus question: for the groups array, I'd also like to return only the matching object with the lowest discount value, if possible. Otherwise, the result would need to be an array of matching group objects rather than a single group.
Related questions: How to filter jsonb array elements and How to join jsonb array elements in Postgres?
If your Postgres version is 12 or later, you can use the SQL/JSON path language and its functions. The query below returns the expected result, containing only the subset of items and groups that match the given criteria. You can then wrap this query in an SQL function so that the search criteria become input parameters.
SELECT jsonb_set(jsonb_set( data
, '{items}'
, jsonb_path_query_array(data, '$.items[*] ? (@.itemId == "123" && @.price == "10.00")'))
, '{groups}'
, jsonb_path_query_array(data, '$.groups[*] ? (@.groupId == "B" && @.discount == "20" && @.discountId == "3")'))
FROM (SELECT
'{
"prop1": "abc",
"prop2": "xyz",
"items": [
{
"itemId": "123",
"price": "10.00"
},
{
"itemId": "124",
"price": "9.00"
},
{
"itemId": "125",
"price": "8.00"
}
],
"groups": [
{
"groupId": "A",
"discount": "20",
"discountId": "1"
},
{
"groupId": "B",
"discount": "30",
"discountId": "2"
},
{
"groupId": "B",
"discount": "20",
"discountId": "3"
},
{
"groupId": "C",
"discount": "40",
"discountId": "4"
}
]
}' :: jsonb) AS d(data)
WHERE jsonb_path_exists(data, '$.items[*] ? (@.itemId == "123" && @.price == "10.00")')
AND jsonb_path_exists(data, '$.groups[*] ? (@.groupId == "B" && @.discount == "20" && @.discountId == "3")')
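To see what the two path filters keep, and to cover the bonus question (keeping only the group with the lowest discount), here is a Python sketch of the same filtering over the sample document. It illustrates the logic only and is not a replacement for the SQL; it assumes discount is always a numeric string, as in the sample, and converts it with float before comparing, because comparing the strings directly would rank "100" below "20". The helper name `filter_doc` is hypothetical. In SQL the lowest-discount pick would likewise need a numeric cast such as (g ->> 'discount')::numeric when ordering.

```python
import json

doc = json.loads("""
{
  "prop1": "abc", "prop2": "xyz",
  "items": [
    {"itemId": "123", "price": "10.00"},
    {"itemId": "124", "price": "9.00"},
    {"itemId": "125", "price": "8.00"}
  ],
  "groups": [
    {"groupId": "A", "discount": "20", "discountId": "1"},
    {"groupId": "B", "discount": "30", "discountId": "2"},
    {"groupId": "B", "discount": "20", "discountId": "3"},
    {"groupId": "C", "discount": "40", "discountId": "4"}
  ]
}
""")


def filter_doc(doc, item_id, group_id):
    """Keep only matching items, plus the matching group with the lowest discount."""
    items = [i for i in doc["items"] if i["itemId"] == item_id]
    groups = [g for g in doc["groups"] if g["groupId"] == group_id]
    result = dict(doc, items=items)  # other keys (prop1, prop2, ...) are kept
    # Numeric comparison, because the discounts are stored as strings.
    result["groups"] = [min(groups, key=lambda g: float(g["discount"]))] if groups else []
    return result


print(json.dumps(filter_doc(doc, "123", "B")))
```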

How to filter Cosmos DB data based on the value of an element in an array using the SQL API

I have a Cosmos DB collection with the data below in it.
I need to retrieve only the event named ABC and its value, using a SQL query.
[
{
"ID": "01XXXXX",
"EVENTS": [
{
"Name": "ABC",
"Value": 0
},
{
"Name": "XYZ",
"Value": 4
},
{
"Name": "PQR",
"Value": 5
}
]
},
{
"ID": "02XXXXX",
"EVENTS": [
{
"Name": "ABC",
"Value": 1
},
{
"Name": "XYZ",
"Value": 2
},
{
"Name": "PQR",
"Value": 3
}
]
}
]
I have tried the code below, but it doesn't work because EVENTS is an array.
SELECT * FROM c where c.EVENTS.Name = 'ABC'
Is there any way to filter the data down to only the event named ABC using SQL?
Try using a JOIN:
SELECT c FROM c
join l in c.EVENTS
where l.Name = 'ABC'
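In the Cosmos DB SQL API, a JOIN is a self-join of each document with the elements of one of its own arrays, so the WHERE clause sees one (document, event) pair at a time. As a consequence, SELECT c returns the whole document, and would return it once per matching element if ABC appeared twice. The Python sketch below mimics that pairing on the sample data and projects only the ID and the ABC event's value, which is closer to what the question asks for; the projection and the helper name `join_filter` are illustrations, not part of the answer.

```python
docs = [
    {"ID": "01XXXXX", "EVENTS": [{"Name": "ABC", "Value": 0}, {"Name": "XYZ", "Value": 4}, {"Name": "PQR", "Value": 5}]},
    {"ID": "02XXXXX", "EVENTS": [{"Name": "ABC", "Value": 1}, {"Name": "XYZ", "Value": 2}, {"Name": "PQR", "Value": 3}]},
]


def join_filter(docs, name):
    """Mimic `FROM c JOIN l IN c.EVENTS WHERE l.Name = name`, projecting ID and Value."""
    for c in docs:
        for l in c["EVENTS"]:  # one (c, l) pair per array element
            if l["Name"] == name:
                yield c["ID"], l["Value"]


print(list(join_filter(docs, "ABC")))  # [('01XXXXX', 0), ('02XXXXX', 1)]
```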

Array of JSON objects

Sample values of the items field:
[
{
"id": "901651",
"supplier_id": "180",
"price": "18.99",
"product_id": "books",
"name": "bookmate",
"quantity": "1"
},
{
"id": "1423326",
"supplier_id": "180",
"price": "53.99",
"product_id": "books",
"name": "classmate",
"quantity": "5"
}
]
[{"id":"3811088","supplier_id":"2609","price":"22.99","product_id":"book","name":"classmate","quantity":"10"}]
I store the details of my purchased books as an array of JSON objects in a field named items in the table purchase_list. Each array corresponds to one order, and it may contain one or more entries. There are many orders like this. How can I get each type of book and the total number purchased, using only a PostgreSQL query, to generate a Jasper report? For example: classmate: 15, bookmate: 1.
You can unnest the array and aggregate it:
t=# with c(j) as (values('[
{
"id": "901651",
"supplier_id": "180",
"price": "18.99",
"product_id": "books",
"name": "bookmate",
"quantity": "1"
},
{
"id": "1423326",
"supplier_id": "180",
"price": "53.99",
"product_id": "books",
"name": "classmate",
"quantity": "5"
}
,{"id":"3811088","supplier_id":"2609","price":"22.99","product_id":"book","name":"classmate","quantity":"10"}]'::jsonb))
, agg as (select jsonb_array_elements(j) jb from c)
, mid as (select format('"%s":"%s"',jb->>'name',sum((jb->>'quantity')::int)) from agg group by jb->>'name')
select format('{%s}',string_agg(format,','))::jsonb from mid;
format
--------------------------------------
{"bookmate": "1", "classmate": "15"}
(1 row)
It looks ugly, but it gives the idea.
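The steps of that query (unnest every array, group by name, sum quantity after casting it from text to an integer) can be sketched in Python over several rows of the items field, one JSON array per order, using the sample values from the question with some fields omitted. This only illustrates the aggregation. In PostgreSQL itself, jsonb_object_agg over the grouped name and total is an alternative to assembling the JSON text with format and string_agg.

```python
import json
from collections import Counter

# One string per row of purchase_list.items (sample values; other fields omitted).
rows = [
    '[{"id": "901651", "name": "bookmate", "quantity": "1"},'
    ' {"id": "1423326", "name": "classmate", "quantity": "5"}]',
    '[{"id": "3811088", "name": "classmate", "quantity": "10"}]',
]

totals = Counter()
for raw in rows:
    for item in json.loads(raw):  # unnest the array
        totals[item["name"]] += int(item["quantity"])  # quantity is stored as a string

print(dict(totals))  # {'bookmate': 1, 'classmate': 15}
```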