How can I join two datasets using a key in OpenRefine, with the secondary table having more than one value? - openrefine

I have a dataset X like this:
Code | Name
------------
123 | AAA
456 | BBB
And the other Y like this:
Code | Level
------------
123 | A
123 | B
456 | B
456 | C
I want to join them using OpenRefine to something like this:
Code | Name | Level A | Level B | Level C
------------------------------------------
123 | AAA | value | value | -
456 | BBB | - | value | value
When I try to add a column using cell.cross() from 'X.Code' it only gets the value from the first appearance of 'X.Code' in 'Y'.
cell.cross("Y", "Code")[0].cells["Rede"].value[0]
How can I get to this desired output, using GREL?

You first need to reshape project Y so that it has one column per Level, as in the example below. Use Transpose -> Columnize by key/value columns:
Code | Level A | Level B | Level C
------------------------------------------
123 | value | value | -
456 | - | value | value
Then you can use the cell.cross function once per column to import the data into project X. For example, for the first column: cell.cross("Y", "Code")[0].cells["Level A"].value
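To make the two steps in this answer concrete, here is a minimal sketch of the same logic in plain Python rather than GREL. It is only an illustration of the reshape-then-lookup idea, not what OpenRefine does internally; the variable names are made up for the example, and "value" / "-" are the placeholders used in the question.

```python
# Project Y: one row per (Code, Level) pair, as in the question.
y_rows = [("123", "A"), ("123", "B"), ("456", "B"), ("456", "C")]
# Project X: one row per Code.
x_rows = [("123", "AAA"), ("456", "BBB")]

# Step 1: "columnize" Y by collecting the set of levels seen for each code.
levels_by_code = {}
for code, level in y_rows:
    levels_by_code.setdefault(code, set()).add(level)

# Every distinct level becomes one output column, in sorted order.
all_levels = sorted({level for _, level in y_rows})

# Step 2: look up each X row by Code, once per level column.
joined = []
for code, name in x_rows:
    present = levels_by_code.get(code, set())
    flags = ["value" if level in present else "-" for level in all_levels]
    joined.append([code, name] + flags)

header = ["Code", "Name"] + ["Level " + level for level in all_levels]
```

The key point is that the lookup only becomes a simple one-row match after Y has been reshaped to one row per Code.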

Related

Postgres: How do I count occurrences of each enum value when they exist in columns as an array?

I have an enum State which can contain values like CA, NY, etc.
If I have a table Users with a column states that contains an array of State values, for example {CA, NY}, how can I write a query to count the users grouped by each State value? For {CA, NY}, that should count 1 for CA and 1 for NY.
So if I had records like:
| id | states |
| -- | ------- |
| 1 | {CA,NY} |
| 2 | {CA} |
| 3 | {NV,CA} |
I would expect a query to output:
| State | count |
| ----- | ----- |
| CA | 3 |
| NV | 1 |
| NY | 1 |
The first piece of advice is to normalise your data. You are breaking First Normal Form by holding multiple pieces of information in a single column.
Assuming you can't change that, you will need to split the array into one row per value (in Postgres, unnest() does this), and you can then COUNT() the rows and GROUP BY the value.
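The split-then-count idea can be sketched in a few lines of Python. This is only an illustration of the logic on the sample rows from the question, not a Postgres query; the variable names are hypothetical.

```python
from collections import Counter

# Sample rows from the question: (id, states array).
users = [(1, ["CA", "NY"]), (2, ["CA"]), (3, ["NV", "CA"])]

# Split step: one (id, state) row per array element.
unnested = [(user_id, state) for user_id, states in users for state in states]

# Count step: group the split rows by state and count them.
counts = Counter(state for _, state in unnested)

# Sort by state name to match the layout of the expected output.
result = sorted(counts.items())
```

On the sample data this gives CA 3, NV 1 and NY 1, which matches the output the question expects.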

DAX calculated column using IF to evaluate value from another table

I'm using SSAS Tabular, where I have two tables, Position and Transaction, with an active 1:n relationship on PositionID.
Position
+------------+------+--------+
| PositionID | Type | Source |
+------------+------+--------+
| C1000 | A | 1 |
+------------+------+--------+
| C1200 | B | 2 |
+------------+------+--------+
| C1400 | C | 1 |
+------------+------+--------+
Transaction
+---------+------------+--------+
| TransID | PositionID | Amount |
+---------+------------+--------+
| 1 | C1000 | 150 |
+---------+------------+--------+
| 2 | C1000 | 200 |
+---------+------------+--------+
| 3 | C1400 | 350 |
+---------+------------+--------+
I want to create a calculated column on table Transaction which has the following logic:
IF Position[Type]="A" AND Position[Source]<>1 THEN Transaction[Amount] * -1 ELSE Transaction[Amount]
I've tried using the RELATED function in DAX but it's not detecting the related Position table; when I type it manually it returns the error "cannot find table":
=IF(RELATED(Position[Type]) = 'A' && RELATED(Position[Source]) <> 1;-1*Transaction[Amount];Transaction[Amount])
I have no duplicates on the table which is on the 1 side of the 1:n relationship. Should I try a different DAX function?
I tried using LOOKUPVALUE, and so far it looks good.
=IF(LOOKUPVALUE(Position[Type];Position[PositionID];Transaction[PositionID])="A"&&LOOKUPVALUE(Position[Source];Position[PositionID];Transaction[PositionID])<>"1";-1*Transaction[Amount];Transaction[Amount])
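For readers who want to check the intended logic outside DAX, here is a minimal Python sketch of the same lookup-then-flip rule on the sample tables. It assumes Source is stored as a number (the LOOKUPVALUE formula above compares it with the text "1", so the real column type may differ), and the names positions, transactions and signed_amount are made up for the example.

```python
# Position table keyed by PositionID (the "1" side of the relationship).
positions = {
    "C1000": {"Type": "A", "Source": 1},
    "C1200": {"Type": "B", "Source": 2},
    "C1400": {"Type": "C", "Source": 1},
}

# Transaction table (the "n" side).
transactions = [
    {"TransID": 1, "PositionID": "C1000", "Amount": 150},
    {"TransID": 2, "PositionID": "C1000", "Amount": 200},
    {"TransID": 3, "PositionID": "C1400", "Amount": 350},
]

def signed_amount(txn):
    """Flip the sign only when the related position is Type A with Source other than 1."""
    pos = positions[txn["PositionID"]]
    if pos["Type"] == "A" and pos["Source"] != 1:
        return -1 * txn["Amount"]
    return txn["Amount"]

# The calculated column: one value per transaction row.
signed = [signed_amount(txn) for txn in transactions]
```

With the sample data no row meets both conditions, so every amount keeps its sign; a Type A position with any other Source would be flipped.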

Parsing a column containing integers

I am writing a query that pulls 50 fields from 12 sub-queries in the FROM clause. Each sub-query is left-joined on two fields: the Item SKU and the Brand ID. However, there is one table where the Brand ID is concatenated in a comma-delimited column.
My problem is that I am having trouble parsing this column so that I can use it as a foreign key to join to the other sub-queries. I tried setting the column equal to a variable and then using the String_Split function, but was getting the error 'No column selected for column 1' in the query. Does anyone have suggestions for how to parse this data into new rows, so that each SKU with multiple brands associated with it gets one row per SKU and Brand? I have added a screenshot of the data that needs to be parsed. Thanks!
Isn't this example on the MSSQL docs exactly what you're trying to do?
+-----------+--------------------+----------------------------+
| ProductId | Name | Tags |
+-----------+--------------------+----------------------------+
| 1 | Full-Finger Gloves | clothing,road,touring,bike |
| 2 | LL Headset | bike |
| 3 | HL Mountain Frame | bike,mountain |
+-----------+--------------------+----------------------------+
gets transformed into (note the change in the column name!)
+-----------+--------------------+----------+
| ProductId | Name | value |
+-----------+--------------------+----------+
| 1 | Full-Finger Gloves | clothing |
| 1 | Full-Finger Gloves | road |
| 1 | Full-Finger Gloves | touring |
| 1 | Full-Finger Gloves | bike |
| 2 | LL Headset | bike |
| 3 | HL Mountain Frame | bike |
| 3 | HL Mountain Frame | mountain |
+-----------+--------------------+----------+
using
SELECT ProductId, Name, value
FROM Product
CROSS APPLY STRING_SPLIT(Tags, ',');
Since you didn't share any code, it's impossible for me to be more specific, but this example really looks like the piece you're missing.
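The row-expansion that the query above performs can be sketched in Python on the same sample data. This is just an illustration of what the split produces, not a replacement for the SQL; the variable names are hypothetical.

```python
# Sample Product rows: (ProductId, Name, Tags).
products = [
    (1, "Full-Finger Gloves", "clothing,road,touring,bike"),
    (2, "LL Headset", "bike"),
    (3, "HL Mountain Frame", "bike,mountain"),
]

# One output row per (product, tag) pair: the original row is repeated
# once for every element of its comma-delimited Tags column.
split_rows = [
    (product_id, name, tag)
    for product_id, name, tags in products
    for tag in tags.split(",")
]
```

Each product row appears once per tag, so the three input rows become seven output rows, and the split value can then be used as a join key.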

postgres: Multiply column of table A with rows of table B

Fellow SOers,
Currently I am stuck with the following problem.
Say we have table "data" and table "factor"
"data":
---------------------
| col1 | col2 |
----------------------
| foo | 2 |
| bar | 3 |
----------------------
and table "factor" (the amount of rows is variable)
---------------------
| name | val |
---------------------
| f1 | 7 |
| f2 | 8 |
| f3 | 9 |
| ... | ... |
---------------------
and the following result should look like this:
---------------------------------
| col1 | f1 | f2 | f3 | ...|
---------------------------------
| foo | 14 | 16 | 18 | ...|
| bar | 21 | 24 | 27 | ...|
---------------------------------
So basically I want the column "col2" multiplied by every "val" in table "factor", AND the content of column "name" should act as the table header/column name for the result.
We are using postgres 9.3 (an upgrade to a higher version may be possible). An extended search turned up multiple possible solutions: using crosstab (though even with crosstab I was not able to figure this one out), or using a CTE ("WITH", preferred, but also no luck). This can probably also be done with the correct use of array() and unnest().
Hence, any help is appreciated on how to achieve this (the less code, the better)
Tnx in advance!
This package seems to do what you want:
https://github.com/hnsl/colpivot
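To pin down the result the question is after, here is a small Python sketch that builds the same table from the sample data. It only illustrates the cross-multiply-then-pivot logic; it is not a Postgres solution and says nothing about how the linked package works. The variable names are made up for the example.

```python
# Table "data": (col1, col2).
data = [("foo", 2), ("bar", 3)]
# Table "factor": (name, val); the number of rows is variable.
factor = [("f1", 7), ("f2", 8), ("f3", 9)]

# The factor names become the column headers of the result.
header = ["col1"] + [name for name, _ in factor]

# Each data row is multiplied by every factor value, one column per factor.
rows = [[col1] + [col2 * val for _, val in factor] for col1, col2 in data]
```

Because the header is derived from the factor rows, the number of output columns follows the contents of the factor table, which is the part that makes this awkward to express in a static SQL query.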

Excel VBA to transpose set of rows if value exists in another column

I'm trying to find a way via VB script that will transpose rows from column A into a new sheet but only if there is a value in column B for rows that contain numbers. I have a sheet with ~75K rows on it that I need to do this for, and I tried creating pivot tables which allowed me to get the data into its current format but I need the data to be in columns.
The tricky part of this is that in column A, I only need to look at the rows that are all numbers and not the other rows that have text.
I created a sample sheet to view, where the sample data is in the SOURCE tab and what I want the data to look like in the TRANSPOSED tab.
https://docs.google.com/spreadsheets/d/1ujbaouZFqiPw0DbO78PCnz25OY2ugF1HtUqMg_J7KeI/edit?usp=sharing
Any help would be appreciated.
UPDATE and Answer:
I modified my approach and went back to the original source data which was not part of a pivot table and was able to use a simple match formula between the 2 data sources. So, my original data looked like this:
+----------------+---------+--------+--------------+
| Gtin | Brand | Name | TaxonomyText |
+----------------+---------+--------+--------------+
| 00030085075605 | brand 1 | name 1 | cat1 |
| 00041100015112 | brand 2 | name 2 | cat2 |
| 00041100015099 | brand 3 | name 3 | cat3 |
| 00030085075608 | brand 4 | name 4 | cat4 |
+----------------+---------+--------+--------------+
I had another sheet containing the data I needed to match to in this format:
+----------------+---------+
| Gtin | Brand |
+----------------+---------+
| 00030085075605 | brand 1 |
| 00041100015112 | brand 2 |
| 00041100015098 | brand 3 |
| 00030085075608 | brand 4 |
+----------------+---------+
I created a new column in my source sheet and used an IFERROR/MATCH formula:
=IFERROR(IF(MATCH(A14,data_to_match!$A:$A,0),"yes",),"no")
Then copied this formula down for every row, about 75K rows which very quickly added a yes or a no.
+----------------+---------+---------+--------+--------------+
| Gtin | matched | Brand | Name | TaxonomyText |
+----------------+---------+---------+--------+--------------+
| 00030085075605 | yes | brand 1 | name 1 | cat1 |
| 00041100015112 | yes | brand 2 | name 2 | cat2 |
| 00041100015099 | no | brand 3 | name 3 | cat3 |
| 00030085075608 | yes | brand 4 | name 4 | cat4 |
+----------------+---------+---------+--------+--------------+
The final step was to just filter for Yes values and I had all the data that I needed.
My mistake was going to a pivot table first which put the data in a very funky format causing me to have to do a transpose, which wasn't really necessary. Hopefully this can help others....
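For anyone who prefers to do the same match outside Excel, here is a minimal Python sketch of the yes/no flag and the final filter, using the sample rows above. The GTINs are kept as strings so the leading zeros survive; the variable names are hypothetical.

```python
# Source sheet rows: (Gtin, Brand, Name, TaxonomyText).
source = [
    ("00030085075605", "brand 1", "name 1", "cat1"),
    ("00041100015112", "brand 2", "name 2", "cat2"),
    ("00041100015099", "brand 3", "name 3", "cat3"),
    ("00030085075608", "brand 4", "name 4", "cat4"),
]

# GTINs from the sheet to match against, held in a set for fast lookup.
to_match = {
    "00030085075605",
    "00041100015112",
    "00041100015098",
    "00030085075608",
}

# Equivalent of the IFERROR/MATCH column: "yes" if the GTIN is found, else "no".
flagged = [
    (gtin, "yes" if gtin in to_match else "no", brand, name, taxonomy)
    for gtin, brand, name, taxonomy in source
]

# Equivalent of filtering the sheet for "yes".
matched_only = [row for row in flagged if row[1] == "yes"]
```

A set lookup like this stays fast at 75K rows, for the same reason the formula did: each row needs only a single exact-match check.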