OpenRefine: split a single column with repetitive values into well-formatted columns with headers

I have a single column in OpenRefine like this:
.TI
Localisation et dénomination :
Provenance des matériaux de monuments de Senlis
.DA
Date du cliché :
Janvier 1970
.R16N
Commune :
Senlis
.R17N
Département-Région-Pays :
Oise
Picardie
France
.R62N
Localisation plus précise dans l'édifice :
Cave
.R13
Datation de l'édifice :
Lutétien
Éocène
Paléogène
.MC
Mots-clés :
Pierre
Roche
Géologie
Caractérisation
Calcaire
Carrière
Liais
Liais de Senlis
Carrière souterraine
The data for each item begins with the name of a tag ("Localisation...", "Date...", "Commune", etc.; the codes like .TI, .DA, etc. are not important), followed by one or more values. Every tag and every value is on its own row, and there are around ten thousand rows. I would like to have something like this, with the tags as column headers:
Localisation et dénomination | Date du cliché | Commune | Département-Région-Pays | etc.
Provenance des matériaux de monuments de Senlis | Janvier 1970 | Senlis | Oise, Picardie, France | etc.
Any idea ?
Thanks

I'd suggest approaching it as follows:
1. Import the data you have into OpenRefine in a single column (let's call it "Data").
2. Remove all the "codes" from your data:
   - Create a custom text facet using the GREL value.startsWith(".").toString()
   - Select "true" in that facet
   - Remove all the selected rows
   - Remove the facet
3. Remove all the blank rows from your data:
   - Facet by blank (null or empty)
   - Select "true" in that facet
   - Remove all the selected rows
   - Remove the facet
4. Add a new column to your OpenRefine project using "Add column based on existing column" from the "Data" column:
   - Use the GREL if(value.endsWith(":"),value,"")
   - Call the new column (for example) "Key"
5. Move the new Key column to be the first in the project.
6. In the Key column use "Fill down".
7. Remove the rows that have the key in both the Key and Data columns:
   - On the Data column, create a custom text facet with the GREL value.endsWith(":").toString()
   - Select "true" in that facet
   - Remove all the selected rows
   - Remove the facet
8. Transpose the data:
   - In the Key column select Transpose -> Columnize by key/value columns
   - Select "Key" as the Key column and "Data" as the Value column
   - Click OK
The result should be a table with each of your headings as a column heading and the values listed underneath. Because some of your headings have multiple values (e.g. a single "Localisation et dénomination :" can have several "Département-Région-Pays :" values), you may need to use the "Records" mode in OpenRefine together with the "Join multi-valued cells" function to get those values into a single comma-delimited cell.
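For readers who want to check the logic outside OpenRefine, here is a minimal Python sketch of the same idea. It is not what OpenRefine does internally. The function name `columnize` is made up for illustration, and the rule that a repeated tag starts a new item is an assumption about the data.

```python
def columnize(lines):
    """Turn tagged lines into one dict per item (tag -> joined values)."""
    records = []
    current = None
    key = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("."):
            continue  # skip blank rows and codes such as .TI or .DA
        if line.endswith(":"):
            # A line ending in ":" is a tag; drop the colon and padding.
            key = line.rstrip(":").strip()
            # Assumption: seeing a tag again means a new item has started.
            if current is None or key in current:
                current = {}
                records.append(current)
            current[key] = []
        elif key is not None:
            # Any other line is a value of the most recent tag ("fill down").
            current[key].append(line)
    # Join multi-valued cells into a single comma-delimited string.
    return [{k: ", ".join(v) for k, v in rec.items()} for rec in records]


sample = [
    ".TI", "Localisation et dénomination :", "Provenance des matériaux de monuments de Senlis",
    ".DA", "Date du cliché :", "Janvier 1970",
    ".R17N", "Département-Région-Pays :", "Oise", "Picardie", "France",
]
items = columnize(sample)
```

Each dict in `items` corresponds to one output row, and its keys are the column headers.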

Related

How to extract the data between the last two brackets in Postgresql?

So for example I have the following column:
Hello this is (BR20134)
Paris, France (BR10293)
(BR62543) Spice girls (BR6729)
I need to get the following column:
BR20134
BR10293
BR6729
I am using this query to extract what's between the brackets:
select substring(Column FROM '\((BR.+)\)')
FROM Table
but obviously it only works if there is one pair of brackets in a field and not two.
I am using Postgresql btw.
Thanks
This always extracts the last parenthesized expression:
substring(textcolumn FROM '\((BR[^)]+)\)[^(]*$')
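The pattern works because `[^(]*$` only allows characters other than an opening bracket between the captured group and the end of the string, so an earlier bracket pair cannot match. As a quick check outside the database, the same pattern can be tried with Python's `re` module. This is only a sketch, and the helper name `last_code` is made up.

```python
import re

# Same pattern as above: a "(BR...)" group followed only by characters
# that are not "(" until the end of the string.
LAST_BR = re.compile(r"\((BR[^)]+)\)[^(]*$")


def last_code(text):
    """Return the content of the last (BR...) pair, or None if there is none."""
    match = LAST_BR.search(text)
    return match.group(1) if match else None


samples = [
    "Hello this is (BR20134)",
    "Paris, France (BR10293)",
    "(BR62543) Spice girls (BR6729)",
]
codes = [last_code(s) for s in samples]
```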

OpenRefine split column with repetitive values

I have a single column in OpenRefine like this:
Title
A Star is born
Author
George Cukor
Date
1954
Other tags...
The data for each item begins with the name of a tag (Title, Author, Date, etc.), followed by a value. Every tag and every value is on its own row, and there are around ten thousand rows.
I would like to have as many columns as there are tags and as many rows as there are items, each row containing the title, author, date, etc., something like this:
Title | Author | Date | etc.
A Star is born | George Cukor | 1954 | etc.
Any idea ?
Thanks
Starting from your original dataset, use "Transpose --> Transpose cells in rows into columns" (leaving option 2 as default). Each tag now sits next to its value.
Then, on the first column, apply "Transpose --> Columnize by key/value columns" and don't change the default options there either. The final result has one column per tag and one row per item.
This will obviously work with more tags/columns, but only if each of them is followed by a single value.
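The two steps can be sketched in Python to show what they do to the data. This is an illustration, not OpenRefine's implementation. The function name `pairs_to_rows` and the second sample item are made up, and it assumes that each item starts with the same first tag.

```python
def pairs_to_rows(cells):
    """Reshape alternating tag/value cells into one dict per item."""
    # Step 1: take the cells two at a time to get (tag, value) pairs.
    pairs = list(zip(cells[0::2], cells[1::2]))
    # Step 2: start a new row whenever the first tag shows up again.
    rows = []
    first_tag = pairs[0][0] if pairs else None
    for tag, value in pairs:
        if tag == first_tag or not rows:
            rows.append({})
        rows[-1][tag] = value
    return rows


cells = [
    "Title", "A Star is born", "Author", "George Cukor", "Date", "1954",
    "Title", "Vertigo", "Author", "Alfred Hitchcock", "Date", "1958",
]
rows = pairs_to_rows(cells)
```

As with the OpenRefine steps, this breaks as soon as a tag has zero or several values, because the pairing in step 1 then goes out of step.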

How to display concatenated value in MS Access Combobox

I'm trying to fill the combobox with values from a concatenated field in a MS Access query. The embedded image is what is currently shown in the drop down box and what is shown in the box when a value is selected.
The problem is that I do NOT want the values in the drop-down box to show as if in columns, but rather as a concatenated string. So, instead of ... TAYLOR | AVICHAI ... it should be TAYLOR, AVICHAI. Additionally, when a value is selected, instead of showing just TAYLOR it should show TAYLOR, AVICHAI.
I've tried every property I can think of and tried concatenating in the original table, the query and even in vba code AFTER just grabbing the two fields from the database.
Any help?
You need to concatenate the values together in your query and display that field in the combo box.
SELECT peopleID, lastName & ", " & firstName AS name FROM tblPeople
Then, on the Format tab of your combo box, set:
column count to 2
column widths to 0";1"
This will cause only your column with a width (the combined names) to be displayed in the drop down and when selected.
The documentation says: "In a combo box, the first visible column is displayed in the text box portion of the control."
More precisely, the value shown is the value of the first column with a non-zero width.
Thus, to achieve your goal, modify your query so that it returns the following:
Taylor, Avichai | Taylor | Avichai
Raines, Patricia | Raines | Patricia
...
Then, in the combo box properties, set
the number of columns to 3 and
the column widths such that the first column is very small (but not zero).

Generate a new row for a set of fields of the input row (and generate a query for each new row)

We have a .csv file that has information about the migration flows of people across districts in a city.
We are creating a transformation that loads data from a .csv file to a database (2 tables):
each row has the following information:
- field 1: Name of the origin district
- field 2 (name of the field = name of the destination district): Value of the field = number of people that have changed from origin district to this destination district
This repeats for each destination district.
Suppose there are 20 districts, so the total number of fields is 21.
We want a step that generates the following output (transform data structure):
A new row with the following structure:
Field 1: Name of the origin district
Field 2: Name of the destination district
Field 3: Number of people that have moved from district "Field 1" to district "Field 2"
So the output of this step must contain 20x20 rows. We will then insert the 400 rows in the following database table:
We cannot find any transformation step that can generate this new data structure. We will try the JavaScript step to manually implement a loop over each origin district and then generate the insert into the database table for each new row.
To turn columns that are listed side by side in one row (a pivoted table) into one row per column, plus a key column, you should use the Row Normaliser step.
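To make the target structure concrete, here is a small Python sketch of that reshaping. It only illustrates the wide-to-long transformation and is not how the Row Normaliser step is implemented. The function name `normalise` and the two-district sample are made up, and the values are assumed to be integers.

```python
import csv
import io


def normalise(csv_text):
    """Turn one row per origin into one (origin, destination, count) row each."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the first field holds the origin district and every
    # other field is named after a destination district.
    origin_field = reader.fieldnames[0]
    destinations = reader.fieldnames[1:]
    rows = []
    for record in reader:
        origin = record[origin_field]
        for destination in destinations:
            rows.append((origin, destination, int(record[destination])))
    return rows


# Hypothetical sample with 2 districts instead of 20.
sample = "origin,North,South\nNorth,0,12\nSouth,7,0\n"
flows = normalise(sample)
```

With 20 origin rows and 20 destination fields, the same loop yields the 400 rows described in the question.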

Pig Latin: using field in one table as position value to access data in another table

Let's say we have two tables. The first table has the following description:
animal_count: {zoo_name:chararray, counts:()}
The meaning of the "zoo_name" field is obvious. The "counts" field contains the counts of each animal species. In order to know which species a given field in the "counts" tuple represents, we use another table:
species_position : {species:chararray, position:int}
Let's assume we have the following data in the "species_position" table:
"tiger", 0
"elephant", 1
"lion", 2
This data means the first field in animal_count.counts is the number of tigers in a given zoo. The second field in that tuple is the number of elephants, and so on. So, if we want to represent the fact that "san diego zoo" has 2 tigers, 4 elephants and no lions, we will have the following data in the "animal_count" table:
"san diego zoo", (2, 4, 0)
Given this setup, how can I write a query to extract the number of a given species in all zoos? I was hoping for something like:
FOREACH species_position GENERATE species, animal_count.counts.$position;
Of course, the "animal_count.counts.$position" won't work.
Is this possible without resorting to a UDF?