Transpose rows into columns with OpenRefine (variable number of rows)

Transpose rows into columns with OpenRefine (variable number of rows) - openrefine

Can anyone help me transpose a variable number of rows into columns?
I have data like this:
ID, NAMES
1, Jon
1, Jonny
1, Jonathan
2, James
3, Bill
3, William
4, Robert
4, Bob
4, Bobby
4, Rob
And want this:
ID, Name1, Name2, Name3, Name4
1, Jon, Jonny, Jonathan
2, James
3, Bill, William
4, Robert, Bob, Bobby, Rob
In other words, for each ID I want to find all rows with that ID and put each name into a separate column (or a single column with the names in a comma-separated list)
I know that each ID will have a maximum of 4 names.
I think this is easy with OpenRefine but I really can't figure it out. Can anyone help?

The way you can approach this is:
Create OpenRefine "records" based on the ID field
Merge together the names associated with one record into a single cell
Split the new multi-valued cell of names into multiple columns
In detail:
In the ID column use "Edit Cells -> Blank down"
Make sure you are in records mode (top left corner of the data grid "Show as: Records"
In NAMES column use "Edit Cells -> Join multi-valued cells" specifying a separator you are confident won't appear in any of the names (e.g. a vertical pipe character | )
In the NAMES column then use "Edit Column -> Split into several columns" specifying the same separator
This should give the outcome you are looking for here

Related

SQL WHERE column values into capital letters

Let's say I have the following entries in my database:
Id
Name
12
John Doe
13
Mary anne
13
little joe
14
John doe
In my program I have a string variable that is always capitalized, for example:
myCapString = "JOHN DOE"
Is there a way to retrieve the rows in the table by using a WHERE on the name column with the values capitalized and then matching myCapString?
In this case the query would return two entries, one with id=12, and one with id=14
A solution is NOT to change the actual values in the table.

A general solution in Postgres would be to capitalize the Name column and then do a comparison against an all-caps string literal, e.g.
SELECT *
FROM yourTable
WHERE UPPER(Name) = 'JOHN DOE';
If you need to implement this is Knex, you will need to figure out how to uppercase a column. This might require using a raw query.

OpenRefine split column with repetitive values

I have a single column in OpenRefine like this:
Title
A Star is born
Author
George Cukor
Date
1954
Other tags...
Data for each item begin with name of the tag (Title, Author, Date etc.), followed by a value, and every tag or value are in successive rows, around ten thousands.
I would like to have as many columns as tags and as many rows as items containing title, date, author etc., something like this:
Title | Author | Date | etc.
A Star is born | George Cukor | 1954 | etc.
Any idea ?
Thanks

This is your original dataset:
Use "Transpose --> Transpose cells in rows into columns" (leaving option 2 as default). You will get this:
Then, on the first column, apply "Transpose --> Columnize by key/value columns" and don't change the default options there either. Final result:
This will obviously work with more tags/columns, but only if each of them is followed by a single value.

Custom Sort Order in CAML Query

How would one go about telling a CAML query to sort the results in a thoroughly custom order?
.
For instance, for a given field:
-- when equal to 'Chestnut' at the top,
-- then equal to 'Zebra' next,
-- then equaling 'House'?
Finally, within those groupings, sort on a second condition (such as 'Name'), normally ascending.
So this
ID Owns Name
————————————————————
1 Zebra Sue
2 House Jim
3 Chestnut Sid
4 House Ken
5 Zebra Bob
6 Chestnut Lou
becomes
ID Owns Name
————————————————————
6 Chestnut Lou
3 Chestnut Sid
5 Zebra Bob
1 Zebra Sue
2 House Jim
4 House Ken
In SQL, this can be done with Case/When. But in CAML? Not so much!

CAML does not have such a sort operator by my knowledge. The workaround might be that you add a calculated column to the list with a number datatype and formula
=IF(Owns="Chestnut",0,IF(Owns="Zebra",1,IF(Owns="House",3,999))).
Now it is possible to order on the calculated column, which translates the custom sort order to numbers. Another solution is that you create a second list with the items to own, and a second column which contains their sort order. You can link these two lists and order by the item list sort order. The benefit is that a change in the sort order is as easy as editing the respective listitems.

SQL Server Multiple Likes

I have an unusual question that seems simple but has me stumped in a SQL Server stored procedure.
I have 2 tables as described below.
tblMaster
ID, CommitDate, SubUser, OrigFileName
Sample data
ID CommitDate SubUser OrigFileName
----------------------------------------
1 2014-10-07 Test1 Test1.pdf
2 2014-10-08 Test2 Test2.pdf
3 2014-10-09 Test3 Test3.pdf
The above table is basically the header table that tracks the committed files. In addition to this, we have a details table with the following structure.
tblIndex
ID, FileID (Linking column to the header row described above), Word
Sample data:
1. 1, 1, Oil
2. 2, 1, oil
3. 3, 2, oil
4. 4, 2, tank
5. 5, 3, tank
The above rows represent the words that we want to search on and if a certain criteria matches return the corresponding filename/header row ID. What I would love to figure out to do is if I do a search for
One word (i.e. "oil"), then the system should respond with all the files that meet the criteria (easiest case and figured out)
If more than one word is searched for (i.e. "oil" and "tank"), then we should only see the second file since it is the only one that has both oil and tank as its key words.
Tried using a LIKE "%oil%" AND LIKE "%tank%" and that resulted in no rows being created since one value can't be both oil and tank.
Tried doing a LIKE "%oil%" OR LIKE "%tank%" but I get files 1, 2, and 3 since the OR is inclusive of all the other rows.
One last thing, I recognize I could just do a search for the first term and then save the results into a temp table and then search for the second term in that second table and I will get what I am looking for. The problem with that is that I don't exactly know how many items will be searched for. I don't want to have to create a structure where I am constantly having to store data into another temp table if someone does a search for 6 "keywords".
Any help/ideas will be much appreciated.

try this ! slightly differing from the previous answer
SELECT distinct FileID,COUNT(distinct t.word) FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(distinct t.word) > 1

One simple option would be to do something like this :
SELECT FileID
FROM tblIndex t
WHERE t.Word LIKE '%oil%' OR t.Word LIKE '%tank%'
GROUP BY FileID
HAVING COUNT(*) > 1
This assume you do not have duplicate in your tblIndex.
I'm also unsure whether you really need the like or not. According to your sample data you don't, a basic comparison would be way more efficient and would avoid possible collisions.

How do you query only part of the data in the row of a column - Microsoft SQL Server

I have a column called NAME, I have 2000 rows in that column that are filled with people's full names, e.g. ANN SMITH. How do I do a query that will list all the people whose first name is ANN? There are about 20 different names whose first name is ANN but the surname is different.
I tried
and (NAME = 'ANN')
but it returned zero results.
I have to enter the FULL name and (NAME = 'ANN SMITH') ANN SMITH to even get a result .
I just want to list all the people with there first name as ANN

Try in your where clause:
Where Name like 'ANN %'
Should work mate.
ANN% will find all results where ANN is first then anything after.
%ANN% will find the 3 letters ANN in any part of that rows field.
Hope it helps
Also usually Name is separated into First names and second name columns.
this will save Having to use wild cards in your SQL and provide A bit more normalized data.

SELECT NAME
FROM NAMES
WHERE NAME LIKE 'ANN %'
This should wildcard select anything that begins with 'ANN' followed by a space.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas