OpenRefine split column with repetitive values - openrefine

I have a single column in OpenRefine like this:
Title
A Star is born
Author
George Cukor
Date
1954
Other tags...
The data for each item begins with the name of a tag (Title, Author, Date, etc.), followed by its value. Every tag and every value sits on its own row, and there are around ten thousand rows.
I would like to have as many columns as there are tags and as many rows as there are items, each row containing the title, author, date, etc., something like this:
Title | Author | Date | etc.
A Star is born | George Cukor | 1954 | etc.
Any ideas?
Thanks

This is your original dataset:
Use "Transpose --> Transpose cells in rows into columns" (leaving option 2 as default). You will get this:
Then, on the first column, apply "Transpose --> Columnize by key/value columns" and don't change the default options there either. Final result:
This will obviously work with more tags/columns, but only if each of them is followed by a single value.
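For readers who want to see the logic outside OpenRefine, here is a minimal Python sketch of the same idea: pair every tag with the value that follows it, then start a new record whenever a tag repeats. The function name `columnize` is made up for this illustration, and this is not OpenRefine's own implementation. Like the menu steps above, it assumes each tag is followed by exactly one value.

```python
def columnize(cells):
    """Turn a flat tag, value, tag, value, ... list into one dict per item."""
    # Step 1 (like "Transpose cells in rows into columns" with 2 rows):
    # pair each tag with the value on the following row.
    pairs = list(zip(cells[0::2], cells[1::2]))

    # Step 2 (like "Columnize by key/value columns"):
    # a tag that is already present means a new item has started.
    records = []
    current = {}
    for tag, value in pairs:
        if tag in current:
            records.append(current)
            current = {}
        current[tag] = value
    if current:
        records.append(current)
    return records


cells = ["Title", "A Star is born", "Author", "George Cukor", "Date", "1954"]
for record in columnize(cells):
    print(record)
```

Each dictionary corresponds to one output row, and its keys correspond to the columns.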

Related

Create a list from Table

I managed to enter data to a database via a form;
actually works like a charm.
Now, what I need, is a lookup function (preferably not a form), with which I can search a table on another worksheet.
Let's say, I have an edit field or a cell, in which I enter a term which shall be looked for in a certain column on the table in another worksheet.
I would like to get a list of all entries which contain the word and the value from another cell (an ID).
Example:
Search term: Tom
Table:
Tim | 2
Tom | 3
Tommy | 5
The list should show Tom and Tommy and their respective IDs,
but nothing I tried turned out as intended (most of it didn't work at all)...

Splitting long column values to manageable size for presenting data neatly

Hi, I was wondering if there is a way to split long column values. I am using SSRS to get the distinct values, with the number of product IDs against each category, into a matrix/pivot table. The problem is that the number of distinct categories makes it a nightmare to make the report look pretty. Is there a dynamic way to split the columns into, say, groups of 10 to make the table nicer and easier to read? I was thinking of using the IN operator with a list of values, but that means managing the data every time a new category gets added. Is there a dynamic way to present the data in the best way possible? There are 135 distinct category values.
I am also open to suggestions for making the report nicer if anyone has any thoughts. I am new to SSRS and still trying to get to grips with it.
Here is an example of my problem
Are your column names coming back from the database under the SubCat field you note in the comments above? If so, I imagine your dataset looks something like this:
SubCat | LogNo
---------+---------------
SubCatA | 34
SubCatB | 65
SubCatC | 120
SubCatD | 8
SubCatE | 19
You can edit the query so that it also returns an index for each individual category, using the ROW_NUMBER() function. Add the field
ROW_NUMBER() OVER (ORDER BY SubCat ASC) AS ColID
to your query. This will result in the following:
SubCat | LogNo | ColID
-----------+--------------+----------
SubCatA | 34 | 1
SubCatB | 65 | 2
SubCatC | 120 | 3
SubCatD | 8 | 4
SubCatE | 19 | 5
Now that there is a numeric identifier for each column, you can perform some logic on it to arrange the columns nicely on the page.
This solution involves a Tablix nested inside a Matrix, which is itself nested inside another Matrix, as follows.
First create a Matrix (Matrix 1) and set its data source to your dataset. Set the Row Group Properties to group on the following expression, where '4' is the number of columns you wish to display horizontally.
=CInt(Floor((Fields!ColID.Value - 1) / 4))
Then, in the data section of Matrix 1 (bottom right corner), insert a Rectangle and on it insert a new Matrix (Matrix 2). Remove the leftmost column. Set the column header to be the column name, SubCat. This will automatically set the column grouping to SubCat.
Next, in the data section of Matrix 2, add a new Rectangle and add a Tablix on it. Remove the header row and make it one column wide only. Set the data to be the information you wish to display, i.e. LogNo.
Finally, delete the leftmost column and the topmost row from Matrix 1 to make it look tidier. (Note: delete the column and row only, not the associated groups!)
Then when the report is run it should look similar to the following. Note in my example SubCat = ColName, and LogNo = NumItems, and I have multiple values per SubCat.
Hopefully you find this helpful. If not, please ask for clarification.
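To make the grouping arithmetic concrete, here is a small Python sketch of what the row-group expression does with the sample data above. The function names `band_of` and `layout` are invented for this illustration; the only thing taken from the answer is the formula Floor((ColID - 1) / 4) and the ROW_NUMBER() ordering by SubCat.

```python
def band_of(col_id, per_row=4):
    # Same arithmetic as the report expression:
    # a 1-based ColID maps to a 0-based band (report row).
    return (col_id - 1) // per_row


def layout(rows, per_row=4):
    """Group (SubCat, LogNo) rows into bands of at most per_row columns."""
    bands = {}
    # enumerate(..., start=1) over the sorted rows plays the role of
    # ROW_NUMBER() OVER (ORDER BY SubCat ASC).
    for col_id, (subcat, log_no) in enumerate(sorted(rows), start=1):
        bands.setdefault(band_of(col_id, per_row), []).append(subcat)
    return bands


rows = [("SubCatA", 34), ("SubCatB", 65), ("SubCatC", 120),
        ("SubCatD", 8), ("SubCatE", 19)]
for band, names in layout(rows).items():
    print(band, names)
```

With 4 columns per band, the first four categories land in band 0 and the fifth wraps to band 1, which is what the outer Matrix renders as a new report row.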
Can you do something like this:
The following gives the steps (in two columns, down then across)

Clash of multivalued attribute

I have a database with names and hobbies (hobbies being a multivalued attribute), and I want to count how many times the same combination of more than one value occurs.
For example
If this is a sample database
A reading
A dancing
B reading
B dancing
Then the result should be
List of hobbies | Number of occurrence
-----------------|---------------------
reading, dancing | 2
I think you have a query like this:
SELECT hobbies, Count(*) As hNo
FROM t
GROUP BY hobbies
That has a result set like this:
hobbies | hNo
--------+------
reading | 2
dancing | 2
Now, for this data set, you can follow the answers to this question [Concatenate many rows into a single text string] to combine them into one row.
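As an alternative illustration, here is a minimal Python sketch under one particular reading of the question: that the goal is to count how many names share exactly the same set of hobbies, keeping only sets with more than one hobby. That reading is an assumption, the function name `count_hobby_sets` is made up, and this is a different technique from the GROUP BY query above.

```python
from collections import Counter, defaultdict


def count_hobby_sets(rows):
    """Count how many names share the same combination of hobbies."""
    # Collect each person's hobbies into a set.
    hobbies_by_name = defaultdict(set)
    for name, hobby in rows:
        hobbies_by_name[name].add(hobby)

    # Count identical sets; frozenset makes a set usable as a Counter key.
    counts = Counter(frozenset(h) for h in hobbies_by_name.values())

    # Keep only combinations of more than one hobby,
    # rendered as an alphabetically sorted, comma-separated string.
    return {", ".join(sorted(s)): n for s, n in counts.items() if len(s) > 1}


rows = [("A", "reading"), ("A", "dancing"), ("B", "reading"), ("B", "dancing")]
print(count_hobby_sets(rows))
```

For the sample data this reports the reading/dancing combination with a count of 2; the hobbies are listed alphabetically rather than in input order.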

SQL: Find highest number if it's in nvarchar format containing special characters

I need to pull the record containing the highest value, specifically I only need the value from that field. The problem is that the column is nvarchar format that contains a mix of numbers and special characters. The following is just an example:
PK | Column 2 (nvarchar)
-------------------
1 | .1.1.
2 | .10.1.1
3 | .5.1.7
4 | .4.1.
9 | .10.1.2
15 | .5.1.4
Basically, because the values in Column 2 are compared as strings (character by character) rather than as numbers, ".5.1.7" sorts above ".10.1.2". So instead of returning the PK for the row containing ".10.1.2" as the highest value, I get the PK for the row that contains ".5.1.7".
I attempted to write some functions to do this, but what I've written looks far more complicated than it should be. Does anyone have something simple, or are complicated functions the only way?
To be clear, I'm trying to grab the PK of the record that contains the highest Column 2 value.
This query might return what you desire
SELECT MAX(CAST(REPLACE(Column2, '.', '') as INT)) FROM table
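One caveat with stripping the dots: the segments get concatenated, so ".2.10.1" becomes 2101 and would outrank ".10.1.1" (1011). The query also returns the converted value, not the PK. The Python sketch below shows the segment-by-segment comparison the question seems to be after. It is an illustration of the logic rather than T-SQL, the helper name `version_key` is made up, and it assumes every segment between the dots consists of digits only.

```python
def version_key(value):
    # ".10.1.2" -> (10, 1, 2); the empty strings produced by leading or
    # trailing dots are skipped. Tuples compare element by element,
    # so (10, 1, 2) ranks above (5, 1, 7).
    return tuple(int(part) for part in value.split(".") if part)


rows = [
    (1, ".1.1."),
    (2, ".10.1.1"),
    (3, ".5.1.7"),
    (4, ".4.1."),
    (9, ".10.1.2"),
    (15, ".5.1.4"),
]

# Pick the row whose Column 2 is highest when compared numerically per segment.
pk, value = max(rows, key=lambda row: version_key(row[1]))
print(pk, value)
```

For the sample table this selects the row with PK 9, whose value is ".10.1.2".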

How do I create sql query for searching partial matches?

I have a set of items in a db. Each item has a name and a description. I need to implement a search facility which takes a number of keywords and returns the distinct items that have at least one of the keywords matching a word in the name or description.
For example, I have three items in the db:
1. item1:
name: magic marker
description: a writing device which makes erasable marks on whiteboard
2. item2:
name: pall mall cigarettes
description: cigarette named after a street in london
3. item3:
name: XPigment Liner
description: for writing and drawing
A search using keyword 'writing' should return magic marker and XPigment Liner
A search using keyword 'mall' should return the second item
I tried using the LIKE keyword and the IN keyword separately.
For the IN keyword to work, the query has to be
SELECT DISTINCT * FROM mytable WHERE name IN ('pall mall cigarettes')
but
SELECT DISTINCT * FROM mytable WHERE name IN ('mall')
will return 0 rows.
I couldn't figure out how to write a query that covers both the name and description columns and allows partial word matches.
Can somebody help?
update:
I created the table through Hibernate and used the javax.persistence @Lob annotation for the description field. When I examined the table using psql, it shows
...
id | bigint | not null
description | text |
name | character varying(255) |
...
One of the records in the table is like,
id | description | name
21 | 133414 | magic marker
First of all, this approach won't scale to large datasets; for that you'll need a separate index from words to items (like an inverted index).
If your data is not large, you can do
SELECT DISTINCT(name) FROM mytable WHERE name LIKE '%mall%' OR description LIKE '%mall%'
using OR if you have multiple keywords.
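Here is a minimal, hedged sketch of that multi-keyword LIKE/OR query, built with bound parameters. It uses Python's built-in sqlite3 module with an in-memory table purely so the example is self-contained; the question's database is PostgreSQL, where the same SQL shape applies but the driver and placeholder syntax will differ. The function name `search` is made up for this illustration.

```python
import sqlite3


def search(conn, keywords):
    """Return distinct item names where any keyword occurs in name or description."""
    if not keywords:
        return []
    # One "name LIKE ? OR description LIKE ?" clause per keyword, joined with OR.
    clause = " OR ".join(["name LIKE ? OR description LIKE ?"] * len(keywords))
    params = []
    for kw in keywords:
        params += [f"%{kw}%", f"%{kw}%"]
    sql = f"SELECT DISTINCT name FROM mytable WHERE {clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (name, description) VALUES (?, ?)",
    [
        ("magic marker", "a writing device which makes erasable marks on whiteboard"),
        ("pall mall cigarettes", "cigarette named after a street in london"),
        ("XPigment Liner", "for writing and drawing"),
    ],
)

print(search(conn, ["writing"]))
print(search(conn, ["mall"]))
```

Note that `%mall%` is a substring match, so it would also match a word such as "small"; matching whole words only needs something stricter, such as full-text search.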
This may work as well.
SELECT *
FROM myTable
WHERE CHARINDEX('mall', name) > 0
OR CHARINDEX('mall', description) > 0