OpenRefine split in multiple cells - openrefine

I have a simple table like this :
id | name
-------------------
1 | Jack, Jeff, Win
-------------------
2 | Jonhy, chin
I want to split the name cell by "," and preserve the id, so after the split the table should look like this:
id | name
-------------------
1 | Jack
-------------------
1 | Jeff
-------------------
1 | Win
-------------------
2 | Jonhy
-------------------
2 | chin
However, if I click Edit cells > Split multi-valued cells, the cell is split but the id is left blank on every new row the split creates. This is what the table looks like afterwards:
id | name
-------------------
1 | Jack
-------------------
null | Jeff
-------------------
null | Win
-------------------
2 | Jonhy
-------------------
null | chin

After you have done the "Split multi-valued cells", you then need to do a "Fill down" on the 'id' column. This replicates each ID number into the 'null' cells created by the split command.
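To make the effect of the two steps concrete, here is a minimal Python sketch that imitates them on the sample data. It is not OpenRefine code: the function names split_multi_valued and fill_down are made up for illustration, and stripping the space after each comma is an assumption added for readability.

```python
# Rows as (id, name) pairs, mirroring the table in the question.
rows = [(1, "Jack, Jeff, Win"), (2, "Jonhy, chin")]


def split_multi_valued(rows, sep=","):
    """Imitate 'Split multi-valued cells': only the first piece keeps its id."""
    out = []
    for row_id, names in rows:
        for i, name in enumerate(names.split(sep)):
            # Assumption: trim the whitespace left over after the separator.
            out.append((row_id if i == 0 else None, name.strip()))
    return out


def fill_down(rows):
    """Imitate 'Fill down': copy the last non-blank id into the blank cells below it."""
    out, last_id = [], None
    for row_id, name in rows:
        if row_id is not None:
            last_id = row_id
        out.append((last_id, name))
    return out


split_rows = split_multi_valued(rows)   # ids are None on the new rows
filled_rows = fill_down(split_rows)     # ids are repeated on every row
for row in filled_rows:
    print(row)
```

After the split, the ids of the new rows are empty; after the fill down, every row carries the id of the row it was split from.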

Related

How to update Multiple rows in PostgreSQL with other fields of same table?

I have a table of 35,000 records in which half of the rows have a null name.
If the name field is null, I want to update it with the value of username.
Can anyone help me with this?
This is a sample table:
name | username | idnumber | type
----------------------------------------------
-- | jack | 1 | A
Mark | Mark | 2 | B
-- | dev | 3 | A
After the update I want it to look like this:
name | username | idnumber | type
----------------------------------------------
jack | jack | 1 | A
Mark | Mark | 2 | B
dev | dev | 3 | A
You seem to want:
update t
set name = username
where name is null;
Note that -- is not a typical representation of NULL values. You might consider <null> for instance.
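If you want to try the statement before running it on the real table, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here (an assumption for the sake of a self-contained example); the table name t and the sample rows come from the question and the answer above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, username TEXT, idnumber INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(None, "jack", 1, "A"), ("Mark", "Mark", 2, "B"), (None, "dev", 3, "A")],
)

# Same statement as in the answer: only rows with a NULL name are touched.
cur = conn.execute("UPDATE t SET name = username WHERE name IS NULL")
updated_count = cur.rowcount

result = conn.execute(
    "SELECT name, username, idnumber, type FROM t ORDER BY idnumber"
).fetchall()
print(updated_count, result)
```

The row for Mark is left alone because its name is not NULL.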

Identify duplicate record in Dataframe

I have a dataframe as below which identifies full name of any person:
-------------------
| f_name | l_name |
-------------------
| abc | xyz |
| xyz | abc |
| pqr | lmn |
-------------------
Here the second row is basically the same as the first row.
Consider the case where an entry has come in with the last name mistakenly put under first name (f_name) and the first name put under last name (l_name).
How can I identify and drop/resolve such duplicate/erroneous records in a Spark dataframe?
Desired Result:
-------------------
| f_name | l_name |
-------------------
| abc | xyz |
| pqr | lmn |
-------------------
The solution could use a UDF, SQL, or both. Thanks!
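Use the dropDuplicates function available for Dataset with a proper key:
// Assumes a SparkSession named spark is in scope (as in spark-shell).
import org.apache.spark.sql.functions.{array, array_sort}
import spark.implicits._
val df = Seq(
  ("abc", "xyz"),
  ("xyz", "abc"),
  ("pqr", "lmn")
).toDF("f_name", "l_name")
// Sort the two names into an array so that swapped rows share the same key.
df.withColumn("key", array_sort(array('f_name, 'l_name))).dropDuplicates("key").show()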
+------+------+----------+
|f_name|l_name|       key|
+------+------+----------+
|   pqr|   lmn|[lmn, pqr]|
|   abc|   xyz|[abc, xyz]|
+------+------+----------+
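The idea behind the key is independent of Spark: sort the two names so that a swapped pair produces the same key, then keep one row per key. Here is a plain Python sketch of that idea (the function name drop_swapped_duplicates is made up for illustration, and unlike dropDuplicates it always keeps the first row it sees for each key):

```python
people = [("abc", "xyz"), ("xyz", "abc"), ("pqr", "lmn")]


def drop_swapped_duplicates(rows):
    """Keep the first row for each unordered (f_name, l_name) pair."""
    seen = set()
    kept = []
    for f_name, l_name in rows:
        # Sorting the pair makes ("abc", "xyz") and ("xyz", "abc") identical.
        key = tuple(sorted((f_name, l_name)))
        if key not in seen:
            seen.add(key)
            kept.append((f_name, l_name))
    return kept


deduped = drop_swapped_duplicates(people)
print(deduped)
```

Note that this only detects the swap; it cannot tell which of the two rows has the names in the right columns.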

Newbie in dilemma due to OCD tries to reorder SQL database automatically

Sorry, I'm very new to SQL. I just learned it a few hours ago. I'm using MariaDB with the InnoDB engine, HeidiSQL, and CodeIgniter 3. Let's say I have a table named disciples with the following data:
-------------------
| sort_id | name |
-------------------
| 1 | Peter |
| 4 | John |
| 3 | David |
| 5 | Petrus |
| 2 | Matthew |
-------------------
I'm fully aware that it's better to have a column called sort_id to be able to fetch the data using ORDER BY if I prefer a custom sorting. But if I delete row 3, the new table will look like this:
-------------------
| sort_id | name |
-------------------
| 1 | Peter |
| 4 | John |
| 5 | Petrus |
| 2 | Matthew |
-------------------
The thing is, I have OCD (imagine there are 1000 rows): it hurts my eyes to see this mess with missing numbers under sort_id (in this case number 3, see the table above). I think it has something to do with "relational database". Is there a way to quickly and automatically re-assign new sort_id numbers to the rows, in ascending order of name, using SQL code without having to do it manually?
-------------------
| sort_id | name |
-------------------
| 1 | John |
| 2 | Matthew |
| 3 | Peter |
| 4 | Petrus |
-------------------
I figured this out after reading the answer from Lynn Crumbling.
She made me realize that I need a primary key to manage my rows better, which is exactly what I was looking for. It turns out that InnoDB automatically creates a hidden primary key, which HeidiSQL does not show unless I define one myself on a specific column, for example id. Now I can reorganize my table rows by editing the primary key id, and the rows sort themselves the way I want. Before this, I edited sort_id, but the row order did not change because sort_id was not the primary key. (Strictly speaking, only an ORDER BY clause guarantees the order in which rows are returned.)
------------------------
| id | sort_id | name |
------------------------
| 1 | 1 | Peter |
| 2 | 4 | John |
| 3 | 5 | Petrus |
| 4 | 2 | Matthew |
------------------------
Thank you.
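For anyone who still wants the renumbering asked about in the question, here is one possible sketch. It uses Python's built-in sqlite3 module as a stand-in for MariaDB (an assumption, so that the example runs on its own) and assumes the names are unique: read the names in alphabetical order, then write back 1, 2, 3, ... as the new sort_id.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the MariaDB table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disciples (sort_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO disciples VALUES (?, ?)",
    [(1, "Peter"), (4, "John"), (5, "Petrus"), (2, "Matthew")],
)

# Read the names in ascending order ...
names = [row[0] for row in conn.execute("SELECT name FROM disciples ORDER BY name")]

# ... and assign consecutive numbers starting at 1 (assumes unique names).
conn.executemany(
    "UPDATE disciples SET sort_id = ? WHERE name = ?",
    [(position, name) for position, name in enumerate(names, start=1)],
)

renumbered = conn.execute(
    "SELECT sort_id, name FROM disciples ORDER BY sort_id"
).fetchall()
print(renumbered)
```

The gaps come back after the next delete, so this would have to be rerun each time; fetching with ORDER BY sort_id works fine with gaps in the numbering.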

Can I use the row record as header for a table?

Let's say I have 2 tables, rowName and realRecord.
rowName has 2 columns:
Index | FieldName
-------------------
1 | Name
2 | Surname
3 | Age
4 | Gender
The column headers of realRecord will depend on the field names in the previous table. For example, in this case realRecord will have Name, Surname, Age and Gender as headers.
I want to do it this way because I may have to add more columns later, and adding them to the rowName table would be neater.
Assume your table rowName has the following records:
Index | FieldName
-------------------
1 | Name
2 | SurName
3 | Age
4 | Gender
Your RealRecord has,
RecordID | EmployeeID | FieldIndex | Value
1 | E123 | 1 | Nishanthi
2 | E123 | 2 | Grashia
and so on... If you want a new field, say Nationality, you can add records as below:
rowName :
Index | FieldName
-------------------
5 | Nationality
RealRecord
RecordID | EmployeeID | FieldIndex | Value
100 | E123 | 5 | Indian
You can achieve this easily if realRecord is structured as in my answer (multiple rows instead of multiple headers as in your existing spec).
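To show how such rows can be turned back into one record per employee with the field names as headers, here is a small Python sketch. The pivot function is a hypothetical helper written for illustration; the data is the sample from this answer, with the field names held in a plain dictionary instead of a database table.

```python
# rowName: Index -> FieldName (including the newly added Nationality).
row_name = {1: "Name", 2: "SurName", 3: "Age", 4: "Gender", 5: "Nationality"}

# realRecord: (RecordID, EmployeeID, FieldIndex, Value)
real_record = [
    (1, "E123", 1, "Nishanthi"),
    (2, "E123", 2, "Grashia"),
    (100, "E123", 5, "Indian"),
]


def pivot(records, field_names):
    """Group the value rows by employee, keyed by the field name from rowName."""
    by_employee = {}
    for _record_id, employee_id, field_index, value in records:
        by_employee.setdefault(employee_id, {})[field_names[field_index]] = value
    return by_employee


employees = pivot(real_record, row_name)
print(employees)
```

Adding a new field then only needs a new row in rowName and new rows in realRecord; no table structure has to change.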

MS Access 2007 Report - Spanning Row and Column in Tabular Layout

How to span a row and column in tabular layout like below:
---------------------
|    |     Name     |
| ID |--------------|
|    | First | Last |
---------------------
Sorry for this newbie question
Thanks in advance
OK. It's quite simple. First create your report in tabular format, then modify the report to put the fields where you want them. You can rearrange the fields however you want. So you start out with:
--------------------------
ID | Name | First | Last |
--------------------------
Then just move First & Last underneath Name, so you get:
---------------------
|    |     Name     |
| ID |--------------|
|    | First | Last |
---------------------
If you're asking if there's an automatic way, then no.