Creating Table from Comma Separated String in Hive

I have the following data:
1 'a'
2 'b'
3 'c'
3 'd'
And I want to create a table in Hive, from that data, with the following columns:
map | value
1 | a
2 | b
3 | c
3 | d
There is a restriction on the starting data type: the input has to be a string. From there onward, I can manipulate the data structure however I want.
Any help would be appreciated.
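One possible approach (just a sketch: it assumes the raw input arrives as a single comma-separated string per row such as '1:a,2:b,3:c,3:d', and the ':' delimiter and the table name raw are only illustrative) is to split the string on commas, explode the resulting array into one row per pair, and then split each pair into its key and value:
WITH raw AS (
  SELECT '1:a,2:b,3:c,3:d' AS s
)
SELECT split(pair, ':')[0] AS `map`,
       split(pair, ':')[1] AS `value`
FROM raw
LATERAL VIEW explode(split(s, ',')) e AS pair
The result of that SELECT could then be stored in a new table if needed.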

Related

Is it possible to map values onto a table given corresponding row and column indices in SQL?

I have a SQL table in the form of:
| value | row_loc | column_loc |
|-------|---------|------------|
| a | 0 | 1 |
| b | 1 | 1 |
| c | 1 | 0 |
| d | 0 | 0 |
I would like to find a way to map it onto a table/grid, given the indices, using SQL. Something like:
| d | a |
| c | b |
(The context being, I would like to create a colour map with colours corresponding to values a, b, c, d, in the locations specified)
I would be able to do this iteratively in python, but cannot figure out how to do it in SQL, or if it is even possible! Any help or guidance on this problem would be greatly appreciated!
EDIT: a, b, c, d are examples of numeric values (which could not be selected using named variables in practice), so I'm relying on selecting them based on location. Also worth noting, the number of rows and columns will always be the same. The value column is also not the primary key of this table, so it is not necessarily unique; it is just a continuous value.
Yes, it is possible, assuming the number of columns is limited, since a SQL query can only return a fixed set of columns. The number of rows in the result set depends on the number of distinct row_loc values, so we have to group by row_loc. Then each column's value is chosen with a simple CASE.
with t (value, row_loc, column_loc) as (
select 'a', 0, 1 from dual union all
select 'b', 1, 1 from dual union all
select 'c', 1, 0 from dual union all
select 'd', 0, 0 from dual
)
select max(case column_loc when 0 then value else null end) as column0
, max(case column_loc when 1 then value else null end) as column1
from t
group by row_loc
order by row_loc
I tested it on Oracle. I'm not sure what should happen if multiple values land on the same coordinate; I chose max. Depending on the vendor you could also use clauses such as an aggregate with FILTER (WHERE ...), or on Oracle the PIVOT clause can be used instead.
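For illustration, the same result written with Oracle's PIVOT clause could look like this (a sketch; the aliases column0 and column1 are only illustrative):
with t (value, row_loc, column_loc) as (
select 'a', 0, 1 from dual union all
select 'b', 1, 1 from dual union all
select 'c', 1, 0 from dual union all
select 'd', 0, 0 from dual
)
select column0, column1
from (select value, row_loc, column_loc from t)
pivot (max(value) for column_loc in (0 as column0, 1 as column1))
order by row_loc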

How to unnest two lists from two columns in BigQuery without cross product, as individual rows

I have a table in BigQuery with two columns, and each column contains an array. For a given row, both columns will contain arrays of the same length, but that length can vary from row to row:
WITH tbl AS (
select ['a','b','c'] AS one, [1,2,3] as two
union all
select ['a','x'] AS one, [10,20] as two
)
select * from tbl
So the table will look like:
row | one | two
-----------------------
1 | [a,b,c] | [1,2,3]
2 | [a,x] | [10,20]
I would like to unnest in such a way that each row in the new table will have an element of the array from the first column (one) and the corresponding element from the second column (two). So from the table above, I am looking to get:
row | one | two
---------
1 | a | 1
2 | b | 2
3 | c | 3
4 | a | 10
5 | x | 20
Any help would be much appreciated! Thanks!
below is for BigQuery Standard SQL
#standardSQL
SELECT z.*
FROM `project.dataset.table` t,
UNNEST(ARRAY(
SELECT AS STRUCT one, two
FROM UNNEST(one) one WITH OFFSET
JOIN UNNEST(two) two WITH OFFSET
USING(OFFSET)
)
) z
You can test and play with the above using the sample data from your question - the result will be
Row one two
1 a 1
2 b 2
3 c 3
4 a 10
5 x 20
I don't fully understand the syntax, could you please explain it?
Explanation:
Step 1
for each row in the table, the array below is calculated
ARRAY(
SELECT AS STRUCT one, two
FROM UNNEST(one) one WITH OFFSET
JOIN UNNEST(two) two WITH OFFSET
USING(OFFSET)
)
Elements of this array are structs holding the respective values from the two columns - the values are matched with each other by JOIN'ing on their positions in the initial arrays (OFFSET)
Step 2
Then this array gets UNNEST'ed and CROSS JOIN'ed with the respective row in the table - the whole row is then ignored and only that struct (z) is brought into the output
Step 3
And finally, to output not a struct but rather separate columns - z.* is used
Hope this helped :o)
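For comparison, the same offset matching can also be written without the ARRAY subquery; a sketch using the sample data from the question (pos1 and pos2 are just illustrative offset aliases):
#standardSQL
WITH tbl AS (
select ['a','b','c'] AS one, [1,2,3] as two
union all
select ['a','x'] AS one, [10,20] as two
)
SELECT o AS one, t AS two
FROM tbl,
UNNEST(one) o WITH OFFSET pos1,
UNNEST(two) t WITH OFFSET pos2
WHERE pos1 = pos2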

Open Refine--create new column by looking up values from a pair of columns

I have a table in OpenRefine with columns A, B, and C, and I want to add a column D, like this:
A | B | C | D
---|---|---|---
a | 1 | b | 2
b | 2 | |
c | 3 | a | 1
d | 4 | c | 3
I want to create a column D by fetching the values from B corresponding to those in C, using A as an index. Hope that makes sense? I'm not having much luck figuring out how to do this in GREL.
You can use the 'cross' function to look up values across the project. Cross is usually used to look up values in a different OpenRefine project/file, but actually it works the same if you point it back at the same project you are already in.
So - from Col C, you can use "Add new column based on this column" with the GREL:
cell.cross("Your project name","Col A")
You'll get back an array of 'rows' - and if the same value appears in Column A multiple times you could get multiple rows.
To extract a value from the array you can use something like:
forEach(cell.cross("Your project name","Col A"),r,r.cells["Col B"].value).join("|")
The final 'join' is necessary to convert the array into a string, which is required to be able to store the result (arrays can't be stored directly).

sqlite string replace/delete

I have a column named tags in my database table which contains comma-separated strings, and it has records like this:
index | tags
-------------
1 | a,b,c
2 | b
3 | c
4 | z
5 | b,a,c
6 | p,f,w
7 | a,c,b
(for simplicity I am denoting strings with single characters)
Now I want to replace or delete a particular string.
Delete - say I want to delete b from all rows. If the tags column becomes empty after this operation, that row/record should be deleted (index 2 in this case). My records should look like this after the operation:
index | tags
-------------
1 | a,c
3 | c
4 | z
5 | a,c
6 | p,f,w
7 | a,c
Replace - say I want to replace all a with k in the original records:
index | tags
-------------
1 | k,b,c
2 | b
3 | c
4 | z
5 | b,k,c
6 | p,f,w
7 | k,c,b
Question - I am thinking of using the replace function somehow, but I'm not sure how to meet the above requirements with it. Can I do this in a single SQL command? If not, please suggest the best way to do this (maybe multiple SQL commands).
I use MSSQL and I'm not sure about SQLite, but you can use the REPLACE function, like this:
To remove b:
UPDATE Your_Table
SET tags = REPLACE(REPLACE(tags, ',b', ''), 'b,', '')
DELETE FROM Your_Table
WHERE tags = 'b'
To replace a with k:
UPDATE Your_Table
SET tags = REPLACE(tags, 'a', 'k')
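For SQLite specifically, here is a slightly more defensive sketch (it wraps the tag list in commas so only whole tags are matched; table and column names follow the answer above, and adjacent duplicates of the same tag would still need extra handling):
To remove b and drop rows that become empty:
UPDATE Your_Table
SET tags = TRIM(REPLACE(',' || tags || ',', ',b,', ','), ',')
DELETE FROM Your_Table
WHERE tags = ''
To replace a with k:
UPDATE Your_Table
SET tags = TRIM(REPLACE(',' || tags || ',', ',a,', ',k,'), ',')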

How to merge two columns with the same data type of a table into one column using PostgreSQL

I have found lots of answers about merging two columns into one, but I want something like the below:
I have a table named A
id_one | id_two
---------------
1 | 3
3 | 9
3 | 6
I want to combine these two columns into one, like:
id
----
1
3
9
6
Use UNION to combine the two columns, which also removes duplicates:
select id_one from yourtable
union
select id_two from yourtable
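As a side note (not needed for the sample output above, which drops duplicates), UNION ALL would keep every value from both columns:
select id_one as id from yourtable
union all
select id_two from yourtable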