How to get the first tuple in a string column using Presto SQL

I have a column in a table whose data type is varchar, but it contains an array of tuples. What I need is to extract the first value of the first tuple of the array.
This is the original table:

| userid | comments |
| :--- | :--- |
| 1 | [["hello world",1],["How did you",1],[" this is the one",1]] |
| 2 | [["hello ",1],["How ",1],[" this",1]] |
And this is what I am looking for. Please note that the data type of the 'comments' column is varchar.

| userid | comments |
| :--- | :--- |
| 1 | hello world |
| 2 | hello |

json_extract_scalar should do the trick:
WITH dataset (userid, comments) AS (
VALUES (1, json '[["hello world",1],["How did you",1],[" this is the one",1]]'),
(2, json '[["hello ",1],["How ",1],[" this",1]]')
)
--query
select userid,
json_extract_scalar(comments, '$[0][0]')
from dataset
Output:

| userid | comments |
| :--- | :--- |
| 1 | hello world |
| 2 | hello |
Note that json_extract_scalar extracts only a single value. If you want multiple values, you will need to do some casting (similar to what is done here, but using arrays, for example array(json)).
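For readers who want to sanity-check the path logic outside the database, here is a minimal Python sketch of what `$[0][0]` selects, and of the "first value of every tuple" variant that the array(json) cast would enable (this is an illustration, not Presto itself):

```python
import json

# Sample value mirroring the varchar 'comments' column from the question
comments = '[["hello world",1],["How did you",1],[" this is the one",1]]'

parsed = json.loads(comments)
first = parsed[0][0]                   # same value the JSON path $[0][0] selects
firsts = [pair[0] for pair in parsed]  # first value of every tuple

print(first)   # hello world
print(firsts)  # ['hello world', 'How did you', ' this is the one']
```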

Related

splitting a dict-like varchar column into multiple columns using SQL presto

I have a column in a table that is varchar but has a dictionary-like format. Some rows have more key-value pairs than others (for example, the first row has 3 pairs and the second row has 4).
For example:

| column |
| :--- |
| {"customerid":"12345","name":"John", "likes":"Football, Running"} |
| {"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"} |
I need a query that can "explode" the column like so:

| customerid | name | likes | dislikes |
| :--- | :--- | :--- | :--- |
| 12345 | John | Football, Running | |
| 54321 | Sam | Art | Hiking |
No extra rows are added. Just extra columns (There are other already existing columns in the table).
I've tried casting the varchar column to an array and then using the UNNEST function, but it doesn't work; I think that method creates extra rows.
I am using Prestosql.
Your data looks like JSON, so you can parse and process it:
-- sample data
WITH dataset (column) AS (
VALUES ('{"customerid":"12345","name":"John", "likes":"Football, Running"}' ),
('{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}')
)
--query
select json_extract_scalar(json_parse(column), '$.customerid') customerid,
json_extract_scalar(json_parse(column), '$.name') name,
json_extract_scalar(json_parse(column), '$.likes') likes,
json_extract_scalar(json_parse(column), '$.dislikes') dislikes
from dataset
Output:

| customerid | name | likes | dislikes |
| :--- | :--- | :--- | :--- |
| 12345 | John | Football, Running | |
| 54321 | Sam | Art | Hiking |
In case of many columns you can prettify it by casting the parsed JSON to a map (depending on the contents it can be map(varchar, varchar) or map(varchar, json)):
--query
select m['customerid'] customerid,
m['name'] name,
m['likes'] likes,
m['dislikes'] dislikes
from (
select cast(json_parse(column) as map(varchar, varchar)) m
from dataset
)
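The map-based version has a Python analogue that makes the NULL behavior easy to see: `json.loads` produces a dict (the counterpart of map(varchar, varchar)), and `dict.get` returns None for a missing key, just as `m['dislikes']` yields NULL for the first row. A small sketch, illustration only:

```python
import json

rows = [
    '{"customerid":"12345","name":"John", "likes":"Football, Running"}',
    '{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}',
]

# dict.get returns None for a missing key, the analogue of SQL NULL
table = [
    (m.get("customerid"), m.get("name"), m.get("likes"), m.get("dislikes"))
    for m in (json.loads(r) for r in rows)
]
print(table)
```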

How to extract all (including int and float) numerical values in a string column in Google BigQuery?

I have a table Table_1 on Google BigQuery which includes a string column str_column. I would like to write a SQL query (compatible with Google BigQuery) to extract all numerical values in str_column and append them as new numerical columns to Table_1. For example, if str_column includes first measurement is 22 and the other is 2.5; I need to extract 22 and 2.5 and save them under new columns numerical_val_1 and numerical_val_2. The number of new numerical columns should ideally be equal to the maximum number of numerical values in str_column, but if that'd be too complex, extracting the first 2 numerical values in str_column (and therefore 2 new columns) would be fine too. Any ideas?
Consider the below approach:
select * from (
select str_column, offset + 1 as offset, num
from your_table, unnest(regexp_extract_all(str_column, r'\b([\d.]+)\b')) num with offset
)
pivot (min(num) as numerical_val for offset in (1,2,3))
If applied to sample data like in your question, the output has columns numerical_val_1, numerical_val_2 and numerical_val_3 holding the first three extracted numbers of each row.
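The extraction step can be checked outside BigQuery: the same pattern, `\b([\d.]+)\b`, pulls out integer and decimal tokens from the example sentence in the question. A quick Python sketch (illustration only; the pivot step is BigQuery-specific):

```python
import re

s = "first measurement is 22 and the other is 2.5"
# Each match becomes one row (with its offset) before the PIVOT
nums = re.findall(r"\b[\d.]+\b", s)
print(nums)  # ['22', '2.5']
```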

Performing numerical operations on array values column in Hive

Perform numerical operations on a Hive table:

| id | array | value |
| :--- | :--- | :--- |
| a | [10:20] | 2 |
| b | [30:40:50] | 5 |
I want to convert the above table into the following:

| id | array | value | converted_array |
| :--- | :--- | :--- | :--- |
| a | [10:20] | 2 | [20:40] |
| b | [30:40:50] | 5 | [150:200:250] |
I want to multiply the 'array' column by the 'value' column and create a new column 'converted_array' using HQL. I know how to do this in Python, but I was wondering if there's a way to do it in Hive.
Try using collect_set over an exploded view as shown below. This multiplies the constant integer in the value column by each element of the array (using explode) and then gathers the results back into an array (using collect_set):
select id, collect_set(pve) as pv, value1, collect_set(pve*value1) as converted_array
from stacko1
lateral view explode(price_lov) t as pve
group by id,value1;
This should give you the desired output.
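The explode-multiply-collect pattern amounts to an element-wise scalar multiplication per row, which this Python sketch mirrors (the arrays are shown as plain lists; illustration only, not HQL):

```python
# Rows mirroring the Hive table: (id, array, value)
rows = [("a", [10, 20], 2), ("b", [30, 40, 50], 5)]

# Multiply every array element by the row's scalar value
converted = [(rid, arr, val, [x * val for x in arr]) for rid, arr, val in rows]
print(converted)
```

One caveat worth knowing: Hive's collect_set deduplicates and does not guarantee element order, so if the source array may contain repeated values, collect_list is the safer aggregate to rebuild the array with.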

Get Postgres Table Data as Json Without Field Names

I want to convert Postgres table data to JSON without repeated field names in the JSON result. When I use the current PostgreSQL JSON functions, the result looks similar to this: [{"id":"1234","name":"XYZ"},....]. This way, all field names unnecessarily exist in every row, so we do not prefer it because of the network bandwidth.
We want to get a JSON result such as [["1234","XYZ"],....], so the total length of the result JSON string will be much smaller.
Well, you could use json(b)_build_array() to turn each record into an array - this requires you to enumerate the column names:
select jsonb_build_array(id, name) js from mytable
If you want all rows in a single array of arrays, then you can use aggregation on top of this:
select jsonb_agg(jsonb_build_array(id, name)) res from mytable
Demo on DB Fiddle:
select jsonb_agg(jsonb_build_array(id, name)) res
from (values(1, 'foo'), (2, 'bar')) as t(id, name)
| res |
| :----------------------- |
| [[1, "foo"], [2, "bar"]] |
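The bandwidth saving the question is after is easy to quantify outside the database: serializing the same two rows as objects versus arrays shows the keyless form is shorter, and the gap grows linearly with the row count. A Python sketch with made-up sample values:

```python
import json

rows = [{"id": "1234", "name": "XYZ"}, {"id": "5678", "name": "ABC"}]

# Object form repeats every key per row; array form drops the keys entirely
with_keys = json.dumps(rows, separators=(",", ":"))
as_arrays = json.dumps([[r["id"], r["name"]] for r in rows], separators=(",", ":"))

print(len(with_keys), len(as_arrays))  # the array form is noticeably shorter
```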

need to split a column value

I have the below table:

| id | name | total |
| :--- | :--- | :--- |
| 1 | a | 2 |
| 2 | b | 3 |
| 3 | c,d,e,f | 15 |

Expected output:

| id | name | total |
| :--- | :--- | :--- |
| 1 | a | 2 |
| 2 | b | 3 |
| 3 | c | 15 |
| 4 | d | 15 |
| 5 | e | 15 |
| 6 | f | 15 |
I tried the split function and also XML, but neither worked.
As you don't specify the database, I'm assuming SQL Server. You can try this one.
Working Example
SELECT A.[id],
Split.a.value('.', 'VARCHAR(100)') AS String,A.total
FROM (SELECT [id],
CAST ('<M>' + REPLACE([name], ',', '</M><M>') + '</M>' AS XML) AS String ,
[total]
FROM #t) AS A
CROSS APPLY String.nodes ('/M') AS Split(a);
Refer this article
Which version of SQL are you using?
The split function is for splitting a string of text, but what you are requesting is a change to the format of the table itself.
Your table has a tuple of id=3, name=c,d,e,f, total=15.
If you want id=3, name=c and so on, you have to change the data.
From the way your question is phrased, it implies that you want the data to be presented in a different way, but the id is the defining column which differentiates between rows in the database.
You could automatically generate a new table, in which case the split statement would be useful to get each element out of your comma separated record.
Once you have that list of items, assuming your id field is an identity field (auto incrementing), you could run an insert statement for each element.
You might be able to get the sort of output you're looking for using an inner select that splits the comma separated list of values, but you would need some procedural SQL (or T-SQL... you do not specify your SQL server) to iterate over the values and insert them into a new table.
If you do go down this route, the id values will have to be thrown away, and you would treat the list as just a raw data set.
EDIT: The example posted by Have No Display Name is about as close as you're going to get with the data in its current form.
The IDs for the names 'c', 'd', 'e' and 'f' will all be 3, but your format will be very close.
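The reshaping both answers describe - one output row per comma-separated element, with a freshly generated id - can be sketched in plain Python to make the intent concrete (illustration only, not T-SQL):

```python
rows = [(1, "a", 2), (2, "b", 3), (3, "c,d,e,f", 15)]

# Split each name on commas and assign new sequential ids, as an
# identity column would when inserting the split rows into a new table
exploded = []
for _id, name, total in rows:
    for part in name.split(","):
        exploded.append((len(exploded) + 1, part, total))
print(exploded)
```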