I have a column in a table that is varchar but has a dictionary-like format. Some rows have more key-value pairs than others (for example, the first row has 3 pairs and the second has 4).
For example:
column
{"customerid":"12345","name":"John", "likes":"Football, Running"}
{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}
I need a query that can "explode" the column like so:
customerid | name | likes             | dislikes
12345      | John | Football, Running |
54321      | Sam  | Art               | Hiking
No extra rows should be added, just extra columns (there are other pre-existing columns in the table).
I've tried casting the varchar column to an array and then using the UNNEST function, but it doesn't work; I think that method creates extra rows.
I am using PrestoSQL.
Your data looks like JSON, so you can parse and process it:
-- sample data
WITH dataset (column) AS (
    VALUES ('{"customerid":"12345","name":"John", "likes":"Football, Running"}'),
           ('{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}')
)
--query
select json_extract_scalar(json_parse(column), '$.customerid') customerid,
       json_extract_scalar(json_parse(column), '$.name') name,
       json_extract_scalar(json_parse(column), '$.likes') likes,
       json_extract_scalar(json_parse(column), '$.dislikes') dislikes
from dataset
Output:
customerid | name | likes             | dislikes
12345      | John | Football, Running |
54321      | Sam  | Art               | Hiking
In case of many columns you can prettify it by casting the parsed JSON to a map (depending on the contents it can be map(varchar, varchar) or map(varchar, json)):
--query
select element_at(m, 'customerid') customerid,
       element_at(m, 'name') name,
       element_at(m, 'likes') likes,
       element_at(m, 'dislikes') dislikes -- element_at returns NULL for keys missing from a row
from (
    select cast(json_parse(column) as map(varchar, varchar)) m
    from dataset
)
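If some of the values are not plain scalars (nested objects or arrays), the map(varchar, varchar) cast will fail; a minimal sketch of the map(varchar, json) variant under that assumption:

--query
select json_extract_scalar(element_at(m, 'customerid'), '$') customerid,
       json_extract_scalar(element_at(m, 'name'), '$') name,
       json_extract_scalar(element_at(m, 'likes'), '$') likes,
       json_extract_scalar(element_at(m, 'dislikes'), '$') dislikes
from (
    -- values stay as json, so scalars and nested structures both survive the cast
    select cast(json_parse(column) as map(varchar, json)) m
    from dataset
)

Here json_extract_scalar(..., '$') turns a JSON scalar back into varchar; for nested values you would extract with a deeper path instead.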
I have a text field in a table and I want to query this field with a WHERE condition: I want all records that contain at least one word from a list of words, returned as JSON like the desired output below. Sample data:
text
The employee was fired today
He likes chocolate a lot
She eat chocolate today
Car was stolen yesterday
Something like this pseudo-query:
select * from tbl
where text CONTAINS ANY ['today','likes','eat']
Desired Output 1:
{"id":"1", "text":"The employee was fired today", "tag":"today"}
{"id":"2", "text":"He likes chocolate a lot", "tag":"likes"}
{"id":"3", "text":"She eat chocolate today", "tag":["today","eat"]}
Desired Output 2:
text                          tag         tag_counts
The employee was fired today  today       1
He likes chocolate a lot      likes       1
She eat chocolate today       eat, today  2
I would like to get either of these outputs.
I already found that I can use WHERE IN ('today','likes','eat'), but I can't figure out how to get the result in either of the desired formats, if that is possible.
I chose the column name words for the text column. "text" is a basic type name and too confusing as such.
For your given table with a plain text column:
SELECT *
FROM   tbl t
     , LATERAL (
         SELECT string_agg(tag, ', ') AS tags, count(*) AS tag_count
         FROM (
             SELECT unnest(string_to_array(words, ' ')) tag
             INTERSECT ALL
             SELECT unnest('{today, likes, eat}'::text[])
         ) i
       ) ct
WHERE  string_to_array(t.words, ' ') && '{today, likes, eat}';
Simpler with a text array (text[]) in the table to begin with:
SELECT *
FROM   tbl1 t
     , LATERAL (
         SELECT string_agg(tag, ', ') AS tags, count(*) AS tag_count
         FROM (
             SELECT unnest(words) tag
             INTERSECT ALL
             SELECT unnest('{today, likes, eat}'::text[])
         ) i
       ) ct
WHERE  t.words && '{today, likes, eat}';
db<>fiddle here
Can be supported with a GIN index. An expression index for the text column:
CREATE INDEX tbl_words_gin_idx ON tbl USING gin (string_to_array(words, ' '));
Simpler, yet again, for text[]:
CREATE INDEX tbl1_words_gin_idx ON tbl1 USING gin (words);
See:
Query and order by number of matches in JSON array
I have a column in a table; the data type of the column is varchar, but it contains an array of tuples. What I need is to extract the first value of the first tuple in the array for each row.
This is the original table:
userid | comments
1      | [["hello world",1],["How did you",1],[" this is the one",1]]
2      | [["hello ",1],["How ",1],[" this",1]]
And this is what I am looking for; please note that the data type of the comments column is varchar:
userid | comments
1      | hello world
2      | hello
json_extract_scalar should do the trick:
WITH dataset (userid, comments) AS (
    VALUES (1, json '[["hello world",1],["How did you",1],[" this is the one",1]]'),
           (2, json '[["hello ",1],["How ",1],[" this",1]]')
)
--query
select userid,
       json_extract_scalar(comments, '$[0][0]') comments
from dataset
Output:
userid | comments
1      | hello world
2      | hello
Note that this extracts only a single value; if you want multiple values you will need to do some casting (similar to the map cast shown earlier, but using arrays, for example array(json)).
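For example, a minimal sketch of that array route, reusing the dataset above (for the real varchar column you would wrap it in json_parse first); Presto's transform() takes the first string of every tuple, and the first_strings alias is just illustrative:

select userid,
       -- cast the array of tuples to array(json), then pull element 0 of each tuple
       transform(cast(comments as array(json)),
                 t -> json_extract_scalar(t, '$[0]')) first_strings
from dataset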
For some context, I have a table in SQLite3 (columns include breed, imgsrc, dog_group_1, and so on).
What I am looking to do is merge rows with the same breed. The same column will never be populated in both rows. So far I have tried this kind of query, but it doesn't really do the job: it will not deduplicate or merge the rows as desired. It also seems difficult to generalise to all columns without manually typing out each column name.
select distinct t1.breed, coalesce(t1.dog_group_1, t2.dog_group_1)
from breed_merge t1
left join breed_merge t2 on t1.breed = t2.breed;
Output:
Afador|
Affenhuahua|
Affenpinscher|
Affenpinscher|GROUP 1 - TOYS
Afghan Hound|
Afghan Hound|GROUP 4 - HOUNDS
...
Desired output:
Afador|
Affenhuahua|
Affenpinscher|GROUP 1 - TOYS
Afghan Hound|GROUP 4 - HOUNDS
...
For this sample data, where you have at most 2 rows for each breed and each column of these rows contains either a value or null, all you have to do is group by breed and use an aggregate function like MAX() for each of the other columns:
SELECT breed, MAX(imgsrc) imgsrc, MAX(dog_group_1) dog_group_1, .....
FROM breed_merge
GROUP BY breed
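A self-contained sketch with a hypothetical, cut-down version of the table (imgsrc and dog_group_1 are taken from the queries above; the real table has more columns and the file names are made up):

-- hypothetical reduced breed_merge table
CREATE TABLE breed_merge (breed TEXT, imgsrc TEXT, dog_group_1 TEXT);
INSERT INTO breed_merge VALUES
    ('Affenpinscher', 'affenpinscher.jpg', NULL),
    ('Affenpinscher', NULL, 'GROUP 1 - TOYS'),
    ('Afador', 'afador.jpg', NULL);

-- MAX() ignores NULLs, so each breed collapses to one row keeping the non-null values
SELECT breed, MAX(imgsrc) imgsrc, MAX(dog_group_1) dog_group_1
FROM breed_merge
GROUP BY breed;
-- Afador|afador.jpg|
-- Affenpinscher|affenpinscher.jpg|GROUP 1 - TOYS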
I have a DB of people with a jsonb column interests. In my application, users can search for people by providing their hobbies, which are a set of predefined values. I want to offer them the best match, and to do so I would like to score each match as intersection/union of interests. This way the top results won't be the people who simply have plenty of hobbies in my DB.
Example:
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["knitting"]
user input in app:
["reading", "swimming", "knitting", "cars"]
my script should output this:
Mary 0.4
John 0.2
Ann 0.16667
Carl 0.25
Now I'm using
SELECT name
FROM people
WHERE interests @> ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
but this also returns records with many interests, and gives me no way to order them.
Is there any way I can achieve it in a reasonable time - let's say up to 5 seconds in DB with around 400K records?
EDIT:
I added another example to clarify my calculation. It needs to penalise people with many hobbies, therefore the match should be calculated as Intersection(input, db_record) / Union(input, db_record).
Example:
input = ["reading"]
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["reading"]
The match for Mary would be calculated as LENGTH(["reading"]) / LENGTH(["swimming","reading","jogging"]), which is 0.3333,
and for Carl it would be LENGTH(["reading"]) / LENGTH(["reading"]), which is 1.
UPDATE: I managed to do it with
SELECT result.id, result.name,
       result.overlap_count / (jsonb_array_length(persons.interests) + 4 - result.overlap_count)::decimal AS score
FROM (
    SELECT t1.name AS name, t1.id, COUNT(t1.name) AS overlap_count
    FROM (
        SELECT name, id, jsonb_array_elements(interests)
        FROM persons
    ) AS t1
    JOIN (
        SELECT unnest(ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"'])::jsonb AS elements
    ) AS t2 ON t1.jsonb_array_elements = t2.elements
    GROUP BY t1.name, t1.id
) AS result
JOIN persons ON result.id = persons.id
ORDER BY score DESC
Here's my fiddle https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b4b1760854b2d77a1c7e6011d074a1a3
However it's not fast enough and I would appreciate any improvements.
One option is to unnest the parameter and use the ? operator to check each and every element against the jsonb array:
select t.name,
       x.match_ratio
from mytable t
cross join lateral (
    select avg( (t.interests ? a.val)::int ) match_ratio
    from unnest(array['reading', 'swimming', 'knitting', 'cars']) a(val)
) x
It is not very clear what the rules behind the result you are showing are. This gives you a ratio representing the percentage of values in the parameter array that can be found in the interests of each person (so Mary gets 0.5, since she has two of the four search values among her interests, and all other names get 0.25).
Demo on DB Fiddle
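If you want the intersection/union score from the question instead, the same lateral trick can be adapted. A sketch, assuming interests holds distinct scalar values, hardcoding the input length 4, and using |A ∪ B| = |A| + |B| - |A ∩ B|:

select t.name,
       -- overlap = |intersection|; denominator = |interests| + |input| - overlap = |union|
       x.overlap / (jsonb_array_length(t.interests) + 4 - x.overlap)::decimal as score
from mytable t
cross join lateral (
    select sum( (t.interests ? a.val)::int ) as overlap
    from unnest(array['reading', 'swimming', 'knitting', 'cars']) a(val)
) x
order by score desc

For the sample data this yields 0.4 for Mary, 0.2 for John, 0.16667 for Ann and 0.25 for Carl, matching the desired output.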
One option would be using jsonb_array_elements() to unnest the jsonb column:
SELECT name, count::decimal / SUM(count) OVER () AS ratio
FROM (
    SELECT name, COUNT(name) AS count
    FROM people
    JOIN jsonb_array_elements(interests) AS j(elm) ON TRUE
    WHERE interests @> ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
    GROUP BY name
) q
Demo
I am using Oracle and I have a table with 1000 rows. There is a last name field, and I want to know the lengths of the name values, but not one per row; I want a count of the various lengths.
Example:
lastname:
smith
smith
Johnson
Johnson
Jackson
Baggins
There are two Smiths with a length of five, and four others with a length of seven. I want my query to return:
7
5
If there were 1,000 names I'd expect to get all kinds of lengths.
I tried:
Select count(*) as total, lastname from myNames group by total
It didn't know what total was. Grouping by lastname just groups each distinct name together, which is expected but not what I need.
Can this be done in one SQL query?
SELECT Length(lastname)
FROM MyTable
GROUP BY Length(lastname)
SELECT DISTINCT LENGTH(lastname) FROM mynames;
SELECT COUNT(*), LENGTH(column_name) FROM table_name GROUP BY LENGTH(column_name);
This will work for the different lengths in a single column.
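Applied to the sample data (assuming the table and column are myNames.lastname), a run would look like:

SELECT COUNT(*) AS total, LENGTH(lastname) AS name_length
FROM myNames
GROUP BY LENGTH(lastname)
ORDER BY name_length;

--      TOTAL NAME_LENGTH
-- ---------- -----------
--          2           5
--          4           7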