Getting JSON Array elements as records - sql

I'm trying to extract elements from a simple JSON array in a PostgreSQL table like:
id (int)  vtags (jsonb)
--------  -----------------------
1         {"tags": ["a","b","c"]}
2         {"tags": ["x","y"]}
I would like to devise a SELECT statement to produce an output like:
id  tags
--  ----
1   a
1   b
1   c
2   x
2   y

Use jsonb_array_elements_text() to unnest the elements of the array as text:
select t.id,
       jt.tag
from the_table t
cross join jsonb_array_elements_text(t.vtags -> 'tags') as jt(tag)
order by t.id;
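If you need the elements as jsonb values rather than text (for further JSON processing, say), the jsonb-returning variant works the same way; a minimal sketch against the same table:
select t.id,
       jt.tag   -- jsonb value, e.g. "a" (quoted), rather than plain text
from the_table t
cross join jsonb_array_elements(t.vtags -> 'tags') as jt(tag)
order by t.id;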


Get rows which don't satisfy any of the below conditions

I have a table with an id column and different values. I want my output to look like this:
id  value
--  -----
1   t
1   f
2   t
3   f
4   f
4   f
Expected output:
id  value
--  -----
3   f
4   f
Looking at the output, the condition is: if an id has all 'f' values, return it with 'f'; if it has all 't' values, don't return it; and if even one of an id's values is 't', don't include that id in the output either.
How can I achieve this?
Create a subquery and exclude the ids accordingly; I think HiveQL supports WHERE-clause subqueries:
select id, value
from your_data_source
where id not in (select id
                 from your_data_source
                 where value = 't'
                 group by id)
group by id, value;
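An alternative sketch using conditional aggregation, which avoids the NOT IN subquery (assuming the same your_data_source table; untested):
-- keep only ids whose values are all 'f'
select id, min(value) as value
from your_data_source
group by id
having max(case when value = 't' then 1 else 0 end) = 0;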

Resultset rows into array Json [PrestoDB]

I have a table with multiple rows per id. I want to convert each row into an entry in an array of key-value maps in PrestoDB, using SQL.
id  col1  col2  col3
--  ----  ----  ----
1   2ad   ff    sdfs
1   asf   erew  dsds
1   vfdv  dfds  sdf
and I want the output to be something like this:
id  value
--  -----
1   {{'col1':'2ad','col2':'ff','col3':'sdfs'},{'col1':'asf','col2':'erew','col3':'dsds'},{'col1':'vfdv','col2':'dfds','col3':'sdf'}}
...
With the query below I am almost able to achieve it:
select id,
       CAST(MAP(ARRAY['col1','col2','col3'],
                ARRAY[k."col1", k."col2", k."col3"]) AS JSON) as tt
from table k
order by 1;
| id | value                                      |
|----|--------------------------------------------|
| 1  | {'col1':'2ad','col2':'ff','col3':'sdfs'}   |
| 1  | {'col1':'asf','col2':'erew','col3':'dsds'} |
| 1  | {'col1':'vfdv','col2':'dfds','col3':'sdf'} |
| ...| ...                                        |
but I am still not able to concatenate based on id, as array_agg seemed to only work on strings, and I don't know how to proceed.
With the below query I was able to achieve it:
select mm.id, CAST(array_agg(mm.tt) AS JSON)
from (select nn."id" as id, nn.tt
      from (select k."id",
                   CAST(MAP(ARRAY['col1','col2','col3'],
                            ARRAY[k."col1", k."col2", k."col3"]) AS JSON) as tt
            from table k
            order by 1) nn
      group by 1, 2) mm
group by 1;
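The intermediate levels look avoidable; a flattened sketch of the same idea (untested, same placeholder table name as above):
select k.id,
       CAST(array_agg(CAST(MAP(ARRAY['col1','col2','col3'],
                               ARRAY[k."col1", k."col2", k."col3"]) AS JSON)) AS JSON) as value
from table k
group by k.id;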

Count matches between multiple columns and words in a nested array

My earlier question was resolved. Now I need to develop a related, but more complex query.
I have a table like this:
id   description  additional_info
---  -----------  ---------------
123  games        XYD
124  Festivals    sport swim
And I need to count matches to arrays like this:
array_content varchar[] := {"Festivals,games","sport,swim"}
If either of the columns description and additional_info contains any of the tags separated by a comma, we count that as 1. So each array element (consisting of multiple words) can only contribute 1 to the total count.
The result for the above example should be:
id  RID  Matches
--  ---  -------
1   123  1
2   124  2
The answer isn't simple, but figuring out what you are asking was harder:
SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM   tbl t
LEFT   JOIN (
       unnest(array_content) WITH ORDINALITY x(elem, ord)
       CROSS JOIN LATERAL
       unnest(string_to_array(elem, ',')) txt
       ) a ON t.description ~ a.txt
           OR t.additional_info ~ a.txt
GROUP  BY t.id;
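For reference, a minimal setup to test against (table and column names taken from the question):
CREATE TABLE tbl (id int, description text, additional_info text);
INSERT INTO tbl VALUES
  (123, 'games',     'XYD')
, (124, 'Festivals', 'sport swim');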
Produces your desired result exactly.
array_content is your array of search terms.
How does this work?
Each array element of the outer array in your search term is a comma-separated list. Decompose the odd construct by unnesting twice (after transforming each element of the outer array into another array). Example:
SELECT *
FROM unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
CROSS JOIN LATERAL
unnest(string_to_array(elem, ',')) txt;
Result:
      elem       | ord |    txt
-----------------+-----+-----------
 Festivals,games |   1 | Festivals
 Festivals,games |   1 | games
 sport,swim      |   2 | sport
 sport,swim      |   2 | swim
Since you want to count matches for each outer array element once, we generate a unique number on the fly with WITH ORDINALITY. Details:
PostgreSQL unnest() with element number
Now we can LEFT JOIN to this derived table on the condition of a desired match:
... ON t.description ~ a.txt
OR t.additional_info ~ a.txt
... and get the count with count(DISTINCT a.ord), counting each array element only once, even if multiple of its search terms match.
Finally, I added the mysterious id in your result with row_number() OVER (ORDER BY t.id) AS id - assuming it's supposed to be a serial number. Voilà.
The same considerations for regular expression matches (~) as in your previous question apply:
Postgres query to calculate matching strings

How to get an array in postgres where the array size is greater than 1

I have a table that looks like this:
val | fkey | num
----+------+----
  1 |    1 |   1
  1 |    2 |   1
  1 |    3 |   1
  2 |    3 |   1
What I would like to do is return a set of rows in which values are grouped by 'val', with an array of fkeys, but only where the array of fkeys contains more than one element. So, in the above example, the return would look something like:
1 | [1,2,3]
I have the following query, which aggregates the arrays:
SELECT val, array_agg(fkey)
FROM mytable
GROUP BY val;
But this returns something like:
1 | [1,2,3]
2 | [3]
What would be the best way of doing this? I guess one possibility would be to use my existing query as a subquery, and do a sum / count on that, but that seems inefficient. Any feedback would really help!
Use a HAVING clause to filter out the groups that have more than one fkey:
SELECT val, array_agg(fkey)
FROM mytable
GROUP BY val
HAVING count(fkey) > 1;
Using the HAVING clause as #Fireblade pointed out is probably more efficient, but you can also leverage subqueries:
SQLFiddle: Subquery
SELECT *
FROM (
  SELECT val, array_agg(fkey) fkeys
  FROM mytable
  GROUP BY val
) array_creation
WHERE array_length(fkeys, 1) > 1
You could also use the array_length function in the HAVING clause, but again, #Fireblade has used count(), which should be more efficient. Still:
SQLFiddle: Having Clause
SELECT val, array_agg(fkey) fkeys
FROM mytable
GROUP BY val
HAVING array_length(array_agg(fkey),1) > 1
This isn't a total loss, though. Using the array_length in the having can be useful if you want a distinct list of fkeys:
SELECT val, array_agg(DISTINCT fkey) fkeys
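Spelled out in full, that distinct variant might look like this (a sketch combining the pieces above):
SELECT val, array_agg(DISTINCT fkey) fkeys
FROM mytable
GROUP BY val
HAVING array_length(array_agg(DISTINCT fkey), 1) > 1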
There may still be other ways, but this method is more descriptive, which may make your SQL easier to understand when you come back to it years from now.

Ordering by list of strings in Oracle SQL without LISTAGG

I'm working with two entities: Item and Attribute, which look something like the following:
Item
----
itemId
Attribute
---------
attributeId
name
An Item has Attributes, as specified in an association table:
ItemAttribute
--------------
itemId
attributeId
When this data gets to the client, it will be displayed with a row per Item, and each row will have a list of Attributes by name. For example:
Item  Attributes
----  ----------
1     A, B, C
2     A, C
3     A, B
The user will have the option to sort on the Attributes column, so we need the ability to sort the data as follows:
Item  Attributes
----  ----------
3     A, B
1     A, B, C
2     A, C
At present, we're getting one row of data per ItemAttribute row. Basically:
SELECT Item.itemId,
       Attribute.name
FROM Item
JOIN ItemAttribute
  ON ItemAttribute.itemId = Item.itemId
JOIN Attribute
  ON Attribute.attributeId = ItemAttribute.attributeId
ORDER BY Item.itemId;
Which produces a result like:
itemId  name
------  ----
1       A
1       B
1       C
2       A
2       C
3       A
3       B
The actual ORDER BY clause is based on user input. It's usually a single column, so the ordering is simple, and the app-side loop that processes the result set combines the Attribute names into a comma-separated list for presentation on the client. But when the user asks to sort on that list, it'd be nice to have Oracle sort the results so that -- using the example above -- we'd get:
itemId  name
------  ----
3       A
3       B
1       A
1       B
1       C
2       A
2       C
Oracle's LISTAGG function can be used to generate the attribute lists prior to sorting; however Attribute.name can be a very long string, and it is possible that the combined list is greater than 4000 characters, which would cause the query to fail.
Is there a clean, efficient way to sort the data in this manner using Oracle SQL (11gR2)?
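For reference, the straightforward LISTAGG approach mentioned above would look something like this (a sketch; it fails once the concatenated list exceeds the 4000-character VARCHAR2 limit):
SELECT Item.itemId,
       LISTAGG(Attribute.name, ', ')
         WITHIN GROUP (ORDER BY Attribute.name) AS attributes
FROM Item
JOIN ItemAttribute
  ON ItemAttribute.itemId = Item.itemId
JOIN Attribute
  ON Attribute.attributeId = ItemAttribute.attributeId
GROUP BY Item.itemId
ORDER BY attributes;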
There are really two questions here:
1) How to aggregate more than 4000 characters of data
Is it even sensible to aggregate so much data and display it in a single column?
Anyway, you will need some sort of large structure to display more than 4000 characters, like a CLOB for example. You could write your own aggregation method following the general guidelines described in one of Tom Kyte's threads (obviously you would need to modify it so that the final output is a CLOB).
I will demonstrate a simpler method with a nested table and a custom function (works on 10g):
SQL> CREATE TYPE tab_varchar2 AS TABLE OF VARCHAR2(4000);
2 /
Type created.
SQL> CREATE OR REPLACE FUNCTION concat_array(p tab_varchar2) RETURN CLOB IS
2 l_result CLOB;
3 BEGIN
4 FOR cc IN (SELECT column_value FROM TABLE(p) ORDER BY column_value) LOOP
5 l_result := l_result ||' '|| cc.column_value;
6 END LOOP;
7 return l_result;
8 END;
9 /
Function created.
SQL> SELECT item,
2 concat_array(CAST (collect(attribute) AS tab_varchar2)) attributes
3 FROM data
4 GROUP BY item;
ITEM  ATTRIBUTES
----  ----------
1     a b c
2     a c
3     a b
2) How to sort large data
Unfortunately you can't sort by an arbitrarily large column in Oracle: there are known limitations on the type and the length of the sort key.
Trying to sort with a clob will result in an ORA-00932: inconsistent datatypes: expected - got CLOB.
Trying to sort with a key larger than the database block size (if you decide to split your large data into many VARCHAR2 columns, for example) will yield an ORA-06502: PL/SQL: numeric or value error: character string buffer too small.
I suggest you sort by the first 4000 bytes of the attributes column:
SQL> SELECT * FROM (
2 SELECT item,
3 concat_array(CAST (collect(attribute) AS tab_varchar2)) attributes
4 FROM data
5 GROUP BY item
6 ) order by dbms_lob.substr(attributes, 4000, 1);
ITEM  ATTRIBUTES
----  ----------
3     a b
1     a b c
2     a c
As Vincent already said, sort keys are limited (no CLOBs, max block size).
I can offer a slightly different solution which works out of the box in 10g and newer, without the need for a custom function and type, using XMLAgg:
with ItemAttribute as (
  select 'name'||level name
       , mod(level, 3) itemid
  from dual
  connect by level < 2000
)
, ItemAttributeGrouped as (
  select xmlagg(xmlparse(content name||' ' wellformed) order by name).getclobval() attributes
       , itemid
  from ItemAttribute
  group by itemid
)
select itemid
     , attributes
     , dbms_lob.substr(attributes, 4000, 1) sortkey
from ItemAttributeGrouped
order by dbms_lob.substr(attributes, 4000, 1);
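Adapted to the question's Item / ItemAttribute / Attribute tables, the aggregation step might look like this (a sketch, untested):
select itemId, attributes
from (
  select i.itemId,
         xmlagg(xmlparse(content a.name||' ' wellformed)
                order by a.name).getclobval() attributes
  from Item i
  join ItemAttribute ia on ia.itemId = i.itemId
  join Attribute a on a.attributeId = ia.attributeId
  group by i.itemId
)
order by dbms_lob.substr(attributes, 4000, 1);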
Clean is subjective, and efficiency would need to be checked (but it's still only hitting the tables once, so it probably shouldn't be any worse), but if you have a finite upper limit on the number of attributes any item might have - or at least on how many you have to consider for ordering - then you could use multiple LEAD calls to do this:
SELECT itemId, name
FROM (
  SELECT itemId, name, min(dr) over (partition by itemId) as dr
  FROM (
    SELECT itemId, name,
           dense_rank() over (order by name, name1, name2, name3, name4) as dr
    FROM (
      SELECT Item.itemId,
             Attribute.name,
             LEAD(Attribute.name, 1) OVER (PARTITION BY Item.itemId
                                           ORDER BY Attribute.name) AS name1,
             LEAD(Attribute.name, 2) OVER (PARTITION BY Item.itemId
                                           ORDER BY Attribute.name) AS name2,
             LEAD(Attribute.name, 3) OVER (PARTITION BY Item.itemId
                                           ORDER BY Attribute.name) AS name3,
             LEAD(Attribute.name, 4) OVER (PARTITION BY Item.itemId
                                           ORDER BY Attribute.name) AS name4
      FROM Item
      JOIN ItemAttribute
        ON ItemAttribute.itemId = Item.itemId
      JOIN Attribute
        ON Attribute.attributeId = ItemAttribute.attributeId
    )
  )
)
ORDER BY dr, name;
So, the innermost query gets the two values you care about and uses four LEAD calls (just as an example; this sorts on at most the first five attribute names, but it could of course be extended by adding more) to get a picture of what else each item has. With your data this gives:
    ITEMID NAME       NAME1      NAME2      NAME3      NAME4
---------- ---------- ---------- ---------- ---------- ----------
         1 A          B          C
         1 B          C
         1 C
         2 A          C
         2 C
         3 A          B
         3 B
The next query out does a dense_rank over those five ordered attribute names, which assigns a rank to each itemID and name, giving:
    ITEMID NAME               DR
---------- ---------- ----------
         1 A                   1
         1 B                   4
         1 C                   6
         2 A                   3
         2 C                   6
         3 A                   2
         3 B                   5
The next query out finds the minimum of those calculated dr values for each itemId, using the analytic version of min, so itemId=1 gets min(dr) = 1, itemId=2 gets 3, and itemId=3 gets 2. (You could combine those two levels by selecting min(dense_rank(...)), but that's (even) less clear.)
The final outer query uses that minimum rank for each item to do the actual ordering, giving:
    ITEMID NAME
---------- ----------
         1 A
         1 B
         1 C
         3 A
         3 B
         2 A
         2 C