How can I assign pre-determined codes (1,2,3, etc,) to a JSON-type column in PostgreSQL? - sql

I'm extracting a table of 2000+ rows which are park details. One of the columns is JSON type. Image of the table
We have about 15 attributes like this, and we also have documentation of pre-determined codes assigned to each attribute.
Each row in the extracted table has a different set of attributes, as you can see in the image. Right now, I have cast(parks.services AS text) AS "details" to get all the attributes for a particular park, or I extract just one of them using the code below:
CASE
WHEN cast(parks.services AS text) LIKE '%uncovered%' THEN '2'
WHEN cast(parks.services AS text) LIKE '%{covered%' THEN '1' END AS "details"
This time around, I need to extract these attributes by assigning them the codes. As an example, let's just say
Park 1 - {covered, handicap_access, elevator} to be {1,3,7}
Park 2 - {uncovered, always_open, handicap_access} to be {2,5,3}
I have thought of using a subquery to pre-assign the codes, but I cannot wrap my head around the JSON operators - in fact, I don't know how to extract them across 2000+ rows.
It would be helpful if someone could guide me in this topic. Thanks a lot!

You should really think about normalizing your tables. Don't store arrays. You should add a mapping table to map the parks and the attribute codes. This makes everything much easier and more performant.
step-by-step demo:db<>fiddle
SELECT
t.name,
array_agg(c.code ORDER BY elems.index) as codes -- 3
FROM mytable t,
unnest(attributes) WITH ORDINALITY as elems(value, index) -- 1
JOIN codes c ON c.name = elems.value -- 2
GROUP BY t.name
1. Extract the array elements into one record per element. Add WITH ORDINALITY to preserve the original order.
2. Join your codes on the elements.
3. Create the code arrays. To ensure the correct order, use the index values created by the WITH ORDINALITY clause.
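For reference, the normalized layout the query above assumes could be set up like this (a minimal sketch; the table names match the query, and the code values are taken from the example in the question):

```sql
-- Lookup table: one row per attribute name and its pre-determined code
CREATE TABLE codes (
    name text PRIMARY KEY,
    code int NOT NULL
);

INSERT INTO codes (name, code) VALUES
    ('covered', 1), ('uncovered', 2), ('handicap_access', 3),
    ('always_open', 5), ('elevator', 7);

-- Parks with their attributes as a text[] column (the form unnest() expects)
CREATE TABLE mytable (
    name text PRIMARY KEY,
    attributes text[]
);

INSERT INTO mytable (name, attributes) VALUES
    ('Park 1', '{covered,handicap_access,elevator}'),
    ('Park 2', '{uncovered,always_open,handicap_access}');
```

With this data, the query above should return {1,3,7} for Park 1 and {2,5,3} for Park 2, matching the expected output in the question.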


How to Query JSON Within A Database

I would like to query information from databases that were created in this format:
index | label | key | data
1 | sneaker | UPC | {"size": "value", "color": "value", "location": "shelf2"}
2 | location | shelf2 | {"height": "value", "row": "value", "column": "value"}
Where a large portion of the data is in one cell, stored as JSON. To make matters a bit tricky, the attributes in the JSON aren't in any particular order, and sometimes they reference other cells. I.e., in the above example there is a "location" attribute which has more data in another row. Additionally, sometimes the data cell is a multidimensional array where values are nested inside another JSON array.
I’m seeking to do certain query tasks like
Find all locations that have a sneaker
Or find all sneakers with a particular color etc
What’s the industry accepted solution on how to do this?
These are sqlite databases that I’m currently using DB Browser for SQLite to query. Definitely open to better solutions if they exist.
The design that you have needs SQLite's JSON1 extension.
The tasks that you mention in your question can be accomplished with the use of functions like json_extract().
Find all locations that have a sneaker
SELECT t1.*
FROM tablename t1
WHERE t1.label = 'location'
AND EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.label = 'sneaker'
AND json_extract(t2.data, '$.location') = t1.key
)
Find all sneakers with a particular color
SELECT *
FROM tablename
WHERE label = 'sneaker'
AND json_extract(data, '$.color') = 'blue'
See the demo.
For more complicated tasks, such as getting values out of json arrays there are other functions like json_each().
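For instance, a sketch of how json_each() could expand such a nested array (the 'sizes' key and its contents are hypothetical, not from the question's data):

```sql
-- Expand a hypothetical JSON array under $.sizes into one row per element
SELECT t.label, j.value
FROM tablename t, json_each(t.data, '$.sizes') AS j
WHERE t.label = 'sneaker';
```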

function to sum all first value of Results SQL

I have a table with "Number", "Name" and "Result" columns. Result is a 2D text array, and I need to create a column with the name "Average" that sums all the first values of the Result array and divides by 2. Can somebody help me, please? I must use CREATE FUNCTION for this. It looks like this:
Table1
Number | Name | Result | Average
01 | Kevin | {{2.0,10},{3.0,50}} | 2.5
02 | Max | {{1.0,10},{4.0,30},{5.0,20}} | 5.0

Average = (2.0 + 3.0) / 2 = 2.5
Average = (1.0 + 4.0 + 5.0) / 2 = 5.0
First of all: you should always avoid storing arrays in a table (or only generate them in a subquery if absolutely necessary). Normalize it; it makes life much easier in nearly every single use case.
Second: you should avoid multi-dimensional arrays. They are very hard to handle. See Unnest array by one level
However, in your special case you could do something like this:
demo:db<>fiddle
SELECT
number,
name,
SUM(value) FILTER (WHERE idx % 2 = 1) / 2 -- 2
FROM mytable,
unnest(avg_result) WITH ORDINALITY as elements(value, idx) -- 1
GROUP BY number, name
unnest() expands the array elements into one element per record. But this is not a one-level expansion: it expands ALL elements in depth. To keep track of the elements, you can add an index using WITH ORDINALITY.
Because you have nested two-element arrays, the unnested data can be used as follows: you want to sum the first of every two elements, which is every second (the odd-indexed) element. Using the FILTER clause in the aggregation lets you aggregate exactly these elements.
However: if that array was itself the result of a subquery, you should think about doing the operation BEFORE the array aggregation (if the aggregation is really necessary). That makes things easier.
Assumptions:
number column is Primary key.
result column is text or varchar type
Here are the steps for your requirements:
Add the column in your table using following query (you can skip this step if column is already added)
alter table table1 add column average decimal;
Update the calculated value by using below query:
update table1 t1
set average = t2.value_
from
(
select
number,
sum(t::decimal)/2 as value_
from table1
cross join lateral unnest((result::text[][])[1:999][1]) as t
group by 1
) t2
where t1.number=t2.number
Explanation: Here unnest((result::text[][])[1:999][1]) will return the first value of each child array (considering you can have up to 999 child arrays in your 2D array. You can increase or decrease it as per your requirement)
DEMO
Now you can create your function as per your requirement with above query.
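A sketch of what that function could look like, wrapping the unnest logic from the update query above (the function name is made up):

```sql
-- Hypothetical helper: sum of the first element of each child array, divided by 2
CREATE OR REPLACE FUNCTION first_values_average(result text)
RETURNS decimal AS $$
    SELECT sum(t::decimal) / 2
    FROM unnest((result::text[][])[1:999][1]) AS t;
$$ LANGUAGE sql;

-- The update then becomes:
UPDATE table1 SET average = first_values_average(result);
```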

SQL function for returning multiple XPath hits

I have a huge table with a column which contains large XML documents. I want to get all the values of a particular attribute name (Surname), occurring at any point in any of the XML values.
Currently I have this query...
select distinct XmlCol.value('(//@Surname)[1]','varchar(200)') from (
select * from MyTable
) t
...it grabs the first occurrence of my desired attribute in each entry of the XML column, however as it only grabs the first, there may be any number of attributes appearing after that occurrence, in the same XML value.
The value() function only works with a single result, hence why I need to provide the [1] specifying return the first hit.
Is there a way to repeat this function to get all the hits in a piece of XML, or is there another function which takes an XPath and can return multiple values?
Illustrated example
In case above is not clear, a simple example would be if MyTable had a single XmlCol column, with just 2 rows.
Row 1
<SimpleXML>
<ArbitraryElement Surname="Smith"/>
<ArbitraryElement>
<ArbitraryInnerElement Surname="Bauer"/>
</ArbitraryElement>
</SimpleXML>
Row 2
<SimpleXML Surname="Bond">
</SimpleXML>
Note the attribute appears at different locations and in different elements, I want it to work with any amount of nested elements.
Currently my method only hits the first element per XML entry, so gives the output:
Smith, Bond
I'd like it to return an arbitrary amount per entry, meaning the result should be:
Smith, Bauer, Bond
You would want to use CROSS APPLY to achieve this.
select distinct [Column].value('.', 'varchar(max)') as [Value]
from MyTable
CROSS APPLY MyTable.XmlCol.nodes('//@Surname') as [Table]([Column])

How to calculate TF-IDF in OracleSQL?

This is a text mining project. The purpose of this project is to see how every word weighs differently in a different document.
Now I am having two tables, one table with TF information (WORD | WordFrequency_in_EachFile), another table with IDF (WORD | HowManyFile_have_EachWord). I am not sure what query use for this calculation.
The math I am trying to do here is:
WordFrequency_in_EachFile*(log(N/HowManyFile_have_EachWord)+1)
N is the total number of document.
Below is my code:
create table TF_IDF (WORD, TF*IDF) as
select A.frequency*((log(10,132366/B.totalcount)+1))
from term_frequency A, document_frequency B
where A.WORD=B.WORD;
Here 132366 is the total number of my documents, and totalcount is the number of documents a word shows up in.
Since I am new to SQL, I would appreciate a little explanation to your code. Thanks a lot!
The calculation looks good, but there is some invalid syntax.
A correct variant may look like this:
create table TF_IDF as
select
A.Word as Word,
A.frequency*( log(10, 132366/B.totalcount) + 1) as TFIDF
from
term_frequency A,
document_frequency B
where
A.WORD=B.WORD
;
In a CREATE ... AS SELECT ... statement you don't need column specifications; column names and types are derived from the field aliases.
Also, you must provide a value for the Word column in the new table.
And one more point: there was one excess pair of parentheses in the expression.
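As a quick sanity check of the formula with made-up numbers (a word appearing 5 times in a file and present in 100 of the 132366 documents):

```sql
-- Oracle's log(10, x) is log base 10
SELECT 5 * (log(10, 132366 / 100) + 1) AS tfidf FROM dual;
-- roughly 20.61
```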

Find a string within an array column in PostgreSQL

I have built a series of views in a PostgreSQL database that includes a couple of array columns. The view definition is as follows:
create view articles_view as
(select articles.*,
array(select row(people.*)::people
from people
where articles.spubid=people.spubid and
people.stype='Author' and
bactive='t'
order by people.iorder) as authors,
array(select row(people.*)::people
from people
where articles.spubid=people.spubid and
people.stype='Editor' and
bactive='t'
order by people.iorder) as editors,
array(select row(people.*)::people
from people
where articles.spubid=people.spubid and
people.stype='Reviewer' and
bactive='t'
order by people.iorder) as reviewers,
array(select row(status.*)::status
from status
where articles.spubid=status.spubid and
bactive='t') as status
from articles
where articles.bactive='t');
Essentially what I want to do is an iLike on the 'author' column to determine if a specific user id exists in that array. Obviously I can't use iLike on that datatype so I need to find another approach.
Here is an example of data in the 'authors' array:
{"(2373,t,f,f,\"2011-08-01
11:57:40.696496\",/Pubs/pubs_edit_article.php,\"2011-08-09
15:36:29.281833\",000128343,A00592,Author,1,Nicholas,K.,Kreidberg,\"\",123456789,t,Admin,A,A,A,0,\"\")","(2374,t,f,f,\"2011-08-01
11:57:40.706617\",/Pubs/pubs_edit_article.php,\"2011-08-09
15:36:29.285428\",000128343,A00592,Author,2,John,D.,Doe,\"\",234567890,t,IT,A,A,A,0,\"\")","(2381,t,f,f,\"2011-08-09
14:45:14.870418\",000128343,\"2011-08-09
15:36:29.28854\",000128343,A00592,Author,3,Jane,E,Doe,\"\",345678901,t,Admin,A,A,A,,\"\")","(2383,t,f,f,\"2011-08-09
15:35:11.845283\",567890123,\"2011-08-09
15:36:29.291388\",000128343,A00592,Author,4,Test,T,Testerton,\"\",TestTesterton,f,N/A,A,A,A,,\"\")"}
What I want to be able to do is a query the view and find out if the string '123456789' (that is the user id assigned to Nicholas Kreidberg in the array) exists in the array. I don't care which user it is assigned to or where it appears in the array, all I need to know is if '123456789' shows up anywhere in the array.
Once I know how to write a query that determines if the condition above is true then my application will simply execute that query and if rows are returned it will know that the user id passed to the query is an author for that publication and proceed accordingly.
Thanks in advance for any insight that can be provided on this topic.
Might this:
select ...
from ...
where ...
and array_to_string(authors, ', ') like '%123456789%';
do the trick?
Otherwise, there is the unnest function...
The "Array Functions and Operators" chapter has more details.
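A sketch of that unnest approach against the view above (this assumes authors holds the composite array from the view definition; casting each element to text allows pattern matching on its row representation):

```sql
-- Expand the authors array and match each element's text representation
SELECT av.*
FROM articles_view av
WHERE EXISTS (
    SELECT 1
    FROM unnest(av.authors) AS a
    WHERE a::text LIKE '%123456789%'
);
```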
The ANY construct can do the job for you:
SELECT * FROM people WHERE '123456789' = ANY(authors);
Given people.authors is of type text[].