SQL function for returning multiple XPath hits - sql

I have a huge table with a column which contains large XML documents. I want to get all the values of a particular attribute name (Surname), occurring at any point in any of the XML values.
Currently I have this query...
select distinct XmlCol.value('(//@Surname)[1]','varchar(200)') from (
select * from MyTable
) as t
...it grabs the first occurrence of my desired attribute in each entry of the XML column. However, because it only grabs the first, it misses any further Surname attributes that appear later in the same XML value.
The value() function only works with a single result, which is why I have to add [1] to return only the first hit.
Is there a way to repeat this function to get all the hits in a piece of XML, or is there another function which takes an XPath and can return multiple values?
Illustrated example
In case the above is not clear, a simple example would be MyTable with a single XmlCol column and just 2 rows.
Row 1
<SimpleXML>
<ArbitraryElement Surname="Smith"/>
<ArbitraryElement>
<ArbitraryInnerElement Surname="Bauer"/>
</ArbitraryElement>
</SimpleXML>
Row 2
<SimpleXML Surname="Bond">
</SimpleXML>
Note that the attribute appears at different locations and in different elements; I want it to work with any level of nesting.
Currently my method only hits the first matching attribute per XML entry, so it gives the output:
Smith, Bond
I'd like it to return an arbitrary number of hits per entry, meaning the result should be:
Smith, Bauer, Bond

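You would want to use a CROSS APPLY with nodes() to achieve this. nodes() returns one row per matching attribute, and value() is then called on each of those rows rather than on the whole column.
select distinct [Table].[Column].value('.', 'varchar(max)') as [Value]
from MyTable
CROSS APPLY MyTable.XmlCol.nodes('(//@Surname)') as [Table]([Column])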
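
To make the expected result concrete, here is a minimal Python sketch (not SQL Server code) that mimics what the "any Surname attribute at any depth" XPath selects, using the two sample rows from the question. The function name all_surnames and the rows list are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The two sample rows from the question (whitespace removed).
rows = [
    '<SimpleXML><ArbitraryElement Surname="Smith"/>'
    '<ArbitraryElement><ArbitraryInnerElement Surname="Bauer"/></ArbitraryElement>'
    '</SimpleXML>',
    '<SimpleXML Surname="Bond"></SimpleXML>',
]

def all_surnames(xml_docs):
    """Collect every Surname attribute at any depth, one entry per hit."""
    found = []
    for doc in xml_docs:
        root = ET.fromstring(doc)
        # iter() visits the root and all of its descendants in document
        # order, which plays the role of the // axis in the XPath.
        for element in root.iter():
            if "Surname" in element.attrib:
                found.append(element.attrib["Surname"])
    return found

surnames = all_surnames(rows)          # one entry per attribute hit
distinct_surnames = sorted(set(surnames))  # what "select distinct" keeps
```

The inner loop corresponds to the extra rows that nodes() produces per XML value; the first-hit-only approach in the question is the equivalent of stopping that loop after its first match.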

Related

SQL query find few strings in different columns in a table row (restrictive)

I have a table like this one (in SQL Server):
field_name | field_descriptor | tag1 | tag2 | tag3 | tag4 | tag5
house | your home | home | house | null | null | null
car | first car | car | wheel | null | null | null
... | ... | ... | ... | ... | ... | ...
I'm developing a wiki with a search bar that should handle a query containing more than one search string. When a user enters a second string (separated by a space), the query should return only rows that match both strings (if any exist) in any column, and likewise for a three-string search.
This is easy to do for one string with a simple SELECT with ORs.
I tried doing it in the frontend in JS with libraries like match-sorter, but that is heavy for a table with more than 100,000 rows, and there will be more in the future.
I thought the query should do the heavy work, but maybe there is no simple way of doing it.
Thanks in advance!
I tried to do the heavy work in the frontend by loading all results and filtering with libraries like match-sorter. It works, but it takes several seconds and blocks the frontend.
I tried to create a simple OR/AND query, but the number of possibilities with 3 search strings (there could be 1, 2 or 3), each matching any column, is overwhelming.
You can use STRING_SPLIT to get a separate row per search word from the search words string. Then only select rows where all search words have a match.
The query should look like this:
select *
from mytable t
where exists
(
select null
from (select value from string_split(@search, ' ')) search
having min(case when search.value in (t.tag1, t.tag2, t.tag3, t.tag4, t.tag5) then 1 else 0 end) = 1
);
Unfortunately, SQL Server seems to have a flaw (or even a bug) here and reports:
Msg 8124 Level 16 State 1 Line 8
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Demo: https://dbfiddle.uk/kNL1PVOZ
I don't have more time at hand right now, so you may use this query as a starting point to get the final query.
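
As a reference for what the final query should return, here is a small Python sketch of the intended rule: a row qualifies only if every search word equals one of its tag values. It is not a replacement for the SQL; the helper names and the two sample rows (taken from the question's table) are only for illustration, and exact matching is assumed, as in the IN comparison above.

```python
def matches_all_words(search, tags):
    """True if every space-separated search word equals one of the tags."""
    words = search.split()
    present = {tag for tag in tags if tag is not None}  # ignore null tags
    return all(word in present for word in words)

def search_rows(search, rows):
    """Return the field_name of every row whose tags cover all search words."""
    return [row["field_name"] for row in rows
            if matches_all_words(search, row["tags"])]

# Sample rows from the question; None stands for SQL null.
rows = [
    {"field_name": "house", "tags": ["home", "house", None, None, None]},
    {"field_name": "car", "tags": ["car", "wheel", None, None, None]},
]

one_word = search_rows("house", rows)
two_words = search_rows("car wheel", rows)
no_match = search_rows("home car", rows)
```

The all() call plays the role of the min(case ... end) = 1 condition: a single search word without a matching tag is enough to reject the row.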

function to sum all first value of Results SQL

I have a table with the columns "Number", "Name" and "Result". Result is a 2D text array, and I need to create a column named "Average" that sums all the first values of the Result array and divides the sum by 2. Can somebody help me, please? I must use CREATE FUNCTION for this. It looks like this:
Table1
Number | Name | Result | Average
01 | Kevin | {{2.0,10},{3.0,50}} | 2.5
02 | Max | {{1.0,10},{4.0,30},{5.0,20}} | 5.0
Average = ((2.0+3.0)/2) = 2.5
= ((1.0+4.0+5.0)/2) = 5.0
First of all: You should avoid storing arrays in a table (or generating them in a subquery) unless it is really necessary. Normalize the data; it makes life much easier in nearly every use case.
Second: You should avoid multidimensional arrays. They are very hard to handle. See Unnest array by one level
However, in your special case you could do something like this:
demo:db<>fiddle
SELECT
number,
name,
SUM(value) FILTER (WHERE idx % 2 = 1) / 2 -- 2
FROM mytable,
unnest(avg_result) WITH ORDINALITY as elements(value, idx) -- 1
GROUP BY number, name
1. unnest() expands the array elements into one element per record. But this is not a one-level expansion: it expands ALL elements at every depth. To keep track of your elements, you can add an index using WITH ORDINALITY.
2. Because you have nested two-element arrays, the unnested data can be used as follows: you want to sum the first of every two elements, which are the elements at the odd index positions. Using the FILTER clause in the aggregation lets you aggregate exactly these elements.
However: If the array was the result of a subquery, you should think about doing the operation BEFORE the array aggregation (if the aggregation is really necessary). This makes things easier.
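
The odd-index trick can be checked outside the database with a short Python sketch. It flattens the nested array completely (as unnest() does), numbers the elements from 1 (as WITH ORDINALITY does) and sums only the odd positions. The function name is hypothetical, and the values are assumed to be numeric already.

```python
from itertools import chain

def average_of_first_values(result):
    """Sum the first value of each inner pair and divide by 2."""
    # Full flatten, like unnest(): [[2.0, 10], [3.0, 50]] -> [2.0, 10, 3.0, 50]
    flat = list(chain.from_iterable(result))
    # start=1 mirrors WITH ORDINALITY; odd positions hold the first
    # element of every two-element inner array.
    total = sum(value for idx, value in enumerate(flat, start=1) if idx % 2 == 1)
    return total / 2

kevin = average_of_first_values([[2.0, 10], [3.0, 50]])
max_ = average_of_first_values([[1.0, 10], [4.0, 30], [5.0, 20]])
```

This only works because every inner array has exactly two elements; with a different inner length the modulo condition would have to change accordingly.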
Assumptions:
number column is Primary key.
result column is text or varchar type
Here are the steps for your requirements:
Add the column in your table using following query (you can skip this step if column is already added)
alter table table1 add column average decimal;
Update the calculated value by using below query:
update table1 t1
set average = t2.value_
from
(
select
number,
sum(t::decimal)/2 as value_
from table1
cross join lateral unnest((result::text[][])[1:999][1]) as t
group by 1
) t2
where t1.number=t2.number
Explanation: Here unnest((result::text[][])[1:999][1]) will return the first value of each child array (assuming you have at most 999 child arrays in your 2D array; you can increase or decrease this bound as required).
DEMO
Now you can create your function as per your requirement with above query.

How can I assign pre-determined codes (1,2,3, etc,) to a JSON-type column in PostgreSQL?

I'm extracting a table of 2000+ rows which are park details. One of the columns is JSON type. Image of the table
We have about 15 attributes like this, and we also have documentation of the pre-determined codes assigned to each attribute.
Each row in the extracted table has a different set of attributes that you can see in the image. Right now, I have cast(parks.services AS text) AS "details" to get all the attributes for a particular park or extract just one of them using the code below:
CASE
WHEN cast(parks.services AS text) LIKE '%uncovered%' THEN '2'
WHEN cast(parks.services AS text) LIKE '%{covered%' THEN '1' END AS "details"
This time around, I need to extract these attributes by assigning them the codes. As an example, let's just say
Park 1 - {covered, handicap_access, elevator} to be {1,3,7}
Park 2 - {uncovered, always_open, handicap_access} to be {2,5,3}
I have thought of using subquery to pre-assign the codes, but I cannot wrap my head around JSON operators - in fact, I don't know how to extract them on 2000+ rows.
It would be helpful if someone could guide me in this topic. Thanks a lot!
You should really think about normalizing your tables. Don't store arrays. You should add a mapping table to map the parks and the attribute codes. This makes everything much easier and more performant.
step-by-step demo:db<>fiddle
SELECT
t.name,
array_agg(c.code ORDER BY elems.index) as codes -- 3
FROM mytable t,
unnest(attributes) WITH ORDINALITY as elems(value, index) -- 1
JOIN codes c ON c.name = elems.value -- 2
GROUP BY t.name
1. Extract the array elements into one record per element. Add WITH ORDINALITY to save the original order.
2. Join your codes on the elements.
3. Create the code arrays. To ensure the correct order, you can use the index values created by the WITH ORDINALITY clause.
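
The same three steps can be sketched in Python to show the expected output for the two example parks. The code values come from the example in the question; the dictionary and function names are assumptions for illustration only.

```python
# Codes taken from the example in the question; the real mapping would
# come from the documentation of pre-determined codes.
code_by_name = {
    "covered": 1,
    "uncovered": 2,
    "handicap_access": 3,
    "always_open": 5,
    "elevator": 7,
}

def to_codes(attributes, codes):
    """Map attribute names to codes, keeping the original order.

    Names without a code are dropped, like rows that find no partner
    in an inner join.
    """
    return [codes[name] for name in attributes if name in codes]

park_1 = to_codes(["covered", "handicap_access", "elevator"], code_by_name)
park_2 = to_codes(["uncovered", "always_open", "handicap_access"], code_by_name)
```

Iterating over the list in order is the counterpart of ORDER BY elems.index in the array_agg call.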

SQL: Automate the data extraction of categories only that have different character size

I want the following results from the dataset below. I tried using the LEFT function, but it will not work because the number of characters is different in every row. I have 741 rows like this, but I only need the rows with only categories; subcategories are not required.
Dining->Breakfast
Dining->Casual_Restaurants
Entertainment->Movie_Theaters
Entertainment->Professional_Sports_Venues
DATASET:
Dining->Breakfast
Dining->Breakfast->Casual_Restaurants
Dining->Breakfast->QSR_Restaurants
Dining->Breakfast->QSR_Restaurants->Chick_Fil_A
Dining->Casual_Restaurants
Dining->Casual_Restaurants_PIQonly
Dining->Casual_Restaurants->Applebees
Entertainment->Movie_Theaters
Entertainment->Movie_Theaters->AMC_Theaters
Entertainment->Movie_Theaters->Carmike_Cinema
Entertainment->Professional_Sports_Venues
Entertainment->Professional_Sports_Venues->MLB_Stadiums
Entertainment->Professional_Sports_Venues->MLS_Stadiums
One approach would be to exclude all records with more than 2 levels from your result:
create table #a (cat varchar(500))
insert into #a (cat)
values ('Dining->Breakfast')
,('Dining->Breakfast->Casual_Restaurants')
,('Dining->Breakfast->QSR_Restaurants')
,('Dining->Breakfast->QSR_Restaurants->Chick_Fil_A')
,('Dining->Casual_Restaurants')
,('Dining->Casual_Restaurants_PIQonly')
,('Dining->Casual_Restaurants->Applebees')
,('Entertainment->Movie_Theaters')
,('Entertainment->Movie_Theaters->AMC_Theaters')
,('Entertainment->Movie_Theaters->Carmike_Cinema')
,('Entertainment->Professional_Sports_Venues')
,('Entertainment->Professional_Sports_Venues->MLB_Stadiums')
,('Entertainment->Professional_Sports_Venues->MLS_Stadiums')
SELECT cat
FROM #a
WHERE cat NOT LIKE '%->%->%'
In the above query we use a LIKE pattern to exclude records that have more than 2 levels (identified by -> appearing at least twice in the string).
The below sample code uses CharIndex to find the start of the second '->' in the string. To ensure that there is always a second '->' it appends it to the end of the string.
This logic does presume that all categories contain at least 2 levels. If not then you will need to tweak the logic (possibly using a case statement to test for how many levels there are).
DECLARE @val varchar(200) = 'Dining->Breakfast->QSR_Restaurants->Chick_Fil_A'
SELECT LEFT(@val+'->', CHARINDEX('->', @val+'->', CHARINDEX('->', @val+'->', 0) + 2)-1)
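
Both ideas can be tried out quickly in Python before running them against all 741 rows. This is only a sketch of the logic, with made-up function names: the first function mirrors the NOT LIKE filter, the second mirrors cutting the string before the second '->'.

```python
def has_at_most_two_levels(cat):
    """Mirror of NOT LIKE '%->%->%': fewer than two '->' separators."""
    return cat.count("->") < 2

def first_two_levels(cat):
    """Keep everything before the second '->' (the whole string if absent)."""
    return "->".join(cat.split("->")[:2])

dataset = [
    "Dining->Breakfast",
    "Dining->Breakfast->Casual_Restaurants",
    "Dining->Breakfast->QSR_Restaurants->Chick_Fil_A",
    "Dining->Casual_Restaurants",
    "Dining->Casual_Restaurants_PIQonly",
    "Dining->Casual_Restaurants->Applebees",
    "Entertainment->Movie_Theaters",
    "Entertainment->Movie_Theaters->AMC_Theaters",
]

kept = [cat for cat in dataset if has_at_most_two_levels(cat)]
truncated = first_two_levels("Dining->Breakfast->QSR_Restaurants->Chick_Fil_A")
```

Note that the filter keeps Dining->Casual_Restaurants_PIQonly as well, because it has only two levels; if that row is unwanted it needs a separate condition.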

Working with the SQL Server XML data type

I've got a table which has an XML field.
The typical XML it contains is:
<things>
<Fruit>
<imageId>39</imageId>
<title>Apple</title>
</Fruit>
<Fruit>
<imageId>55</imageId>
<title>Pear</title>
</Fruit>
<Fruit>
<imageId>76</imageId>
<title>Grape</title>
</Fruit>
</things>
In my table I've got around 50 rows. I'm only concerned with two fields, omId (int primary key) and omText (my XML data).
What I'm trying to achieve is a way of saying, across all XML data in the whole table... give me all of the XML elements where the title is X. Or give me a count of all items that use an imageId of 55.
I'm using the XML data type value() and query() methods to retrieve the data.
select omID,
omText.query('/things/Fruit')
,cast('<results>' + cast(omText.query('/things/Fruit') as varchar(max)) + '</results>' as xml) as Value
from dbo.myTable
where omText.value('(/things/Fruit/imageId)[1]', 'int') = 76
This only works when the id I'm searching for is the first one in the document. It doesn't seem to search all of the XML.
Fundamentally, the result set comes back with one row for each entry in the TABLE, whereas I think I need one row for each matched ELEMENT... I'm not exactly sure how to start writing a group-by for this, though.
I'm starting to feel like I'm making this harder than it needs to be... thoughts and ideas please.
What i'm trying to achieve is a way of saying, across all
xml data in the whole table... give me all of the xmlElements
where the title is X.
Not sure if I totally understood your question here - or are you looking for this? You would grab all the /things/Fruit elements as "nodes" and cross apply them against your "base data" in myTable - the result would be one row per Fruit element in your XML data field:
select
omID,
T.Fruit.query('.')
from
dbo.myTable
cross apply
omText.nodes('/things/Fruit') as T(Fruit)
where
T.Fruit.value('(title)[1]', 'varchar(50)') = 'X'
Or give me a count of all items that use an imageId of 55.
select
count(*)
from
dbo.myTable
cross apply
omText.nodes('/things/Fruit') as T(Fruit)
where
T.Fruit.value('(imageId)[1]', 'int') = 55
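
To illustrate the "one row per element" idea outside SQL Server, here is a minimal Python sketch using the sample XML from the question. The docs dictionary (keyed by a made-up omID of 1) and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# One table row: omID 1 with the sample document from the question.
docs = {
    1: "<things>"
       "<Fruit><imageId>39</imageId><title>Apple</title></Fruit>"
       "<Fruit><imageId>55</imageId><title>Pear</title></Fruit>"
       "<Fruit><imageId>76</imageId><title>Grape</title></Fruit>"
       "</things>",
}

def fruit_rows(documents):
    """Yield (omID, title, imageId) once per Fruit element, not per document."""
    for om_id, text in documents.items():
        things = ET.fromstring(text)
        # findall("Fruit") returns each direct Fruit child of <things>,
        # which is the role of nodes('/things/Fruit') in the query.
        for fruit in things.findall("Fruit"):
            yield om_id, fruit.findtext("title"), int(fruit.findtext("imageId"))

all_rows = list(fruit_rows(docs))
pears = [row for row in all_rows if row[1] == "Pear"]
count_55 = sum(1 for row in all_rows if row[2] == 55)
```

Filtering happens after the per-element rows exist, which is why an imageId that is not first in the document is still found.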
Is that what you're looking for?
Marc