Parsing through a free text field in SQL

I have a field in my table that contains different types of data, which could be split into separate columns or rows.
Free Text Field | TypeID
{Visit Info}[<Visit Date>2022-01-01</Visit Date><Visit Type>Clinical</Visit Type>]{/Visit Info}{Costs}[<Laboratory>30.91</Laboratory><Encounter>15.00</Encounter>]{/Costs} | 1
{Index Events}[<Date>2022-03-04</Date><Diagnosis>I10</Diagnosis>]{/Index Events} | 2
{Visit Info}[<Visit Date>2022-10-12</Visit Date><Visit Type>Administrative</Visit Type>]{/Visit Info}{Costs}[<Consultation>25.00</Consultation>]{/Costs} | 1
The idea is that each value is wrapped in a <Subcategory> tag, and the subcategories are grouped inside a {Category} block. I need to write a query that returns the data as in the following result set:
TypeID | Category | Subcategory | Result
1 | Visit Info | Visit Date | 2022-01-01
1 | Visit Info | Visit Type | Clinical
1 | Costs | Laboratory | 30.91
1 | Costs | Encounter | 15.00
2 | Index Events | Date | 2022-03-04
2 | Index Events | Diagnosis | I10
1 | Visit Info | Visit Date | 2022-10-12
1 | Visit Info | Visit Type | Administrative
1 | Costs | Consultation | 25.00
I'd appreciate it if someone could point me in the right direction in terms of which functions to use.
I'm using SSMS on SQL Server 2016.
For anyone wondering, I ended up replacing the "{" and "}" characters with "<" and ">" respectively and removing the "[" and "]" characters.
After this, I cast the field as XML and used the XML nodes() method to parse through the field, like so:
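SELECT
    a.TypeID,
    a.UniqueID,
    a.[Free Text Field],
    tbl.col.value('local-name(..)', 'VARCHAR(MAX)') AS Category,    -- parent element name
    tbl.col.value('local-name(.)', 'VARCHAR(MAX)') AS Subcategory,  -- this element's name
    tbl.col.value('.[1]', 'VARCHAR(MAX)') AS Value                  -- this element's text
FROM [My Table] a
-- the cleaned-up text must be cast to XML before .nodes() can be called on it
CROSS APPLY (SELECT CAST(a.[Free Text Field] AS XML)) AS x(doc)
CROSS APPLY x.doc.nodes('/*/*') AS tbl(col)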
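To see the "rewrite the delimiters, then parse as XML" idea outside SQL Server, here is a minimal sketch in Python using the standard xml.etree.ElementTree module in place of SQL Server's XML type. Two assumptions go beyond the post: spaces inside tag names are swapped for underscores (XML element names cannot contain spaces, so <Visit Info> would not parse otherwise), and the fragment is wrapped in a hypothetical <root> element so values with several categories parse as one document. The function name parse_free_text is illustrative only.

```python
import re
import xml.etree.ElementTree as ET


def parse_free_text(free_text):
    """Return (category, subcategory, value) tuples for one free text value."""
    # {Category} -> <Category>, {/Category} -> </Category>, and drop the [ ] wrappers
    text = (free_text.replace("{", "<").replace("}", ">")
                     .replace("[", "").replace("]", ""))
    # XML names cannot contain spaces, so swap spaces for underscores inside tags only
    text = re.sub(r"<(/?)([^<>]+)>",
                  lambda m: "<" + m.group(1) + m.group(2).replace(" ", "_") + ">",
                  text)
    # Wrap in a single root element so several top-level categories still parse
    root = ET.fromstring("<root>" + text + "</root>")
    rows = []
    for category in root:        # e.g. <Visit_Info>, <Costs>
        for sub in category:     # e.g. <Visit_Date>, <Laboratory>
            rows.append((category.tag.replace("_", " "),
                         sub.tag.replace("_", " "),
                         sub.text))
    return rows


sample = ("{Visit Info}[<Visit Date>2022-01-01</Visit Date><Visit Type>Clinical</Visit Type>]{/Visit Info}"
          "{Costs}[<Laboratory>30.91</Laboratory><Encounter>15.00</Encounter>]{/Costs}")
for row in parse_free_text(sample):
    print(row)
```

The same caveat applies to the SQL Server version: a value containing a raw "&" or "<" would need escaping before the text is valid XML.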

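If your data looked like this, it would be valid XML (once the spaces in tag names such as Visit Info are replaced, since XML element names cannot contain spaces) and would be easy to parse with the XML parser: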
Free Text Field | TypeID
<Visit Info><Visit Date>2022-01-01</Visit Date><Visit Type>Clinical</Visit Type></Visit Info><Costs><Laboratory>30.91</Laboratory><Encounter>15.00</Encounter></Costs> | 1
<Index Events><Date>2022-03-04</Date><Diagnosis>I10</Diagnosis></Index Events> | 2
<Visit Info><Visit Date>2022-10-12</Visit Date><Visit Type>Administrative</Visit Type></Visit Info><Costs><Consultation>25.00</Consultation></Costs> | 1

You can do a lot with SUBSTRING and CHARINDEX, for example:
WITH cte AS (
    SELECT 'Visit Info' AS category, 'Visit Date' AS subcategory UNION ALL
    SELECT 'Costs'      AS category, 'Laboratory' AS subcategory
)
SELECT
    typeid,
    category,
    subcategory,
    -- the value sits between <subcategory> and </subcategory> inside ss
    SUBSTRING(ss, y1 + LEN(subcategory) + 2, y2 - (y1 + LEN(subcategory) + 2)) AS value
FROM (
    SELECT
        typeid,
        category,
        subcategory,
        -- ss = the text between {category} and {/category}
        SUBSTRING(free, x1 + LEN(category) + 2, x2 - (x1 + LEN(category) + 2)) AS ss,
        CHARINDEX(CONCAT('<', subcategory, '>'),
                  SUBSTRING(free, x1 + LEN(category) + 2, x2 - (x1 + LEN(category) + 2))) AS y1,
        CHARINDEX(CONCAT('</', subcategory, '>'),
                  SUBSTRING(free, x1 + LEN(category) + 2, x2 - (x1 + LEN(category) + 2))) AS y2
    FROM (
        SELECT
            typeid,
            category,
            free,
            subcategory,
            CHARINDEX(CONCAT('{', category, '}'), free)  AS x1,  -- start of {category}
            CHARINDEX(CONCAT('{/', category, '}'), free) AS x2   -- start of {/category}
        FROM test
        CROSS APPLY cte
    ) x
    WHERE x1 <> 0 AND x2 > x1
) y
WHERE y1 <> 0 AND y2 > y1;
The other category/subcategory pairs can be handled the same way by adding them to the CTE.
Output of this:
typeid | category | subcategory | value
1 | Costs | Laboratory | 30.91
1 | Visit Info | Visit Date | 2022-01-01
1 | Visit Info | Visit Date | 2022-10-12
see: DBFIDDLE
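As a runnable check of the CHARINDEX/SUBSTRING technique, here is a sketch that ports the query to SQLite through Python's standard sqlite3 module, with instr() standing in for CHARINDEX, substr() for SUBSTRING, length() for LEN and || for CONCAT. The table name test and the two category/subcategory pairs come from the answer; using SQLite instead of SQL Server is an assumption made only so the sketch can run anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (typeid INTEGER, free TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    (1, "{Visit Info}[<Visit Date>2022-01-01</Visit Date><Visit Type>Clinical</Visit Type>]{/Visit Info}"
        "{Costs}[<Laboratory>30.91</Laboratory><Encounter>15.00</Encounter>]{/Costs}"),
    (2, "{Index Events}[<Date>2022-03-04</Date><Diagnosis>I10</Diagnosis>]{/Index Events}"),
    (1, "{Visit Info}[<Visit Date>2022-10-12</Visit Date><Visit Type>Administrative</Visit Type>]{/Visit Info}"
        "{Costs}[<Consultation>25.00</Consultation>]{/Costs}"),
])

QUERY = """
WITH pairs(category, subcategory) AS (
    VALUES ('Visit Info', 'Visit Date'), ('Costs', 'Laboratory')
),
x AS (  -- slice out the text between {category} and {/category}
    SELECT t.typeid, p.category, p.subcategory,
           substr(t.free,
                  instr(t.free, '{' || p.category || '}') + length(p.category) + 2,
                  instr(t.free, '{/' || p.category || '}')
                    - (instr(t.free, '{' || p.category || '}') + length(p.category) + 2)) AS ss
    FROM test t CROSS JOIN pairs p
    WHERE instr(t.free, '{' || p.category || '}') > 0
      AND instr(t.free, '{/' || p.category || '}') > instr(t.free, '{' || p.category || '}')
),
y AS (  -- locate <subcategory> and </subcategory> inside that slice
    SELECT typeid, category, subcategory, ss,
           instr(ss, '<' || subcategory || '>')  AS y1,
           instr(ss, '</' || subcategory || '>') AS y2
    FROM x
)
SELECT typeid, category, subcategory,
       substr(ss, y1 + length(subcategory) + 2, y2 - (y1 + length(subcategory) + 2)) AS result
FROM y
WHERE y1 > 0 AND y2 > y1
ORDER BY typeid, category, result
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Like the original, this only finds the first occurrence of each tag inside a category, so every category/subcategory pair you want must be listed in the CTE.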

Related

Merge the results from multiple tables side by side in PostgreSQL

I have 3 tables from which I need to build a result table.
Scenario:
The table 'Incoming Sentences' contains the stream of sentences flowing into the database.
The table 'tagged_sentences' contains the sentences from 'incoming_sentences' that have been tagged/labelled by the editor. Sometimes the admin overwrites the label if the editor makes a mistake in labelling the data. Admin-labelled data is final and considered correct.
The table 'accounts' contains the users' account-level information.
Below are the tables with sample data.
Incoming Sentences
id | sentence | market | model_identified_intent | tagged_at
1 | abcd | en_in | alphabets | 12/12/2021
2 | 1234 | en_in | numeric | 11/13/2021
3 | a1b2 | en_in | alphaNumeric | 10/14/2021
4 | efgh | en_in | alphabets | 10/15/2021
5 | e5f6 | en_in | alphaNumeric | 11/16/2021
Tagged Sentences
id | tagger_id | sentence_id | tagger_tagged_intent
1 | 32 | 1 | alphabets
2 | 32 | 2 | alphabets
3 | 32 | 3 | Numeric
4 | 33 | 2 | Numeric
5 | 33 | 3 | alphaNumeric
User Account Table
id | user_role | email | name
32 | editor | editor#editor.com | editor123
33 | admin | admin#admin.com | admin456
Expected Output:
I want the result to show 'total tagged sentences per month' in one column and 'total corrections per month by the admin' in another, so that the error rate can be calculated.
year-month | total_tagged | Total Error (Corrected by admin)
2021-10 | 2 | 1
2021-11 | 2 | 1
2021-12 | 1 | 0
I'd appreciate help solving this. I tried the code below, but it isn't working as expected.
WITH cte1 AS (SELECT tggs.id id,
                     tggs.sentence AS sentence,
                     tggs.market AS market,
                     tggs.prod_identified_intent AS prod_identified_intent,
                     tggs.tagged_at AS tagged_at,
                     ROW_NUMBER() OVER (PARTITION BY tagged_at) AS rn
              FROM tagging_sentences tggs),
     cte2 AS (SELECT tgds.sentence_id_id AS sentence_id,
                     tgds.tagger_id_id AS tagger_id,
                     tgds.tagged_intent AS tagged_intent
              FROM tagged_sentences tgds),
     cte3 AS (SELECT acts.id AS account_id, acts.email AS email, acts.role AS role FROM accounts AS acts),
     cte4 AS (SELECT tggs.tagged_at, COUNT(*) AS count, ROW_NUMBER() OVER (PARTITION BY count(*)) AS rn
              FROM tagging_sentences AS tggs
                   JOIN tagged_sentences AS tgds ON tggs.id = tgds.sentence_id_id
                   JOIN accounts acts ON tgds.tagger_id_id = acts.id
              WHERE tgds.tagger_id_id = 33
                AND tgds.sentence_id_id IN (SELECT tagging_sentences.id
                                            FROM tagging_sentences,
                                                 tagged_sentences
                                            WHERE tagged_sentences.tagger_id_id = 32)
              GROUP BY tagged_at)
SELECT TO_CHAR(cte1.tagged_at, 'YYYY-MM'),
       COUNT(cte1.sentence), cte4.count
FROM cte1
     JOIN cte2 ON cte1.id = cte2.sentence_id
     JOIN cte3 ON cte2.tagger_id = cte3.account_id
     JOIN cte4 ON cte1.rn = cte4.rn
GROUP BY TO_CHAR(cte1.tagged_at, 'YYYY-MM'), TO_CHAR(cte4.tagged_at, 'YYYY-MM'), cte4.count;
I started trying to determine just where your initial query went awry, but there are still too many inconsistencies: column names not defined, tables not defined, etc. I was also not sure what all the CTEs were for. A couple seem to do nothing but "convert" a table into a CTE (absolutely not necessary), and the final join tries to match ROW_NUMBER() results between CTEs (highly doubtful to work at any rate). The other issue is the magic numbers (id = 32, 33): what happens when there is another editor and/or admin? It just seemed overly complex.
So with that, I undertook a rewrite. First, I abandoned the CTE approach and used simple joins: an inner join between Incoming Sentences and Tagged Sentences, and an outer join between Tagged Sentences and User Accounts. The twist is that there are 2 outer joins, one picking up the editor role and the other the admin role, which removes the dependency on the magic numbers. With these in place, all that remained was simple counting (see demo):
select date_trunc('month', i.tagged_at) "Year Month"
     , count(uae.user_role) + count(uaa.user_role) "Total Tagged"
     , count(uaa.user_role) "Total Error (Corrected by admin)"
  from incoming_sentences i
  join tagged_sentences t
    on (t.sentence_id = i.id)
  left join user_accounts uae
    on (    uae.id = t.tagger_id
        and uae.user_role = 'editor'
       )
  left join user_accounts uaa
    on (    uaa.id = t.tagger_id
        and uaa.user_role = 'admin'
       )
 group by date_trunc('month', i.tagged_at)
 order by date_trunc('month', i.tagged_at);
Note: the demo adds another editor and admin. It also shows the case where the editor and admin act in different months.
Good luck with it.
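To confirm that the two conditional left joins reproduce the expected output, here is a sketch that loads the question's sample rows into SQLite via Python's sqlite3 module. Assumptions: SQLite stands in for PostgreSQL, strftime('%Y-%m', ...) replaces date_trunc('month', ...) and the dates are stored as ISO 'YYYY-MM-DD' text; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incoming_sentences (id INTEGER, sentence TEXT, market TEXT,
                                 model_identified_intent TEXT, tagged_at TEXT);
CREATE TABLE tagged_sentences (id INTEGER, tagger_id INTEGER, sentence_id INTEGER,
                               tagger_tagged_intent TEXT);
CREATE TABLE user_accounts (id INTEGER, user_role TEXT, email TEXT, name TEXT);
INSERT INTO incoming_sentences VALUES
  (1, 'abcd', 'en_in', 'alphabets',    '2021-12-12'),
  (2, '1234', 'en_in', 'numeric',      '2021-11-13'),
  (3, 'a1b2', 'en_in', 'alphaNumeric', '2021-10-14'),
  (4, 'efgh', 'en_in', 'alphabets',    '2021-10-15'),
  (5, 'e5f6', 'en_in', 'alphaNumeric', '2021-11-16');
INSERT INTO tagged_sentences VALUES
  (1, 32, 1, 'alphabets'), (2, 32, 2, 'alphabets'), (3, 32, 3, 'Numeric'),
  (4, 33, 2, 'Numeric'),   (5, 33, 3, 'alphaNumeric');
INSERT INTO user_accounts VALUES
  (32, 'editor', 'editor#editor.com', 'editor123'),
  (33, 'admin',  'admin#admin.com',   'admin456');
""")

QUERY = """
SELECT strftime('%Y-%m', i.tagged_at)               AS year_month,
       COUNT(uae.user_role) + COUNT(uaa.user_role)  AS total_tagged,
       COUNT(uaa.user_role)                         AS total_error
FROM incoming_sentences i
JOIN tagged_sentences t ON t.sentence_id = i.id
-- one outer join per role, so no tagger ids are hard-coded
LEFT JOIN user_accounts uae ON uae.id = t.tagger_id AND uae.user_role = 'editor'
LEFT JOIN user_accounts uaa ON uaa.id = t.tagger_id AND uaa.user_role = 'admin'
GROUP BY year_month
ORDER BY year_month
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

COUNT(column) skips NULLs, so each tagging row is counted by exactly one of the two joins, depending on the tagger's role.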

SQL - Query to split original sort

I hope my title is OK, as I really don't know what to call it.
Anyway, I have a table with the following columns:
ID - Num (Primary Key)
Category - VarChar
Name - VarChar
DateForName - Date
The data looks like this:
1 100 111 31/12/2017
2 101 210 30/12/2017
3 100 112 29/12/2017
4 101 203 27/12/2017
5 100 117 20/12/2017
6 103 425 08/12/2017
To generate this table, I just sorted by date DESC.
Is there a way to add a new column with the order per Category, like this:
1 100|1
2 101|1
3 100|2
4 101|2
5 100|3
6 103|1
Max
You want the analytic function row_number():
select t.*
from (select *,
             row_number() over (partition by Category order by DateForName desc) Seq
      from your_table   -- replace with your table's name
     ) t
order by id;
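Here is a quick way to try the row_number() answer: a sketch that runs it in SQLite through Python's sqlite3 module on the question's six rows. Assumptions: the table name names is hypothetical, and DateForName is stored as ISO 'YYYY-MM-DD' text, because the dd/mm/yyyy strings shown in the question would not sort chronologically as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (ID INTEGER PRIMARY KEY, Category TEXT, Name TEXT, DateForName TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?)", [
    (1, "100", "111", "2017-12-31"),
    (2, "101", "210", "2017-12-30"),
    (3, "100", "112", "2017-12-29"),
    (4, "101", "203", "2017-12-27"),
    (5, "100", "117", "2017-12-20"),
    (6, "103", "425", "2017-12-08"),
])

# Number the rows inside each Category, newest date first, then restore ID order
rows = conn.execute("""
    SELECT ID, Category,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY DateForName DESC) AS Seq
    FROM names
    ORDER BY ID
""").fetchall()
for row in rows:
    print(row)
```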
Yes, SQL has a couple of options for adding a column populated with a ranking of the rows based on the Category and DateForName columns.
If you just want to add a column to the SELECT statement, I recommend using the RANK() function. (RANK() gives rows with the same date the same number, while ROW_NUMBER() always numbers them 1, 2, 3, ...)
See more details here:
https://learn.microsoft.com/en-us/sql/t-sql/functions/rank-transact-sql?view=sql-server-2017
For your current table, try the following select statement:
SELECT
[ID],
[Category],
[Name],
[DateForName],
RANK() OVER (PARTITION BY [Category] ORDER BY [DateForName] DESC) AS [CategoryOrder]
FROM [TableName]
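The difference between RANK() and ROW_NUMBER() only shows up on ties, and the sample data has none. This sketch, run in SQLite via Python's sqlite3 module, adds a hypothetical row 7 that shares ID 3's date so the two functions diverge; ID is used as a tie-breaker for ROW_NUMBER() so its output is deterministic. The table name names is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (ID INTEGER PRIMARY KEY, Category TEXT, Name TEXT, DateForName TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?)", [
    (1, "100", "111", "2017-12-31"),
    (2, "101", "210", "2017-12-30"),
    (3, "100", "112", "2017-12-29"),
    (4, "101", "203", "2017-12-27"),
    (5, "100", "117", "2017-12-20"),
    (6, "103", "425", "2017-12-08"),
    (7, "100", "118", "2017-12-29"),  # hypothetical row: same date as ID 3
])

rows = conn.execute("""
    SELECT ID,
           RANK()       OVER (PARTITION BY Category ORDER BY DateForName DESC)     AS rnk,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY DateForName DESC, ID) AS rn
    FROM names
    WHERE Category = '100'
    ORDER BY ID
""").fetchall()
for row in rows:
    print(row)  # (ID, RANK, ROW_NUMBER)
```

RANK() gives IDs 3 and 7 the same value and then skips to 4; ROW_NUMBER() keeps counting.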
Alternatively, if you want a permanent column (aka a field) in the existing table, you might consider a computed column. See more information here:
https://learn.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
The new column would be based entirely on two existing columns, so SQL Server could maintain it for you. Be aware, though, that a computed column can only reference values in its own row, so it cannot contain RANK() OVER (...); a view that includes the RANK() expression is a common workaround.
Hope this helps!

SQL query to get lowest 3 values per column grouping by another column

So I have a table with data like this:
name Type PRICE
11111 XX 0.001
22222 YY 0.002
33333 ZZ 0.0001
11111 YY 0.021
11111 ZZ 0.0111
77777 YY 0.1
77777 ZZ 1.2
The table has about a million rows, and a single 'name' can map to upwards of 20 different TYPEs. However, each type appears only once per name: 11111 could have XX, YY, ZZ, but it cannot have YY, ZZ, YY.
What I need is to get the lowest 3 prices and what TYPE they are per name.
Right now I can get the lowest price per name by doing:
select name, type, min(price) from table group by name;
However, that only gives the lowest price, and I need the lowest 3 prices. I've been trying for a couple of days and can't seem to get it. All help is appreciated.
Also, please let me know if I forgot any information; I'm still figuring out Stack Overflow :P
Oh, and the database is a NoSQL database that uses SQL syntax.
If your database supports window functions, and allowing for the possibility that there may be more than three rows in your data with any of the three lowest prices, this should do it:
select the_table.*
from the_table
inner join (
    -- distinct: tied prices would otherwise duplicate rows in the join below
    select distinct name, price
    from (
        select name, price, row_number() over (partition by name order by price) as rn
        from the_table
    ) as x
    where rn < 4
) as y on y.name = the_table.name and y.price = the_table.price;
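As a runnable check of the "number, filter, join back" pattern, here is a sketch in SQLite via Python's sqlite3 module using the question's sample rows. Assumptions: SQLite stands in for the asker's SQL-syntax database, and one hypothetical row (11111, WW, 0.5) is added so that name 11111 has four prices and the filter visibly drops one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (name TEXT, type TEXT, price REAL)")
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    ("11111", "XX", 0.001), ("22222", "YY", 0.002), ("33333", "ZZ", 0.0001),
    ("11111", "YY", 0.021), ("11111", "ZZ", 0.0111), ("77777", "YY", 0.1),
    ("77777", "ZZ", 1.2),
    ("11111", "WW", 0.5),  # hypothetical 4th price for 11111; should be filtered out
])

rows = conn.execute("""
    SELECT t.name, t.type, t.price
    FROM the_table t
    JOIN (SELECT DISTINCT name, price
          FROM (SELECT name, price,
                       ROW_NUMBER() OVER (PARTITION BY name ORDER BY price) AS rn
                FROM the_table) x
          WHERE rn < 4) y            -- keep the 3 cheapest prices per name
      ON y.name = t.name AND y.price = t.price
    ORDER BY t.name, t.price
""").fetchall()
for row in rows:
    print(row)
```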

Total Sum SQL Server

I have a query that collects many different columns, and I want to include a column that sums the price of every component in an order. Right now, I already have a column that simply shows the price of every component of an order, but I am not sure how to create this new column.
I would think that the code would go something like this, but I am not really clear on what an aggregate function is or why I get an error regarding the aggregate function when I try to run this code.
SELECT ID, Location, Price, (SUM(PriceDescription) FROM table GROUP BY ID WHERE PriceDescription LIKE 'Cost.%' AS Summary)
FROM table
When I say each component, I mean that every ID I have has many different items that make up the total price. I only want to find out how much money I spend on the supplies I need for my pressure washers, which is why I used WHERE PriceDescription LIKE 'Cost.%'.
To explain further, I have receipts for every customer I've worked with, and on these receipts I write down my cost for the soap I use and the tools for the pressure washer I rent. I label all of these with 'Cost.', so they look like (Cost.Water), (Cost.Soap), (Cost.Gas), (Cost.Tools). I would like a column that, for Order 1, sums all the Cost._ prices for that order, and for Order 2 sums all the Cost._ prices for that order, and so on. I should also mention that each order does not have the same number of costs (sometimes when I use my power washer I might not have to buy gas, and occasionally soap).
I hope this makes sense, if not please let me know how I can explain further.
ID Location Price PriceDescription
1 Park 10 Cost.Water
1 Park 8 Cost.Gas
1 Park 11 Cost.Soap
2 Tom 20 Cost.Water
2 Tom 6 Cost.Soap
3 Matt 15 Cost.Tools
3 Matt 15 Cost.Gas
3 Matt 21 Cost.Tools
4 College 32 Cost.Gas
4 College 22 Cost.Water
4 College 11 Cost.Tools
I would like for my query to create a column like such
ID Location Price Summary
1 Park 10 29
1 Park 8
1 Park 11
2 Tom 20 26
2 Tom 6
3 Matt 15 51
3 Matt 15
3 Matt 21
4 College 32 65
4 College 22
4 College 11
But if the 'Summary' was printed on every line instead of just at the top one, that would be okay too.
You just need SUM(Price) OVER (PARTITION BY Location), which gives the total on every row (partition by ID instead if the same location can appear in more than one order):
SELECT ID, Location, Price, SUM(Price) over(Partition by Location) AS Summed_Price
FROM yourtable
WHERE PriceDescription LIKE 'Cost.%'
First, a column whose values match 'Cost.%' (like your PriceDescription column) holds text, so you cannot apply SUM() to it. SUM() expects a number (e.g. INT, FLOAT, REAL or DECIMAL). If your prices are stored as text, you need to convert them explicitly with CAST or CONVERT inside the SUM() call.
Second, your query syntax is wrong: the subquery is missing its SELECT keyword, WHERE must come before GROUP BY, and the alias belongs outside the parentheses. Also, you want to SUM() the Price field, not the PriceDescription field (which, as explained above, can't be summed).
Assuming that Price is numeric (see my first remark), then this is how it can be done:
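SELECT ID
     , Location
     , Price
     , (SELECT SUM(Price)
          FROM yourtable
         WHERE ID = T1.ID AND Location = T1.Location
           AND PriceDescription LIKE 'Cost.%'   -- only count the Cost.* lines
       ) AS Summed_Price
  FROM yourtable AS T1
 WHERE PriceDescription LIKE 'Cost.%'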
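The window-function answer and the correlated-subquery answer should give the same per-order totals. This sketch checks that in SQLite through Python's sqlite3 module with the question's receipt lines; SQLite in place of SQL Server and the table name yourtable are assumptions. Both queries agree here because each ID has exactly one Location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID INTEGER, Location TEXT, Price INTEGER, PriceDescription TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", [
    (1, "Park", 10, "Cost.Water"), (1, "Park", 8, "Cost.Gas"), (1, "Park", 11, "Cost.Soap"),
    (2, "Tom", 20, "Cost.Water"), (2, "Tom", 6, "Cost.Soap"),
    (3, "Matt", 15, "Cost.Tools"), (3, "Matt", 15, "Cost.Gas"), (3, "Matt", 21, "Cost.Tools"),
    (4, "College", 32, "Cost.Gas"), (4, "College", 22, "Cost.Water"), (4, "College", 11, "Cost.Tools"),
])

# Window version: one pass, total repeated on every row of the partition
window_rows = conn.execute("""
    SELECT ID, Location, Price, SUM(Price) OVER (PARTITION BY Location) AS Summed_Price
    FROM yourtable
    WHERE PriceDescription LIKE 'Cost.%'
    ORDER BY ID, Price
""").fetchall()

# Correlated-subquery version: the total is recomputed for each outer row
subquery_rows = conn.execute("""
    SELECT ID, Location, Price,
           (SELECT SUM(Price) FROM yourtable
             WHERE ID = T1.ID AND Location = T1.Location
               AND PriceDescription LIKE 'Cost.%') AS Summed_Price
    FROM yourtable AS T1
    WHERE PriceDescription LIKE 'Cost.%'
    ORDER BY ID, Price
""").fetchall()

totals = {row[0]: row[3] for row in window_rows}
print(totals)
```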
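To get exactly the result posted in the question:
SELECT
    T.ID,
    T.Location,
    T.Price,
    CASE WHEN T.R = 1 THEN T.RN ELSE NULL END AS Summary   -- total only on the first row
FROM (
    SELECT
        ID,
        Location,
        Price,
        SUM(Price) OVER (PARTITION BY Location) AS RN,               -- total per order
        ROW_NUMBER() OVER (PARTITION BY Location ORDER BY ID) AS R   -- position within the order
    FROM yourtable
    WHERE PriceDescription LIKE 'Cost.%'
) T
ORDER BY T.ID, T.R;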
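Here is a sketch of the "total on the first line only" layout, run in SQLite via Python's sqlite3 module. Assumptions beyond the answer: the table name receipts is hypothetical, a hypothetical line column (INTEGER PRIMARY KEY) records the order the receipt lines were entered so that "first row" is well defined, and the partition is by ID (the order) rather than Location, which gives the same result on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# line: hypothetical column recording the order the receipt lines were entered
conn.execute("""CREATE TABLE receipts (line INTEGER PRIMARY KEY, ID INTEGER,
                Location TEXT, Price INTEGER, PriceDescription TEXT)""")
conn.executemany(
    "INSERT INTO receipts (ID, Location, Price, PriceDescription) VALUES (?, ?, ?, ?)", [
        (1, "Park", 10, "Cost.Water"), (1, "Park", 8, "Cost.Gas"), (1, "Park", 11, "Cost.Soap"),
        (2, "Tom", 20, "Cost.Water"), (2, "Tom", 6, "Cost.Soap"),
        (3, "Matt", 15, "Cost.Tools"), (3, "Matt", 15, "Cost.Gas"), (3, "Matt", 21, "Cost.Tools"),
        (4, "College", 32, "Cost.Gas"), (4, "College", 22, "Cost.Water"), (4, "College", 11, "Cost.Tools"),
    ])

rows = conn.execute("""
    SELECT ID, Location, Price,
           CASE WHEN rn = 1 THEN total END AS Summary   -- total only on the first line
    FROM (SELECT ID, Location, Price, line,
                 SUM(Price) OVER (PARTITION BY ID) AS total,
                 ROW_NUMBER() OVER (PARTITION BY ID ORDER BY line) AS rn
          FROM receipts
          WHERE PriceDescription LIKE 'Cost.%') AS t
    ORDER BY ID, rn
""").fetchall()
for row in rows:
    print(row)
```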

SQL Combining the top 2 field values into 1 value

I have a very simple query that returns the Notes field. Since there can be multiple notes, I only want the top 2. No problem. However, I'm going to be using the sql within another query. I really don't want 2 lines in my results. I would like to combine the results into 1 field value so I only have 1 result line in the results. Is this possible?
For example, I currently get the following:
12345 1001 500.00 "Note 1"
12345 1001 500.00 "Note 2"
What I would like to see is this:
12345 1001 500.00 "Note 1 AND Note 2"
Here is the SQL:
select top 2 rcai.field_value
from rnt_agrs ra
inner join rnt_agr_inv_notes rain on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
inner join RNT_CUST_ADDNL_INFO rcai on rain.rea_rnt_agr_nbr=rcai.rea_rnt_agr_nbr and rain.bac_acc_id=rcai.bac_acct_id
where ra.rnt_agr_nbr=128260511
Thanks for your help. I appreciate this forum for help with these issues.....
Get the next row's value, concatenate it, and filter out all but the first row (QUALIFY is available in Teradata and a few other databases, but not in SQL Server):
select ..., rcai.field_value || ' AND ' ||
       min(rcai.field_value) -- next row's value (same as LEAD in Standard SQL)
         over (partition by ra.rnt_agr_nbr
               order by rcai.field_value
               rows between 1 following and 1 following) as combined_notes
from rnt_agrs ra
inner join rnt_agr_inv_notes rain on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
inner join RNT_CUST_ADDNL_INFO rcai on rain.rea_rnt_agr_nbr=rcai.rea_rnt_agr_nbr and rain.bac_acc_id=rcai.bac_acct_id
where ra.rnt_agr_nbr=128260511
qualify
    row_number() -- only the first row
      over (partition by ra.rnt_agr_nbr
            order by rcai.field_value) = 1
If there might be only a single row you need to add a COALESCE(min...,'') to get rid of the NULL.
Both OLAP functions specify the same PARTITION and ORDER, so this is a single working step.
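The next-row technique can be tried in SQLite through Python's sqlite3 module. Assumptions: SQLite has no QUALIFY, so the row_number() = 1 filter moves into an outer query; the table notes and the agreement 999 with a single note are hypothetical; and COALESCE wraps the whole ' AND ' || next-value part so a single note comes back without a trailing ' AND ' (a small variation on the COALESCE(min..., '') suggested above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (agr_nbr INTEGER, field_value TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)", [
    (128260511, "Note 1"), (128260511, "Note 2"), (128260511, "Note 3"),
    (999, "Only note"),  # hypothetical agreement with a single note
])

rows = conn.execute("""
    SELECT agr_nbr, combined
    FROM (SELECT agr_nbr,
                 -- this row's note plus the next row's note, if there is one
                 field_value || COALESCE(' AND ' || MIN(field_value) OVER (
                     PARTITION BY agr_nbr ORDER BY field_value
                     ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), '') AS combined,
                 ROW_NUMBER() OVER (PARTITION BY agr_nbr ORDER BY field_value) AS rn
          FROM notes) AS t
    WHERE rn = 1          -- stands in for QUALIFY
    ORDER BY agr_nbr
""").fetchall()
for row in rows:
    print(row)
```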
select *,
       STUFF((SELECT TOP 2 ' AND ' + rcai.field_value AS [text()]
                FROM RNT_CUST_ADDNL_INFO rcai
               WHERE rcai.rea_rnt_agr_nbr = rain.rea_rnt_agr_nbr
                 AND rcai.bac_acct_id = rain.bac_acc_id
                 FOR XML PATH('')), 1, 5, '') AS Notes  -- STUFF strips the leading ' AND '
  from rnt_agrs ra
       inner join rnt_agr_inv_notes rain
               on ra.rnt_agr_nbr = rain.rea_rnt_agr_nbr
I had something like this, where there was a one-to-many relationship, and I wanted a semicolon-delimited list of values in a single column alongside the main record.
You could use PIVOT to transform the two note rows into two note columns based on row number, then concatenate them. Here's an example:
SELECT pvt.[1] + ' and ' + pvt.[2]
FROM
( --the selection of your table data, including a row-number column
SELECT Msg, ROW_NUMBER() OVER(ORDER BY Id)
--sample data shown here, but this would be your real table
FROM (VALUES(1, 'Note 1'), (2, 'Note 2'), (3, 'Note 3')) Note(Id, Msg)
) Data (Msg, Row)
PIVOT (MAX(Msg) FOR Row IN ([1], [2])) pvt
Note that MAX is used for the aggregate in the PIVOT since an aggregate is required, but since ROW_NUMBER is unique, you're only aggregating a single value.
This could also be easily extended to the first N rows - just include the row numbers you want in the pivot and combine them as desired in the select statement.
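SQLite has no PIVOT operator, so this sketch (run through Python's sqlite3 module) swaps in conditional aggregation, MAX(CASE WHEN rn = n THEN Msg END), which turns numbered rows into columns in the same way before concatenating them. The table note and its three rows mirror the VALUES sample in the answer; everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (Id INTEGER, Msg TEXT)")
conn.executemany("INSERT INTO note VALUES (?, ?)",
                 [(1, "Note 1"), (2, "Note 2"), (3, "Note 3")])

combined = conn.execute("""
    SELECT MAX(CASE WHEN rn = 1 THEN Msg END)   -- column [1]
           || ' and ' ||
           MAX(CASE WHEN rn = 2 THEN Msg END)   -- column [2]
    FROM (SELECT Msg, ROW_NUMBER() OVER (ORDER BY Id) AS rn FROM note) AS data
""").fetchone()[0]
print(combined)
```

As with PIVOT, each MAX only ever sees one non-NULL value because the row numbers are unique; extending to N notes means adding one CASE per row number.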