In T-SQL, how can one split strings that follow the pattern below into two columns, one with the key and the other with the value?
Examples of the strings that need to be processed are:
country_code: "US"province_name: "NY"city_name: "Old Chatham"
postal_code: "11746-8031"country_code: "US"province_name: "NY"street_address: "151 Millet Street North"city_name: "Dix Hills"
street_address: "1036 Main Street, Holbrook, NY 11741"
Desired outcome for example 1 would be:
Key             Value
country_code    US
province_name   NY
city_name       Old Chatham
Nice to see Old Chatham ... a little touch of home
My first thought was to "correct" the JSON string, but that got risky.
Here is an option that will parse and pair the key/values
Example or dbFiddle
Select A.*
      ,C.*
 From  YourTable A
 Cross Apply ( values ( replace(replace(replace(SomeCol,'"',':'),': :',':'),'::',':') ) ) B(CleanString)
 Cross Apply (
                Select [Key]   = max(case when Seq=1 then Val end)
                      ,[Value] = max(case when Seq=0 then Val end)
                 From (
                        Select Seq = row_number() over (order by [Key]) % 2
                              ,Grp = (row_number() over (order by [Key])-1) / 2
                              ,Val = Value
                         From  OpenJSON( '["'+replace(string_escape(CleanString,'json'),':','","')+'"]' )
                         Where ltrim(Value)<>''
                      ) C1
                 Group By Grp
             ) C
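In rough terms: the first APPLY collapses the quotes into a single colon delimiter, and the second APPLY wraps the delimited string into a JSON array so that OpenJSON can number the elements; odd positions become keys, even positions become values, and Grp pairs them back up. For example 1, the intermediate values would look roughly like this (my own illustration, not part of the original answer):

CleanString : country_code:US:province_name:NY:city_name:Old Chatham:
JSON array  : ["country_code","US","province_name","NY","city_name","Old Chatham",""]
-- the trailing empty element is discarded by the WHERE ltrim(Value)<>'' filter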
Results
Related
I have a record data structure in BigQuery; when I run the following query, my output is as follows:
Query: SELECT v.key, v.value from table, unnest(dimensions.key_value) v;
key value
region region1
loc location1
region region1
loc location1
region region2
loc location2
Now I want to group by region and location so my output will be as follows:
groupBy Count
region1,location1 2
region2,location2 1
If I needed to group by only one key, it would be a simple query:
SELECT v.key, count(*) from table, unnest(dimensions.key_value) v group by v.key;
But how do I do this for more than one key?
Maybe pivot it first?
with pivotted as (
    select
        (select value from t.dimensions.key_value where key = 'region') as region,
        (select value from t.dimensions.key_value where key = 'loc') as loc
    from table t
)
select region, loc, count(*)
from pivotted
group by region, loc
Hmmm... You seem to be assuming that the ordering of the values is important. This is not a good way to store repeating pairs of data -- arrays of structs seem better (a sketch of that layout follows the query below). But you can use with offset and some arithmetic.
Assuming that region and loc are the only values and they are interleaved (as in your example):
with t as (
      select struct(array[struct('region' as key, 'region1' as value),
                          struct('loc', 'location1'),
                          struct('region', 'region1'),
                          struct('loc', 'location1'),
                          struct('region', 'region2'),
                          struct('loc', 'location2')
                         ] as key_value) as dimensions
     )
select rl.region, rl.loc, count(*)
from (select (select array_agg(region_loc) as region_locs
              from (select struct(max(case when kv.key = 'region' then kv.value end) as region,
                                  max(case when kv.key = 'loc' then kv.value end) as loc
                                 ) as region_loc
                    from unnest(dimensions.key_value) kv with offset n
                    group by floor( n / 2 )
                   ) rl2
             ) as region_locs
      from t
     ) rl3 cross join
     unnest(rl3.region_locs) rl
group by 1, 2;
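As a side note on the "arrays of structs" remark above: if each pair were stored as one struct, the grouping needs no offset arithmetic at all. A minimal sketch of that layout (the region_locs column name is my own assumption; the region/loc fields mirror the question's keys):

-- hypothetical layout: ARRAY<STRUCT<region STRING, loc STRING>>
with t as (
  select [struct('region1' as region, 'location1' as loc),
          struct('region1', 'location1'),
          struct('region2', 'location2')] as region_locs
)
select rl.region, rl.loc, count(*)
from t, unnest(t.region_locs) rl
group by 1, 2;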
There appear to be numerous solutions to this problem; however, my solution needs to be dynamic, as the number of delimiters varies between 0 and 3, and it needs to be relatively efficient as it will be running across >10m rows across 5 loops.
As an example:
US
US-AL
US-AL-Talladega
US-AL-Talladega-35160
The solution would need to be able to deposit each item into a Country, State, County, or ZIP field, with a NULL field if the information is not within the string.
Any comments on the best approach would be appreciated, or even a pointer in the direction of a solution I may have missed.
Another option is with a little XML in concert with a CROSS or OUTER APPLY
Example
Declare @YourTable table (YourCol varchar(100))
Insert Into @YourTable values
 ('US')
,('US-AL')
,('US-AL-Talladega')
,('US-AL-Talladega-35160')

Select A.*
      ,B.*
 From  @YourTable A
 Outer Apply (
                Select Country = xDim.value('/x[1]','varchar(max)')
                      ,State   = xDim.value('/x[2]','varchar(max)')
                      ,County  = xDim.value('/x[3]','varchar(max)')
                      ,ZIP     = xDim.value('/x[4]','varchar(max)')
                 From  (Select Cast('<x>' + replace(YourCol,'-','</x><x>')+'</x>' as xml) as xDim) as A
             ) B
Returns
YourCol Country State County ZIP
US US NULL NULL NULL
US-AL US AL NULL NULL
US-AL-Talladega US AL Talladega NULL
US-AL-Talladega-35160 US AL Talladega 35160
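The heart of it is the string-to-XML conversion; the four .value() calls then just pick nodes 1 through 4, and a missing node simply returns NULL. As a quick illustration (my own, not part of the original answer):

Select Cast('<x>' + replace('US-AL-Talladega-35160','-','</x><x>') + '</x>' as xml)
-- produces <x>US</x><x>AL</x><x>Talladega</x><x>35160</x>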
You will need a delimited splitter, like DelimitedSplit8K from http://www.sqlservercentral.com/articles/Tally+Table/72993/
; with tbl as
(
select col = 'US' union all
select col = 'US-AL' union all
select col = 'US-AL-Talladega' union all
select col = 'US-AL-Talladega-35160'
)
select t.col,
max(case when ItemNumber = 1 then Item end) as Country,
max(case when ItemNumber = 2 then Item end) as State,
max(case when ItemNumber = 3 then Item end) as County,
max(case when ItemNumber = 4 then Item end) as Zip
from tbl t
cross apply dbo.[DelimitedSplit8K](t.col, '-')
group by t.col
I am trying to write a query that will return data sorted by an alphanumeric column, Code.
Below is my query:
SELECT *
FROM <<TableName>>
CROSS APPLY (SELECT PATINDEX('[A-Z, a-z][0-9]%', [Code]),
                    CHARINDEX('', [Code]) ) ca(PatPos, SpacePos)
CROSS APPLY (SELECT CONVERT(INTEGER, CASE WHEN ca.PatPos = 1
                                          THEN SUBSTRING([Code], 2, ISNULL(NULLIF(ca.SpacePos,0)-2, 8000))
                                          ELSE NULL END),
                    CASE WHEN ca.PatPos = 1
                         THEN LEFT([Code], ISNULL(NULLIF(ca.SpacePos,0)-0,1))
                         ELSE [Code] END) ca2(OrderBy2, OrderBy1)
WHERE [TypeID] = '1'
OUTPUT:
FFS1
FFS2
...
FFS12
FFS1.1
FFS1.2
...
FFS1.1E
FFS1.1R
...
FFS12.1
FFS12.2
FFS.12.1E
FFS12.1R
FFS12.2E
FFS12.2R
DESIRED OUTPUT:
FFS1
FFS1.1
FFS1.1E
FFS1.1R
....
FFS12
FFS12.1
FFS12.1E
FFS12.1R
What am I missing or overlooking?
EDIT:
Let me try to detail the table contents a little better. There are records for FFS1 - FFS12. Those are broken into X subs, i.e., FFS1.1 - FFS1.X to FFS12.1 - FFS12.X. The E and the R were not typos; each sub-record has two codes associated with it: FFS1.1E & FFS1.1R.
Additionally I tried using ORDER BY but it sorted as
FFS1
...
FFS10
FFS2
This will work for any number of parts separated by dots. The sorting is alphanumerical for each part separately.
DECLARE @YourValues TABLE(ID INT IDENTITY, SomeVal VARCHAR(100));
INSERT INTO @YourValues VALUES
('FFS1')
,('FFS2')
,('FFS12')
,('FFS1.1')
,('FFS1.2')
,('FFS1.1E')
,('FFS1.1R')
,('FFS12.1')
,('FFS12.2')
,('FFS.12.1E')
,('FFS12.1R')
,('FFS12.2E')
,('FFS12.2R');
--The query
WITH Splittable AS
(
SELECT ID
,SomeVal
,CAST(N'<x>' + REPLACE(SomeVal,'.','</x><x>') + N'</x>' AS XML) AS Casted
FROM @YourValues
)
,Parted AS
(
SELECT Splittable.*
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartNmbr
,A.part.value(N'text()[1]','nvarchar(max)') AS Part
FROM Splittable
CROSS APPLY Splittable.Casted.nodes(N'/x') AS A(part)
)
,AddSortCrit AS
(
SELECT ID
,SomeVal
,(SELECT LEFT(x.Part + REPLICATE(' ',10),10) AS [*]
FROM Parted AS x
WHERE x.ID=Parted.ID
ORDER BY PartNmbr
FOR XML PATH('')
) AS SortColumn
FROM Parted
GROUP BY ID,SomeVal
)
SELECT ID
,SomeVal
FROM AddSortCrit
ORDER BY SortColumn;
The result
ID SomeVal
10 FFS.12.1E
1 FFS1
4 FFS1.1
6 FFS1.1E
7 FFS1.1R
5 FFS1.2
3 FFS12
8 FFS12.1
11 FFS12.1R
9 FFS12.2
12 FFS12.2E
13 FFS12.2R
2 FFS2
Some explanation:
The first CTE transforms your codes to XML, which makes it possible to address each part separately.
The second CTE returns each part together with a number.
The third CTE re-concatenates your code, but with each part padded to a length of 10 characters (see the illustration below).
The final SELECT uses this new single-string-per-row in the ORDER BY.
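To make the padding step concrete, this is the kind of sort key the third CTE builds (my own illustration, not part of the answer); because every part occupies a fixed 10-character slot, the comparison of one part can never bleed into the next part:

SELECT LEFT('FFS1'  + REPLICATE(' ',10),10) + LEFT('1E' + REPLICATE(' ',10),10) AS Key_FFS1_1E
      ,LEFT('FFS12' + REPLICATE(' ',10),10) + LEFT('1E' + REPLICATE(' ',10),10) AS Key_FFS12_1E;
-- 'FFS1      1E        ' sorts before 'FFS12     1E        ' because the space in position 5 collates before '2'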
Final hint:
This design is bad! You should not store these values in concatenated strings... Store them in separate columns and fiddle them together just for the output/presentation layer. Doing so avoids this rather ugly fiddle...
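For what that could look like in practice, here is a minimal sketch (table and column names are my own assumptions, not from the question): keep the numeric pieces as integers so they sort naturally, and rebuild the display string only in the presentation layer.

CREATE TABLE dbo.Codes
(
    CodeRoot  varchar(10) NOT NULL,  -- e.g. 'FFS'
    MajorPart int         NOT NULL,  -- 1, 2, ... 12
    MinorPart int         NULL,      -- NULL when there is no sub-code
    Suffix    char(1)     NULL       -- 'E' or 'R'
);

SELECT CodeRoot
       + CAST(MajorPart AS varchar(10))
       + ISNULL('.' + CAST(MinorPart AS varchar(10)) + ISNULL(Suffix, ''), '') AS SomeVal
FROM dbo.Codes
ORDER BY CodeRoot, MajorPart, MinorPart, Suffix;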
I have the following table.
Animal Vaccine_Date Vaccine
Cat 2/1/2016 y
Cat 2/1/2016 z
Dog 2/1/2016 z
Dog 1/1/2016 x
Dog 2/1/2016 y
I would like to get the results to be as shown below.
Animal Vaccine_Date Vaccine
Dog 1/1/2016 x
Dog 2/1/2016 y,z
Cat 2/1/2016 y,z
I have the following code which was supplied via my other post at "Combine(concatenate) rows based on dates via SQL"
WITH RECURSIVE recCTE AS
(
SELECT
animal,
vaccine_date,
CAST(min(vaccine) as VARCHAR(50)) as vaccine, --big enough to hold concatenated list
cast (1 as int) as depth --used to determine the largest/last group_concate (the full group) in the final select
FROM TableOne
GROUP BY 1,2
UNION ALL
SELECT
recCTE.animal,
recCTE.vaccine_date,
trim(trim(recCTE.vaccine)|| ',' ||trim(TableOne.vaccine)) as vaccine,
recCTE.depth + cast(1 as int) as depth
FROM recCTE
INNER JOIN TableOne ON
recCTE.animal = TableOne.animal AND
recCTE.vaccine_date = TableOne.vaccine_date and
TableOne.vaccine > recCTE.vaccine
WHERE recCTE.depth < 5
)
--Now select the result with the largest depth for each animal/vaccine_date combo
SELECT * FROM recCTE
QUALIFY ROW_NUMBER() OVER (PARTITION BY animal,vaccine_date ORDER BY depth desc) =1
But this results in the following.
Animal Vaccine_Date vaccine depth
Cat 2/1/2016 y,z,z,z,z 5
Dog 1/1/2016 x 1
Dog 2/1/2016 y,z,z,z,z 5
The "z" keeps repeating. This is because the code is saying anything greater than the minimum vaccine. To account for this, the code was changed to the following.
WITH RECURSIVE recCTE AS
(
SELECT
animal,
vaccine_date,
CAST(min(vaccine) as VARCHAR(50)) as vaccine, --big enough to hold concatenated list
cast (1 as int) as depth, --used to determine the largest/last group_concate (the full group) in the final select
vaccine as vaccine_check
FROM TableOne
GROUP BY 1,2,5
UNION ALL
SELECT
recCTE.animal,
recCTE.vaccine_date,
trim(trim(recCTE.vaccine)|| ',' ||trim(TableOne.vaccine)) as vaccine,
recCTE.depth + cast(1 as int) as depth,
TableOne.vaccine as vaccine_check
FROM recCTE
INNER JOIN TableOne ON
recCTE.animal = TableOne.animal AND
recCTE.vaccine_date = TableOne.vaccine_date and
TableOne.vaccine > recCTE.vaccine and
vaccine_check <> recCTE.vaccine_check
WHERE recCTE.depth < 5
)
--Now select the result with the largest depth for each animal/vaccine_date combo
SELECT * FROM recCTE
QUALIFY ROW_NUMBER() OVER (PARTITION BY animal,vaccine_date ORDER BY depth desc) =1
However, this resulted in the following.
Animal Vaccine_Date vaccine depth vaccine_check
Cat 2/1/2016 y 1 y
Dog 1/1/2016 x 1 x
Dog 2/1/2016 y 1 y
What is missing in the code to get the desired results below?
Animal Vaccine_Date Vaccine
Dog 1/1/2016 x
Dog 2/1/2016 y,z
Cat 2/1/2016 y,z
Hmmm. I don't have Teradata on hand, but the lack of a simple string-aggregation function is a major shortcoming in the product (in my opinion). I think this will work for you, but it might need some tweaking:
with recursive tt as (
      select t.*,
             row_number() over (partition by animal, vaccine_date order by vaccine) as seqnum,
             count(*) over (partition by animal, vaccine_date) as cnt
      from TableOne t
     ),
     cte as (
      select animal, vaccine_date, vaccine as vaccines, seqnum, cnt
      from tt
      where seqnum = 1
      union all
      select cte.animal, cte.vaccine_date, cte.vaccines || ',' || tt.vaccine, tt.seqnum, tt.cnt
      from cte join
           tt
           on tt.animal = cte.animal and
              tt.vaccine_date = cte.vaccine_date and
              tt.seqnum = cte.seqnum + 1
     )
select cte.*
from cte
where seqnum = cnt;
If your Teradata Database version is 14.10 or higher, it supports the XML data type. This also means that the XMLAGG function is supported, which would be useful for your case and would let you avoid recursion.
Check whether the XMLAGG function exists; it is installed with XML Services as a UDF:
SELECT * FROM dbc.FunctionsV WHERE FunctionName = 'XMLAGG'
If it does, then the query would look like:
SELECT
    animal,
    vaccine_date,
    TRIM(TRAILING ',' FROM CAST(XMLAGG(vaccine || ',' ORDER BY vaccine) AS VARCHAR(10000))) AS vaccines
FROM
    tableone
GROUP BY 1,2
I have no way of testing this at the moment, but I believe it should work, possibly with minor tweaks.
I was able to get the desired results with the following SQL. This doesn't seem very efficient at all and is not dynamic. However, I can add extra subqueries as needed to combine more vaccines by animal by date.
select
    qrya.animal
    ,qrya.vaccine_date
    ,case when qrya.vac1 is not null then qrya.vac1 else null end
        ||','||
        case when qrya.animal=qryb.animal and qrya.vaccine_date=qryb.vaccine_date then qryb.Vac2 else 'End' end as vaccine_List
from
(
    select
        qry1.Animal
        ,qry1.Vaccine_Date
        ,case when qry1.Vaccine_Rank = 1 then qry1.vaccine end as Vac1
    from
    (
        select
            animal
            ,vaccine_date
            ,vaccine
            ,row_number() over (partition by animal,vaccine_date order by vaccine) as Vaccine_Rank
        from TableOne
    ) as qry1
    where vac1 is not null
    group by qry1.Animal
        ,qry1.Vaccine_Date
        ,case when qry1.Vaccine_Rank = 1 then qry1.vaccine end
) as qrya
join
(
    select
        qry1.Animal
        ,qry1.Vaccine_Date
        ,case when qry1.Vaccine_Rank = 2 then qry1.vaccine end as Vac2
    from
    (
        select
            animal
            ,vaccine_date
            ,vaccine
            ,row_number() over (partition by animal,vaccine_date order by vaccine) as Vaccine_Rank
        from TableOne
    ) as qry1
    where vac2 is not null
    group by qry1.Animal
        ,qry1.Vaccine_Date
        ,case when qry1.Vaccine_Rank = 2 then qry1.vaccine end
) as qryb
    on qrya.Animal=qryb.Animal
I have a table with two columns: col_order (int), and name (text). I would like to retrieve ordered rows such that, when col_order is not null, it determines the order, but when it's null, then name determines the order. I thought of an order by clause such as
order by coalesce( col_order, name )
However, this won't work because the two columns have different types. I am considering converting both into bytea, but: 1) to convert the integer, is there a better method than looping, taking the value mod 256, and stacking up the individual bytes in a function, and 2) how do I convert "name" to ensure some sort of sane collation order (assuming name has an order ... well, citext would be nice but I haven't bothered to rebuild to get that ... UTF8 for the moment)?
Even if all this is possible (suggestions on details welcome) it seems like a lot of work. Is there a better way?
EDIT
I got an excellent answer by Gordon, but it shows I didn't phrase the question correctly. I want a sort order by name where col_order represents places where this order is overridden. This isn't a well posed problem, but here is one acceptable solution:
col_order| name
----------------
null | a
1 | foo
null | foo1
2 | bar
I.e., here, if col_order is null, the name should be inserted after the name closest in alphabetical order but less than it. Otherwise, this could be gotten by:
order by col_order nulls last, name
EDIT 2
Ok ... to get your creative juices flowing, this seems to be going in the right direction:
with x as ( select *,
case when col_order is null then null else row_number() over (order by col_order) end as ord
from temp )
select col_order, name, coalesce( ord, lag(ord,1) over (order by name) + .5) as ord from x;
It gets the order from the previous row, sorted by name, when there is no col_order. It isn't right in general... I guess I'd have to go back to the first row with a non-null col_order ... it would seem that the SQL standard has "ignore nulls" for window functions, which might do this, but it isn't implemented in Postgres. Any suggestions?
EDIT 3
The following would seem close -- but doesn't work. Perhaps window evaluation is a bit strange with recursive queries.
with recursive x(col_order, name, n) as (
select col_order, name, case when col_order is null then null
else row_number() over (order by col_order) * t end from temp, tot
union all
select col_order, name,
case when row_number() over (order by name) = 1 then 0
else lag(n,1) over (order by name) + 1 end from x
where x.n is null ),
tot as ( select count(*) as t from temp )
select * from x;
Just use multiple clauses:
order by (case when col_order is not null then 1 else 2 end),
col_order,
name
When col_order is not null, then 1 is assigned for the first sort key. When it is null, then 2 is assigned. Hence, the not-nulls will be first.
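A quick way to see it (the sample rows here are my own, not from the question):

select col_order, name
from (values (2, 'zebra'), (null, 'mango'), (1, 'apple')) t(col_order, name)
order by (case when col_order is not null then 1 else 2 end),
         col_order,
         name;
-- returns (1, apple), (2, zebra), (null, mango)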
Ok .. the following seems to work -- I'll leave the question "unanswered" though pending criticism or better suggestions:
Using the last_agg aggregate from here:
with
tot as ( select count(*) as t from temp ),
x as (
select col_order, name,
case when col_order is null then null
else (row_number() over (order by col_order)) * t end as n,
row_number() over (order by name) - 1 as i
from temp, tot )
select x.col_order, x.name, coalesce(x.n,last_agg(y.n order by y.i)+x.i, 0 ) as n
from x
left join x as y on y.name < x.name
group by x.col_order, x.n, x.name, x.i
order by n;
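For completeness: the IGNORE NULLS behaviour mentioned in EDIT 2 can also be approximated with plain window functions, avoiding both the self-join and the custom aggregate. A sketch (untested), building on the x CTE from EDIT 2 -- count(ord) only increments on rows where ord is non-null, so it labels each run of NULL rows with the anchor row that precedes it by name:

with x as (
  select *,
         case when col_order is null then null
              else row_number() over (order by col_order) end as ord
  from temp
)
select col_order, name,
       max(ord) over (partition by grp) as carried_ord  -- ord of the nearest preceding non-null row
from (
  select x.*,
         count(ord) over (order by name) as grp
  from x
) s
order by carried_ord nulls first, name;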