SQL Server query on JSON string for stats

I have this SQL Server database that holds contest participations. In the Participation table, I have various fields and a special one called ParticipationDetails. It's a varchar(MAX). This field is used to throw in all contest specific data in json format. Example rows:
Id,ParticipationDetails
1,"{'Phone evening': '6546546541', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1951'}"
2,"{'Phone evening': '6546546542', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1952'}"
3,"{'Phone evening': '6546546543', 'Store': 'StoreXYZ', 'Math': '2', 'Age': '01/01/1953'}"
4,"{'Phone evening': '6546546544', 'Store': 'StoreABC', 'Math': '3', 'Age': '01/01/1954'}"
I'm trying to get a query running that will yield this result:
Store, Count
StoreABC, 3
StoreXYZ, 1
I used to run this query:
SELECT TOP (20) ParticipationDetails, COUNT(*) Count FROM Participation GROUP BY ParticipationDetails ORDER BY Count DESC
This works as long as I want unique ParticipationDetails. How can I change this to "sub-query" into my JSON strings? I've gotten as far as this query, but I'm stuck here:
SELECT 'StoreABC' Store, Count(*) Count FROM Participation WHERE ParticipationDetails LIKE '%StoreABC%'
This query gets me the results I want for a specific store, but I want the store value to be "anything that was put in there".
Thanks for the help!

First of all, I suggest avoiding JSON handling in T-SQL, since it is not natively supported (native JSON functions only arrived in SQL Server 2016). If you have an application layer, let it manage this kind of formatted data (the .NET Framework and non-Microsoft frameworks have JSON serializers available).
However, you can convert your JSON strings using the function described in this link.
You can also write your own query that works on the strings directly. The following one assumes that the key is stored as "Store" with double quotes, that its value follows after `: "`, and that "Math" is the next key in the string:
SELECT
    T.Store,
    COUNT(*) AS [Count]
FROM
(
    SELECT
        STUFF(
            STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, ''),
            CHARINDEX('"Math"',
                STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, '')) - 3,
            LEN(STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, '')),
            '')
        AS Store
    FROM
        Participation
) AS T
GROUP BY
    T.Store
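For comparison, the application-layer route suggested at the top of this answer could look roughly like the sketch below. It is written in Python purely for illustration; the helper names `parse_details` and `count_by_key` are made up, and the rows are the four sample rows from the question. Because the samples use single quotes, which strict JSON does not allow, the sketch falls back to `ast.literal_eval` when `json.loads` fails.

```python
import ast
import json
from collections import Counter


def parse_details(raw):
    """Parse one ParticipationDetails string into a dict (hypothetical helper)."""
    try:
        return json.loads(raw)  # strict JSON with double quotes
    except json.JSONDecodeError:
        return ast.literal_eval(raw)  # single-quoted form, as in the sample rows


def count_by_key(rows, key="Store"):
    """Count how many rows carry each distinct value of `key`, most frequent first."""
    counts = Counter()
    for raw in rows:
        counts[parse_details(raw).get(key)] += 1
    return counts.most_common()


# The four sample rows from the question.
rows = [
    "{'Phone evening': '6546546541', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1951'}",
    "{'Phone evening': '6546546542', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1952'}",
    "{'Phone evening': '6546546543', 'Store': 'StoreXYZ', 'Math': '2', 'Age': '01/01/1953'}",
    "{'Phone evening': '6546546544', 'Store': 'StoreABC', 'Math': '3', 'Age': '01/01/1954'}",
]

store_counts = count_by_key(rows)
for store, n in store_counts:
    print(store, n)
```

Unlike the string-slicing query, this does not depend on the order of the keys inside the string.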

Related

Sql help to find username in a data set

Data
row data
1 (172.32.313.20:5892) User 'ant\john' requested
2 User ant\john logged on from 172.31.13.2129
3 user=ant\john domain=ant.amazon.com server=172.31.19.541 protocol=LDAPS result=0:Success
I need to pull the username (john) from this dataset.
select message,
    replace(TRIM(split_part(split_part(message, 'requested', 1), 'User ', 2)), 'ant\\', '') username1,
    replace(TRIM(split_part(split_part(message, 'requested', 1), 'user=', 2)), 'ant\\', '') username2
from test_kemp_log.archive
where message like '%john%'
Is there a better way to extract the name that follows "User ant\" or "user=ant\" from this dataset?
The following could work in your case.
SELECT regexp_substr(message, 'ant\\\\'||'([a-z]+)', 1, 1, 'e'), message
FROM test_kemp_log.archive
WHERE message LIKE '%john%';
The 'e' parameter makes the function return the first capture group instead of the whole match. In our case, that is whatever matches ([a-z]+) right after "ant\".
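To see the capture-group idea outside the database, here is a minimal Python sketch run against the three sample messages from the question. The function name `extract_username` is hypothetical, and Python's `re` module stands in for the database regex engine, so this only illustrates the pattern, not the exact SQL behaviour.

```python
import re

# A literal "ant\" followed by one or more lowercase letters; the letters are group 1.
USERNAME_PATTERN = re.compile(r"ant\\([a-z]+)")


def extract_username(message):
    """Return the name after 'ant\\', or None when the message has no such prefix."""
    match = USERNAME_PATTERN.search(message)
    return match.group(1) if match else None


# The three sample rows from the question (raw strings keep the backslash literal).
messages = [
    r"(172.32.313.20:5892) User 'ant\john' requested",
    r"User ant\john logged on from 172.31.13.2129",
    r"user=ant\john domain=ant.amazon.com server=172.31.19.541 protocol=LDAPS result=0:Success",
]

usernames = [extract_username(m) for m in messages]
print(usernames)
```

Note that "ant.amazon.com" in the third message is not picked up, because the pattern requires a backslash right after "ant".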

BigQuery extract repeated JSON

I have the following data in one column in BigQuery:
{"id": "81", "type": ["{'id2': '12', 'type2': 'main'}", "{'id2': '15', 'type2': 'sub'}"]}
I would like to parse this and have 'id2' and 'type2' as nested fields. I tried JSON_VALUE_ARRAY(data, "$.type"), which correctly creates the nested rows, but I couldn't extract 'id2' and 'type2' from them. I suspect the double quotes around each element of the list are the issue; how can I get past them?
UPDATE:
This is the format I would like to achieve.
Consider the approach below:
select
  json_value(json, '$.id') id,
  array(
    select as struct
      json_value(trim(type, '"'), '$.id2') as id2,
      json_value(trim(type, '"'), '$.type2') as type2
    from unnest(json_extract_array(json, '$.type')) type
  ) type
from your_table
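The reason the inner fields are hard to reach is that each element of "type" is itself a string that contains a single-quoted dictionary, not a JSON object. The Python sketch below makes that two-level structure explicit on the sample value from the question; `parse_nested` is a hypothetical helper, and `ast.literal_eval` is used for the inner strings only because they are single-quoted.

```python
import ast
import json


def parse_nested(raw):
    """Parse the outer JSON, then parse each string element of 'type' on its own."""
    outer = json.loads(raw)  # the outer document is valid JSON
    # Each element is a string such as "{'id2': '12', 'type2': 'main'}".
    outer["type"] = [ast.literal_eval(item) for item in outer["type"]]
    return outer


# The sample value from the question.
raw = """{"id": "81", "type": ["{'id2': '12', 'type2': 'main'}", "{'id2': '15', 'type2': 'sub'}"]}"""

parsed = parse_nested(raw)
print(parsed["id"], [(t["id2"], t["type2"]) for t in parsed["type"]])
```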

SQL: Consolidating results

I am a relative beginner in SQL (learned and forgotten many times) and entirely self-taught, so please excuse my likely lack of proper terminology. I made a query that pulls items that have been returned; each item's row has a return code. Here is a result sample:
In the final report (created with Visual Studio), I would like to have a count of returns by return type, but I need to consolidate the 40 or so return codes into 4 or 5 return type groups. For example, RET_CODE values ) and . are both product quality issues and would count in the "Product Quality" row.
I could use some help with the best way to accomplish this.
Thank You
Andrew
The bad way!
You could do this by creating the grouping within your SQL statement with something like
SELECT
    *,
    CASE
        WHEN RET_CODE IN ('.', ')') THEN 'Quality Error'
        WHEN RET_CODE IN ('X', 'Y', 'Z') THEN 'Some Other error'
        ELSE [Desc1]
    END AS GroupDescription
FROM myTable
The problem with this approach is that you have to repeat it every time you want to do something similar.
The better option. (but not perfect!)
Assuming you do not already have such a table...
Create a table that contains the grouping. You can use this in the future whenever you need to do this kind of thing.
For example.
CREATE TABLE dbo.MyErrorGroupTable (RET_CODE varchar(10), GroupDescription varchar(50))
INSERT INTO dbo.MyErrorGroupTable VALUES
    ('.', 'Quality Error'),
    (')', 'Quality Error'),
    ('X', 'Some Other Error'),
    ('Y', 'Some Other Error'),
    ('Z', 'Some Other Error'),
    ('P', 'UPS Error'),
    ('A', 'Pack/Pick Error')
etc.
Then you can simply join to this table and use the GroupDescription in your report
e.g.
SELECT
    a.*, b.GroupDescription
FROM myTable a
JOIN MyErrorGroupTable b ON a.RET_CODE = b.RET_CODE
Hope this helps...
You're looking for the GROUP BY clause.
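Putting the two answers together, a join to the grouping table followed by GROUP BY produces the per-group counts the report needs. The sketch below runs that query against an in-memory SQLite database from Python so it can be tried without a server; SQLite stands in for the real database, the `ItemId` column and the five returned items are invented sample data, and only a few of the codes are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, stand-in for the real one

# Hypothetical sample data: five returned items and a small code-to-group mapping.
conn.executescript("""
    CREATE TABLE myTable (ItemId INTEGER, RET_CODE TEXT);
    CREATE TABLE MyErrorGroupTable (RET_CODE TEXT, GroupDescription TEXT);

    INSERT INTO myTable VALUES (1, '.'), (2, ')'), (3, 'X'), (4, 'P'), (5, '.');

    INSERT INTO MyErrorGroupTable VALUES
        ('.', 'Quality Error'),
        (')', 'Quality Error'),
        ('X', 'Some Other Error'),
        ('P', 'UPS Error');
""")

# Join each return to its group, then count the returns per group.
group_counts = conn.execute("""
    SELECT b.GroupDescription, COUNT(*) AS ReturnCount
    FROM myTable a
    JOIN MyErrorGroupTable b ON a.RET_CODE = b.RET_CODE
    GROUP BY b.GroupDescription
    ORDER BY ReturnCount DESC, b.GroupDescription
""").fetchall()

for description, return_count in group_counts:
    print(description, return_count)
```

Each RET_CODE should appear only once in the grouping table; a code listed under two groups would be counted in both.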

Using SQL Developer, I want to display only the numbers from a string after a specific character

I am using SQL Developer (and it must be SQL Developer). I need to take a string that looks like XML, but really is just a string, and display the data in a table. The data comes from a large table in which some rows have no numbers in the User Id and some have no numbers in the Job Id, but the XML-like tags are still there. Again, it is not XML, just made to look like XML, so no XML commands will work.
Data:
<UserId>1234567</UserId><JobId>1234567890123</JobId><Date>Wed May 09 13:08:24 EDT 2018</Date>
Here is what I have so far:
select company_id, location_id,
    regexp_substr(xml_provision_responses,'UserId>([[:digit:]]+<?)') as USER_Id,
    regexp_substr(xml_provision_responses,'JobId>([[:digit:]]+<?)') as JOB_ID
From Entitymgr.Cr_Response_Matrix
Where CAST(regexp_replace(SUBSTR(xml_provision_responses,-24,4), '[^0-9]','')as varchar(20))='2018'
    and company_Id = 9876543 and location_Id = 9876543210987;
The first Where condition was just to limit the data pull to just this year and the second Where condition was just added to just focus on one data point so that part does not matter.
Expected output:
COMPANY_ID: 9876543
LOCATION_Id: 9876543210987
USER_ID: 1234567
JOB_ID: 1234567890123
========================================
Actual Output:
COMPANY_ID: 9876543
LOCATION_Id: 9876543210987
USER_ID: UserId>1234567<
JOB_ID: JobId>1234567890123<
========================================
I want to display only the numeric portion of the string between the > and < of each tag. If there is no number between the tags, the column should contain null or the word "missing".
You need to use some more parameters of regexp_substr:
select regexp_substr(xml_provision_responses, '(<JobId>)(.*)(</JobId>)', 1, 1, 'i', 2),
regexp_substr(xml_provision_responses, '(<UserId>)(.*)(</UserId>)', 1, 1, 'i', 2)
...
The idea is to divide the matching string into 3 parts:
the tag opening: (<UserId>)
the content: (.*)
the tag closure: (</UserId>)
and then only return the second matching subexpression (see the parameter 2 at the end of the function calls).
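The same "return only the middle group" idea can be tried quickly in Python. This is a sketch under a few assumptions: `extract_tag` is a hypothetical helper, the content group is narrowed from `.*` to `\d*` because only digits are wanted, and an empty or missing tag is reported as `None`, which is how the "null or missing" requirement is interpreted here.

```python
import re


def extract_tag(text, tag):
    """Return the digits between <tag> and </tag>, or None if absent or empty."""
    # Only the content is captured; the opening and closing tags just anchor the match.
    pattern = rf"<{re.escape(tag)}>(\d*)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None or match.group(1) == "":
        return None
    return match.group(1)


# The sample string from the question, plus a made-up row with an empty UserId.
sample = (
    "<UserId>1234567</UserId><JobId>1234567890123</JobId>"
    "<Date>Wed May 09 13:08:24 EDT 2018</Date>"
)
empty_user = "<UserId></UserId><JobId>42</JobId>"

print(extract_tag(sample, "UserId"), extract_tag(sample, "JobId"))
print(extract_tag(empty_user, "UserId"), extract_tag(empty_user, "JobId"))
```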
I'm aware that you said (and stressed) that the string is not XML, but what you've shown does seem to be enough like XML to let you use the XML functions in the database anyway:
-- cte for your data
with cr_response_matrix (company_id, location_id, xml_provision_responses) as (
  select 9876543, 9876543210987,
    '<UserId>1234567</UserId><JobId>1234567890123</JobId><Date>Wed May 09 13:08:24 EDT 2018</Date>'
  from dual
)
-- actual query
select crm.company_id, crm.location_id, xml.user_id, xml.job_id
from cr_response_matrix crm
cross join xmltable ('/root'
  passing xmltype('<root>' || xml_provision_responses || '</root>')
  columns user_id number path 'UserId',
    job_id number path 'JobId',
    tsz_str varchar2(28) path 'Date'
) xml
where substr(tsz_str, -4) = '2018';

     COMPANY_ID     LOCATION_ID         USER_ID          JOB_ID
--------------- --------------- --------------- ---------------
        9876543   9876543210987         1234567   1234567890123
Of course, your actual strings might have other stuff that makes this approach invalid.
You could add an XML header as well as a dummy root node:
select crm.company_id, crm.location_id, xml.user_id, xml.job_id
from cr_response_matrix crm
cross join xmltable ('/root'
  passing xmltype('<?xml version="1.0" encoding="UTF-8" standalone="no" ?><root>'
    || xml_provision_responses || '</root>')
  columns user_id number path 'UserId',
    job_id number path 'JobId',
    tsz_str varchar2(28) path 'Date'
) xml
where extract(year from to_timestamp_tz(replace(xml.tsz_str, ' E', ' US/Eastern E'),
  'Dy Mon DD HH24:MI:SS TZR TZD YYYY', 'NLS_DATE_LANGUAGE=ENGLISH')) = 2018;
Just for fun I've also converted the Date value to a full timestamp with time zone to extract the actual year instead of using substr().
But this is all academic if the data isn't consistently as close to XML as your example suggested, and the adjusted regular expressions are reliable.
Can't say you didn't warn me...
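The dummy-root trick is easy to test against a handful of real values before committing to it in the database. A minimal Python sketch, assuming the fragments are well-formed apart from the missing root (the helper name `parse_response` is made up, and a non-numeric id would raise an error here rather than become null):

```python
import xml.etree.ElementTree as ET


def parse_response(fragment):
    """Wrap the tag soup in a dummy <root> so a real XML parser accepts it."""
    root = ET.fromstring("<root>" + fragment + "</root>")
    user_id = root.findtext("UserId")  # None if the tag is missing, "" if it is empty
    job_id = root.findtext("JobId")
    date_text = root.findtext("Date") or ""
    return {
        "user_id": int(user_id) if user_id else None,
        "job_id": int(job_id) if job_id else None,
        "year": date_text[-4:] or None,  # same idea as substr(tsz_str, -4)
    }


# The sample string from the question.
sample = (
    "<UserId>1234567</UserId><JobId>1234567890123</JobId>"
    "<Date>Wed May 09 13:08:24 EDT 2018</Date>"
)

record = parse_response(sample)
print(record)
```

If `ET.fromstring` raises a `ParseError` on some rows, that is a sign the data is not close enough to XML for this approach.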

IQueryDescription SQL Query Returning Limited Records

I created a SQL query that is working in SQL Management Studio, but as soon as I transfer the query into ArcMap's IQueryDescription to try and run it with a User Form, the results that I'm getting back are very limited and there is no clear pattern to the results it is returning. (409 records vs. 15 records)
I am even copying and pasting the query that works into the ArcObjects code, but I get a limited number of records back and no errors are thrown.
Has anyone run into this? What direction should I look for a solution?
I included the query below, although the query itself works just fine in SQL Management Studio.
SELECT
    CADDATA.CALLINDEX.NCALLHIST2,
    CADDATA.CALLINDEX.NHISTSEQUENCE,
    CADDATA.CALLINDEX.SZCALL,
    CADDATA.CALLINDEX.SZCALLTYPE,
    CADDATA.CALLINDEX.SZCALLDESC,
    CADDATA.CALLINDEX.SZGEOGROUP,
    CADDATA.CALLINDEX.SZPRIORITY,
    CADDATA.CALLINDEX.SZDISPOSITION,
    CADDATA.CALLINDEX.LTCREATED,
    CADDATA.CALLINDEX.LTENTERED,
    CADDATA.CALLINDEX.LTDISPATCHED,
    CADDATA.CALLINDEX.LTENROUTE,
    CADDATA.CALLINDEX.LTONSCENE,
    CASE
        WHEN RIGHT(SZLOCATION, CHARINDEX(',', SZLOCATION)) = '' THEN SZLOCATION
        ELSE LEFT(SZLOCATION, CHARINDEX(',', SZLOCATION) - 1)
    END AS Location,
    cast(LTCREATED as date) as gedatedidid
FROM CADDATA.CALLINDEX
INNER JOIN CADDATA.CALLINDEX_MXSEQUNCE
    ON CADDATA.CALLINDEX.NCALLHIST1 = CADDATA.CALLINDEX_MXSEQUNCE.NCALLHIST1
    AND CADDATA.CALLINDEX.NCALLHIST2 = CADDATA.CALLINDEX_MXSEQUNCE.NCALLHIST2
    AND CADDATA.CALLINDEX.NHISTSEQUENCE = CADDATA.CALLINDEX_MXSEQUNCE.NHISTSEQUENCE
WHERE (CADDATA.CALLINDEX.SZCALLTYPE IN ('1818', '1825', 'AH', 'BC', 'BS', 'BUS', 'CH', 'F', 'FP', 'PAT', 'PC', 'PW', 'SUB', 'TL', 'TS', 'TST', 'VHC', 'VP'))
    AND (CADDATA.CALLINDEX.SZGEOGROUP IN ('D1', 'D2', 'D3'))
    AND (cast(LTCREATED as date) >= '2013-6-4')
    AND (cast(LTCREATED as date) <= '2013-6-4')