Convert SQL query to nested JSON - sql

I have the following SQL query:
SELECT att.prod_name, att.prod_group, att.prod_size, obj.physical_id, obj.variant, MAX(obj.last_updated_date)
FROM Table1 obj
JOIN Table2 att
  ON obj.prod_name = att.prod_name
WHERE obj.access_state = 'cr'
  AND obj.variant IN ('Front')
  AND obj.size_code IN ('LARGE')
  AND att.prod_name IN ('prod_1', 'prod_2')
GROUP BY 1, 2, 3, 4, 5
The output currently looks like this:
prod_name  prod_group  prod_size     physical_id    variant  max
prod_1     1           Large - 2 Oz  jnjnj3lnzhmui  Front    8/8/2020
prod_1     1           Large - 2 Oz  pokoknujyguin  Front    6/8/2020
prod_2     1           Large - 3 Oz  oijwu8ygtoiim  Front    4/2/2018
prod_2     1           Large - 3 Oz  ytfbeuxxxx2u2  Front    7/2/2018
prod_2     1           Large - 3 Oz  rtyferqdctyyx  Front    4/4/2020
How can I convert this to nested JSON in the query itself?
Required output: (Variant and max date can be ignored)
{"prod_name":"prod_1" , "prod_group":"1", "prod_size":"Large - 2 Oz", "physical_id":{"physical_id_1":"jnjnj3lnzhmui", "physical_id2" : "pokoknujyguin"}}
{"prod_name":"prod_2" , "prod_group":"1", "prod_size":"Large - 3 Oz", "physical_id":{"physical_id_1":"oijwu8ygtoiim", "physical_id2" : "ytfbeuxxxx2u2", "physical_id3" : "rtyferqdctyyx"}}

Redshift has no built-in JSON construction functions like BigQuery's TO_JSON() or SQL Server's FOR JSON.
So you are stuck either writing the conversion yourself in a coding language like Java or Python, or writing a bunch of string-manipulation code to "fake it" directly in Redshift.
Something akin to:
SELECT CHR(123) || '"prod_name"' || ':' || '"' || NVL(prod_name, '') || '"' || ',' ||
       '"prod_group"' || ':' || '"' || NVL(prod_group, '') || '"' || ',' ||
       '"prod_size"' || ':' || '"' || NVL(prod_size, '') || '"' || CHR(125)
FROM Table1;
The nvl protects you from null values if present. The nesting aspects get a little harder, but with enough patience you should get there.
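For the nesting part, here is one possible sketch (untested; it assumes Redshift's LISTAGG and ROW_NUMBER are available and that each assembled row stays under the VARCHAR size limit). It numbers the physical_ids per product and then aggregates them into the inner object:

WITH numbered AS (
    SELECT att.prod_name, att.prod_group, att.prod_size, obj.physical_id,
           ROW_NUMBER() OVER (PARTITION BY att.prod_name
                              ORDER BY obj.physical_id) AS rn
    FROM Table1 obj
    JOIN Table2 att ON obj.prod_name = att.prod_name
    WHERE obj.access_state = 'cr'
      AND obj.variant IN ('Front')
      AND obj.size_code IN ('LARGE')
      AND att.prod_name IN ('prod_1', 'prod_2')
)
SELECT CHR(123)
       || '"prod_name":"'  || prod_name  || '",'
       || '"prod_group":"' || prod_group || '",'
       || '"prod_size":"'  || prod_size  || '",'
       || '"physical_id":' || CHR(123)
       -- inner object: "physical_id_1":"...","physical_id_2":"..."
       || LISTAGG('"physical_id_' || rn::varchar || '":"' || physical_id || '"', ',')
          WITHIN GROUP (ORDER BY rn)
       || CHR(125) || CHR(125) AS json_row
FROM numbered
GROUP BY prod_name, prod_group, prod_size;

Note that nothing here escapes quotes or backslashes inside the data, so it only produces valid JSON for values that are already JSON-safe.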
Good luck!

Related

Observing time difference between Oracle and SQL Server during data migration

We are moving data from Oracle in one geography to a SQL Server database in another geography. We are noticing that the time-related columns for different objects differ between the two geographies, by 2 hours.
At source: 2022/05/13 12:01:00
At target: 2022/05/13 10:01:00
I am using this SQL to extract the data from the source Oracle database:
select distinct item.pitem_id || '~'
       || TO_CHAR(paoItem.PCREATION_DATE, 'dd-Mon-YYYY HH24:MI') || '~'
       || puItemOwner.puser_id || '~'
       || pgItem.pname || '~'
       || puItemLMU.puser_id || '~'
       || TO_CHAR(paoItem.PLAST_MOD_DATE, 'dd-Mon-YYYY HH24:MI')
-- Item Information
from infodba.PITEM item
inner join infodba.PPOM_APPLICATION_OBJECT paoItem on paoItem.puid = item.puid
inner join infodba.PPOM_GROUP pgItem on pgItem.puid = paoItem.ROWNING_GROUPU
inner join infodba.PPOM_USER puItemOwner on puItemOwner.puid = paoItem.ROWNING_USERU
inner join infodba.PPOM_USER puItemLMU on puItemLMU.puid = paoItem.RLAST_MOD_USERU
where item.pitem_id in ('3204-001-0613-C');
This query gives me the correct result - 2022/05/13 12:01:00. But when I import this data into the target system, the date gets updated to 2022-05-13 10:01:00.000.
I am assuming that this 2-hour difference comes from the time zone difference between the geographies. If so, what can I do to ensure that the data gets persisted correctly?
Please share some ideas on which options to try or where to look for this sort of issue.
Thanks,
Pavan.
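One place to start looking (a hedged suggestion, since the migration tooling isn't shown): a constant 2-hour shift usually means one side is converting between a local time zone and UTC, often the JDBC/ETL client's own zone rather than either database. On the Oracle side you can compare the zone settings, and if the column is a plain DATE, attach an explicit zone on export so the value is unambiguous:

-- what the database and this session think the time zone is
select DBTIMEZONE, SESSIONTIMEZONE from dual;

-- attach an explicit zone on export; 'Europe/Berlin' below is purely
-- illustrative - substitute the source system's actual zone
select TO_CHAR(FROM_TZ(CAST(paoItem.PCREATION_DATE AS TIMESTAMP), 'Europe/Berlin')
               AT TIME ZONE 'UTC', 'YYYY/MM/DD HH24:MI:SS TZR')
from infodba.PPOM_APPLICATION_OBJECT paoItem;

It is also worth checking the time zone of the machine running the import and whether the target column is datetime or datetimeoffset.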

Can I use group_concat inside another group_concat in SQL?

So I have a query like this:
select
    p.identifier,
    GROUP_CONCAT(
        '[' || '{"thumbnail":' || '"' || ifnull(s.thumbnail, 'null') || '"' ||
        ',"title:' || '"' || s.title || '","content": [' ||
        GROUP_CONCAT(
            '{"text":' || ifnull(c.text, 'null') ||
            '", "image":' || ifnull(c.image, 'null') ||
            '", "caption": "' || ifnull(c.caption, 'null') ||
            '"},'
        ) ||
        ']},'
    )
from pois as p
join stories as s on p.identifier = s.poiid
join content c on s.storyid = c.storyid
group by s.storyid
And I got an error:
in prepare, misuse of aggregate function GROUP_CONCAT()
To make it clear: I have a big object named POIS, every POI has multiple STORIES, and every STORY has multiple CONTENTS. I want to display x rows (as many POIs as I have) and, inside the column, have every story that is connected to its POI (and every content inside the stories). I need this in JSON format so I can parse the database query and read it back into my object.
I hope my problem is clear and that you can help me.
So I changed the query to something like this:
SELECT p.identifier, (
SELECT json_group_array(json_object('id', s.storyid))
FROM stories AS s
WHERE s.poiid=p.identifier
) as stories,
(
SELECT json_group_array(json_object('id', c.contentid, 'storyId', s.storyid))
FROM content AS c
JOIN stories AS s ON c.storyid=s.storyid
WHERE s.poiid=p.identifier
) as contents
FROM pois AS p
GROUP BY p.identifier
This is my result:
(screenshot of the result set omitted)
But I would like to put the third column inside the second: every POI has multiple stories and every story has one or multiple contents, so the contents should be inside their stories.
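A sketch of that nesting (untested; it assumes SQLite's JSON1 functions, which the changed query above already uses): move the contents subquery inside the json_object built for each story, wrapping it in json() so the inner array is embedded as JSON rather than as an escaped string:

SELECT p.identifier,
       (SELECT json_group_array(
                   json_object(
                       'id', s.storyid,
                       'thumbnail', s.thumbnail,
                       'title', s.title,
                       -- json() keeps the inner array as JSON, not as text
                       'content', json((SELECT json_group_array(
                                                   json_object('text', c.text,
                                                               'image', c.image,
                                                               'caption', c.caption))
                                        FROM content AS c
                                        WHERE c.storyid = s.storyid))))
        FROM stories AS s
        WHERE s.poiid = p.identifier) AS stories
FROM pois AS p;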

REGEXP gives different outputs

I've used REGEXP to find text patterns, but I'm having a problem with one part of it. I'd like to classify a ticketing fare calculation line as having only 1 of the following labels:
blank (there are no Q surcharges)
QCPN (there are only instance(s) of the existing format Q20.00)
QPRO (there are only instance(s) of the new additional format Q LONSYD20.00)
QBOTH (there are examples of both QCPN and QPRO)
Below is the SQL:
SELECT JOUR.JOUR_FSERNR AS TICKET,
       CASE
          WHEN REGEXP_LIKE (
                  JOUR.JOUR_FCA1LN || JOUR.JOUR_FCA2LN || JOUR.JOUR_FCA3LN || JOUR.JOUR_FCA4LN,
                  'Q[[:space:]][[:alpha:]]{6}')
          THEN
             'QPRO'
          WHEN REGEXP_LIKE (
                  JOUR.JOUR_FCA1LN || JOUR.JOUR_FCA2LN || JOUR.JOUR_FCA3LN || JOUR.JOUR_FCA4LN,
                  'Q[[:digit:]]+\.[[:digit:]]+.*END')
          THEN
             'QCPN'
          WHEN REGEXP_LIKE (
                  JOUR.JOUR_FCA1LN || JOUR.JOUR_FCA2LN || JOUR.JOUR_FCA3LN || JOUR.JOUR_FCA4LN,
                  'Q[[:space:]][[:alpha:]]{6}')
               AND REGEXP_LIKE (
                  JOUR.JOUR_FCA1LN || JOUR.JOUR_FCA2LN || JOUR.JOUR_FCA3LN || JOUR.JOUR_FCA4LN,
                  '[[ALPHA]]{3}Q[[:digit:]]+\.[[:digit:]]')
          THEN
             'QBOTH'
          ELSE
             NULL
       END AS QTYPE,
       JOUR.JOUR_FCA1LN || JOUR.JOUR_FCA2LN || JOUR.JOUR_FCA3LN || JOUR.JOUR_FCA4LN AS FARECALC
FROM "S00BJOUR" JOUR
WHERE JOUR.JOUR_FSERNR = '9999889652'
If you look at the above SQL and find the CASE WHEN line that outputs 'QCPN', you'll see there's an "END" text string; my original pattern there was 'Q[[:digit:]][[:graph:]]END'. I put 'END' in there because I only want the REGEXP to look to the left of 'END' in a fare calc line.
But it gives me some incorrect outputs, shown in red in the attached image (not reproduced here).
Any help to have this corrected is much appreciated.
It seems strange that you don't use quantifiers. QCPN, whatever it means, should be:
Q[[:digit:]]{2}\.[[:digit:]]{2}
to match your Q20.00 example at least.
EDIT:
Your "with END" example didn't work because you didn't use any quantifiers:
Q[[:digit:]][[:graph:]]END
# Matches Q5.END, Q22END, Q8AEND
# Doesn't match Q20.00 END
But:
Q[[:digit:]]+\.[[:digit:]]+.*END
# Matches "Q1.2 ZA VD END Q20.2"
# Doesn't match "Q5.END", "BF END Q10.25"
For the second problem, QPRO/QBOTH, the problem is:
Q HAMAOQ20.00
Try
[[:space:]]Q[[:digit:]]+\.[[:digit:]]+
for the QCPN regex.
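Worth adding: CASE evaluates its WHEN branches in order and stops at the first match, so as posted the QBOTH branch can never fire, because any line matching both patterns is already caught by the QPRO branch. A sketch of the reordered CASE (untested, Oracle syntax), using the patterns suggested above and a subquery to avoid repeating the concatenation:

SELECT TICKET,
       CASE
          -- the combined test must come first, or QPRO/QCPN always win
          WHEN REGEXP_LIKE(FARECALC, 'Q[[:space:]][[:alpha:]]{6}')
               AND REGEXP_LIKE(FARECALC, '[[:space:]]Q[[:digit:]]+\.[[:digit:]]+')
             THEN 'QBOTH'
          WHEN REGEXP_LIKE(FARECALC, 'Q[[:space:]][[:alpha:]]{6}')
             THEN 'QPRO'
          WHEN REGEXP_LIKE(FARECALC, 'Q[[:digit:]]+\.[[:digit:]]+.*END')
             THEN 'QCPN'
       END AS QTYPE,
       FARECALC
FROM (SELECT JOUR.JOUR_FSERNR AS TICKET,
             JOUR.JOUR_FCA1LN || JOUR.JOUR_FCA2LN
             || JOUR.JOUR_FCA3LN || JOUR.JOUR_FCA4LN AS FARECALC
      FROM "S00BJOUR" JOUR
      WHERE JOUR.JOUR_FSERNR = '9999889652');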

Closest match in text fields between two tables: how to improve plpgsql function

I am trying to geocode data in one table (A) using address data contained in another table (B). As street names can be written differently, I would like to first go through the data in A and, for each tuple, find the street name in B that is closest to the one in the tuple of A, within a given postcode zone. For text matching I am currently using the similarity() function and the '%' operator from the pg_trgm extension.
A contains data from different countries, so the function parameters contain the respective table names, but also the country I'm processing and the names of the relevant fields in the address data table (B).
Relevant fields in the respective tables are:
A
id | bigint | non NULL
cp | character varying |
rue | character varying |
rue_trouvee | character varying |
iso_pays | character varying |
with indexes:
"tableA_temp_pkey" PRIMARY KEY, btree (id)
"idx_tableA_pays" btree (iso_pays)
B
rue | character varying(90) |
code_post | character varying(5) |
x | double precision |
y | double precision |
with indexes:
"idx_fradresses_code_post" btree (code_post)
"idx_fradresses_rue_trgm" gin (rue gin_trgm_ops)
Currently, I am using this PLPGSQL function:
CREATE OR REPLACE FUNCTION trouver_rue_proche(datatable TEXT, addresstable TEXT, address_rue TEXT, address_cp TEXT, pays TEXT) RETURNS INTEGER AS $$
DECLARE
    rec_data RECORD;
    nom_rue RECORD;
    counter INTEGER;
BEGIN
    counter := 0;
    FOR rec_data IN
        EXECUTE 'SELECT id, rue, cp FROM ' || quote_ident(datatable) || ' WHERE iso_pays = ' || quote_literal(pays) || ' AND x is null'
    LOOP
        counter := counter + 1;
        EXECUTE 'SELECT ' || quote_ident(address_rue) || ' as rue_t FROM geocode.' || quote_ident(addresstable)
             || ' WHERE ' || quote_ident(address_cp) || ' = ' || quote_literal(rec_data.cp)
             || ' AND ' || quote_ident(address_rue) || ' % ' || quote_literal(rec_data.rue)
             || ' ORDER BY similarity(' || quote_ident(address_rue) || ', ' || quote_literal(rec_data.rue) || ') DESC LIMIT 1'
        INTO nom_rue;
        EXECUTE 'UPDATE ' || quote_ident(datatable) || ' SET rue_trouvee = $1 WHERE id = $2' USING nom_rue.rue_t, rec_data.id;
    END LOOP;
    RETURN counter;
END
$$
LANGUAGE plpgsql;
I am running this function for a country where 584,670 tuples still have x = NULL, and for which the address table contains 25,228,340 tuples; the function has now been running for almost 3 days.
My machine has the following specs:
Intel(R) Core(TM) i3-3225 CPU @ 3.30GHz
8GB RAM
I'm running PostgreSQL 9.1 with the following parameters in postgresql.conf:
shared_buffers = 4096MB
work_mem = 512MB
Any hints on how to improve the efficiency of this function?
After the hint from Richard Huxton, this is the query I used:
UPDATE tableA set rue_trouvee=t4.rue
FROM (SELECT id, rue
FROM (SELECT t1.id, t2.rue, similarity(t1.rue, t2.rue) as similarity, rank()
OVER (PARTITION BY t1.id ORDER BY similarity(t1.rue, t2.rue) DESC)
FROM tableA t1 JOIN tableB t2
ON (t1.cp = t2.code_post AND t1.rue % t2.rue)
WHERE t1.x is null AND t1.iso_pays='FR') t3
WHERE rank=1) t4
WHERE tableA.id=t4.id
I imagine that this could be solved more elegantly and more efficiently, but at least this worked and gave me the updates I wanted after 5 hours.
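For reference, a variant of the same set-based idea (a sketch, untested) uses PostgreSQL's DISTINCT ON instead of rank() to keep only the best match per id, which saves ranking every candidate row:

UPDATE tableA SET rue_trouvee = best.rue
FROM (
    -- keep only the highest-similarity candidate per tableA row
    SELECT DISTINCT ON (t1.id) t1.id, t2.rue
    FROM tableA t1
    JOIN tableB t2
      ON t1.cp = t2.code_post AND t1.rue % t2.rue
    WHERE t1.x IS NULL AND t1.iso_pays = 'FR'
    ORDER BY t1.id, similarity(t1.rue, t2.rue) DESC
) best
WHERE tableA.id = best.id;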

Splitting a Path Enumeration Model PathString in MySQL

I'm trying to implement a Path Enumeration model as per Joe Celko's book (page 38). The relevant attributes of my table (and the support table that just contains sequential integers) look like this:
Contribution
------------
ContributionID
PathString
_IntegerSeries
--------------
IntegerID
_IntegerSeries contains integers 1 to n where n is bigger than I'll ever need. Contribution contains three records:
ContributionID  PathString
1               1
2               12
3               123
... and I use a modified version of Joe's query:
SELECT SUBSTRING( c1.PathString
FROM (s1.IntegerID * CHAR_LENGTH(c1.ContributionID))
FOR CHAR_LENGTH(c1.ContributionID)) AS ContID
FROM
Contribution c1, _IntegerSeries s1
WHERE
c1.ContributionID = 3
AND s1.IntegerID <= CHAR_LENGTH(c1.PathString)/CHAR_LENGTH(c1.ContributionID);
... to successfully return a result set containing all of ContributionID 3's superiors in the hierarchy. Now, in this example, the PathString column holds plain integer values and obviously we run into trouble once we hit ContributionID 10. So we modify the PathString column to include separators:
ContributionID  PathString
1               1.
2               1.2.
3               1.2.3.
Now... the book doesn't give an example of getting superiors when the PathString uses delimiters... so I'll have to figure that out later. But it does give an example for how to split up a PathString (which I'm guessing is going to help me do superior searches). The MySQL version of the example code to do this is:
SELECT SUBSTRING( '.' || c1.PathString || '.'
FROM s1.IntegerID + 1
FOR LOCATE('.', '.' || c1.PathString || '.', s1.IntegerID + 1) - s1.IntegerID - 1) AS Node
FROM _IntegerSeries s1, Contribution c1
WHERE
SUBSTRING('.' || c1.PathString || '.' FROM s1.IntegerID FOR 1) = '.'
AND IntegerID < CHAR_LENGTH('.' || c1.PathString || '.');
... but this code returns an empty result set. I'm doing something wrong, but I'm not sure what. Figured I'd put this out to the stackoverflow community prior to bothering Joe with an email. Anyone have any thoughts?
UPDATE
Quassnoi's query... slightly modified after testing, but functionally exactly the same as his original. Very nice. Much cleaner than what I was using. Big thanks.
SET #contributionID = 3;
SELECT ca.*
FROM
Contribution c INNER JOIN _IntegerSeries s
ON s.IntegerID < #contributionID AND SUBSTRING_INDEX(c.PathString, '.', s.IntegerID) <> SUBSTRING_INDEX(c.PathString, '.', s.IntegerID + 1)
INNER JOIN Contribution ca
ON ca.PathString = CONCAT(SUBSTRING_INDEX(c.PathString, '.', s.IntegerID), '.')
WHERE c.ContributionID = #contributionID;
This is because || in MySQL is boolean OR, not string concatenation (unless the PIPES_AS_CONCAT SQL mode is enabled).
To find all ancestors of a given Contribution, use:
SELECT ca.*
FROM Contribution c
JOIN _IntegerSeries s
     ON s.IntegerID < CHAR_LENGTH(c.PathString)
     AND SUBSTRING_INDEX(c.PathString, '.', s.IntegerID) <> SUBSTRING_INDEX(c.PathString, '.', s.IntegerID + 1)
JOIN Contribution ca
     ON ca.PathString = CONCAT(SUBSTRING_INDEX(c.PathString, '.', s.IntegerID), '.')
WHERE c.ContributionID = 3
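To see why the SUBSTRING_INDEX trick works, here is a quick illustration (values hypothetical) for PathString = '1.2.3.':

SELECT SUBSTRING_INDEX('1.2.3.', '.', 1);  -- '1'     -> ancestor path '1.'
SELECT SUBSTRING_INDEX('1.2.3.', '.', 2);  -- '1.2'   -> ancestor path '1.2.'
SELECT SUBSTRING_INDEX('1.2.3.', '.', 3);  -- '1.2.3' -> the row's own path '1.2.3.'

The inequality test in the join stops the series once SUBSTRING_INDEX stops producing new prefixes. Note that the node's own path also satisfies the test, which is presumably why the modified version above bounds s.IntegerID below @contributionID to return strict ancestors only.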