Snowflake - using json_parse and select distinct to un-nested column and compare with another column - sql

I have 2 columns, 1 is a nested column named custom_field and the other is sales_id I want to compare the sales_id_2 values in custom_field with sales_id column
I've tried this but it didn't work:
select distinct parse_json(custom_fields) as CUSTOM_FIELDS
from my_table where custom_fields:sales_id_2 = sales_id;
but I get the error:
SQL compilation error: error line 1 at position 111 Invalid argument
types for function 'GET': (VARCHAR(16777216), VARCHAR(2)).
+-----------------------------------------------------+
| custom_field | sales_id |
|-----------------------------------------------------|
| | |
| { | 235324115 |
| "sales_id_2": 235324115, | 1234351 |
| "g": 12, | |
| "r": 255 | |
| } | |
| { | 678322341 |
| "sales_id_2": 1234351, | 5648561 |
| "g": 13, | |
| "r": 254 | |
| } | |
I'm hoping to see empty results, because I believe sales_id_2 is the same as sales_id

:: is for casting, plus you are trying a JSON operation on a varchar column. try this
select distinct parse_json(custom_fields) as CUSTOM_FIELDS from my_table where parse_json(custom_fields):sales_id_2 = sales_id;

Related

Update of value in array of jsonb returns error"invalid input syntax for type json"

I have a column of type jsonb which contains json arrays of the form
[
{
"Id": 62497,
"Text": "BlaBla"
}
]
I'd like to update the Id to the value of a column word_id (type uuid) from a different table word.
I tried this
update inflection_copy
SET inflectionlinks = s.json_array
FROM (
SELECT jsonb_agg(
CASE
WHEN elems->>'Id' = (
SELECT word_copy.id::text
from word_copy
where word_copy.id::text = elems->>'Id'
) THEN jsonb_set(
elems,
'{Id}'::text [],
(
SELECT jsonb(word_copy.word_id::text)
from word_copy
where word_copy.id::text = elems->>'Id'
)
)
ELSE elems
END
) as json_array
FROM inflection_copy,
jsonb_array_elements(inflectionlinks) elems
) s;
Until now I always get the following error:
invalid input syntax for type json
DETAIL: Token "c66a4353" is invalid.
CONTEXT: JSON data, line 1: c66a4353...
The c66a4535 is part of one of the uuids of the word table. I don't understand why this is marked as invalid input.
EDIT:
To give an example of one of the uuids:
select to_jsonb(word_id::text) from word_copy limit(5);
returns
+----------------------------------------+
| to_jsonb |
|----------------------------------------|
| "078c979d-e479-4fce-b27c-d14087f467c2" |
| "ef288256-1599-4f0f-a932-aad85d666c9a" |
| "d1d95b60-623e-47cf-b770-de46b01042c5" |
| "f97464c6-b872-4be8-9d9d-83c0102fb26a" |
| "9bb19719-e014-4286-a2d1-4c0cf7f089fc" |
+----------------------------------------+
As requested the respective columns id and word_id from the word table:
+---------------------------------------------------+
| row |
|---------------------------------------------------|
| ('27733', '078c979d-e479-4fce-b27c-d14087f467c2') |
| ('72337', 'ef288256-1599-4f0f-a932-aad85d666c9a') |
| ('72340', 'd1d95b60-623e-47cf-b770-de46b01042c5') |
| ('27741', 'f97464c6-b872-4be8-9d9d-83c0102fb26a') |
| ('72338', '9bb19719-e014-4286-a2d1-4c0cf7f089fc') |
+---------------------------------------------------+
+----------------+----------+----------------------------+
| Column | Type | Modifiers |
|----------------+----------+----------------------------|
| id | bigint | |
| value | text | |
| homonymnumber | smallint | |
| pronounciation | text | |
| audio | text | |
| level | integer | |
| alpha | bigint | |
| frequency | bigint | |
| hanja | text | |
| typeeng | text | |
| typekr | text | |
| word_id | uuid | default gen_random_uuid() |
+----------------+----------+----------------------------+
I would suggest you to modify your sub query as follow :
update inflection_copy AS ic
SET inflectionlinks = s.json_array
FROM
(SELECT jsonb_agg(CASE WHEN wc.word_id IS NULL THEN e.elems ELSE jsonb_set(e.elems, array['Id'], to_jsonb(wc.word_id::text)) END ORDER BY e.id ASC) AS json_array
FROM inflection_copy AS ic
CROSS JOIN LATERAL jsonb_path_query(ic.inflectionlinks, '$[*]') WITH ORDINALITY AS e(elems, id)
LEFT JOIN word_copy AS wc
ON wc.id::text = e.elems->>'Id'
) AS s
The LEFT JOIN clause will return wc.word_id = NULL when there is no wc.id which corresponds to e.elems->>'id', so that e.elems is unchanged in the CASE.
The ORDER BY clause in the aggregate function jsonb_agg will ensure that the order is unchanged in the jsonb array.
jsonb_path_query is used instead of jsonb_array_elements so that to not raise an error when ic.inflectionlinks is not a jsonb array and it is used in lax mode (which is the default behavior).
see the test result in dbfiddle

Postgresql query substract from one table

I have a one tables in Postgresql and cannot find how to build a query.
The table contains columns nr_serii and deleteing_time. I trying to count nr_serii and substract from this positions with deleting_time.
My query:
select nr_serii , count(nr_serii ) as ilosc,count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii, deleting_time
output is:
+--------------------+
| "666666";1;1 |
| "456456";1;0 |
| "333333";3;0 |
| "333333";1;1 |
| "111111";1;1 |
| "111111";3;0 |
+--------------------+
The part of table with raw data:
+--------------------------------+
| "666666";"2020-11-20 14:08:13" |
| "456456";"" |
| "333333";"" |
| "333333";"" |
| "333333";"" |
| "333333";"2020-11-20 14:02:23" |
| "111111";"" |
| "111111";"" |
| "111111";"2020-11-20 14:08:04" |
| "111111";"" |
+--------------------------------+
And i need substract column ilosc and column ilosc_delete
example:
nr_serii:333333 ilosc:3-1=2
Expected output:
+-------------+
| "666666";-1 |
| "456456";1 |
| "333333";2 |
| "111111";2 |
| ... |
+-------------+
I think this is very simple solution for this but i have empty in my head.
I see what you want now. You want to subtract the number where deleting_time is not null from the ones where it is null:
select nr_serii,
count(*) filter (where deleting_time is null) - count(deleting_time) as ilosc_delete
from MyTable
group by nr_serii;
Here is a db<>fiddle.

SQL GROUPING with conditional

I am sure this is easy to accomplish but after spending the whole day trying I had to give up and ask for your help.
I have a table that looks like this
| PatientID | VisitId | DateOfVisit | FollowUp(Y/N) | FollowUpWks |
----------------------------------------------------------------------
| 123456789 | 2222222 | 20180802 | Y | 2 |
| 123456789 | 3333333 | 20180902 | Y | 4 |
| 234453656 | 4443232 | 20180506 | N | NULL |
| 455344243 | 2446364 | 20180618 | Y | 12 |
----------------------------------------------------------------------
Basically I have a list of PatientIDs, each patient can have multiple visits (VisitID and DateOfVisit). FollowUp(Y/N) specifies whether the patients has to be seen again and in how many weeks (FollowUpWks).
Now, what I need is a query that extracts PatientsID, DateOfVisit (the most recent one and only if FollowUp is YES) and the FollowUpWks field.
Final result should look like this
| PatientID | VisitId | DateOfVisit | FollowUp(Y/N) | FollowUpWks |
----------------------------------------------------------------------
| 123456789 | 3333333 | 20180902 | Y | 4 |
| 455344243 | 2446364 | 20180618 | Y | 12 |
----------------------------------------------------------------------
The closest I could get was with this code
SELECT PatientID,
Max(DateOfVisit) AS LastVisit
FROM mytable
WHERE FollowUp = True
GROUP BY PatientID;
The problem is that when I try adding the FollowUpWks field to the SELECT I get the following error: "The query does not include the specified expression as part of an aggregate function." However, if I add FollowUpWks to the GROUP BY statement than I get all visits, not just the most recent ones.
You need to match back to the most recent visit. One method uses a correlated subquery:
SELECT t.*
FROM mytable as t
WHERE t.FollowUp = True AND
t.DateOfVisit = (SELECT MAX(t2.DateOfVisit)
FROM mytable as t2
WHERE t2.PatientID = t.PatientID
);

In Hive, what is the difference between explode() and lateral view explode()

Assume there is a table employee:
+-----------+------------------+
| col_name | data_type |
+-----------+------------------+
| id | string |
| perf | map<string,int> |
+-----------+------------------+
and the data inside this table:
+-----+------------------------------------+--+
| id | perf |
+-----+------------------------------------+--+
| 1 | {"job":80,"person":70,"team":60} |
| 2 | {"job":60,"team":80} |
| 3 | {"job":90,"person":100,"team":70} |
+-----+------------------------------------+--+
I tried the following two queries but they all return the same result:
1. select explode(perf) from employee;
2. select key,value from employee lateral view explode(perf) as key,value;
The result:
+---------+--------+--+
| key | value |
+---------+--------+--+
| job | 80 |
| team | 60 |
| person | 70 |
| job | 60 |
| team | 80 |
| job | 90 |
| team | 70 |
| person | 100 |
+---------+--------+--+
So, what is the difference between them? I did not find suitable examples. Any help is appreciated.
For your particular case both queries are OK. But you can't use multiple explode() functions without lateral view. So, the query below will fail:
select explode(array(1,2)), explode(array(3, 4))
You'll need to write something like:
select
a_exp.a,
b_exp.b
from (select array(1, 2) as a, array(3, 4) as b) t
lateral view explode(t.a) a_exp as a
lateral view explode(t.b) b_exp as b

casting a REAL as INT and comparing

I am casting a real to an int and a float to an int and comparing the two like this:
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
but i am still getting results where the two are equal. for example:
+-----------+-----------+------------+------------+----------+
| accn | load_dt | pmtdt | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312 | 6/7/2011 | 11/28/2011 | 98.39 | 98.39 |
| A445070 | 6/2/2011 | 9/22/2011 | 204.93 | 204.93 |
| A465606 | 5/19/2011 | 10/19/2011 | 560.79 | 560.79 |
| A508742 | 7/12/2011 | 10/19/2011 | 279.65 | 279.65 |
| A567730 | 5/27/2011 | 10/24/2011 | 212.76 | 212.76 |
| A617277 | 7/12/2011 | 10/12/2011 | 322.02 | 322.02 |
| A626384 | 6/16/2011 | 10/21/2011 | 415.84 | 415.84 |
| AA0000044 | 5/12/2011 | 5/23/2011 | 197.38 | 197.38 |
+-----------+-----------+------------+------------+----------+
here is the full query:
select
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)] sumpaidamt,
sum(b.paid_amt) Bpaidamt
from
[MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join
F_PAYOR_PAYMENTS_DAILY b
on
a.accn=b.ACCESSION_ID
and
a.final_rpt_dt=b.FINAL_REPORT_DATE
and
a.load_dt=b.LOAD_DATE
and
a.pmtdt=b.PAYMENT_DATE
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)]
what am i doing wrong? how do i return only records that are NOT equal?
I don't see why there is an issue.
The query is returning the sum of the payments in b (sum(b.paid_amt) Bpaidamt). The where clause is comparing individual payments. This just means that there is more than one payment.
Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
You can do a round and a cast statement.
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
Sql Fiddle showing how it would work http://sqlfiddle.com/#!3/4eb79/1