Extract the second matching substring using PostgreSQL - sql

I use the below query to extract a value from a column that stores JSON objects.
The issue with it is that it only pulls the first value matching the regex inside SUBSTRING, which is -$4,000.00. Is there a parameter I can pass to SUBSTRING to pull the value -$1,990.00 as well, into another column?
SELECT attribute_actions_text
, SUBSTRING(attribute_actions_text FROM '"Member [Dd]iscount:":"(.+?)"') AS column_1
, '' AS column_2
FROM (
VALUES
('[{"Member Discount:":"-$4,000.00"},{"Member discount:":"-$1,990.00"}]')
, (NULL)
) ls(attribute_actions_text)
Desired result:
column_1     column_2
-$4,000.00   -$1,990.00

Try this:
WITH data(id,attribute_actions_text) as (
VALUES
(1,'[{"Member Discount:":"-$4,000.00"},{"Member Discount:":"-$1,990.00"}]')
, (2,'[{"Member Discount:":"-$4,200.00"},{"Member Discount:":"-$1,890.00"}]')
, (3,NULL)
), match as (
SELECT
id,
m,
ROW_NUMBER()
OVER (PARTITION BY id) AS r
FROM data, regexp_matches(data.attribute_actions_text, '"Member [Dd]iscount:":"(.+?)"', 'g') AS m
)
SELECT
id
,(select m from match where id = d.id AND r=1) as col1
,(select m from match where id = d.id AND r=2) as col2
FROM data d
Result:
id  col1           col2
1   {-$4,000.00}   {-$1,990.00}
2   {-$4,200.00}   {-$1,890.00}
3   NULL           NULL
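Note that regexp_matches returns each match as a text array, which is why the values above come back wrapped in braces. If you want plain text instead, a minimal variant of the final SELECT (reusing the same CTEs) indexes into the match array:
SELECT
id
-- m[1] extracts the first capture group from the text[] match,
-- so you get -$4,000.00 instead of {-$4,000.00}
,(select m[1] from match where id = d.id AND r=1) as col1
,(select m[1] from match where id = d.id AND r=2) as col2
FROM data d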

Snowflake SQL - OBJECT_CONSTRUCT from COUNT and GROUP BY

I'm trying to summarize data in a table:
counting total rows
counting values on specific fields
getting the distinct values on specific fields
and, more importantly, I'm struggling with:
getting the count for each value of a field, nested in an object
Given this data:
COL1  COL2
A     0
null  1
B     null
B     null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL  COL1  DIST_COL1   COL1_OBJECT_COUNT          COL2  DIST_COL2  COL2_OBJECT_COUNT
4      3     ["A", "B"]  {"A": 1, "B": 2, null: 1}  2     [0, 1]     {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you try a select statement inside it, it will fail.
Another issue is that analytic functions are not easily accepted by the object or array functions in Snowflake.
You could use Snowflake Scripting or Snowpark for this but here's a solution that is somewhat flexible so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for table and colnames.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total, -- use the session variable here too so the view stays table-agnostic
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
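As a side note, if you only need one column's summary, each *_OBJECT_COUNT can also be derived directly with OBJECT_AGG over a grouped subquery; here is a minimal sketch for COL1 alone, assuming the dummy view above (the ::variant cast is needed because OBJECT_AGG expects a VARIANT value):
select object_agg(ifnull(col1, 'null'), cnt::variant) as col1_object_count
from (
-- count rows per distinct value of col1, mapping NULL to the string 'null'
select col1, count(1) as cnt
from dummy
group by col1
);
-- returns {"A": 1, "B": 2, "null": 1}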

How to convert a JSON array to multiple columns in Hive

Example:
There is a JSON array column (type: string) in a Hive table, like:
[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, ...]
How to convert it into:
name   age
alice  14
by Hive SQL?
I've tried lateral view explode but it's not working.
Thanks a lot!
This is a working example of how it can be parsed in Hive. Customize it yourself and debug on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
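For intuition, here is what the str_to_map step produces for one exploded element after the character cleanup (a standalone illustration; str_to_map's default pair and key/value delimiters are ',' and ':'):
select str_to_map('field:name,value:alice');
-- returns {"field":"name","value":"alice"}, so field_map['field'] = 'name'
-- and field_map['value'] = 'alice'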
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
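Both variants rely on the same string surgery before the explode; you can see it in isolation with this hypothetical standalone query:
select regexp_replace(regexp_replace(
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}]',
'\\[|\\]', ''), -- strip the enclosing brackets
'\\},', '}|'); -- make each object boundary a splittable '|'
-- returns {"field":"name", "value":"alice"}| {"field":"age", "value":"14"}
-- which split(..., '\\|') then turns into one array element per JSON object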

Getting the Number of Common Values from 2 Comma-Separated Strings

I have a table that contains comma-separated values in a column in Postgres.
ID PRODS
--------------------------------------
1 ,142,10,75,
2 ,142,87,63,
3 ,75,73,2,58,
4 ,142,2,
Now I want a query where I can give a comma-separated string and it will tell me the number of matches between the input string and the string present in the row.
For instance, for input value ',142,87,', I want the output like
ID PRODS No. of Match
------------------------------------------------------------------------
1 ,142,10,75, 1
2 ,142,87,63, 2
3 ,75,73,2,58, 0
4 ,142,2, 1
Try this:
SELECT
*,
ARRAY(
SELECT
*
FROM
unnest(string_to_array(trim(both ',' from prods), ','))
WHERE
unnest = ANY(string_to_array(',142,87,', ','))
)
FROM
prods_table;
Output is:
1 ,142,10,75, {142}
2 ,142,87,63, {142,87}
3 ,75,73,2,58, {}
4 ,142,2, {142}
Add the cardinality(anyarray) function to the last column to get just a number of matches.
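Concretely, that would look like this (a sketch of the same query with the wrapper added; the column alias is mine):
SELECT
*,
-- cardinality() counts the elements of the intersection array,
-- e.g. 2 for {142,87} and 0 for {}
cardinality(ARRAY(
SELECT
*
FROM
unnest(string_to_array(trim(both ',' from prods), ','))
WHERE
unnest = ANY(string_to_array(',142,87,', ','))
)) AS num_matches
FROM
prods_table;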
And consider changing your database design.
Check this:
select T.*,
COALESCE(No_of_Match,'0')
from TT T Left join
(
select ID,count(ID) No_of_Match
from (
select ID,unnest(string_to_array(trim(t.prods, ','), ',')) A
from TT t)a
Where A in ('142','87')
group by ID
)B
On T.Id=b.id
Output:
ID  PRODS         No_of_Match
1   ,142,10,75,   1
2   ,142,87,63,   2
3   ,75,73,2,58,  0
4   ,142,2,       1
If you install the intarray extension, this gets quite easy:
select id, prods, cardinality(string_to_array(trim(prods, ','), ',')::int[] & array[142,87])
from bad_design;
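If the extension is not available yet, it can be enabled once per database, assuming you have the required privileges:
-- one-time setup per database
CREATE EXTENSION IF NOT EXISTS intarray;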
Otherwise it's a bit more complicated:
select bd.id, bd.prods, m.matches
from bad_design bd
join lateral (
select bd.id, count(v.p) as matches
from unnest(string_to_array(trim(bd.prods, ','), ',')) as l(p)
left join (
values ('142'),('87') --<< these are your input values
) v(p) on l.p = v.p
group by bd.id
) m on m.id = bd.id
order by bd.id;
Online example: http://rextester.com/ZIYS97736
But you should really fix your data model.
with data as
(
select *,
unnest(string_to_array(trim(both ',' from prods), ',') ) as v
from myTable
),
counts as
(
select id, count(t) as c from data
left join
( select unnest(string_to_array(',142,87,', ',') ) as t) tmp on tmp.t = data.v
group by id
order by id
)
select t1.id, t1.prods, t2.c as "No. of Match"
from myTable t1
inner join counts t2 on t1.id = t2.id;

GBQ SQL: Return blank spaces if a record is not found in the table

I have a query as below. I would like SQL to return blank spaces if a key is not found in the table.
Select * from table_A where key in (1, 2, 3, 4)
Output:
1 x y
2 a b
'' '' ''
4 ds c
Assuming table_A has 3 columns and that the record with key 3 is not in the table.
Instead of empty strings you should work with NULL values to be type-safe.
NULL indicates that there is no value present, in contrast to an empty string or zero, which are still values of a certain type.
If you wanted to use empty strings, you'd have to cast the key to a string on the fly - not very convenient.
The trick to get your result is to create an ideal key table with all keys - I'm using GENERATE_ARRAY here from 1 to the max(key). Then left join your table to it and voilà:
WITH test AS (SELECT * FROM UNNEST([
STRUCT(1 AS key, 'x' AS col1, 'y' AS col2),
STRUCT(2 AS key, 'a' AS col1, 'b' AS col2),
STRUCT(4 AS key, 'x' AS col1, 'y' AS col2)
])
)
SELECT
test.*
FROM UNNEST(GENERATE_ARRAY(1, (SELECT MAX(key) FROM test))) AS key
LEFT JOIN test USING(key)
gives you:
key   col1  col2
1     x     y
2     a     b
null  null  null
4     x     y
If you wanted all keys, just SELECT * FROM ...
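For example, a hedged variant of the query above (reusing the WITH test block) that keeps the generated key column:
SELECT
-- the generated key survives even when test has no matching row,
-- so key 3 shows up with NULL payload columns instead of an all-NULL row
key, test.col1, test.col2
FROM UNNEST(GENERATE_ARRAY(1, (SELECT MAX(key) FROM test))) AS key
LEFT JOIN test USING(key)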

SQL Server - How to delete some rows of columns without disrupting the rest of the record

I have this:
-- -- -- --
01 A1 B1 99
01 A1 B1 98
02 A2 B2 97
02 A2 B2 96
I need this
-- -- -- --
01 A1 B1 99
98
02 A2 B2 97
96
------------
I cannot repeat the data that I will present in an Excel file;
my result needs to be exactly like this.
In my actual table, the last column holds form responses, and the first columns (those that must not repeat) are customer data such as phone, name, etc.
The end result of this query will populate a DataTable and will be presented in an .xlsx file.
Thanks for sharing knowledge ^^
If you have SQL Server 2012+:
SELECT
ISNULL(NULLIF(Column1,LAG(Column1) OVER(ORDER BY Column1)),'')
,ISNULL(NULLIF(Column2,LAG(Column2) OVER(ORDER BY Column1,Column2)),'')
,ISNULL(NULLIF(Column3,LAG(Column3) OVER(ORDER BY Column1,Column2,Column3)),'')
,Column4
FROM #mytable
ORDER BY Column1,Column2,Column3,Column4 DESC
It's a little messy, but you can do it in the database. You basically make a subquery that gets the largest value, and then join that to the regular table and blank out values that don't match. I created your sample set like this:
CREATE TABLE mytable (N1 VARCHAR(2), A VARCHAR(2), B VARCHAR(2), N2 VARCHAR(2))
INSERT INTO mytable VALUES
('01', 'A1', 'B1', '99'),
('01', 'A1', 'B1', '98'),
('02', 'A2', 'B2', '97'),
('02', 'A2', 'B2', '96')
And then was able to get the result like this:
SELECT
CASE WHEN O.N2 = I.N2 THEN O.N1 ELSE '' END,
CASE WHEN O.N2 = I.N2 THEN O.A ELSE '' END,
CASE WHEN O.N2 = I.N2 THEN O.B ELSE '' END,
O.N2
FROM
(SELECT MAX(N2) AS N2, N1, A, B FROM mytable GROUP BY N1, A, B) I
INNER JOIN mytable O
ON O.A = I.A AND O.B = I.B AND O.N1 = I.N1
ORDER BY O.N1 ASC
We can use ROW_NUMBER to get the sequence and substitute '' for all rows where the sequence is greater than 1:
with CTE
AS
( SELECT ID, ColumnA, ColumnB, value,ROW_NUMBER() over ( PARTITION by id order by id) as seq
FROM tableA
)
, CTE1
AS
(
select id, ColumnA, ColumnB, value, seq from CTE where seq =1
UNION
SELECT id ,'','', value , seq from CTE where seq >1
)
SELECT case when seq >1 THEN NULL ELSE id END as id, columnA, columnB, value from CTE1
You can achieve what you want using a query.
You haven't provided DDL, so I am going to assume your columns are called a, b, c and d respectively:
; WITH cte AS (
SELECT a
, b
, c
, d
, Row_Number() OVER (PARTITION BY a, b, c ORDER BY d) As sequence
FROM your_table
)
SELECT CASE WHEN sequence = 1 THEN a ELSE '' END As a
, CASE WHEN sequence = 1 THEN b ELSE '' END As b
, CASE WHEN sequence = 1 THEN c ELSE '' END As c
, d
FROM cte
ORDER
BY a
, b
, c
, d
The idea is to assign an incremental counter to each row that restarts after each change of a + b + c.
We then use a conditional statement to show a value or not (basically, only show it on the first instance of each group).
The analytic ROW_NUMBER() function is good for this. I've made up column names because you didn't supply any. To assign a row number by customer, use something like this:
SELECT
Name,
Phone,
Address,
Response,
ROW_NUMBER() OVER (PARTITION BY Name, Phone, Address ORDER BY Response) AS CustRow
FROM myTable
That will assign row number within each customer. Try it yourself and I think it will make sense.
You can put it into a subquery or CTE from there and only show customer ID information like name, phone, and address when you're on the first row for each customer:
SELECT
CASE WHEN CustRow = 1 THEN Name ELSE '' END AS Name,
CASE WHEN CustRow = 1 THEN Phone ELSE '' END AS Phone,
CASE WHEN CustRow = 1 THEN Address ELSE '' END AS Address,
Response
FROM (
SELECT
Name,
Phone,
Address,
Response,
ROW_NUMBER() OVER (PARTITION BY Name, Phone, Address ORDER BY Response) AS CustRow
FROM myTable) custSubquery
ORDER BY Name, Phone, Address
The custSubquery on the second-to-last line is because SQL Server requires all subqueries to be aliased, even if the alias isn't used.
The most important thing is to determine how your last column will be ordered for display and to make sure that it's consistent in the ROW_NUMBER() function as well as the final ORDER BY.
If you need more help, please supply table and column names, and specify how results are ordered within each customer.
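As a concrete usage example, here is a hedged adaptation of that pattern to the sample mytable created in an earlier answer (columns N1, A, B, N2), with responses ordered descending as in the desired output:
; WITH cte AS (
SELECT N1, A, B, N2,
ROW_NUMBER() OVER (PARTITION BY N1, A, B ORDER BY N2 DESC) AS CustRow
FROM mytable
)
SELECT
-- only the first row of each customer group shows the repeating columns
CASE WHEN CustRow = 1 THEN N1 ELSE '' END AS N1,
CASE WHEN CustRow = 1 THEN A ELSE '' END AS A,
CASE WHEN CustRow = 1 THEN B ELSE '' END AS B,
N2
FROM cte
-- qualify with cte so the sort uses the original values, not the blanked output
ORDER BY cte.N1, cte.A, cte.B, cte.CustRow;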