Postgres query json array to find the count which matches the condition - sql

I have a table my_table with a column named itinerary in my Postgres 12 DB.
select column_name, data_type from information_schema.columns where table_name = 'my_table' and column_name = 'itinerary';
column_name | data_type
-------------+-----------
itinerary | ARRAY
(1 row)
Every element of itinerary is a JSON object with an address field, which in turn contains a city field. I can find the count matching the condition for the first element of the itinerary with the following query:
select count(*) from my_table where lower(itinerary[1]->'address'->>'city') = 'oakland';
count
-------
12
(1 row)
and I can also find the length of an array by using the following query:
select array_length(itinerary, 1) from my_table limit 1;
I would like to find all the records that have the city Oakland anywhere in their itinerary, not only as the first stop. I am not sure how to do that. Thanks in advance.

You can use exists and unnest():
select count(*)
from my_table t
where exists (
    select 1
    from unnest(t.itinerary) as x(obj)
    where lower(x.obj -> 'address' ->> 'city') = 'oakland'
)
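Since the question asks for the matching records themselves, not just a count, a minimal variation of the same exists/unnest pattern (a sketch, returning whole rows) would be:
select t.*
from my_table t
where exists (
    select 1
    from unnest(t.itinerary) as x(obj)
    where lower(x.obj -> 'address' ->> 'city') = 'oakland'
);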

Related

Combine multiple rows with different column values into a single one

I'm trying to create a single row starting from multiple ones, combining them based on different column values; here is the result I reached based on the following query:
select distinct ID, case info when 'name' then value end as 'NAME', case info when 'id' then value end as 'serial'
FROM TABLENAME t
WHERE info = 'name' or info = 'id'
However, the expected result should be something along the lines of
I tried with group by clauses but that doesn't seem to work.
The RDBMS is Microsoft SQL Server.
Thanks
SELECT X.ID, MAX(X.NAME) AS NAME, MAX(X.SERIAL) AS SERIAL
FROM
(
    SELECT 100 AS ID, NULL AS NAME, '24B6-97F3' AS SERIAL UNION ALL
    SELECT 100, 'A', NULL UNION ALL
    SELECT 200, NULL, '8113-B600' UNION ALL
    SELECT 200, 'B', NULL
) X
GROUP BY X.ID
For me GROUP BY works
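Applied to the table from the question (a sketch, assuming the columns are named ID, info and value as in the original query), the same GROUP BY with conditional aggregation looks like this:
SELECT ID,
       MAX(CASE WHEN info = 'name' THEN value END) AS NAME,
       MAX(CASE WHEN info = 'id' THEN value END) AS SERIAL
FROM TABLENAME
WHERE info IN ('name', 'id')
GROUP BY ID;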
A simple PIVOT operator can also achieve this:
SELECT *
FROM
(
    SELECT id AS id_column, info, value
    FROM tablename
) src
PIVOT
(
    MAX(value) FOR info IN ([name], [id])
) piv
ORDER BY id ASC;
Result:
| id_column | name | id        |
|-----------|------|-----------|
| 100       | a    | 24b6-97f3 |
| 200       | b    | 8113-b600 |
I'm a fan of a self join for things like this
SELECT tName.ID, tName.Value AS Name, tSerial.Value AS Serial
FROM TableName AS tName
INNER JOIN TableName AS tSerial ON tSerial.ID = tName.ID AND tSerial.Info = 'Serial'
WHERE tName.Info = 'Name'
This initially selects only the Name rows, then self-joins on the same IDs and filters down to the Serial rows. You may want to change the INNER JOIN to a LEFT JOIN if not every ID has both a Name and a Serial and you want to know which Names don't have a Serial, as in the sketch below.
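A minimal sketch of that LEFT JOIN variant (same assumed table and column names as above):
SELECT tName.ID, tName.Value AS Name, tSerial.Value AS Serial
FROM TableName AS tName
LEFT JOIN TableName AS tSerial ON tSerial.ID = tName.ID AND tSerial.Info = 'Serial'
WHERE tName.Info = 'Name'
-- rows where Serial is NULL are Names without a Serial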

Redshift comma separated string to column [duplicate]

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table where one of the columns contains comma-separated values. For example:
I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell...
I would like to see
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | stop
1 | Shone | cancell
....
A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.
Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.
Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:
select
(row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;
If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:
select
n::int
into numbers
from
(select
row_number() over (order by true) as n
from cmd_logs)
cross join
(select
max(regexp_count(user_action, '[,]')) as max_num
from cmd_logs)
where
n <= max_num + 1;
Once there is a numbers table, we can do:
select
user_id,
user_name,
split_part(user_action,',',n) as parsed_action
from
cmd_logs
cross join
numbers
where
split_part(user_action,',',n) is not null
and split_part(user_action,',',n) != '';
Another idea is to transform your CSV string into JSON first, followed by JSON extract, along the following lines:
... '["' || replace( user_action, '.', '", "' ) || '"]' AS replaced
... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.i) AS parsed_action
Where "numbers" is the table from the first answer. The advantage of this approach is the ability to use built-in JSON functionality.
If you know that there are not many actions in your user_action column, you can use recursive sub-querying with union all and thereby avoid the auxiliary numbers table.
But it requires you to know the number of actions for each user, so either adjust the initial table or make a view or a temporary table for it.
Data preparation
Assuming you have something like this as a table:
create temporary table actions
(
user_id varchar,
user_name varchar,
user_action varchar
);
I'll insert some values in it:
insert into actions
values (1, 'Shone', 'start,stop,cancel'),
(2, 'Gregory', 'find,diagnose,taunt'),
(3, 'Robot', 'kill,destroy');
Here's an additional table with the temporary counts:
create temporary table actions_with_counts
(
id varchar,
name varchar,
num_actions integer,
actions varchar
);
insert into actions_with_counts (
select user_id,
user_name,
regexp_count(user_action, ',') + 1 as num_actions,
user_action
from actions
);
This would be our "input table" and it looks just as you expected
select * from actions_with_counts;
id | name    | num_actions | actions
------------------------------------------
2  | Gregory | 3           | find,diagnose,taunt
3  | Robot   | 2           | kill,destroy
1  | Shone   | 3           | start,stop,cancel
Again, you can adjust the initial table and thereby skip adding the counts as a separate table.
Sub-query to flatten the actions
Here's the unnesting query:
with recursive tmp (user_id, user_name, idx, user_action) as
(
select id,
name,
1 as idx,
split_part(actions, ',', 1) as user_action
from actions_with_counts
union all
select user_id,
user_name,
idx + 1 as idx,
split_part(actions, ',', idx + 1)
from actions_with_counts
join tmp on actions_with_counts.id = tmp.user_id
where idx < num_actions
)
select user_id, user_name, user_action as parsed_action
from tmp
order by user_id;
This will create a new row for each action, and the output would look like this:
user_id | user_name | parsed_action
-----------------------------------
1       | Shone     | start
1       | Shone     | stop
1       | Shone     | cancel
2       | Gregory   | find
2       | Gregory   | diagnose
2       | Gregory   | taunt
3       | Robot     | kill
3       | Robot     | destroy
Here are two ways to achieve this.
In my example, I'm assuming that I am accepting a comma separated list of values. My values look like schema.table.column.
The first involves using a recursive CTE.
drop table if exists #dep_tbl;
create table #dep_tbl as
select 'schema.foobar.insert_ts,schema.baz.load_ts' as dep
;
with recursive tmp (level, dep_split, to_split) as
(
select 1 as level
, split_part(dep, ',', 1) as dep_split
, regexp_count(dep, ',') as to_split
from #dep_tbl
union all
select tmp.level + 1 as level
, split_part(a.dep, ',', tmp.level + 1) as dep_split_u
, tmp.to_split
from #dep_tbl a
inner join tmp on tmp.dep_split is not null
and tmp.level <= tmp.to_split
)
select dep_split from tmp;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
The second involves a stored procedure.
CREATE OR REPLACE PROCEDURE so_test(dependencies_csv varchar(max))
LANGUAGE plpgsql
AS $$
DECLARE
dependencies_csv_vals varchar(max);
BEGIN
drop table if exists #dep_holder;
create table #dep_holder
(
avoid varchar(60000)
);
IF dependencies_csv is not null THEN
dependencies_csv_vals:='('||replace(quote_literal(regexp_replace(dependencies_csv,'\\s','')),',', '\'),(\'') ||')';
execute 'insert into #dep_holder values '||dependencies_csv_vals||';';
END IF;
END;
$$
;
call so_test('schema.foobar.insert_ts,schema.baz.load_ts');
select
*
from
#dep_holder;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
in conclusion
If you only care about one single column in your input (the X delimited values), then I think the stored procedure is easier/faster.
However, if you have other columns you care about and want to keep those columns along with your comma separated value column now transformed to rows, OR if you want to know the argument (original list of delimited values), I think the recursive CTE is the way to go. In that case, you can just add those other columns to the columns selected in the recursive query.
You can get the expected result with the following query. I'm using "UNION ALL" to convert a column to rows.
select user_id, user_name, split_part(user_action,',',1) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',2) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',3) as parsed_action from cmd_logs
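Note that this assumes every row has exactly three actions; for rows with fewer, split_part returns an empty string for the missing positions, which a small wrapper along the same lines (a sketch) can filter out:
select user_id, user_name, parsed_action
from (
    select user_id, user_name, split_part(user_action, ',', 1) as parsed_action from cmd_logs
    union all
    select user_id, user_name, split_part(user_action, ',', 2) from cmd_logs
    union all
    select user_id, user_name, split_part(user_action, ',', 3) from cmd_logs
) t
where parsed_action is not null and parsed_action != '';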
Here's my equally-terrible answer.
I have a users table, and then an events table with a column that is just a comma-delimited string of users at said event. eg
event_id | user_ids
1 | 5,18,25,99,105
In this case, I used the LIKE and wildcard functions to build a new table that represents each event-user edge.
SELECT e.event_id, u.id as user_id
FROM events e
LEFT JOIN users u ON e.user_ids like '%' || u.id || '%'
It's not pretty, but I throw it in a WITH clause so that I don't have to run it more than once per query. I'll likely just build an ETL to create that table every night anyway.
Also, this only works if you have a second table that does have one row per unique possibility. If not, you could do LISTAGG to get a single cell with all your values, export that to a CSV and reupload that as a table to help.
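A minimal sketch of that LISTAGG step (assuming a users table with an integer id column, as above):
-- collect every user id into one comma-separated cell,
-- e.g. to export as CSV and re-upload as a lookup table
SELECT LISTAGG(id::varchar, ',') WITHIN GROUP (ORDER BY id) AS all_user_ids
FROM users;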
Like I said: a terrible, no-good solution.
Late to the party, but I got something working (albeit very slow).
with nums as (select n::int n
from
(select
row_number() over (order by true) as n
from table_with_enough_rows_to_cover_range)
cross join
(select
max(json_array_length(json_column)) as max_num
from table_with_json_column )
where
n <= max_num + 1)
select *, json_extract_array_element_text(json_column,nums.n-1) parsed_json
from nums, table_with_json_column
where json_extract_array_element_text(json_column,nums.n-1) != ''
and nums.n <= json_array_length(json_column)
Thanks to answer by Bob Baxley for inspiration
Just an improvement over the answer above (https://stackoverflow.com/a/31998832/1265306): generate the numbers table using the following SQL, taken from
https://discourse.looker.com/t/generating-a-numbers-table-in-mysql-and-redshift/482
SELECT
p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as number
INTO numbers
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
ORDER BY 1
LIMIT 100
"ORDER BY" is there only in case you want paste it without the INTO clause and see the results
Create a stored procedure that will parse the string dynamically and populate a temp table, then select from the temp table.
Here is the magic code:
CREATE OR REPLACE PROCEDURE public.sp_string_split( "string" character varying )
AS $$
DECLARE
cnt INTEGER := 1;
no_of_parts INTEGER := (select REGEXP_COUNT ( string , ',' ));
sql VARCHAR(MAX) := '';
item character varying := '';
BEGIN
-- Create table
sql := 'CREATE TEMPORARY TABLE IF NOT EXISTS split_table (part VARCHAR(255)) ';
RAISE NOTICE 'executing sql %', sql ;
EXECUTE sql;
<<simple_loop_exit_continue>>
LOOP
item = (select split_part("string",',',cnt));
RAISE NOTICE 'item %', item ;
sql := 'INSERT INTO split_table SELECT '''||item||''' ';
EXECUTE sql;
cnt = cnt + 1;
EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);
END LOOP;
END ;
$$ LANGUAGE plpgsql;
Usage example:-
call public.sp_string_split('john,smith,jones');
select *
from split_table
You can try the COPY command to copy your file into Redshift tables:
copy table_name from 's3://mybucket/myfolder/my.csv' CREDENTIALS 'aws_access_key_id=my_aws_acc_key;aws_secret_access_key=my_aws_sec_key' delimiter ','
You can use the delimiter ',' option.
For more details of the COPY command options, you can visit this page:
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

SQL: Delete rows in a table where one field's value is less than the group average

For now, I am first running the following query:
select group_name, avg(numeric_field) as avg_value, count(group_name) as n from table_name group by group_name order by n desc;
Suppose I get output:
group_name | avg_value | n
----------------------------------------
nice_group_name| 1566.353 | 2034
other_group | 235.43 | 1390
.
.
.
I am then deleting records in each group one by one manually using the following query for each group:
delete from table_name where group_name = 'nice_group_name' and numeric_field < 1567;
Here 1567 is the approximate avg_value for nice_group_name.
How can I run the second query for all rows of the result of the first query automatically?
You can use a correlated subquery:
delete from table_name
where numeric_field < (select avg(t2.numeric_field)
from table_name t2
where t2.group_name = table_name.group_name
);
For performance, you want an index on table_name(group_name, numeric_field).
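For example (a sketch; the index name is just illustrative):
create index table_name_group_numeric_idx
    on table_name (group_name, numeric_field);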
If you have few groups, you might find this more efficient:
with a as (
select group_name, avg(numeric_field) as anf
from table_name
group by group_name
)
delete from table_name
where numeric_field < (select a.anf from a where a.group_name = table_name.group_name);
If table_name has some primary key field (say id) then use the following:
alter table table_name rename to bak;
create temp table avg_val as
select group_name as g, avg(numeric_field) as a from bak
group by group_name;
create table table_name as
select * from bak where id in (
select bak.id from
avg_val join bak on bak.group_name = avg_val.g
where avg_val.a <= bak.numeric_field
);
Check table_name. If all has gone well, you can delete the backed-up old table:
drop table bak;
Briefly, the steps are:
Rename the original table
Create a temporary table of average value for each group
Create a new table with all rows from the original table where numeric_field is not less than the average for its group.
Delete the renamed original table.

How to select all possible values of columns from all tables?

SELECT POM.TABLE_NAME, POM.COLUMN_NAME
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%'
I want to see all possible values of the columns on the list (in one row if possible). How can I modify this select to do it?
I want something like this:
TABLE_NAME | COLUMN_NAME | VALUES
-----------|-------------|----------
CAR        | COLOR       | RED,GREEN
You can use the query below for your requirement. It fetches the distinct values of a column for each table.
It can be used only for columns having a limited number of distinct values, as I have used the LISTAGG function.
SELECT POM.TABLE_NAME, POM.COLUMN_NAME,
XMLTYPE(DBMS_XMLGEN.GETXML('SELECT LISTAGG(COLUMN_NAME,'','') WITHIN GROUP (ORDER BY COLUMN_NAME) VAL
FROM (SELECT DISTINCT '|| POM.COLUMN_NAME ||' COLUMN_NAME
FROM '||POM.OWNER||'.'||POM.TABLE_NAME||')')
).EXTRACT('/ROWSET/ROW/VAL/text()').GETSTRINGVAL() VAL
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%';

SQL how to find rows which have highest value of specific column

For example, the table has columns MYINDEX and NAME.
MYINDEX | NAME
=================
1 | BOB
2 | BOB
3 | CHARLES
How do I find the row with the highest MYINDEX for a specific NAME? E.g. I want to find row 2 for the name "BOB".
SELECT Max(MYINDEX) FROM table WHERE NAME = [insertNameHere]
EDIT: to get the whole row:
Select * -- never do this really
From Table
Where MYINDEX = (Select Max(MYINDEX) From Table Where Name = [InsertNameHere])
There are several ways to tackle this one. I'm assuming that there may be other columns that you want from the row; otherwise, as others have said, a simple SELECT name, MAX(my_index) ... GROUP BY name will work. Here are a couple of examples:
SELECT
MT.name,
MT.my_index
FROM
(
SELECT
name,
MAX(my_index) AS max_my_index
FROM
My_Table
GROUP BY
name
) SQ
INNER JOIN My_Table MT ON
MT.name = SQ.name AND
MT.my_index = SQ.max_my_index
Another possible solution:
SELECT
MT1.name,
MT1.my_index
FROM
My_Table MT1
WHERE
NOT EXISTS
(
SELECT *
FROM
My_Table MT2
WHERE
MT2.name = MT1.name AND
MT2.my_index > MT1.my_index
)
SELECT MAX(MYINDEX) FROM table
WHERE NAME = 'BOB'
For the whole row, do:
SELECT * FROM table
WHERE NAME = 'BOB'
AND MyIndex = (SELECT Max(MYINDEX) from table WHERE NAME = 'BOB')
If you wanted to see the highest index for name = 'Bob', use:
SELECT MAX(MYINDEX) AS [MaxIndex]
FROM myTable
WHERE Name = 'Bob'
If you want to skip the inner join, you could do:
SELECT * FROM table WHERE NAME = 'BOB' ORDER BY MYINDEX DESC LIMIT 1;
Use
SELECT NAME, MAX(MYINDEX) FROM table GROUP BY NAME