Extract JSON in Metabase SQL

I have this table:
id | status | outgoing
===================================================================
1 | paid | {"a945248027_14454878":"processing"}
2 | unpaid | {"old.a945248027_14454878":"cancelled"}
I am trying to extract the value after the underscore, i.e. 14454878.
I tried extracting the keys using this query in Metabase:
select id, outgoing,
substring(key from '_([^_]+)$') as key
from table
cross join lateral jsonb_object_keys(outgoing) as j(key);
But I keep getting this error:
ERROR: function jsonb_object_keys(json) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 129
Please help

The column is defined as json, but you used a function that expects jsonb.
Use json_object_keys() instead of jsonb_object_keys():
select id, outgoing,
substring(key from '_([^_]+)$') as key
from table
cross join lateral json_object_keys(outgoing) as j(key);
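To sanity-check what the regular expression pulls out of each key, here is a minimal Python sketch that applies the same pattern to the two sample rows from the question. Python's re module only stands in for Postgres' substring(... from ...), and the variable names are illustrative, so treat this as an approximation of the query rather than a replacement for it.

```python
import json
import re

# Sample rows from the question: (id, outgoing JSON text)
rows = [
    (1, '{"a945248027_14454878":"processing"}'),
    (2, '{"old.a945248027_14454878":"cancelled"}'),
]

# Same pattern as in the query: capture everything after the last underscore
pattern = re.compile(r"_([^_]+)$")

extracted = []
for row_id, outgoing in rows:
    # Iterating a dict yields its top-level keys, similar to json_object_keys()
    for key in json.loads(outgoing):
        match = pattern.search(key)
        extracted.append((row_id, match.group(1) if match else None))

print(extracted)
```

Both rows yield the same suffix here, including the key that starts with "old.", because the pattern only looks at what follows the last underscore.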

Related

How do I create a new table based on a with query

I have a table that includes a JSON. I want to get a certain parameter of the JSON and remove the € sign out of this JSON (so I get a numeric value that I can sum).
That's the base query that works:
With C as (
SELECT A.identifier,
JSON_VALUE(A.jsonBody, '$.path') as somethingA,
JSON_VALUE(A.jsonBody, '$.path') as somethingB
FROM table A WITH (NOLOCK)
join table B on (A.identifier = B.identifier)
WHERE A.statement = 'x')
select C.identifier,
replace(C.somethingA, '€', '') as TotalA,
replace(C.somethingB, '€', '') as TotalB
from C
Unfortunately, I am now stuck, as I want to get the sum of TotalA per identifier.
My query works as described, so I thought it would be an idea to simply create a table from it (create xxx as (query)), but I get an error.
Does anyone have an idea:
how I can improve this query in general (replace + JSON_VALUE in one command?)
how I can get the TotalA and TotalB amounts per identifier?
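The aggregation being asked for can be sketched outside the database. The Python below strips the € sign, parses what is left as a decimal, and adds the amounts up per identifier. The rows are made-up sample data, and the sketch assumes the amounts use a dot as the decimal separator, which the question does not confirm. In SQL terms this roughly corresponds to converting the cleaned string to a numeric type and aggregating with SUM ... GROUP BY identifier.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows (identifier, somethingA, somethingB), as JSON_VALUE might return them
rows = [
    ("id-1", "10.50€", "1.00€"),
    ("id-1", "€4.50", "2.25€"),
    ("id-2", "7.00€", "0.75€"),
]

def to_amount(text):
    """Strip the euro sign and surrounding whitespace, then parse as a decimal."""
    return Decimal(text.replace("€", "").strip())

# Running totals per identifier: [TotalA, TotalB]
totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])
for identifier, a, b in rows:
    totals[identifier][0] += to_amount(a)
    totals[identifier][1] += to_amount(b)

result = {identifier: tuple(pair) for identifier, pair in totals.items()}
print(result)
```

Decimal is used instead of float so that currency amounts do not pick up binary rounding errors.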

Fuzzy matching a List in SQL

I have a set of data that is riddled with duplicates. The company names are written either as their workplace name, e.g. Amazon, or as the legal name, e.g. Amazon.com Inc. Both entries have information I need.
The issue is that I am running a subquery to generate the correct list of companies to search for, but the LIKE operator only seems to work with a fixed list.
FROM CRM.organizations
WHERE name LIKE (SELECT org_name FROM CRM.deals WHERE UUID IS NOT NULL AND status = 'won')
The code above returns the following error: 'Error: Scalar subquery produced more than one element'
I am trying to understand whether there is a function that can help, or whether I will need to create the list manually with: 'companyAinc';'companyBllc';....
The LIKE operator doesn't directly support passing a list of values to match against; you can use CROSS APPLY to map each value to a fuzzy match in your statement.
You can refer to this example of using multiple clauses with the LIKE operator.
Alternatively, you can try user-defined functions/routines, in which you map all the returned values with the LIKE and OR operators and return the required query as a string.
FROM CRM.organizations
WHERE name in (SELECT org_name FROM CRM.deals WHERE UUID IS NOT NULL AND status = 'won');
FROM CRM.organizations
WHERE exists (SELECT 1 FROM CRM.deals WHERE UUID IS NOT NULL AND status = 'won' and organizations.name like deals.org_name );
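One thing worth checking with the EXISTS variant: a LIKE pattern with no wildcards only matches the exact string, so it would not pick up the longer legal name by itself. The Python sketch below approximates LIKE semantics to show the difference. The sql_like helper and the sample names are hypothetical, and it ignores escape characters and collation rules.

```python
import re

def sql_like(value, pattern):
    """Approximate SQL LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

# Hypothetical data standing in for CRM.organizations.name and CRM.deals.org_name
organizations = ["Amazon", "Amazon.com Inc.", "Initech LLC"]
won_deals = ["Amazon"]

# EXISTS (... name LIKE org_name): without wildcards this is effectively equality
exact = [o for o in organizations if any(sql_like(o, d) for d in won_deals)]

# Appending % to each pattern also catches names that start with the deal name
prefix = [o for o in organizations if any(sql_like(o, d + "%") for d in won_deals)]

print(exact)
print(prefix)
```

In the SQL itself, the equivalent would be to concatenate a wildcard onto deals.org_name inside the LIKE condition.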

SQL Query validation failure on GCP BigQuery with github_repos dataset

I would like to get a list of all unique repositories on GitHub by using the following query:
SELECT DISTINCT repo_name FROM `bigquery-public-data.github_repos.commits`
However I get the following error:
Column repo_name of type ARRAY cannot be used in SELECT DISTINCT at [1:17]
In the schema it says repo_name is of type STRING, what am I doing wrong?
repo_name is defined as a "string" with mode "repeated" in the table schema which roughly means an ARRAY of STRING in BigQuery.
https://cloud.google.com/bigquery/docs/nested-repeated
What does REPEATED field in Google Bigquery mean?
As another user posted, in the schema of the bigquery-public-data.github_repos.commits table you can see that the repo_name field is defined as a STRING REPEATED which means that each entry of repo_name is an array constituted by string-type elements. You can see this with the following query:
#standardSQL
SELECT repo_name
FROM `bigquery-public-data.github_repos.commits`
LIMIT 100;
To find the distinct repo names, you can use the UNNEST operator to expand each of the repo_name elements. The following query performs a CROSS JOIN that adds a new field, repo_name_unnest, holding the individual repository names. DISTINCT can then be applied to that field.
#standardSQL
SELECT DISTINCT(repo_name_unnest)
FROM `bigquery-public-data.github_repos.commits`
CROSS JOIN UNNEST(repo_name) AS repo_name_unnest;
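As a rough mental model of what CROSS JOIN UNNEST followed by DISTINCT does, here is a small Python sketch. The commit ids are placeholders and the data is a tiny hand-made sample, not the real table.

```python
# Hypothetical rows: each commit carries a list (REPEATED field) of repository names
commits = [
    {"commit": "c1", "repo_name": ["noondaysun/sakai", "OpenCollabZA/sakai"]},
    {"commit": "c2", "repo_name": ["noondaysun/sakai"]},
]

# CROSS JOIN UNNEST: one output row per (commit, single repository name) pair
unnested = [(row["commit"], name) for row in commits for name in row["repo_name"]]

# DISTINCT over the flattened names (sorted only to make the output stable)
distinct_repos = sorted({name for _, name in unnested})

print(unnested)
print(distinct_repos)
```

The original error arises because DISTINCT would have to compare whole arrays; after flattening, it only compares plain strings.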
You can use the below query
SELECT
commit
, repo_name
FROM
`bigquery-public-data.github_repos.commits`,
UNNEST(repo_name) as repo_name
WHERE
commit = 'c87298e36356ac19519a93dee3dfac8ebffe45e8'
Which will give a result like below
Row | commit | repo_name
===================================================================
1 | c87298e36356ac19519a93dee3dfac8ebffe45e8 | noondaysun/sakai
2 | c87298e36356ac19519a93dee3dfac8ebffe45e8 | OpenCollabZA/sakai

Issue with Postgres not recognizing CAST on join

I'm trying to join two tables together based on an ID column. The join is not working successfully because I cannot join a varchar column on an integer column, despite using cast().
In the first table, the ID column is character varying, in the format of: XYZA-123456.
In the second table, the ID column is simply the number: 123456.
-- TABLE 1
create table fake_receivers(id varchar(11));
insert into fake_receivers(id) values
('VR2W-110528'),
('VR2W-113640'),
('VR4W-113640'),
('VR4W-110528'),
('VR2W-110154'),
('VMT2-127942'),
('VR2W-113640'),
('V16X-110528'),
('VR2W-110154'),
('VR2W-110528');
-- TABLE 2
create table fake_stations(receiver_sn integer, station varchar);
insert into fake_stations values
('110528', 'Baff01-01'),
('113640', 'Baff02-02'),
('110154', 'Baff03-01'),
('127942', 'Baff05-01');
My solution is to split the string at the dash, take the number after the dash, and cast it as an integer, so that I may perform the join:
select cast(split_part(id, '-', 2) as integer) from fake_receivers; -- this works fine, seemingly selects integers
However, when I actually attempt to perform the join, I'm getting the following error, despite using an explicit cast:
select cast(split_part(id, '-', 2) as integer), station
from fake_receivers
inner join fake_stations
on split_part = fake_stations.receiver_sn -- not recognizing split_part cast as integer!
>ERROR: operator does not exist: character varying = integer
>Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Strangely enough, I can perform this join with my full dataset (a queried result set shows up) but I then can't manipulate it at all (e.g. sorting, filtering it) - I get an error saying ERROR: invalid input syntax for integer: "UWM". The string "UWM" appears nowhere in my dataset or in my code, but I strongly suspect it has to do with the split_part cast from varchar to integer going wrong somewhere.
-- Misc. info
select version();
>PostgreSQL 10.5 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 9.0.0 (clang-900.0.39.2), 64-bit
EDIT: dbfiddle exhibiting behavior
You need to include your current logic directly in the join condition:
select *
from fake_receivers r
inner join fake_stations s
on split_part(r.id, '-', 2)::int = s.receiver_sn;
Demo
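The join condition above can be mimicked in Python to see both the successful match and the kind of failure described in the question. The serial_number helper is hypothetical, and the malformed id "VR2W-UWM" is an invented example: one plausible cause of the "UWM" error, not confirmed by the question, is a row in the full dataset whose part after the dash is not numeric.

```python
# Subset of the sample data from the question
receivers = ["VR2W-110528", "VR2W-113640", "VMT2-127942", "VR2W-110154"]
stations = {110528: "Baff01-01", 113640: "Baff02-02", 110154: "Baff03-01", 127942: "Baff05-01"}

def serial_number(receiver_id):
    """Mimic split_part(id, '-', 2)::int: second dash-separated field, cast to integer."""
    parts = receiver_id.split("-")
    field = parts[1] if len(parts) > 1 else ""  # split_part returns '' when the field is missing
    return int(field)  # raises ValueError for non-numeric input, like the failing cast

# Inner join: keep only receivers whose serial number exists in stations
joined = [(r, stations[serial_number(r)]) for r in receivers if serial_number(r) in stations]

# Hypothetical malformed id: the cast fails instead of producing a number
try:
    serial_number("VR2W-UWM")
    cast_failed = False
except ValueError:
    cast_failed = True

print(joined)
print(cast_failed)
```

If that is what is happening, the cast is only evaluated for rows that are actually fetched, so a partial result set can appear fine while sorting or filtering the full set triggers the error.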

How to do calculations on json data in Postgres

I'm storing AdWords report data in Postgres. Each report is stored in a table named Reports, which has a jsonb column named 'data'. Each report has JSON stored in its 'data' field that looks like this:
[
{
match_type: "exact",
search_query: "gm hubcaps",
conversions: 2,
cost: 1.24
},
{
match_type: "broad",
search_query: "gm auto parts",
conversions: 34,
cost: 21.33
},
{
match_type: "phrase",
search_query: "silverdo headlights",
conversions: 63,
cost: 244.05
}
]
What I want to do is query these data hashes and sum up the total number of conversions for a given report. I've looked through the PostgreSQL docs, and it looks like you can only really do calculations on hashes, not on arrays of hashes like this. Is what I'm trying to do possible in Postgres? Do I need to make a temp table out of this array and do calculations off that? Or can I use a stored procedure?
I'm using Postgresql 9.4
EDIT
The reason I'm not just using a regular, normalized table is that this is just one example of how report data could be structured. In my project, reports have to allow arbitrary keys, because they are populated by users uploading CSVs with any columns they like. It's basically just a way to get around having arbitrarily many user-created tables.
What I want to do is query off these data hashes and sum up the conversions
The fastest way should be with jsonb_populate_recordset(). But you need a registered row type for it.
CREATE TEMP TABLE report_data (
-- match_type text -- commented out, because we only need ..
-- , search_query text -- .. conversions for this query
conversions int
-- , cost numeric
);
A temp table is one way to register a row type ad-hoc. More explanation in this related answer:
jsonb query with nested objects in an array
Assuming a table report with report_id as PK, for lack of information:
SELECT r.report_id, sum(d.conversions) AS sum_conversions
FROM report r
LEFT JOIN LATERAL jsonb_populate_recordset(null::report_data, r.data) d ON true
-- WHERE r.report_id = 12345 -- only for given report?
GROUP BY 1;
The LEFT JOIN ensures you get a result, even if data is NULL or empty or the JSON array is empty.
For a sum from a single row in the underlying table, this is faster:
SELECT d.sum_conversions
FROM report r
LEFT JOIN LATERAL (
SELECT sum(conversions) AS sum_conversions
FROM jsonb_populate_recordset(null::report_data, r.data)
) d ON true
WHERE r.report_id = 12345; -- enter report_id here
Alternative with jsonb_array_elements() (no need for a registered row type):
SELECT d.sum_conversions
FROM report r
LEFT JOIN LATERAL (
SELECT sum((value->>'conversions')::int) AS sum_conversions
FROM jsonb_array_elements(r.data)
) d ON true
WHERE r.report_id = 12345; -- enter report_id here
Normally you would implement this as a plain, normalized table. I don't see the benefit of JSON here (except that your application seems to require it, like you added).
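For comparison, the jsonb_array_elements() variant boils down to the following when written in Python. This is only a sketch of the logic: the array from the question is rewritten as valid JSON with quoted keys, and nothing here talks to a database.

```python
import json

# The report's data array from the question, written as valid JSON
data = json.loads("""
[
  {"match_type": "exact",  "search_query": "gm hubcaps",          "conversions": 2,  "cost": 1.24},
  {"match_type": "broad",  "search_query": "gm auto parts",       "conversions": 34, "cost": 21.33},
  {"match_type": "phrase", "search_query": "silverdo headlights", "conversions": 63, "cost": 244.05}
]
""")

# Rough equivalent of sum((value->>'conversions')::int) over jsonb_array_elements(data)
sum_conversions = sum(int(item["conversions"]) for item in data)

print(sum_conversions)  # 99
```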
You could use unnest:
select sum(conv) from
(select (d::json->>'conversions')::int as conv from
(select unnest(data) as d from <your table>) all_data
) all_conv
Disclaimer: I don't have Pg 9.2 so I couldn't test it myself.
EDIT: this is assuming that the array you mentioned is a Postgresql array, i.e. that the data type of your data column is character varying[]. If you mean the data is a json array, you should be able to use json_array_elements instead of unnest.