Retrieving JSON elements that don't share a key - sql

I have a JSON that looks like this:
"names": {
"jack": {
"probability": 0
},
"bob": {
"probability": 0.5
},
"ana": {
"probability": 0.2
},
"bill": {
}
}
And I want to query it to get a table of names and probabilities, and in case there is no value of probability, get 0.
So the output should look like this:
name probability
jack 0
bob 0.5
ana 0.2
bill 0
I couldn't find a proper way to query the JSON to get these results. I am using PostgreSQL.

You are looking for the jsonb_each() function:
select n.name, coalesce(val ->> 'probability', '0')::numeric
from the_table t
cross join jsonb_each( t.col -> 'names') as n(name, val)
If your column isn't jsonb (which it should be) but json, you have to use json_each() instead.
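To make the logic of that query concrete, here is a minimal Python sketch of the same idea outside the database: walk every key under "names" (roughly what jsonb_each() does) and fall back to 0 when "probability" is absent (roughly what coalesce() does). The function name and the sample document are illustrative assumptions, not part of PostgreSQL.

```python
import json

# Sample document modelled on the question; "bill" has no "probability" key.
DOC = json.loads("""
{"names": {"jack": {"probability": 0},
           "bob":  {"probability": 0.5},
           "ana":  {"probability": 0.2},
           "bill": {}}}
""")


def names_with_probability(doc):
    """Return (name, probability) pairs, using 0 when the value is missing."""
    rows = []
    for name, val in doc["names"].items():  # one row per key, like jsonb_each()
        probability = val.get("probability")
        if probability is None:             # missing key or JSON null, like coalesce()
            probability = 0
        rows.append((name, probability))
    return rows


for name, probability in names_with_probability(DOC):
    print(name, probability)
```

The rows come back in the order the keys appear in the document, which is a property of Python dictionaries and not something the SQL query guarantees without an ORDER BY.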


Cannot parse SQL result count from Logic App

I run this simple query in Logic App using the "Execute a SQL query (V2)" connector to find out if a number exists in my table.
select count(*) from users where user_number='724-555-5555';
If the number exists, I get this JSON, but somehow I can't parse it.
[
{
"": 1
}
]
Any idea how to simply retrieve 0 or 1?
Thanks
David
You need to add an explicit column name:
SELECT
count(*) AS cnt
FROM
users
WHERE
user_number = '724-555-5555';
That will give you this result:
[ { "cnt": 1 } ]
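...which has a named key ("cnt") that you can reference when parsing the result. (An empty key is technically still valid JSON, but it is awkward to address by name.)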
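As a rough illustration of why the alias helps, this is how the value could be read once the column has a name. It is written in Python purely as a sketch; a Logic App would use its own Parse JSON action and expressions. The function name and the payload string are assumptions made for the example.

```python
import json


def first_count(payload, column="cnt"):
    """Parse the connector-style result and return the count from the first row."""
    rows = json.loads(payload)   # the result is a list with one object per row
    return rows[0][column]       # the alias from "count(*) AS cnt" is the key


print(first_count('[ { "cnt": 1 } ]'))
```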

Select documents from Fauna collection between two dates AND that satisfy another criteria

I am able to use the following code to retrieve documents from a Fauna collection that have a date which falls between start and end dates:
Paginate(Range(Match(Index("orders_by_date")) , start, end))
Is it possible to add another criterion to this statement, so that it retrieves only the orders between two dates that also have the field status = "completed"?
Thank you
You can create an index like this:
CreateIndex(
{
name:'orders_by_date_status',
source:Collection("orders"),
terms: [{field:['data','status']}],
values:[{field:['data','order_date']},{field:['ref']}]
}
)
and query your collection with a query like this:
Paginate(
Range(
Match('orders_by_date_status','completed'),
[Date("2020-03-20")],
[Date("2020-06-20")]
)
)
to get back something like this:
{
data: [
[Date("2020-05-20"), Ref(Collection("orders"), "285246145700037121")],
[Date("2020-06-20"), Ref(Collection("orders"), "285246152717107713")]
]
}
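Conceptually, the index terms do the equality match on status and Range then narrows the sorted values by date. A small in-memory Python sketch of that two-step selection follows. It is not Fauna driver code; the sample orders, field names, and function name are assumptions, and both bounds are treated as inclusive, as the sample result above suggests.

```python
from datetime import date

# Hypothetical orders standing in for documents in the "orders" collection.
ORDERS = [
    {"ref": "285246145700037121", "status": "completed", "order_date": date(2020, 5, 20)},
    {"ref": "285246152717107713", "status": "completed", "order_date": date(2020, 6, 20)},
    {"ref": "285246160000000001", "status": "pending", "order_date": date(2020, 4, 1)},
    {"ref": "285246160000000002", "status": "completed", "order_date": date(2020, 7, 1)},
]


def completed_between(orders, status, start, end):
    """Select by status first (index terms), then by date range (Range)."""
    matched = [o for o in orders if o["status"] == status]
    in_range = [o for o in matched if start <= o["order_date"] <= end]
    # Index values are kept sorted, so return (date, ref) pairs in date order.
    return sorted((o["order_date"], o["ref"]) for o in in_range)


for row in completed_between(ORDERS, "completed", date(2020, 3, 20), date(2020, 6, 20)):
    print(row)
```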
Hope this answers your question.
Luigi

complex couchbase query using metadata & group by

I am new to Couchbase and kind of stuck with the following problem.
This query works just fine in the Couchbase Query Editor:
SELECT
p.countryCode,
SUM(c.total) AS total
FROM bucket p
USE KEYS (
SELECT RAW "p::" || ca.token
FROM bucket ca USE INDEX (idx_cr)
WHERE ca._class = 'backend.db.p.ContactsDo'
AND ca.total IS NOT MISSING
AND ca.date IS NOT MISSING
AND ca.token IS NOT MISSING
AND ca.id = 288
ORDER BY ca.total DESC, ca.date ASC
LIMIT 20 OFFSET 0
)
LEFT OUTER JOIN bucket finished_contacts
ON KEYS ["finishedContacts::" || p.token]
GROUP BY p.countryCode ORDER BY total DESC
I get this:
[
{
"countryCode": "en",
"total": 145
},
{
"countryCode": "at",
"total": 133
},
{
"countryCode": "de",
"total": 53
},
{
"countryCode": "fr",
"total": 6
}
]
Now, using this query in a Spring Boot application, I end up with this error:
Unable to retrieve enough metadata for N1QL to entity mapping, have you selected _ID and _CAS?
adding metadata,
SELECT
meta(p).id AS _ID,
meta(p).cas AS _CAS,
p.countryCode,
SUM(c.total) AS total
FROM bucket p
trying to map it to the following object:
data class CountryIntermediateRankDo(
@Id
@Field
val id: String,
@Field
@NotNull
val countryCode: String,
@Field
@NotNull
val total: Long
)
results in:
Unable to execute query due to the following n1ql errors:
{"msg":"Expression must be a group key or aggregate: (meta(p).id)","code":4210}
Using Map as return value results in:
org.springframework.data.couchbase.core.CouchbaseQueryExecutionException: Query returning a primitive type are expected to return exactly 1 result, got 0
Clearly I missed something important here in terms of how to write proper Couchbase queries. I am stuck between needing the metadata and getting this key/aggregate error that relates to the GROUP BY clause. I'd be very thankful for any help.
When you have a GROUP BY query, everything in the SELECT clause should be either a field used for grouping or a group aggregate. You need to add the new fields to the GROUP BY clause, roughly like this:
SELECT
_ID,
_CAS,
p.countryCode,
SUM(p.c.total) AS total
FROM testBucket p
USE KEYS ["foo", "bar"]
LEFT OUTER JOIN testBucket finished_contacts
ON KEYS ["finishedContacts::" || p.token]
GROUP BY p.countryCode, meta(p).id AS _ID, meta(p).cas AS _CAS
ORDER BY total DESC
(I had to make some changes to your query to work with it effectively. You'll need to retrofit the advice to your specific case.)
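One consequence worth checking before adopting this: document IDs are unique, so grouping by the ID as well puts every document into its own group, and the per-country totals are no longer collapsed into one row per country. The Python sketch below shows that effect on a few made-up rows. The helper name and the sample data are assumptions for illustration and do not use any Couchbase API.

```python
from collections import defaultdict

# Hypothetical rows, one per document, with the ID that meta(p).id would return.
ROWS = [
    {"id": "p::1", "countryCode": "en", "total": 100},
    {"id": "p::2", "countryCode": "en", "total": 45},
    {"id": "p::3", "countryCode": "at", "total": 133},
]


def group_sum(rows, key_fields, value_field):
    """Sum value_field per distinct combination of key_fields (a plain GROUP BY)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row[value_field]
    return dict(totals)


# Grouping by country only: one row per country.
print(group_sum(ROWS, ["countryCode"], "total"))
# Grouping by country and document ID: one row per document.
print(group_sum(ROWS, ["countryCode", "id"], "total"))
```

If the application really needs one row per country, it may be simpler to map the result to a plain projection type that does not require _ID and _CAS, but whether that is possible depends on the Spring Data Couchbase version in use.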
If you need more detailed advice, let me suggest the N1QL forum https://forums.couchbase.com/c/n1ql . StackOverflow is great for one-and-done questions, but the forum is better for extended interactions.

How to query and iterate over array of structures in Athena (Presto)?

I have an S3 bucket with 500,000+ JSON records, e.g.
{
"userId": "00000000001",
"profile": {
"created": 1539469486,
"userId": "00000000001",
"primaryApplicant": {
"totalSavings": 65000,
"incomes": [
{ "amount": 5000, "incomeType": "SALARY", "frequency": "FORTNIGHTLY" },
{ "amount": 2000, "incomeType": "OTHER", "frequency": "MONTHLY" }
]
}
}
}
I created a new table in Athena
CREATE EXTERNAL TABLE profiles (
userId string,
profile struct<
created:int,
userId:string,
primaryApplicant:struct<
totalSavings:int,
incomes:array<struct<amount:int,incomeType:string,frequency:string>>
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://profile-data'
I am interested in the incomeTypes, e.g. "SALARY", "PENSIONS", "OTHER", etc., and ran this query, changing jsonData.incometype each time:
SELECT jsonData
FROM "sampledb"."profiles"
CROSS JOIN UNNEST(sampledb.profiles.profile.primaryApplicant.incomes) AS la(jsonData)
WHERE jsonData.incometype='SALARY'
This worked fine with CROSS JOIN UNNEST which flattened the incomes array so that the data example above would span across 2 rows. The only idiosyncratic thing was that CROSS JOIN UNNEST made all the field names lowercase, eg. a row looked like this:
{amount=1520, incometype=SALARY, frequency=FORTNIGHTLY}
Now I have been asked how many users have two or more "SALARY" entries, eg.
"incomes": [
{ "amount": 3000, "incomeType": "SALARY", "frequency": "FORTNIGHTLY" },
{ "amount": 4000, "incomeType": "SALARY", "frequency": "MONTHLY" }
],
I'm not sure how to go about this.
How do I query the array of structures to look for duplicate incomeTypes of "SALARY"?
Do I have to iterate over the array?
What should the result look like?
UNNEST is a very powerful feature, and it's possible to solve this problem using it. However, I think using Presto's lambda functions is more straightforward:
SELECT COUNT(*)
FROM sampledb.profiles
WHERE CARDINALITY(FILTER(profile.primaryApplicant.incomes, income -> income.incomeType = 'SALARY')) > 1
This solution uses FILTER on the profile.primaryApplicant.incomes array to get only those with an incomeType of SALARY, and then CARDINALITY to extract the length of that result.
Case sensitivity is never easy with SQL engines. In general I think you should not expect them to respect case, and many don't. Athena in particular explicitly converts column names to lower case.
You can combine filter with cardinality to find the arrays in which incomeType = 'SALARY' occurs more than once.
This can be further improved by using reduce, so that the intermediate array is not materialized (see the examples in the docs; I'm not quoting them here, since they do not directly answer your question).
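The difference between the two approaches can be sketched in Python: the first function builds the filtered list and measures its length (the FILTER plus CARDINALITY shape), the second folds over the array and keeps only a running count (the reduce shape). The function names and sample profiles are assumptions for illustration; this is not Athena or Presto code.

```python
from functools import reduce


def salary_count_filter(incomes):
    """Build the filtered list, then take its length (FILTER + CARDINALITY)."""
    return len([income for income in incomes if income["incomeType"] == "SALARY"])


def salary_count_reduce(incomes):
    """Fold over the array with a running count, without an intermediate list."""
    return reduce(
        lambda count, income: count + (1 if income["incomeType"] == "SALARY" else 0),
        incomes,
        0,
    )


def users_with_multiple_salaries(profiles):
    """Count the profiles whose incomes contain two or more SALARY entries."""
    return sum(
        1
        for p in profiles
        if salary_count_reduce(p["profile"]["primaryApplicant"]["incomes"]) > 1
    )


def _profile(*income_types):
    # Helper that builds a record shaped like the ones in the question.
    incomes = [{"amount": 1000, "incomeType": t, "frequency": "MONTHLY"} for t in income_types]
    return {"profile": {"primaryApplicant": {"incomes": incomes}}}


PROFILES = [_profile("SALARY", "SALARY"), _profile("SALARY", "OTHER"), _profile()]
print(users_with_multiple_salaries(PROFILES))
```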

Working with Structs within Arrays for new BigQuery Standard SQL

I'm trying to find rows with duplicate fields in an array of structs within a Google BigQuery table, using the new Standard SQL. The data in the table is simplified here; each row looks a bit like this:
{
"Session": "abc123",
"Information": [
{
"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
},
{
"Identifier": "1c62813f-7ec4-4968-b18b-d1eb8f4d9d26"
},
{
"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
}
]
}
My end goal is to display the rows that have Information entities with duplicate Identifier values present. However, most of the queries I attempt get an error message of the following form:
Cannot access field Identifier on a value with type ARRAY<STRUCT<Identifier STRING>>
Is there a way to work with the data inside of a STRUCT within an ARRAY?
Here's my first attempt at a query:
SELECT
Session,
Information
FROM
`events.myevents`
WHERE
COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
LIMIT
1000
And another using a subquery:
SELECT
Session,
Information
FROM (
SELECT
Session,
Information,
COUNT(DISTINCT Information.Identifier) AS info_count_distinct,
ARRAY_LENGTH(Information) AS info_count
FROM
`events.myevents`
WHERE
COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
LIMIT
1000)
WHERE
info_count != info_count_distinct
Try the query below:
SELECT Session, Identifier, COUNT(1) AS dups
FROM `events.myevents`, UNNEST(Information)
GROUP BY Session, Identifier
HAVING dups > 1
ORDER BY Session
This should give you what you expect, plus the number of duplicates, like below (example):
Session  Identifier                            dups
abc123   e8d971a4-ef33-4ea1-8627-f1213e4c67dc  2
abc345   1c62813f-7ec4-4968-b18b-d1eb8f4d9d26  3
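The shape of that query (flatten the array, group by session and identifier, keep groups with more than one member) can be sketched in Python as follows. The function name is an assumption and the sample row is the one from the question; this only illustrates the logic and does not call BigQuery.

```python
from collections import Counter

ROWS = [
    {
        "Session": "abc123",
        "Information": [
            {"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"},
            {"Identifier": "1c62813f-7ec4-4968-b18b-d1eb8f4d9d26"},
            {"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"},
        ],
    }
]


def duplicate_identifiers(rows):
    """Return (session, identifier, count) for identifiers repeated within a session."""
    counts = Counter()
    for row in rows:
        for info in row["Information"]:                      # like UNNEST(Information)
            counts[(row["Session"], info["Identifier"])] += 1  # like GROUP BY ... COUNT(1)
    # Keep only groups with more than one member (HAVING dups > 1), sorted by session.
    return sorted((s, i, n) for (s, i), n in counts.items() if n > 1)


for session, identifier, dups in duplicate_identifiers(ROWS):
    print(session, identifier, dups)
```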