Clean a JSON in a PostgreSQL request - sql

I have a SQL request that is almost perfect (for what I want to do):
WITH liste_fichiers_joints AS (
    SELECT
        id_dans_table,
        ARRAY_AGG(row_to_json(f)) ids_fichier
    FROM
        fichiers_joints fj
        LEFT JOIN fichiers f ON f.id = fj.id_fichier
    WHERE
        nom_table = 'taches'
    GROUP BY
        id_dans_table
)
SELECT t.id, t.nom, lfj.ids_fichier
FROM taches t
JOIN liste_fichiers_joints lfj ON lfj.id_dans_table = t.id
As you may have guessed, I'd like a single request that returns all the tasks: the id of each task, its name, and an array with the ids and names of its attached files, if there are any.
The result is nearly what I want, but the last column displays this:
{"{\"uuid\":\"fd809b1f-6849-4322-a654-67f70c46a435\",\"nom\":\"test.png\",\"date\":\"2020-11-17T01:21:24.223354\",\"status\":\"TMP\",\"id\":185}"}
I'd like to remove the uuid and status parts. I tried some subqueries, to no avail.
Also, I'd like to remove the backslashes \, because otherwise it will be complicated to use this column as JSON in my JavaScript.
Does anybody have a clue?
Thanks in advance.

You can use json[b]_build_object() instead of row_to_json(): it accepts a list of key/value pairs, so you have fine-grained control over what goes into your objects.
Also, you most likely want a JSON array, rather than a Postgres array of JSON objects.
I would recommend changing this:
ARRAY_AGG (row_to_json(f)) ids_fichier
To:
jsonb_agg(
jsonb_build_object('nom', f.nom, 'date', f.date, 'id', f.id)
) as ids_fichier
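To make the difference concrete, here is a minimal Python sketch of what the two aggregations amount to once the result reaches the client. It is only an analogy, not PostgreSQL itself: the helper names build_object and agg_as_json_array are made up for illustration, and the sample row reuses the values shown in the question.

```python
import json

# Sample row, using the values shown in the question.
row = {
    "uuid": "fd809b1f-6849-4322-a654-67f70c46a435",
    "nom": "test.png",
    "date": "2020-11-17T01:21:24.223354",
    "status": "TMP",
    "id": 185,
}


def build_object(r, keys=("nom", "date", "id")):
    """Rough analogue of jsonb_build_object: keep only the listed keys."""
    return {k: r[k] for k in keys}


def agg_as_json_array(rows):
    """Rough analogue of jsonb_agg: one JSON array holding all objects."""
    return json.dumps([build_object(r) for r in rows])


# Analogue of ARRAY_AGG(row_to_json(f)): a list of JSON *strings*.
# Serialising that list escapes the inner quotes, hence the backslashes.
array_of_json_strings = json.dumps([json.dumps(row)])

json_array = agg_as_json_array([row])

print(array_of_json_strings)
print(json_array)
```

The second value can be parsed directly on the JavaScript side, while the first one would need a second parsing pass for every element.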

Related

How can I get the last element of an array? SQL Bigquery

I'm working on building a follow-network from GitHub's available data on Google BigQuery, e.g.: https://bigquery.cloud.google.com/table/githubarchive:day.20210606
The key data is contained in the "payload" field, STRING type. I managed to unnest the data contained in that field and convert it to an array, but how can I get the last element?
Here is what I have so far...
select type,
array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload
from `githubarchive.day.20210606`
where type = 'MemberEvent'
Which outputs:
How can I get only the last element, "Action":"added"} ?
I know that
select array_reverse(your_array)[offset(0)]
should do the trick, however I'm unsure how to combine that in my code. I've been trying different options without success, for example:
with payload as ( select array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload from `githubarchive.day.20210606`)
select type, ARRAY_REVERSE(payload)[ORDINAL(1)]
from `githubarchive.day.20210606` where type = 'MemberEvent'
The desired output should look like:
To get the last element of an array, you can use the approach below:
select array_reverse(your_array)[offset(0)]
As for "I'm unsure how to combine that in my code":
select type, array_reverse(array(
select trim(val)
from unnest(split(trim(payload, '[]'))) val
))[offset(0)]
from `githubarchive.day.20210606`
where type = 'MemberEvent'
There is also a solution that does not reverse the array:
SELECT event[OFFSET(ARRAY_LENGTH(event) - 1)]
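For comparison, the two approaches from the answers can be sketched in Python. This is an analogy rather than BigQuery code: the payload string below is a shortened, hypothetical stand-in for a real githubarchive payload, and split_payload is a made-up helper that mimics the trim/split/trim steps of the query.

```python
# Hypothetical payload string; real githubarchive payloads are much longer.
payload = '[{"member":"octocat","action":"added"}]'


def split_payload(text):
    """Mimic split(trim(payload, '[]')) followed by trim(val) on each piece."""
    return [part.strip() for part in text.strip("[]").split(",")]


parts = split_payload(payload)

# Option 1: reverse, then take the first element,
# like array_reverse(arr)[offset(0)].
last_via_reverse = list(reversed(parts))[0]

# Option 2: index with length - 1,
# like arr[OFFSET(ARRAY_LENGTH(arr) - 1)].
last_via_length = parts[len(parts) - 1]

print(last_via_reverse)
print(last_via_length)
```

Both options pick the same element; the second one avoids building a reversed copy of the array.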

Trying to explode an array with unnest() in Presto and failing due to extra column

I have data from a query that looks like this:
SELECT
model_features
FROM some_db
which returns:
{
"food1": 0.65892159938812,
"food2": 0.90786880254745,
"food3": 0.88357985019684,
"food4": 0.99999821186066,
"food5": 0.99237471818924,
"food6": 0.62127977609634
}
{
"food4": 0.9999965429306,
"text1": 0.82206630706787
}
...
etc.
What I am eventually trying to do is simply get a count of each of the "food1", "food2", etc. features,
but to do so (I think) I need to trim out the unnecessary numeric data. I'm at a loss as to how to do this, as every time I try to simply unnest
SELECT
t.concepts
FROM some_db
CROSS JOIN UNNEST(model_features) AS t(concepts)
I get this error:
Column alias list has 1 entries but 't' has 2 columns available
Anyone mind pointing me in the right direction?
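Solved this for myself: the issue was that I needed to keep the second column of information for the query to execute. Judging by the error, UNNEST expands model_features into two columns (apparently a key and a value, as it does for a map), so the alias list has to name both. This may not be the canonical approach, but it worked: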
SELECT
t.concepts,
t.probabilities
FROM some_db
CROSS JOIN UNNEST(model_features) AS t(concepts,probabilities)
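The shape of that result, and the count the question was ultimately after, can be sketched in Python. This is an illustration under the assumption that model_features behaves like a map of concept to probability; the unnest helper is a made-up name, and the rows are the two samples from the question.

```python
from collections import Counter

# The two sample rows from the question, as Python dicts.
rows = [
    {
        "food1": 0.65892159938812,
        "food2": 0.90786880254745,
        "food3": 0.88357985019684,
        "food4": 0.99999821186066,
        "food5": 0.99237471818924,
        "food6": 0.62127977609634,
    },
    {
        "food4": 0.9999965429306,
        "text1": 0.82206630706787,
    },
]


def unnest(map_rows):
    """Yield one (concept, probability) pair per map entry.

    Each pair has two fields, which is why the alias list in the
    query needs two column names.
    """
    for features in map_rows:
        for concept, probability in features.items():
            yield concept, probability


# Count how often each concept appears, ignoring the probabilities.
counts = Counter(concept for concept, _ in unnest(rows))
print(counts)
```

In the query itself, the equivalent would presumably be a count(*) grouped by t.concepts over the unnested rows.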

SQL Server extract first array element from JSON

I have json stored in one of the columns in SQL Server and I need to modify it to remove the square brackets from it. The format is as below. Can't seem to find a good way of doing it.
[ { "Message":"Info: this is some message here.", "Active":true } ]
One way is to use the query below, but it is very slow and I need to run it on a very large set of data.
select a.value
from dbo.testjson e
cross apply OPENJSON(e.jsontext) as a
where isjson(e.jsontext) = 1
The only other way I can think of is just doing string manipulation but it can be error prone. Could someone help with this?
Ok, figured it out:
select
json_query(
'[{"Message":"Info: this is some message here.","Active":true}]',
'$[0]'
)
This returns the inner object, without the surrounding square brackets.
To get only one property, add its name to the path, in this case Message. Keep in mind that the path is case sensitive. Something like:
select json_value('[{"Message":"Info: this is some message here.","Active":true}]', '$[0].Message')
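As a quick way to see what the two paths select, here is a small Python sketch using the standard json module on the same sample string. It is only an analogy for '$[0]' and '$[0].Message', not SQL Server code.

```python
import json

text = '[{"Message":"Info: this is some message here.","Active":true}]'
data = json.loads(text)

# Equivalent of the path '$[0]': the first element of the array.
first = data[0]

# Equivalent of the path '$[0].Message': one property of that element.
message = first["Message"]

# Keys are case sensitive here as well: "message" is not "Message".
wrong_case = first.get("message")

# Re-serialise the first element without the surrounding brackets.
first_json = json.dumps(first, separators=(",", ":"))

print(first_json)
print(message)
```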

Replace Asterisk(*) with "anything" in SQL

I have tons of URLs in my database and want to filter them by a user-defined string in the format something/*/something, where * stands for "anything". So when the user defines checkout/*/complete, it means it filters out URLs like:
http://my_url.com/checkout/15/complete
http://my_url.com/checkout/85/complete
http://my_url.com/checkout/something/complete
http://my_url.com/super/checkout/something/complete
etc.
How do I do that in SQL? Or should I filter out all the results and use PHP to do the job?
My SQL request now is
SELECT * FROM custom_logs WHERE pn='$webPage' AND id IN ( SELECT MAX(id) FROM custom_logs WHERE action_clicked_text LIKE '%{$text_value_active}%' GROUP BY token ) order by action_timestamp desc
This filters out all the log messages with user-defined text in column action_clicked_text, but uses LIKE statement, which will not work with * inside.
You want LIKE. Either:
where url like '%checkout/%/complete%'
to get the URLs that match the pattern. Or:
where url not like '%checkout/%/complete%'
to get the other URLs.
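Since the pattern comes from the user, the * has to be translated into % before it reaches the query. A minimal sketch of that translation in Python follows, tried against an in-memory SQLite table. The function name to_like_pattern, the table name logs, and the last URL in the list are assumptions made for the example; the backslash escape character and the ESCAPE clause should be checked against the database actually in use.

```python
import sqlite3


def to_like_pattern(user_pattern, escape="\\"):
    """Turn a user pattern with * wildcards into a LIKE pattern.

    Literal %, _ and the escape character are escaped so that only *
    acts as a wildcard; the result is wrapped in % to match anywhere.
    """
    pieces = []
    for ch in user_pattern:
        if ch == "*":
            pieces.append("%")
        elif ch in ("%", "_", escape):
            pieces.append(escape + ch)
        else:
            pieces.append(ch)
    return "%" + "".join(pieces) + "%"


urls = [
    "http://my_url.com/checkout/15/complete",
    "http://my_url.com/checkout/85/complete",
    "http://my_url.com/checkout/something/complete",
    "http://my_url.com/super/checkout/something/complete",
    "http://my_url.com/checkout/cart",  # hypothetical non-matching URL
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (url TEXT)")
conn.executemany("INSERT INTO logs (url) VALUES (?)", [(u,) for u in urls])

pattern = to_like_pattern("checkout/*/complete")

# The pattern is passed as a bound parameter, not interpolated into the SQL.
matched = [
    row[0]
    for row in conn.execute(
        "SELECT url FROM logs WHERE url LIKE ? ESCAPE '\\' ORDER BY rowid",
        (pattern,),
    )
]
print(pattern)
print(matched)
```

Passing the pattern as a parameter keeps the user's text out of the SQL string itself, which also sidesteps quoting problems.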

Django select only rows with duplicate field values

Suppose we have a model in Django defined as follows:
class Literal(models.Model):
    name = models.CharField(...)
    ...
Name field is not unique, and thus can have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name field.
I know how to do it using plain SQL (maybe not the best solution):
select * from literal where name IN (
    select name from literal group by name having count(name) > 1
);
So, is it possible to select this using the Django ORM? Or is there a better SQL solution?
Try:
from django.db.models import Count
# Parentheses let the chained calls span several lines.
(
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet with only name and count. However, you can then use this to construct a regular QuerySet by feeding it back into another query:
dupes = (
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
Literal.objects.filter(name__in=[item['name'] for item in dupes])
This was rejected as an edit, so here it is as a better answer:
dups = (
    Literal.objects.values('name')
    .annotate(count=Count('id'))
    .values('name')
    .order_by()
    .filter(count__gt=1)
)
This will return a ValuesQuerySet with all of the duplicate names. However, you can then use this to construct a regular QuerySet by feeding it back into another query. The django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name') after the annotate call looks a little strange. Without this, the subquery fails. The extra values tricks the ORM into only selecting the name column for the subquery.
Try using aggregation:
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
In case you use PostgreSQL, you can do something like this:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value
duplicate_ids = (
    Literal.objects.values('name')
    .annotate(ids=ArrayAgg('id'))
    .annotate(c=Func('ids', Value(1), function='array_length'))
    .filter(c__gt=1)
    .annotate(ids=Func('ids', function='unnest'))
    .values_list('ids', flat=True)
)
It results in this rather simple SQL query:
SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
OK, so for some reason none of the above worked for me; it always returned <MultilingualQuerySet []>. I use the following solution, which is much easier to understand but not as elegant:
dupes = []
uniques = []
dupes_query = MyModel.objects.values_list('field', flat=True)
# Iterate over the raw values, not a set, so that repeats are actually seen.
for value in dupes_query:
    if value not in uniques:
        uniques.append(value)
    else:
        dupes.append(value)
print(set(dupes))
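The same in-memory check can be sketched more compactly with collections.Counter. The function name duplicated_values is made up for the example, and the plain list of names stands in for what MyModel.objects.values_list('field', flat=True) would return; like the loop version, it loads every value into memory, so it suits small tables better than large ones.

```python
from collections import Counter


def duplicated_values(values):
    """Return the set of values that occur more than once."""
    counts = Counter(values)
    return {value for value, n in counts.items() if n > 1}


# Stand-in for the flat list of field values coming from the ORM.
names = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]

dupes = duplicated_values(names)
print(dupes)
```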
If you only want the list of names rather than the objects, you can use the following query:
repeated_names = Literal.objects.values('name').annotate(Count('id')).order_by().filter(id__count__gt=1).values_list('name', flat=True)