Trying to explode an array with unnest() in Presto and failing due to extra column

I have data from a query that looks like this:
SELECT
model_features
FROM some_db
which returns:
{
"food1": 0.65892159938812,
"food2": 0.90786880254745,
"food3": 0.88357985019684,
"food4": 0.99999821186066,
"food5": 0.99237471818924,
"food6": 0.62127977609634
}
{
"food4": 0.9999965429306,
"text1": 0.82206630706787
}
...
etc.
What I am eventually trying to do is simply get a count of each of the "food1", "food2" features,
but to do so (I think) I need to trim out the unnecessary numeric data. I'm at a loss as to how to do this, as every time I try to simply unnest:
SELECT
t.concepts
FROM some_db
CROSS JOIN UNNEST(model_features) AS t(concepts)
I get this error:
Column alias list has 1 entries but 't' has 2 columns available
Anyone mind pointing me in the right direction?

Solved this for myself: the issue was that I needed to keep the second column of information (the values) for the query to execute, since UNNEST on the map produces two columns and the alias list has to name both. This may not be the canonical way to approach it, but it worked:
SELECT
t.concepts,
t.probabilities
FROM some_db
CROSS JOIN UNNEST(model_features) AS t(concepts,probabilities)
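Building on that, the count the question was ultimately after is just a GROUP BY over the unnested keys. A minimal sketch, assuming model_features really is a map (so UNNEST yields a key column and a value column); feature_count is just an illustrative alias:
SELECT
t.concepts,
count(*) AS feature_count
FROM some_db
CROSS JOIN UNNEST(model_features) AS t(concepts, probabilities)
GROUP BY t.concepts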

Related

SQL Server extract first array element from JSON

I have JSON stored in one of the columns in SQL Server and I need to modify it to remove the square brackets from it. The format is as below. I can't seem to find a good way of doing it.
[ { "Message":"Info: this is some message here.", "Active":true } ]
One way is to do it using the query below, but it is very slow and I need to run it on a very large set of data.
select a.value
from dbo.testjson e
cross apply OPENJSON(e.jsontext) as a
where isjson(e.jsontext) = 1
The only other way I can think of is just doing string manipulation, but that can be error-prone. Could someone help with this?
Ok, figured it out:
select
json_query(
'[{"Message":"Info: this is some message here.","Active":true}]',
'$[0]'
)
This will return the first object in the array, without the square brackets.
You should add the property name, in this case Message, in order to get only that part. Keep in mind that it's case-sensitive. Something like:
select json_value('[{"Message":"Info: this is some message here.","Active":true}]', '$[0].Message')
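To run the same path against the stored column rather than a literal, a sketch reusing the dbo.testjson table and jsontext column from the question:
select json_value(e.jsontext, '$[0].Message')
from dbo.testjson e
where isjson(e.jsontext) = 1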

Using JOIN in Query within MS Access 2016 for Fields in the Long Text Format

I have two queries which are almost identical. The only difference is the format of the fields being joined. One works, the other doesn't.
The query which JOINs two Integer fields works perfectly.
The query which JOINs two Long Text fields produces the following error:
"Cannot join on Memo, OLE, or Hyperlink Object (alarmlogwithstring2.[Tag_Value]=ECLString.[Tag_Value])."
Functional Query:
SELECT alarmlogwithdescs.TableIndex, alarmlogwithdescs.Date_Stamp, alarmlogwithdescs.Time_Stamp, alarmlogwithdescs.Tag_Name, alarmlogwithdescs.Tag_Value, ErrorCodeLookup.ErrorDescription
FROM ErrorCodeLookup INNER JOIN alarmlogwithdescs ON ErrorCodeLookup.[Tag_Value] = alarmlogwithdescs.[Tag_Value]
ORDER BY alarmlogwithdescs.TableIndex;
Nonfunctional Query:
SELECT alarmlogwithstring2.TableIndex, alarmlogwithstring2.Date_Stamp, alarmlogwithstring2.Time_Stamp, alarmlogwithstring2.Tag_Value, ECLString.ErrorDescription
FROM alarmlogwithstring2 INNER JOIN ECLString ON alarmlogwithstring2.[Tag_Value] = ECLString.[Tag_Value]
ORDER BY alarmlogwithstring2.TableIndex;
What I've Tried:
1.) I swapped the table following "FROM" to be ECLString, with all the changes that follow from that (i.e. after INNER JOIN I changed ECLString to alarmlogwithstring2, etc.). This makes the two queries more alike, but shouldn't have an effect on the outcome. I did the same for the functional query just to be sure: the functional one still worked and the nonfunctional one still does not...
2.) I tried making my lookup table's Tag_Value field Short Text while keeping the actual data table's Tag_Value field Long Text. No effect.
3.) I tried changing the JOIN type when creating the relationship between the two tables. No effect.
4.) Changed alarmlogwithstring2.[Tag_Value]=ECLString.[Tag_Value]
to CAST(alarmlogwithstring2.[Tag_Value] AS varchar(max)) = CAST(ECLString.[Tag_Value] AS varchar(max)) and got the following error:
"Syntax error (missing operator) in query expression CAST(alarmlogwithstring2.[Tag_Value] AS varchar(max)) = CAST(ECLString.[Tag_Value] AS varchar(max))."
For whatever reason, after clicking "Ok" to close the error message the comma following SELECT alarmlogwithstring2.TableIndex, is highlighted, suggesting the missing operator is there. Okay?
Any help would be greatly appreciated. Thank you for your time!
Got it! Works for my situation, at least. Any other method for doing this would still be appreciated.
This works for me because my Tag_Value field contains text such as "Error0", "Error1", "Error2", etc.
So, I used the following code:
SELECT alarmlogwithstring2.TableIndex, alarmlogwithstring2.Date_Stamp, alarmlogwithstring2.Time_Stamp, alarmlogwithstring2.Tag_Value, ECLString.ErrorDescription
FROM alarmlogwithstring2 INNER JOIN ECLString ON Right( alarmlogwithstring2.[Tag_Value] , 1) = Right(ECLString.[Tag_Value], 1)
ORDER BY alarmlogwithstring2.TableIndex;
This works because of the integer on the end of my Tag_Value text. Using the Right(string, length) function causes only the integers within each value to be compared, as they're all on the right side of the value.
If your situation is similar to mine, then the code above is fine; however, if your number of error codes (or whatever) gets into double digits, be sure to reflect this in the Tag_Value fields of both tables (i.e. make Error0 => Error00, Error1 => Error01, etc.) and use Right(string,2) instead of Right(string,1). [Seems obvious, but may not be for everyone.]
However, this will NOT always be the case for me or everyone else; someone may have pure text, for example. So, again, if you know of another, more general solution, please let me know and I'll make your answer the accepted answer for this question.
Thanks!
Got it. See below for a general solution. It uses StrComp(string1, string2) = 0 to match strings.
SELECT alarmlogwithstring2.TableIndex, alarmlogwithstring2.Date_Stamp, alarmlogwithstring2.Time_Stamp, alarmlogwithstring2.Tag_Name, alarmlogwithstring2.Tag_Value, ECLString.ErrorDescription
FROM alarmlogwithstring2 INNER JOIN ECLString ON StrComp(alarmlogwithstring2.[Tag_Value], ECLString.[Tag_Value]) = 0
ORDER BY alarmlogwithstring2.TableIndex;
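Another workaround sometimes used for joining Memo (Long Text) fields in Access, not from the original thread and untested here, is to truncate both sides to a plain text expression with Left(); because the join is then on an expression rather than the Memo columns themselves, the "Cannot join on Memo" restriction no longer applies, as long as the values are distinguishable within the first 255 characters:
SELECT alarmlogwithstring2.TableIndex, alarmlogwithstring2.Date_Stamp, alarmlogwithstring2.Time_Stamp, alarmlogwithstring2.Tag_Value, ECLString.ErrorDescription
FROM alarmlogwithstring2 INNER JOIN ECLString ON Left(alarmlogwithstring2.[Tag_Value], 255) = Left(ECLString.[Tag_Value], 255)
ORDER BY alarmlogwithstring2.TableIndex;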

How to custom sort this data in SQL Server 2012?

I'm having a hard time figuring out how to custom sort the data below the way I want to. It should be in this order:
201-1-1
201-1-2
201-1-3
.......
201-2-1
and so on if you know what I mean.
Instead, I'm getting this sort order when executing the code below:
select *
from test.dbo.accounts
order by account_name asc
Output:
201-10-1
201-10-2
201-1-1
201-11-1
201-11-2
201-11-3
201-11-4
201-11-6
201-1-2
201-12-1
201-12-2
201-12-3
201-12-4
201-12-6
201-1-3
201-13-1
201-13-2
201-13-3
201-13-4
201-13-6
201-1-4
201-14-1
201-14-2
201-14-4
201-14-6
201-15-1
201-15-2
201-15-3
201-15-4
201-15-6
201-1-6
201-16-1
201-16-2
201-16-3
201-16-4
201-16-6
201-16-7
201-1-7
201-17-1
201-17-2
201-17-4
201-17-6
201-18-1
201-18-2
201-18-3
201-18-4
201-18-6
201-19-1
Thanks
For your sample data, the following trick will work:
order by len(account_name), account_name
This only works because the only variable-length component is the second one, and because the hyphen is "smaller" than digits.
You should normalize the account names so all the components are the same length, by left-padding the numbers with zeros.
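The same zero-padding idea can also be applied at query time without changing the stored data. A sketch, assuming account_name always has exactly three hyphen-separated numeric parts and never contains a dot (PARSENAME splits on '.'):
select *
from test.dbo.accounts
order by right('0000' + parsename(replace(account_name, '-', '.'), 3), 4),
right('0000' + parsename(replace(account_name, '-', '.'), 2), 4),
right('0000' + parsename(replace(account_name, '-', '.'), 1), 4)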
Ugh. String manipulation in SQL can be extremely cumbersome. There might be a better way to do this, but this does seem to work.
select account_name
from test.dbo.accounts
-- split out each hyphen-separated part and compare it as a number,
-- otherwise '10' would sort before '2' as a string
order by cast(left(account_name, charindex('-', account_name, 1) - 1) as int)
,cast(replace(right(left(account_name, charindex('-', account_name, 1) + 2), 2), '-', '') as int)
,cast(replace(right(account_name, 2), '-', '') as int)
BTW, this is a very expensive process to run. If it's productionized, you'll want to come up with a better solution.
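One possible "better solution", sketched here rather than taken from the original answers (part1, part2, and part3 are hypothetical column names, and the same three-part format is assumed), is to split the parts out once as computed columns and then sort on those:
alter table test.dbo.accounts add
part1 as cast(parsename(replace(account_name, '-', '.'), 3) as int),
part2 as cast(parsename(replace(account_name, '-', '.'), 2) as int),
part3 as cast(parsename(replace(account_name, '-', '.'), 1) as int);
select *
from test.dbo.accounts
order by part1, part2, part3;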

PostgreSQL view won't work - column doesn't exist

Hi, I am trying to migrate an Access database into PostgreSQL and everything was going well until I tried this view. I want it to create a new column called 'CalculatedHours'. As I'm new to PostgreSQL, I am slightly confused. Here's the code that I keep putting into pgAdmin and getting the error...
SELECT "SessionsWithEnrolmentAndGroups"."SessionID",
"Assignments"."Staff",
"SessionsWithEnrolmentAndGroups"."groups",
"SessionsWithEnrolmentAndGroups"."SessionQty",
"SessionsWithEnrolmentAndGroups"."Hours",
"SessionsWithEnrolmentAndGroups"."Weeks",
"Assignments"."Percentage",
"Assignments"."AdditionalHours",
Round((coalesce(("groups"),1)*("SessionQty")*("Hours")*("Weeks")
*("Percentage"))) AS CalculatedHours,
(CalculatedHours)+coalesce(("AdditionalHours"),0) AS "TotalHours"
FROM "SessionsWithEnrolmentAndGroups"
INNER JOIN "Assignments"
ON "SessionsWithEnrolmentAndGroups"."SessionID" = "Assignments"."SessionID";
You cannot access column aliases in the same select where they are defined. I would suggest a subquery:
SELECT t.*,
(CalculatedHours)+coalesce(("AdditionalHours"), 0) AS "TotalHours"
FROM (SELECT eag."SessionID", a."Staff", eag."groups", eag."SessionQty",
eag."Hours", eag."Weeks", a."Percentage", a."AdditionalHours",
Round((coalesce(("groups"),1)*("SessionQty")*("Hours")*("Weeks")*("Percentage"))) AS CalculatedHours
FROM "SessionsWithEnrolmentAndGroups" eag INNER JOIN
"Assignments" a
ON eag."SessionID" = a."SessionID"
) t;
Your queries would also be much more readable using table aliases and getting rid of the quoted identifiers (the double quotes) unless they are really, really needed.
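Another option in PostgreSQL, sketched here with the same column names and otherwise untested, is a LATERAL subquery, which lets the computed value be reused in the same query without wrapping everything in a derived table:
SELECT eag."SessionID", a."Staff", eag."groups", eag."SessionQty",
eag."Hours", eag."Weeks", a."Percentage", a."AdditionalHours",
c.calculated_hours AS "CalculatedHours",
c.calculated_hours + coalesce(a."AdditionalHours", 0) AS "TotalHours"
FROM "SessionsWithEnrolmentAndGroups" eag
INNER JOIN "Assignments" a ON eag."SessionID" = a."SessionID"
CROSS JOIN LATERAL (
SELECT round(coalesce(eag."groups", 1) * eag."SessionQty" * eag."Hours" * eag."Weeks" * a."Percentage") AS calculated_hours
) c;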

Django select only rows with duplicate field values

Suppose we have a model in Django defined as follows:
class Literal(models.Model):
name = models.CharField(...)
...
The name field is not unique and thus can have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name field.
I know how to do it using plain SQL (maybe not the best solution):
select * from literal where name IN (
select name from literal group by name having count((name)) > 1
);
So, is it possible to select this using the Django ORM? Or is there a better SQL solution?
Try:
from django.db.models import Count
(Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1))
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet with only name and count. However, you can then use this to construct a regular QuerySet by feeding it back into another query:
dupes = (Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1))
Literal.objects.filter(name__in=[item['name'] for item in dupes])
This was rejected as an edit. So here it is as a better answer
dups = (
Literal.objects.values('name')
.annotate(count=Count('id'))
.values('name')
.order_by()
.filter(count__gt=1)
)
This will return a ValuesQuerySet with all of the duplicate names. However, you can then use this to construct a regular QuerySet by feeding it back into another query. The Django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name') after the annotate call looks a little strange. Without it, the subquery fails: the extra values() call tricks the ORM into selecting only the name column for the subquery.
Try using aggregation:
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
If you use PostgreSQL, you can do something like this:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value
duplicate_ids = (Literal.objects.values('name')
.annotate(ids=ArrayAgg('id'))
.annotate(c=Func('ids', Value(1), function='array_length'))
.filter(c__gt=1)
.annotate(ids=Func('ids', function='unnest'))
.values_list('ids', flat=True))
It results in this rather simple SQL query:
SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
Ok, so for some reason none of the above worked for me; it always returned <MultilingualQuerySet []>. I used the following solution, which is much easier to understand but not as elegant:
dupes = []
uniques = []
# iterate over all values (not a set, which would hide the duplicates)
dupes_query = MyModel.objects.values_list('field', flat=True)
for dupe in dupes_query:
    if dupe not in uniques:
        uniques.append(dupe)
    else:
        dupes.append(dupe)
print(set(dupes))
If you want only the list of names rather than objects, you can use the following query:
repeated_names = Literal.objects.values('name').annotate(Count('id')).order_by().filter(id__count__gt=1).values_list('name', flat=True)