BigQuery IF condition then append value into Array - Standard SQL - google-bigquery

In BigQuery (Standard SQL), I would like to append a value to an existing array if a condition is satisfied.
Example:
IF (REGEXP_CONTAINS(prodTitle, r'(?i)ecksofa'),ARRAY_CONCAT(prodcategory, ("1102")))
Is this correct and efficient?
Can I use multiple IFs and ARRAY_CONCAT calls in the same query?
Example:
IF (REGEXP_CONTAINS(prodTitle, r'(?i)ecksofa'),ARRAY_CONCAT(prodcategory, ("1102")))
IF (REGEXP_CONTAINS(prodTitle, r'(?i)blablan'),ARRAY_CONCAT(prodcategory, ("1103")))

I guess this is what you want for a single IF (I corrected your expression slightly: IF needs an else branch, and the array literal uses square brackets):
IF (REGEXP_CONTAINS(prodTitle, r'(?i)ecksofa'),
ARRAY_CONCAT(prodcategory, ["1102"]),
prodcategory)
In order to chain multiple IF and concat the output array, I would use SQL like below:
ARRAY_CONCAT(prodcategory,
IF (REGEXP_CONTAINS(prodTitle, r'(?i)ecksofa'), ["1102"], []),
IF (REGEXP_CONTAINS(prodTitle, r'(?i)blablan'), ["1103"], []),
...
)
To be more efficient, it is better to replace
REGEXP_CONTAINS(prodTitle, r'(?i)ecksofa')
=>
STRPOS(LOWER(prodTitle), 'ecksofa') != 0
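To check what the chained expression should return for a given row, the same logic can be modelled in plain Python. This is only a sketch of the semantics, not BigQuery code: the function name `append_categories` and the `RULES` list are made up for illustration, and NULL handling is not modelled.

```python
def append_categories(prod_title, prod_category, rules):
    """Return prod_category plus the code of every rule whose keyword
    occurs in the title (case-insensitive substring match)."""
    title = prod_title.lower()
    # Each rule contributes either [code] or nothing, like IF(..., ["1102"], []).
    extra = [code for keyword, code in rules if keyword in title]
    return list(prod_category) + extra


# Hypothetical rules mirroring the two IFs in the answer above.
RULES = [("ecksofa", "1102"), ("blablan", "1103")]

print(append_categories("Grosses Ecksofa in Grau", ["1000"], RULES))
```

A title that matches no rule gets its original categories back unchanged, which is what the empty-array else branch does in the SQL.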

Related

Check if value exists exactly like that with SQL query (regexp)?

Example json in the socialMedia database column
[{"id":"1463dae5-1168-432e-8e55-c61820d69c49","value":"person2"},
{"id":"c61820d69c49-8e55-432e-8e55-8e55","value":"person1"}]
I want to run a query to check whether "value":"person1", "value":"person2", or something similar exists in the JSON field. Is that possible with a regexp or something else?
Using JSON_SEARCH
App\Models\User::whereRaw('JSON_SEARCH(users.name, "all", "%person%")')->get();
Or using JSON_EXTRACT
App\Models\User::where(DB::raw('JSON_EXTRACT(`name`, "$.*")'), 'LIKE', '%person%')->get();

Passing Optional List argument from Django to filter with in Raw SQL

When using primitive types such as Integer, I can run a query like this without any problems:
with connection.cursor() as cursor:
    cursor.execute(sql='''SELECT count(*) FROM account
        WHERE %(pk)s ISNULL OR id = %(pk)s''', params={'pk': 1})
This either matches the row with id = 1 or, if the pk parameter is None, matches all rows.
However, when trying to use similar approach to pass a list/tuple of IDs, I always produce a SQL syntax error when passing empty/None tuple, e.g. trying:
with connection.cursor() as cursor:
    cursor.execute(sql='''SELECT count(*) FROM account
        WHERE %(ids)s ISNULL OR id IN %(ids)s''', params={'ids': (1,2,3)})
works, but passing () produces SQL syntax error:
psycopg2.ProgrammingError: syntax error at or near ")"
LINE 1: SELECT count(*) FROM account WHERE () ISNULL OR id IN ()
Or if I pass None I get:
django.db.utils.ProgrammingError: syntax error at or near "NULL"
LINE 1: ...LECT count(*) FROM account WHERE NULL ISNULL OR id IN NULL
I tried putting the argument in SQL in () - (%(ids)s) - but that always breaks one or the other condition. I also tried playing around with pg_typeof or casting the argument, but with no results.
Notes:
the actual SQL is much more complex, this one here is a simplification for illustrative purposes
as a last resort, I could alter the SQL in Python based on the argument, but I really wanted to avoid that
At first I had an idea of using just 1 argument, but replacing it with a dummy value [-1] and then using it like
cursor.execute(sql='''SELECT ... WHERE -1 = any(%(ids)s) OR id = ANY(%(ids)s)''', params={'ids': ids if ids else [-1]})
but this did a Full table scan for non empty lists, which was unfortunate, so a no go.
Then I thought I could do a little preprocessing in python and send 2 arguments instead of just the single list- the actual list and an empty list boolean indicator. That is
cursor.execute(sql='''SELECT ... WHERE %(empty_ids)s = TRUE OR id = ANY(%(ids)s)''', params={'empty_ids': not ids, 'ids': ids})
Not the most elegant solution, but it performs quite well (Index scan for non empty list, Full table scan for empty list - but that returns the whole table anyway, so it's ok)
And finally I came up with the simplest solution and quite elegant:
cursor.execute(sql='''SELECT ... WHERE '{}' = %(ids)s OR id = ANY(%(ids)s)''', params={'ids': ids})
This one also performs Index scan for non empty lists, so it's quite fast.
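The preprocessing step for the two-argument variant can be sketched as a small helper. The name `build_id_filter_params` is hypothetical. It also normalises None and tuples to a list, on the assumption that psycopg2 is the driver: psycopg2 adapts Python lists to PostgreSQL arrays (which is what ANY needs), while tuples are adapted for IN.

```python
def build_id_filter_params(ids):
    """Build the params dict for the query below.

    None, () and [] all mean "no filter"; anything else is passed as a list
    so that the driver sends it as an array for ANY(...).
    """
    id_list = list(ids) if ids else []
    return {"empty_ids": not id_list, "ids": id_list}


# Simplified query from the answer above; the real SQL would be more complex.
SQL = """SELECT count(*) FROM account
WHERE %(empty_ids)s = TRUE OR id = ANY(%(ids)s)"""

# Intended use (not executed here, it needs a database connection):
#   cursor.execute(SQL, build_id_filter_params(ids))
print(build_id_filter_params((1, 2, 3)))
```

Keeping the normalisation in one place means the SQL text itself never changes, which was the goal stated in the question.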
From the psycopg2 docs:
Note: You can use a Python list as the argument of the IN operator using the PostgreSQL ANY operator.
ids = [10, 20, 30]
cur.execute("SELECT * FROM data WHERE id = ANY(%s);", (ids,))
Furthermore ANY can also work with empty lists, whereas IN () is a SQL syntax error.

Codeigniter database queries with multiple LIKE parameters

How can I perform a query with multiple LIKE parameters?
For example, I have this string to search through:
"I like searching very much"
This is the code I currently use:
$searchTerm = "like";
$this->db->or_like('list.description', $searchTerm, 'both');
But I want to search with 2 or 3 parameters, like this:
$searchTerm = "like";
$searchTerm1 = "much";
How can I do this and get the same result?
You can simply repeat the like parameters on the active record. In your example you would do something like this:
$this->db->or_like('list.description', $searchTerm1);
$this->db->or_like('list.description', $searchTerm2);
$this->db->or_like('list.description', $searchTerm3);
...
This will join each or_like condition with an OR in the WHERE clause.
First, define an array of the columns to match against. Then, and this is important, put the or_like statement before the where clause, so that the LIKE conditions are joined with OR and the where condition is added with AND.
Here is an example:
$this->db->or_like(array('column_name1' => $k, 'column_name2' => $k));
$this->db->where($whereColumn, $whereValue);
You can use query grouping:
$this->db->group_start()->like('column_name1', $value)
->or_group_start()
->like('column_name2', $value)
->group_end()
->group_end();
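Why grouping matters when OR-ed LIKE conditions are combined with another filter can be seen in a small sketch. It uses Python's standard sqlite3 module instead of CodeIgniter, and the table `items`, the `status` column and the helper `search` are all hypothetical. AND binds more tightly than OR in SQL, so the LIKE conditions are wrapped in parentheses.

```python
import sqlite3


def search(conn, terms, status):
    """Return ids of rows whose description contains any term AND whose
    status matches. The parentheses keep the OR chain together."""
    if not terms:
        raise ValueError("at least one search term is required")
    likes = " OR ".join("description LIKE ?" for _ in terms)
    sql = f"SELECT id FROM items WHERE ({likes}) AND status = ? ORDER BY id"
    params = [f"%{term}%" for term in terms] + [status]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, description TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "I like searching very much", "active"),
        (2, "nothing relevant here", "active"),
        (3, "so much fun", "archived"),
    ],
)

print(search(conn, ["like", "much"], "active"))
```

Without the parentheses, the status filter would apply only to the last LIKE condition, and row 1 would be returned for any status.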

Django select only rows with duplicate field values

Suppose we have a model in Django defined as follows:
from django.db import models

class Literal(models.Model):
    name = models.CharField(...)
    ...
Name field is not unique, and thus can have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name field.
I know how to do it using plain SQL (maybe not the best solution):
select * from literal where name IN (
    select name from literal group by name having count(name) > 1
);
So, is it possible to select this using the Django ORM? Or is there a better SQL solution?
Try:
from django.db.models import Count
# Parentheses are needed so the chained calls can span several lines.
(Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1))
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet with only name and count. However, you can then use this to construct a regular QuerySet by feeding it back into another query:
dupes = (Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1))
Literal.objects.filter(name__in=[item['name'] for item in dupes])
This was rejected as an edit, so here it is as a better answer:
dups = (
    Literal.objects.values('name')
    .annotate(count=Count('id'))
    .values('name')
    .order_by()
    .filter(count__gt=1)
)
This will return a ValuesQuerySet with all of the duplicate names. However, you can then use this to construct a regular QuerySet by feeding it back into another query. The django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name') after the annotate call looks a little strange. Without this, the subquery fails. The extra values tricks the ORM into only selecting the name column for the subquery.
Try using aggregation:
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
In case you use PostgreSQL, you can do something like this:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value
duplicate_ids = (Literal.objects.values('name')
.annotate(ids=ArrayAgg('id'))
.annotate(c=Func('ids', Value(1), function='array_length'))
.filter(c__gt=1)
.annotate(ids=Func('ids', function='unnest'))
.values_list('ids', flat=True))
It results in this rather simple SQL query:
SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
Ok, so for some reason none of the above worked for me; it always returned <MultilingualQuerySet []>. I use the following solution, which is much easier to understand but not as elegant:
dupes = []
uniques = []
dupes_query = MyModel.objects.values_list('field', flat=True)
# Iterate over the raw values, not set(dupes_query): a set has already
# dropped the repeats, so nothing would ever reach the else branch.
for value in dupes_query:
    if value not in uniques:
        uniques.append(value)
    else:
        dupes.append(value)
print(set(dupes))
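The same in-memory idea can be written more compactly with collections.Counter. This is a sketch on plain Python values, not a Django query; the function name `duplicated_values` is made up, and in practice the input would be something like the values_list(..., flat=True) result above.

```python
from collections import Counter


def duplicated_values(values):
    """Return the set of values that occur more than once."""
    counts = Counter(values)
    return {value for value, n in counts.items() if n > 1}


print(duplicated_values(["a", "b", "a", "c", "b", "a"]))
```

Like the loop above, this loads every value into memory, so for large tables the GROUP BY / HAVING approaches from the other answers are likely the better fit.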
If you only want a list of the names rather than the objects, you can use the following query:
repeated_names = Literal.objects.values('name').annotate(Count('id')).order_by().filter(id__count__gt=1).values_list('name', flat=True)

linq match word with boundaries

Say I have an nvarchar field in my database that looks like this:
1, "abc abccc dabc"
2, "abccc dabc"
3, "abccc abc dabc"
I need a LINQ select query that matches "abc" as a whole word (with word boundaries), not as part of a longer string.
In this case only rows 1 and 3 would match.
from row in table.AsEnumerable()
where row.Foo.Split(new char[] {' ', '\t'}, StringSplitOptions.None)
.Contains("abc")
select row
It's important to include the call to AsEnumerable, which means the query is executed on the client side; otherwise (I'm pretty sure) the Where clause won't get converted into SQL successfully.
Maybe a regular expression like this (nb - not compiled or tested):
var matches = from a in yourCollection
              where Regex.IsMatch(a.field, @".*\sabc\s.*")
              select a;
datacontext.Table.Where(
    e => Regex.IsMatch(e.field, @"(.*?[\s\t]|^)abc([\s\t].*?|$)")
);
or
datacontext.Table.Where(
    e => e.field.Split(' ', '\t').Contains("abc")
);
For efficiency, you want to do as much of the filtering as possible on the server, and then the rest of the filtering on the client. You can't use Regex on the server (SQL Server doesn't support it) so the solution is to first use a LIKE-type search (by calling .Contains) then use Regex on the client to further refine the results:
db.MyTable
.Where (t => t.MyField.Contains ("abc"))
.AsEnumerable() // Executes locally from this point on
.Where (t => Regex.IsMatch (t.MyField, @"\babc\b"))
This ensures that you retrieve only the rows from SQL Server that contain the letters 'abc' (regardless of whether they're a word-boundary match or not) and use Regex on the client side to further restrict the result set so that only matches on word boundaries are included.
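The two-stage filter can be tried out on the sample rows from the question. The sketch below is in Python rather than LINQ: the substring check stands in for the server-side .Contains call, the regex stage plays the role of the client-side Regex.IsMatch, and the names `word_matches` and `ROWS` are made up for illustration.

```python
import re


def word_matches(rows, word):
    """Return ids of rows whose text contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    # Stage 1: cheap substring prefilter (what the database would do with LIKE).
    candidates = [(row_id, text) for row_id, text in rows if word in text]
    # Stage 2: word-boundary check on the reduced candidate set.
    return [row_id for row_id, text in candidates if pattern.search(text)]


# Sample rows from the question.
ROWS = [
    (1, "abc abccc dabc"),
    (2, "abccc dabc"),
    (3, "abccc abc dabc"),
]

print(word_matches(ROWS, "abc"))
```

Row 2 passes the prefilter because it contains the letters "abc", but it is dropped in the second stage because no occurrence sits on word boundaries.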