Filter by array length in typesense query - typesense

I am using typesense and have the following document structure in a people collection:
{
...
people: ['Jim', 'Jane', 'Jack']
...
}
I would like to add a filter to my query that lets me filter by the number of entries in the people array of each document.
e.g
{
filter_by: people:=[0...2]
}
This should return all documents that have between 0 and 2 entries in the people array.
The documentation states that this is possible for numeric fields, but does not mention anything related to array fields.
Numeric Filtering:
Filter documents with numeric values between a min and max value, using the range operator [min..max] or using simple comparison operators >, >=, <, <=, =.
Is this possible, or do I need to explicitly add a numeric field to the documents?

It's not possible to filter based on array length in Typesense.
So you would have to create a new numeric field in each document (say, people_length), calculate it at indexing time, and then use that field for filtering.
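A minimal sketch of that approach in Python, assuming the collection schema declares people_length as an integer field. The helper name with_people_length is hypothetical, and the dictionaries are only built here, not sent to a Typesense server:

```python
# Hypothetical helper: copy the document and add a numeric people_length
# field so the array length can be filtered like any other number.
def with_people_length(doc):
    enriched = dict(doc)
    enriched["people_length"] = len(doc.get("people", []))
    return enriched

# Compute the field at indexing time, before the document is uploaded.
doc = with_people_length({"people": ["Jim", "Jane", "Jack"]})

# Search parameters using the numeric range operator [min..max] on the new field.
search_params = {"q": "*", "filter_by": "people_length:[0..2]"}
```

The field has to be recomputed whenever the people array of a document changes.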

Related

How to get value string with regexp in bigquery

Hi, I have a string in a BigQuery column like this:
cancellation_amount: 602000
after_cancellation_transaction_amount: 144500
refund_time: '2022-07-31T06:05:55.215203Z'
cancellation_amount: 144500
after_cancellation_transaction_amount: 0
refund_time: '2022-08-01T01:22:45.94919Z'
I am already using this logic to get cancellation_amount:
regexp_extract(file,r'.*cancellation_amount:\s*([^\n\r]*)')
but the output is only the first amount, 602000. I need 602000 and 144500 to come out as separate columns.
Any help is appreciated.
If the lines in your input (which will eventually become columns) are fixed, you can use multiple regexp_extract calls to get all the values.
SELECT
regexp_extract(file, r'cancellation_amount:\s*([^\n\r]*)') as cancellation_amount,
regexp_extract(file, r'after_cancellation_transaction_amount:\s*([^\n\r]*)') as after_cancellation_transaction_amount
FROM table_name
One issue I found with your regex is that .*cancellation_amount won't match after_cancellation_transaction_amount.
There is also a function called regexp_extract_all, which returns all the matches as an array that you can later explode into columns, but if you have a finite number of values, separating them out into different columns would be easier.
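To illustrate the "all matches as an array" idea outside BigQuery, here is a small Python sketch using re.findall on the sample string from the question. It is an analogy only: BigQuery uses RE2, and the anchored pattern below is an assumption chosen for this sample, not the asker's expression.

```python
import re

# Sample input copied from the question.
text = """cancellation_amount: 602000
after_cancellation_transaction_amount: 144500
refund_time: '2022-07-31T06:05:55.215203Z'
cancellation_amount: 144500
after_cancellation_transaction_amount: 0
refund_time: '2022-08-01T01:22:45.94919Z'"""

# (?m) lets ^ match at the start of every line, so only lines that begin
# with cancellation_amount are captured.
amounts = re.findall(r"(?m)^cancellation_amount:\s*(\d+)", text)
after_amounts = re.findall(
    r"(?m)^after_cancellation_transaction_amount:\s*(\d+)", text
)

# With a known, fixed number of matches, unpack them into separate "columns".
first_amount, second_amount = amounts
```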

Wildcard search for array<string> in Athena

I have a table in Athena where one of the columns is of type array.
I tried the query below to get rows containing earth, but it doesn't work:
select * from mytable
where contains(myarr,'eart%');
How do I perform a wildcard search in this column?
This is from memory, so it might need a bit of tweaking, but you can use a filter on the array elements
where cardinality(filter(myarr, q -> q like 'eart%')) > 0
filter creates an array of the matching elements, and cardinality tests whether that array has one or more elements.
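The same filter-then-count logic can be sketched in Python to show what the expression checks. The helpers like_to_regex and any_like are hypothetical names, and the LIKE translation covers only the % and _ wildcards (no escape character):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ only) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def any_like(values, pattern):
    rx = like_to_regex(pattern)
    matches = [v for v in values if rx.fullmatch(v)]  # filter(myarr, q -> q like ...)
    return len(matches) > 0                           # cardinality(...) > 0
```

For example, any_like(["earth", "mars"], "eart%") is true, while an array holding only "the earth" does not match because LIKE patterns are anchored at both ends.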

Exclude rows when column contains a 1 in position 2 without using function

I have a column that will always be 5 digits long, and each digit will always be a 1 or a 0. I need to put in my where clause to exclude when the second position is equal to 1. For example 01000 is to be excluded but 10010 is to be kept. I currently have:
WHERE (SUBSTRING(field, 2, 1) <> '1') or field IS NULL
How do I do this without using the Substring function?
Edit: Also, the column is a varchar(10) in the database. Does this matter?
You could use the like operator to check that character directly. Since rows with a 1 in the second position are to be excluded, the pattern has to be negated:
WHERE field NOT LIKE '_1%' OR field IS NULL
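A quick way to check the pattern is to run it against a few sample values. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the table name t and the sample rows are assumptions for illustration, and SQLite stands in for whatever database the question targets. The predicate is the negated form, NOT LIKE '_1%', because the goal is to exclude rows whose second character is 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field VARCHAR(10))")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("01000",), ("10010",), ("11111",), (None,)],
)

# _ matches exactly one character, so '_1%' means "second character is 1".
rows = conn.execute(
    "SELECT field FROM t "
    "WHERE field NOT LIKE '_1%' OR field IS NULL "
    "ORDER BY rowid"
).fetchall()
kept = [row[0] for row in rows]
conn.close()
```

Only 10010 and the NULL row are kept; 01000 and 11111 are excluded.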
Use LEFT and RIGHT to isolate the second character, then check whether it is 1:
WHERE RIGHT(LEFT(field,2),1) <> '1' OR field IS NULL
No.
If 'field' is of a string type, you need to use string functions to manipulate it. SUBSTRING or some other flavor of it.
You can also convert it to binary and use bitwise AND operator but that won't solve the root issue here.
You are facing the consequences of someone ignoring 1NF.
There is a reason why Codd insisted that every "cell" must be atomic. Yours is not.
Can you separate this bitmap into atomic attribute columns?

How to use ANY instead of IN in a WHERE clause?

I used to have a query like in Rails:
MyModel.where(id: ids)
Which generates sql query like:
SELECT "my_models".* FROM "my_models"
WHERE "my_models"."id" IN (1, 28, 7, 8, 12)
Now I want to change this to use ANY instead of IN. I created this:
MyModel.where("id = ANY(VALUES(#{ids.join '),('}))")
Now when I use an empty array, ids = [], I get the following error:
MyModel Load (53.0ms) SELECT "my_models".* FROM "my_models" WHERE (id = ANY(VALUES()))
ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
Position: 75: SELECT "social_messages".* FROM "social_messages" WHERE (id = ANY(VALUES()))
from arjdbc/jdbc/RubyJdbcConnection.java:838:in `execute_query'
There are two variants of IN expressions:
expression IN (subquery)
expression IN (value [, ...])
Similarly, two variants with the ANY construct:
expression operator ANY (subquery)
expression operator ANY (array expression)
A subquery works for either technique, but for the second form of each, IN expects a list of values (as defined in standard SQL) while = ANY expects an array.
Which to use?
ANY is a later, more versatile addition; it can be combined with any binary operator returning a boolean value. IN boils down to a special case of ANY. In fact, its second form is rewritten internally:
IN is rewritten with = ANY
NOT IN is rewritten with <> ALL
Check the EXPLAIN output for any query to see for yourself. This proves two things:
IN can never be faster than = ANY.
= ANY is not going to be substantially faster.
The choice should be decided by what's easier to provide: a list of values or an array (possibly as array literal - a single value).
If the IDs you are going to pass come from within the DB anyway, it is much more efficient to select them directly (subquery) or integrate the source table into the query with a JOIN (like @mu commented).
To pass a long list of values from your client and get the best performance, use an array, unnest() and join, or provide it as table expression using VALUES (like @PinnyM commented). But note that a JOIN preserves possible duplicates in the provided array / set while IN or = ANY do not. More:
Optimizing a Postgres query with a large IN
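The duplicate-handling difference can be seen in a small sketch. It uses Python's built-in sqlite3 module rather than Postgres, so it only illustrates the semantics: the table contents are made up, and a UNION ALL subquery stands in for unnest() or a VALUES list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_models (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_models VALUES (?)", [(1,), (2,)])

# IN only tests membership, so a repeated value still yields a single row.
in_rows = conn.execute(
    "SELECT id FROM my_models WHERE id IN (1, 1)"
).fetchall()

# A JOIN produces one output row per matching pair, so the duplicate survives.
join_rows = conn.execute(
    "SELECT m.id FROM my_models AS m "
    "JOIN (SELECT 1 AS id UNION ALL SELECT 1) AS v ON m.id = v.id"
).fetchall()
conn.close()
```

Here in_rows holds one row and join_rows holds two, so deduplicate the provided list first if the JOIN form should behave like IN.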
In the presence of NULL values, NOT IN is often the wrong choice and NOT EXISTS would be right (and faster, too):
Select rows which are not present in other table
Syntax for = ANY
For the array expression Postgres accepts:
an array constructor (array is constructed from a list of values on the Postgres side) of the form: ARRAY[1,2,3]
or an array literal of the form '{1,2,3}'.
To avoid invalid type casts, you can cast explicitly:
ARRAY[1,2,3]::numeric[]
'{1,2,3}'::bigint[]
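As a sketch of the array-literal form, the Python helper below builds the '{1,2,3}' text from a list of integers. The function name to_pg_int_array_literal is hypothetical, and the $1 placeholder in the query string is an assumption, since placeholder syntax depends on the client driver. An empty list becomes '{}', which casts to a valid empty array, so the empty-input case from the question no longer produces a syntax error.

```python
def to_pg_int_array_literal(ids):
    """Render a list of integers as a Postgres array literal such as {1,2,3}."""
    # int() rejects anything that is not an integer, so no raw text is embedded.
    return "{" + ",".join(str(int(i)) for i in ids) + "}"

# The literal is passed as one bound parameter and cast on the server side.
query = 'SELECT * FROM "my_models" WHERE id = ANY($1::int[])'

literal = to_pg_int_array_literal([1, 28, 7, 8, 12])
empty_literal = to_pg_int_array_literal([])
```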
Related:
PostgreSQL: Issue with passing array to procedure
How to pass custom type array to Postgres function
Or you could create a Postgres function taking a VARIADIC parameter, which takes individual arguments and forms an array from them:
Passing multiple values in single parameter
How to pass the array from Ruby?
Assuming id to be integer:
MyModel.where('id = ANY(ARRAY[?]::int[])', ids.map { |i| i})
But I am just dabbling in Ruby. @mu provides detailed instructions in this related answer:
Sending array of values to a sql query in ruby?

return nth value in ranked numeric field using Qlikview

I want to return the nth value in a numeric field in a Qlikview chart. The field is not sorted in the load. I want n to be an expression.
I have tried using min(FieldName, round(expression)), but the offset value is not recognised and the 1st minimum value is returned.
Is there a way round this that allows me to use an expression to determine the value of n?
You need to use the Rank function. It specifically works on expressions and you can also specify how the rank behaves (min, max, avg).
rank([ total ] expression [ , mode [, format ] ])
There are also variants like HRank and VRank but they work in specific charts.
Then simply pick the nth value from the rank like
pick(n, rank())