Redis - get keys whose id matches an expression

I use Redis and Ruby. There are records:
"phones:initial:#{id}"
redis.hset "phones:initial:1", "query", query
redis.hset "phones:initial:2", "query", query
redis.hset "phones:initial:3", "query", query
redis.hset "phones:initial:4", "query", query
...
Is it possible to get the keys whose id satisfies an expression, i.e. keys with the prefix "phones:initial:" where id % m == n?

NO, there's no built-in method to do that. You have to use the SCAN command to iterate on all keys, and select those that match the given pattern.
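For instance, a rough sketch of that brute-force scan in Python with redis-py (the question uses Ruby, but redis-rb's scan_each works the same way; m and n are whatever your expression uses):

import redis

r = redis.Redis(decode_responses=True)

def keys_matching(m, n):
    # Scan every phones:initial:* key and keep those whose id % m == n.
    matches = []
    for key in r.scan_iter(match="phones:initial:*"):
        record_id = int(key.rsplit(":", 1)[-1])
        if record_id % m == n:
            matches.append(key)
    return matches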
However, if you always use the same m for the expression, you can pre-compute n for each id, and make n as a part of the key. In this way, you can use Redis' pattern match to get those specific keys.
For example, suppose m == 2; you can have the following new keys: phones:initial:1:1, phones:initial:0:2, phones:initial:1:3, phones:initial:0:4. To get keys that match phones:initial: and id % 2 == 0, use the following command: SCAN 0 MATCH phones:initial:0:*
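And a sketch of the pre-computed variant, again with redis-py for illustration: the remainder is baked into the key at write time, so a single MATCH pattern selects it later.

import redis

r = redis.Redis(decode_responses=True)
M = 2  # the fixed modulus agreed on up front

def store_query(record_id, query):
    # id % M becomes part of the key, e.g. phones:initial:0:4
    r.hset(f"phones:initial:{record_id % M}:{record_id}", "query", query)

def keys_for_remainder(n):
    # SCAN with a key pattern instead of filtering client-side
    return list(r.scan_iter(match=f"phones:initial:{n}:*"))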

Related

Best practice for scaling SQL queries on joins?

I'm writing a REST api that works with SQL and am constantly finding myself in similar situations to this one, where I need to return lists of objects with nested lists inside each object by querying over table joins.
Let's say I have a many-to-many relationship between Users and Groups. I have a User table and a Group table and a junction table UserGroup between them. Now I want to write a REST endpoint that returns a list of users, and for each user the groups that they are enrolled in. I want to return a json with a format like this:
[
  {
    "username": "test_user1",
    <other attributes ...>
    "groups": [
      {
        "group_id": 2,
        <other attributes ...>
      },
      {
        "group_id": 3,
        <other attributes ...>
      }
    ]
  },
  {
    "username": "test_user2",
    <other attributes ...>
    "groups": [
      {
        "group_id": 1,
        <other attributes ...>
      },
      {
        "group_id": 2,
        <other attributes ...>
      }
    ]
  },
  etc ...
]
There are two or three ways to query SQL for this that I can think of:
Issue a variable number of SQL queries: Query for a list of Users, then loop over each user to query over the junction linkage to populate the groups list for each user. The number of SQL queries linearly increases with the number of users returned.
example (using python flask_sqlalchemy / flask_restx):
users = db.session.query(User).filter( ... )
result = []
for u in users:
    groups = db.session.query(Group).join(UserGroup, UserGroup.group_id == Group.id) \
        .filter(UserGroup.user_id == u.id)
    result.append({**u.__dict__, 'groups': groups})
retobj = api.marshal(result, my_model)
# Total number of queries: 1 + number of users in result
Issue a constant number of SQL queries: This can be done by issuing one monolithic SQL query performing all joins, with potentially lots of redundant data in the User columns, or, often preferably, a few separate SQL queries. For example, query for a list of Users, then query the Group table joining on UserGroup, then manually group the groups in server code.
example code:
from collections import defaultdict

users = db.session.query(User).filter( ... )
uids = [u.id for u in users]
groups = db.session.query(UserGroup.user_id, Group) \
    .join(Group, UserGroup.group_id == Group.id) \
    .filter(UserGroup.user_id.in_(uids))
aggregate = defaultdict(list)
for user_id, group in groups:
    aggregate[user_id].append(group.__dict__)
retobj = api.marshal([{**u.__dict__, 'groups': aggregate[u.id]} for u in users], my_model)
# Total number of queries: 2
The third approach, with limited usefulness, is to use string_agg or a similar function to force SQL to concatenate a grouping into one string column, then unpack the string into a list server-side. For example, if all I wanted was the group numbers, I could use string_agg and group_by to get back "1,2" in one query to the User table. But this is only useful if you don't need complex objects.
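Roughly what I have in mind for that third approach (untested sketch; string_agg is Postgres-specific, and the models are the same User/UserGroup ones as above):

from sqlalchemy import String, cast, func

# One query: group ids are concatenated into one text column per user,
# then unpacked into a Python list on the server. Works only for scalar
# values like ids, not for complex objects.
rows = (
    db.session.query(
        User.id,
        User.username,
        func.string_agg(cast(UserGroup.group_id, String), ',').label('group_ids'),
    )
    .outerjoin(UserGroup, UserGroup.user_id == User.id)
    .group_by(User.id, User.username)
    .all()
)
retobj = [
    {'username': username, 'groups': group_ids.split(',') if group_ids else []}
    for _id, username, group_ids in rows
]
# Total number of queries: 1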
I'm attracted to the second approach because I feel it's more efficient and scalable: the number of SQL queries (which I have assumed is the main bottleneck, for no particularly good reason) is constant, but it takes some more work on the server side to sort the groups into each user. Then again, I thought part of the point of using SQL is to take advantage of its efficient sorting/filtering so you don't have to do it yourself.
So my question is: am I right in thinking that it's a good idea to make the number of SQL queries constant at the expense of more server-side processing and dev time? Is it a waste of time to try to whittle down the number of unnecessary SQL queries? Will I regret it if I don't, once the API is tested at scale? Is there a better way to solve this problem that I'm not aware of?
Using the joinedload option you can load all the data with just one query:
q = (
    session.query(User)
    .options(db.joinedload(User.groups))
    .order_by(User.id)
)
users = q.all()
for user in users:
    print(user.name)
    for ug in user.groups:
        print(" ", ug.name)
When you run the query above, all the groups will already have been loaded from the database, using a query similar to the one below:
SELECT "user".id,
"user".name,
group_1.id,
group_1.name
FROM "user"
LEFT OUTER JOIN (user_group AS user_group_1
JOIN "group" AS group_1 ON group_1.id = user_group_1.group_id)
ON "user".id = user_group_1.user_id
And now you only need to serialize the result with a proper schema.
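For example, with flask_restx the serialization could look roughly like this (a sketch only; it assumes my_model declares a nested groups field, and the field names are illustrative):

from flask_restx import fields

# Illustrative schema: 'groups' is marshalled straight from the relationship
# that joinedload has already populated, so no extra queries are issued here.
group_model = api.model('Group', {
    'group_id': fields.Integer(attribute='id'),
    'name': fields.String,
})
my_model = api.model('User', {
    'username': fields.String,
    'groups': fields.List(fields.Nested(group_model)),
})

retobj = api.marshal(users, my_model)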

Is there any way to search a JSON Postgres column using a matching clause?

I'm trying to search for a record in a Postgres JSON column. The stored data has a structure like this:
{
  "contract_shipment_date": "2015-06-25T19:00:00.000Z",
  "contract_product_grid": [
    {
      "product_name": "Axele",
      "quantity": 22.58
    },
    {
      "product_name": "Bell",
      "quantity": 52.58
    }
  ],
  "lc_status": "Awaited"
}
My table name is Heap and the column name is contract_product_grid. Also, the contract_product_grid column can contain multiple product records.
I found this documentation but was not able to get the desired output.
The requirement is: there is a filter in which users can select a product_name, and based on the entered name, using a matching clause, the record will be fetched and returned to the user.
Suppose you entered Axele as the product_name input and want to return the matching value for the quantity key. Then use:
SELECT js2 -> 'quantity' AS quantity
FROM
(
  SELECT JSON_ARRAY_ELEMENTS(value::json) AS js2
  FROM heap,
       JSON_EACH_TEXT(contract_product_grid) AS js
  WHERE key = 'contract_product_grid'
) q
WHERE js2 ->> 'product_name' = 'Axele'
This expands the outermost JSON into key/value pairs with JSON_EACH_TEXT(json), and then splits the elements of the newly formed array with the JSON_ARRAY_ELEMENTS(value::json) function.
Then the main query filters by the specific product_name.
P.S. Don't forget to wrap the JSON column's value in curly braces.
SELECT *
FROM
(
  SELECT JSON_ARRAY_ELEMENTS(contract_product_grid::json) AS js2
  FROM heaps
) q
WHERE js2 ->> 'product_name' IN ('Axele', 'Bell')
As I mentioned in the question, my column name is contract_product_grid and I only have to search within it.
Using this query, I'm able to get the contract_product_grid information with an IN clause on the entered product names.
You need to unnest the array in order to be able to use a LIKE condition on each value:
select h.*
from heap h
where exists (select *
              from jsonb_array_elements(h.contract_product_grid -> 'contract_product_grid') as p(prod)
              where p.prod ->> 'product_name' like 'Axe%')
If you don't really need a wildcard search (so = instead of LIKE) you can use the contains operator @> which is a lot more efficient:
select h.*
from heap h
where h.contract_product_grid -> 'contract_product_grid' @> '[{"product_name": "Axele"}]';
This can also be used to search for multiple products:
select h.*
from heap h
where h.contract_product_grid -> 'contract_product_grid' @> '[{"product_name": "Axele"}, {"product_name": "Bell"}]';
If you are using Postgres 12 you can simplify that a bit using a JSON path expression:
select *
from heap
where jsonb_path_exists(contract_product_grid, '$.contract_product_grid[*].product_name ? (@ starts with "Axe")')
Or using a regular expression:
select *
from heap
where jsonb_path_exists(contract_product_grid, '$.contract_product_grid[*].product_name ? (@ like_regex "axe.*" flag "i")')

PostgreSQL query column that has JSON object with array of another JSON object nested inside

The JSON column data is formatted as such:
{
"associations": [
{
"dcn": "FI692HI",
"ucid": "1038753892",
"dcnName": "USED PARTS 4 SALE",
"ucidName": "A UCID NAME",
"dealerCode": "A187",
"dealerName": "SOME DEALER HERE LTD."
}
]
}
or like this in the bigger picture
party_no mdl mkr_cd assoc_strct
-------------------------------------
666 DOG 2 JSON object from above
267 DOG 1 JSON object from above
185 CAT 1 JSON object from above
I need to be able to query the keys in the JSON object that is inside the array; that is, I need to query for the dcn, dcnName, ucid, ucidName, dealerCode, and dealerName values, like you would with a hash map in Java or a dictionary in Python.
SELECT
    assoc_strct -> 'associations' AS json_array
FROM
    assets.asset_latest al
So basically, say I wanted to query to see what the most frequently appearing value was for the "dcn" key and get its corresponding party_no. Let's say I had a "dcn" key with the value "BLUE42" appearing 1 million times; my results should look like:
party_no JSON val count
--------------------------------
666 BLUE42 1,000,000
Again, I just need a method to query the key/value pairs inside this JSON object, which holds an array, which in turn holds another JSON object containing key/value pairs delimited by commas (oh, that's a mouthful). I'm not entirely sure who created the database with the JSON column that way (it's at my work), because I figured a { "outer_json_object" : [{"key" : "pair"}, {"key2": "pair2"}]} would be easier to access, but maybe I'm wrong.
I am not sure I understand your question completely.
You will need to unnest the array for each party_no, then retrieve all the key/value pairs from the array elements, which essentially returns one row for each key/value pair times the number of elements in the array - for each party_no.
The unnesting is done using jsonb_array_elements() and extracting the key/value pairs can be done using jsonb_each().
The result can then be grouped and sorted descending and the first row is the one with the highest count:
select party_no,
       t.val as "JSON Value",
       count(*)
from data
  cross join jsonb_array_elements(assoc_strct -> 'associations') as a(e)
  cross join jsonb_each_text(a.e) as t(ky, val)
where t.ky = 'dcn'
group by party_no, t.val
order by count(*) desc
limit 1
Online example: https://rextester.com/LYTTY41242

How to retrieve keys in large REDIS databases using SCAN

I have a large Redis database where I query keys with SCAN, using the syntax:
SCAN 0 MATCH *something* COUNT 50
I get the result
1) "500000"
2) (empty list or set)
but the key is there. If I make subsequent calls with the new cursor from 1), at some point I will get the result.
I was under the impression MATCH would return matching keys up to the maximum number specified by COUNT, but it seems Redis scans COUNT keys and returns only those that match.
Am I missing something? How can I say: "give me the first (count) keys that match the pattern"?
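What I'm effectively doing now is following the cursor until I have collected enough matches, something like this (redis-py here, purely illustrative), but I'd like to know if there is a better way:

import redis

r = redis.Redis(decode_responses=True)

def first_matches(pattern, wanted, batch=50):
    # Collect up to `wanted` keys matching `pattern`, following the cursor;
    # COUNT only hints how many keys Redis examines per iteration.
    found, cursor = [], 0
    while True:
        cursor, keys = r.scan(cursor=cursor, match=pattern, count=batch)
        found.extend(keys)
        if cursor == 0 or len(found) >= wanted:
            return found[:wanted]

keys = first_matches('*something*', 50)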

Pig FILTER returns empty bag that I can't COUNT

I'm trying to count how many values in a data set match a filter condition, but I'm running into issues when the filter matches no entries.
There are a lot of columns in my data structure, but there's only three of use for this example: key - data key for the set (not unique), value - float value as recorded, nominal_value - float representing the nominal value.
Our use case right now is to find the number of values that are 10% or more below the nominal value.
I'm doing something like this:
filtered_data = FILTER data BY value <= (0.9 * nominal_value);
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT(filtered_data.value);
DUMP filtered_count;
In most cases, there are no values that fall outside of the nominal range, so filtered_data is empty (or null. Not sure how to tell which.). This results in filtered_count also being empty/null, which is not desirable.
How can I construct a statement that will return a value of 0 when filtered_data is empty/null? I've tried a couple of options that I've found online:
-- Extra parens in COUNT required to avoid syntax error
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT((filtered_data.value is null ? {} : filtered_data.value));
which results in:
Two inputs of BinCond must have compatible schemas. left hand side: #1259:bag{} right hand side: #1261:bag{#1260:tuple(cf#1038:float)}
And:
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE (filtered_data.value is null ? 0 : COUNT(filtered_data.value));
which results in an empty/null result.
The way you have it set up right now, you will lose information about any keys for which the count of bad values is 0. Instead, I'd recommend preserving all keys, so that you can see positive confirmation that the count was 0, instead of inferring it by absence. To do that, just use an indicator and then SUM that:
data2 =
    FOREACH data
    GENERATE
        key,
        ((value <= 0.9 * nominal_value) ? 1 : 0) AS bad;

bad_count = FOREACH (GROUP data2 BY key) GENERATE group, SUM(data2.bad);