Find values from one redis set which don't belong in another - redis

In redis, I have two sets, A and B.
I want to find which values from A are not already in B.
If I do SDIFF, my understanding is that it shows me the differences of both sets (like an outer join). But I only want to know which values from A are not already in B.
Is there a command to do this, or do I need to loop through A and check whether each value is in B?

SDIFF is the command you're looking for. It is not a symmetric (outer) difference: it returns the members of the first set that are not present in any of the subsequent sets, which is exactly what you're describing. The example in the documentation shows this:
key1 = {a,b,c,d}
key2 = {c}
key3 = {a,c,e}
SDIFF key1 key2 key3 = {b,d}
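Since the argument order matters, here is what it looks like with just your two sets A and B (a sketch with sample members, assuming A = {x,y,z} and B = {y}; the order of returned members is not guaranteed):

```
redis> SADD A "x" "y" "z"
redis> SADD B "y"
redis> SDIFF A B        # members of A not in B
1) "x"
2) "z"
redis> SDIFF B A        # members of B not in A
(empty array)
```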

Related

A different merge

So I have two tables, and these are the samples:
df1:
Element         Range      Family
Ae_aag2/0013F   5-2500     Chuviridae
Ae_aag2/0014F   300-2100   Flaviviridae
df2:
Element   Range       Family
0012F     30-720      Chuviridae
0013F     23-1200     Chuviridae
0013F     1300-2610   Xinmoviridae
And I need to join the tables with the following logic:
Element_df1     Element_df2                       Family_df1   Family_df2
Ae_aag2/0013F   "0013F:23-1200,0013F:1300-2610"   Chuviridae   "Chuviridae,Xinmoviridae"
I need the common rows of the two dataframes (matched on the Element column) in one line, keeping the element of the first and second and also the family of the first and second. If three elements are common between the two dataframes, it should join the three in one single line.
I tried using merge in pandas, but it gives me two lines, not one as I need. I searched and didn't find how to make exceptions on how to merge the two dataframes. I tried using groupby afterwards, but that kind of made it worse :(
Unfortunately I don't have much knowledge of working with pandas. Please be kind, I'm new to the subject.
Use:
df1.drop(columns='Range').merge(
    df2.assign(group=lambda d: d['Element'],
               Element=lambda d: d['Element'] + ':' + d['Range'])
       .groupby('group')[['Element', 'Family']].agg(','.join),
    left_on=df1['Element'].str.extract('/(.*)$', expand=False),
    right_index=True, suffixes=('_df1', '_df2')
)  # .drop(columns='key_0')  # uncomment to remove the key
Output:
key_0 Element_df1 Family_df1 Element_df2 Family_df2
0 0013F Ae_aag2/0013F Chuviridae 0013F:23-1200,0013F:1300-2610 Chuviridae,Xinmoviridae
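For reference, the example can be reproduced end to end. Building the two sample frames shown in the question and running the merge above yields the single combined row (a runnable sketch):

```python
import pandas as pd

df1 = pd.DataFrame({'Element': ['Ae_aag2/0013F', 'Ae_aag2/0014F'],
                    'Range': ['5-2500', '300-2100'],
                    'Family': ['Chuviridae', 'Flaviviridae']})
df2 = pd.DataFrame({'Element': ['0012F', '0013F', '0013F'],
                    'Range': ['30-720', '23-1200', '1300-2610'],
                    'Family': ['Chuviridae', 'Chuviridae', 'Xinmoviridae']})

# aggregate df2 per bare element id, then join on the id extracted from df1
out = df1.drop(columns='Range').merge(
    df2.assign(group=lambda d: d['Element'],
               Element=lambda d: d['Element'] + ':' + d['Range'])
       .groupby('group')[['Element', 'Family']].agg(','.join),
    left_on=df1['Element'].str.extract('/(.*)$', expand=False),
    right_index=True, suffixes=('_df1', '_df2'))
print(out)
```

Only the 0013F row survives, since 0014F has no match in df2 and the inner merge drops it.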

.loc only returning first value in a list instead of the full list

I have a data frame and I'm trying to return all the matching values in one column based on another column using loc. The dataframe looks like this.
Col1    Col2
Alpha   Bravo
Alpha   Charlie
Delta   Charlie
Delta   Echo
Delta   Echo
Mike    Rodeo
I want to return an additional column that has all the values in Col2 for each item in Col1. I tried posting the second table, but StackOverflow thought it was a code command and wouldn't let me post it.
When I run this code, Col3 gets a single value from Col2 instead of the full list, but when I run only the right-hand side of the assignment for a specific value, it returns the correct list:
for value in col1_values:
    df.loc[df['Col1'].eq(value), 'Col3'] = list(df['Col2'].loc[df['Col1'].eq(value)])
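The element-wise assignment above is why only single values land in Col3: when a list is assigned through .loc with a boolean mask, pandas aligns it one element per selected row. One way to attach the full list to every matching row (a sketch of an alternative, not the asker's code) is to aggregate lists per Col1 value and map them back:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Alpha', 'Alpha', 'Delta', 'Delta', 'Delta', 'Mike'],
                   'Col2': ['Bravo', 'Charlie', 'Charlie', 'Echo', 'Echo', 'Rodeo']})

# one list of Col2 values per Col1 value, broadcast back row by row
lists = df.groupby('Col1')['Col2'].agg(list)
df['Col3'] = df['Col1'].map(lists)
print(df['Col3'].iloc[0])   # → ['Bravo', 'Charlie']
```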

Using pandas to join on multiple soft keys and multiple hard keys with different names

Is it possible to use pandas to join on multiple soft keys (i.e. when we allow a tolerance range for a match) and on multiple hard keys that are named differently in the two tables?
It seems that pandas.merge_asof allows joining on only one soft key, and does not allow specifying hard-key names separately for the left and right tables (in case they are named differently and renaming isn't easy to process).
Consider the following two datasets
table1:
soft keys: sk1, sk2
hard keys: x, y
sk1,sk2,x,y,val1
10,100,10,15,1
20,200,20,25,2
30,300,10,10,3
table2:
soft keys: sk1,sk2
hard keys: k1,k2
sk1,sk2,k1,k2,val2,x,y
15,110,10,15,3,1,1
23,230,20,25,5,2,2
34,330,10,10,-1,3,3
I would need something equivalent to
soft_merge(t1, t2, left_by=["x","y"], right_by=["k1","k2"], on=[sk1, sk2], tolerance=[5,15])
to get the output (showing only the vals for clarity):
val1 | val2
1 | 3
I understand that instead of left_by and right_by for the hard keys we could just
use by and rename columns, but this might not be easy to support, since other system components might rely on the old names. Is there any clean and nice way of achieving this without repeated renaming?
And the problem of joining on multiple soft keys still remains unclear...
Implement the tolerances after an exact merge:
m = df1.merge(df2, left_on=["x","y"], right_on=["k1","k2"])
mask = (m.sk1_x - m.sk1_y).abs().le(5) & (m.sk2_x - m.sk2_y).abs().le(15)
m.loc[mask, ['val1', 'val2']]
# val1 val2
#0 1 3
This doesn't ensure a 1:1 merge, and will give all combinations that fall within the tolerance. If you need the "nearest" match, you will need to specify some distance formula and keep only the closest; here I use the total absolute distance. Assuming val1 is a unique key:
m['dist'] = (m.sk1_x - m.sk1_y).abs() + (m.sk2_x - m.sk2_y).abs()
m.sort_values('dist').loc[mask].drop_duplicates('val1')
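Putting the answer together on the sample tables gives exactly the one pair the question expects (a runnable sketch):

```python
import pandas as pd

df1 = pd.DataFrame({'sk1': [10, 20, 30], 'sk2': [100, 200, 300],
                    'x': [10, 20, 10], 'y': [15, 25, 10],
                    'val1': [1, 2, 3]})
df2 = pd.DataFrame({'sk1': [15, 23, 34], 'sk2': [110, 230, 330],
                    'k1': [10, 20, 10], 'k2': [15, 25, 10],
                    'val2': [3, 5, -1],
                    'x': [1, 2, 3], 'y': [1, 2, 3]})

# exact merge on the hard keys; overlapping sk1/sk2 columns get _x/_y suffixes
m = df1.merge(df2, left_on=['x', 'y'], right_on=['k1', 'k2'])
# then filter by the soft-key tolerances (5 for sk1, 15 for sk2)
mask = (m.sk1_x - m.sk1_y).abs().le(5) & (m.sk2_x - m.sk2_y).abs().le(15)
result = m.loc[mask, ['val1', 'val2']]
print(result)   # one surviving pair: val1=1, val2=3
```

The other two hard-key matches are dropped because their sk2 values differ by 30, which exceeds the tolerance of 15.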

Pentaho Join table values

I'm a rookie at Kettle/Pentaho and queries. What I'm trying to do is check whether a value in file 1 is also in file 2.
I've got 2 files that I export from my DB:
File 1:
Row1, Row2
A 3
B 5
C 99
Z 65
File 2:
Row1, Row2
A 3
D 11
E 22
Z 65
And i want to create one file output:
File Output
Row1, Row2
A 3
Z 65
What I'm doing: 2 file inputs and a Merge Join, but no file output. Something is missing here.
Any suggestion would be great!!!
You can have the two streams joined by a "Merge join" step, which lets you set the join keys freely (in your case it seems like you want both fields to be used) as well as the join type: Inner, Left Outer, Right Outer or Full Outer.
You can use stream lookup for that. Start with the file input for file 1, then create a stream lookup step that uses the input stream for file 2 as its lookup stream. Now just match the columns and you can add the column from file 2 to your data stream.
1. Sort both files in ascending order.
2. Use the Merge Join step to join both tables on the sorted field (in this case Row1).
3. Use the Select Values step to delete the unwanted fields produced as a result of the join.
4. Output your result using the Dummy step or whatever output you prefer.
This should work fine.
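The Merge Join step with an Inner join on both fields is equivalent to a plain inner join; as a cross-check of the expected output outside Pentaho, the same logic in pandas (a sketch):

```python
import pandas as pd

file1 = pd.DataFrame({'Row1': ['A', 'B', 'C', 'Z'], 'Row2': [3, 5, 99, 65]})
file2 = pd.DataFrame({'Row1': ['A', 'D', 'E', 'Z'], 'Row2': [3, 11, 22, 65]})

# inner join on both columns keeps only the rows present in both files
out = file1.merge(file2, on=['Row1', 'Row2'])
print(out)   # rows A/3 and Z/65
```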

Fetch all keys from redis which contain a given string

I have some data stored in redis under keys like
key1 = https://abc.net/v/140225014843/css/
key2 = https://abc.net/v/153729007613/css/
key3 = https://abc.net/v/240125414249/css/
key4 = https://abc.net/v/140225014843/css/:tokens
key5 = https://abc.net/v/240125414249/css/:tokens
Now I have data = 140225014843 and I want to fetch the key and its value which contain that data.
Example: key1 contains the data, so I want to fetch key1 from redis.
I am using django-redis.
Edit:
key4 also contains the data, but I want to fetch only those keys which follow the pattern of key1.
You should rethink the way you name your keys, as it is an important decision.
You could use a List for each data value you have, with that value being the key and "paths" of that data value being the members of the list.
For example in your case you could do:
redis> RPUSH 140225014843 "css/"
redis> RPUSH 153729007613 "css/"
redis> RPUSH 240125414249 "css/"
redis> RPUSH 140225014843 "css/:tokens"
redis> RPUSH 240125414249 "css/:tokens"
Depending on what the variable part of your data is, you could adjust this approach. For example, if "css/" is always present then you could omit it.
Also you may not want duplicates in your lists, in which case you should use a Set instead.
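For completeness, with your original key layout you could also filter the keyspace with SCAN and a MATCH glob; a pattern ending in /css/ excludes the :tokens keys (a sketch; note that SCAN still walks the whole keyspace, which is why restructuring the keys as above scales better):

```
redis> SCAN 0 MATCH *140225014843/css/
1) "0"
2) 1) "https://abc.net/v/140225014843/css/"
```

key4 is not returned because the glob requires the key to end in /css/, while key4 ends in :tokens.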