MySQL String Comparison with Percent Output

I am trying to compare two entries of 6 digits each, where each digit can be either 0 or 1 (e.g. 100001 or 011101). If 3 out of 6 match, I want the output to be .5. If 2 out of 6 match, I want the output to be .33, etc.
Here are the SQL commands to create the table:
CREATE TABLE sim
(sim_key int,
string int);
INSERT INTO sim (sim_key, string)
VALUES (1, 111000);
INSERT INTO sim (sim_key, string)
VALUES (2, 111111);
My desired output is to compare the two strings, which share 50% of their characters, and output 50%.
Is it possible to do this sort of comparison in SQL? Thanks in advance.

This returns the percentage of positions where both strings have a 1 bit:
select bit_count(conv(a.string, 2, 10) & conv(b.string, 2, 10))/6*100 as percent_match
from sim a, sim b where
a.sim_key=1 and b.sim_key=2;
Since you store your bit fields as their base-2 digits written as ordinary numbers, we first need to convert them: conv(a.string, 2, 10), conv(b.string, 2, 10).
Then we keep only the bits that are 1 in both fields: conv(a.string, 2, 10) & conv(b.string, 2, 10)
Then we count them: bit_count(conv(a.string, 2, 10) & conv(b.string, 2, 10))
Finally, we compute the percentage: bit_count(conv(a.string, 2, 10) & conv(b.string, 2, 10)) / 6 * 100.
The query returns 50 for 111000 and 111111.
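For checking the arithmetic outside MySQL, here is a minimal Python sketch of the same steps, assuming the field is held as in the question (binary digits stored as a decimal number such as 111000). `int(str(x), 2)` plays the role of conv(x, 2, 10) and counting the "1" characters of `bin()` stands in for bit_count; the function names are hypothetical.

```python
def to_bits(field) -> int:
    # Read the digits of the stored value as a base-2 number,
    # like MySQL conv(field, 2, 10): 111000 -> 56.
    return int(str(field), 2)


def percent_ones_match(a, b, width: int = 6) -> float:
    # AND keeps only the positions where both fields have a 1;
    # counting the remaining 1 bits mirrors MySQL bit_count().
    shared = to_bits(a) & to_bits(b)
    return bin(shared).count("1") / width * 100


print(percent_ones_match(111000, 111111))  # 50.0
```

A leading zero lost by INT storage (011101 stored as 11101) does not change the result, because both read as the same base-2 value.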
Here is another version that also counts matching zeros:
select bit_count((conv(a.string, 2, 10) & conv(b.string, 2, 10)) | ((0xFFFFFFFF>>(32-6))&~(conv(a.string, 2, 10)|conv(b.string, 2, 10))))/6*100 as percent_match
from sim a, sim b where
a.sim_key=1 and b.sim_key=2;
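The zero-counting expression may be easier to follow in a small Python sketch (the helper name is hypothetical, and `width` stands in for the 6 in the query). The mask keeps the lowest `width` bits, like 0xFFFFFFFF>>(32-6); OR-ing the shared 1s with the shared 0s is the same as the complement of XOR under that mask. With the question's other example, 100001 and 011101 agree in 2 of 6 positions, which gives the .33 asked for.

```python
def percent_all_match(a, b, width: int = 6) -> float:
    x, y = int(str(a), 2), int(str(b), 2)  # like conv(..., 2, 10)
    mask = (1 << width) - 1                 # 0b111111 for width 6
    both_ones = x & y                       # positions where both are 1
    both_zeros = mask & ~(x | y)            # positions where both are 0
    same = both_ones | both_zeros           # equals mask & ~(x ^ y)
    return bin(same).count("1") / width * 100


print(percent_all_match(111000, 111111))                # 50.0
print(round(percent_all_match("100001", "011101"), 2))  # 33.33
```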
Note that, while this solution works, you should really store the field's actual numeric value instead:
INSERT INTO sim (sim_key, string)
VALUES (1, conv('111000', 2, 10));
INSERT INTO sim (sim_key, string)
VALUES (2, conv('111111', 2, 10));
Or, to update existing data:
UPDATE sim SET string=conv(string, 2, 10);
Then this query gives the same results (if you updated your data as described above):
select bit_count(a.string & b.string)/6*100 as percent_match
from sim a, sim b where
a.sim_key=1 and b.sim_key=2;
And to count zeros too:
select bit_count((a.string & b.string) | ((0xFFFFFFFF>>(32-6))&~(a.string|b.string)))/6*100 as percent_match
from sim a, sim b where
a.sim_key=1 and b.sim_key=2;
(Replace each 6 with the width of your bit fields.)

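If the bit fields are stored as their actual numeric values (e.g. 56 for 111000), you can divide the number of 1 bits the two values share by the number of bits set in either of them:
SELECT BIT_COUNT(s1.string & s2.string) / BIT_COUNT(s1.string | s2.string)
FROM sim s1, sim s2
WHERE s1.sim_key = 1 AND s2.sim_key = 2;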
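The intended ratio here, shared 1 bits divided by the bits set in either value, is a Jaccard-style measure rather than "matching positions out of 6", so it agrees with the first answer only for some inputs. A small Python sketch, assuming the values are already real bit patterns (the function name is hypothetical); it returns None where the SQL would divide by zero, which in a MySQL SELECT normally yields NULL:

```python
from typing import Optional


def shared_over_union(x: int, y: int) -> Optional[float]:
    # BIT_COUNT(x & y) / BIT_COUNT(x | y)
    union = x | y
    if union == 0:
        return None  # both fields all zeros: the SQL division would be by zero
    return bin(x & y).count("1") / bin(union).count("1")


print(shared_over_union(0b111000, 0b111111))  # 0.5 (same as 3 of 6)
print(shared_over_union(0b100001, 0b011101))  # 0.2 (yet 2 of 6 positions agree)
```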

Related

How can I use IF and ELSE IF in a loop and display two statements in GAMS?

I am a beginner with this program. I am trying to improve this loop according to the following condition:
After computing CUTI(k) = CUTI(k)-4:
1) If the resulting CUTI(k) value is greater than 0, print this CUTI(k) value.
2) If the resulting CUTI(k) value is less than 0, add 12 to it and print it with a "*" after the number in the display, e.g. 10*, 9*.
I am not sure whether this loop is correct and sufficient to implement this condition. I look forward to your recommendations. :)
set k /1*20/;
parameter
CUTI(k)/1 6, 2 2, 3 8, 4 5, 5 1, 6 3, 7 7, 8 8, 9 6, 10 8,11 1, 12 2, 13 4, 14 7,
15 5, 16 2, 17 8, 18 9, 19 2, 20 10/;
loop(k,
if(CUTI(k)-4 > 0,
CUTI(k) = CUTI(k)-4;
else
CUTI(k) = (CUTI(k)-4)+12 ;
)
);
display CUTI;
Your logic looks correct. However, instead of the loop/if/else you could simplify this to one assignment:
CUTI(k) = CUTI(k)-4+12$(CUTI(k)<=4);
However, modifying the display statement by adding a * to some elements is not possible. If you need to distinguish the cases in such a statement, you might assign the values to two different parameters and display them individually.
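The arithmetic of the single assignment, and the idea of keeping the two cases apart, can be checked with a short Python sketch (Python rather than GAMS, purely for illustration; the names `shift`, `plain` and `wrapped` are hypothetical). It uses the CUTI data from the question and builds the starred text that the GAMS display statement itself cannot produce:

```python
# CUTI data from the question, k = 1..20
cuti = dict(enumerate([6, 2, 8, 5, 1, 3, 7, 8, 6, 8,
                       1, 2, 4, 7, 5, 2, 8, 9, 2, 10], start=1))


def shift(value):
    # Same rule as CUTI(k) = CUTI(k)-4+12$(CUTI(k)<=4):
    # subtract 4, and add 12 back when the original value was <= 4.
    was_wrapped = value <= 4
    return value - 4 + (12 if was_wrapped else 0), was_wrapped


# Two separate collections, like displaying two different parameters.
plain, wrapped = {}, {}
for k, v in cuti.items():
    new_value, was_wrapped = shift(v)
    (wrapped if was_wrapped else plain)[k] = new_value

# Text rendering with a "*" on the wrapped values.
labels = [f"{wrapped[k]}*" if k in wrapped else str(plain[k]) for k in sorted(cuti)]
print(" ".join(labels))
```

With the question's data this prints 2 10* 4 1 9* 11* 3 4 2 4 9* 10* 12* 3 1 10* 4 5 10* 6.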

MS-Access (SQL): Strange behaviour of the MID() function when using decimal arguments

I have noticed strange behavior of the MID() function in MS Access when used in combination with decimal numbers as arguments.
The data is as follows:
table: Test
ID Name Surname
1 Jamal Winstone
2 Joe Roan
3 Jake Tumble
4 Lea More
The SQL statement is:
SELECT MID(Surname, ID, LEN(Name)/2) FROM Test
The results are:
Expr1000
Wi
oa
mb
e
However, shouldn't it be as follows?
MID("Winstone", 1, LEN("Jamal")/2) = MID("Winstone", 1, 5/2) = MID("Winstone", 1, 2.5) = Wi (only 2 characters)
MID("Roan", 2, LEN("Joe")/2) = MID("Roan", 2, 3/2) = MID("Roan", 2, 1.5) = o (only 1 character)
MID("Tumble", 3, LEN("Jake")/2) = MID("Tumble", 3, 4/2) = MID("Tumble", 3, 2) = mb (2 characters)
MID("More", 4, LEN("Lea")/2) = MID("More", 4, 3/2) = MID("More", 4, 1.5) = e (only 1 character)
This is very strange. Any ideas why this is happening? Are the numbers with decimal places rounded?
Thanks
The logic here is very simple:
Mid takes a Long, so it needs to convert a float/decimal/currency argument to a Long first.
Converting to a Long uses banker's rounding on halves, i.e. rounding to the nearest even number (CLng(1.5) = CLng(2.5) = 2); see the docs.
So these results are entirely expected.
Use Int if you want the integer part (e.g. Int(1.99) = 1).
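The effect can be reproduced outside Access. Python's built-in round() also rounds halves to the nearest even integer, so it can stand in for the CLng-style conversion in a hypothetical Mid emulation (an illustration, not Access's implementation); math.floor plays the role of Int for these positive lengths:

```python
import math


def mid_rounded(text, start, length):
    # Hypothetical Mid emulation: 1-based start, length rounded
    # half-to-even like CLng (round(2.5) == 2, round(1.5) == 2).
    n = round(length)
    return text[start - 1:start - 1 + n]


def mid_int(text, start, length):
    # Same, but keeping only the integer part of the length, like Int().
    n = math.floor(length)
    return text[start - 1:start - 1 + n]


rows = [(1, "Jamal", "Winstone"), (2, "Joe", "Roan"),
        (3, "Jake", "Tumble"), (4, "Lea", "More")]
print([mid_rounded(s, i, len(n) / 2) for i, n, s in rows])  # ['Wi', 'oa', 'mb', 'e']
print([mid_int(s, i, len(n) / 2) for i, n, s in rows])      # ['Wi', 'o', 'mb', 'e']
```

The first list matches the Access output in the question; the second matches the results the question expected.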

How to combine certain column values together in Python and make values in the other column be the means of the values combined?

I have a pandas DataFrame where one column ('sequence') is a sequence of numbers, many of them repeating, and the values of the other column ('binary variable') are either 1 or 0.
I have grouped the rows by their 'sequence' value and made the 'binary variable' value of each group the percentage of entries in that group that are non-zero.
I now want to combine entries in the 'sequence' column together and make the 'binary variable' value the mean of the values of the entries that were combined.
So my data frame looks like this:
df = pd.DataFrame({'sequence': [1, 1, 4, 4, 4, 6], 'binary variable': [1, 0, 0, 1, 0, 1]})
I then grouped together the rows with the same 'sequence' value using this code:
df.groupby("sequence")["binary variable"].apply(lambda s: (s != 0).sum() / s.count() * 100)
I am left with a 'sequence' column of non-repeating values and a 'binary variable' column holding the percentage of non-zero entries.
But now I want to group some of the 'sequence' values together (in this toy example, 1 and 4) and have the 'binary variable' column hold the mean of the percentages for, say, 1 and 4.
This isn't terribly well worded, as I find it awkward to describe. I've searched online and made many failed attempts with my own code, but nothing works. Any help would be greatly appreciated.
It seems like you want to group the table twice and take the mean each time. For the second grouping, you need to create a new column to indicate the group.
Try this code:
import pandas as pd
# sequence groups for final average
grps = {(1,4):[1,4],
        (5,6):[5,6]}
# initial data
df = pd.DataFrame({'sequence' : [1,1,4,4,4,5,5,6], 'binvar' : [1,0,0,1,0,1,0,1]})
# mean of binvar for each sequence value
gb = df.groupby(["sequence"])['binvar'].mean().reset_index()
def getgrp(x): # search groups
    for k in grps:
        if x in grps[k]:
            return k
print(df.to_string(index=False))
# label each sequence value with its group (by column name, not position)
gb['group'] = gb['sequence'].map(getgrp)
gb = gb.reset_index()
print(gb.to_string(index=False))
gb = gb[['group','binvar']].groupby("group")['binvar'].mean().reset_index()
print(gb.to_string(index=False))
Output
sequence binvar
1 1
1 0
4 0
4 1
4 0
5 1
5 0
6 1
index sequence binvar group
0 1 0.500000 (1, 4)
1 4 0.333333 (1, 4)
2 5 0.500000 (5, 6)
3 6 1.000000 (5, 6)
group binvar
(1, 4) 0.416667
(5, 6) 0.750000
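If you only need the final numbers, a shorter variant is to group the per-sequence means a second time through a mapping from sequence value to group label. This is a sketch; the `group_of` dict and its labels are hypothetical. It also shows that averaging the per-sequence means (what the question asks for) is not the same as pooling all rows of a group:

```python
import pandas as pd

df = pd.DataFrame({'sequence': [1, 1, 4, 4, 4, 5, 5, 6],
                   'binvar':   [1, 0, 0, 1, 0, 1, 0, 1]})

# Hypothetical mapping from each sequence value to its combined group.
group_of = {1: '1+4', 4: '1+4', 5: '5+6', 6: '5+6'}

# Mean per sequence value, then the mean of those means per group
# (each index label is mapped to its group label).
per_sequence = df.groupby('sequence')['binvar'].mean()
mean_of_means = per_sequence.groupby(per_sequence.index.map(group_of)).mean()

# For comparison: pool all rows of a group and take a single mean.
pooled = df.groupby(df['sequence'].map(group_of))['binvar'].mean()

print(mean_of_means)
print(pooled)
```

For 1 and 4 the mean of means is 5/12 (about 0.416667, as in the output above), while pooling the rows gives 0.4, because sequence 4 has more rows than sequence 1.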

I would like to transform a map into a field in a Pig Latin script

The schema of the tuples in my relation (A) is as follows:
{a: int, b: int, c: map[]}
The map contains only one chararray value, but its key is not predictable. For example, here is a sample of my tuples:
(1, 100, [key.152#hello])
(8, 110, [key.3000#bonjour])
(5, 103, [key.1#hallo])
(5, 103, [])
(8, 104, [key.11#buenosdias])
...
I would like to transform relation A into a relation B whose schema would be:
{a: int, b: int, c: chararray}
With my sample, it would give:
(1, 100, hello)
(8, 110, bonjour)
(5, 103, hallo)
(8, 104, buenosdias)
...
(I want to filter out empty maps too.)
Any ideas?
Thank you.
Though writing a UDF is the right solution, if you want a quick hack, the following solution using STRSPLIT (which splits on a regular expression) might help. Note that it loads c as a plain chararray rather than as a map:
A = LOAD 'sample.txt' as (a:int, b:int, c:chararray);
B = FOREACH A GENERATE a, b, FLATTEN(STRSPLIT(c, '#', 2)) as (key:chararray, value:chararray);
C = FOREACH B GENERATE a, b, FLATTEN(STRSPLIT(value, ']', 2)) as (value:chararray, ignore:chararray);
D = FILTER C BY value is not null;
E = FOREACH D GENERATE a, b, value;
STORE E INTO 'output/E';
For the sample input:
1 100 [key.152#hello]
8 110 [key.3000#bonjour]
5 103 [key.1#hallo]
5 103 []
8 104 [key.11#buenosdias]
the above code produces the following output:
1 100 hello
8 110 bonjour
5 103 hallo
8 104 buenosdias
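The two STRSPLIT steps can be traced with a small Python sketch of the same string logic (plain Python, not Pig; `extract_value` is a hypothetical name). Splitting once on '#' and then once on ']' leaves the map value, and a map without '#' (the empty "[]") yields no value, which is what the FILTER step drops:

```python
def extract_value(c):
    # Like STRSPLIT(c, '#', 2): split at the first '#' only.
    parts = c.split('#', 1)
    if len(parts) < 2:
        return None  # "[]" has no '#', so there is no value field
    # Like STRSPLIT(value, ']', 2): keep the text before the first ']'.
    return parts[1].split(']', 1)[0]


rows = [(1, 100, '[key.152#hello]'),
        (8, 110, '[key.3000#bonjour]'),
        (5, 103, '[key.1#hallo]'),
        (5, 103, '[]'),
        (8, 104, '[key.11#buenosdias]')]

extracted = [(a, b, extract_value(c)) for a, b, c in rows]
# Like FILTER ... BY value is not null.
result = [row for row in extracted if row[2] is not None]
for a, b, v in result:
    print(a, b, v)
```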

How to compare two different rows of two different tables and update a third table?

I have two different tables, Value(queid, m) and Ans(queid1, an). I want to compare queid and queid1 and, if they are the same, compare the values of m and an and update a third table with the correct values. Thanks a ton.
The table structures are:
The Value table has two attributes, queid and m. queid has data like 3, 4, 5, 6, and m has a, v, d, e.
The Ans table has the attributes queid1 and an. queid1 has data like 3, 4, 3, 4, 3, 3, 3, 2, 3, 4, and an has data like a, v, a, a, a, c, e, r, e, d.
What I want is to compare the values of queid with queid1. So if we consider 3, i.e. the first value of queid in the Value table, it should find all the 3s in the Ans table and then compare a (i.e. the m value in the row for 3 in the Value table) with the an values of all those rows. The matching a's are to be stored in some third table.
This can be done by joining both tables on the queid and queid1 columns, then keeping only the rows where the m and an columns are equal:
INSERT INTO NewTable (col1, col2)
SELECT V.queid, V.m
FROM Value V
JOIN Ans A
ON V.queid = A.queid1
WHERE V.m = A.an
;
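A quick way to try the statement is an in-memory SQLite database from Python, loaded with the sample values from the question. NewTable, col1 and col2 are the answer's placeholder names, and SQLite only stands in for whatever database you actually use. Each matching Ans row produces its own row, so queid 3 is inserted three times; SELECT DISTINCT would give one row per match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Sample data from the question ("Value" is quoted only to be safe as an identifier).
cur.execute('CREATE TABLE "Value" (queid INTEGER, m TEXT)')
cur.execute("CREATE TABLE Ans (queid1 INTEGER, an TEXT)")
cur.execute("CREATE TABLE NewTable (col1 INTEGER, col2 TEXT)")
cur.executemany('INSERT INTO "Value" VALUES (?, ?)',
                [(3, "a"), (4, "v"), (5, "d"), (6, "e")])
cur.executemany("INSERT INTO Ans VALUES (?, ?)",
                list(zip([3, 4, 3, 4, 3, 3, 3, 2, 3, 4],
                         ["a", "v", "a", "a", "a", "c", "e", "r", "e", "d"])))

# The statement from the answer: join on the ids, keep rows where m = an.
cur.execute("""
    INSERT INTO NewTable (col1, col2)
    SELECT V.queid, V.m
    FROM "Value" V
    JOIN Ans A ON V.queid = A.queid1
    WHERE V.m = A.an
""")

rows = cur.execute("SELECT col1, col2 FROM NewTable ORDER BY col1").fetchall()
print(rows)
conn.close()
```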