SQL find all rows with assigned values - sql

MSSQL: I have this example data:
NAME  AValue  BValue
A     1       11
B     1       11
C     2       11
D     2       21
E     3       21
F     3       21
G     4       31
H     4       31
I     5       41
J     5       NULL
...
I am looking for an algorithm that finds all the Names that are closed under values coming from different seeds (AValue and BValue; in this case the seed is 2 for AValue and 3 for BValue, but values can be skipped and assigned later and so on, so I cannot simply look for the smallest common multiple). In this case the output should be 1,2,3,4,11,21,31 as the first group/result. Then all the Names with these values can be updated, etc.
I need to find all the Names in a "closed circle" of values produced by different seeds.
EDIT:
(attempt at a simpler example)
Imagine that you have a list of names. Each name is given two numbers. In most cases these numbers follow some seed (in this example each AValue is given twice, each BValue three times), but some numbers can be skipped, so you cannot just count off the smallest common multiple of the different seeds (here that would be 2x3, i.e. every 6 names you would have a closed group where no Name carries an AValue or BValue from the next/different group). For example, Name A has 1 and 11. 1 is given to A and B, 11 to A, B, C. These Names (now including D, which shares AValue 2 with C) have 1, 2, 11, 21. So you check for 2 and 21, which adds E and F, and the loop of checking should continue; once no more Names are added, the output should be 1,2,3,11,21. A "closed circle".
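Not part of the original question, but a minimal T-SQL sketch of the expansion loop described above, assuming the rows live in a table named dbo.Example (a hypothetical name): seed the group with one Name, then keep adding every row that shares an AValue or BValue with the group until nothing new appears.

IF OBJECT_ID('tempdb..#grp') IS NOT NULL DROP TABLE #grp;

SELECT NAME, AValue, BValue
INTO #grp
FROM dbo.Example
WHERE NAME = 'A';                         -- seed: any Name from the group you want to expand

WHILE 1 = 1
BEGIN
    INSERT INTO #grp (NAME, AValue, BValue)
    SELECT e.NAME, e.AValue, e.BValue
    FROM dbo.Example AS e
    WHERE NOT EXISTS (SELECT 1 FROM #grp AS g WHERE g.NAME = e.NAME)          -- not already in the group
      AND EXISTS (SELECT 1 FROM #grp AS g
                  WHERE g.AValue = e.AValue OR g.BValue = e.BValue);          -- shares a value with the group

    IF @@ROWCOUNT = 0 BREAK;              -- nothing new was added: the circle is closed
END

SELECT DISTINCT AValue, BValue FROM #grp; -- the closed set of values
SELECT NAME FROM #grp;                    -- the Names that can now be updated

With the sample data and seed 'A', the loop stops after Names A-F, i.e. values 1, 2, 3 and 11, 21.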

Related

PostgreSQL data transformation - Turn rows into columns

I have a table whose structure looks like the following:
k | i | p | v
Notice that the key (k) is not unique; there is no primary key at all. Each key can have multiple attributes (i = 0, 1, 2, ...), which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
Would turn into (assuming I am only interested in columns a, b, c):
k a b c
1 10 12 34
2 11 13 NULL
I would use conditional aggregation. That is, an aggregate function around a CASE expression.
SELECT
k,
MAX(CASE WHEN p='a' THEN v END) AS a,
MAX(CASE WHEN p='b' THEN v END) AS b,
MAX(CASE WHEN p='c' THEN v END) AS c
FROM
your_table
GROUP BY
k
This presumes that (k, p) is unique. If there are duplicate (k, p) pairs, this will simply return the highest v for each (k, p).
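In PostgreSQL specifically (the question's original tag), the same conditional aggregation can also be written with the FILTER clause (available since 9.4); your_table is the same placeholder name used above:

-- Conditional aggregation with FILTER instead of CASE; behaviour is identical.
SELECT
    k,
    MAX(v) FILTER (WHERE p = 'a') AS a,
    MAX(v) FILTER (WHERE p = 'b') AS b,
    MAX(v) FILTER (WHERE p = 'c') AS c
FROM your_table
GROUP BY k;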
As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.

SQL dealing every bit without run query repeatedly

I have a column that uses bits to record the status of every mission. The index of a bit represents the mission number, while 1/0 indicates whether that mission was completed successfully; the bits are logically independent even though they are stored together.
For instance, 1010 stored as a decimal means the user finished the 2nd and 4th missions successfully, and the table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate, for every mission, how many users passed it. E.g. for mission 1 it's 0+1+1+0+1 = 3, while for mission 2 it's 0+1+0+0+1 = 2.
I can use the formula FLOOR(status%POWER(10,n)/POWER(10,n-1)) to get the bit for mission n for every user, but this means I have to run my query n times, and the status is now 64 bits long...
Is there any elegant way to do this in one query? Any help is appreciated....
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
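With one row per user and mission, the per-mission pass count becomes a single aggregate. A minimal sketch, assuming the normalised table is called user_mission (a hypothetical name):

-- Count, for every mission, how many users passed it
-- (status is 1 for passed, 0 for failed).
SELECT mission,
       SUM(status) AS users_passed
FROM user_mission
GROUP BY mission;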
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15;                    // value comes from a query
$missions = array(1, 2, 3, 4); // not really necessary in this particular instance
for ($i = 0; $i < 4; $i++) {
    $intbit = pow(2, $i);       // bit mask for mission $i + 1
    if ($input & $intbit) {
        echo $missions[$i] . ' ';
    }
}
?>
Outputs '1 2 3 4'
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the bin() function to convert the integer to a string. This is supported in both Hive and MySQL (the original tags on the question).
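Not from the answers above: if the status really were stored as a binary integer (rather than decimal digits), the per-mission counts from the original question could be computed in one query with bitwise shifts. A MySQL-flavoured sketch; the table name t follows the query above:

-- Each term extracts one mission's bit (0 or 1) per user and sums it over all users.
SELECT SUM((status >> 0) & 1) AS mission1,
       SUM((status >> 1) & 1) AS mission2,
       SUM((status >> 2) & 1) AS mission3,
       SUM((status >> 3) & 1) AS mission4
FROM t;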
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.

Populate rows based on another table but check if columns exist?

Hi guys, I'm new here and could really use your help writing a SQL script/function for the following problem.
I have a source table which contains three columns: Name, Value and miNum. An example of the data inside this table:
Name Value miNum
A+B+C 1+2+3 a1
C+D+E 3+4+5 a3
E+F 5+2 a7
Now, I have created a final_table whose columns are the same as the source table's, but with additional columns labelled A-Z (29 columns in total).
What I want the script/function to do is read each row from the source table and populate the corresponding columns in final_table.
Example output of final_table
Name   Value  miNum  A  B  C  D  E  F
A+B+C  1+2+3  a1     1  2  3
C+D+E  3+4+5  a3           3  4  5
E+F    5+2    a7                 5  2
New columns will be added to final_table regularly, so it doesn't make sense to hard-code the columns in the SQL. Is it possible to do all this without hardcoding column names?
Please can someone kindly show me how I can achieve this.
Thanks
Please add the rest of the columns based on this schema:
select tst.*,
case when instr(name,'A') > 0 then substr(Value,instr(name,'A'),1) end A,
case when instr(name,'B') > 0 then substr(Value,instr(name,'B'),1) end B,
case when instr(name,'C') > 0 then substr(Value,instr(name,'C'),1) end C,
case when instr(name,'D') > 0 then substr(Value,instr(name,'D'),1) end D,
case when instr(name,'E') > 0 then substr(Value,instr(name,'E'),1) end E,
case when instr(name,'F') > 0 then substr(Value,instr(name,'F'),1) end F
from tst;
Gives result
NAME   VALUE  MINUM  A  B  C  D  E  F
-----  -----  -----  -  -  -  -  -  -
A+B+C  1+2+3  a1     1  2  3
C+D+E  3+4+5  a3           3  4  5
E+F    5+2    a7                 5  2
Note that this approach only works when the letters within Name are unique; for duplicated letters only the first value is shown, e.g.:
NAME   VALUE  MINUM  A  B  C  D  E  F
-----  -----  -----  -  -  -  -  -  -
A+A+A  5+2+1  a11    5
The other restriction is that the keys are only A-Z and the values are single characters; if this holds, you may use this insert to populate the target table:
Insert into TARGET
(name, value, miNum,A,B,C,D,E,F)
select name, value, miNum,
case when instr(name,'A') > 0 then substr(Value,instr(name,'A'),1) end A,
case when instr(name,'B') > 0 then substr(Value,instr(name,'B'),1) end B,
case when instr(name,'C') > 0 then substr(Value,instr(name,'C'),1) end C,
case when instr(name,'D') > 0 then substr(Value,instr(name,'D'),1) end D,
case when instr(name,'E') > 0 then substr(Value,instr(name,'E'),1) end E,
case when instr(name,'F') > 0 then substr(Value,instr(name,'F'),1) end F
from source;
You will have to extend both the insert column list and the query with columns up to Z.
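Not part of the answer above: since new columns are added regularly, the column list could also be generated instead of typed out by hand. A hedged PL/SQL sketch, assuming Oracle (which matches the instr/substr functions used above) and the source/TARGET table names from the insert:

DECLARE
  l_cols  VARCHAR2(4000);
  l_exprs VARCHAR2(32767);
BEGIN
  -- Build ",A,B,...,Z" and the matching CASE expressions.
  FOR i IN 1 .. 26 LOOP
    l_cols  := l_cols || ',' || CHR(64 + i);
    l_exprs := l_exprs
               || ', case when instr(name,''' || CHR(64 + i)
               || ''') > 0 then substr(Value, instr(name,''' || CHR(64 + i)
               || '''), 1) end';
  END LOOP;

  -- Run the same insert as above, but with the generated column list.
  EXECUTE IMMEDIATE
    'insert into TARGET (name, value, miNum' || l_cols || ') '
    || 'select name, value, miNum' || l_exprs || ' from source';
END;
/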

Split a string to its subwords

Every letter has a value
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
TableA
String Length Value Subwords
exampledomain 13 132 #example-domain#example-do-main#
creditcard 10 85 #credit-card#credit-car-d#
TableB
Words Length Value
example 7 76
do 2 19
main 4 37
domain 6 56
credit 6 59
card 4 26
car 3 22
d 1 4
Explanation
TableA is string-based with over a million rows, and about 100k new rows will be added to TableA daily.
Also, the "String" column contains no whitespace.
TableB is word-based with over a million rows; it contains every letter and the words of 1-2 languages.
What I want to do
I want to split the strings in TableA into their subwords, as you see in the example. For "creditcard" I search all the words in TableB and try to find which words, when put together, match the string.
What I did, and why it didn't solve my question
I took the string and joined TableB with INNER JOINs. I did the INNER JOIN 2-3 times because there can be 3-word and 4-word strings too, and that WORKED!! But it takes too much time even for 100-200 strings, and I want to do it for 100k strings every day.
Now what I am trying to do
I gave values to every letter, as you see above.
I take the strings one by one and, from the letters they contain, I compute the value of each string.
And the same for the words in TableB.
Now I have every string in TableA and every word in TableB with their VALUES.
_
1 - I take the string, its length and its value (Example: creditcard - 10 - 85).
2 - I search TableB for the possible words that, when put together, have a SUM(length) and SUM(value) matching the string's length and value, and write these possibilities to a new column.
Finally, even when the sums of lengths and values match, there can be possibilities that do not match the whole string, and I will eliminate those (Example: "doma-in" could also be "moda-in"; their lengths and values are the same, but they are not the same words).
I don't know, but I guess this value method could solve the time problem? If there is another way to do this, I would be grateful for your advice.
Thanks
You could try to find the solutions recursively by always looking at the next letter. For example, for the word DOMAIN:
D      - no
DO     - is a word!
    M    - no
    MA   - no
    MAI  - no
    MAIN - is a word!
    No more letters --> DO + MAIN
DOM    - is a word!
    A    - no
    AI   - no
    AIN  - no
    Finished without result
DOMA   - no
DOMAI  - no
DOMAIN - is a word!
No more letters --> DOMAIN
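Not part of the original answer: a hedged sketch of that recursion as a recursive CTE (PostgreSQL-style syntax; adjust string functions to your engine). It repeatedly peels a TableB word off the front of what is left of the string and keeps only the splits that consume the whole string:

WITH RECURSIVE split (str, remainder, parts) AS (
    SELECT a.String,
           CAST(a.String AS TEXT),
           CAST('' AS TEXT)
    FROM TableA a
    UNION ALL
    SELECT s.str,
           SUBSTRING(s.remainder FROM CHAR_LENGTH(b.Words) + 1),  -- what is left after the word
           s.parts || '-' || b.Words                              -- accumulate the subwords
    FROM split s
    JOIN TableB b
      ON b.Words = SUBSTRING(s.remainder FROM 1 FOR CHAR_LENGTH(b.Words))  -- word is a prefix
)
SELECT str AS string, parts AS subwords
FROM split
WHERE remainder = '';   -- only splits that consume the whole string survive

For "creditcard" this yields both -credit-card and -credit-car-d, matching the Subwords column above. Note that single-letter entries in TableB make the number of candidate splits grow quickly, so in practice you would want to restrict or rank the results.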

Check if value is already in a query field to change the value of another

I'll clarify this: I have a result set in which the two PK fields (A and B) can be the same across rows, while field C differs.
Example:
A   B   C  D
14  20  1  null
14  20  2  1
15  20  2  0
As you can see, the D field has a null and a 0.
What I have to do is change D's null value to 1 whenever the A values are the same and there is more than one record with that A, without touching the 0's in D.
I tried initially with NVLs and DECODEs, like this:
DECODE(migr.A,NULL,(NVL(C,1)),D) AS D
but I'm not getting all the records, only the ones with D = 1.
I'd really rather not join an extra table or add an extra validation step, as my query result can easily be over 1 million records, but if that's the best option, I'm OK with it.
Many thanks.
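Not part of the original question: one hedged way to do this without an extra table or pass is an analytic count, assuming an Oracle-style database (the NVL/DECODE attempt suggests one). Here migr stands for whatever query produces the result set above, and the partition follows the description (by A; add B if both keys must match):

SELECT A, B, C,
       CASE
         WHEN D IS NULL
          AND COUNT(*) OVER (PARTITION BY A) > 1 THEN 1   -- duplicated A: fill the null with 1
         ELSE D                                           -- keep existing 0/1 values untouched
       END AS D
FROM migr;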